- Node.js version 8.5.0 or higher is required. If Node.js is not installed on your machine, go to this link to install it.
- Clone this repository: `https://github.com/aximilli1212/nodejs-web-crawlr.git`
- Go to the cloned directory (e.g. `cd nodejs-web-crawlr`).
- Run `npm install`.
- Run `npm run start`.
- The server starts on `localhost:3000`.
- Make a POST request to `localhost:3000/api`.
- The request should have the parameters `hostname`, `regexes` and `numLevels`:
  - `hostname` = URL string
  - `regexes` = comma-separated string of regexes
  - `numLevels` = integer

  Sample: `{hostname:http://****, regexes:ai,facebook\.com%2F([^-]+)-,instagram ,numLevels:3}`

  NB: every regex runs with the default global flag `/g`, so the regex string above becomes `/ai/g`, `/facebook\.com%2F([^-]+)-/g`, `/instagram/g`.
- The generated ndjson output is exported to `/document/match.ndjson`.
- Download and inspect your loot with a GET request to `localhost:3000/document/match.ndjson`.
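
The request and download steps above can be sketched with Node's built-in `http` module. This is only an illustration under a few assumptions that the steps above do not confirm: that the server accepts a JSON body, that the POST response arrives once the crawl has finished, and that each line of `match.ndjson` is one JSON value. The helper names (`buildCrawlPayload`, `request`, `parseNdjson`, `crawlAndDownload`) are hypothetical and not part of Crawlr.

```javascript
const http = require('http');

// Build the request body from the three documented parameters.
// Assumption: the server accepts JSON; adjust if it expects form data.
function buildCrawlPayload(hostname, regexes, numLevels) {
  return JSON.stringify({
    hostname: hostname,
    regexes: regexes.join(','), // comma-separated string of regexes
    numLevels: numLevels
  });
}

// Send one HTTP request to localhost:3000 and resolve with the response text.
function request(method, path, body) {
  return new Promise((resolve, reject) => {
    const headers = body
      ? { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
      : {};
    const req = http.request(
      { host: 'localhost', port: 3000, method: method, path: path, headers: headers },
      (res) => {
        let data = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { data += chunk; });
        res.on('end', () => resolve(data));
      }
    );
    req.on('error', reject);
    if (body) req.write(body);
    req.end();
  });
}

// Parse newline-delimited JSON: one JSON value per non-empty line.
function parseNdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// POST the crawl request, then GET and parse the exported matches.
// Assumption: the export is ready by the time the POST response ends.
function crawlAndDownload(hostname, regexes, numLevels) {
  return request('POST', '/api', buildCrawlPayload(hostname, regexes, numLevels))
    .then(() => request('GET', '/document/match.ndjson'))
    .then(parseNdjson);
}
```

With the server running, `crawlAndDownload('https://expressjs.com', ['instagram'], 3).then(console.log)` would print the parsed records. Because the patterns travel as one comma-separated string, a pattern that itself contains a comma would presumably be split in two.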
The crawler runs best on already rendered sites. Support for browser-rendered sites (React, Angular and VueJS sites) will be made available soon.
The app has been tested against https://knust.edu.gh, https://google.com, dff.qbelimited.com, https://expressjs.com, https://ucc.edu.gh/, etc.
RegExs tested include `a`, `as`, `instagram`, `(?:twitter.com)?`, `ar`, `facebook.com%2F([^-]+)-`. More regexes are being tested.
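
To try a pattern before sending it to the crawler, the comma-separated string can be compiled locally the way this README describes, with each entry becoming a global (`/g`) regex. The sketch below is an assumption about that behaviour, not Crawlr's actual code. In particular, trimming whitespace around each entry and skipping empty matches are choices made here for illustration, and `compilePatterns` and `findMatches` are hypothetical names.

```javascript
// Split a comma-separated string into RegExp objects with the global flag.
// Assumption: surrounding whitespace is trimmed and empty entries are ignored.
function compilePatterns(regexes) {
  return regexes
    .split(',')
    .map((source) => source.trim())
    .filter((source) => source !== '')
    .map((source) => new RegExp(source, 'g'));
}

// Collect every non-empty match of every pattern in the given text.
function findMatches(patterns, text) {
  const found = [];
  patterns.forEach((pattern) => {
    pattern.lastIndex = 0; // global regexes keep state between exec() calls
    let match;
    while ((match = pattern.exec(text)) !== null) {
      if (match[0] === '') {
        // Optional patterns such as (?:twitter.com)? can match the empty
        // string; advance manually so the loop cannot stall.
        pattern.lastIndex += 1;
        continue;
      }
      found.push({ pattern: String(pattern), match: match[0] });
    }
  });
  return found;
}
```

For example, `findMatches(compilePatterns('ai,(?:twitter.com)?'), 'mail twitter.com')` yields one match for `/ai/g` and one for `/(?:twitter.com)?/g`.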
Feel free to contribute as Crawlr still needs more updates and fixes.