WebsiteContactHarvester

Crawl websites for contact information and extract email addresses, phone numbers, and Facebook and Twitter URLs.

How to use

  1. Clone the repo.
  2. Restore the npm packages (npm install).
  3. Update the sites to crawl in the sitesToCrawl.js file.
  4. Run node app.js
  5. Harvested contact info will be placed into the ./output directory.
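
Once a crawl has finished, the results can be inspected from Node. The following is a minimal sketch, not part of the project: the helper name loadHarvestedOutput is hypothetical, and it assumes each output file is named after its domain (for example example.com.json), which the project does not confirm.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: reads every .json file in the output directory
// and returns an object keyed by file name without the extension.
// Assumption: files are named <domain>.json; the real naming may differ.
function loadHarvestedOutput(outputDir) {
  // Nothing has been harvested yet if the directory does not exist.
  if (!fs.existsSync(outputDir)) return {};

  const results = {};
  for (const name of fs.readdirSync(outputDir)) {
    if (path.extname(name) !== ".json") continue; // skip non-JSON files
    const domain = path.basename(name, ".json");
    const raw = fs.readFileSync(path.join(outputDir, name), "utf8");
    results[domain] = JSON.parse(raw);
  }
  return results;
}

const harvestedByDomain = loadHarvestedOutput("./output");
console.log(Object.keys(harvestedByDomain));
```

Run before any crawl, this prints an empty list because the helper treats a missing output directory as "no results".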

Output

Currently, all potential phone numbers, email (mailto) addresses, and Twitter and Facebook URLs are harvested from the retrieved HTML files. To harvest additional data, modify the harvestContactInfo method in websiteContactHarvester.js.

The harvested data is saved to the ./output directory, one .json file per domain listed in sitesToCrawl.js. You can use a tool like https://konklone.io/json/ to convert the .json files into .csv files.
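
To illustrate the kind of extraction described above, here is a rough sketch of pulling contact details out of an HTML string with regular expressions. It is not the project's harvestContactInfo implementation: the function name harvestContactInfoSketch, the patterns, and the shape of the returned object are all assumptions. The phone pattern in particular is a loose North American style match and will produce false positives, which fits the "potential phone numbers" wording.

```javascript
// Hypothetical sketch; the real harvestContactInfo method may work differently.
function harvestContactInfoSketch(html) {
  // Email addresses from mailto: links, stopping before any ?subject=... part.
  const emails = [...html.matchAll(/mailto:([^"'?\s>]+)/gi)].map((m) => m[1]);

  // Loose phone pattern: 3 digits, 3 digits, 4 digits with common separators.
  const phones = html.match(/\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}/g) || [];

  // Profile URLs on twitter.com and facebook.com.
  const twitter =
    html.match(/https?:\/\/(?:www\.)?twitter\.com\/[A-Za-z0-9_]+/gi) || [];
  const facebook =
    html.match(/https?:\/\/(?:www\.)?facebook\.com\/[A-Za-z0-9_.\-]+/gi) || [];

  return { emails, phones, twitter, facebook };
}

const sampleHtml = `
  <a href="mailto:info@example.com?subject=Hello">Email us</a>
  <p>Call (555) 123-4567 or 555.987.6543</p>
  <a href="https://twitter.com/example">Twitter</a>
  <a href="https://www.facebook.com/example.page">Facebook</a>
`;

const harvested = harvestContactInfoSketch(sampleHtml);
console.log(JSON.stringify(harvested, null, 2));
```

Harvesting an additional kind of data would then amount to adding another pattern and another property on the returned object.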

Roadmap

  1. Produce a .csv output file in addition to the .json files.
  2. Eliminate duplicate values from the output files.
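
Both roadmap items could be approached along the following lines. This is only a sketch under stated assumptions: the record shape (an object whose properties are arrays of strings), the column layout (domain, type, value), and the helper names dedupe, escapeCsv, and toCsv are all hypothetical and do not come from the project.

```javascript
// Remove duplicates case-insensitively, keeping the first occurrence.
function dedupe(values) {
  const seen = new Set();
  const result = [];
  for (const value of values) {
    const trimmed = value.trim();
    const key = trimmed.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      result.push(trimmed);
    }
  }
  return result;
}

// Quote a field only when it contains a comma, quote, or line break.
function escapeCsv(field) {
  const text = String(field);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Assumed input shape: { emails: [...], phones: [...], ... } for one domain.
function toCsv(domain, info) {
  const rows = [["domain", "type", "value"]];
  for (const [type, values] of Object.entries(info)) {
    for (const value of dedupe(values)) {
      rows.push([domain, type, value]);
    }
  }
  return rows.map((row) => row.map(escapeCsv).join(",")).join("\n");
}

const csv = toCsv("example.com", {
  emails: ["info@example.com", "INFO@example.com "],
  phones: ["(555) 123-4567", "(555) 123-4567"],
});
console.log(csv);
```

One row per harvested value keeps the CSV flat even though each domain can have a different number of emails, phone numbers, and URLs.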

Contributors

aaronhoffman
