Web Scraper Plus

Web Scraper Plus is a Chrome browser extension built for extracting data from web pages. With this extension you create a plan (a sitemap) that describes how a website should be traversed and what should be extracted. Web Scraper Plus then navigates the site according to the sitemap and extracts the data. Scraped data can later be exported as CSV.
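
As a rough illustration of what such a plan can look like, the sketch below models a sitemap as a start URL plus a tree of selectors, and lists the order in which the selectors would be visited. The field names (`_id`, `startUrl`, `selectors`, `parentSelectors`) are modeled on sitemaps exported by the upstream Web Scraper and should be treated as assumptions. The URL, the selector ids, and the `traversalOrder` helper are hypothetical and not part of the extension.

```javascript
// Hypothetical sitemap. Field names are assumptions modeled on upstream
// Web Scraper exports; compare with a sitemap exported from the extension.
const sitemap = {
  _id: "example-shop",
  startUrl: ["https://example.com/products"],
  selectors: [
    { id: "product-link", type: "SelectorLink", parentSelectors: ["_root"], selector: "a.product", multiple: true },
    { id: "title", type: "SelectorText", parentSelectors: ["product-link"], selector: "h1", multiple: false },
    { id: "price", type: "SelectorText", parentSelectors: ["product-link"], selector: ".price", multiple: false },
  ],
};

// Walk the selector tree depth-first, starting from the root page.
// Assumes the parent/child relationships contain no cycles.
function traversalOrder(map, parentId = "_root", depth = 0) {
  const steps = [];
  for (const sel of map.selectors) {
    if (sel.parentSelectors.includes(parentId)) {
      steps.push({ id: sel.id, depth });
      steps.push(...traversalOrder(map, sel.id, depth + 1));
    }
  }
  return steps;
}

// Print one selector id per line, indented by its depth in the tree.
console.log(
  traversalOrder(sitemap)
    .map((step) => "  ".repeat(step.depth) + step.id)
    .join("\n")
);
```

Here the link selector is followed first, and the two text selectors are evaluated on each page it leads to.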

Install the extension from the Chrome Web Store.

Documentation for new features: wiki

This tool is forked from Web-Scraper and adds many more features.

New Features

  1. CLI Support: Start scraping from the command line (CMD/Terminal).
  2. MySQL Support: Supports MySQL databases (v5.7+).
  3. Anti Lazy-Loading: Handles lazy-loaded content on pages.
  4. Data Filter: Supports user-defined JS code for data preprocessing and more.
  5. Distinct: Removes duplicate data before the end of every task.
  6. Custom Columns: Define the columns you want to display. Use this feature together with Data Filter.
  7. Easy Scrape: Create and scrape a sitemap more easily. (Based on https://github.com/aagiss)
  8. Random Interval: Adds a random delay between requests. (Provided by https://github.com/Euphorbium)
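
To make three of these features more concrete, here is a minimal sketch of the ideas behind Data Filter, Distinct, and Random Interval. It is not the extension's actual code. The function names, the row shape (`title`, `price`), and the way a filter receives a row are all assumptions made for illustration; the wiki describes the real interface.

```javascript
// Data Filter (assumed shape): a user-supplied function that cleans one scraped row.
// This one strips currency symbols and thousands separators from a price string.
function dataFilter(row) {
  return { ...row, price: Number(String(row.price).replace(/[^0-9.]/g, "")) };
}

// Distinct: keep the first occurrence of each row. Rows are compared by their
// serialized content with keys sorted, so key order does not matter.
// Assumes flat rows (no nested objects).
function distinct(rows) {
  const seen = new Set();
  return rows.filter((row) => {
    const key = JSON.stringify(row, Object.keys(row).sort());
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Random Interval: pick an integer delay in [minMs, maxMs] (both inclusive).
// The random source is injectable so the function can be tested deterministically.
function randomDelay(minMs, maxMs, random = Math.random) {
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}

// Hypothetical scraped rows: the first two are duplicates once cleaned.
const scraped = [
  { title: "Lamp", price: "$1,299.50" },
  { price: "$1,299.50", title: "Lamp" },
  { title: "Desk", price: "$85" },
];
const cleaned = distinct(scraped.map(dataFilter));
console.log(cleaned.length, "unique rows; next request in", randomDelay(1000, 2000), "ms");
```

Filtering before deduplicating means rows that differ only in formatting (for example `$85` and `85`) collapse into one.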

Features (forked from the original work)

  1. Scrape multiple pages
  2. Sitemaps and scraped data are stored in the browser's local storage or in CouchDB
  3. Multiple data selection types
  4. Extract data from dynamic pages (JavaScript+AJAX)
  5. Browse scraped data
  6. Export scraped data as CSV
  7. Import, Export sitemaps
  8. Depends only on the Chrome browser
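
For readers curious about what CSV export involves, the following sketch turns an array of scraped rows into CSV text with a chosen set of columns. It illustrates the general technique with RFC 4180 style quoting and is not the extension's exporter. The `toCsv` and `escapeCsv` names and the sample rows are hypothetical.

```javascript
// Quote a value only when it contains a comma, a double quote, or a line break.
// Embedded double quotes are doubled. Missing values become empty fields.
function escapeCsv(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Build CSV text: one header line, then one line per row, in the given column order.
function toCsv(rows, columns) {
  const header = columns.map(escapeCsv).join(",");
  const lines = rows.map((row) => columns.map((col) => escapeCsv(row[col])).join(","));
  return [header, ...lines].join("\r\n");
}

// Hypothetical scraped rows; the second row has no price.
const rows = [
  { title: 'He said "hi"', price: 5 },
  { title: "a,b" },
];
console.log(toCsv(rows, ["title", "price"]));
```

Passing the column list explicitly keeps the output stable even when some rows lack a field.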

Help

Basic documentation and tutorials are available on webscraper.io

Submit bugs and suggest features on github-issues

Bugs

When submitting a bug, please attach an exported sitemap if possible.

License

LGPLv3

Contributors

martinsbalodis, hejiheji001, eldos-dl, euphorbium, willhirsch
