

Web-Scraping


BeautifulSoup

BeautifulSoup can pull data out of HTML and XML files. It is a good choice for beginners because it is one of the easiest web scraping libraries in Python to learn.
Unfortunately, BeautifulSoup does not support JavaScript-driven websites: it only parses the markup it is given and never executes scripts. This is a big disadvantage, as many websites today build their content with JavaScript. BeautifulSoup can also be inefficient on large jobs, and it depends on other packages (an HTTP client to download pages and, optionally, a faster parser such as lxml), which can make it harder to move code between projects.
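
To make the parsing step concrete, here is a minimal sketch, assuming the bs4 package is installed. The HTML snippet, the "book" class name, and the extract_links helper are hypothetical and exist only for this illustration; a real page would be downloaded first with an HTTP client.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet; in practice this would be the body of an HTTP response.
HTML = """
<html><body>
  <h1>Books</h1>
  <ul>
    <li class="book"><a href="/b/1">Dune</a></li>
    <li class="book"><a href="/b/2">Emma</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for the link inside every <li class="book">."""
    # "html.parser" is Python's built-in parser, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for item in soup.find_all("li", class_="book"):
        anchor = item.find("a")
        if anchor is not None:
            links.append((anchor.get_text(strip=True), anchor["href"]))
    return links


links = extract_links(HTML)
print(links)
```

Because BeautifulSoup only sees the markup passed to it, any content that a page adds later with JavaScript would not appear in the result.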

Selenium

Selenium wasn't actually designed for web scraping. It is a browser automation tool (WebDriver) built to drive real browsers for automated testing of web applications. Because the browser renders the page and runs its scripts, Selenium also works well for scraping websites that rely on JavaScript to create dynamic content.
So we can say that Selenium is one of the best libraries for scraping JavaScript-driven websites. Another advantage is that Selenium is easier to learn than Scrapy.
Unfortunately, Selenium is slow. Scraping with Selenium takes longer than sending a plain HTTP request to the web server, because the browser has to load the page and execute all the scripts on it.
However, if speed isn't our top priority, Selenium is a good option.

Scrapy

Scrapy is a framework built specifically for web scraping and written in Python. It is arguably the most complete web scraping tool in Python.
Unfortunately, Scrapy is harder to learn than Selenium or BeautifulSoup.
That said, one of Scrapy's biggest advantages is speed. Because it is asynchronous, Scrapy spiders don't have to wait for each request to finish before sending the next one; they can have many requests in flight at once. This makes Scrapy more memory- and CPU-efficient than BeautifulSoup and Selenium. You can also easily store data in databases, create crawlers, and do more with Scrapy.

So which one is the best?

  • BeautifulSoup will be great for beginners.
  • Selenium will be good for small projects that need to scrape JavaScript-driven websites.
  • Scrapy will be great for large projects where speed is a priority.
