Scrapy

Overview

Scrapy is a BSD-licensed, fast, high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Scrapy is maintained by Zyte (formerly Scrapinghub) and many other contributors.

Check the Scrapy homepage at https://scrapy.org for more information, including a list of features.

Requirements

  • Python 3.8+
  • Works on Linux, Windows, macOS, BSD

Install

The quick way:

pip install scrapy

See the install section in the documentation at https://docs.scrapy.org/en/latest/intro/install.html for more details.
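
As a quick sanity check after installing, the following sketch verifies the Python requirement listed above and reports the installed Scrapy version. It uses only the standard library; the function names are hypothetical and chosen for this example.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)  # minimum version stated in the Requirements section


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter meets the stated minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_scrapy_version():
    """Return the installed Scrapy version string, or None if it is missing."""
    try:
        return metadata.version("scrapy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Scrapy version:", installed_scrapy_version() or "not installed")
```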

Documentation

Documentation is available online at https://docs.scrapy.org/ and in the docs directory.

Releases

You can check https://docs.scrapy.org/en/latest/news.html for the release notes.

Community (blog, Twitter, mailing list, IRC)

See https://scrapy.org/community/ for details.

Contributing

See https://docs.scrapy.org/en/master/contributing.html for details.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct.

By participating in this project you agree to abide by its terms. Please report unacceptable behavior to [email protected].

Companies using Scrapy

See https://scrapy.org/companies/ for a list.

Commercial Support

See https://scrapy.org/support/ for details.

Projects from the Scrapy organization

base-chromium

Base component forked from the Chromium source at https://chromium.googlesource.com/chromium/src/base/

booksbot

A crawler for http://books.toscrape.com

dirbot

Scrapy project to scrape public web directories (educational) [DEPRECATED]

itemloaders

Library to populate items using XPath and CSS with a convenient API

parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

protego

A pure-Python robots.txt parser with support for modern conventions.

pypydispatcher

A fork of http://pydispatcher.sourceforge.net/ with PyPy support

queuelib

Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python

quotesbot

This is a sample Scrapy project for educational purposes

scrapely

A pure-Python HTML screen-scraping library

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

scrapy-itemloader

[Archived] Library to populate Scrapy items using XPath and CSS with a convenient API

scrapyd

A service daemon to run Scrapy spiders

scurl

Performance-focused replacement for Python urllib

url-chromium

URL component from the Chromium source code, forked from https://chromium.googlesource.com/chromium/src/url

w3lib

Python library of web-related functions

xtractmime

An implementation of the MIME Sniffing standard (https://mimesniff.spec.whatwg.org/) for Python
