pyconus2024-tutorial's Introduction

Gathering data from the web using Python

PyCon US 2024 - 16 / 05 / 2024

About the tutorial

Information is abundant and readily available on the internet. However, the sheer amount of data can be overwhelming and time-consuming to navigate. That's where web scraping comes in: a powerful technique for extracting data from websites and turning it into a usable format.

In this tutorial, we will explore the basics of web scraping and how to implement it using Scrapy (a Python framework). Whether you are a data analyst, programmer, or researcher, this tutorial will equip you with the fundamental skills needed to create your own web scraper and extract valuable information from websites.
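To make "extract data from websites and turn it into a usable format" concrete before any framework is involved, here is a minimal sketch that uses only Python's standard library (html.parser), not Scrapy. The HTML snippet, the class names (quote, text, author) and the QuoteParser and extract_quotes names are all hypothetical, chosen for illustration; the tutorial itself uses Scrapy for this job.

```python
from html.parser import HTMLParser

# Hypothetical page content, inlined so the example runs offline.
SAMPLE_HTML = """
<html><body>
  <div class="quote"><span class="text">Simple is better than complex.</span>
    <small class="author">Tim Peters</small></div>
  <div class="quote"><span class="text">Readability counts.</span>
    <small class="author">Tim Peters</small></div>
</body></html>
"""


class QuoteParser(HTMLParser):
    """Collect the text and author of every element marked as a quote."""

    def __init__(self):
        super().__init__()
        self.quotes = []    # extracted records, one dict per quote
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "quote":
            # A new quote block starts: open an empty record for it.
            self.quotes.append({"text": "", "author": ""})
        elif css_class in ("text", "author") and self.quotes:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a field we are interested in.
        if self._field is not None:
            self.quotes[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("span", "small"):
            self._field = None


def extract_quotes(html):
    """Turn raw HTML into a list of {'text': ..., 'author': ...} dicts."""
    parser = QuoteParser()
    parser.feed(html)
    return parser.quotes


if __name__ == "__main__":
    for quote in extract_quotes(SAMPLE_HTML):
        print(quote)
```

The result is a plain list of dictionaries, which is easy to write out as JSON or CSV. A framework such as Scrapy adds the pieces this sketch leaves out, such as downloading pages, following links and selecting elements with CSS or XPath expressions.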

Before the tutorial

Clone this repository to get access to the presentation and to the code solutions for the suggested exercises: https://github.com/rennerocha/pyconus2024-tutorial

During the tutorial, you will be expected to solve some small exercises using Scrapy, a web scraping framework. It is the only library you need to install; any other dependencies should be installed along with it.

Follow the Scrapy installation guide for your platform and make sure that running the command "scrapy version" in your terminal prints Scrapy 2.11.1 with no errors.

On Linux, I suggest using a virtual environment. If everything goes well, a few commands should be enough to have everything up and running:

$ git clone https://github.com/rennerocha/pyconus2024-tutorial pyconus2024-tutorial
$ cd pyconus2024-tutorial
$ python -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
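As an optional sanity check from inside Python, the following sketch reports whether Scrapy is installed in the active environment and whether its version matches the one mentioned above. It relies only on the standard library's importlib.metadata; the helper names installed_version and describe_install are hypothetical and not part of the tutorial repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def describe_install(package, expected):
    """Build a short human-readable report about an installed package."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed in this environment"
    if found != expected:
        return f"{package} {found} is installed, expected {expected}"
    return f"{package} {found} is installed as expected"


if __name__ == "__main__":
    # 2.11.1 is the version this README asks you to see.
    print(describe_install("scrapy", "2.11.1"))
```

Run it with the virtual environment activated. If it reports that the package is missing, the most likely cause is that the pip install step ran outside the environment.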
