Scraper Allociné

Just a random scraper to retrieve some data about movies listed on Allociné.fr.

The script saves the movie data available on the http://www.allocine.fr/films page to a .csv file and to a postgres database.

Movie information scraped

The following movie attributes are retrieved when available:

  • The movie ID;

  • The title;

  • The release date;

  • The duration;

  • The genre(s);

  • The director(s);

  • The main actor(s);

  • The press rating;

  • The spectator rating;

  • The movie summary.
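Each scraped movie can be thought of as one record holding the attributes above. A minimal sketch, assuming a Python dataclass; the class name, field names and types are hypothetical and are not taken from the project's code:

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Movie:
    """One scraped movie (hypothetical schema); any attribute may be missing."""
    movie_id: int
    title: str
    release_date: Optional[str] = None       # kept as the raw scraped text
    duration: Optional[str] = None           # kept as the raw scraped text
    genres: List[str] = field(default_factory=list)
    directors: List[str] = field(default_factory=list)
    actors: List[str] = field(default_factory=list)
    press_rating: Optional[float] = None
    spectators_rating: Optional[float] = None
    summary: Optional[str] = None

    def to_row(self) -> dict:
        """Flatten the list attributes so the record fits in a single CSV row."""
        row = {f.name: getattr(self, f.name) for f in fields(self)}
        for key in ("genres", "directors", "actors"):
            row[key] = ", ".join(row[key])
        return row


# Column order for the CSV file, derived from the dataclass fields.
CSV_COLUMNS = [f.name for f in fields(Movie)]
```

Optional fields default to None (or an empty list) so that a movie missing, say, a press rating can still be saved.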

Installation

First, clone the repository:

git clone git@github.com:kinoute/scraper-allocine.git

Go to the folder and build the container:

cd scraper-allocine
docker-compose build
# or "make build"

Usage

Important: first, rename the .env.dist template file to .env, then fill it with your own values. On first start, the postgres environment variables are used to create the postgres server.

By default, the script will:

  • Scrape the first 50 pages of Allociné;
  • Save every movie to the postgres database, which runs in its own container;
  • Wait 10 seconds between pages;
  • Save the full results to a CSV file called allocine.csv in the files folder.

To run the script with these default options, simply do:

docker-compose up --build
# or make start
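The default run described above boils down to a loop over the listing pages with a pause before each request. A minimal sketch, assuming a caller-supplied fetch() that downloads and parses one page (for example with requests and BeautifulSoup) and assuming the ?page=N query string for pages after the first; neither assumption comes from the project's code, and page_url() and crawl() are hypothetical names:

```python
import time
from typing import Callable, List

BASE_URL = "http://www.allocine.fr/films/"  # listing page named in this README


def page_url(page: int) -> str:
    """Build the listing URL for a page number (the ?page= query is an assumption)."""
    return BASE_URL if page == 1 else f"{BASE_URL}?page={page}"


def crawl(
    fetch: Callable[[str], List[dict]],
    pages: int = 50,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Scrape `pages` listing pages, waiting `delay` seconds before each one."""
    results: List[dict] = []
    for page in range(1, pages + 1):
        sleep(delay)                           # be polite: pause before every request
        results.extend(fetch(page_url(page)))  # fetch() downloads and parses one page
    return results
```

Injecting fetch and sleep keeps the loop independent of the HTTP and parsing libraries and makes it easy to test without hitting Allociné.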

Change default options

The script has three options that can be customized in the .env file:

  • The number of pages to scrape (default: 50);
  • The delay, in seconds, to wait before each page is scraped (default: 10);
  • The CSV filename where results are stored (default: allocine.csv).
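Reading these options with their defaults could look like the sketch below. It assumes docker-compose exposes the .env values to the scraper as environment variables; the variable names PAGES, DELAY and CSV_NAME are hypothetical placeholders, so use the names actually listed in .env.dist:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    pages: int = 50                 # number of listing pages to scrape
    delay: float = 10.0             # seconds to wait before each page
    csv_name: str = "allocine.csv"  # file written in the files folder


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the three options, falling back to the README defaults.

    PAGES, DELAY and CSV_NAME are hypothetical variable names.
    """
    env = os.environ if env is None else env
    return Settings(
        pages=int(env.get("PAGES", Settings.pages)),
        delay=float(env.get("DELAY", Settings.delay)),
        csv_name=env.get("CSV_NAME", Settings.csv_name),
    )
```

Environment values are always strings, hence the explicit int() and float() conversions.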

Data

The script automatically updates and saves the .csv file after every page scraped. The postgres database, on the other hand, is updated after every movie scraped.
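The two saving granularities can be sketched as follows: one commit per movie for the database and one full rewrite of the CSV per page. This is an illustration only; sqlite3 stands in for PostgreSQL so the example runs without a server, the two-column movies table is a simplification, and the function names are hypothetical:

```python
import csv
import sqlite3
from pathlib import Path
from typing import Iterable, List


def save_csv(rows: List[dict], path: Path) -> None:
    """Rewrite the whole CSV file with every row collected so far."""
    if not rows:
        return
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def save_movie(conn: sqlite3.Connection, movie: dict) -> None:
    """Upsert and commit one movie as soon as it is scraped.

    sqlite3 is a stand-in here; in PostgreSQL the equivalent statement
    would be INSERT ... ON CONFLICT (movie_id) DO UPDATE.
    """
    conn.execute(
        "INSERT OR REPLACE INTO movies (movie_id, title) VALUES (?, ?)",
        (movie["movie_id"], movie["title"]),
    )
    conn.commit()


def process_page(conn: sqlite3.Connection, movies: Iterable[dict],
                 all_rows: List[dict], csv_path: Path) -> None:
    for movie in movies:
        save_movie(conn, movie)   # database: one commit per movie
        all_rows.append(movie)
    save_csv(all_rows, csv_path)  # CSV: one full rewrite per page
```

Because each movie is committed immediately and the CSV is rewritten after each page, stopping the scraper loses at most the page in progress.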

If, for whatever reason, you want to stop scraping, just press Ctrl+C in your terminal.

Test

While the scraper is running, you can connect to the postgres container and run any SQL operation with psql by typing make admin-db in the project folder.

You can also type make test-db: if everything went well, it should return 5 records from the movies table.

Abuse

This script was made just for fun, to play around with BeautifulSoup and Python. Please don't use it to do bad things or to overload Allociné's servers!

