Airbnb - Paris

Getting Started

  1. Install the Python modules:
pip install -r requirements.txt
  2. Create a dev.env file in the root folder with the following content:
POSTGRESQL_HOST=
POSTGRESQL_USER=
POSTGRESQL_PASSWORD=
POSTGRESQL_DATABASE=
DATASETS_FOLDER_PATH=

To access and visualize the database, you can use pgAdmin.
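
As an illustration of how the settings in dev.env could be read from Python, here is a minimal sketch that uses only the standard library. The helper names (`parse_env`, `load_env_file`, `missing_keys`) are hypothetical and are not taken from this project, which may load the file differently (for example through a package listed in requirements.txt).

```python
from pathlib import Path

# Keys listed in the dev.env template above.
REQUIRED_KEYS = (
    "POSTGRESQL_HOST",
    "POSTGRESQL_USER",
    "POSTGRESQL_PASSWORD",
    "POSTGRESQL_DATABASE",
    "DATASETS_FOLDER_PATH",
)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env_file(path="dev.env"):
    """Read the file at `path` and return its settings as a dict."""
    return parse_env(Path(path).read_text(encoding="utf-8"))


def missing_keys(settings):
    """Return the required keys that are absent or left empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]
```

Calling `missing_keys(load_env_file())` before connecting to the database gives an early, readable error when a value in dev.env was left blank.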

Organization of Files and Folders

  • datasets: Contains all dataset files
    • datasets/listings: Listings datasets
    • datasets/reviews: Reviews datasets
    • datasets/calendar: Calendar datasets
  • tests: All test files
  • notebook: Files containing ideas to be implemented
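
The dataset layout above can be sketched in Python as follows. This assumes that DATASETS_FOLDER_PATH points at the datasets folder, which the source does not state explicitly. The function name `dataset_dirs` is hypothetical.

```python
from pathlib import Path

# Subfolders named in the list above.
SUBFOLDERS = ("listings", "reviews", "calendar")


def dataset_dirs(root):
    """Map each dataset kind to its folder under `root`, creating missing folders."""
    base = Path(root)
    dirs = {}
    for name in SUBFOLDERS:
        folder = base / name
        folder.mkdir(parents=True, exist_ok=True)
        dirs[name] = folder
    return dirs
```

For example, `dataset_dirs("datasets")["listings"]` would be the folder in which downloaded listings files are stored.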

Database Setup

  1. Create the calendars table in your database:
CREATE TABLE public.calendars
(
    cal_key serial,
    listing_id integer,
    available text COLLATE pg_catalog."default",
    start_date date,
    end_date date,
    num_day integer,
    minimum_nights double precision,
    maximum_nights double precision,
    label text COLLATE pg_catalog."default",
    validation boolean DEFAULT false,
    proba double precision,
    ext_validation double precision DEFAULT 0.0,
    CONSTRAINT calendars_pkey PRIMARY KEY (cal_key)
);
  2. Create the listings table in your database:
CREATE TABLE public.listings
(
    id integer NOT NULL,
    listing_url text COLLATE pg_catalog."default",
    scrape_id bigint,
    last_scraped text COLLATE pg_catalog."default",
    name text COLLATE pg_catalog."default",
    description text COLLATE pg_catalog."default",
    neighborhood_overview text COLLATE pg_catalog."default",
    host_id integer,
    host_acceptance_rate text COLLATE pg_catalog."default",
    host_listings_count integer,
    neighbourhood text COLLATE pg_catalog."default",
    neighbourhood_cleansed text COLLATE pg_catalog."default",
    neighbourhood_group_cleansed text COLLATE pg_catalog."default",
    latitude double precision,
    longitude double precision,
    property_type text COLLATE pg_catalog."default",
    room_type text COLLATE pg_catalog."default",
    minimum_nights integer,
    maximum_nights integer,
    calendar_updated text COLLATE pg_catalog."default",
    has_availability text COLLATE pg_catalog."default",
    availability_365 integer,
    calendar_last_scraped text COLLATE pg_catalog."default",
    number_of_reviews integer,
    first_review text COLLATE pg_catalog."default",
    last_review text COLLATE pg_catalog."default",
    license text COLLATE pg_catalog."default",
    instant_bookable text COLLATE pg_catalog."default",
    calculated_host_listings_count integer,
    reviews_per_month double precision,
    CONSTRAINT id PRIMARY KEY (id)
);
  3. Create the results table in your database:
CREATE TABLE public.results
(
    extraction_date date,
    listing_id integer,
    past12_m50 integer,
    past12_m75 integer,
    past12_m95 integer,
    past12_m100 integer,
    past12_m95_e75 integer,
    past12_m100_e75 integer,
    civil_m50 integer,
    civil_m75 integer,
    civil_m95 integer,
    civil_m100 integer,
    civil_m95_e75 integer,
    civil_m100_e75 integer,
    predict_m50 integer,
    predict_m75 integer,
    predict_m95 integer
);

Daily execution

This project is designed to run every day. To process the data published on InsideAirbnb the previous day, run the Daily.py file. To automate this execution, you can schedule it.

On Windows, for example, you can follow this tutorial.

On average, the script takes about 30 minutes to retrieve the files and process them. Your computer remains usable in the meantime because the script runs on a single core.
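
If an operating-system scheduler (such as Windows Task Scheduler or cron) is not an option, the daily run could also be driven from a small Python loop. The following is a minimal sketch, not the project's own mechanism: the function names and the default run hour of 02:00 are assumptions, and only the Daily.py file name comes from the text above.

```python
import subprocess
import sys
import time
from datetime import date, datetime, timedelta


def data_date(today):
    """Return the day whose data a run on `today` processes (the previous day)."""
    return today - timedelta(days=1)


def seconds_until_next_run(now, hour=2):
    """Return the number of seconds from `now` until the next `hour`:00."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so wait for tomorrow's.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_daily_forever(script="Daily.py", hour=2):
    """Sleep until the next run time, launch the script, and repeat."""
    while True:
        time.sleep(seconds_until_next_run(datetime.now(), hour))
        print("Processing data for", data_date(date.today()))
        # Run the script with the same interpreter as this loop.
        subprocess.run([sys.executable, script], check=False)
```

An operating-system scheduler is usually the more robust choice, because it survives reboots without keeping a Python process alive.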
