airbnb_scraper 🕷️

A spider built with Scrapy and scrapy-splash to crawl Airbnb listings.

Checklist

This checklist is for personal use and isn't relevant to using the scraper.

  • Spider can successfully parse one page of listings
  • Spider can successfully parse multiple/all pages of designated location
  • Spider can take price ranges as arguments (price_lb and price_ub)
  • Spider can take location as argument

Set up

Since Airbnb uses JavaScript to render content, Scrapy on its own is sometimes not enough. We also need Splash, a JavaScript rendering service that integrates with Scrapy through the scrapy-splash plugin.

To install Splash, we need to do several things:

  1. Install Docker, create a Docker account (if you don't already have one), and start the Splash container before crawling with
docker run -p 8050:8050 scrapinghub/splash

The first run might take a few minutes while the image is pulled. When it finishes, open localhost:8050 in your browser to check that it is working. If an interface opens up, you are good to go.
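The same check can be done from a script instead of the browser. The following is a minimal sketch, assuming Splash is listening on the default localhost:8050 from the docker command above; the function name splash_is_up is hypothetical and not part of this repository.

```python
import urllib.error
import urllib.request


def splash_is_up(base_url: str = "http://localhost:8050", timeout: float = 3.0) -> bool:
    """Return True if an HTTP 200 response comes back from base_url.

    Hypothetical helper: it only checks that something answers on the port,
    not that the responder is actually Splash.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Splash reachable:", splash_is_up())
```

If this prints False, make sure the Docker container is still running and that port 8050 is published.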

  2. Install scrapy-splash using pip
pip install scrapy-splash

See the scrapy-splash documentation if you run into any issues.
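For reference, scrapy-splash is wired into a Scrapy project through settings.py. The fragment below is a sketch of the settings the scrapy-splash documentation describes, assuming Splash runs on localhost:8050 as set up above. This project's own settings.py may already contain these values or differ from them.

```python
# settings.py (fragment), assuming the Splash container from step 1 is running

# Address of the Splash HTTP API.
SPLASH_URL = "http://localhost:8050"

# Route requests through Splash and keep cookies and compression working.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoid storing duplicate Splash arguments for every request.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter and cache storage that understand Splash requests.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```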

Crawling

Run the spider with
scrapy crawl airbnb -o {filename}.json -a city='{cityname}' -a price_lb='{pricelowerbound}' -a price_ub='{priceupperbound}'

cityname refers to a valid city name

pricelowerbound refers to a lower bound for price from 0 to 999

priceupperbound refers to an upper bound for price from 0 to 999. The spider will close if priceupperbound is less than pricelowerbound.

Note: Airbnb only returns a maximum of ~300 listings per specific filter (price range). To get more listings, I recommend scraping multiple times using small increments in price and concatenating the datasets.
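Concatenating the datasets could look roughly like the sketch below. It assumes each crawl was written to its own file with -o {filename}.json, so that every file holds one JSON array of listings. The function name merge_listing_files is hypothetical. Exact duplicates are dropped because the same listing may show up in neighbouring price ranges.

```python
import json
from pathlib import Path


def merge_listing_files(paths):
    """Concatenate JSON arrays from several files, dropping exact duplicates.

    Hypothetical helper: items are compared by their full JSON content, so no
    particular field (such as a listing id) is assumed to exist.
    """
    merged = []
    seen = set()
    for path in paths:
        items = json.loads(Path(path).read_text(encoding="utf-8"))
        for item in items:
            key = json.dumps(item, sort_keys=True)
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

For example, merge_listing_files(sorted(Path(".").glob("*.json"))) would combine every JSON file in the current directory, assuming they all came from this spider.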

If you would like to do multiple scrapes over a wide price range (e.g. 10-spaced intervals from 20 to 990), see cancun.sh, which I used to crawl a large number of listings for Cancún.
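As an illustration of the interval idea (this is not the contents of cancun.sh), the sketch below generates the price ranges and the matching crawl commands. The function names and the output file naming are hypothetical. Adjacent ranges share an endpoint here, which is an assumption about how the price filter treats its bounds and may produce a few duplicate listings.

```python
import shlex


def price_intervals(start: int, stop: int, step: int):
    """Split [start, stop] into consecutive (lower, upper) price ranges."""
    if not (0 <= start <= stop <= 999):
        raise ValueError("bounds must satisfy 0 <= start <= stop <= 999")
    if step <= 0:
        raise ValueError("step must be positive")
    intervals = []
    lower = start
    while lower < stop:
        upper = min(lower + step, stop)  # the last range may be shorter
        intervals.append((lower, upper))
        lower = upper
    return intervals


def crawl_command(city: str, price_lb: int, price_ub: int):
    """Build the scrapy command for one price range as an argument list."""
    filename = f"{city}_{price_lb}_{price_ub}.json"  # hypothetical naming scheme
    return [
        "scrapy", "crawl", "airbnb",
        "-o", filename,
        "-a", f"city={city}",
        "-a", f"price_lb={price_lb}",
        "-a", f"price_ub={price_ub}",
    ]


if __name__ == "__main__":
    for lb, ub in price_intervals(20, 990, 10):
        print(shlex.join(crawl_command("Cancún", lb, ub)))
```

Each printed line can be run in the project directory, or the argument list can be passed to subprocess.run to launch the crawls one after another.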

Acknowledgements

I would like to thank Ahmed Rafik for his guidance and teachings.
