
actor-booking-scraper's Introduction

What does Booking Scraper do?

Our free Booking Scraper allows you to scrape data from Booking.com, one of the best-known platforms for hotels, apartments, resorts, villas, and other types of accommodation worldwide.

Our Booking Scraper is capable of extracting data such as:

🏖 Hotel names and locations

🗓 Availability

⏱ Check-in and check-out times

🛏 Room types

💵 Prices

🖋 Reviews

📃 Conditions

💰 Promotions

The Booking.com API interface is quite user-friendly, but getting that data in machine-processable format is no easy task. Booking.com places a lot of restrictions on how data can be collected from its listings, one of them being that it will only display a maximum of 1,000 results for any given search. Apify's Booking Scraper doesn't impose any limitations on your results, so you can scrape data from Booking.com at scale.

How much will it cost to scrape Booking?

Apify gives you $5 free usage credits every month on the Apify Free plan. You can get 2,000 results per month from Booking.com for that, so those 2,000 results will be completely free!

But if you need to regularly scrape data from Booking.com, you should grab an Apify subscription. We recommend our $49/month Personal plan - you can get up to 20,000 results every month with it!

Or get 200,000 results for $499 with the Team plan - wow!
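To compare the plans, the quoted figures can be turned into a cost per 1,000 results. This is a minimal sketch that uses only the prices and result counts stated above; the plan object shape and the function name are illustrative, not part of Apify's API.

```javascript
// Prices and result counts are the figures quoted in the text above.
const plans = [
    { name: 'Free credit', usd: 5, results: 2000 },
    { name: 'Personal', usd: 49, results: 20000 },
    { name: 'Team', usd: 499, results: 200000 },
];

// Cost in USD for every 1,000 results on a given plan
function costPerThousand(plan) {
    return (plan.usd * 1000) / plan.results;
}

for (const plan of plans) {
    console.log(`${plan.name}: $${costPerThousand(plan).toFixed(3)} per 1,000 results`);
}
```

On these numbers every plan works out to roughly $2.45 to $2.50 per 1,000 results, so the larger plans buy volume rather than a lower unit price.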

How can I scrape Booking?

If you want a step-by-step tutorial on how to scrape Booking, read our blog post on how to scrape Booking.com or just sit back and enjoy this quick tutorial video:

Watch the video

Tips for scraping Booking

1️⃣ The actor will not work without a proxy. If you try running the actor without a proxy, it will fail with a message stating exactly that. There could be a slight difference in price depending on the type of proxy you use.

2️⃣ Booking.com will only display a maximum of 1,000 results; if you need to circumvent this limitation, you can utilize the useFilters INPUT attribute. However, using any limiting filters in start URLs will not be possible because the scraper will override those.

3️⃣ If you need to get detailed data about specific rooms, the scraper needs to be started with checkIn and checkOut INPUT attributes (Booking.com only shows complete room info for specific dates).

4️⃣ Booking.com may return some suggested hotels outside of the expected city/region as a recommendation. The scraper will return all of them in the data results, so you may get more results than your search.
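The first three tips can be folded into a small input builder. This is only a sketch: buildInput and its expectMoreThan1000 option are hypothetical names, while the field names (search, proxyConfig, useFilters, checkIn, checkOut) are the ones that appear in the inputs quoted in the issues further down this page.

```javascript
// Hypothetical helper that assembles an actor input following the tips above.
function buildInput({ search, checkIn, checkOut, expectMoreThan1000 = false }) {
    const input = {
        search,
        // Tip 1: the actor fails without a proxy
        proxyConfig: { useApifyProxy: true },
        // Tip 2: let the actor split the search by filters to get past 1,000 results
        useFilters: expectMoreThan1000,
    };
    // Tip 3: detailed room data is only available for specific dates
    if (checkIn && checkOut) {
        input.checkIn = checkIn;
        input.checkOut = checkOut;
    }
    return input;
}

const input = buildInput({ search: 'Prague', checkIn: '2030-07-01', checkOut: '2030-07-02' });
console.log(JSON.stringify(input, null, 2));
```

The dates in the usage line are placeholders; the actor rejects dates in the past.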

Is it legal to scrape Booking?

Note that personal data is protected by GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers. We also recommend that you read our blog post: is web scraping legal?

actor-booking-scraper's People

Contributors

davidjohnbarton, dtrungtin, gahabeen, lhotanok, metalwarrior665, mvolfik, pocesar, zpelechova


actor-booking-scraper's Issues

Paid support

Hello,

Does this project offer paid support? We would like to pay for some enhancements.
@dtrungtin Would you like us to chat about it?

Thanks.

Actor does not get all results from the booking website.

I am trying to scrape information using this actor, but it does not return all the results available on Booking.com for one URL.
The url given to the actor is this one

Booking says that there are 1571 results on that page.

The input given to the actor is the following

{
  "currency": "USD",
  "debug": true,
  "extendOutputFunction": "($) => { return {} }",
  "language": "en-us",
  "maxPages": 100,
  "minMaxPrice": "0-999999",
  "proxyConfig": {
    "useApifyProxy": true
  },
  "scrapeReviewerName": false,
  "search": "BRASOV",
  "simple": true,
  "sortBy": "class_asc",
  "startUrls": [
    {
      "url": "https://www.booking.com/searchresults.en-us.html?ss=Bra%C5%9Fov%2C+Brasov%2C+Romania&ssne=Boto%C5%9Fani&ssne_untouched=Boto%C5%9Fani&efdco=1&label=gen173nr-1FCAEoggI46AdIM1gEaMABiAEBmAExuAEXyAEM2AEB6AEB-AEDiAIBqAIDuALB94ukBsACAdICJDUxMmRhZWJjLWNlMjItNDNiYi05OGQzLWRiZGY1YmZiMTU5MNgCBeACAQ&aid=304142&lang=en-us&sb=1&src_elem=sb&src=searchresults&dest_id=-1153613&dest_type=city&ac_position=0&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=84f548bb140f009a&ac_meta=GhA4NGY1NDhiYjE0MGYwMDlhIAAoATICZW46BmJyYXNvdkAASgBQAA%3D%3D&group_adults=2&no_rooms=1&group_children=0&sb_travel_purpose=leisure"
    }
  ],
  "testProxy": false,
  "useFilters": true, // this should get more than 1000 results according to documentation
  "destType": "city",
  "propertyType": "none",
  "checkIn": "",
  "checkOut": "",
  "rooms": 1,
  "adults": 2,
  "children": 0,
  "maxReviews": 25
}

Unfortunately, the actor only returns 795 results.

The apify run is here if you want to inspect the configuration in more detail.

I have also noticed that the same configuration of the actor can give different results for consecutive runs. Why can that happen?

Thank you in advance for looking into this issue.

Extract all images

Keep the first image in the image field for backwards compatibility, and put all of them into images.

Pagination doesn't work on region / country level hotel list

When the actor is given the page that shows all hotels in a country (in this case with filters, but it is reproducible for the unfiltered page, too), it extracts the first page, but the pagination doesn't work.

https://my.apify.com/tasks/JraRQpybELy44c8CN#/runs/xLH4YXCa52QyeaBLk

Here's the input:

{
  "startUrls": [
    {
      "url": "https://www.booking.com/searchresults.en-gb.html?aid=356980&label=gog235jc-1FCAIouwE4EkgzWANoZ4gBAZgBCbgBB8gBDNgBAegBAfgBDIgCAagCA7gC25z-9gXAAgHSAiQ3NmQ1NmJlYy0yYjlmLTRjNzItOGQ3My0wOWQ3YTEyYjBiMTTYAgbgAgE&sid=abd941fed66d95d8f81172211ad2deec&tmpl=searchresults&checkin_year_month_monthday=2020-08-15&checkout_year_month_monthday=2020-08-16&class_interval=1&dest_id=171&dest_type=country&group_adults=2&group_children=0&label_click=undef&no_rooms=1&percent_htype_hotel=1&raw_dest_type=country&room1=A%2CA&sb_price_type=total&shw_aparth=1&slp_r_match=0&srpvid=b2625ec6b4960044&ssb=empty&top_ufis=0&nflt=class%3D4%3Bclass%3D5%3Bht_id%3D204%3Bht_id%3D206%3B&rsf=",
      "method": "GET"
    }
  ],
  "sortBy": "bayesian_review_score",
  "currency": "ARS",
  "language": "en-gb",
  "minMaxPrice": "none",
  "propertyType": "none",
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "SHADER"
    ]
  },
  "simple": true,
  "useFilters": false,
  "testProxy": false,
  "extendOutputFunction": "($) => { return {} }"
}

Couldn't quite figure out where things are going wrong; the pagination looks the same.

Parsing of most attributes seems to be broken

Running in simple mode for Prague returns null for most fields:

{
  "url": "https://www.booking.com/hotel/cz/stylish-new-town-apartments.cs.html?selected_currency=CZK&changed_currency=1&top_currency=1&lang=cs&group_adults=2&no_rooms=1",
  "name": "Stylish New Town Apartments",
  "rating": null,
  "reviews": null,
  "stars": null,
  "price": null,
  "currency": null,
  "roomType": "",
  "persons": null,
  "address": "Praha 1, Praha",
  "location": {
    "lat": "14.422967",
    "lng": "50.078501"
  }
}

https://api.apify.com/v2/datasets/e8udZgBC27pZL5wyi/items?format=json&clean=1

Also, full mode from the detail page is missing most data: https://api.apify.com/v2/datasets/J2eGhrckkbW5buz1H/items?format=json&clean=1

Setting price range causes errors

When setting a price range to anything else but "none" in the task input, the actor fails to scrape anything, and the run log is full of errors like this one:

ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.booking.com/searchresults.html?dest_type=city&ss=Paris&order=upsort_bh&selected_currency=ARS&changed_currency=1&top_currency=1&lang=en-gb&group_adults=2&no_rooms=1&rows=25","retryCount":1,"id":"Eq6PwCmNCaqevfU"}
  TypeError: Cannot read property 'includes' of null
      at module.exports.isMinMaxPriceSet (/home/myuser/src/util.js:198:18)
      at runMicrotasks (<anonymous>)
      at processTicksAndRejections (internal/process/task_queues.js:97:5)
      at async PuppeteerCrawler.handlePageFunction (/home/myuser/src/main.js:209:77)
      at async /home/myuser/node_modules/apify/build/utils.js:317:26

All the other inputs were left at their defaults; I only set the price range.
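The stack trace shows .includes being called on a null value inside isMinMaxPriceSet. The real function in src/util.js is not shown in this report, so the following is only a sketch of a null guard under an assumed signature; containsPriceFilter and both of its parameters are hypothetical.

```javascript
// Hypothetical stand-in for the failing check. It assumes the check compares
// some text read from the page against the requested price range.
function containsPriceFilter(filterText, minMaxPrice) {
    // No price range requested: nothing to verify
    if (!minMaxPrice || minMaxPrice === 'none') return true;
    // The page did not yield the expected text: report "not set" instead of throwing
    if (typeof filterText !== 'string') return false;
    return filterText.includes(minMaxPrice);
}

console.log(containsPriceFilter(null, '0-50'));
console.log(containsPriceFilter('price: 0-50', '0-50'));
```

Returning false for a missing value lets the caller decide whether to retry or apply the filter again, rather than failing the whole request with a TypeError.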

Problem of the time of the review

The actor does not output the full date of a review. The review date in the output contains only the year and the day; the month is missing.

Error: Wrong currency

ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {https://www.booking.com/...}
2020-10-30T19:56:48.432Z   Error: Wrong currency: null, re-enqueuing...
2020-10-30T19:56:48.434Z       at PuppeteerCrawler.handlePageFunction (/home/myuser/src/main.js:161:23)
2020-10-30T19:56:48.436Z       at runMicrotasks (<anonymous>)
2020-10-30T19:56:48.460Z       at processTicksAndRejections (internal/process/task_queues.js:97:5)
2020-10-30T19:56:48.462Z       at async /home/myuser/node_modules/apify/build/utils.js:317:26

The error above happens for each URL being processed, no matter what the input is (and no matter whether the task is run manually or through the API). As a result, each URL gets enqueued again when processed, which means the run can never finish because the same URLs are enqueued again and again.

startUrls with checkin / checkout doesn't work

Using this url as a startUrl: https://www.booking.com/hotel/hr/jaegerhorn-zagreb.hr.html?checkin=2020-07-01;checkout=2020-07-02

The query is rewritten and the checkout part is dropped. Visiting the URL directly works as expected.
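One possible cause, not confirmed by this report, is the ";" separator in the start URL: the standard URL parser in Node.js splits query parameters on "&" only, so checkout is never seen as its own parameter. The sketch below shows that behaviour and a hypothetical normalizeQuerySeparators helper that rewrites the separators before the URL is processed.

```javascript
// Hypothetical helper: rewrites ";" separators to "&" so that checkin and
// checkout are parsed as separate query parameters.
// Caveat: a literal ";" inside a parameter value would be rewritten as well.
function normalizeQuerySeparators(rawUrl) {
    const url = new URL(rawUrl);
    url.search = url.search.replace(/;/g, '&');
    return url;
}

const raw = 'https://www.booking.com/hotel/hr/jaegerhorn-zagreb.hr.html?checkin=2020-07-01;checkout=2020-07-02';

// Without normalization, checkout is swallowed into the checkin value
console.log(new URL(raw).searchParams.get('checkout')); // null
// After normalization, both parameters are visible
console.log(normalizeQuerySeparators(raw).searchParams.get('checkout')); // 2020-07-02
```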

Cannot read property '$' of undefined at module.exports.setMinMaxPrice

Happened to a user running this locally. The parsing is not well handled.

My input is:

{
   "search": "MARTA-Dunwoody Station, Atlanta, Georgia, United States"
}

The error is given below:

ERROR PuppeteerCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.booking.com/searchresults.html?dest_type=city&ss=MARTA-Dunwoody%20Station%2C%20Atlanta%2C%20Georgia%2C%20United%20States&order=bayesian_review_score&rows=25","retryCount":3,"id":"Lh6NSEFfDuMttje"}
  TypeError: Operation failed. Error detail: Cannot read property '$' of undefined
      at module.exports.setMinMaxPrice (C:\bookingPOC\actor-booking-scraper\actor-booking-scraper-master\src\util.js:214:42)

Cannot read property 'match' of null on pagination

TypeError: Cannot read property 'match' of null
     at PuppeteerCrawler.handlePageFunction (/home/myuser/src/main.js:276:83)
     at runMicrotasks (<anonymous>)
     at processTicksAndRejections (internal/process/task_queues.js:97:5)
     at async /home/myuser/node_modules/apify/build/utils.js:317:26

const pageRange = (await getAttribute(prItem, 'textContent')).match(/\d+/g);

This breaks the other ongoing pagination tabs, which leads to Error: Protocol error (Runtime.callFunctionOn): Target closed. After a while, every attempt ends with this error.
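The quoted line can fail in two places: the text content may be null, and String.prototype.match itself returns null when there are no digits. A minimal sketch of a guard follows; parsePageRange is a hypothetical helper, not a function from the actor's source.

```javascript
// Hypothetical helper wrapping the failing expression. It returns null instead
// of throwing when the pagination text is missing or contains no numbers.
function parsePageRange(textContent) {
    if (typeof textContent !== 'string') return null;
    const numbers = textContent.match(/\d+/g);
    // match() returns null when nothing matches
    if (!numbers) return null;
    return numbers.map(Number);
}

console.log(parsePageRange('26 - 50'));
console.log(parsePageRange(null));
```

A caller can then skip enqueuing further pages when the result is null, so one missing element does not take down the other pagination tabs.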

Enable `useFilters` + `minMaxPrice` and `propertyType` input configuration

The actor doesn't allow combining automatic filters with custom minMaxPrice and propertyType options (the run fails if this input configuration is detected). It should accept this combination and, when minMaxPrice or propertyType is set in the input, respect that value and not use that attribute for automatic filtering. When their values are not specified in the input and the useFilters option is checked, the actor should use price range and property type for automatic filtering and split the results by their different values.
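The requested behaviour can be sketched as a small decision function. pickAutoFilters is a hypothetical name, and treating "none" as "not specified" is an assumption based on the "none" values seen in the inputs quoted elsewhere on this page.

```javascript
// Hypothetical sketch: decides which attributes the actor may use for
// automatic filtering, leaving user-supplied values untouched.
function pickAutoFilters({ useFilters, minMaxPrice, propertyType }) {
    if (!useFilters) return [];
    // Assumption: "none", empty, or missing means the user did not set a value
    const isUnset = (value) => value === undefined || value === null || value === '' || value === 'none';
    const auto = [];
    if (isUnset(minMaxPrice)) auto.push('minMaxPrice');
    if (isUnset(propertyType)) auto.push('propertyType');
    return auto;
}

// A custom price range is respected; only property type is used for splitting
console.log(pickAutoFilters({ useFilters: true, minMaxPrice: '0-50', propertyType: 'none' }));
```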

Rooms data is not available in response

I tried using a direct hotel URL to get some data. It returns a lot, but the rooms object is empty, even though rooms are selected on Booking.com. Is that expected?

Proposed new features from a customer

The following features are missing from the Booking.com scraper:

  1. Scraper captures only a single image but we need all the images related to the property.
  2. Scraper captures only a few facilities but we require all the facilities, sub-facilities along with their icons.
  3. Scraper does not capture Most Popular Facilities and their Icons but we need that.
  4. Scraper does not capture Property surroundings.
  5. Scraper does not capture the full description.
  6. Scraper does not capture House rules.
  7. Scraper does not capture FAQ.
  8. Scraper does not capture the information contained in the left sidebar of the detail page(Image, Title, and Description).

Fails to paginate on some searches

{
  "search": "Crawley",
  "destType": "city",
  "sortBy": "upsort_bh",
  "checkIn": "2020-11-02",
  "checkOut": "2020-11-06",
  "currency": "GBP",
  "language": "en-gb",
  "minMaxPrice": "none",
  "propertyType": "none",
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "SHADER"
    ]
  },
  "simple": true,
  "useFilters": false,
  "testProxy": false,
  "extendOutputFunction": "($) => { return {} }",
  "rooms": 1,
  "adults": 2,
  "children": 0
}

Gets 25 results and then soft fails with

Error: Protocol error (Runtime.callFunctionOn): Target closed.

Property type no longer works properly

Beginning today, the "Property Type" input began to malfunction. Normally, we use "Hotel" and "Motel" on successive runs; however, today each of these yielded errors, and the only way to run the scraper was to set "Property Type = None".

Full time Apify / Node.js engineer

Hi @dtrungtin, really sorry for bothering you here, but could not find any contacts on your page!

We are looking for a browser automation expert for our team in GigRadar.io

We are a product! Our founding team are engineers, so the work is fun but challenging, the team is cross-functional and self-organized.

We are a small team based in Bali, Indonesia. We offer remote work + relocation opportunities after a few months of collaboration.

Full job description here: https://djinni.co/jobs/551201-browser-automation-expert-with-apify-typescri/

Let's discuss?

Error with check-in date = today

Hello.
I want to get information about one hotel. I chose the following parameters:
...
"checkIn": "2021-12-17",
"checkOut": "2021-12-18",
...
and I see this error:

2021-12-17T06:12:46.930Z ERROR
2021-12-17T06:12:46.933Z Error: WRONG INPUT: You can't use a date in the past: 2021-12-17
2021-12-17T06:12:46.937Z at module.exports.checkDate (/home/myuser/src/util.js:296:19)
2021-12-17T06:12:46.940Z at module.exports.validateInput (/home/myuser/src/input.js:38:39)
2021-12-17T06:12:46.942Z at /home/myuser/src/main.js:12:5
2021-12-17T06:12:46.945Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2021-12-17T06:12:46.948Z at async run (/home/myuser/node_modules/apify/build/actor.js:182:13)

But on Booking.com I can choose today as the check-in date. How can I fix it?
Thank you.
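A plausible explanation, not confirmed by this report, is that the check compares the check-in date as a timestamp at midnight against the current time, which makes "today" look like the past as soon as the day has started. The sketch below compares calendar days instead; isPastDate is a hypothetical name, and using the UTC day is an assumption (the hotel's local day may differ).

```javascript
// Hypothetical day-granularity check: a date counts as past only if it is
// strictly before the current UTC calendar day.
function isPastDate(dateString, now = new Date()) {
    const today = now.toISOString().slice(0, 10);
    // YYYY-MM-DD strings sort lexicographically in chronological order
    return dateString < today;
}

// Same moment as in the log above: check-in on that day is accepted
const runTime = new Date('2021-12-17T06:12:46.930Z');
console.log(isPastDate('2021-12-17', runTime));
console.log(isPastDate('2021-12-16', runTime));
```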
