maxcopell / tripadvisor-scraper
Scrape Tripadvisor restaurants, hotels, and places.
Home Page: https://apify.com/maxcopell/tripadvisor
The Tripadvisor scraper makes calls to an internal Tripadvisor API like
https://api.tripadvisor.com/api/internal/1.14/location/187275/hotels?currency=USD&lang=en&limit=1
This call now returns "Error: Status code 400" in the logs, and when called directly in the browser it shows an "UnauthorizedException" error:
{
errors: [{
type: "UnauthorizedException",
message: "client key not set",
code: "160"
}]
}
Apparently this breaks the scraper 😞
Hey, thanks for this great scraper! Is there a way to change the top-level domain (to get results in the needed language)?
Greetings
Like restaurants and hotels do:
https://my.apify.com/view/runs/RDXLjblSBy4lTHiCQ
Would be nice to add the business owner's responses to reviews
Or, more likely, not include them: I have includeReviews: false with the review count left at 1, and I got 1 review for each place...
https://my.apify.com/view/runs/Eo3MOJHByPpNFBoMo
https://www.tripadvisor.com/Hotels-g274707-Prague_Bohemia-Hotels.html
https://my.apify.com/view/runs/Qk6th16QAXU4sxYAo
With the restaurants it has the opposite problem, more results in the scrape than on the web: https://my.apify.com/view/runs/lLEbx4IknW8zcduHV / https://www.tripadvisor.com/Restaurants-g274707-Prague_Bohemia.html
I would very much appreciate it if, when I open a task, I could just run it to see whether the actor is working.
https://my.apify.com/view/runs/Ty8iHOGDZqREnrFyD
For some reason it works for Roudnice - https://my.apify.com/view/runs/RDXLjblSBy4lTHiCQ ?
Hi Maximillian.
I work at a wholesale food distributor in Rio de Janeiro, Brazil, and I am interested in a database of all restaurants, markets and hotels in my state.
I saw your profile on GitHub and saw that you have expertise in Tripadvisor and Google scrapers. I have tried your scraper on Apify but it doesn't work very well.
How much do you charge for an Excel sheet with all restaurants and hotels in Rio de Janeiro State, Brazil?
Best,
Lucas Saúde.
When I run node index in PowerShell, I get the following error. How and where do I set "locationFullName" so that the app runs?
INFO System info {"apifyVersion":"0.20.3","apifyClientVersion":"0.6.0","osType":"Windows_NT","nodeVersion":"v12.16.3"}
WARN Neither APIFY_LOCAL_STORAGE_DIR nor APIFY_TOKEN environment variable is set, defaulting to APIFY_LOCAL_STORAGE_DIR="C:\Users\chris\documents\Scraper\apify_storage"
ERROR The function passed to Apify.main() threw an exception:
TypeError: Cannot destructure property 'locationFullName' of 'input' as it is null.
at validateInput (C:\Users\chris\documents\Scraper\src\tools\general.js:208:9)
at C:\Users\chris\documents\Scraper\src\main.js:35:5
at async run (C:\Users\chris\documents\Scraper\node_modules\apify\build\actor.js:238:13)
PS C:\Users\chris\documents\Scraper>
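The error above says that the input itself is null, not just locationFullName. A minimal sketch of one way to supply it for a local run, assuming the Apify SDK's default local storage layout, where the actor input is read from key_value_stores/default/INPUT.json inside the storage directory named in the WARN line. The helper name writeLocalInput and the example values are illustrative only and are not part of the scraper.

```javascript
const fs = require('fs');
const path = require('path');

// Writes the actor input to the file a locally run Apify SDK actor reads.
// Assumption: the default local storage layout (key_value_stores/default/INPUT.json).
// `storageDir` defaults to ./apify_storage, as in the WARN line of the log.
function writeLocalInput(input, storageDir = path.join(process.cwd(), 'apify_storage')) {
    const storeDir = path.join(storageDir, 'key_value_stores', 'default');
    fs.mkdirSync(storeDir, { recursive: true }); // create missing folders
    const inputPath = path.join(storeDir, 'INPUT.json');
    fs.writeFileSync(inputPath, JSON.stringify(input, null, 2));
    return inputPath;
}

// Example values only; the field names are taken from inputs quoted in this thread.
const exampleInput = {
    locationFullName: 'Como',
    includeRestaurants: true,
    includeHotels: false,
    includeAttractions: false,
    includeReviews: false,
};

// Usage (run once from the project folder, then start the actor with `node index`):
// writeLocalInput(exampleInput);
```

Creating the same INPUT.json file by hand in that folder should have the same effect.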
Is this scraper no longer maintained? It relies on the Tripadvisor API, which is forbidden without a proper token.
Here are the logs from APIFY
2022-04-09T14:25:21.681Z ACTOR: Pulling Docker image from repository.
2022-04-09T14:25:22.895Z ACTOR: Creating Docker container.
2022-04-09T14:25:22.926Z ACTOR: Starting Docker container.
2022-04-09T14:25:26.230Z INFO System info {"apifyVersion":"2.2.2","apifyClientVersion":"2.2.0","osType":"Linux","nodeVersion":"v16.14.0"}
2022-04-09T14:25:26.335Z INFO Input validation OK
2022-04-09T14:25:26.670Z INFO BasicCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
2022-04-09T14:25:28.282Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:25:28.294Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":1,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:25:28.298Z RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:25:28.300Z at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:25:28.302Z at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:25:28.304Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:25:28.306Z at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:25:28.307Z at Object.onceWrapper (node:events:640:26)
2022-04-09T14:25:28.309Z at ClientRequest.emit (node:events:520:28)
2022-04-09T14:25:28.311Z at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:25:28.313Z at Socket.emit (node:events:520:28)
2022-04-09T14:25:28.315Z at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:25:28.317Z at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:25:28.319Z at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:25:28.321Z at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:25:28.323Z at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:25:31.561Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:25:31.577Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":2,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:25:31.580Z RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:25:31.582Z at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:25:31.584Z at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:25:31.587Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:25:31.589Z at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:25:31.591Z at Object.onceWrapper (node:events:640:26)
2022-04-09T14:25:31.593Z at ClientRequest.emit (node:events:520:28)
2022-04-09T14:25:31.595Z at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:25:31.597Z at Socket.emit (node:events:520:28)
2022-04-09T14:25:31.599Z at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:25:31.601Z at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:25:31.604Z at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:25:31.606Z at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:25:31.608Z at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:25:34.705Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:25:34.714Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":3,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:25:34.716Z RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:25:34.718Z at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:25:34.720Z at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:25:34.722Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:25:34.724Z at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:25:34.726Z at Object.onceWrapper (node:events:640:26)
2022-04-09T14:25:34.728Z at ClientRequest.emit (node:events:520:28)
2022-04-09T14:25:34.730Z at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:25:34.732Z at Socket.emit (node:events:520:28)
2022-04-09T14:25:34.734Z at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:25:34.736Z at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:25:34.739Z at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:25:34.741Z at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:25:34.743Z at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:25:37.877Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:25:37.890Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":4,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:25:37.892Z RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:25:37.895Z at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:25:37.899Z at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:25:37.901Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:25:37.904Z at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:25:37.906Z at Object.onceWrapper (node:events:640:26)
2022-04-09T14:25:37.909Z at ClientRequest.emit (node:events:520:28)
2022-04-09T14:25:37.913Z at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:25:37.915Z at Socket.emit (node:events:520:28)
2022-04-09T14:25:37.918Z at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:25:37.921Z at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:25:37.923Z at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:25:37.925Z at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:25:37.927Z at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:12.037Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:12.046Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":5,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:12.048Z RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:12.050Z at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:12.052Z at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:12.054Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:12.056Z at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:12.058Z at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:12.060Z at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:12.062Z at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:12.065Z at Socket.emit (node:events:520:28)
2022-04-09T14:26:12.067Z at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:12.070Z at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:12.072Z at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:12.075Z at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:12.077Z at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:26.675Z INFO BasicCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2022-04-09T14:26:26.723Z INFO Statistics: BasicCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":60111,"retryHistogram":[]}
2022-04-09T14:26:34.587Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:34.594Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":6,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:34.597Z RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:34.598Z at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:34.600Z at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:34.602Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:34.604Z at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:34.606Z at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:34.608Z at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:34.611Z at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:34.613Z at Socket.emit (node:events:520:28)
2022-04-09T14:26:34.615Z at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:34.617Z at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:34.618Z at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:34.620Z at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:34.622Z at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:37.924Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:37.935Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":7,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:37.940Z RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:37.943Z at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:37.945Z at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:37.947Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:37.949Z at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:37.952Z at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:37.954Z at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:37.956Z at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:37.959Z at Socket.emit (node:events:520:28)
2022-04-09T14:26:37.961Z at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:37.963Z at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:37.966Z at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:37.968Z at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:37.970Z at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:41.050Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:41.057Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":8,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:41.060Z RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:41.062Z at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:41.064Z at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:41.066Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:41.068Z at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:41.070Z at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:41.072Z at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:41.074Z at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:41.076Z at Socket.emit (node:events:520:28)
2022-04-09T14:26:41.078Z at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:41.080Z at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:41.082Z at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:41.084Z at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:41.086Z at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:44.198Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:44.206Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":9,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:44.208Z RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:44.210Z at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:44.212Z at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:44.214Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:44.216Z at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:44.218Z at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:44.220Z at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:44.222Z at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:44.225Z at Socket.emit (node:events:520:28)
2022-04-09T14:26:44.227Z at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:44.229Z at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:44.231Z at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:44.233Z at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:44.235Z at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:47.414Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:47.423Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com/","retryCount":10,"id":"nMLwWw9hmTQOlju"}
2022-04-09T14:26:47.426Z RequestError: Proxy responded with 400: 155 bytes
2022-04-09T14:26:47.428Z at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)
2022-04-09T14:26:47.430Z at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)
2022-04-09T14:26:47.433Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2022-04-09T14:26:47.443Z at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)
2022-04-09T14:26:47.446Z at Object.onceWrapper (node:events:640:26)
2022-04-09T14:26:47.448Z at ClientRequest.emit (node:events:520:28)
2022-04-09T14:26:47.450Z at Socket.socketOnData (node:_http_client:522:11)
2022-04-09T14:26:47.452Z at Socket.emit (node:events:520:28)
2022-04-09T14:26:47.454Z at addChunk (node:internal/streams/readable:315:12)
2022-04-09T14:26:47.456Z at readableAddChunk (node:internal/streams/readable:289:9)
2022-04-09T14:26:47.457Z at Socket.Readable.push (node:internal/streams/readable:228:10)
2022-04-09T14:26:47.459Z at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
2022-04-09T14:26:47.461Z at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2022-04-09T14:26:50.973Z WARN Could not create create for session due to: Proxy responded with 400: 155 bytes {"stack":"RequestError: Proxy responded with 400: 155 bytes\n at Request._beforeError (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:333:21)\n at Request.flush (/usr/src/app/node_modules/got-cjs/dist/source/core/index.js:322:18)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got-scraping/dist/resolve-protocol.js:37:28)\n at Object.onceWrapper (node:events:640:26)\n at ClientRequest.emit (node:events:520:28)\n at Socket.socketOnData (node:_http_client:522:11)\n at Socket.emit (node:events:520:28)\n at addChunk (node:internal/streams/readable:315:12)\n at readableAddChunk (node:internal/streams/readable:289:9)\n at Socket.Readable.push (node:internal/streams/readable:228:10)\n at TCP.onStreamRead (node:internal/stream_base_commons:190:23)\n at TCP.callbackTra... [line-too-long]
2022-04-09T14:26:51.009Z INFO Request https://www.tripadvisor.com/ failed too many times
2022-04-09T14:26:51.152Z INFO BasicCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2022-04-09T14:26:51.304Z INFO BasicCrawler: Final request statistics: {"requestsFinished":0,"requestsFailed":1,"retryHistogram":[null,null,null,null,null,null,null,null,null,null,1],"requestAvgFailedDurationMillis":35,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":35,"requestsTotal":1,"crawlerRuntimeMillis":84691}
2022-04-09T14:26:51.307Z INFO Requests failed: 1
2022-04-09T14:26:51.309Z INFO Crawler finished.
It runs very fast at the beginning (almost until the end), then gets stuck on the last 5 results or so for the next 15 hours and 50 CUs. It would be great to have the actor stopped in that case.
Compare long run:
https://my.apify.com/view/runs/2kZ84ZYGtFspOcSk3
Short run (aborted): https://my.apify.com/view/runs/nBU6aToV1BAFWsLeF
I keep getting this error when trying to fetch data from a single hotel or restaurant.
2021-12-16T21:08:43.851Z ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.tripadvisor.com","retryCount":2,"id":"nMLwWw9hmTQOlju"}
2021-12-16T21:08:43.854Z TypeError: Cannot destructure property 'location_id' of 'placeInfo' as it is undefined.
2021-12-16T21:08:43.856Z at processHotel (/usr/src/app/src/tools/hotel-tools.js:20:26)
JSON template I use ($ values are replaced):
{
"maxItems": 1,
"includeRestaurants": true,
"includeHotels": false,
"includeAttractions": false,
"includeTags": false,
"includeReviews": true,
"maxReviews": $max,
"lastReviewDate": "2018-01-01",
"locationId": "$id",
"restaurantId": "$url",
"language": "en",
"currency": "USD",
"proxyConfiguration": {
"useApifyProxy": true
},
"debugLog": false
}
https://my.apify.com/view/runs/4LpjhdbVxamUIJMrZ
At the same moment I have 343 items on the web: https://www.tripadvisor.com/Hotels-g34438-Miami_Florida-Hotels.html
https://my.apify.com/view/runs/sLFBohM8H3C06rq8s
It seems it reaches the limit before saving anything.
Also, the log shows many more places than maxItems - maybe it would be worth tying the concurrency to the maxItems number?
When scraping the data of a restaurant, I'm only able to get 20 reviews.
Looking through the code, I found that it fetches reviews 20 at a time inside a while (true) loop. I believe there is an issue with the exit condition: it always exits on the first iteration of the loop.
https://github.com/maxCopell/tripadvisor-scraper/blob/master/src/tools/general.js
while (true) {
//...
//...
if (reviews.length < limit || result.length >= maxReviews || shouldSlice) break;
}
My input
{
"locationId": "4879161",
"includeRestaurants": true,
"includeAttractions": false,
"includeHotels": false,
"includeReviews": true,
"proxyConfiguration": {
"useApifyProxy": true
},
"maxReviews": 0,
"maxItems": 1,
"language": "en",
"currency": "CAD",
"debugLog": true,
"checkInDate": "",
"includeTags": false
}
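One possible reading of the exit condition quoted above, not confirmed against the repository: the input sets "maxReviews" to 0, so `result.length >= maxReviews` is already true after the first page of 20. The sketch below reproduces that behaviour and shows a variant that treats 0 as "no limit". `fetchReviewPage` is a hypothetical stand-in for the paged request, `shouldSlice` is left out because its meaning is not shown in the excerpt, and treating 0 as unlimited is an assumption about the intended behaviour.

```javascript
// Hypothetical stand-in for the paged review request.
function fetchReviewPage(allReviews, offset, limit) {
    return allReviews.slice(offset, offset + limit);
}

// Mirrors the quoted exit condition (without shouldSlice).
// With maxReviews === 0 the second test is always true, so only one page is read.
function collectReviewsOriginal(allReviews, maxReviews, limit = 20) {
    const result = [];
    let offset = 0;
    while (true) {
        const reviews = fetchReviewPage(allReviews, offset, limit);
        result.push(...reviews);
        offset += limit;
        if (reviews.length < limit || result.length >= maxReviews) break;
    }
    return result;
}

// Variant that treats 0 (or a negative value) as "no limit"; an assumption.
function collectReviewsZeroMeansUnlimited(allReviews, maxReviews, limit = 20) {
    const cap = maxReviews > 0 ? maxReviews : Infinity;
    const result = [];
    let offset = 0;
    while (true) {
        const reviews = fetchReviewPage(allReviews, offset, limit);
        result.push(...reviews);
        offset += limit;
        // Stop on a short (last) page or once the cap is reached.
        if (reviews.length < limit || result.length >= cap) break;
    }
    return result.slice(0, cap);
}
```

If this reading is right, setting "maxReviews" to a large positive number in the input may be a workaround until the condition is changed.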
A run with this input hung for 6 minutes with 5 requests still in the queue:
{
"lastReviewDate": "2019-06-06",
"locationFullName": "Como",
"includeRestaurants": true,
"includeAttractions": false,
"includeHotels": false,
"includeReviews": false,
"proxyConfiguration": {
"useApifyProxy": true
},
"locationId": ""
}
Hi,
I've been using the tripadvisor scraper for restaurants and would always uncheck hotels and didn't need to enter a check in date. Since the last time I used it I am required to enter a check in date even though I don't need the hotel data. Whichever city I try to run I get the same error: "The main function of the actor threw an exception."
Not very technical so hope my explanation makes sense
Hi, I recently downloaded Singapore hotel data and found that some hotels were missing and some were duplicated. I'm not sure why, but I would appreciate any help. I took a screenshot of the data while going through it in Power BI. I'm not a developer, but my developer gave feedback and I went to explore it on my own. I'm thinking of signing up for Apify but want to make sure that the data I get is usable without much cleaning. Thanks!
Another screenshot where a hotel was recorded 6 times.
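Until the duplicates are fixed in the actor, they can be removed after download. A minimal sketch, assuming each exported item carries a unique `id` field (the field name is an assumption and should be checked against the actual dataset):

```javascript
// Keep the first occurrence of each item, keyed by its `id` field.
// The `id` field name is an assumption about the exported data.
function dedupeById(items) {
    const seen = new Map();
    for (const item of items) {
        if (!seen.has(item.id)) seen.set(item.id, item);
    }
    return [...seen.values()];
}
```

This only removes repeated records; it cannot recover hotels that are missing from the export.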
Hi
Running a vanilla query for scraping attraction reviews for a specific location produces an error of the type "Could not get reviews for attraction xyz due to session.getCookieString is not a function".
The attractions of the location are correctly identified but the reviews are not retrieved.
Can you please fix this?
Log summary:
2021-05-31T09:45:07.452Z ACTOR: Pulling Docker image from repository.
2021-05-31T09:45:07.558Z ACTOR: Creating Docker container.
2021-05-31T09:45:07.656Z ACTOR: Starting Docker container.
2021-05-31T09:45:11.066Z INFO System info {"apifyVersion":"0.20.3","apifyClientVersion":"0.6.0","osType":"Linux","nodeVersion":"v12.18.3"}
2021-05-31T09:45:11.089Z WARN You are using an outdated version (0.20.3) of Apify SDK. We recommend you to update to the latest version (1.1.2).
.....
2021-05-31T09:45:11.133Z INFO Input validation OK
2021-05-31T09:45:11.148Z INFO Processing locationId: 660089
...
2021-05-31T09:45:17.125Z INFO Found 20 attractions
2021-05-31T09:45:17.126Z INFO Processing detail for Babur Tomb attraction
.....
2021-05-31T09:45:17.169Z INFO Processing detail for Bibi Mahroo Hill attraction
2021-05-31T09:45:17.170Z ERROR Could not get reviews for attraction Babur Tomb due to session.getCookieString is not a function
...
2021-05-31T09:45:17.176Z ERROR Could not get reviews for attraction Bibi Mahroo Hill due to session.getCookieString is not a function
2021-05-31T09:45:17.550Z ERROR Could not process attraction... Data item at index 0 is not serializable to JSON.
2021-05-31T09:45:17.581Z Cause: Parameter "item" of type Object must be provided
2021-05-31T09:45:17.754Z INFO BasicCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2021-05-31T09:45:17.859Z INFO Crawler final request statistics: {"avgDurationMillis":1560,"perMinute":34,"finished":1,"failed":0,"retryHistogram":[1]}
2021-05-31T09:45:17.860Z INFO Requests failed: 0
2021-05-31T09:45:17.861Z INFO Crawler finished.
nZWGLC2Ua16iDl4vz (1).log
Thanks for making this to make scraping TripAdvisor so much easier! The API docs show a sample output for hotels containing an array of prices from various providers. However, an empty prices array is always returned for me. Even the example run does not contain that data.
I noticed that the call to getPlacePrices has been commented out. Are there plans to enable support for retrieving hotel prices?
Hi Max,
The data I'm looking for is the review counts by Traveler Rating. I want to be able to scrape this data weekly so I can track the change in rating.
Hello,
I am trying to run the script on the Apify platform. I got this error:
2020-09-13T07:44:10.864Z ERROR The function passed to Apify.main() threw an exception:
2020-09-13T07:44:10.866Z TypeError: Cannot read property 'data' of undefined
2020-09-13T07:44:10.867Z at getLocationId (/usr/src/app/src/tools/api.js:58:29)
2020-09-13T07:44:10.869Z at processTicksAndRejections (internal/process/task_queues.js:97:5)
No proxy is used, I selected a city and a date as input.
Thank you.
Thanks.
Hi, we are trying to scrape restaurants for this location https://www.tripadvisor.it/Restaurants-g187849-zfn8117507-Milan_Lombardy.html which Tripadvisor calls Milan City Center (Milano Centro Storico in Italian), but I'm getting an error. I guess the scraper cannot find this location, but the weird thing is that it exists on Tripadvisor. Any suggestions on how to scrape this location?
Thanks
And ideally also max reviews.
Hi there,
first of all congrats on the amazing work you've done so far.
When searching for restaurants, I'm suddenly encountering an issue which doesn't allow the scraper to return a lot of data.
I tried with different cities as input but still this keeps on happening.
Here's the error details:
ERROR: BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue
TypeError: Cannot read property 'replace' of null
at getSecurityToken (/usr/src/app/src/tools/general.js:33:26)
at getClient (/usr/src/app/src/tools/general.js:197:31)
Thanks in advance
The returned review data contains neither the name of the reviewer nor the URL of their profile image.
I'm just trying out the free scraper, mostly it's working great, but I've noticed the following while scraping Hotel Monge, Paris
Thanks.
The actor reports, for example, a review count of 232 but returns no reviews or only 1-3: https://my.apify.com/view/runs/nj1HEvPwWItcEQZWH