Comments (24)
It's honestly a fight against windmills.
I know there are hundreds of Fredy users out there, because I keep getting emails from people asking me to fix the Immoscout scraper...
from fredy.
Thanks for the information @kami4ka
And yeah, I can imagine @orangecoding. Fredy is a real game changer, especially in Berlin, where every second counts (it saved my ass a couple of months ago). And I can imagine that Immoscout is constantly changing. But I have to say, considering that, Fredy has been running quite smoothly for the last few months, thanks for that! I have it set up for a few friends, who share the ScrapingAnt fee (currently paused until it is working again). I just saw your sponsoring option. I'll make sure to include you in the shared costs once we are up again :)
If there is anything I can contribute, please let me know. It's just not my expertise at all unfortunately.
Immoscout is working for me every now and then. @kami4ka Do you have an update for us?
Sorry for the delay.
We've gotten a bit stuck moving our PoC for bypassing this detection to the production environment, so it's getting delayed.
We're doing our best, as it would allow us to handle more protections like this, so it's our top priority.
I'll keep you updated once we've figured it out completely.
ScrapingBee (not ScrapingAnt) and Zyte API are able to scrape Immoscout.
A ScrapingBee request with a "stealth proxy" costs approx. $0.04, while a Zyte API request costs approx. $0.008.
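To put the per-request prices above into monthly terms, here is a minimal sketch that assumes a single Immoscout search job polled every 5 minutes over a 30-day month. The polling interval, the 30-day month and the function name `monthly_cost` are illustrative assumptions, not Fredy settings; only the two prices come from the comment above.

```python
# Per-request prices quoted in the thread (USD).
PRICE_PER_REQUEST = {
    "ScrapingBee (stealth proxy)": 0.04,
    "Zyte API": 0.008,
}


def monthly_cost(price_per_request: float, interval_minutes: int, days: int = 30) -> float:
    """Cost of polling one search URL every `interval_minutes` for `days` days.

    Hypothetical helper: assumes exactly one paid request per poll and no retries.
    """
    requests_per_day = (24 * 60) // interval_minutes
    return round(requests_per_day * days * price_per_request, 2)


if __name__ == "__main__":
    for provider, price in PRICE_PER_REQUEST.items():
        print(f"{provider}: ${monthly_cost(price, interval_minutes=5):.2f}/month")
```

With a 5-minute interval that is 8,640 requests per month, i.e. about $345.60 on ScrapingBee versus $69.12 on Zyte API, the same 5x ratio mentioned later in the thread. Retries after a block would add to both figures.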
Yeah, I am also considering supporting different solutions... I'm not sure, however, whether to replace ScrapingAnt or just add ScrapingBee.
Unfortunately, it always fails.
We're aware of this situation and are currently trying to work around it.
While working on this issue, we've already changed the technology behind the service and improved the success rate for various other websites, but not for Immoscout yet.
We're still working on it and will notify all users who tried to make a request to Immoscout via email.
The latest update is that we've found a way to bypass it.
We're going to test and prepare everything for the cluster deployment (some parts of that are still unclear) and will reach out via email to anyone who made Immoscout calls over the last two months.
Awesome.
Can you share with us how many users we are talking about?
@kami4ka
Hey! First of all, thanks for the amazing project :) I was experimenting with ways to get around the Immoscout restrictions and tested these approaches:
- scrapingAnt: not working (maybe they can fix that somehow?)
- puppeteer: not working
- puppeteer-extra-plugin-stealth: not working
- python with selenium: not working
- python with selenium and undetected_chromedriver: works! But only when I actually render the browser; the headless option is detected :/ Also, after repeated calls they were somehow able to block me, but the next day it worked again.
Maybe this info helps, but having to render the browser is a bit of a bummer for easy deployment. Also, the undetected_chromedriver library is only available in Python and does some fancy stuff I don't completely understand.
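The exact setup behind the working "selenium + undetected_chromedriver" result isn't shown in the thread. The following is only a minimal, hypothetical sketch of the non-headless approach, assuming the `undetected_chromedriver` package (3.x) and a local Chrome installation. The function name `fetch_rendered` and the fixed wait are illustrative choices, not anyone's actual code.

```python
import time


def fetch_rendered(url: str, wait_seconds: float = 5.0) -> str:
    """Load `url` in a visible (non-headless) Chrome and return the rendered HTML.

    Hypothetical sketch: assumes undetected_chromedriver and Chrome are installed.
    """
    # Imported lazily so this module can be loaded where the package is missing.
    import undetected_chromedriver as uc

    # headless=False because, per the comment above, headless mode was detected.
    driver = uc.Chrome(headless=False)
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude wait for client-side rendering to finish
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

Because this needs a real browser window, running it inside Docker would presumably require a virtual display (for example Xvfb), which is the deployment hurdle described above.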
Hi phil,
Thanks. As I said earlier, this is a cat-and-mouse game.
We might be able to overcome this by using unprotected API endpoints. However, this too might only work for a limited time...
Hey @orangecoding,
agreed, it is a very nasty cat-and-mouse game, with the other side probably having a lot more developers than we have working on this project. But if we manage to use a Chrome-based browser via a package like undetected_chromedriver, with the screen actually rendered, it will be very difficult for them to detect without also locking "legitimate" users out of Immoscout. The only problem is that I still haven't found a way to run that in Docker. Unprotected API endpoints will get fixed at some point for sure, and I guess Immoscout is probably even monitoring repos like this one ;)
Hi @phil-bergmann,
can you share your approach with undetected_chromedriver? It would be nice to give it a try. I've also seen your approach with ScrapingBee, but I would like to avoid the paid account.
Stumbled upon this project today and was asking myself the same thing. I really hope this gets fixed. @kami4ka I would subscribe right away!
For some reason, @kami4ka is currently unavailable. I hope he's doing ok, as he's from Ukraine... In the meantime, I see that nearly all my tests succeed after a couple of retries.
Can you guys confirm?
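Since requests reportedly succeed after a couple of retries, a retry loop with exponential backoff is the obvious shape for that. This is a minimal sketch, not Fredy's actual retry logic: `fetch` is a hypothetical callable that returns the page body on success and `None` when blocked, and the attempt count and delays are arbitrary assumptions.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[], Optional[str]],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `fetch` until it returns content or the attempts run out.

    Hypothetical helper: `fetch` returns the body on success, None when blocked.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            # Exponential backoff (2s, 4s, 8s, ...) plus up to 1s of random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None  # every attempt was blocked
```

Note that with a paid scraping API every retry is usually billed as another request, so retries multiply the per-request costs discussed earlier.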
hey @kami4ka, just wondering if there is an update on this? I notice Immoscout can't be used; it never finds any listings. Thank you!
@ilindaniel By the way, I tried using ScrapingBee to scrape Immoscout (via their website) but hit the bot detection every time. Are you totally sure ScrapingBee has found a way around it? I honestly don't want to implement various services only to find out that they don't work either.
Have you checked the "stealth proxy" checkbox?
Nevertheless, I'd suggest having a look at Zyte, since it is 5x cheaper than ScrapingBee.
Hey guys.
I'd suggest trying out ScrapeOps: https://scrapeops.io/proxy-aggregator/
They aggregate web scraping providers, which could be the best option for cases like this.
Each provider may use similar technology, but with differences (for example, in how the browser is executed in the cluster), so aggregating all of them means you aren't tied to any particular one.
You can find more on their landing page.
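The aggregation idea can also be approximated on the client side by trying several providers in order until one returns a page. This is a minimal sketch of that fallback pattern, not the ScrapeOps API: each entry in `providers` is a hypothetical fetch function (one per scraping service) that returns HTML or `None`, and the names used are placeholders.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical provider interface: takes a URL, returns HTML or None when blocked.
Fetcher = Callable[[str], Optional[str]]


def fetch_via_first_working(url: str, providers: Dict[str, Fetcher]) -> Optional[Tuple[str, str]]:
    """Try each provider in insertion order; return (provider_name, html) or None."""
    for name, fetch in providers.items():
        try:
            body = fetch(url)
        except Exception:
            body = None  # treat a provider error the same as a block and move on
        if body:
            return name, body
    return None  # every provider failed or was blocked
```

Ordering the dict from cheapest to most expensive provider keeps costs down, since the pricier services are only called when the cheaper ones fail.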
@kami4ka I tried them (as well as a bunch of others), but I always hit the wall.
{"status":"Failed to get successful response from website. Please retry the request."}
To be quite honest with you, I am sick and tired of this cat-and-mouse game and am currently thinking about removing Immoscout support entirely.
@orangecoding Yeah, I totally understand you.
We always suggest finding an alternative data source when the cost of extracting a specific source, including the cost of building detection avoidance, becomes a problem. Unfortunately, it looks like that is the case with Immoscout too.
As of now, Immoscout still doesn't work, right? Or am I missing something in my setup? Cheers and thank you!
No, and it doesn't seem like @kami4ka is very confident about fixing this.
I was recently playing around with AI to get past the captcha, but there is actually a legal issue.
Scraping is ok-ish as long as you don't harm the website and you aren't trying to defeat measures that were put in place to block scraping, like captchas.
And tbh, I don't want to mess with them.. ;)
Zyte is still able to scrape ImmoScout.
However, I'm quite lazy and use ImmoScout's email notification service at the moment. It might not be as instant as scraping, but it's the quick fix for now.
Related Issues (20)
- Windows does not understand "export" HOT 2
- Error message on console HOT 2
- Immoscout with ScrapingAnt: Try a data-center request first before using residential proxies HOT 4
- Setting Interval and Working Hours job-wise or provider-wise HOT 2
- Residential and datacenter strategy for Immobilienscout24 scraping HOT 15
- Telegram Adapter 'Too Many Requests' HOT 1
- Run the application on a cloud HOT 9
- Telegram notifications quickly run into rate limit HOT 10
- Allow multiple instances of one provider HOT 2
- Make fredy runnable on raspberry pi HOT 5
- Possible to exclude estate agents? HOT 2
- "Unexpected end of file" when trying to scrape Immowelt HOT 7
- Add landlord/owner/offerer and address option HOT 2
- Support for meinestadt.de HOT 1
- Crash after login (Docker): "Error: EACCES: permission denied, open '/fredy/db/.users.json.tmp'" HOT 2
- Ebay kleinanzeigen links not recognized HOT 3
- SendGrid leads to: "TypeError: Cannot read properties of undefined (reading 'fields')" HOT 3
- Error when clicking on general settings HOT 4
- New Notification Adapter HOT 1