
Comments (24)

orangecoding commented on May 27, 2024

It's honestly like tilting at windmills.

I know there are hundreds of Fredy users out there, because I keep getting emails from people asking me to fix the Immoscout scraper...

from fredy.

denisalevi commented on May 27, 2024

Thanks for the information, @kami4ka.

And yeah, I can imagine, @orangecoding. Fredy is a real game changer, especially in Berlin, where every second counts (it saved my ass a couple of months ago). And I can imagine that Immoscout is constantly changing. But I have to say, considering that, Fredy has been running quite smoothly for the last few months, thanks for that! I have it set up for a few friends who share the ScrapingAnt fee (currently paused until it is working again). I just saw your sponsoring option. I'll make sure to include you in the shared costs once we are up again :)

If there is anything I can contribute, please let me know. It's just not my expertise at all unfortunately.

orangecoding commented on May 27, 2024

Immoscout is working for me every now and then. @kami4ka Do you have an update for us?

kami4ka commented on May 27, 2024

Sorry for the delay.

We've gotten a bit stuck moving our PoC for handling this detection to the production environment, so it's getting delayed.
We're doing our best, as it would allow us to cover more protections like this, so it's our top priority.

I'll keep you updated once we've figured it out completely.

ilindaniel commented on May 27, 2024

ScrapingBee (not ScrapingAnt) and Zyte API are able to scrape Immoscout.

A request on ScrapingBee with a "stealth proxy" costs approx. $0.04, while Zyte API costs $0.008.
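To put those per-request prices in perspective for a group sharing the fee, here is a minimal Python sketch that estimates the monthly cost of polling one search. The prices are the approximate figures quoted in this comment; the 10-minute polling interval and the 30-day month are assumptions for illustration only, and actual provider pricing may differ.

```python
from decimal import Decimal

# Approximate per-request prices quoted in this thread (USD); actual pricing may differ.
PRICE_PER_REQUEST = {
    "ScrapingBee (stealth proxy)": Decimal("0.04"),
    "Zyte API": Decimal("0.008"),
}


def monthly_cost(price: Decimal, interval_minutes: int, days: int = 30) -> Decimal:
    """Cost of polling one search every `interval_minutes` for `days` days (assumed 30)."""
    requests_per_day = (24 * 60) // interval_minutes
    return price * requests_per_day * days


if __name__ == "__main__":
    # Hypothetical setup: one search polled every 10 minutes over a 30-day month.
    for provider, price in PRICE_PER_REQUEST.items():
        print(f"{provider}: ${monthly_cost(price, interval_minutes=10):.2f}/month")
    ratio = PRICE_PER_REQUEST["ScrapingBee (stealth proxy)"] / PRICE_PER_REQUEST["Zyte API"]
    print(f"ScrapingBee costs {ratio}x as much per request")
```

With a 10-minute interval that is 144 requests a day, or 4,320 in 30 days, so roughly $172.80 versus $34.56 per month for a single search. `Decimal` is used instead of `float` so the cent amounts come out exact.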

orangecoding commented on May 27, 2024

Yeah, I am also considering supporting different solutions... not sure, however, whether to replace ScrapingAnt or just add ScrapingBee.

kami4ka commented on May 27, 2024

Unfortunately, it fails every time.
We're aware of this situation and are currently trying to fight it.
We've already changed the technology behind the service and improved the detection-avoidance rate for various websites while working on this issue, but not for Immoscout yet.
We're still working on it and will notify all users (who tried to make a request to Immoscout) via email.

kami4ka commented on May 27, 2024

The latest update is that we've found a way to bypass it.
We're going to test and prepare everything for the cluster deployment (some parts of that are still unclear) and will contact everyone who made Immoscout calls over the last two months via email.

orangecoding commented on May 27, 2024

Awesome.

Can you share how many users we are talking about?
@kami4ka

phil-bergmann commented on May 27, 2024

Hey! First of all, thanks for the amazing project :) I was trying out ways to evade the Immoscout restrictions and tested these approaches:

  • ScrapingAnt: not working (maybe they can fix that somehow?)
  • Puppeteer: not working
  • puppeteer-extra-plugin-stealth: not working
  • Python with Selenium: not working
  • Python with Selenium and undetected_chromedriver: works! But only when the browser is actually rendered; headless mode gets detected :/ Also, after repeated calls they were somehow able to block me, but the next day it worked again.

Maybe the info helps, but having to render the browser is a bit of a bummer for easy deployment. Also, the undetected_chromedriver library is Python-only and does some fancy stuff I do not completely understand.

orangecoding commented on May 27, 2024

Hi phil,

Thanks. As I said earlier, this is a cat-and-mouse game.

We might be able to overcome this by using unprotected API endpoints. However, this too might only work for a limited time...

phil-bergmann commented on May 27, 2024

Hey @orangecoding,

Agreed, it is a very nasty cat-and-mouse game, and the other side probably has a lot more developers than we have working on this project. But if we manage to use a Chrome-based browser with a package like undetected_chromedriver and actually render the screen, it will be very difficult to detect that without also locking "legitimate" users out of Immoscout. The only problem is that I still haven't found a way to run that in Docker. Unprotected API endpoints will surely get fixed at some point, and I guess Immoscout is probably even monitoring repos like this one ;)

Lukewa commented on May 27, 2024

Hi @phil-bergmann,
can you share your approach with undetected_chromedriver? It would be nice to give it a try. I've also seen your approach with ScrapingBee, but I would like to avoid a paid account.

mygrexit commented on May 27, 2024

Stumbled upon this project today and was asking myself the same thing. I really hope this gets fixed. @kami4ka I would subscribe right away!

orangecoding commented on May 27, 2024

For some reason, @kami4ka is currently unavailable. I hope he's doing OK, as he's from Ukraine... In the meantime, I see that nearly all my tests were successful after a couple of retries.

Can you guys confirm?
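If most requests only go through after a couple of retries, the scraper has to retry failed requests automatically. Below is a minimal, generic Python sketch of retrying with exponential backoff; `fetch` is a hypothetical stand-in for whatever call actually sends the request, and the attempt count and delays are arbitrary assumptions, not values Fredy uses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds, waiting base_delay * 2**n seconds between tries.

    Re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise AssertionError("unreachable")  # keeps type checkers happy
```

The `sleep` parameter is injectable so the helper can be tested without actually waiting. Doubling the delay spaces out repeated calls, which matters if repeated calls in quick succession are what gets a client blocked.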

liebecode commented on May 27, 2024

Hey @kami4ka, just wondering if there is an update on this? I notice Immoscout can't be used; it never finds any listings. Thank you!

orangecoding commented on May 27, 2024

@ilindaniel By the way, I tried using ScrapingBee to scrape Immoscout (via their website) but hit the bot detection every time. Are you totally sure ScrapingBee found a way around it? I honestly don't want to implement various services only to find out that they don't work either.

ilindaniel commented on May 27, 2024

Have you checked the "stealth proxy" checkbox?

Nevertheless, I'd suggest having a look at Zyte, since they are 5x cheaper than ScrapingBee.

kami4ka commented on May 27, 2024

Hey guys.
I'd suggest trying out ScrapeOps: https://scrapeops.io/proxy-aggregator/
They aggregate web scraping providers, which could be the best approach for cases like this.

Each provider may use similar technology, but with differences (for example, in how a browser is executed in the cluster), so this would let you avoid being tied to one particular provider and aggregate all of them instead.

You can find more details on their landing page.
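The aggregation idea boils down to trying several providers in turn and keeping the first usable response. A minimal Python sketch, assuming each provider is wrapped in a hypothetical function that takes the target URL and returns the page HTML or raises on failure; the provider names and wrappers are placeholders, not ScrapeOps' actual API.

```python
from typing import Callable

# Hypothetical provider wrapper: takes a target URL, returns HTML, raises on failure.
Provider = Callable[[str], str]


class AllProvidersFailed(Exception):
    """Raised when no provider returned a usable response; args[0] maps name -> error."""


def fetch_with_fallback(url: str, providers: dict[str, Provider]) -> tuple[str, str]:
    """Try providers in insertion order; return (provider_name, html) of the first success."""
    errors: dict[str, Exception] = {}
    for name, provider in providers.items():
        try:
            html = provider(url)
        except Exception as exc:
            errors[name] = exc  # remember why this provider failed, then try the next one
            continue
        if html:  # treat an empty body as a failure as well
            return name, html
        errors[name] = ValueError("empty response")
    raise AllProvidersFailed(errors)
```

Ordering the providers cheapest first would keep costs down, since the more expensive ones are only hit when the cheaper ones fail; the collected errors make it easier to see which services are currently blocked.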

orangecoding commented on May 27, 2024

@kami4ka I tried them (as well as a bunch of others), but I always hit the wall.
{"status":"Failed to get successful response from website. Please retry the request."}

To be quite honest with you, I am sick and tired of this cat-and-mouse game and am currently thinking about removing support for Immoscout entirely.

kami4ka commented on May 27, 2024

@orangecoding Yeah, I totally understand you.
We always suggest finding an alternative data source when extracting a specific source becomes too costly, including the cost of building detection avoidance. Unfortunately, it looks like that is the case with Immoscout too.

HerzogVonWiesel commented on May 27, 2024

As of now, Immoscout still doesn't work, right? Or am I missing something in my setup? Cheers and thank you!

orangecoding commented on May 27, 2024

No, and it doesn't seem like @kami4ka has much confidence in fixing this.

I was recently playing around with AI to get past the captcha, but there is actually a legal issue.

See, scraping is OK-ish as long as you do not harm the website and do not try to defeat measures that have been put in place to block scraping, like captchas.

And tbh, I don't want to mess with them.. ;)

ilindaniel commented on May 27, 2024

Zyte is still able to scrape ImmoScout:

[Screenshot]

However, I'm quite lazy and use ImmoScout's email notification service at the moment. It might not be as instant as scraping, but that's the quick fix for now.
