flathunters / flathunter
This project was forked from mordax7/flathunter.
A bot to help people with their rental real-estate search.
License: GNU Affero General Public License v3.0
Hello, verbose mode tells me that I get an IndexError for Immobilienscout. I am not very experienced with coding and can't fix this myself. Can anyone help me?
[2020/08/19 12:44:23|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x10a500e10>
[2020/08/19 12:44:23|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/shape/wohnung-mieten?shape=aWB3ZEhpZWVlQWZ4QGlhQGR7QHllQXp2QGtoQ3pBc2VAeWJAd3ZAbUBzZUB0b0BlZkNvaUF7YEltdUB9eEBxfEBkUGNxQGx7QmdaYF1xfEB6S298QGJoQnt0QWRQbUpkbEF_Um5gR2JGeGdAZlp4Y0JqdUBoakJobEFfTg..&numberofrooms=2.0-&price=-1300.0&livingspace=40.0-&sorting=2&pagenumber={0}
[2020/08/19 12:44:24|crawl_immobilienscout.py|DEBUG ]: Index Error occurred
[2020/08/19 12:44:24|crawl_immobilienscout.py|DEBUG ]: []
[2020/08/19 12:44:24|crawl_immobilienscout.py|DEBUG ]: extracted: 0
I think the issue is the provided search URL: if I open it in a browser, it does not lead anywhere. The '&pagenumber={0}' part at the end comes from the crawl_immobilienscout.py file; the URL in my config does not have this part. I tried a few things, but all I got were errors.
Thanks guys!
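For context, a minimal sketch of what that `{0}` placeholder is for: the configured search URL appears to be used as a format template when paging through results (the example URL below is hypothetical).

```python
# The crawler appends '&pagenumber={0}' to the configured search URL and
# fills in the page number with str.format for each results page.
search_url = ("https://www.immobilienscout24.de/Suche/de/berlin/berlin/"
              "wohnung-mieten?sorting=2")  # illustrative URL, not from the log
template = search_url + "&pagenumber={0}"
first_page = template.format(1)
print(first_page)  # ends with "&pagenumber=1"
```

That is also why the URL containing the literal `{0}` does not open in a browser: it is only valid after formatting.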
I'm happy to see that 2captcha was integrated into flathunter, since immobilienscout24.de triggers the captcha after two hits for a "selbst gezeichnetes Suchgebiet" (self-drawn search area) link.
Unfortunately, I'm not able to get it completely up and running.
The captcha seems to be solved successfully; it also shows up in the 2captcha statistics.
This is the resulting log:
[2020/10/03 23:58:50|config.py |INFO ]: Using config /app/config.yaml
[2020/10/03 23:58:51|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x7fdb050b43a0>
[2020/10/03 23:58:51|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/shape/wohnung-mieten?shape=fWZ0ZEh3dGtlQXhWZWtAblJlQGBUdUFoaUB3Q0pvc0FtaUBjfUFtW19gQHNOakhvRmJBdUNqR3lKSXNFa0FxSHdKeWJAZXZBfVZ7cEB3akBpWnFCe29AbWJAfV1jUHxNZ1h4fkBtXW1iQG1iQHNWfU5vR2dLYnRAa01waUB0RGxxQG1HYHBAdEJmX0B7Q2p0QGtFbGtBfGVAYmpAblhlYEB6X0BkRnZlQWhQYF52ZkBmbkFqdEBoXHpE&numberofrooms=2.0-&price=-1300.0&livingspace=50.0-&sorting=2&enteredFrom=result_list&pagenumber={0}
[2020/10/03 23:58:53|abstract_crawler.py|DEBUG ]: Google site key: <re.Match object; span=(49, 93), match='&k=6LeaILIZAAAAALTgLZV1AQXPc2dAsLItNYJ8jVvB&'>
[2020/10/03 23:58:58|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:03|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:08|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:13|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:19|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:24|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:29|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:34|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:39|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:44|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:49|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:54|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:54|abstract_crawler.py|DEBUG ]: Captcha promise: OK|03AGdBq25VNbxPhhj-ezyN11xsib3B5QPdtKJo9ZUANoJCXhNG6e3juFSmFIrHvZYVXO_63cEOhOsl9vxmUDdwqfQKj858qVkI-zSoT7idq99rB6uoV0z9WsX4D4TeQdjwlFozEIrIgZ5u12XcAUIfGAGSDrkA3xgdUtwWOuk8swEiW7u51Y_sf4r3GMX0UMgX0KksNv238L9eM26fEa3hDPMr0raK6vpsFUOPW0NxH-vjYTQ4ZDp_LL-8auv9FGn4bfkQTzNpDE71Nn-CGw5iVyEZc_mRupXhrHwD9FENIOokXeB1Iwm6v1pccKEvsrnsoJ5rmkBP5wr2YXDvld8p7VufhtwAk8GUaipG0kEiwpBEOrFKQc1L7t-qp4gZcITfQvjyqOlxEdo8tPTfIt4tIVcZrlGvGOql0qeR57d_X8w5oISKzsMSP6glD3RMy6GeSNVzWpaqfq861qOiDy9nqeQhWyRmBrhJtDGbqBOMQ1EtWKnPQ3mjU98
[2020/10/03 23:59:58|crawl_immobilienscout.py|DEBUG ]: [123231487, 123210362, 123202495, 123201463, 48470193, 123184168, 123145073, 123020022, 123083698, 123063242, 123054184, 123041195, 123030893, 122974183, 122960443, 122923251, 122915078, 119193192, 122543400, 122810740]
Traceback (most recent call last):
File "/app/flathunter/crawl_immobilienscout.py", line 134, in extract_data
image = image_tag["src"]
File "/usr/local/lib/python3.8/site-packages/bs4/element.py", line 992, in __getitem__
return self.attrs[key]
KeyError: 'src'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/app/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/app/flathunter/hunter.py", line 21, in crawl_for_exposes
return chain(*[searcher.crawl(url, max_pages)
File "/app/flathunter/hunter.py", line 21, in <listcomp>
return chain(*[searcher.crawl(url, max_pages)
File "/app/flathunter/abstract_crawler.py", line 121, in crawl
return self.get_results(url, max_pages)
File "/app/flathunter/crawl_immobilienscout.py", line 64, in get_results
entries = self.extract_data(soup)
File "/app/flathunter/crawl_immobilienscout.py", line 136, in extract_data
image = image_tag["data-lazy-src"]
File "/usr/local/lib/python3.8/site-packages/bs4/element.py", line 992, in __getitem__
return self.attrs[key]
KeyError: 'data-lazy-src'
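A hedged sketch of a defensive fix for the two KeyErrors above: BeautifulSoup tags support dict-style `.get()`, which returns `None` instead of raising when an attribute is missing. The HTML snippet and placeholder filename below are illustrative, not taken from the site.

```python
from bs4 import BeautifulSoup

# Some result images carry 'src', lazily loaded ones only 'data-lazy-src';
# trying both with .get() avoids the KeyError raised by tag["src"].
html = '<img class="gallery" data-lazy-src="https://example.com/flat.jpg">'
image_tag = BeautifulSoup(html, "html.parser").find("img")

image = (image_tag.get("src")
         or image_tag.get("data-lazy-src")
         or "placeholder.png")  # hypothetical fallback
print(image)  # -> https://example.com/flat.jpg
```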
Great repo btw! Thx for the good work!
(flathunter-main-tulP8G4X) pi@raspberrypi:~/shared/flathunter-main $ python3 flathunt.py --config config.yaml
[2021/03/11 00:23:12|config.py |INFO ]: Using config config.yaml
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 68, in main
config = Config(config_handle.name)
File "/home/pi/shared/flathunter-main/flathunter/config.py", line 28, in __init__
self.config = yaml.safe_load(file)
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/__init__.py", line 162, in safe_load
return load(stream, SafeLoader)
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/__init__.py", line 114, in load
return loader.get_single_data()
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/constructor.py", line 41, in get_single_data
node = self.get_single_node()
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/composer.py", line 64, in compose_node
if self.check_event(AliasEvent):
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/parser.py", line 449, in parse_block_mapping_value
if not self.check_token(KeyToken, ValueToken, BlockEndToken):
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/scanner.py", line 115, in check_token
while self.need_more_tokens():
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/scanner.py", line 152, in need_more_tokens
self.stale_possible_simple_keys()
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/scanner.py", line 292, in stale_possible_simple_keys
"could not find expected ':'", self.get_mark())
yaml.scanner.ScannerError: while scanning a simple key
in "config.yaml", line 27, column 1
could not find expected ':'
in "config.yaml", line 28, column 1
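The ScannerError means PyYAML hit a syntax problem, typically a missing ':' or broken indentation, around lines 27–28 of config.yaml. As a sanity check, this is what a well-formed mapping looks like to `safe_load` (the keys below mirror the sample config, but check your own file):

```python
import yaml

# Every key needs a trailing colon, and nesting is expressed by indentation.
good = """
loop:
  active: yes
  sleeping_time: 600
urls:
  - https://www.immobilienscout24.de/Suche/...
"""
config = yaml.safe_load(good)
print(config["loop"]["sleeping_time"])  # -> 600
```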
It would be nice if the 3 PRs that I made counted towards the goal for my shirt (I never got one, and this year the rules are very strict).
So I think it would be nice if the repository added the tag.
The link above shows how.
If you are not familiar with what Hacktoberfest is, you can check out a description here: https://hacktoberfest.digitalocean.com/
Thanks for the consideration!
The immowelt crawler only searches for matching flats in the first 4 entries of the result list.
All further results are loaded asynchronously on immowelt, and the crawler does not load these entries. After loading, they would usually be located in a <div id="listItemWrapperAsync" ... >.
Hi,
first of all, do I understand correctly that I can use the new captcha-solving method on a headless system as well?
I am on a Debian system and downloaded ChromeDriver from https://chromedriver.storage.googleapis.com/index.html?path=87.0.4280.20/, which seems to work:
Starting ChromeDriver 87.0.4280.20 (c99e81631faa0b2a448e658c0dbd8311fb04ddbd-refs/branch-heads/4280@{#355}) on port 9515
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
I also installed Chromium. When I start flathunter now, I get the following error message:
python3 flathunt.py
[2020/10/16 13:24:52|config.py |INFO ]: Using config /home/user/flathunter/config.yaml
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 68, in main
config = Config(config_handle.name)
File "/home/user/flathunter/flathunter/config.py", line 29, in __init__
self.__searchers__ = [CrawlImmobilienscout(self),
File "/home/user/flathunter/flathunter/crawl_immobilienscout.py", line 42, in __init__
self.driver = self.configure_driver(self.driver_executable_path, self.driver_arguments)
File "/home/user/flathunter/flathunter/abstract_crawler.py", line 51, in configure_driver
driver = webdriver.Chrome(executable_path=driver_path, options=chrome_options)
File "/usr/local/lib/python3.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/usr/local/lib/python3.7/dist-packages/selenium/webdriver/common/service.py", line 98, in start
self.assert_process_still_running()
File "/usr/local/lib/python3.7/dist-packages/selenium/webdriver/common/service.py", line 111, in assert_process_still_running
% (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service /usr/bin/chromium unexpectedly exited. Status code was: 1
Any idea what to do here?
Thanks a lot!!
The installation of the dependencies as specified in requirements.txt fails on Ubuntu 18.04.5 LTS as well as in Docker (FROM python:3).
$ pip3 install -r requirements.txt
Requirement already satisfied: astroid==2.4.2 in ./lib/python3.6/site-packages (from -r requirements.txt (line 2)) (2.4.2)
Requirement already satisfied: async-timeout==3.0.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 3)) (3.0.1)
Requirement already satisfied: beautifulsoup4==4.8.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 5)) (4.8.1)
Requirement already satisfied: bs4==0.0.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 6)) (0.0.1)
Requirement already satisfied: CacheControl==0.12.6 in ./lib/python3.6/site-packages (from -r requirements.txt (line 7)) (0.12.6)
Requirement already satisfied: certifi==2019.9.11 in ./lib/python3.6/site-packages (from -r requirements.txt (line 9)) (2019.9.11)
Requirement already satisfied: chardet==3.0.4 in ./lib/python3.6/site-packages (from -r requirements.txt (line 11)) (3.0.4)
Requirement already satisfied: click==7.1.2 in ./lib/python3.6/site-packages (from -r requirements.txt (line 12)) (7.1.2)
Requirement already satisfied: colorama==0.4.3 in ./lib/python3.6/site-packages (from -r requirements.txt (line 14)) (0.4.3)
Requirement already satisfied: decorator==4.4.2 in ./lib/python3.6/site-packages (from -r requirements.txt (line 16)) (4.4.2)
Requirement already satisfied: Flask==1.1.2 in ./lib/python3.6/site-packages (from -r requirements.txt (line 18)) (1.1.2)
Requirement already satisfied: Flask-API==2.0 in ./lib/python3.6/site-packages (from -r requirements.txt (line 19)) (2.0)
Requirement already satisfied: google-auth-httplib2==0.0.4 in ./lib/python3.6/site-packages (from -r requirements.txt (line 23)) (0.0.4)
Requirement already satisfied: googleapis-common-protos==1.52.0 in ./lib/python3.6/site-packages (from -r requirements.txt (line 29)) (1.52.0)
Requirement already satisfied: idna==2.8 in ./lib/python3.6/site-packages (from -r requirements.txt (line 32)) (2.8)
Requirement already satisfied: iniconfig==1.1.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 34)) (1.1.1)
Requirement already satisfied: itsdangerous==1.1.0 in ./lib/python3.6/site-packages (from -r requirements.txt (line 36)) (1.1.0)
Requirement already satisfied: jsonpath-ng==1.5.2 in ./lib/python3.6/site-packages (from -r requirements.txt (line 38)) (1.5.2)
Requirement already satisfied: lazy-object-proxy==1.4.3 in ./lib/python3.6/site-packages (from -r requirements.txt (line 39)) (1.4.3)
Requirement already satisfied: MarkupSafe==1.1.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 41)) (1.1.1)
Requirement already satisfied: mccabe==0.6.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 42)) (0.6.1)
Requirement already satisfied: more-itertools==8.4.0 in ./lib/python3.6/site-packages (from -r requirements.txt (line 44)) (8.4.0)
Requirement already satisfied: multidict==4.7.6 in ./lib/python3.6/site-packages (from -r requirements.txt (line 46)) (4.7.6)
ERROR: Could not find a version that satisfies the requirement pkg-resources==0.0.0
ERROR: No matching distribution found for pkg-resources==0.0.0
The dependency pkg-resources==0.0.0 is the culprit. It seems this is a known bug that happens when freezing the requirements on Ubuntu, see link.
Removing pkg-resources==0.0.0 from requirements.txt seems to fix the issue.
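A sketch of that workaround; the sample requirements content below is illustrative, and the pin is deleted in place with sed (GNU sed assumed):

```shell
# Create a minimal requirements file containing the broken pin (demo only).
printf 'requests==2.22.0\npkg-resources==0.0.0\n' > requirements.txt
# pkg-resources==0.0.0 is a pip-freeze artifact on Ubuntu/Debian and is not
# installable from PyPI; delete the line before running pip.
sed -i '/^pkg-resources==/d' requirements.txt
cat requirements.txt
```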
Hello,
I have problems with the final step
$ gcloud app deploy cron.yaml
I get the error message:
ERROR: (gcloud.app.deploy) An error occurred while parsing file: [/Users/myname/Desktop/flathunter-main/cron.yaml]
Unexpected attribute 'loop' for object of type CronInfoExternal.
in "/Users/myname/Desktop/flathunter-main/cron.yaml", line 9, column 5
What could be the reason?
Many thanks in advance!
The results for WG Gesucht differ from the entries shown in the browser. The URL structure from Beautiful Soup is also different from the one shown in the browser.
Hi,
I am a beginner in programming. At this point everything is working fine, but is it possible to configure two different bots to send messages? In my example only the second bot is sending the messages. How do I have to arrange the lines for two-bot support? An example would be great.
telegram:
  bot_token: 132xxxxxx:xxxxxxxxxxxxxxxxxxxxxxx
  receiver_ids:
    - 306xxxxx
  bot_token: 122xxxxxxxx-xxxxxxxxxxxxxxxxxxxx
  receiver_ids:
    - 102xxxxx
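One likely cause: a YAML mapping cannot have duplicate keys, so the second `bot_token`/`receiver_ids` pair silently overwrites the first, which would explain why only the second bot sends. As far as I can tell the stock config takes a single bot; a hedged Python sketch of notifying through two bots yourself via the Telegram Bot API (tokens and chat ids are placeholders, function names are mine):

```python
# One sendMessage call per (bot token, receiver id) pair.
BOTS = [
    {"token": "132xxxxxx:xxxxxxxxxxxxxxxxxxxxxxx", "receiver_ids": [306]},
    {"token": "122xxxxxxxx-xxxxxxxxxxxxxxxxxxxx", "receiver_ids": [102]},
]

def message_requests(text):
    """Build one (url, payload) pair per bot and receiver."""
    return [
        (f"https://api.telegram.org/bot{bot['token']}/sendMessage",
         {"chat_id": chat_id, "text": text})
        for bot in BOTS
        for chat_id in bot["receiver_ids"]
    ]

# POST each pair (e.g. with requests.post(url, data=payload)) to send via both bots.
print(len(message_requests("New flat found!")))  # -> 2
```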
I am using the url https://www.immobilienscout24.de/Suche/de/nordrhein-westfalen/koeln/wohnung-mieten?sorting=2
and since a few hours I am just getting a long printout, but no results sent via the Telegram bot anymore. eBay Kleinanzeigen is still working fine.
If I can provide more info, please let me know.
Hi,
this is more a question than an issue. Is there any way to use flathunter for other searches on immobilienscout24.de as well?
I realized that it now works perfectly on "Wohnung mieten" (rent a flat) or "Haus mieten" (rent a house), but it does not work on "Haus kaufen" (buy a house)
(example search string: https://www.immobilienscout24.de/Suche/de/berlin/berlin/haus-kaufen?enteredFrom=one_step_search)
python3 flathunter.py
[2020/06/08 16:27:04|config.py |INFO ]: Using config /scripts/flathunter/config.yaml
[2020/06/08 16:27:04|idmaintainer.py |INFO ]: already processed: 10
Traceback (most recent call last):
  File "flathunter.py", line 91, in <module>
    main()
  File "flathunter.py", line 87, in main
    launch_flat_hunt(config)
  File "flathunter.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/scripts/flathunter/flathunter/hunter.py", line 38, in hunt_flats
    results = searcher.get_results(url)
  File "/scripts/flathunter/flathunter/crawl_immobilienscout.py", line 33, in get_results
    0].text)
ValueError: invalid literal for int() with base 10: '1.228 '
nor on "Grundstück kaufen" (buy a plot)
(search string: https://www.immobilienscout24.de/Suche/de/grundstueck-kaufen?plotarea=-5000.0&price=-10000.0&pricetype=buy&enteredFrom=result_list)
python3 flathunter.py
[2020/06/08 16:28:51|config.py |INFO ]: Using config /scripts/flathunter/config.yaml
[2020/06/08 16:28:51|idmaintainer.py |INFO ]: already processed: 10
Traceback (most recent call last):
  File "flathunter.py", line 91, in <module>
    main()
  File "flathunter.py", line 87, in main
    launch_flat_hunt(config)
  File "flathunter.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/scripts/flathunter/flathunter/hunter.py", line 38, in hunt_flats
    results = searcher.get_results(url)
  File "/scripts/flathunter/flathunter/crawl_immobilienscout.py", line 37, in get_results
    entries = self.extract_data(soup)
  File "/scripts/flathunter/flathunter/crawl_immobilienscout.py", line 101, in extract_data
    entries.append(details)
UnboundLocalError: local variable 'details' referenced before assignment
would there be a quick fix for that, or is this just not planned or wanted?
Thank you so much in advance!
Cheers
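The first traceback above is just a number-parsing failure: the buy-search result count comes back German-formatted ('1.228 ', thousands dot plus trailing space), which `int()` rejects. A tolerant parse is straightforward (helper name is mine, not the project's):

```python
def parse_result_count(text):
    # Strip whitespace and the German thousands separator before converting.
    return int(text.strip().replace(".", ""))

print(parse_result_count("1.228 "))  # -> 1228
```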
Hi,
would you be able to help me setting up flathunter?
Running pytest shows 21 failed, 52 passed.
Thanks!
========================================================================================== short test summary info ===========================================================================================
FAILED test/test_config.py::ConfigTest::test_defaults_fields - AssertionError: 'c:\\users\\xx\\documents\\flathunter\\flathunter-main' != 'C:\\Users\\xx\\Documents\\Flathunter\\flathunter-main'
FAILED test/test_config.py::ConfigTest::test_loads_config - yaml.parser.ParserError: while parsing a block mapping
FAILED test/test_config.py::ConfigTest::test_loads_config_at_file - PermissionError: [Errno 13] Permission denied: 'C:\\Users\\xx\\AppData\\Local\\Temp\\tmpe8hnlt4t'
FAILED test/test_crawl_ebaykleinanzeigen.py::test_process_expose_fetches_details - ValueError: Invalid format string
FAILED test/test_crawl_immobilienscout.py::test_crawl_works - assert 0 > 0
FAILED test/test_crawl_immobilienscout.py::test_process_expose_fetches_details - assert 0 > 0
FAILED test/test_crawl_immowelt.py::test_process_expose_fetches_details - ValueError: Invalid format string
FAILED test/test_crawl_wggesucht.py::WgGesuchtCrawlerTest::test - AssertionError: False is not true : URL should be an apartment link
FAILED test/test_statistics_view.py::test_statistics_view - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_get_index - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_get_index_with_exposes - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_hunt_with_users - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_hunt_via_post - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_multi_user_hunt_via_post - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_hunt_via_post_with_filters - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_render_index_after_login - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_do_not_send_messages_if_notifications_disabled - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_toggle_notification_status - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_update_filters - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_update_filters_not_logged_in - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_index_logged_in_with_filters - sqlite3.OperationalError: unable to open database file
Since a few days, I am getting the following error message quite often.
[2020/07/29 08:13:51|crawl_ebaykleinanzeigen.py|ERROR ]: Got response (503): b'<!DOCTYPE html><html><head><title>Error Page</title><style type="text/css">html{font-family:\'Helvetica Neue\',Helvetica,Arial,sans-serif;font-size:1em}.center-box{margin: 20% auto auto auto;width: 50%;border: 1px solid #dcdcdc;padding: 1em;}\n</style><title>Security Violation (503)</title></head></head><body>\n<div class="center-box">\n <h3>www.ebay-kleinanzeigen.de | Access denied (403)</h3>\n <h4>Current session has been terminated.</h2>\n <p>For further information, do not hesitate to contact us.</p>\n <p>Ref: <span id="addr">2003:f7:bf40:9c00:48ec:d655:a809:f7e7</span> <span id="time">1596003231</span></p>\n</div></body><script>document.getElementById("time").innerHTML = (new Date()).toISOString()</script>\n</html>\n'
Restarting flathunt.py mostly helps to stop getting these messages.
Any ideas or hints? This time I also pulled the latest code before opening an issue.
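One hedged mitigation for these anti-bot 503s is to back off and retry with growing pauses instead of re-polling at the normal interval. The helper names below are mine, not flathunter's; the sketch assumes the `requests` library is available:

```python
import time

import requests


def backoff_delays(attempts=4, base=60):
    """Exponential schedule in seconds: 60, 120, 240, 480."""
    return [base * (2 ** i) for i in range(attempts)]


def fetch_with_backoff(url):
    # Retry on 503, waiting longer each time; give up after the last attempt.
    for delay in backoff_delays():
        response = requests.get(url, timeout=30)
        if response.status_code != 503:
            return response
        time.sleep(delay)
    return response
```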
I pulled a fresh copy and added the 2Captcha service, but now I get this:
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/home/user/janhuntboi/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/home/user/janhuntboi/flathunter/hunter.py", line 22, in crawl_for_exposes
for searcher in self.config.searchers()
File "/home/user/janhuntboi/flathunter/hunter.py", line 23, in <listcomp>
for url in self.config.get('urls', list())])
File "/home/user/janhuntboi/flathunter/abstract_crawler.py", line 136, in crawl
return self.get_results(url, max_pages)
File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 63, in get_results
return self.get_entries_from_javascript()
File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 105, in get_entries_from_javascript
return [ self.extract_entry_from_javascript(entry) for entry in entry_list ]
File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 105, in <listcomp>
return [ self.extract_entry_from_javascript(entry) for entry in entry_list ]
File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 111, in extract_entry_from_javascript
'image': entry["resultlist.realEstate"]["galleryAttachments"]["attachment"][0]["@xlink.href"] if "galleryAttachments" in entry["resultlist.realEstate"] else "https://www.static-immobilienscout24.de/statpic/placeholder_house/496c95154de31a357afa978cdb7f15f0_placeholder_medium.png",
WebHunter doesn't send messages to users listed in receiver_ids who don't have settings stored in Firestore
Hello,
I'm an absolute Python newbie, so please have mercy.
Python 3.8.4 is installed on my Windows 10 Pro client.
When executing
pipenv install
I get the following errors:
Warning: Python 3.7 was not found on your system... Neither 'pyenv' nor 'asdf' could be found to install Python. You can specify specific versions of Python with: $ pipenv --python path\to\python
Thanks a lot for your help.
Hi people,
I am trying to start a Docker container on an Ubuntu server without a GUI. The program crashes if I enable anticaptcha. Error:
Used image: oyzoursky/python-chromedriver:3.8-selenium
Traceback (most recent call last):
  File "flathunt.py", line 89, in <module>
    main()
  File "flathunt.py", line 68, in main
    config = Config(config_handle.name)
  File "/app/flathunter/config.py", line 29, in __init__
    self.__searchers__ = [CrawlImmobilienscout(self),
  File "/app/flathunter/crawl_immobilienscout.py", line 43, in __init__
    self.driver = self.configure_driver(self.driver_executable_path, self.driver_arguments)
  File "/app/flathunter/abstract_crawler.py", line 51, in configure_driver
    driver = webdriver.Chrome(executable_path=driver_path, options=chrome_options)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Used image: oyzoursky/python-chromedriver:3.8:
Traceback (most recent call last):
  File "flathunt.py", line 89, in <module>
    main()
  File "flathunt.py", line 68, in main
    config = Config(config_handle.name)
  File "/app/flathunter/config.py", line 29, in __init__
    self.__searchers__ = [CrawlImmobilienscout(self),
  File "/app/flathunter/crawl_immobilienscout.py", line 43, in __init__
    self.driver = self.configure_driver(self.driver_executable_path, self.driver_arguments)
  File "/app/flathunter/abstract_crawler.py", line 51, in configure_driver
    driver = webdriver.Chrome(executable_path=driver_path, options=chrome_options)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Can anyone help me?
I don't know if I'm missing something here, but essentially the test test_crawl_wggesucht.py fails on this assertion:
self.assertTrue(entries[0]['url'].startswith("https://www.wg-gesucht.de/wohnungen"), u"URL should be an apartment link")
and the reason it fails is that the URLs the crawler generates contain a double slash after the domain, e.g.:
'https://www.wg-gesucht.de//wohnungen-in-Berlin-Friedrichshain.8598343.html'
Apparently when the URL is created (code below), the href in the a element is assumed to come without a leading slash, but it does in fact come with one. I guess the wg-gesucht site changed this at some point?
base_url = 'https://www.wg-gesucht.de/'
for row in existing_findings:
    title_row = row.find('h3', {"class": "truncate_title"})
    title = title_row.text.strip()
    url = base_url + title_row.find('a')['href']
The functionality still works because the URL with the double slash works just the same, but the tests fail.
If this is confirmed to be an issue, I'd be happy to provide a PR to fix it.
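If it helps, `urllib.parse.urljoin` handles both href conventions (with or without a leading slash), so the crawler wouldn't have to depend on which form the site serves:

```python
from urllib.parse import urljoin

base_url = 'https://www.wg-gesucht.de/'

# urljoin normalizes either form of the href to the same absolute URL,
# avoiding the double slash produced by plain string concatenation.
with_slash = urljoin(base_url, '/wohnungen-in-Berlin-Friedrichshain.8598343.html')
without_slash = urljoin(base_url, 'wohnungen-in-Berlin-Friedrichshain.8598343.html')
print(with_slash)  # -> https://www.wg-gesucht.de/wohnungen-in-Berlin-Friedrichshain.8598343.html
```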
Hi,
last year I was using an old version of flathunter. The hunter.py looked quite different back then. At that time I changed this file a little to fire my own shell script for every single id flathunter found. It looked like this:
...............
for expose in results:
    # check if already processed
    if expose['id'] in processed:
        continue
    # fire the apartment mailer
    ident = expose['id']
    subprocess.call(['/home/pi/Desktop/flathunter/flathunter/mailer.sh', str(ident)])
    self.__log__.info('New offer: ' + expose['title'])
    # to reduce traffic, some addresses need to be loaded on demand
...............
How could I integrate this now?
Thanks a lot!!!
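In the current code the results flow through a processor chain (visible in the hunter.py tracebacks elsewhere on this tracker), so a small custom processor seems like the natural home for such a shell hook. A hedged sketch; class and method names are illustrative, not the project's actual API:

```python
import subprocess


class ShellHookProcessor:
    """Run an external script once per expose id, then pass the expose on."""

    def __init__(self, script_path):
        self.script_path = script_path

    def process_exposes(self, exposes):
        for expose in exposes:
            # Fire the shell script with the expose id as its only argument.
            subprocess.call([self.script_path, str(expose['id'])])
            yield expose
```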
Immoscout stopped working. The crawler gets the following message:
Hello, I got the program running and after the first run I also received all the available flats via Telegram. But now it seems that the program isn't looping, so it only works every time I start it manually. Is there a way to check if it's running properly? I don't get any updates in the console, also after waiting 5 minutes, which is the looptime set in the config file.
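For reference, a minimal sketch of the behaviour one would expect with looping enabled (function and parameter names below are mine, not flathunter's): each pass should at least produce output, so complete console silence for longer than the configured sleeping_time suggests the loop is not active in the config.

```python
import time


def run_hunt_loop(hunt, sleeping_time=300, max_passes=None):
    """Re-run `hunt` every `sleeping_time` seconds, printing each pass."""
    passes = 0
    while max_passes is None or passes < max_passes:
        hunt()
        passes += 1
        print(f"pass {passes} complete, sleeping {sleeping_time}s")
        time.sleep(sleeping_time)
    return passes
```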
The app is working great - just 1 small onboarding issue that confused me at first.
Platform: iOS latest version.
If you click the initial flathunter.codders.io link, it opens an in-app browser which does not save the login credentials, resulting in repeated requests to login.
Login only works for me when I click "Open in Safari", then "Log in with Telegram".
I get this message when using the chromedriver in conjunction with the 2captcha service. I even set the sleeping_time to 650.
Got response (418): b'<!DOCTYPE html>\n<html>\n <head>\n <title>418 You look like a robot (1). If you think you are not, contact us: [email protected]\\n \\n54.155.47.221</title>\n </head>\n <body>\n <h1>Error 418 You look like a robot (1). If you think you are not, contact us: [email protected]\\n \\n54.155.47.221</h1>\n <p>You look like a robot (1). If you think you are not, contact us: [email protected]\\n \\n54.155.47.221</p>\n <h3>Guru Meditation:</h3>\n <p>XID: 253749141</p>\n <hr>\n <p>Varnish cache server</p>\n </body>\n</html>\n'
Hi,
since flathunter does not work for Immoscout (#45), I tried to let it hunt for flats on eBay Kleinanzeigen, but realized that filters are not recognized.
Is this how it should be? Any other way to filter out offers with certain words?
Cheers and thanks!!!
Unfortunately I have a problem getting Flathunter up and running. When I run pytest I get the following error message:
======================================================================== ERRORS =========================================================================
_________________________________________________________ ERROR collecting test/test_config.py __________________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_config.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_config.py:5: in <module>
from flathunter.config import Config
E ImportError: No module named flathunter.config
_________________________________________________ ERROR collecting test/test_crawl_ebaykleinanzeigen.py _________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_crawl_ebaykleinanzeigen.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_crawl_ebaykleinanzeigen.py:2: in <module>
from flathunter.crawl_ebaykleinanzeigen import CrawlEbayKleinanzeigen
E ImportError: No module named flathunter.crawl_ebaykleinanzeigen
__________________________________________________ ERROR collecting test/test_crawl_immobilienscout.py __________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_crawl_immobilienscout.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_crawl_immobilienscout.py:3: in <module>
from flathunter.crawl_immobilienscout import CrawlImmobilienscout
E ImportError: No module named flathunter.crawl_immobilienscout
_____________________________________________________ ERROR collecting test/test_crawl_immowelt.py ______________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_crawl_immowelt.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_crawl_immowelt.py:3: in <module>
from flathunter.crawl_immowelt import CrawlImmowelt
E ImportError: No module named flathunter.crawl_immowelt
_____________________________________________________ ERROR collecting test/test_crawl_wggesucht.py _____________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_crawl_wggesucht.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_crawl_wggesucht.py:3: in <module>
from flathunter.crawl_wggesucht import CrawlWgGesucht
E ImportError: No module named flathunter.crawl_wggesucht
________________________________________________ ERROR collecting test/test_gmaps_duration_processor.py _________________________________________________
/usr/lib/python2.7/dist-packages/pytest/python.py:450: in _importtestmodule
mod = self.fspath.pyimport(ensuresyspath=importmode)
/usr/lib/python2.7/dist-packages/py/_path/local.py:701: in pyimport
__import__(modname)
E File "/home/planetdyna/test/test_gmaps_duration_processor.py", line 28
E SyntaxError: Non-ASCII character '\xd0' in file /home/planetdyna/test/test_gmaps_duration_processor.py on line 29, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
________________________________________________ ERROR collecting test/test_googlecloud_idmaintainer.py _________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_googlecloud_idmaintainer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_googlecloud_idmaintainer.py:4: in <module>
from mockfirestore import MockFirestore
E ImportError: No module named mockfirestore
_________________________________________________________ ERROR collecting test/test_hunter.py __________________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_hunter.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_hunter.py:2: in <module>
import yaml
E ImportError: No module named yaml
______________________________________________________ ERROR collecting test/test_id_maintainer.py ______________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_id_maintainer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_id_maintainer.py:5: in <module>
from flathunter.idmaintainer import IdMaintainer
E ImportError: No module named flathunter.idmaintainer
________________________________________________________ ERROR collecting test/test_processor.py ________________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_processor.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_processor.py:2: in <module>
import yaml
E ImportError: No module named yaml
_____________________________________________________ ERROR collecting test/test_sender_telegram.py _____________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_sender_telegram.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_sender_telegram.py:1: in <module>
import requests_mock
E ImportError: No module named requests_mock
_____________________________________________________ ERROR collecting test/test_statistics_view.py _____________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_statistics_view.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_statistics_view.py:3: in <module>
import yaml
E ImportError: No module named yaml
______________________________________________________ ERROR collecting test/test_web_interface.py ______________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_web_interface.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_web_interface.py:3: in <module>
import yaml
E ImportError: No module named yaml
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 13 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Hello,
I managed to set everything up and run the file. I had some difficulties because I had never used Python or the console before, but I managed to install it. Now, after running the flathunt.py file, I get this error message:
File "/.../flathunter-main/flathunter/crawl_immobilienscout.py", line 45, in get_results
while len(entries) < min(no_of_results, self.RESULT_LIMIT) and \
UnboundLocalError: local variable 'no_of_results' referenced before assignment
Does this mean it can't find any flats under my link that I've added to the config? How can I fix this error and keep the code running?
Edit:
I added an Immowelt and an Immonet link and removed the ImmoScout one, and now it runs perfectly fine. So the error has to be inside the ImmoScout crawler, or my link isn't valid?
Also, how can I see whether the program crawls every 5 minutes? I have it running in the console, but after 5 minutes nothing new pops up. Is that normal when the program doesn't find any new flats?
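A hedged sketch of what a fix inside `get_results` might look like: the `UnboundLocalError` suggests `no_of_results` is only assigned when the result counter can be parsed, so initialising it up front avoids the crash. The regex and German label below are hypothetical stand-ins for the real scraping logic:

```python
import re

def count_results(page_text):
    """Return the advertised number of results from the raw page text,
    or 0 when the page (e.g. a captcha wall) has no result counter.
    The regex is a hypothetical stand-in for the real scraping code."""
    no_of_results = 0  # initialised up front, so it is never unbound
    match = re.search(r"([\d.]+)\s+Ergebnisse", page_text)
    if match:
        no_of_results = int(match.group(1).replace(".", ""))
    return no_of_results
```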
Hi!
I experience timeouts right after the captcha solving, but it is not a steady problem, i.e. when I restart the script it runs through. My 2captcha setup is working (most of the time, as you can see).
Here is a typical log:
[2021/03/16 15:50:07|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2021/03/16 15:50:12|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2021/03/16 15:50:12|abstract_crawler.py|DEBUG ]: Captcha promise: OK|03AGdBq24s86a4LSPuU8CJCT1gF-JzbFcRDkyXAtQwSgzP6QK9Yv9z3iA0mlzfBdIh5hv0H1t7s-PJnBDD0yFgaXJVaYr1dba57vCu_66yXiQ6gzeRtIcQwJYKgctw9_8Y9d7ThbShmLlG7v6Y5qWTmELSyX0QiDNInqIDNwM5DXCNmzLTw1lrVoENlgXoKerJbJhO0Gy1aZdO6-gV-nD_wqPpGI5NDKGnKcMXdajE4L6FxJILEnyXY77HAnI05MRbbI-dLIFEUAKKenWovdyMLgjDIbb83dZhoEB8iFyEDmDhV07Zea2CS7MEvLAXT9B-0s9D3mmR0pfZbhQ9bF_KEh43k83kxBLZ1_jjhebf2lECm6LfWKKv1MCPSsObwNsrhtt2ivCxKJqRaoqjXlDHxkRyyRh0p6oyvjiX8tlx0Iynse7oX2w2FvEqX9htv4F_M06EPkweyoXGx3-rEHPyA3IqhTCQXbow6ChGvaF1Y9S4Ze1AwHuveto
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 50, in launch_flat_hunt
hunter.hunt_flats()
File "/home/m/flathunter/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/home/m/flathunter/flathunter/hunter.py", line 22, in crawl_for_exposes
for searcher in self.config.searchers()
File "/home/m/flathunter/flathunter/hunter.py", line 23, in <listcomp>
for url in self.config.get('urls', list())])
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 136, in crawl
return self.get_results(url, max_pages)
File "/home/m/flathunter/flathunter/crawl_immobilienscout.py", line 60, in get_results
soup = self.get_page(search_url, self.driver, page_no)
File "/home/m/flathunter/flathunter/crawl_immobilienscout.py", line 120, in get_page
return self.get_soup_from_url(search_url.format(page_no), driver=driver, captcha_api_key=self.captcha_api_key, checkbox=self.checkbox, afterlogin_string=self.afterlogin_string)
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 75, in get_soup_from_url
self.resolvecaptcha(driver, checkbox, afterlogin_string, captcha_api_key)
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 153, in resolvecaptcha
self._solve(driver, api_key)
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 181, in _solve
self._check_if_iframe_not_visible(driver)
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 216, in _check_if_iframe_not_visible
(By.CSS_SELECTOR, "iframe[src^='https://www.google.com/recaptcha/api2/anchor?']")))
File "/home/m/.pyenv/versions/venv_flathunter/lib/python3.6/site-packages/selenium/webdriver/support/wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
here are two examples with more context:
I am just guessing here: maybe after 2 minutes without an answer from 2captcha, the script raises the TimeoutException?
If this is a persistent problem with 2captcha (time to answer > time Selenium holds the connection open), it would be nice if the script didn't break when this happens.
Thanks for looking into it!
Marvin
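One way the script could survive this is by retrying the captcha solve instead of letting the `TimeoutException` propagate. A hedged, self-contained sketch (the exception class here is a stand-in for `selenium.common.exceptions.TimeoutException`; all names are hypothetical, not flathunter's actual API):

```python
import time

class CaptchaTimeout(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""

def solve_with_retry(solve, attempts=3, delay=5, timeout_exc=CaptchaTimeout):
    """Call solve() up to `attempts` times, retrying on timeouts so one
    slow 2captcha answer does not kill the whole hunting loop."""
    for attempt in range(1, attempts + 1):
        try:
            return solve()
        except timeout_exc:
            if attempt == attempts:
                raise  # give up only after the last attempt
            time.sleep(delay)
```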
Sometimes there are entries without images in the result list for immoscout24. AFAIK there is no URL config to exclude those.
File "..\flathunter\crawl_immobilienscout.py", line 107, in extract_entry_from_javascript
'image': entry["resultlist.realEstate"]["galleryAttachments"]["attachment"][0]["@xlink.href"],
KeyError: 'galleryAttachments'
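A defensive lookup along these lines would avoid the `KeyError`; the nested key names mirror the traceback above, but the helper itself is a hypothetical sketch, not the crawler's actual code:

```python
def extract_image(entry):
    """Return the first gallery image URL of an expose, or None when
    the listing carries no images at all."""
    attachments = (entry.get("resultlist.realEstate", {})
                        .get("galleryAttachments", {})
                        .get("attachment", []))
    if attachments:
        return attachments[0].get("@xlink.href")
    return None
```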
Hi!
I get an error when crawling Kleinanzeigen after days of it working without problems. Did they change something? Here is the error message:
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/home/m/flathunter/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/home/m/flathunter/flathunter/hunter.py", line 22, in crawl_for_exposes
for searcher in self.config.searchers()
File "/home/m/flathunter/flathunter/hunter.py", line 23, in <listcomp>
for url in self.config.get('urls', list())])
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 136, in crawl
return self.get_results(url, max_pages)
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 127, in get_results
entries = self.extract_data(soup)
File "/home/m/flathunter/flathunter/crawl_ebaykleinanzeigen.py", line 72, in extract_data
image = image_element["data-imgsrc"]
File "/home/m/.pyenv/versions/venv_flathunter/lib/python3.6/site-packages/bs4/element.py", line 992, in __getitem__
return self.attrs[key]
KeyError: 'data-imgsrc'
that is my search-string:
https://www.ebay-kleinanzeigen.de/s-wohnung-mieten/mitte/anzeige:angebote/preis::800/c203l3518r5+wohnung_mieten.zimmer_d:2
Thanks for looking into it!
Marvin
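BeautifulSoup `Tag` objects support dict-style `.get()`, so a hedged one-line fix for line 72 of `crawl_ebaykleinanzeigen.py` could be `image = image_element.get("data-imgsrc")`. Wrapped up as a helper (illustrated with plain dicts, which share the `.get` interface with bs4 Tags):

```python
def safe_attr(element, name, default=None):
    """Fetch attribute `name` from a bs4 Tag (or any mapping),
    returning `default` instead of raising KeyError when an ad
    comes without an image."""
    if element is None:
        return default
    return element.get(name, default)
```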
Hi,
I keep getting an IndexError whenever I try to crawl wg-gesucht. The message is:
flathunter\flathunter\crawl_wggesucht.py", line 40, in extract_data
rooms = re.findall(r'\d Zimmer', details_array[0])[0][:1]
IndexError: list index out of range
The problem seems to be that it cannot handle an empty list, i.e. there is no match for a number followed by "Zimmer". I did a quick fix, and it seems to be working for me now:
rooms_tmp = re.findall(r'\d Zimmer', details_array[0])
if not rooms_tmp:
    rooms = 0
else:
    rooms = rooms_tmp[0][:1]
Hello,
for the past hour I have been getting the following error message:
[2020/07/09 15:42:01|config.py |INFO ]: Using config /home/choeffer/Dokumente/flathunter/config.yaml
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/home/choeffer/Dokumente/flathunter/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/home/choeffer/Dokumente/flathunter/flathunter/hunter.py", line 21, in crawl_for_exposes
return chain(*[searcher.crawl(url, max_pages)
File "/home/choeffer/Dokumente/flathunter/flathunter/hunter.py", line 21, in <listcomp>
return chain(*[searcher.crawl(url, max_pages)
File "/home/choeffer/Dokumente/flathunter/flathunter/abstract_crawler.py", line 48, in crawl
return self.get_results(url, max_pages)
File "/home/choeffer/Dokumente/flathunter/flathunter/abstract_crawler.py", line 39, in get_results
entries = self.extract_data(soup)
File "/home/choeffer/Dokumente/flathunter/flathunter/crawl_ebaykleinanzeigen.py", line 81, in extract_data
rooms = re.match(r'(\d+)', tags[1].text)[1]
TypeError: 'NoneType' object is not subscriptable
If you need more information for debugging the issue or additional logs, please let me know.
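`re.match` returns `None` when the pattern does not match, and indexing `None` raises exactly this `TypeError`. A hedged sketch of a guard for line 81 (the helper name and the default value are my assumptions, not the project's code):

```python
import re

def extract_rooms(text, default="0"):
    """Return the leading digits of `text`, or `default` when the tag
    text does not start with a number (re.match then returns None)."""
    match = re.match(r"(\d+)", text)
    return match[1] if match else default
```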
This is good for debugging and for making sure the program is still running, which otherwise cannot be known without checking it manually. That can be tedious when the program is running on a server and is supposed to be a set-and-forget kind of thing.
So a feature where an "I'm still alive" message is sent regularly (maybe once a day or week; this should be configurable) might make sense.
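A hedged sketch of how such a heartbeat could be bolted onto the hunting loop. All names here are hypothetical (`hunt` would be the existing crawl step, `notify` the existing Telegram sender); with a 5-minute interval, 288 cycles is roughly one heartbeat per day:

```python
import time

def run_with_heartbeat(hunt, notify, cycles, interval=300, heartbeat_every=288):
    """Run hunt() every `interval` seconds for `cycles` iterations and
    send an "I'm still alive" message every `heartbeat_every` cycles."""
    for i in range(1, cycles + 1):
        hunt()
        if i % heartbeat_every == 0:
            notify("flathunter is still alive")
        time.sleep(interval)
```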
Hi,
I've tried to install the package and received the following error:
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/pipenv", line 8, in <module>
sys.exit(cli())
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/decorators.py", line 73, in new_func
return ctx.invoke(f, obj, *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/cli/command.py", line 233, in install
retcode = do_install(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/core.py", line 2052, in do_install
do_init(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/core.py", line 1304, in do_init
do_install_dependencies(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/core.py", line 899, in do_install_dependencies
batch_install(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/core.py", line 796, in batch_install
_cleanup_procs(procs, failed_deps_queue, retry=retry)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/core.py", line 703, in _cleanup_procs
raise exceptions.InstallError(c.dep.name, extra=err_lines)
pipenv.exceptions.InstallError: ERROR: Couldn't install package: grpcio
Package installation failed...
✘ ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 0/1 - 01:23:46
Some details about my system:
- Ubuntu 20.04
- Python 3.8.5
I already tried pip3 install --upgrade setuptools, without effect.
Surprisingly, a plain pip3 install grpcio ran through without an error.
Any idea how to solve this issue?
Thanks!
Alex
Hi,
Sorry for the trivial question, but I cannot get chromedriver to start in Google Cloud. My latest guess for config.yaml was "driver_path: /usr/local/bin/chromedriver", because this is the path mentioned in the joyzoursky Dockerfile.
On Google Cloud I get these messages:
As I'm not at all familiar with Google Cloud, I'm not sure how to find out where the executable is actually located.
Cheers
Martin
Hi there! First of all, thanks for the great tool, which runs perfectly fine despite the error. For immoscout24 I get a 405 error every time (see below), but flathunter continues to run without problems. Currently it runs on a cloud server from Hetzner. When I test it at home with identical settings, I get no error message. Any idea how to solve the HTTP error?
[2020/10/30 07:32:32|abstract_crawler.py|ERROR ]: Got response (405): b'<!DOCTYPE html>\n<html>\n<head>\n <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>\n <meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1" />\n <meta http-equiv="X-UA-Compatible" content="IE=edge" />\n <meta name="robots" content="noindex, nofollow">\n <meta http-equiv="cache-control" content="no-cache, no-store, must-revalidate">\n <meta http-equiv="pragma" content="no-cache">\n <meta http-equiv="expires" content="0">\n <title>Ich bin kein Roboter - ImmobilienScout24</title>\n <link rel="icon" type="image/vnd.microsoft.icon" href="https://www.immobilienscout24.de/favicon.ico"/>\n <link rel="shortcut icon" type="image/vnd.microsoft.icon" href="https://www.immobilienscout24.de/favicon.ico"/>\n <style>\n @font-face {\n font-family: "Make It Sans IS24 Web";\n font-style: normal;\n font-weight: 400;\n font-display: swap;\n src: url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Regular.woff2") format("woff2"), url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Regular.woff") format("woff");\n }\n @font-face {\n font-family: "Make It Sans IS24 Web";\n font-style: normal;\n font-weight: 700;\n font-display: swap;\n src: url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Bold.woff2") format("woff2"), url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Bold.woff") format("woff");\n }\n\n @font-face {\n font-family: \'IS24Icons\';\n src: url(\'https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/is24-icons/is24-icons.woff\') format(\'woff\');\n font-weight: normal;\n font-style: normal;\n }\n\n a, abbr, address, article, aside, audio, b, blockquote, body, canvas, caption, cite, code, dd, del, details, dfn, div, dl, dt, 
em, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, header, html, i, iframe, img, input, ins, kbd, label, legend, li, main, mark, menu, nav, object, ol, p, pre, q, samp, section, select, small, span, strong, sub, summary, sup, table, tbody, td, textarea, tfoot, th, thead, time, tr, ul, var, video {\n -ms-box-sizing: border-box;\n -o-box-sizing: border-box;\n box-sizing: border-box;\n margin: 0;\n padding: 0;\n border: 0;\n outline: 0;\n }\n\n html {\n font-size: 62.5%;\n }\n body {\n background-color: #fff;\n color: #333;\n font-size: 1.4em;\n line-height: 1.61;\n font-family: "Make It Sans IS24 Web",Verdana,"DejaVu Sans",Arial,Helvetica,sans-serif;\n }\n .page-wrapper {\n margin-left: auto;\n margin-right: auto;\n max-width: 1170px;\n background-color: #fff;\n }\n .grid {\n display: block;\n margin-right: 0;\n }\n .grid:after {\n display: table;\n clear: both;\n content: "";\n }\n .grid-item {\n display: block;\n float: left;\n vertical-align: top;\n text-align: left;\n }\n .header {\n border-bottom: 1px solid #e0e0e0;\n }\n .header .grid {\n padding-left: 70px;\n padding-right: 70px;\n padding-top: 14px;\n padding-bottom: 14px;\n }\n .header .logo {\n width: 50%;\n float: left;\n }\n .header .logo img {\n vertical-align: top;\n }\n .header .login-button {\n width: 50%;\n text-align: right;\n float: left;\n }\n .header .login-button a {\n padding-top: .35714286em;\n padding-bottom: .35714286em;\n min-width: 9.42857143em;\n font-family: "Make It Sans IS24 Web",Verdana,"DejaVu Sans",Arial,Helvetica,sans-serif;\n border-radius: 8px;\n background-color: #fff;\n display: inline-block;\n border: 1px solid #333333;\n padding: .64285714em 1.64285714em;\n font-weight: 600;\n font-size: 1.4rem;\n text-align: center;\n letter-spacing: .2px;\n line-height: 1.42857143em;\n white-space: nowrap;\n cursor: pointer;\n color: #333333;\n }\n .header .login-button a:link, .header .login-button a:visited, .header .login-button a:focus, .header .login-button a:hover 
{\n text-decoration: none;\n color: #333333;\n }\n .header .login-button a:hover {\n background-color: #eaeaea;\n }\n .main {\n clear: both;\n padding-top: 55px;\n max-width: 583px;\n margin-left: auto;\n margin-right: auto;\n text-align: center;\n }\n .main .headline {\n font-size: 4.0rem;\n font-weight: bold;\n letter-spacing: 0px;\n line-height: 4.8rem;\n text-align: center;\n }\n .main .main__logo {\n padding-top: 10px;\n text-align: center;\n }\n .main .main__logo img {\n height: 240px;\n width: 240px;\n vertical-align: top;\n }\n .main .main__part1 {\n padding-top: 11px;\n font-size: 1.4rem;\n font-weight: bold;\n letter-spacing: 0px;\n line-height: 20px;\n }\n .main .main__captcha {\n padding-top: 36px;\n padding-bottom: 36px;\n }\n .main .main_part2_header1 {\n font-weight: bold;\n }\n .main .main_part2_header2 {\n font-weight: bold;\n padding-top: 16px;\n }\n .main .main__list {\n padding-top: 14px;\n padding-bottom: 42px;\n }\n .main .main__list ul li {\n list-style-position: inside;\n }\n .footer {\n background: #f2f2f2;\n text-align: center;\n }\n .footer .footer-content {\n max-width: 583px;\n margin-left: auto;\n margin-right: auto;\n padding-top: 15px;\n padding-bottom: 6px;\n color: #757575;\n font-size: 1.2rem;\n line-height: 1.6rem;\n }\n .footer .footer-content div {\n padding-top: 20px;\n }\n .footer .footer-content div:first-child {\n padding-top: 0;\n }\n .footer .footer-content a, .footer .footer-content a:visited, .footer .footer-content a:link, .footer .footer-content a:focus, .footer .footer-content .legend {\n color: #757575;\n font-size: 1.2rem;\n line-height: 1.6rem;\n text-decoration: none;\n }\n .footer .footer-content a:hover {\n color: #757575;\n font-size: 1.2rem;\n line-height: 1.6rem;\n text-decoration: underline;\n }\n\n .g-recaptcha {\n display: inline-block;\n }\n\n @media (max-width: 668px) {\n .palm-hide {\n display: none;\n }\n .header .grid {\n padding-left: 16px;\n padding-right: 16px;\n padding-top: 8px;\n 
padding-bottom: 8px;\n }\n .main {\n padding-top: 32px;\n padding-left: 16px;\n padding-right: 16px;\n }\n .main .headline {\n font-size: 3.2rem;\n font-weight: normal;\n line-height: 4.0rem;\n }\n .main .main__logo img {\n height: 188px;\n width: 188px;\n }\n .footer .footer-content {\n padding-bottom: 32px;\n }\n\n }\n </style>\n\n <script>\n function showBlockPage() {\n console.log("showing block page");\n }\n setTimeout(showBlockPage, 10000);\n </script>\n <script type="text/javascript" src="/assets/immo-1-17" async defer></script>\n <script>window.captchaDescription = \'<p>Nachdem du das unten stehende CAPTCHA best\xc3\xa4tigt hast, wirst du sofort auf die von dir angefragte Seite weitergeleitet.</p>\';</script>\n <script src=\'https://www.google.com/recaptcha/api.js?hl=de\'></script>\n \n <script src="https://www.google.com/recaptcha/api.js" async defer></script>\n <script>\n function solvedCaptcha(payload) {\n const timeoutMs = 10000;\n protectionSubmitCaptcha("recaptcha", payload, timeoutMs, "3:KgR7QA9Zb+DPvlNK5NS0rQ==:Qc1ZWjV3jT+q6LyOv1htA/nmUoIWkcqqc41XxsIy6OWxHPb2t8XycRcMDV/0FGR3ax4IVPrl5qRmqm2RA8aHIuRNhZL1E6PJAkbg5IFVVBbtYVxxo59nosGtEY01RrnSuhs5hD0STKKPbDzntLLh60R0W7+6AzIUSQFKehVnUHiERpphMCXrg74Hg6N6sY75I4ZtEHJEhBRgO36V5uCHOQ==:q3Gl4XIOmWNJ6zYAqLlwwZDHJSgNwu0MGvGtik7zNvo=").then(function() {\n window.location.reload(true);\n });\n }\n </script>\n \n</head>\n<body>\n\n<div class="header">\n <div class="page-wrapper">\n <div class="grid">\n <div class="logo grid-item">\n <a href="https://www.immobilienscout24.de/">\n <img src="https://www.static-immobilienscout24.de/fro/imperva/0.0.1/is24-logo.svg" alt="ImmoScout24 Logo">\n </a>\n </div>\n <div class="login-button grid-item">\n <a href="https://www.immobilienscout24.de/geschlossenerbereich/start.html?source=meinkontodropdown-login">\n Anmelden <span class="palm-hide">/ Registrieren</span>\n </a>\n </div>\n </div>\n </div>\n</div>\n\n<div class="page-wrapper">\n\n<div class="main">\n <div 
class="headline">\n Ich bin kein Roboter\n </div>\n <div class="main__logo">\n <img src="https://www.static-immobilienscout24.de/fro/imperva/0.0.1/robot-logo.svg" alt="Roboter Logo">\n </div>\n<div class="main__part1">\n Du bist ein Mensch aus Fleisch und Blut? Entschuldige bitte, dann hat unser System dich f\xc3\xa4lschlicherweise als Roboter identifiziert. Um unsere Services weiterhin zu nutzen, l\xc3\xb6se bitte diesen kurzen Test.\n</div>\n\n <div class="main__captcha">\n \n <div class="container">\n \n <script>\n showBlockPage()\n document.writeln(window.captchaDescription || "<p>After completing the CAPTCHA below, you will immediately regain access to the site again.</p>");\n </script>\n <div class="g-recaptcha" data-sitekey="6LeaILIZAAAAALTgLZV1AQXPc2dAsLItNYJ8jVvB" data-callback="solvedCaptcha"></div>\n </div>\n </div>\n\n<div class="main__part2">\n\n <div class="main_part2_header1">Warum f\xc3\xbchren wir diese Sicherheitsma\xc3\x9fnahme durch?</div>\n<div class="main_part2_text1">Mit der Captcha-Methode stellen wir fest, dass du kein Roboter oder eine sch\xc3\xa4dliche Spam-Software bist. Damit sch\xc3\xbctzen wir unsere Webseite und die Daten unserer Nutzerinnen und Nutzer vor betr\xc3\xbcgerischen Aktivit\xc3\xa4ten.</div>\n\n <div class="main_part2_header2">Warum haben wir deine Suchanfragen blockiert?</div>\n <div class="main_part2_text2">Es kann verschiedene Gr\xc3\xbcnde haben, warum wir dich f\xc3\xa4lschlicherweise als Roboter identifiziert haben. 
M\xc3\xb6glicherweise</div>\n\n</div>\n<div class="main__list">\n<ul>\n <li>hast du die Cookies f\xc3\xbcr unsere Seite deaktiviert.</li>\n <li>hast du die Ausf\xc3\xbchrung von JavaScript deaktiviert.</li>\n <li>nutzt du ein Browser-Plugin eines Drittanbieters, beispielsweise einen Ad-Blocker.</li>\n<li>hast du in kurzer Zeit mehr Anfragen an unser System gestellt, als es \xc3\xbcblicherweise der Fall ist.</li>\n</ul>\n</div>\n\n\n</div>\n\n</div>\n\n<div class="footer">\n <div class="footer-content">\n\n\n <div>\n <a href="https://www.immobilienscout24.de/unternehmen.html">\xc3\x9cber uns</a> |\n <a href="https://www.immobilienscout24.de/kontakt.html">Kontakt & Hilfe</a> |\n <a href="https://www.immobilienscout24.de/unternehmen/karriere/">Karriere</a> |\n <a href="https://www.immobilienscout24.de/sitemap.html">Sitemap</a> |\n <a href="https://api.immobilienscout24.de">Developer</a> |\n <a href="https://www.immobilienscout24.de/unternehmen/mediendienst.html">Presseservice</a> |\n <a href="https://www.immobilienscout24.de/ratgeber/newsletter.html">Newsletter abonnieren</a> |\n <a href="https://www.immobilienscout24.de/impressum.html">Impressum</a> |\n <a href="https://www.immobilienscout24.de/agb.html">AGB\'s & Rechtliche Hinweise</a> |\n <a href="https://www.immobilienscout24.de/agb/verbraucherinformationen.html">Verbraucherinformationen</a> |\n <a href="https://www.immobilienscout24.de/agb/datenschutz.html">Datenschutz</a> |\n <a href="https://www.immobilienscout24.de/lp/Geodatenkodex.html">Datenschutz Kodex f\xc3\xbcr Geodatendienste</a> |\n <a href="https://sicherheit.immobilienscout24.de">Sicherheit</a>\n </div>\n <div>\n <!--<a href="">Immobiliensuche</a> | -->\n <a href="https://www.scout24media.com/">Werbung</a> |\n <a href="https://blog.immobilienscout24.de">Blog</a>\n <!--|\n <a href="">Nachbarschaft</a> |\n <a href="">Gratis! 
E-Mail-Adresse @t-online.de</a>-->\n </div>\n <div>\n <a href="https://www.immobilienscout24.de/">www.ImmobilienScout24.de</a>\n </div>\n <div class="legend">\n \xc2\xa9 Copyright 1999 - 2020 Immobilien Scout GmbH\n </div>\n </div>\n\n</div>\n\n</body>\n</html>\n'
Hi, first of all: thanks for writing this awesome bot!!
It was running yesterday, but now it seems to crash with the following error on this URL:
https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten?numberofrooms=4.0-&price=-1500.0&livingspace=100.0-&geocodes=110000000911,110000000801,110000000703,110000000605,110000000704,110000000906,110000000907,110000001102,110000000201,110000000202,110000000301,110000000302,110000000601,110000000910,110000000701&pagenumber={0}
error log
[2020/10/19 16:34:48|config.py |INFO ]: Using config /Users/user/flathunter/config.yaml
[2020/10/19 16:34:50|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x108c51f10>
[2020/10/19 16:34:50|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten?numberofrooms=4.0-&price=-1500.0&livingspace=100.0-&geocodes=110000000911,110000000801,110000000703,110000000605,110000000704,110000000906,110000000907,110000001102,110000000201,110000000202,110000000301,110000000302,110000000601,110000000910,110000000701&pagenumber={0}
[2020/10/19 16:34:51|abstract_crawler.py|DEBUG ]: Google site key: <re.Match object; span=(49, 93), match='&k=6LeaILIZAAAAALTgLZV1AQXPc2dAsLItNYJ8jVvB&'>
[2020/10/19 16:34:56|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/19 16:35:01|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/19 16:35:06|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/19 16:35:12|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/19 16:35:17|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/19 16:35:17|abstract_crawler.py|DEBUG ]: Captcha promise: OK|03AGdBq26K-M4biKiyM1LweSOVKS1UuwZouTmow2O7P0f4P7yslyST6Fr7D1qwuHOWd63NU6GG_oQND0Vd1X0Z7MYlH8LO29WaHhgfxyPcoGo19TyERKtQZBxh0ktiSzWuuFs07dHyOw6sNKFKZQt3X1cDv5xJnqnEugqgIY26ZqVSg5zJvAQdEr1wIvaPTehCOQh-4Uh910LK7EnFzrIdc5qRnWVFdQ5RHuMw1sCCjUNTB_jhgCHax-oxG_ec33AMiXm_cMW-HtVnAcQ01ESpBMJe3Cjjhwd77BpbuWUmP3TQTIEObBTe_C3DMnIn_xVeH1B_yw8F1SCsLm_Eh43-4SQjsQZeJhem_odU1RdvVm8E3Os2YAkrEl1c7jOMY9NRcMr-i_kElZmppQjE5Ps1FhHMgd-NzaTwV5bTNmMhKh0W9I2XTP00eHd8FFbJBvusqSkKQmf-OXqMVI6YYS0dR1SOILgiYNsx8u-ppzAzLl2klKo_9DLdPRZY3ctk-MfmpODwEouRLIAaol28pxnRp8trERm2komELw
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/Users/user/flathunter/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/Users/user/flathunter/flathunter/hunter.py", line 22, in crawl_for_exposes
for searcher in self.config.searchers()
File "/Users/user/flathunter/flathunter/hunter.py", line 23, in <listcomp>
for url in self.config.get('urls', list())])
File "/Users/user/flathunter/flathunter/abstract_crawler.py", line 136, in crawl
return self.get_results(url, max_pages)
File "/Users/user/flathunter/flathunter/crawl_immobilienscout.py", line 64, in get_results
return self.get_entries_from_javascript()
File "/Users/user/flathunter/flathunter/crawl_immobilienscout.py", line 98, in get_entries_from_javascript
return self.get_entries_from_json(result_json)
File "/Users/user/flathunter/flathunter/crawl_immobilienscout.py", line 102, in get_entries_from_json
return [ self.extract_entry_from_javascript(entry.value) for entry in jsonpath_expr.find(json) ]
File "/Users/user/flathunter/flathunter/crawl_immobilienscout.py", line 102, in <listcomp>
return [ self.extract_entry_from_javascript(entry.value) for entry in jsonpath_expr.find(json) ]
File "/Users/user/flathunter/flathunter/crawl_immobilienscout.py", line 113, in extract_entry_from_javascript
'price': str(entry["monthlyRate"]),
KeyError: 'monthlyRate'
Any idea?
Thanks!
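The KeyError above means the result JSON sometimes lacks the "monthlyRate" field. A minimal defensive sketch (the fallback keys and the "N/A" default are illustrative, not taken from the real crawler) would be:

```python
def extract_price(entry):
    """Pull a price string out of a result entry without crashing when
    'monthlyRate' is missing. Fallback keys and the 'N/A' default are
    illustrative assumptions, not the real crawler's behaviour."""
    for key in ("monthlyRate", "price"):
        if key in entry:
            return str(entry[key])
    return "N/A"
```

With this, `extract_price({"monthlyRate": 950})` gives `"950"`, and an entry without the key yields `"N/A"` instead of raising a KeyError.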
Hello,
how do you set the config?
If I set urls on line 13 of the config file, I get errors.
Could someone share their config, without the token of course?
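For reference, a minimal config might look like the sketch below. The key names follow the sample config, but compare against the `config.yaml.dist` shipped with your checkout, since your version may expect different keys:

```yaml
# Minimal config.yaml sketch -- check config.yaml.dist for the exact keys
urls:
  - https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten?numberofrooms=2.0-
  - https://www.wg-gesucht.de/wohnungen-in-Berlin.8.2.1.0.html

loop:
  active: yes
  sleeping_time: 600

telegram:
  bot_token: "123456789:YOUR-TOKEN-HERE"
  receiver_ids:
    - 12345678
```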
Hey there, I don't know if I'm in the right place, but I have a problem running flathunter.
I had it up and running previously (a version from last October, I think). Now I tried to reuse it: I updated it via git, and now I get the following error when trying to run it:
./flathunter.py
[2020/06/08 14:42:29|config.py |INFO ]: Using config /scripts/flathunter/config.yaml
[2020/06/08 14:42:29|idmaintainer.py |INFO ]: already processed: 0
Traceback (most recent call last):
  File "./flathunter.py", line 91, in <module>
    main()
  File "./flathunter.py", line 87, in main
    launch_flat_hunt(config)
  File "./flathunter.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/scripts/flathunter/flathunter/hunter.py", line 49, in hunt_flats
    for expose in self.config.get_filter().filter(results):
  File "/scripts/flathunter/flathunter/config.py", line 36, in get_filter
    if "excluded_titles" in filters_config:
TypeError: argument of type 'NoneType' is not iterable
Is this an error in flathunter, or did I miss something when installing the dependencies?
Thanks a lot!!!
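The TypeError above typically happens when the `filters:` section in config.yaml is present but empty: YAML parses an empty section to `None`, and `"excluded_titles" in None` is not allowed. A sketch of a tolerant check (the function name is illustrative; the real check lives in `config.get_filter()`):

```python
def get_excluded_titles(filters_config):
    """An empty 'filters:' section in config.yaml arrives here as None,
    and `"excluded_titles" in None` raises exactly the TypeError above.
    Normalise to an empty dict before any membership test."""
    filters_config = filters_config or {}
    return filters_config.get("excluded_titles", [])
```

With the guard, an empty filters section simply yields an empty exclusion list instead of crashing.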
Hi,
is there any quick way to have multiple search intervals for different searches (e.g. ebay-kleinanzeigen every 600s, immoscout every 1200s)?
Cheers
The config file sample says that only immoscout and wg-gesucht are supported. However, the README claims immowelt and ebay Kleinanzeigen work as well. Is that just an oversight, or am I missing something?
Thanks for sharing a great tool!
immobilienscout24 now requires cookies with the request headers. You have to obtain them before proceeding with the crawl.
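One common way to handle this, sketched below with `requests` (the header value and URLs are illustrative assumptions, not taken from the flathunter code): a `requests.Session` remembers cookies set by earlier responses and replays them on later requests.

```python
import requests

def make_crawler_session():
    """Sketch: a requests.Session persists cookies across requests,
    which is what the new immobilienscout24 cookie check needs.
    The User-Agent value is an illustrative placeholder."""
    session = requests.Session()
    session.headers.update({"User-Agent": "Mozilla/5.0 (compatible; flathunter)"})
    return session

# Usage (network calls, shown commented out):
# session = make_crawler_session()
# session.get("https://www.immobilienscout24.de/")  # landing page sets cookies
# page = session.get(search_url)                    # cookies replayed automatically
```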
Hello,
First, thanks to all the amazing people contributing to this project!
When crawling ImmobilienScout24, I always run into the no_of_results error.
Traceback (most recent call last):
  File "flathunt.py", line 89, in <module>
    main()
  File "flathunt.py", line 86, in main
    launch_flat_hunt(config)
  File "flathunt.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/Users/zoe/Desktop/flathunter-main/flathunter/hunter.py", line 42, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/Users/zoe/Desktop/flathunter-main/flathunter/hunter.py", line 22, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/Users/zoe/Desktop/flathunter-main/flathunter/hunter.py", line 23, in <listcomp>
    for url in self.config.get('urls', list())])
  File "/Users/zoe/Desktop/flathunter-main/flathunter/abstract_crawler.py", line 117, in crawl
    return self.get_results(url, max_pages)
  File "/Users/zoe/Desktop/flathunter-main/flathunter/crawl_immobilienscout.py", line 65, in get_results
    while len(entries) < min(no_of_results, self.RESULT_LIMIT) and
UnboundLocalError: local variable 'no_of_results' referenced before assignment
So I made the changes to the files as described in pull request #61.
But after changing the code, installing chromedriver and selenium, and paying for 2captcha, I still get the same error.
Is there anything else I need to do? I added the 2captcha key and the chromedriver path to the config file. I don't know what else to try...
Any help would be amazing.
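The UnboundLocalError pattern above arises when a variable is only assigned inside a branch (here, one that the captcha page skips) but is read unconditionally afterwards. A minimal stand-alone sketch of the failure mode and the usual fix (all names here are stand-ins, not the real crawler's code):

```python
def get_results_sketch(parsed_entries, result_limit=50):
    """no_of_results gets a default up front; without it, reading the
    variable in the while-condition when parsed_entries is None would
    raise exactly the UnboundLocalError shown above."""
    no_of_results = 0  # default; previously only set inside a branch
    if parsed_entries is not None:
        no_of_results = len(parsed_entries)
    entries = []
    while len(entries) < min(no_of_results, result_limit):
        entries.append(parsed_entries[len(entries)])
    return entries
```

With the default in place, a captcha page (no parsed entries) yields an empty result list instead of a crash.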
Hi there! Just discovered this project.
Would you accept a PR that adds an additional sender service? I don't use Telegram, so I've added Pushover support. I've already got this working locally. But I wanted to ask before making the PR.
I see that the web feature of flathunter makes use of telegram login. I don't think I would add Pushover support to that, though if desired I could.
Do I need 2captcha to have immoscout working with selenium/chromedriver?
I get this error:
2020-10-21T04:52:04.525696+00:00 app[worker.1]: [2020/10/21 04:52:04|sender_telegram.py|ERROR ]: When sending bot message, we got status 429 with message: {'ok': False, 'error_code': 429, 'description': 'Too Many Requests: retry after 34', 'parameters': {'retry_after': 34}}
Is that normal?
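A 429 from the Telegram Bot API is rate limiting: the `retry_after` value in the response says how many seconds to wait before resending. A hedged sketch of honouring that hint (`send_fn` is a stand-in that returns the parsed JSON status dict, as in the log above; this is not flathunter's actual sender code):

```python
import time

def send_with_backoff(send_fn, message, max_attempts=3):
    """Retry a Telegram send, sleeping for the retry_after hint on a
    429 response instead of failing. send_fn is a hypothetical callable
    returning the parsed JSON status dict."""
    result = {}
    for _ in range(max_attempts):
        result = send_fn(message)
        if result.get("ok"):
            return result
        if result.get("error_code") == 429:
            time.sleep(result.get("parameters", {}).get("retry_after", 30))
        else:
            break  # some other error; don't retry blindly
    return result
```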
I recently noticed that the crawler crashes on WG-Gesucht with the following error:
Traceback (most recent call last):
File "C:\Users\X\Documents\Python\flathunter\flathunt.py", line 89, in <module>
main()
File "C:\Users\X\Documents\Python\flathunter\flathunt.py", line 86, in main
launch_flat_hunt(config)
File "C:\Users\X\Documents\Python\flathunter\flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "C:\Users\X\Documents\Python\flathunter\flathunter\hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "C:\Users\X\Documents\Python\flathunter\flathunter\default_processors.py", line 30, in process_expose
expose['address'] = searcher.load_address(url)
File "C:\Users\X\Documents\Python\flathunter\flathunter\crawl_wggesucht.py", line 81, in load_address
.find("a", {"href": "#"}).text.strip().split())
AttributeError: 'NoneType' object has no attribute 'text'
This part of the load_address() method in crawl_wggesucht.py causes the error:
address = ' '.join(response.find('div', {"class": "col-sm-4 mb10"})
.find("a", {"href": "#"}).text.strip().split())
I think the problem is that the .find("a", {"href": "#"}) part returns None because nothing was found. Maybe the markup on WG-Gesucht has changed slightly? To me it looks like the href is now "#mapContainer". So if I change the problematic line to
address = ' '.join(response.find('div', {"class": "col-sm-4 mb10"})
.find("a", {"href": "#mapContainer"}).text.strip().split())
things seem to work fine again. Can anyone confirm this?
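To make the fix above robust against further markup changes, the selector could also try both href values and bail out gracefully when nothing matches. A sketch (the sample HTML is invented for illustration):

```python
from bs4 import BeautifulSoup

def load_address_sketch(html):
    """Tolerant variant of the selector above: try the new
    '#mapContainer' href first, fall back to the old '#', and return
    None instead of raising when neither is present."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "col-sm-4 mb10"})
    if container is None:
        return None
    link = (container.find("a", {"href": "#mapContainer"})
            or container.find("a", {"href": "#"}))
    if link is None:
        return None
    return " ".join(link.text.strip().split())
```

On a fabricated snippet like `<div class="col-sm-4 mb10"><a href="#mapContainer"> Musterstr. 1 \n 10115 Berlin </a></div>` this returns the whitespace-normalised address, and None when the markup changes again.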
First of all, thank you all for this great project.
I have deployed the app as a cron job to Google Cloud, and when accessing the webpage I do see new listings coming in. However, I have not received a message from the Telegram bot yet. I ran the program locally first and it worked fine and sent me Telegram messages, so I think my config.yaml should be fine. The Google Cloud dashboard does not show any errors and reports that the /hunt URL was triggered 149 times in the last 24 hours.
Any clue where my error might be?