flathunters / flathunter
This project was forked from mordax7/flathunter.
A bot to help people with their rental real-estate search.
License: GNU Affero General Public License v3.0
Hello, verbose mode tells me that I get an IndexError for Immobilienscout. I am not very experienced with coding and can't fix this myself. Can anyone help me?
[2020/08/19 12:44:23|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x10a500e10>
[2020/08/19 12:44:23|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/shape/wohnung-mieten?shape=aWB3ZEhpZWVlQWZ4QGlhQGR7QHllQXp2QGtoQ3pBc2VAeWJAd3ZAbUBzZUB0b0BlZkNvaUF7YEltdUB9eEBxfEBkUGNxQGx7QmdaYF1xfEB6S298QGJoQnt0QWRQbUpkbEF_Um5gR2JGeGdAZlp4Y0JqdUBoakJobEFfTg..&numberofrooms=2.0-&price=-1300.0&livingspace=40.0-&sorting=2&pagenumber={0}
[2020/08/19 12:44:24|crawl_immobilienscout.py|DEBUG ]: Index Error occurred
[2020/08/19 12:44:24|crawl_immobilienscout.py|DEBUG ]: []
[2020/08/19 12:44:24|crawl_immobilienscout.py|DEBUG ]: extracted: 0
I think the issue is the provided search URL: if I open it in a browser, it does not lead anywhere. The '&pagenumber={0}' part at the end comes from the crawl_immobilienscout.py file; the URL in my config does not have this part. I tried a few things, but all I got were errors.
Thanks guys!
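For context, a minimal sketch of what that `{0}` placeholder is for: the configured search URL appears to be used as a format template when paging through results (the example URL below is hypothetical).

```python
# The crawler appends '&pagenumber={0}' to the configured search URL and
# fills in the page number with str.format for each results page.
search_url = ("https://www.immobilienscout24.de/Suche/de/berlin/berlin/"
              "wohnung-mieten?sorting=2")  # illustrative URL, not from the log
template = search_url + "&pagenumber={0}"
first_page = template.format(1)
print(first_page)  # ends with "&pagenumber=1"
```

That is also why the URL containing the literal `{0}` does not open in a browser: it is only valid after formatting.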
I'm happy to see that 2captcha was integrated into flathunter, since immobilienscout24.de triggers the captcha after two hits for a "selbst gezeichnetes Suchgebiet" (self-drawn search area) link.
Unfortunately, I'm not able to get it completely up and running.
The captcha seems to be solved successfully; it also shows up in the 2captcha statistics.
This is the resulting log:
[2020/10/03 23:58:50|config.py |INFO ]: Using config /app/config.yaml
[2020/10/03 23:58:51|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x7fdb050b43a0>
[2020/10/03 23:58:51|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/shape/wohnung-mieten?shape=fWZ0ZEh3dGtlQXhWZWtAblJlQGBUdUFoaUB3Q0pvc0FtaUBjfUFtW19gQHNOakhvRmJBdUNqR3lKSXNFa0FxSHdKeWJAZXZBfVZ7cEB3akBpWnFCe29AbWJAfV1jUHxNZ1h4fkBtXW1iQG1iQHNWfU5vR2dLYnRAa01waUB0RGxxQG1HYHBAdEJmX0B7Q2p0QGtFbGtBfGVAYmpAblhlYEB6X0BkRnZlQWhQYF52ZkBmbkFqdEBoXHpE&numberofrooms=2.0-&price=-1300.0&livingspace=50.0-&sorting=2&enteredFrom=result_list&pagenumber={0}
[2020/10/03 23:58:53|abstract_crawler.py|DEBUG ]: Google site key: <re.Match object; span=(49, 93), match='&k=6LeaILIZAAAAALTgLZV1AQXPc2dAsLItNYJ8jVvB&'>
[2020/10/03 23:58:58|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:03|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:08|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:13|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:19|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:24|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:29|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:34|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:39|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:44|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:49|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:54|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/03 23:59:54|abstract_crawler.py|DEBUG ]: Captcha promise: OK|03AGdBq25VNbxPhhj-ezyN11xsib3B5QPdtKJo9ZUANoJCXhNG6e3juFSmFIrHvZYVXO_63cEOhOsl9vxmUDdwqfQKj858qVkI-zSoT7idq99rB6uoV0z9WsX4D4TeQdjwlFozEIrIgZ5u12XcAUIfGAGSDrkA3xgdUtwWOuk8swEiW7u51Y_sf4r3GMX0UMgX0KksNv238L9eM26fEa3hDPMr0raK6vpsFUOPW0NxH-vjYTQ4ZDp_LL-8auv9FGn4bfkQTzNpDE71Nn-CGw5iVyEZc_mRupXhrHwD9FENIOokXeB1Iwm6v1pccKEvsrnsoJ5rmkBP5wr2YXDvld8p7VufhtwAk8GUaipG0kEiwpBEOrFKQc1L7t-qp4gZcITfQvjyqOlxEdo8tPTfIt4tIVcZrlGvGOql0qeR57d_X8w5oISKzsMSP6glD3RMy6GeSNVzWpaqfq861qOiDy9nqeQhWyRmBrhJtDGbqBOMQ1EtWKnPQ3mjU98
[2020/10/03 23:59:58|crawl_immobilienscout.py|DEBUG ]: [123231487, 123210362, 123202495, 123201463, 48470193, 123184168, 123145073, 123020022, 123083698, 123063242, 123054184, 123041195, 123030893, 122974183, 122960443, 122923251, 122915078, 119193192, 122543400, 122810740]
Traceback (most recent call last):
File "/app/flathunter/crawl_immobilienscout.py", line 134, in extract_data
image = image_tag["src"]
File "/usr/local/lib/python3.8/site-packages/bs4/element.py", line 992, in __getitem__
return self.attrs[key]
KeyError: 'src'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/app/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/app/flathunter/hunter.py", line 21, in crawl_for_exposes
return chain(*[searcher.crawl(url, max_pages)
File "/app/flathunter/hunter.py", line 21, in <listcomp>
return chain(*[searcher.crawl(url, max_pages)
File "/app/flathunter/abstract_crawler.py", line 121, in crawl
return self.get_results(url, max_pages)
File "/app/flathunter/crawl_immobilienscout.py", line 64, in get_results
entries = self.extract_data(soup)
File "/app/flathunter/crawl_immobilienscout.py", line 136, in extract_data
image = image_tag["data-lazy-src"]
File "/usr/local/lib/python3.8/site-packages/bs4/element.py", line 992, in __getitem__
return self.attrs[key]
KeyError: 'data-lazy-src'
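A hedged sketch of a defensive fix for the two KeyErrors above: BeautifulSoup tags support dict-style `.get()`, which returns `None` instead of raising when an attribute is missing. The HTML snippet and placeholder filename below are illustrative, not taken from the site.

```python
from bs4 import BeautifulSoup

# Some result images carry 'src', lazily loaded ones only 'data-lazy-src';
# trying both with .get() avoids the KeyError raised by tag["src"].
html = '<img class="gallery" data-lazy-src="https://example.com/flat.jpg">'
image_tag = BeautifulSoup(html, "html.parser").find("img")

image = (image_tag.get("src")
         or image_tag.get("data-lazy-src")
         or "placeholder.png")  # hypothetical fallback
print(image)  # -> https://example.com/flat.jpg
```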
Great repo btw! Thx for the good work!
(flathunter-main-tulP8G4X) pi@raspberrypi:~/shared/flathunter-main $ python3 flathunt.py --config config.yaml
[2021/03/11 00:23:12|config.py |INFO ]: Using config config.yaml
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 68, in main
config = Config(config_handle.name)
File "/home/pi/shared/flathunter-main/flathunter/config.py", line 28, in __init__
self.config = yaml.safe_load(file)
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/__init__.py", line 162, in safe_load
return load(stream, SafeLoader)
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/__init__.py", line 114, in load
return loader.get_single_data()
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/constructor.py", line 41, in get_single_data
node = self.get_single_node()
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/composer.py", line 64, in compose_node
if self.check_event(AliasEvent):
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/parser.py", line 449, in parse_block_mapping_value
if not self.check_token(KeyToken, ValueToken, BlockEndToken):
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/scanner.py", line 115, in check_token
while self.need_more_tokens():
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/scanner.py", line 152, in need_more_tokens
self.stale_possible_simple_keys()
File "/home/pi/.local/share/virtualenvs/flathunter-main-tulP8G4X/lib/python3.7/site-packages/yaml/scanner.py", line 292, in stale_possible_simple_keys
"could not find expected ':'", self.get_mark())
yaml.scanner.ScannerError: while scanning a simple key
in "config.yaml", line 27, column 1
could not find expected ':'
in "config.yaml", line 28, column 1
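The ScannerError means PyYAML hit a syntax problem, typically a missing ':' or broken indentation, around lines 27–28 of config.yaml. As a sanity check, this is what a well-formed mapping looks like to `safe_load` (the keys below mirror the sample config, but check your own file):

```python
import yaml

# Every key needs a trailing colon, and nesting is expressed by indentation.
good = """
loop:
  active: yes
  sleeping_time: 600
urls:
  - https://www.immobilienscout24.de/Suche/...
"""
config = yaml.safe_load(good)
print(config["loop"]["sleeping_time"])  # -> 600
```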
It would be nice if the 3 PRs that I made counted towards the goal for my shirt (I never got one, and this year the rules are very strict).
So I think it would be nice if the repository added the tag.
The link above shows how.
If you are not familiar with what Hacktoberfest is, you can check out a description here: https://hacktoberfest.digitalocean.com/
Thanks for the consideration!
The immowelt crawler only searches for matching flats in the first 4 entries of the result list.
All further results are loaded asynchronously on immowelt, and the crawler does not load these entries. After loading, they would usually be located in a <div id="listItemWrapperAsync" ... >.
Hi,
first of all, do I understand correctly that I can use the new captcha-solving method on a headless system as well?
I am on a Debian system and downloaded ChromeDriver from https://chromedriver.storage.googleapis.com/index.html?path=87.0.4280.20/, which seems to work:
Starting ChromeDriver 87.0.4280.20 (c99e81631faa0b2a448e658c0dbd8311fb04ddbd-refs/branch-heads/4280@{#355}) on port 9515
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
I also installed Chromium. When I start flathunter now, I get the following error message:
python3 flathunt.py
[2020/10/16 13:24:52|config.py |INFO ]: Using config /home/user/flathunter/config.yaml
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 68, in main
config = Config(config_handle.name)
File "/home/user/flathunter/flathunter/config.py", line 29, in __init__
self.__searchers__ = [CrawlImmobilienscout(self),
File "/home/user/flathunter/flathunter/crawl_immobilienscout.py", line 42, in __init__
self.driver = self.configure_driver(self.driver_executable_path, self.driver_arguments)
File "/home/user/flathunter/flathunter/abstract_crawler.py", line 51, in configure_driver
driver = webdriver.Chrome(executable_path=driver_path, options=chrome_options)
File "/usr/local/lib/python3.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/usr/local/lib/python3.7/dist-packages/selenium/webdriver/common/service.py", line 98, in start
self.assert_process_still_running()
File "/usr/local/lib/python3.7/dist-packages/selenium/webdriver/common/service.py", line 111, in assert_process_still_running
% (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service /usr/bin/chromium unexpectedly exited. Status code was: 1
Any idea what to do here?
Thanks a lot!!
The installation of the dependencies as specified in requirements.txt fails on Ubuntu 18.04.5 LTS as well as in Docker (FROM python:3).
$ pip3 install -r requirements.txt
Requirement already satisfied: astroid==2.4.2 in ./lib/python3.6/site-packages (from -r requirements.txt (line 2)) (2.4.2)
Requirement already satisfied: async-timeout==3.0.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 3)) (3.0.1)
Requirement already satisfied: beautifulsoup4==4.8.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 5)) (4.8.1)
Requirement already satisfied: bs4==0.0.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 6)) (0.0.1)
Requirement already satisfied: CacheControl==0.12.6 in ./lib/python3.6/site-packages (from -r requirements.txt (line 7)) (0.12.6)
Requirement already satisfied: certifi==2019.9.11 in ./lib/python3.6/site-packages (from -r requirements.txt (line 9)) (2019.9.11)
Requirement already satisfied: chardet==3.0.4 in ./lib/python3.6/site-packages (from -r requirements.txt (line 11)) (3.0.4)
Requirement already satisfied: click==7.1.2 in ./lib/python3.6/site-packages (from -r requirements.txt (line 12)) (7.1.2)
Requirement already satisfied: colorama==0.4.3 in ./lib/python3.6/site-packages (from -r requirements.txt (line 14)) (0.4.3)
Requirement already satisfied: decorator==4.4.2 in ./lib/python3.6/site-packages (from -r requirements.txt (line 16)) (4.4.2)
Requirement already satisfied: Flask==1.1.2 in ./lib/python3.6/site-packages (from -r requirements.txt (line 18)) (1.1.2)
Requirement already satisfied: Flask-API==2.0 in ./lib/python3.6/site-packages (from -r requirements.txt (line 19)) (2.0)
Requirement already satisfied: google-auth-httplib2==0.0.4 in ./lib/python3.6/site-packages (from -r requirements.txt (line 23)) (0.0.4)
Requirement already satisfied: googleapis-common-protos==1.52.0 in ./lib/python3.6/site-packages (from -r requirements.txt (line 29)) (1.52.0)
Requirement already satisfied: idna==2.8 in ./lib/python3.6/site-packages (from -r requirements.txt (line 32)) (2.8)
Requirement already satisfied: iniconfig==1.1.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 34)) (1.1.1)
Requirement already satisfied: itsdangerous==1.1.0 in ./lib/python3.6/site-packages (from -r requirements.txt (line 36)) (1.1.0)
Requirement already satisfied: jsonpath-ng==1.5.2 in ./lib/python3.6/site-packages (from -r requirements.txt (line 38)) (1.5.2)
Requirement already satisfied: lazy-object-proxy==1.4.3 in ./lib/python3.6/site-packages (from -r requirements.txt (line 39)) (1.4.3)
Requirement already satisfied: MarkupSafe==1.1.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 41)) (1.1.1)
Requirement already satisfied: mccabe==0.6.1 in ./lib/python3.6/site-packages (from -r requirements.txt (line 42)) (0.6.1)
Requirement already satisfied: more-itertools==8.4.0 in ./lib/python3.6/site-packages (from -r requirements.txt (line 44)) (8.4.0)
Requirement already satisfied: multidict==4.7.6 in ./lib/python3.6/site-packages (from -r requirements.txt (line 46)) (4.7.6)
ERROR: Could not find a version that satisfies the requirement pkg-resources==0.0.0
ERROR: No matching distribution found for pkg-resources==0.0.0
The dependency pkg-resources==0.0.0 is the culprit. It seems this is a known bug that happens when freezing the requirements on Ubuntu, see link.
Removing pkg-resources==0.0.0 from requirements.txt seems to fix the issue.
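A sketch of that workaround; the sample requirements content below is illustrative, and the pin is deleted in place with sed (GNU sed assumed):

```shell
# Create a minimal requirements file containing the broken pin (demo only).
printf 'requests==2.22.0\npkg-resources==0.0.0\n' > requirements.txt
# pkg-resources==0.0.0 is a pip-freeze artifact on Ubuntu/Debian and is not
# installable from PyPI; delete the line before running pip.
sed -i '/^pkg-resources==/d' requirements.txt
cat requirements.txt
```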
Hello,
I have problems with the final step
$ gcloud app deploy cron.yaml
I get the error message:
ERROR: (gcloud.app.deploy) An error occurred while parsing file: [/Users/myname/Desktop/flathunter-main/cron.yaml]
Unexpected attribute 'loop' for object of type CronInfoExternal.
in "/Users/myname/Desktop/flathunter-main/cron.yaml", line 9, column 5
What could be the reason?
Many thanks in advance!
The results for WG Gesucht differ from the entries shown in the browser. The URL structure from Beautiful Soup is also different from the one shown in the browser.
Hi,
I am a beginner in programming. At this point everything is working fine, but is it possible to configure two different bots to send messages? In my example only the second bot is sending the messages. How do I have to arrange the lines for two-bot support? An example would be great.
telegram:
  bot_token: 132xxxxxx:xxxxxxxxxxxxxxxxxxxxxxx
  receiver_ids:
    - 306xxxxx
  bot_token: 122xxxxxxxx-xxxxxxxxxxxxxxxxxxxx
  receiver_ids:
    - 102xxxxx
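One likely cause: a YAML mapping cannot have duplicate keys, so the second `bot_token`/`receiver_ids` pair silently overwrites the first, which would explain why only the second bot sends. As far as I can tell the stock config takes a single bot; a hedged Python sketch of notifying through two bots yourself via the Telegram Bot API (tokens and chat ids are placeholders, function names are mine):

```python
# One sendMessage call per (bot token, receiver id) pair.
BOTS = [
    {"token": "132xxxxxx:xxxxxxxxxxxxxxxxxxxxxxx", "receiver_ids": [306]},
    {"token": "122xxxxxxxx-xxxxxxxxxxxxxxxxxxxx", "receiver_ids": [102]},
]

def message_requests(text):
    """Build one (url, payload) pair per bot and receiver."""
    return [
        (f"https://api.telegram.org/bot{bot['token']}/sendMessage",
         {"chat_id": chat_id, "text": text})
        for bot in BOTS
        for chat_id in bot["receiver_ids"]
    ]

# POST each pair (e.g. with requests.post(url, data=payload)) to send via both bots.
print(len(message_requests("New flat found!")))  # -> 2
```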
I am using the url https://www.immobilienscout24.de/Suche/de/nordrhein-westfalen/koeln/wohnung-mieten?sorting=2
and since a few hours I am just getting a long printout, but no results sent via the Telegram bot anymore. eBay Kleinanzeigen is still working fine.
If I can provide more info, please let me know.
Hi,
this is more a question than an issue. Is there any way to use flathunter for other searches on immobilienscout24.de as well?
I realized that it now works perfectly on "Wohnung mieten" (rent a flat) or "Haus mieten" (rent a house), but it does not work on "Haus kaufen" (buy a house)
(example search string: https://www.immobilienscout24.de/Suche/de/berlin/berlin/haus-kaufen?enteredFrom=one_step_search)
python3 flathunter.py
[2020/06/08 16:27:04|config.py |INFO ]: Using config /scripts/flathunter/config.yaml
[2020/06/08 16:27:04|idmaintainer.py |INFO ]: already processed: 10
Traceback (most recent call last):
  File "flathunter.py", line 91, in <module>
    main()
  File "flathunter.py", line 87, in main
    launch_flat_hunt(config)
  File "flathunter.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/scripts/flathunter/flathunter/hunter.py", line 38, in hunt_flats
    results = searcher.get_results(url)
  File "/scripts/flathunter/flathunter/crawl_immobilienscout.py", line 33, in get_results
    0].text)
ValueError: invalid literal for int() with base 10: '1.228 '
nor on "Grundstück kaufen" (buy a plot)
(search string: https://www.immobilienscout24.de/Suche/de/grundstueck-kaufen?plotarea=-5000.0&price=-10000.0&pricetype=buy&enteredFrom=result_list)
python3 flathunter.py
[2020/06/08 16:28:51|config.py |INFO ]: Using config /scripts/flathunter/config.yaml
[2020/06/08 16:28:51|idmaintainer.py |INFO ]: already processed: 10
Traceback (most recent call last):
  File "flathunter.py", line 91, in <module>
    main()
  File "flathunter.py", line 87, in main
    launch_flat_hunt(config)
  File "flathunter.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/scripts/flathunter/flathunter/hunter.py", line 38, in hunt_flats
    results = searcher.get_results(url)
  File "/scripts/flathunter/flathunter/crawl_immobilienscout.py", line 37, in get_results
    entries = self.extract_data(soup)
  File "/scripts/flathunter/flathunter/crawl_immobilienscout.py", line 101, in extract_data
    entries.append(details)
UnboundLocalError: local variable 'details' referenced before assignment
would there be a quick fix for that, or is this just not planned or wanted?
Thank you so much in advance!
Cheers
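The first traceback above is just a number-parsing failure: the buy-search result count comes back German-formatted ('1.228 ', thousands dot plus trailing space), which `int()` rejects. A tolerant parse is straightforward (helper name is mine, not the project's):

```python
def parse_result_count(text):
    # Strip whitespace and the German thousands separator before converting.
    return int(text.strip().replace(".", ""))

print(parse_result_count("1.228 "))  # -> 1228
```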
Hi,
would you be able to help me setting up flathunter?
Running pytest shows 21 failed, 52 passed.
Thanks!
========================================================================================== short test summary info ===========================================================================================
FAILED test/test_config.py::ConfigTest::test_defaults_fields - AssertionError: 'c:\\users\\xx\\documents\\flathunter\\flathunter-main' != 'C:\\Users\\xx\\Documents\\Flathunter\\flathunter-main'
FAILED test/test_config.py::ConfigTest::test_loads_config - yaml.parser.ParserError: while parsing a block mapping
FAILED test/test_config.py::ConfigTest::test_loads_config_at_file - PermissionError: [Errno 13] Permission denied: 'C:\\Users\\xx\\AppData\\Local\\Temp\\tmpe8hnlt4t'
FAILED test/test_crawl_ebaykleinanzeigen.py::test_process_expose_fetches_details - ValueError: Invalid format string
FAILED test/test_crawl_immobilienscout.py::test_crawl_works - assert 0 > 0
FAILED test/test_crawl_immobilienscout.py::test_process_expose_fetches_details - assert 0 > 0
FAILED test/test_crawl_immowelt.py::test_process_expose_fetches_details - ValueError: Invalid format string
FAILED test/test_crawl_wggesucht.py::WgGesuchtCrawlerTest::test - AssertionError: False is not true : URL should be an apartment link
FAILED test/test_statistics_view.py::test_statistics_view - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_get_index - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_get_index_with_exposes - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_hunt_with_users - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_hunt_via_post - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_multi_user_hunt_via_post - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_hunt_via_post_with_filters - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_render_index_after_login - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_do_not_send_messages_if_notifications_disabled - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_toggle_notification_status - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_update_filters - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_update_filters_not_logged_in - sqlite3.OperationalError: unable to open database file
FAILED test/test_web_interface.py::test_index_logged_in_with_filters - sqlite3.OperationalError: unable to open database file
Since a few days, I am getting the following error message quite often.
[2020/07/29 08:13:51|crawl_ebaykleinanzeigen.py|ERROR ]: Got response (503): b'<!DOCTYPE html><html><head><title>Error Page</title><style type="text/css">html{font-family:\'Helvetica Neue\',Helvetica,Arial,sans-serif;font-size:1em}.center-box{margin: 20% auto auto auto;width: 50%;border: 1px solid #dcdcdc;padding: 1em;}\n</style><title>Security Violation (503)</title></head></head><body>\n<div class="center-box">\n <h3>www.ebay-kleinanzeigen.de | Access denied (403)</h3>\n <h4>Current session has been terminated.</h2>\n <p>For further information, do not hesitate to contact us.</p>\n <p>Ref: <span id="addr">2003:f7:bf40:9c00:48ec:d655:a809:f7e7</span> <span id="time">1596003231</span></p>\n</div></body><script>document.getElementById("time").innerHTML = (new Date()).toISOString()</script>\n</html>\n'
Restarting flathunt.py mostly helps to stop getting these messages.
Any ideas or hints? This time I also pulled the latest code before opening an issue.
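One hedged mitigation for these anti-bot 503s is to back off and retry with growing pauses instead of re-polling at the normal interval. The helper names below are mine, not flathunter's; the sketch assumes the `requests` library is available:

```python
import time

import requests


def backoff_delays(attempts=4, base=60):
    """Exponential schedule in seconds: 60, 120, 240, 480."""
    return [base * (2 ** i) for i in range(attempts)]


def fetch_with_backoff(url):
    # Retry on 503, waiting longer each time; give up after the last attempt.
    for delay in backoff_delays():
        response = requests.get(url, timeout=30)
        if response.status_code != 503:
            return response
        time.sleep(delay)
    return response
```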
I pulled a fresh copy and added the 2Captcha service, but now I get this:
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/home/user/janhuntboi/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/home/user/janhuntboi/flathunter/hunter.py", line 22, in crawl_for_exposes
for searcher in self.config.searchers()
File "/home/user/janhuntboi/flathunter/hunter.py", line 23, in <listcomp>
for url in self.config.get('urls', list())])
File "/home/user/janhuntboi/flathunter/abstract_crawler.py", line 136, in crawl
return self.get_results(url, max_pages)
File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 63, in get_results
return self.get_entries_from_javascript()
File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 105, in get_entries_from_javascript
return [ self.extract_entry_from_javascript(entry) for entry in entry_list ]
File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 105, in <listcomp>
return [ self.extract_entry_from_javascript(entry) for entry in entry_list ]
File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 111, in extract_entry_from_javascript
'image': entry["resultlist.realEstate"]["galleryAttachments"]["attachment"][0]["@xlink.href"] if "galleryAttachments" in entry["resultlist.realEstate"] else "https://www.static-immobilienscout24.de/statpic/placeholder_house/496c95154de31a357afa978cdb7f15f0_placeholder_medium.png",
WebHunter doesn't send messages to users listed in receiver_ids who don't have settings stored in Firestore
Hello,
I'm an absolute Python newbie, so please have mercy.
Python 3.8.4 is installed on my Windows 10 Pro client.
When executing
pipenv install
I get the following errors:
Warning: Python 3.7 was not found on your system... Neither 'pyenv' nor 'asdf' could be found to install Python. You can specify specific versions of Python with: $ pipenv --python path\to\python
Thanks a lot for your help.
Hi people,
I am trying to start a Docker container on an Ubuntu server without a GUI. The program crashes if I enable anticaptcha. Error:
Used image: oyzoursky/python-chromedriver:3.8-selenium
Traceback (most recent call last):
  File "flathunt.py", line 89, in <module>
    main()
  File "flathunt.py", line 68, in main
    config = Config(config_handle.name)
  File "/app/flathunter/config.py", line 29, in __init__
    self.__searchers__ = [CrawlImmobilienscout(self),
  File "/app/flathunter/crawl_immobilienscout.py", line 43, in __init__
    self.driver = self.configure_driver(self.driver_executable_path, self.driver_arguments)
  File "/app/flathunter/abstract_crawler.py", line 51, in configure_driver
    driver = webdriver.Chrome(executable_path=driver_path, options=chrome_options)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Used image: oyzoursky/python-chromedriver:3.8:
Traceback (most recent call last):
  File "flathunt.py", line 89, in <module>
    main()
  File "flathunt.py", line 68, in main
    config = Config(config_handle.name)
  File "/app/flathunter/config.py", line 29, in __init__
    self.__searchers__ = [CrawlImmobilienscout(self),
  File "/app/flathunter/crawl_immobilienscout.py", line 43, in __init__
    self.driver = self.configure_driver(self.driver_executable_path, self.driver_arguments)
  File "/app/flathunter/abstract_crawler.py", line 51, in configure_driver
    driver = webdriver.Chrome(executable_path=driver_path, options=chrome_options)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Can anyone help me?
I don't know if I'm missing something here, but essentially the test test_crawl_wggesucht.py fails on this assertion:
self.assertTrue(entries[0]['url'].startswith("https://www.wg-gesucht.de/wohnungen"), u"URL should be an apartment link")
and the reason it fails is that the URLs the crawler generates contain a double slash after the domain, e.g.:
'https://www.wg-gesucht.de//wohnungen-in-Berlin-Friedrichshain.8598343.html'
Apparently when the URL is created (code below), the href in the a element is assumed to come without a leading slash, but it does in fact come with one. I guess the wg-gesucht site changed this at some point?
base_url = 'https://www.wg-gesucht.de/'
for row in existing_findings:
    title_row = row.find('h3', {"class": "truncate_title"})
    title = title_row.text.strip()
    url = base_url + title_row.find('a')['href']
The functionality still works because the URL with the double slash works just the same, but the tests fail.
If this is confirmed to be an issue, I'd be happy to provide a PR to fix it.
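If it helps, `urllib.parse.urljoin` handles both href conventions (with or without a leading slash), so the crawler wouldn't have to depend on which form the site serves:

```python
from urllib.parse import urljoin

base_url = 'https://www.wg-gesucht.de/'

# urljoin normalizes either form of the href to the same absolute URL,
# avoiding the double slash produced by plain string concatenation.
with_slash = urljoin(base_url, '/wohnungen-in-Berlin-Friedrichshain.8598343.html')
without_slash = urljoin(base_url, 'wohnungen-in-Berlin-Friedrichshain.8598343.html')
print(with_slash)  # -> https://www.wg-gesucht.de/wohnungen-in-Berlin-Friedrichshain.8598343.html
```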
Hi,
last year I was using an old version of flathunter. The hunter.py looked quite different back then. At that time I changed this file a little to fire my own shell script for every single id flathunter found. It looked like this:
...............
for expose in results:
    # check if already processed
    if expose['id'] in processed:
        continue
    # fire the apartment mailer
    ident = expose['id']
    subprocess.call(['/home/pi/Desktop/flathunter/flathunter/mailer.sh', str(ident)])
    self.__log__.info('New offer: ' + expose['title'])
    # to reduce traffic, some addresses need to be loaded on demand
...............
How could I integrate this now?
Thanks a lot!!!
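In the current code the results flow through a processor chain (visible in the hunter.py tracebacks elsewhere on this tracker), so a small custom processor seems like the natural home for such a shell hook. A hedged sketch; class and method names are illustrative, not the project's actual API:

```python
import subprocess


class ShellHookProcessor:
    """Run an external script once per expose id, then pass the expose on."""

    def __init__(self, script_path):
        self.script_path = script_path

    def process_exposes(self, exposes):
        for expose in exposes:
            # Fire the shell script with the expose id as its only argument.
            subprocess.call([self.script_path, str(expose['id'])])
            yield expose
```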
Immoscout stopped working. The crawler gets the following message:
Hello, I got the program running and after the first run I also received all the available flats via Telegram. But now it seems that the program isn't looping, so it only works every time I start it manually. Is there a way to check if it's running properly? I don't get any updates in the console, also after waiting 5 minutes, which is the looptime set in the config file.
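For reference, a minimal sketch of the behaviour one would expect with looping enabled (function and parameter names below are mine, not flathunter's): each pass should at least produce output, so complete console silence for longer than the configured sleeping_time suggests the loop is not active in the config.

```python
import time


def run_hunt_loop(hunt, sleeping_time=300, max_passes=None):
    """Re-run `hunt` every `sleeping_time` seconds, printing each pass."""
    passes = 0
    while max_passes is None or passes < max_passes:
        hunt()
        passes += 1
        print(f"pass {passes} complete, sleeping {sleeping_time}s")
        time.sleep(sleeping_time)
    return passes
```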
The app is working great - just 1 small onboarding issue that confused me at first.
Platform: iOS latest version.
If you click the initial flathunter.codders.io link, it opens an in-app browser which does not save the login credentials, resulting in repeated requests to login.
Login only works for me when I click "Open in Safari", then "Log in with Telegram".
I get this message when using the chromedriver in conjunction with the 2captcha service. I even set the sleeping_time to 650.
Got response (418): b'<!DOCTYPE html>\n<html>\n <head>\n <title>418 You look like a robot (1). If you think you are not, contact us: [email protected]\\n \\n54.155.47.221</title>\n </head>\n <body>\n <h1>Error 418 You look like a robot (1). If you think you are not, contact us: [email protected]\\n \\n54.155.47.221</h1>\n <p>You look like a robot (1). If you think you are not, contact us: [email protected]\\n \\n54.155.47.221</p>\n <h3>Guru Meditation:</h3>\n <p>XID: 253749141</p>\n <hr>\n <p>Varnish cache server</p>\n </body>\n</html>\n'
Hi,
since flathunter does not work for Immoscout (#45), I tried to let it hunt for flats on eBay Kleinanzeigen, but realized that filters are not recognized.
Is this how it should be? Any other way to filter out offers with certain words?
Cheers and thanks!!!
Unfortunately I have a problem getting Flathunter up and running. When I run pytest I get the following error message:
======================================================================== ERRORS =========================================================================
_________________________________________________________ ERROR collecting test/test_config.py __________________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_config.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_config.py:5: in <module>
from flathunter.config import Config
E ImportError: No module named flathunter.config
_________________________________________________ ERROR collecting test/test_crawl_ebaykleinanzeigen.py _________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_crawl_ebaykleinanzeigen.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_crawl_ebaykleinanzeigen.py:2: in <module>
from flathunter.crawl_ebaykleinanzeigen import CrawlEbayKleinanzeigen
E ImportError: No module named flathunter.crawl_ebaykleinanzeigen
__________________________________________________ ERROR collecting test/test_crawl_immobilienscout.py __________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_crawl_immobilienscout.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_crawl_immobilienscout.py:3: in <module>
from flathunter.crawl_immobilienscout import CrawlImmobilienscout
E ImportError: No module named flathunter.crawl_immobilienscout
_____________________________________________________ ERROR collecting test/test_crawl_immowelt.py ______________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_crawl_immowelt.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_crawl_immowelt.py:3: in <module>
from flathunter.crawl_immowelt import CrawlImmowelt
E ImportError: No module named flathunter.crawl_immowelt
_____________________________________________________ ERROR collecting test/test_crawl_wggesucht.py _____________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_crawl_wggesucht.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_crawl_wggesucht.py:3: in <module>
from flathunter.crawl_wggesucht import CrawlWgGesucht
E ImportError: No module named flathunter.crawl_wggesucht
________________________________________________ ERROR collecting test/test_gmaps_duration_processor.py _________________________________________________
/usr/lib/python2.7/dist-packages/pytest/python.py:450: in _importtestmodule
mod = self.fspath.pyimport(ensuresyspath=importmode)
/usr/lib/python2.7/dist-packages/py/_path/local.py:701: in pyimport
__import__(modname)
E File "/home/planetdyna/test/test_gmaps_duration_processor.py", line 28
E SyntaxError: Non-ASCII character '\xd0' in file /home/planetdyna/test/test_gmaps_duration_processor.py on line 29, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
________________________________________________ ERROR collecting test/test_googlecloud_idmaintainer.py _________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_googlecloud_idmaintainer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_googlecloud_idmaintainer.py:4: in <module>
from mockfirestore import MockFirestore
E ImportError: No module named mockfirestore
_________________________________________________________ ERROR collecting test/test_hunter.py __________________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_hunter.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_hunter.py:2: in <module>
import yaml
E ImportError: No module named yaml
______________________________________________________ ERROR collecting test/test_id_maintainer.py ______________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_id_maintainer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_id_maintainer.py:5: in <module>
from flathunter.idmaintainer import IdMaintainer
E ImportError: No module named flathunter.idmaintainer
________________________________________________________ ERROR collecting test/test_processor.py ________________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_processor.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_processor.py:2: in <module>
import yaml
E ImportError: No module named yaml
_____________________________________________________ ERROR collecting test/test_sender_telegram.py _____________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_sender_telegram.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_sender_telegram.py:1: in <module>
import requests_mock
E ImportError: No module named requests_mock
_____________________________________________________ ERROR collecting test/test_statistics_view.py _____________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_statistics_view.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_statistics_view.py:3: in <module>
import yaml
E ImportError: No module named yaml
______________________________________________________ ERROR collecting test/test_web_interface.py ______________________________________________________
ImportError while importing test module '/home/planetdyna/test/test_web_interface.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python2.7/dist-packages/six.py:709: in exec_
exec("""exec code in globs, locs""")
test/test_web_interface.py:3: in <module>
import yaml
E ImportError: No module named yaml
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 13 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Hello,
I managed to set everything up and run the file. I had some difficulties because I had never used Python or the console before, but I managed to install it. Now, after running the flathunt.py file, I get this error message:
File "/.../flathunter-main/flathunter/crawl_immobilienscout.py", line 45, in get_results
while len(entries) < min(no_of_results, self.RESULT_LIMIT) and \
UnboundLocalError: local variable 'no_of_results' referenced before assignment
Does this mean it can't find any flats under my link that I've added to the config? How can I fix this error and keep the code running?
Edit:
I added an Immowelt and an Immonet link and removed the ImmoScout one, and now it runs perfectly fine. So the error has to be inside the ImmoScout crawler, or my link isn't valid?
Also, how can I see whether the program crawls every 5 minutes? I have it running in the console, but after 5 minutes nothing new pops up. Is that normal when the program doesn't find any new flats?
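A hedged sketch of what a fix inside `get_results` might look like: the `UnboundLocalError` suggests `no_of_results` is only assigned when the result counter can be parsed, so initialising it up front avoids the crash. The regex and German label below are hypothetical stand-ins for the real scraping logic:

```python
import re

def count_results(page_text):
    """Return the advertised number of results from the raw page text,
    or 0 when the page (e.g. a captcha wall) has no result counter.
    The regex is a hypothetical stand-in for the real scraping code."""
    no_of_results = 0  # initialised up front, so it is never unbound
    match = re.search(r"([\d.]+)\s+Ergebnisse", page_text)
    if match:
        no_of_results = int(match.group(1).replace(".", ""))
    return no_of_results
```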
Hi!
I experience timeouts right after the captcha solving, but it is not a steady problem, i.e. when I restart the script it runs through. My 2captcha setup is working (most of the time, as you can see).
Here is a typical log:
[2021/03/16 15:50:07|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2021/03/16 15:50:12|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2021/03/16 15:50:12|abstract_crawler.py|DEBUG ]: Captcha promise: OK|03AGdBq24s86a4LSPuU8CJCT1gF-JzbFcRDkyXAtQwSgzP6QK9Yv9z3iA0mlzfBdIh5hv0H1t7s-PJnBDD0yFgaXJVaYr1dba57vCu_66yXiQ6gzeRtIcQwJYKgctw9_8Y9d7ThbShmLlG7v6Y5qWTmELSyX0QiDNInqIDNwM5DXCNmzLTw1lrVoENlgXoKerJbJhO0Gy1aZdO6-gV-nD_wqPpGI5NDKGnKcMXdajE4L6FxJILEnyXY77HAnI05MRbbI-dLIFEUAKKenWovdyMLgjDIbb83dZhoEB8iFyEDmDhV07Zea2CS7MEvLAXT9B-0s9D3mmR0pfZbhQ9bF_KEh43k83kxBLZ1_jjhebf2lECm6LfWKKv1MCPSsObwNsrhtt2ivCxKJqRaoqjXlDHxkRyyRh0p6oyvjiX8tlx0Iynse7oX2w2FvEqX9htv4F_M06EPkweyoXGx3-rEHPyA3IqhTCQXbow6ChGvaF1Y9S4Ze1AwHuveto
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 50, in launch_flat_hunt
hunter.hunt_flats()
File "/home/m/flathunter/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/home/m/flathunter/flathunter/hunter.py", line 22, in crawl_for_exposes
for searcher in self.config.searchers()
File "/home/m/flathunter/flathunter/hunter.py", line 23, in <listcomp>
for url in self.config.get('urls', list())])
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 136, in crawl
return self.get_results(url, max_pages)
File "/home/m/flathunter/flathunter/crawl_immobilienscout.py", line 60, in get_results
soup = self.get_page(search_url, self.driver, page_no)
File "/home/m/flathunter/flathunter/crawl_immobilienscout.py", line 120, in get_page
return self.get_soup_from_url(search_url.format(page_no), driver=driver, captcha_api_key=self.captcha_api_key, checkbox=self.checkbox, afterlogin_string=self.afterlogin_string)
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 75, in get_soup_from_url
self.resolvecaptcha(driver, checkbox, afterlogin_string, captcha_api_key)
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 153, in resolvecaptcha
self._solve(driver, api_key)
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 181, in _solve
self._check_if_iframe_not_visible(driver)
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 216, in _check_if_iframe_not_visible
(By.CSS_SELECTOR, "iframe[src^='https://www.google.com/recaptcha/api2/anchor?']")))
File "/home/m/.pyenv/versions/venv_flathunter/lib/python3.6/site-packages/selenium/webdriver/support/wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
here are two examples with more context:
I am just guessing here: maybe after 2 minutes without an answer from 2captcha, the script raises the TimeoutException?
If this is a persistent problem with 2captcha (time to answer > time Selenium holds the connection open), it would be nice if the script didn't break when this happens.
Thanks for looking into it!
Marvin
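One way the script could survive this is by retrying the captcha solve instead of letting the `TimeoutException` propagate. A hedged, self-contained sketch (the exception class here is a stand-in for `selenium.common.exceptions.TimeoutException`; all names are hypothetical, not flathunter's actual API):

```python
import time

class CaptchaTimeout(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""

def solve_with_retry(solve, attempts=3, delay=5, timeout_exc=CaptchaTimeout):
    """Call solve() up to `attempts` times, retrying on timeouts so one
    slow 2captcha answer does not kill the whole hunting loop."""
    for attempt in range(1, attempts + 1):
        try:
            return solve()
        except timeout_exc:
            if attempt == attempts:
                raise  # give up only after the last attempt
            time.sleep(delay)
```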
Sometimes there are entries without images in the result list for immoscout24. AFAIK there is no URL config to exclude those.
File "..\flathunter\crawl_immobilienscout.py", line 107, in extract_entry_from_javascript
'image': entry["resultlist.realEstate"]["galleryAttachments"]["attachment"][0]["@xlink.href"],
KeyError: 'galleryAttachments'
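A defensive lookup along these lines would avoid the `KeyError`; the nested key names mirror the traceback above, but the helper itself is a hypothetical sketch, not the crawler's actual code:

```python
def extract_image(entry):
    """Return the first gallery image URL of an expose, or None when
    the listing carries no images at all."""
    attachments = (entry.get("resultlist.realEstate", {})
                        .get("galleryAttachments", {})
                        .get("attachment", []))
    if attachments:
        return attachments[0].get("@xlink.href")
    return None
```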
Hi!
I get an error when crawling Kleinanzeigen after days of it working without problems. Did they change something? Here is the error message:
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/home/m/flathunter/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/home/m/flathunter/flathunter/hunter.py", line 22, in crawl_for_exposes
for searcher in self.config.searchers()
File "/home/m/flathunter/flathunter/hunter.py", line 23, in <listcomp>
for url in self.config.get('urls', list())])
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 136, in crawl
return self.get_results(url, max_pages)
File "/home/m/flathunter/flathunter/abstract_crawler.py", line 127, in get_results
entries = self.extract_data(soup)
File "/home/m/flathunter/flathunter/crawl_ebaykleinanzeigen.py", line 72, in extract_data
image = image_element["data-imgsrc"]
File "/home/m/.pyenv/versions/venv_flathunter/lib/python3.6/site-packages/bs4/element.py", line 992, in __getitem__
return self.attrs[key]
KeyError: 'data-imgsrc'
that is my search-string:
https://www.ebay-kleinanzeigen.de/s-wohnung-mieten/mitte/anzeige:angebote/preis::800/c203l3518r5+wohnung_mieten.zimmer_d:2
Thanks for looking into it!
Marvin
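BeautifulSoup `Tag` objects support dict-style `.get()`, so a hedged one-line fix for line 72 of `crawl_ebaykleinanzeigen.py` could be `image = image_element.get("data-imgsrc")`. Wrapped up as a helper (illustrated with plain dicts, which share the `.get` interface with bs4 Tags):

```python
def safe_attr(element, name, default=None):
    """Fetch attribute `name` from a bs4 Tag (or any mapping),
    returning `default` instead of raising KeyError when an ad
    comes without an image."""
    if element is None:
        return default
    return element.get(name, default)
```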
Hi,
I keep getting an IndexError whenever I try to crawl wg-gesucht. The message is:
flathunter\flathunter\crawl_wggesucht.py", line 40, in extract_data
rooms = re.findall(r'\d Zimmer', details_array[0])[0][:1]
IndexError: list index out of range
The problem seems to be that it cannot handle an empty list, i.e. there is no match for a number followed by "Zimmer". I did a quick fix, and it seems to be working for me now:
rooms_tmp = re.findall(r'\d Zimmer', details_array[0])
if not rooms_tmp:
    rooms = 0
else:
    rooms = rooms_tmp[0][:1]
Hello,
for the past hour I have been getting the following error message:
[2020/07/09 15:42:01|config.py |INFO ]: Using config /home/choeffer/Dokumente/flathunter/config.yaml
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/home/choeffer/Dokumente/flathunter/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/home/choeffer/Dokumente/flathunter/flathunter/hunter.py", line 21, in crawl_for_exposes
return chain(*[searcher.crawl(url, max_pages)
File "/home/choeffer/Dokumente/flathunter/flathunter/hunter.py", line 21, in <listcomp>
return chain(*[searcher.crawl(url, max_pages)
File "/home/choeffer/Dokumente/flathunter/flathunter/abstract_crawler.py", line 48, in crawl
return self.get_results(url, max_pages)
File "/home/choeffer/Dokumente/flathunter/flathunter/abstract_crawler.py", line 39, in get_results
entries = self.extract_data(soup)
File "/home/choeffer/Dokumente/flathunter/flathunter/crawl_ebaykleinanzeigen.py", line 81, in extract_data
rooms = re.match(r'(\d+)', tags[1].text)[1]
TypeError: 'NoneType' object is not subscriptable
If you need more information for debugging the issue or additional logs, please let me know.
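`re.match` returns `None` when the pattern does not match, and indexing `None` raises exactly this `TypeError`. A hedged sketch of a guard for line 81 (the helper name and the default value are my assumptions, not the project's code):

```python
import re

def extract_rooms(text, default="0"):
    """Return the leading digits of `text`, or `default` when the tag
    text does not start with a number (re.match then returns None)."""
    match = re.match(r"(\d+)", text)
    return match[1] if match else default
```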
This is good for debugging and for making sure the program is still running, which otherwise cannot be known without checking it manually. That can be tedious when the program is running on a server and is supposed to be a set-and-forget kind of thing.
So a feature where an "I'm still alive" message is sent regularly (maybe once a day or week; this should be configurable) might make sense.
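A hedged sketch of how such a heartbeat could be bolted onto the hunting loop. All names here are hypothetical (`hunt` would be the existing crawl step, `notify` the existing Telegram sender); with a 5-minute interval, 288 cycles is roughly one heartbeat per day:

```python
import time

def run_with_heartbeat(hunt, notify, cycles, interval=300, heartbeat_every=288):
    """Run hunt() every `interval` seconds for `cycles` iterations and
    send an "I'm still alive" message every `heartbeat_every` cycles."""
    for i in range(1, cycles + 1):
        hunt()
        if i % heartbeat_every == 0:
            notify("flathunter is still alive")
        time.sleep(interval)
```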
Hi,
I've tried to install the package and received the following error:
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/pipenv", line 8, in <module>
sys.exit(cli())
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/decorators.py", line 73, in new_func
return ctx.invoke(f, obj, *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/vendor/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/cli/command.py", line 233, in install
retcode = do_install(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/core.py", line 2052, in do_install
do_init(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/core.py", line 1304, in do_init
do_install_dependencies(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/core.py", line 899, in do_install_dependencies
batch_install(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/core.py", line 796, in batch_install
_cleanup_procs(procs, failed_deps_queue, retry=retry)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pipenv/core.py", line 703, in _cleanup_procs
raise exceptions.InstallError(c.dep.name, extra=err_lines)
pipenv.exceptions.InstallError: ERROR: Couldn't install package: grpcio
Package installation failed...
✘ ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 0/1 - 01:23:46
Some details about my system:
- Ubuntu 20.04
- Python 3.8.5
I already tried pip3 install --upgrade setuptools, without effect.
Surprisingly, a plain pip3 install grpcio ran through without an error.
Any idea how to solve this issue?
Thanks!
Alex
Hi,
Sorry for the trivial question, but I cannot get chromedriver to start in Google Cloud. My latest guess for config.yaml was "driver_path: /usr/local/bin/chromedriver", because this is the path mentioned in the joyzoursky Dockerfile.
On Google Cloud I get these messages:
As I'm not at all familiar with Google Cloud, I'm not sure how to find out where the executable is actually located.
Cheers
Martin
Hi there! First of all, thanks for the great tool, which runs perfectly fine despite the error. For immoscout24 I get a 405 error every time (see below), but flathunter continues to run without problems. Currently it runs on a cloud server from Hetzner. When I test it at home with identical settings, I get no error message. Any idea how to solve the HTTP error?
[2020/10/30 07:32:32|abstract_crawler.py|ERROR ]: Got response (405): b'<!DOCTYPE html>\n<html>\n<head>\n <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>\n <meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1" />\n <meta http-equiv="X-UA-Compatible" content="IE=edge" />\n <meta name="robots" content="noindex, nofollow">\n <meta http-equiv="cache-control" content="no-cache, no-store, must-revalidate">\n <meta http-equiv="pragma" content="no-cache">\n <meta http-equiv="expires" content="0">\n <title>Ich bin kein Roboter - ImmobilienScout24</title>\n <link rel="icon" type="image/vnd.microsoft.icon" href="https://www.immobilienscout24.de/favicon.ico"/>\n <link rel="shortcut icon" type="image/vnd.microsoft.icon" href="https://www.immobilienscout24.de/favicon.ico"/>\n <style>\n @font-face {\n font-family: "Make It Sans IS24 Web";\n font-style: normal;\n font-weight: 400;\n font-display: swap;\n src: url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Regular.woff2") format("woff2"), url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Regular.woff") format("woff");\n }\n @font-face {\n font-family: "Make It Sans IS24 Web";\n font-style: normal;\n font-weight: 700;\n font-display: swap;\n src: url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Bold.woff2") format("woff2"), url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Bold.woff") format("woff");\n }\n\n @font-face {\n font-family: \'IS24Icons\';\n src: url(\'https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/is24-icons/is24-icons.woff\') format(\'woff\');\n font-weight: normal;\n font-style: normal;\n }\n\n a, abbr, address, article, aside, audio, b, blockquote, body, canvas, caption, cite, code, dd, del, details, dfn, div, dl, dt, 
em, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, header, html, i, iframe, img, input, ins, kbd, label, legend, li, main, mark, menu, nav, object, ol, p, pre, q, samp, section, select, small, span, strong, sub, summary, sup, table, tbody, td, textarea, tfoot, th, thead, time, tr, ul, var, video {\n -ms-box-sizing: border-box;\n -o-box-sizing: border-box;\n box-sizing: border-box;\n margin: 0;\n padding: 0;\n border: 0;\n outline: 0;\n }\n\n html {\n font-size: 62.5%;\n }\n body {\n background-color: #fff;\n color: #333;\n font-size: 1.4em;\n line-height: 1.61;\n font-family: "Make It Sans IS24 Web",Verdana,"DejaVu Sans",Arial,Helvetica,sans-serif;\n }\n .page-wrapper {\n margin-left: auto;\n margin-right: auto;\n max-width: 1170px;\n background-color: #fff;\n }\n .grid {\n display: block;\n margin-right: 0;\n }\n .grid:after {\n display: table;\n clear: both;\n content: "";\n }\n .grid-item {\n display: block;\n float: left;\n vertical-align: top;\n text-align: left;\n }\n .header {\n border-bottom: 1px solid #e0e0e0;\n }\n .header .grid {\n padding-left: 70px;\n padding-right: 70px;\n padding-top: 14px;\n padding-bottom: 14px;\n }\n .header .logo {\n width: 50%;\n float: left;\n }\n .header .logo img {\n vertical-align: top;\n }\n .header .login-button {\n width: 50%;\n text-align: right;\n float: left;\n }\n .header .login-button a {\n padding-top: .35714286em;\n padding-bottom: .35714286em;\n min-width: 9.42857143em;\n font-family: "Make It Sans IS24 Web",Verdana,"DejaVu Sans",Arial,Helvetica,sans-serif;\n border-radius: 8px;\n background-color: #fff;\n display: inline-block;\n border: 1px solid #333333;\n padding: .64285714em 1.64285714em;\n font-weight: 600;\n font-size: 1.4rem;\n text-align: center;\n letter-spacing: .2px;\n line-height: 1.42857143em;\n white-space: nowrap;\n cursor: pointer;\n color: #333333;\n }\n .header .login-button a:link, .header .login-button a:visited, .header .login-button a:focus, .header .login-button a:hover 
{\n text-decoration: none;\n color: #333333;\n }\n .header .login-button a:hover {\n background-color: #eaeaea;\n }\n .main {\n clear: both;\n padding-top: 55px;\n max-width: 583px;\n margin-left: auto;\n margin-right: auto;\n text-align: center;\n }\n .main .headline {\n font-size: 4.0rem;\n font-weight: bold;\n letter-spacing: 0px;\n line-height: 4.8rem;\n text-align: center;\n }\n .main .main__logo {\n padding-top: 10px;\n text-align: center;\n }\n .main .main__logo img {\n height: 240px;\n width: 240px;\n vertical-align: top;\n }\n .main .main__part1 {\n padding-top: 11px;\n font-size: 1.4rem;\n font-weight: bold;\n letter-spacing: 0px;\n line-height: 20px;\n }\n .main .main__captcha {\n padding-top: 36px;\n padding-bottom: 36px;\n }\n .main .main_part2_header1 {\n font-weight: bold;\n }\n .main .main_part2_header2 {\n font-weight: bold;\n padding-top: 16px;\n }\n .main .main__list {\n padding-top: 14px;\n padding-bottom: 42px;\n }\n .main .main__list ul li {\n list-style-position: inside;\n }\n .footer {\n background: #f2f2f2;\n text-align: center;\n }\n .footer .footer-content {\n max-width: 583px;\n margin-left: auto;\n margin-right: auto;\n padding-top: 15px;\n padding-bottom: 6px;\n color: #757575;\n font-size: 1.2rem;\n line-height: 1.6rem;\n }\n .footer .footer-content div {\n padding-top: 20px;\n }\n .footer .footer-content div:first-child {\n padding-top: 0;\n }\n .footer .footer-content a, .footer .footer-content a:visited, .footer .footer-content a:link, .footer .footer-content a:focus, .footer .footer-content .legend {\n color: #757575;\n font-size: 1.2rem;\n line-height: 1.6rem;\n text-decoration: none;\n }\n .footer .footer-content a:hover {\n color: #757575;\n font-size: 1.2rem;\n line-height: 1.6rem;\n text-decoration: underline;\n }\n\n .g-recaptcha {\n display: inline-block;\n }\n\n @media (max-width: 668px) {\n .palm-hide {\n display: none;\n }\n .header .grid {\n padding-left: 16px;\n padding-right: 16px;\n padding-top: 8px;\n 
padding-bottom: 8px;\n }\n .main {\n padding-top: 32px;\n padding-left: 16px;\n padding-right: 16px;\n }\n .main .headline {\n font-size: 3.2rem;\n font-weight: normal;\n line-height: 4.0rem;\n }\n .main .main__logo img {\n height: 188px;\n width: 188px;\n }\n .footer .footer-content {\n padding-bottom: 32px;\n }\n\n }\n </style>\n\n <script>\n function showBlockPage() {\n console.log("showing block page");\n }\n setTimeout(showBlockPage, 10000);\n </script>\n <script type="text/javascript" src="/assets/immo-1-17" async defer></script>\n <script>window.captchaDescription = \'<p>Nachdem du das unten stehende CAPTCHA best\xc3\xa4tigt hast, wirst du sofort auf die von dir angefragte Seite weitergeleitet.</p>\';</script>\n <script src=\'https://www.google.com/recaptcha/api.js?hl=de\'></script>\n \n <script src="https://www.google.com/recaptcha/api.js" async defer></script>\n <script>\n function solvedCaptcha(payload) {\n const timeoutMs = 10000;\n protectionSubmitCaptcha("recaptcha", payload, timeoutMs, "3:KgR7QA9Zb+DPvlNK5NS0rQ==:Qc1ZWjV3jT+q6LyOv1htA/nmUoIWkcqqc41XxsIy6OWxHPb2t8XycRcMDV/0FGR3ax4IVPrl5qRmqm2RA8aHIuRNhZL1E6PJAkbg5IFVVBbtYVxxo59nosGtEY01RrnSuhs5hD0STKKPbDzntLLh60R0W7+6AzIUSQFKehVnUHiERpphMCXrg74Hg6N6sY75I4ZtEHJEhBRgO36V5uCHOQ==:q3Gl4XIOmWNJ6zYAqLlwwZDHJSgNwu0MGvGtik7zNvo=").then(function() {\n window.location.reload(true);\n });\n }\n </script>\n \n</head>\n<body>\n\n<div class="header">\n <div class="page-wrapper">\n <div class="grid">\n <div class="logo grid-item">\n <a href="https://www.immobilienscout24.de/">\n <img src="https://www.static-immobilienscout24.de/fro/imperva/0.0.1/is24-logo.svg" alt="ImmoScout24 Logo">\n </a>\n </div>\n <div class="login-button grid-item">\n <a href="https://www.immobilienscout24.de/geschlossenerbereich/start.html?source=meinkontodropdown-login">\n Anmelden <span class="palm-hide">/ Registrieren</span>\n </a>\n </div>\n </div>\n </div>\n</div>\n\n<div class="page-wrapper">\n\n<div class="main">\n <div 
class="headline">\n Ich bin kein Roboter\n </div>\n <div class="main__logo">\n <img src="https://www.static-immobilienscout24.de/fro/imperva/0.0.1/robot-logo.svg" alt="Roboter Logo">\n </div>\n<div class="main__part1">\n Du bist ein Mensch aus Fleisch und Blut? Entschuldige bitte, dann hat unser System dich f\xc3\xa4lschlicherweise als Roboter identifiziert. Um unsere Services weiterhin zu nutzen, l\xc3\xb6se bitte diesen kurzen Test.\n</div>\n\n <div class="main__captcha">\n \n <div class="container">\n \n <script>\n showBlockPage()\n document.writeln(window.captchaDescription || "<p>After completing the CAPTCHA below, you will immediately regain access to the site again.</p>");\n </script>\n <div class="g-recaptcha" data-sitekey="6LeaILIZAAAAALTgLZV1AQXPc2dAsLItNYJ8jVvB" data-callback="solvedCaptcha"></div>\n </div>\n </div>\n\n<div class="main__part2">\n\n <div class="main_part2_header1">Warum f\xc3\xbchren wir diese Sicherheitsma\xc3\x9fnahme durch?</div>\n<div class="main_part2_text1">Mit der Captcha-Methode stellen wir fest, dass du kein Roboter oder eine sch\xc3\xa4dliche Spam-Software bist. Damit sch\xc3\xbctzen wir unsere Webseite und die Daten unserer Nutzerinnen und Nutzer vor betr\xc3\xbcgerischen Aktivit\xc3\xa4ten.</div>\n\n <div class="main_part2_header2">Warum haben wir deine Suchanfragen blockiert?</div>\n <div class="main_part2_text2">Es kann verschiedene Gr\xc3\xbcnde haben, warum wir dich f\xc3\xa4lschlicherweise als Roboter identifiziert haben. 
M\xc3\xb6glicherweise</div>\n\n</div>\n<div class="main__list">\n<ul>\n <li>hast du die Cookies f\xc3\xbcr unsere Seite deaktiviert.</li>\n <li>hast du die Ausf\xc3\xbchrung von JavaScript deaktiviert.</li>\n <li>nutzt du ein Browser-Plugin eines Drittanbieters, beispielsweise einen Ad-Blocker.</li>\n<li>hast du in kurzer Zeit mehr Anfragen an unser System gestellt, als es \xc3\xbcblicherweise der Fall ist.</li>\n</ul>\n</div>\n\n\n</div>\n\n</div>\n\n<div class="footer">\n <div class="footer-content">\n\n\n <div>\n <a href="https://www.immobilienscout24.de/unternehmen.html">\xc3\x9cber uns</a> |\n <a href="https://www.immobilienscout24.de/kontakt.html">Kontakt & Hilfe</a> |\n <a href="https://www.immobilienscout24.de/unternehmen/karriere/">Karriere</a> |\n <a href="https://www.immobilienscout24.de/sitemap.html">Sitemap</a> |\n <a href="https://api.immobilienscout24.de">Developer</a> |\n <a href="https://www.immobilienscout24.de/unternehmen/mediendienst.html">Presseservice</a> |\n <a href="https://www.immobilienscout24.de/ratgeber/newsletter.html">Newsletter abonnieren</a> |\n <a href="https://www.immobilienscout24.de/impressum.html">Impressum</a> |\n <a href="https://www.immobilienscout24.de/agb.html">AGB\'s & Rechtliche Hinweise</a> |\n <a href="https://www.immobilienscout24.de/agb/verbraucherinformationen.html">Verbraucherinformationen</a> |\n <a href="https://www.immobilienscout24.de/agb/datenschutz.html">Datenschutz</a> |\n <a href="https://www.immobilienscout24.de/lp/Geodatenkodex.html">Datenschutz Kodex f\xc3\xbcr Geodatendienste</a> |\n <a href="https://sicherheit.immobilienscout24.de">Sicherheit</a>\n </div>\n <div>\n <!--<a href="">Immobiliensuche</a> | -->\n <a href="https://www.scout24media.com/">Werbung</a> |\n <a href="https://blog.immobilienscout24.de">Blog</a>\n <!--|\n <a href="">Nachbarschaft</a> |\n <a href="">Gratis! 
E-Mail-Adresse @t-online.de</a>-->\n </div>\n <div>\n <a href="https://www.immobilienscout24.de/">www.ImmobilienScout24.de</a>\n </div>\n <div class="legend">\n \xc2\xa9 Copyright 1999 - 2020 Immobilien Scout GmbH\n </div>\n </div>\n\n</div>\n\n</body>\n</html>\n'
Hi, first of all: thanks for writing this awesome bot!!
It was running yesterday, but now it seems to crash with the following error on this URL:
https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten?numberofrooms=4.0-&price=-1500.0&livingspace=100.0-&geocodes=110000000911,110000000801,110000000703,110000000605,110000000704,110000000906,110000000907,110000001102,110000000201,110000000202,110000000301,110000000302,110000000601,110000000910,110000000701&pagenumber={0}
error log
[2020/10/19 16:34:48|config.py |INFO ]: Using config /Users/user/flathunter/config.yaml
[2020/10/19 16:34:50|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x108c51f10>
[2020/10/19 16:34:50|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten?numberofrooms=4.0-&price=-1500.0&livingspace=100.0-&geocodes=110000000911,110000000801,110000000703,110000000605,110000000704,110000000906,110000000907,110000001102,110000000201,110000000202,110000000301,110000000302,110000000601,110000000910,110000000701&pagenumber={0}
[2020/10/19 16:34:51|abstract_crawler.py|DEBUG ]: Google site key: <re.Match object; span=(49, 93), match='&k=6LeaILIZAAAAALTgLZV1AQXPc2dAsLItNYJ8jVvB&'>
[2020/10/19 16:34:56|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/19 16:35:01|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/19 16:35:06|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/19 16:35:12|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/19 16:35:17|abstract_crawler.py|DEBUG ]: Captcha status: CAPCHA_NOT_READY
[2020/10/19 16:35:17|abstract_crawler.py|DEBUG ]: Captcha promise: OK|03AGdBq26K-M4biKiyM1LweSOVKS1UuwZouTmow2O7P0f4P7yslyST6Fr7D1qwuHOWd63NU6GG_oQND0Vd1X0Z7MYlH8LO29WaHhgfxyPcoGo19TyERKtQZBxh0ktiSzWuuFs07dHyOw6sNKFKZQt3X1cDv5xJnqnEugqgIY26ZqVSg5zJvAQdEr1wIvaPTehCOQh-4Uh910LK7EnFzrIdc5qRnWVFdQ5RHuMw1sCCjUNTB_jhgCHax-oxG_ec33AMiXm_cMW-HtVnAcQ01ESpBMJe3Cjjhwd77BpbuWUmP3TQTIEObBTe_C3DMnIn_xVeH1B_yw8F1SCsLm_Eh43-4SQjsQZeJhem_odU1RdvVm8E3Os2YAkrEl1c7jOMY9NRcMr-i_kElZmppQjE5Ps1FhHMgd-NzaTwV5bTNmMhKh0W9I2XTP00eHd8FFbJBvusqSkKQmf-OXqMVI6YYS0dR1SOILgiYNsx8u-ppzAzLl2klKo_9DLdPRZY3ctk-MfmpODwEouRLIAaol28pxnRp8trERm2komELw
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/Users/user/flathunter/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/Users/user/flathunter/flathunter/hunter.py", line 22, in crawl_for_exposes
for searcher in self.config.searchers()
File "/Users/user/flathunter/flathunter/hunter.py", line 23, in <listcomp>
for url in self.config.get('urls', list())])
File "/Users/user/flathunter/flathunter/abstract_crawler.py", line 136, in crawl
return self.get_results(url, max_pages)
File "/Users/user/flathunter/flathunter/crawl_immobilienscout.py", line 64, in get_results
return self.get_entries_from_javascript()
File "/Users/user/flathunter/flathunter/crawl_immobilienscout.py", line 98, in get_entries_from_javascript
return self.get_entries_from_json(result_json)
File "/Users/user/flathunter/flathunter/crawl_immobilienscout.py", line 102, in get_entries_from_json
return [ self.extract_entry_from_javascript(entry.value) for entry in jsonpath_expr.find(json) ]
File "/Users/user/flathunter/flathunter/crawl_immobilienscout.py", line 102, in <listcomp>
return [ self.extract_entry_from_javascript(entry.value) for entry in jsonpath_expr.find(json) ]
File "/Users/user/flathunter/flathunter/crawl_immobilienscout.py", line 113, in extract_entry_from_javascript
'price': str(entry["monthlyRate"]),
KeyError: 'monthlyRate'
Any idea?
Thanks!
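The KeyError above means the result JSON sometimes lacks the "monthlyRate" field. A minimal defensive sketch (the fallback keys and the "N/A" default are illustrative, not taken from the real crawler) would be:

```python
def extract_price(entry):
    """Pull a price string out of a result entry without crashing when
    'monthlyRate' is missing. Fallback keys and the 'N/A' default are
    illustrative assumptions, not the real crawler's behaviour."""
    for key in ("monthlyRate", "price"):
        if key in entry:
            return str(entry[key])
    return "N/A"
```

With this, `extract_price({"monthlyRate": 950})` gives `"950"`, and an entry without the key yields `"N/A"` instead of raising a KeyError.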
Hello,
how do you set the config?
If I set urls on line 13 of the config file, I get errors.
Could someone share their config, without the token of course?
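For reference, a minimal config might look like the sketch below. The key names follow the sample config, but compare against the `config.yaml.dist` shipped with your checkout, since your version may expect different keys:

```yaml
# Minimal config.yaml sketch -- check config.yaml.dist for the exact keys
urls:
  - https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten?numberofrooms=2.0-
  - https://www.wg-gesucht.de/wohnungen-in-Berlin.8.2.1.0.html

loop:
  active: yes
  sleeping_time: 600

telegram:
  bot_token: "123456789:YOUR-TOKEN-HERE"
  receiver_ids:
    - 12345678
```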
Hey there, I don't know if I'm in the right place, but I have a problem running flathunter.
I had it up and running previously (a version from last October, I think). Now I tried to reuse it: I updated it via git, and now I get the following error when trying to run it:
./flathunter.py
[2020/06/08 14:42:29|config.py |INFO ]: Using config /scripts/flathunter/config.yaml
[2020/06/08 14:42:29|idmaintainer.py |INFO ]: already processed: 0
Traceback (most recent call last):
  File "./flathunter.py", line 91, in <module>
    main()
  File "./flathunter.py", line 87, in main
    launch_flat_hunt(config)
  File "./flathunter.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/scripts/flathunter/flathunter/hunter.py", line 49, in hunt_flats
    for expose in self.config.get_filter().filter(results):
  File "/scripts/flathunter/flathunter/config.py", line 36, in get_filter
    if "excluded_titles" in filters_config:
TypeError: argument of type 'NoneType' is not iterable
Is this an error in flathunter, or did I miss something when installing the dependencies?
Thanks a lot!!!
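The TypeError above typically happens when the `filters:` section in config.yaml is present but empty: YAML parses an empty section to `None`, and `"excluded_titles" in None` is not allowed. A sketch of a tolerant check (the function name is illustrative; the real check lives in `config.get_filter()`):

```python
def get_excluded_titles(filters_config):
    """An empty 'filters:' section in config.yaml arrives here as None,
    and `"excluded_titles" in None` raises exactly the TypeError above.
    Normalise to an empty dict before any membership test."""
    filters_config = filters_config or {}
    return filters_config.get("excluded_titles", [])
```

With the guard, an empty filters section simply yields an empty exclusion list instead of crashing.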
Hi,
is there any quick way to have multiple search intervals for different searches (e.g. ebay-kleinanzeigen every 600s, immoscout every 1200s)?
Cheers
The config file sample says that only immoscout and wg-gesucht are supported. However, the README claims immowelt and ebay Kleinanzeigen work as well. Is that just an oversight, or am I missing something?
Thanks for sharing a great tool!
immobilienscout24 now requires cookies with the request headers. You have to obtain them before proceeding with the crawl.
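One common way to handle this, sketched below with `requests` (the header value and URLs are illustrative assumptions, not taken from the flathunter code): a `requests.Session` remembers cookies set by earlier responses and replays them on later requests.

```python
import requests

def make_crawler_session():
    """Sketch: a requests.Session persists cookies across requests,
    which is what the new immobilienscout24 cookie check needs.
    The User-Agent value is an illustrative placeholder."""
    session = requests.Session()
    session.headers.update({"User-Agent": "Mozilla/5.0 (compatible; flathunter)"})
    return session

# Usage (network calls, shown commented out):
# session = make_crawler_session()
# session.get("https://www.immobilienscout24.de/")  # landing page sets cookies
# page = session.get(search_url)                    # cookies replayed automatically
```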
Hello,
First, thanks to all the amazing people contributing to this project!
When crawling ImmobilienScout24, I always run into the no_of_results error.
Traceback (most recent call last):
  File "flathunt.py", line 89, in <module>
    main()
  File "flathunt.py", line 86, in main
    launch_flat_hunt(config)
  File "flathunt.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/Users/zoe/Desktop/flathunter-main/flathunter/hunter.py", line 42, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/Users/zoe/Desktop/flathunter-main/flathunter/hunter.py", line 22, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/Users/zoe/Desktop/flathunter-main/flathunter/hunter.py", line 23, in <listcomp>
    for url in self.config.get('urls', list())])
  File "/Users/zoe/Desktop/flathunter-main/flathunter/abstract_crawler.py", line 117, in crawl
    return self.get_results(url, max_pages)
  File "/Users/zoe/Desktop/flathunter-main/flathunter/crawl_immobilienscout.py", line 65, in get_results
    while len(entries) < min(no_of_results, self.RESULT_LIMIT) and
UnboundLocalError: local variable 'no_of_results' referenced before assignment
So I made the changes to the files as described in pull request #61.
But after changing the code, installing chromedriver and selenium, and paying for 2captcha, I still get the same error.
Is there anything else I need to do? I added the 2captcha key and the chromedriver path to the config file. I don't know what else to try...
Any help would be amazing.
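The UnboundLocalError pattern above arises when a variable is only assigned inside a branch (here, one that the captcha page skips) but is read unconditionally afterwards. A minimal stand-alone sketch of the failure mode and the usual fix (all names here are stand-ins, not the real crawler's code):

```python
def get_results_sketch(parsed_entries, result_limit=50):
    """no_of_results gets a default up front; without it, reading the
    variable in the while-condition when parsed_entries is None would
    raise exactly the UnboundLocalError shown above."""
    no_of_results = 0  # default; previously only set inside a branch
    if parsed_entries is not None:
        no_of_results = len(parsed_entries)
    entries = []
    while len(entries) < min(no_of_results, result_limit):
        entries.append(parsed_entries[len(entries)])
    return entries
```

With the default in place, a captcha page (no parsed entries) yields an empty result list instead of a crash.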
Hi there! Just discovered this project.
Would you accept a PR that adds an additional sender service? I don't use Telegram, so I've added Pushover support. I've already got this working locally. But I wanted to ask before making the PR.
I see that the web feature of flathunter makes use of telegram login. I don't think I would add Pushover support to that, though if desired I could.
Do I need 2captcha to have immoscout working with selenium/chromedriver?
I get this error:
2020-10-21T04:52:04.525696+00:00 app[worker.1]: [2020/10/21 04:52:04|sender_telegram.py|ERROR ]: When sending bot message, we got status 429 with message: {'ok': False, 'error_code': 429, 'description': 'Too Many Requests: retry after 34', 'parameters': {'retry_after': 34}}
Is that normal?
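A 429 from the Telegram Bot API is rate limiting: the `retry_after` value in the response says how many seconds to wait before resending. A hedged sketch of honouring that hint (`send_fn` is a stand-in that returns the parsed JSON status dict, as in the log above; this is not flathunter's actual sender code):

```python
import time

def send_with_backoff(send_fn, message, max_attempts=3):
    """Retry a Telegram send, sleeping for the retry_after hint on a
    429 response instead of failing. send_fn is a hypothetical callable
    returning the parsed JSON status dict."""
    result = {}
    for _ in range(max_attempts):
        result = send_fn(message)
        if result.get("ok"):
            return result
        if result.get("error_code") == 429:
            time.sleep(result.get("parameters", {}).get("retry_after", 30))
        else:
            break  # some other error; don't retry blindly
    return result
```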
I recently noticed that the crawler crashes on WG-Gesucht with the following error:
Traceback (most recent call last):
File "C:\Users\X\Documents\Python\flathunter\flathunt.py", line 89, in <module>
main()
File "C:\Users\X\Documents\Python\flathunter\flathunt.py", line 86, in main
launch_flat_hunt(config)
File "C:\Users\X\Documents\Python\flathunter\flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "C:\Users\X\Documents\Python\flathunter\flathunter\hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "C:\Users\X\Documents\Python\flathunter\flathunter\default_processors.py", line 30, in process_expose
expose['address'] = searcher.load_address(url)
File "C:\Users\X\Documents\Python\flathunter\flathunter\crawl_wggesucht.py", line 81, in load_address
.find("a", {"href": "#"}).text.strip().split())
AttributeError: 'NoneType' object has no attribute 'text'
This part of the load_address() method in crawl_wggesucht.py causes the error:
address = ' '.join(response.find('div', {"class": "col-sm-4 mb10"})
.find("a", {"href": "#"}).text.strip().split())
I think the problem is that the .find("a", {"href": "#"}) part returns None because nothing was found. Maybe the markup on WG-Gesucht has changed slightly? To me it looks like the href is now "#mapContainer". So if I change the problematic line to
address = ' '.join(response.find('div', {"class": "col-sm-4 mb10"})
.find("a", {"href": "#mapContainer"}).text.strip().split())
things seem to work fine again. Can anyone confirm this?
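To make the fix above robust against further markup changes, the selector could also try both href values and bail out gracefully when nothing matches. A sketch (the sample HTML is invented for illustration):

```python
from bs4 import BeautifulSoup

def load_address_sketch(html):
    """Tolerant variant of the selector above: try the new
    '#mapContainer' href first, fall back to the old '#', and return
    None instead of raising when neither is present."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "col-sm-4 mb10"})
    if container is None:
        return None
    link = (container.find("a", {"href": "#mapContainer"})
            or container.find("a", {"href": "#"}))
    if link is None:
        return None
    return " ".join(link.text.strip().split())
```

On a fabricated snippet like `<div class="col-sm-4 mb10"><a href="#mapContainer"> Musterstr. 1 \n 10115 Berlin </a></div>` this returns the whitespace-normalised address, and None when the markup changes again.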
First of all, thank you all for this great project.
I have deployed the app as a cron job to Google Cloud, and when accessing the webpage I do see new listings coming in. However, I have not received a message from the Telegram bot yet. I ran the program locally first and it worked fine and sent me Telegram messages, so I think my config.yaml should be fine. The Google Cloud dashboard does not show any errors and reports that the /hunt URL was triggered 149 times in the last 24 hours.
Any clue where my error might be?