vardecab / otomoto_olx-scraper

Scrape car offers from OTOMOTO․pl & OLX․pl and run an IFTTT automation (e.g. send an email or add a to-do task) when new cars matching the search criteria are found. Supports native macOS & Windows 10 notifications.

License: GNU General Public License v3.0

Python 100.00%

beautifulsoup4 beautifulsoup python python3 scraper scraping-websites otomoto ifttt ifttt-webhooks ifttt-maker

otomoto_olx-scraper's Introduction

Hello 👋🏼

Kuba here. I hope you're having a great day! You can read more about me by visiting my website. Below you can see the (small) things I put together.

Active projects

☔ My biggest project (using 6+ different APIs) → Umbrella:

A simple weather page that tells you if you need to take an umbrella when going outside ☔ + it shows sunrise/sunset times, air quality and allergy information for supported regions. Currently only in Polish.
💸 Script to be notified about currency rates & when to buy/sell them → forex-notifier
⬇️ Playing around with auto-downloading videos from YouTube when a new video is published (using RSS reader) → rss-youtube-downloader
📺 And downloading YouTube videos (or extracting music from them) in general — on demand or automatically by taking URL from clipboard or Pushbullet → youtube-downloader
🗺️ Script to visualize my car trips on an interactive map → fuelio-trip-visualizer
⚡ Control Tuya-compatible light bulbs & sockets in your smart home from Windows / macOS computer → control-tuya_smarthome
📦 And a script to look for updates for my Kindle → kindle-updater
💡 My browser extension (to motivate me) → Good Manager

Inactive projects

🏠 Perhaps my second biggest project (using AI 😱) → flat-finder:

Looking for a modern, nice-looking apartment but tired of all these old flats from the '90s? Let's automate that :)

Scrape apartment offers from OLX․pl, analyse them with an artificial intelligence (AI) model to keep only good-looking flats, and (optionally) run an IFTTT automation (e.g. send an email or add a to-do task) when new offers matching the search criteria (e.g. ≥ 2 rooms, ≤ 2k PLN, ≥ 30m²) are found.
🚗 My Python project (to find a car) → otomoto_olx-scraper:

Scrape car offers from OTOMOTO․pl & OLX․pl and run an IFTTT automation (e.g. send an email or add a to-do task) when new cars matching the search criteria are found. Supports native macOS & Windows 10 notifications.
💬 Another Python project I've managed to build (to improve my vocabulary) → kindle-words:

Do something useful with your Kindle notes :) This script extracts individual words from the My Clippings file hidden on your Kindle e-reader, translates them using Google Translate and exports each "original word" → "translation" pair into a .txt file, from which you can learn the words or import them into an application such as Quizlet.
🚚 My first Python module available on PyPI (to give back to community) → win10toast-click:

An easy-to-use Python library for displaying Windows 10 Toast Notifications. Improved version of win10toast and win10toast-persist to include callback_on_click to run a function on notification click, for example to open a URL.
🐍 And yet another Python one (to save time) → eSkarbonka-update:

Periodically check a website and get a native system notification when a value on it changes. In this case, the value comes from my WOŚP eSkarbonka.
🤖 And an idea I had... → emails-from-pdfs:

Extract email addresses from PDFs stored in multiple folders. For example: you can download candidates' CVs from your HR platform and this script will extract all email addresses found in the files.

otomoto_olx-scraper's People

Forkers

krzesloszatan mosoonk michaljes

otomoto_olx-scraper's Issues

Does this still work?

Hello, does it still work? I keep getting:

python3 otomoto2.py
Starting...
First run - no file exists.
File doesn't exist.
Folder created: 220626-152532
Page URL: https://is.gd/Eews0U
How many pages are there to crawl? 1
Page number: 1 / 1
Waiting for 2 seconds before opening URL...
<●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●> 2/2 [100%] in 2.2s (0.91/s) 
Opening page...
Scraping page...
[########################################] (!) 0 in 0.1s (0.00/s) 
Successfully added 0 cars to file.
Reading file to clean up...
Cleaning the file...
There are 0 cars in total.
File cleaned up. New lines added.
Keyword wasn't provided - not searching.
Variable not defined. Keyword wasn't provided.
Traceback (most recent call last):
  File "otomoto2.py", line 253, in <module>
    counter2
NameError: name 'counter2' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "otomoto2.py", line 258, in <module>
    file_previous_run = open('output/' + previous_run_datetime + '/2-clean.txt', 'r') # 1st file 
NameError: name 'previous_run_datetime' is not defined
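
Judging by the traceback, the script seems to reach the comparison step even on a first run, when there is no previous run to compare against. Below is a minimal sketch of a guard for that case. It assumes run folders sit under output/ and are named with sortable timestamps, as in the log above; the function names find_previous_run and new_offers are hypothetical and not taken from otomoto2.py.

```python
from pathlib import Path
from typing import List, Optional


def find_previous_run(output_dir: Path, this_run: str) -> Optional[str]:
    """Return the newest earlier run folder that contains 2-clean.txt, or None."""
    if not output_dir.is_dir():
        return None  # nothing has been written yet
    candidates = sorted(
        p.name
        for p in output_dir.iterdir()
        if p.is_dir() and p.name < this_run and (p / "2-clean.txt").is_file()
    )
    return candidates[-1] if candidates else None


def new_offers(output_dir: Path, this_run: str) -> List[str]:
    """Return lines present in this run's 2-clean.txt but not in the previous run's."""
    current = (output_dir / this_run / "2-clean.txt").read_text(encoding="utf-8").splitlines()
    previous_run = find_previous_run(output_dir, this_run)
    if previous_run is None:
        # First run: there is nothing to compare against, so report no new offers.
        return []
    previous = set(
        (output_dir / previous_run / "2-clean.txt").read_text(encoding="utf-8").splitlines()
    )
    return [line for line in current if line not in previous]
```

Returning an empty list on the first run is a design choice in this sketch; treating every offer as new would be an equally reasonable alternative.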

Exceptions occurring during run

Hello, I don't know where I should post this, but it might help someone with issues similar to mine.

franz@Matrix:$(master)$ tree
.
├── LICENSE
├── README.md
├── automate
├── icons
│   ├── browser.ico
│   ├── car.ico
│   ├── car.png
│   ├── link.ico
│   └── www.ico
├── olx1
│   └── olx1.py
├── olx2
│   └── olx2.py
├── otomoto1
│   └── otomoto1.py
└── otomoto2
    ├── otomoto2.py
    └── output
        ├── 220701-140608
        ├── 220701-140621
        │   ├── 1-output.txt
        │   └── 2-clean.txt
        ├── 220701-140701
        │   ├── 1-output.txt
        │   └── 2-clean.txt
        ├── 220701-151153
        │   ├── 1-output.txt
        │   └── 2-clean.txt
        └── 220701-154142
            ├── 1-output.txt
            └── 2-clean.txt

12 directories, 19 files
franz@Matrix:$(master)$ cd otomoto2
franz@Matrix:$(master)$ python3 otomoto2.py 
Starting...
First run - no file exists.
File doesn't exist.
Folder created: 220701-154646
Page URL: https://is.gd/YLYNnK
How many pages are there to crawl? 1
Page number: 1 / 1
Waiting for 2 seconds before opening URL...
<●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●> 2/2 [100%] in 2.2s (0.92/s) 
Opening page...
Scraping page...
[########################################] (!) 14 in 0.0s (447.98/s) 
Successfully added 14 cars to file.
Reading file to clean up...
Cleaning the file...
There are 14 cars in total.
File cleaned up. New lines added.
Keyword wasn't provided - not searching.
Variable not defined. Keyword wasn't provided.
Traceback (most recent call last):
  File "otomoto2.py", line 253, in <module>
    counter2
NameError: name 'counter2' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "otomoto2.py", line 258, in <module>
    file_previous_run = open('output/' + previous_run_datetime + '/2-clean.txt', 'r') # 1st file 
NameError: name 'previous_run_datetime' is not defined

So I figured out that on Linux (I'm using WSL) you have to manually create the "output" and "data" folders, and then it works fine from the second run onwards:

Starting...
Previous run: 220701-155140
This run: 220701-155211
Folder created: 220701-155211
Page URL: https://is.gd/YLYNnK
How many pages are there to crawl? 1
Page number: 1 / 1
Waiting for 2 seconds before opening URL...
<●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●> 2/2 [100%] in 2.2s (0.91/s) 
Opening page...
Scraping page...
[########################################] (!) 14 in 0.0s (473.69/s) 
Successfully added 14 cars to file.
Reading file to clean up...
Cleaning the file...
There are 14 cars in total.
File cleaned up. New lines added.
Keyword wasn't provided - not searching.
Variable not defined. Keyword wasn't provided.
Files are the same.
Script run time: 3.71 seconds.
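
The manual step could probably be avoided by having the script create the folders itself. This is a minimal sketch, not the script's actual code: the helper name prepare_run_folder is hypothetical, and the timestamp format is inferred from folder names such as 220701-155211 in the log above.

```python
import os
from datetime import datetime


def prepare_run_folder(base_dir: str = ".") -> str:
    """Create output/ and data/ if they are missing, then a timestamped run folder."""
    for name in ("output", "data"):
        # exist_ok=True makes this safe to call on every run
        os.makedirs(os.path.join(base_dir, name), exist_ok=True)
    # Assumed format: YYMMDD-HHMMSS, matching names like 220701-155211
    run_name = datetime.now().strftime("%y%m%d-%H%M%S")
    run_path = os.path.join(base_dir, "output", run_name)
    os.makedirs(run_path, exist_ok=True)
    return run_path


if __name__ == "__main__":
    print("Folder created:", prepare_run_folder())
```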

OLX issue

I am unable to run olx1.py, even though bs4, lxml and html5lib are installed. Otomoto scraping works well, but OLX isn't scraping at all.

~/otomoto_olx-scraper/olx1 $ pip3 install lxml
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Requirement already satisfied: lxml in /home/pi/.local/lib/python3.9/site-packages (4.9.1)

~/otomoto_olx-scraper/olx1 $ python3 olx1.py
Starting...
Previous run: 221101-155614
This run: 221101-155812
Folder created: 221101-155812
Page URL: https://is.gd/OhXxPQ
First run - no file exists.
How many pages are there to crawl? 1
Page number: 1 / 1
Waiting for 2 seconds before opening URL...
<●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●> 2/2 [100%] in 2.3s (0.88/s)
Opening page...
Scraping page...
Traceback (most recent call last):
  File "/home/pi/otomoto_olx-scraper/olx1/olx1.py", line 165, in <module>
    pullData(full_page_url) # throw URL to function
  File "/home/pi/otomoto_olx-scraper/olx1/olx1.py", line 118, in pullData
    soup = BeautifulSoup(page, features="lxml") # get URL into BS # *NOTE: v: olx
  File "/home/pi/.local/lib/python3.9/site-packages/bs4/__init__.py", line 248, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
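
pip reports lxml as installed, so one possible cause (not confirmed by the log) is that the package is present but fails to import, for example because a system library it depends on is missing. A minimal sketch of a check that prints the underlying import error and falls back to Python's built-in "html.parser" is shown below; the helper name pick_parser is hypothetical and not part of olx1.py.

```python
import importlib


def pick_parser() -> str:
    """Return "lxml" if it can actually be imported, otherwise "html.parser"."""
    try:
        importlib.import_module("lxml.etree")
    except ImportError as error:
        # Shows the real reason lxml is unusable, which FeatureNotFound hides
        print(f"lxml is not usable ({error}); falling back to html.parser")
        return "html.parser"
    return "lxml"


if __name__ == "__main__":
    print("Parser to use:", pick_parser())
    # In the scraper this could be passed on as:
    # soup = BeautifulSoup(page, features=pick_parser())
```

Note that "html.parser" can build a slightly different tree than lxml for malformed pages, so the scraping results may need to be re-checked after switching.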
