
vardecab / otomoto_olx-scraper

Scrape car offers from OTOMOTO․pl & OLX․pl and run an IFTTT automation (e.g. send an email; add a to-do task) when new cars matching the search criteria are found. With support for native macOS & Windows 10 notifications.

License: GNU General Public License v3.0

Python 100.00%
beautifulsoup4 beautifulsoup python python3 scraper scraping-websites otomoto ifttt ifttt-webhooks ifttt-maker

otomoto_olx-scraper's Introduction

Hello 👋🏼

Kuba here. I hope you're having a great day! You can read more about me by visiting my website. Below you can see the (small) things I put together.

Active projects

  • ☔ My biggest project (using 6+ different APIs) → Umbrella:

    A simple weather page that tells you if you need to take an umbrella when going outside ☔ + it shows sunrise/sunset times, air quality and allergy information for supported regions. Currently only in Polish.

  • 💸 Script to be notified about currency rates & when to buy/sell them → forex-notifier

  • ⬇️ Playing around with auto-downloading videos from YouTube when a new video is published (using RSS reader) → rss-youtube-downloader

  • 📺 And downloading YouTube videos (or extracting music from them) in general, on demand or automatically by taking the URL from the clipboard or Pushbullet → youtube-downloader

  • 🗺️ Script to visualize my car trips on an interactive map → fuelio-trip-visualizer

  • ⚡ Control Tuya-compatible light bulbs & sockets in your smart home from Windows / macOS computer → control-tuya_smarthome

  • 📦 And a script to look for updates for my Kindle → kindle-updater

  • 💡 My browser extension (to motivate me) → Good Manager

Inactive projects

  • 🏠 Perhaps my second biggest project (using AI 😱) → flat-finder:

    Looking for a modern, nice-looking apartment but tired of all these old flats from the '90s? Let's automate that :)

    Scrape apartment offers from OLX․pl, analyse them using an artificial intelligence (AI) model to keep only good-looking flats and (optionally) run an IFTTT automation (e.g. send an email; add a to-do task) when new offers matching the search criteria (e.g. ≥ 2 rooms, ≤ 2k PLN, ≥ 30m²) are found.

  • 🚗 My Python project (to find a car) → otomoto_olx-scraper:

    Scrape car offers from OTOMOTO․pl & OLX․pl and run an IFTTT automation (e.g. send an email; add a to-do task) when new cars matching the search criteria are found. With support for native macOS & Windows 10 notifications.

  • 💬 Another Python project I've managed to build (to improve my vocabulary) → kindle-words:

    Do something useful with your Kindle notes :) This script extracts individual words from the My Clippings file hidden on your Kindle e-reader, translates them using Google Translate and exports each pair "original word" → "translation" into a .txt file from which you can learn these words or import them into an application such as Quizlet.

  • 🚚 My first Python module available on PyPI (to give back to community) → win10toast-click:

    An easy-to-use Python library for displaying Windows 10 Toast Notifications. An improved version of win10toast and win10toast-persist that adds callback_on_click to run a function when the notification is clicked, for example to open a URL.

  • 🐍 And yet another Python one (to save time) → eSkarbonka-update:

    Periodically check a website and get a native system notification when a value on it changes. In this case, the value comes from my WOŚP eSkarbonka.

  • 🤖 And an idea I had... → emails-from-pdfs:

    Extract email addresses from PDFs stored in multiple folders. For example: you can download candidates' CVs from your HR platform and this script will extract all email addresses found in the files.


otomoto_olx-scraper's Issues

Does this still work?

Hello, does it still work? I keep getting:

python3 otomoto2.py
Starting...
First run - no file exists.
File doesn't exist.
Folder created: 220626-152532
Page URL: https://is.gd/Eews0U
How many pages are there to crawl? 1
Page number: 1 / 1
Waiting for 2 seconds before opening URL...
<●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●> 2/2 [100%] in 2.2s (0.91/s) 
Opening page...
Scraping page...
[########################################] (!) 0 in 0.1s (0.00/s) 
Successfully added 0 cars to file.
Reading file to clean up...
Cleaning the file...
There are 0 cars in total.
File cleaned up. New lines added.
Keyword wasn't provided - not searching.
Variable not defined. Keyword wasn't provided.
Traceback (most recent call last):
  File "otomoto2.py", line 253, in <module>
    counter2
NameError: name 'counter2' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "otomoto2.py", line 258, in <module>
    file_previous_run = open('output/' + previous_run_datetime + '/2-clean.txt', 'r') # 1st file 
NameError: name 'previous_run_datetime' is not defined

Exceptions occurring during run

Hello, I don't know where I should post this, but it might help someone with an issue similar to mine.

franz@Matrix:$(master)$ tree
.
├── LICENSE
├── README.md
├── automate
├── icons
│   ├── browser.ico
│   ├── car.ico
│   ├── car.png
│   ├── link.ico
│   └── www.ico
├── olx1
│   └── olx1.py
├── olx2
│   └── olx2.py
├── otomoto1
│   └── otomoto1.py
└── otomoto2
    ├── otomoto2.py
    └── output
        ├── 220701-140608
        ├── 220701-140621
        │   ├── 1-output.txt
        │   └── 2-clean.txt
        ├── 220701-140701
        │   ├── 1-output.txt
        │   └── 2-clean.txt
        ├── 220701-151153
        │   ├── 1-output.txt
        │   └── 2-clean.txt
        └── 220701-154142
            ├── 1-output.txt
            └── 2-clean.txt

12 directories, 19 files
franz@Matrix:$(master)$ cd otomoto2
franz@Matrix:$(master)$ python3 otomoto2.py 
Starting...
First run - no file exists.
File doesn't exist.
Folder created: 220701-154646
Page URL: https://is.gd/YLYNnK
How many pages are there to crawl? 1
Page number: 1 / 1
Waiting for 2 seconds before opening URL...
<●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●> 2/2 [100%] in 2.2s (0.92/s) 
Opening page...
Scraping page...
[########################################] (!) 14 in 0.0s (447.98/s) 
Successfully added 14 cars to file.
Reading file to clean up...
Cleaning the file...
There are 14 cars in total.
File cleaned up. New lines added.
Keyword wasn't provided - not searching.
Variable not defined. Keyword wasn't provided.
Traceback (most recent call last):
  File "otomoto2.py", line 253, in <module>
    counter2
NameError: name 'counter2' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "otomoto2.py", line 258, in <module>
    file_previous_run = open('output/' + previous_run_datetime + '/2-clean.txt', 'r') # 1st file 
NameError: name 'previous_run_datetime' is not defined

So I figured out that on Linux (I'm using WSL) you have to manually create the folders "output" and "data"; after that it works fine from the second run onwards:

Starting...
Previous run: 220701-155140
This run: 220701-155211
Folder created: 220701-155211
Page URL: https://is.gd/YLYNnK
How many pages are there to crawl? 1
Page number: 1 / 1
Waiting for 2 seconds before opening URL...
<●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●> 2/2 [100%] in 2.2s (0.91/s) 
Opening page...
Scraping page...
[########################################] (!) 14 in 0.0s (473.69/s) 
Successfully added 14 cars to file.
Reading file to clean up...
Cleaning the file...
There are 14 cars in total.
File cleaned up. New lines added.
Keyword wasn't provided - not searching.
Variable not defined. Keyword wasn't provided.
Files are the same.
Script run time: 3.71 seconds.
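
The workaround above (creating "output" and "data" by hand) could probably be built into the script. Below is a minimal sketch of that idea, not the project's actual code: the helper name `prepare_run_folders` is hypothetical, and it assumes that a finished run can be recognised by a `2-clean.txt` file inside its timestamped folder, as in the directory tree shown earlier. It returns `None` for the previous run on a first run, so the caller can skip the comparison instead of hitting a NameError.

```python
import os


def prepare_run_folders(base_dir, this_run):
    """Create output/ and data/ if missing; return (previous_run, run_path).

    Hypothetical helper. previous_run is the newest folder under output/
    that already holds a 2-clean.txt, or None when there is no earlier run.
    """
    output_dir = os.path.join(base_dir, "output")
    data_dir = os.path.join(base_dir, "data")
    # exist_ok=True makes this safe to call on every run, on any OS
    os.makedirs(output_dir, exist_ok=True)
    os.makedirs(data_dir, exist_ok=True)

    # Folder names like 220701-154646 sort chronologically as plain strings.
    # Folders without 2-clean.txt (unfinished runs) are ignored.
    finished = sorted(
        name
        for name in os.listdir(output_dir)
        if os.path.isfile(os.path.join(output_dir, name, "2-clean.txt"))
    )
    previous_run = finished[-1] if finished else None

    run_path = os.path.join(output_dir, this_run)
    os.makedirs(run_path, exist_ok=True)
    return previous_run, run_path
```

With something like this, the comparison step would only open `output/<previous_run>/2-clean.txt` when `previous_run` is not `None`.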

OLX issue

I am unable to run olx1.py, even though bs4, lxml and html5lib are installed. Otomoto scraping works well, but OLX scraping fails as soon as the page is parsed.

~/otomoto_olx-scraper/olx1 $ pip3 install lxml
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Requirement already satisfied: lxml in /home/pi/.local/lib/python3.9/site-packages (4.9.1)

~/otomoto_olx-scraper/olx1 $ python3 olx1.py
Starting...
Previous run: 221101-155614
This run: 221101-155812
Folder created: 221101-155812
Page URL: https://is.gd/OhXxPQ
First run - no file exists.
How many pages are there to crawl? 1
Page number: 1 / 1
Waiting for 2 seconds before opening URL...
<●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●> 2/2 [100%] in 2.3s (0.88/s)
Opening page...
Scraping page...
Traceback (most recent call last):
  File "/home/pi/otomoto_olx-scraper/olx1/olx1.py", line 165, in <module>
    pullData(full_page_url) # throw URL to function
  File "/home/pi/otomoto_olx-scraper/olx1/olx1.py", line 118, in pullData
    soup = BeautifulSoup(page, features="lxml") # get URL into BS # *NOTE: v: olx
  File "/home/pi/.local/lib/python3.9/site-packages/bs4/__init__.py", line 248, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
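
The cause of this error is not confirmed in the thread. Beautiful Soup raises FeatureNotFound when the interpreter running the script cannot import the requested parser, which can happen even if pip reports the package as installed (for example, when the import itself fails on that machine). As a hedged workaround sketch, the script could check which parser is importable and fall back to Python's built-in "html.parser". The helper name `pick_parser` is hypothetical and not part of the project or of bs4.

```python
import importlib


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first parser in `preferred` that this interpreter can
    import, or the standard-library "html.parser" if none can be imported.

    Hypothetical helper; the returned string is meant to be passed as the
    `features` argument of BeautifulSoup.
    """
    for name in preferred:
        try:
            importlib.import_module(name)
        except ImportError:
            # Not importable by this interpreter: try the next candidate
            continue
        return name
    return "html.parser"
```

In olx1.py the parsing line could then read `soup = BeautifulSoup(page, features=pick_parser())`. Note that different parsers can build slightly different trees from malformed HTML, so scraping results may vary between them.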
