olx-phone-number-scraper's Introduction

OLX Phone Number Scraper

Scrapes phone numbers from OLX listings.

Usage

NOTE: Rewrite this section once the final version is done. For now it only serves to guide the design of the program and keep it intuitive.

olxscrape <url_of_listings> <page_limit>* <phone_number_limit>* <output_file_name>

The tool should write each phone number to the file as soon as it is scraped, so that
it can be safely stopped with Ctrl+C at any time.
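
The write-as-you-go behaviour described above could be sketched in Python roughly as follows. This is a minimal sketch, not the tool's actual code: the function name `write_numbers_incrementally` is hypothetical, and it assumes the output file holds one phone number per line.

```python
import os
from typing import Iterable


def write_numbers_incrementally(numbers: Iterable[str], output_path: str) -> int:
    """Append each number to output_path as soon as it is produced.

    Hypothetical helper: assumes one phone number per line.
    Returns how many numbers were written.
    """
    written = 0
    with open(output_path, "a", encoding="utf-8") as out:
        try:
            for number in numbers:
                out.write(number + "\n")
                # Push the line out of Python's buffer and ask the OS to
                # persist it, so an interrupt cannot lose numbers that
                # were already scraped.
                out.flush()
                os.fsync(out.fileno())
                written += 1
        except KeyboardInterrupt:
            # Ctrl+C: stop quietly; everything written so far is on disk.
            pass
    return written
```

Because every number is flushed before the next one is requested, stopping the scraper midway leaves a valid, partially filled output file.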

join <file_1> ... <file_n> <output_file_name>

Joins two or more files into a single one, removing duplicates.
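
One possible way to implement this merge is sketched below. The function name `join_files` is hypothetical, and the sketch assumes each input file holds one phone number per line. It keeps the first occurrence of each number and preserves the original order, which is a design choice rather than something the description requires.

```python
from typing import Iterable, List, Set


def join_files(input_paths: Iterable[str], output_path: str) -> int:
    """Merge several number lists into one file, dropping duplicates.

    Hypothetical helper: assumes one phone number per line.
    Returns the number of unique entries written.
    """
    seen: Set[str] = set()
    unique: List[str] = []
    for path in input_paths:
        with open(path, encoding="utf-8") as source:
            for line in source:
                number = line.strip()
                # Skip blank lines and numbers that were already collected.
                if number and number not in seen:
                    seen.add(number)
                    unique.append(number)
    # All inputs are read before the output is opened, so the output
    # path may safely be the same as one of the inputs.
    with open(output_path, "w", encoding="utf-8") as out:
        for number in unique:
            out.write(number + "\n")
    return len(unique)
```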

prefix_phone <input_file_name> <prefix> <output_file_name>

Prefixes the given string to all of the phone numbers in the list. Useful if you
want to prefix them all with a country code.
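
A minimal sketch of the prefixing step, assuming one phone number per line; the function name `prefix_numbers` is hypothetical and not taken from the tool's code. The prefix is added unconditionally, so numbers that already carry it would be prefixed a second time.

```python
def prefix_numbers(input_path: str, prefix: str, output_path: str) -> int:
    """Write every number from input_path to output_path with prefix prepended.

    Hypothetical helper: assumes one phone number per line.
    Returns how many numbers were written.
    """
    # Read everything first so input and output may be the same file.
    with open(input_path, encoding="utf-8") as source:
        numbers = [line.strip() for line in source if line.strip()]
    with open(output_path, "w", encoding="utf-8") as out:
        for number in numbers:
            out.write(prefix + number + "\n")
    return len(numbers)
```

For example, `prefix_numbers("numbers.txt", "+351", "prefixed.txt")` would prepend a country code to each entry; both file names and the code are illustrative values.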

Features marked with * might only be considered for versions beyond the base version of the tool.

About

This tool can be easily altered to scrape e-mails and other useful information as well. It was written to meet the needs of a specific project and is designed to support only this functionality. It is the result of the reverse-engineering needed to scrape phone numbers from listings, given the URL of a list of those listings. Some command-line tools are included.

A minimal test set is included. You can use this as a basis for a tool with a similar purpose.

I am short on time at the moment and hacked this together quickly. Please keep this in mind when reading the code.

olx-phone-number-scraper's People

Contributors

iluxonchik

olx-phone-number-scraper's Issues

Parse multiple phone numbers

Sometimes getting the phone number returns multiple ones, for example:

<span class="block">123456</span><span class="block">234567</span>
<span class="block">345678</span>

The tool does not deal with that at the moment.
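
One way this case might be handled is sketched below, using only Python's standard `html.parser`. It is an illustration rather than a fix taken from the tool: the names `_BlockSpanParser` and `extract_phone_numbers` are hypothetical, and it assumes each number sits in its own `<span class="block">` element, as in the example above.

```python
from html.parser import HTMLParser
from typing import List


class _BlockSpanParser(HTMLParser):
    """Collects the text of every <span class="block"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.numbers: List[str] = []
        self._in_block = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "block" in classes:
            self._in_block = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching span.
        if self._in_block:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_block:
            text = "".join(self._buffer).strip()
            if text:
                self.numbers.append(text)
            self._in_block = False


def extract_phone_numbers(fragment: str) -> List[str]:
    """Return every number found in the given HTML fragment, in order."""
    parser = _BlockSpanParser()
    parser.feed(fragment)
    parser.close()
    return parser.numbers
```

Returning a list instead of a single string lets the caller write each number on its own line, whether the response holds one number or several.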

Does not work with the Polish OLX

The results file 1651345101.txt is empty, and an additional last_page.txt file is created containing the message 'To continue using OLX, go to your browser settings and update to the latest version.'