Web Scraping Project (domain.com.au)

Welcome to the Domain.com.au (Web Scraping Project) repository! This project demonstrates web scraping techniques to extract data from the domain.com.au website, a platform for real estate listings in Australia. The scraped data can be used for various purposes, such as analysis, research, or data visualization, within the bounds of the website's terms of use.

Features

  • Built on Node.js with Crawlee, plus the npm packages puppeteer-extra, puppeteer-extra-plugin-stealth, and csv-writer.
  • Scrapes property listings, prices, details, and other information. See the csv-headers files for the full list of scraped fields.
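
To illustrate how header definitions can map scraped records to CSV columns, here is a minimal sketch. It is not the project's code: the field names (`address`, `price`, `bedrooms`) and the helpers `escapeCsvValue` and `toCsv` are hypothetical, and the CSV text is built with plain string handling as a stand-in for csv-writer so the example runs without any dependencies. The `{ id, title }` shape mirrors the header format that csv-writer's `createObjectCsvWriter` accepts.

```javascript
// Hypothetical header definitions; the real ones live in the csv-headers files.
const headers = [
  { id: "address", title: "Address" },
  { id: "price", title: "Price" },
  { id: "bedrooms", title: "Bedrooms" },
];

// Quote a value when it contains a comma, a double quote, or a line break.
function escapeCsvValue(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text: one header line followed by one line per record.
function toCsv(headerDefs, records) {
  const headerLine = headerDefs.map((h) => escapeCsvValue(h.title)).join(",");
  const rows = records.map((record) =>
    headerDefs.map((h) => escapeCsvValue(record[h.id])).join(",")
  );
  return [headerLine, ...rows].join("\n") + "\n";
}

// Made-up sample record, for illustration only.
const sample = [
  { address: "1 Example St, Sydney NSW 2000", price: "$1,200,000", bedrooms: 3 },
];
console.log(toCsv(headers, sample));
```

Values containing commas, such as addresses and formatted prices, are wrapped in double quotes so they stay in a single column when the file is opened in a spreadsheet.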

Run Locally

Follow these steps to set up and run the scraping script locally:

  1. Clone this repository: git clone https://github.com/anujbarochia/crawlee-puppeteer-domain.com.au
  2. Navigate to the project directory: cd crawlee-puppeteer-domain.com.au
  3. Install the required Node.js packages: npm install
  4. Create a .env file in the project directory and define the three variables the project uses (you can change how the program uses them and apply your own logic):
PROXY=""
MANUAL_CAPTCHA="0"
CRAWLEE_CHROME_EXECUTABLE_PATH=""
PROXY => The URL that serves your pool of proxy addresses. When the website blocks your IP, Crawlee automatically takes a new proxy address from this pool.
MANUAL_CAPTCHA => See index.js for how this variable is used.
CRAWLEE_CHROME_EXECUTABLE_PATH => By default the process uses Chromium; this variable explicitly selects Chrome instead. Set it to the Chrome location on your system. To find it, type `chrome://version` in Chrome's address bar and look for the Executable Path entry.
  5. Run the scraping script: node index.js
  6. Enjoy, and feel free to make changes to suit your requirements.
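
The three .env variables from step 4 could be normalized into a single config object along these lines. This is a sketch under stated assumptions, not the logic in index.js: the function name `loadScraperConfig`, the returned property names, and the reading of MANUAL_CAPTCHA as "1" meaning enabled are all hypothetical. It also assumes the .env file has already been loaded into the environment (for example with a package such as dotenv).

```javascript
// Turn raw environment values into a normalized config object.
// Empty or missing values become null so callers can test them easily.
function loadScraperConfig(env) {
  const clean = (value) => (value || "").trim();
  return {
    proxyUrl: clean(env.PROXY) || null,
    // Assumption: "1" switches manual captcha handling on, anything else is off.
    manualCaptcha: clean(env.MANUAL_CAPTCHA) === "1",
    chromePath: clean(env.CRAWLEE_CHROME_EXECUTABLE_PATH) || null,
  };
}

// Example with explicit values; the Chrome path is a placeholder.
// In a real script you would pass process.env instead.
const config = loadScraperConfig({
  PROXY: "",
  MANUAL_CAPTCHA: "0",
  CRAWLEE_CHROME_EXECUTABLE_PATH: "/usr/bin/google-chrome",
});
console.log(config);
```

Reading the variables in one place keeps the rest of the script free of string comparisons and makes it obvious which settings are unset.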

Contributing

Contributions and suggestions are encouraged! If you encounter issues, have enhancement ideas, or wish to contribute, feel free to open an issue or submit a pull request.

Warning

Please exercise caution and adhere to domain.com.au's terms of use and scraping guidelines when using this script.

Disclaimer: This project is intended for educational and personal use only. The author and contributors are not responsible for any misuse of the scraping code or the data collected.

Note: Web scraping activities should always be carried out responsibly, respecting the website's terms of use and applicable laws and regulations.

Contributors

anujbarochia
