ovodu-parser's Introduction

Scraper for fazwaz.com and thailand-property.com

How to Use:

AWS Console:

  • Log in to the AWS EC2 console.
  • Click on "Instances" and select the instance named "ScraperRealty".
  • In the top right, click the "Connect" button.
  • In the EC2 Instance Connect dialog, check that the following parameters are set (they are normally filled in by default):
      • Connection type: Connect using EC2 Instance Connect
      • Username: ubuntu
  • Click "Connect".

Linux Terminal:

  • Open your terminal (commands are similar for Windows).
  • Run the following command: ssh -i "scraperKey.pem" [email protected] (Make sure scraperKey.pem is in the current folder, or provide its absolute path.)
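
The note about the key file location can be sketched in Python. This is only an illustration, not part of the project: build_ssh_command is a hypothetical helper, and the target string is a placeholder because the instance address is not given here. It resolves the key path to an absolute one and fails early if the file is missing.

```python
import shlex
from pathlib import Path


def build_ssh_command(key_file: str, target: str) -> list[str]:
    """Return the ssh argument list with the key path made absolute.

    Hypothetical helper for illustration; it is not part of the project.
    """
    key_path = Path(key_file).expanduser().resolve()
    if not key_path.is_file():
        raise FileNotFoundError(f"Key file not found: {key_path}")
    return ["ssh", "-i", str(key_path), target]


if __name__ == "__main__":
    # "ubuntu@<instance-address>" is a placeholder, not the real host.
    try:
        command = build_ssh_command("scraperKey.pem", "ubuntu@<instance-address>")
        print(shlex.join(command))
    except FileNotFoundError as exc:
        print(exc)
```

Building the command as a list (rather than one string) keeps paths with spaces intact if the list is later passed to subprocess.run.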

Running the Scraper on AWS:

  • Change directory to the project folder: cd scraper
  • Activate the virtual environment for Python: source env/bin/activate
  • Run the main script: python main.py

Additional Notes for Modifications:

  • Complex sections of the code are explained by comments in the code itself.
  • main.py imports all parsers.
  • base.py contains classes and constants for parsing.
  • scraper_fazwaz.py and scraper_thailandproperty.py are the actual parsers.
  • watermark_resolver.py handles watermark removal (a simple version; changing it may require modifying the watermark or its mask).
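
As a rough orientation for modifications, the file layout above could correspond to a structure like the following. This is a minimal sketch under assumptions, not the project's actual code: the class names, the scrape method, and run_all are all hypothetical, and the stub parsers return empty lists instead of fetching pages.

```python
from abc import ABC, abstractmethod


class BaseScraper(ABC):
    """Hypothetical stand-in for the shared classes that base.py provides."""

    site: str = ""

    @abstractmethod
    def scrape(self) -> list[dict]:
        """Return the listings found on the site as dictionaries."""


class FazwazScraper(BaseScraper):
    """Hypothetical counterpart of scraper_fazwaz.py."""

    site = "fazwaz.com"

    def scrape(self) -> list[dict]:
        # A real parser would fetch and parse pages here; this stub returns nothing.
        return []


class ThailandPropertyScraper(BaseScraper):
    """Hypothetical counterpart of scraper_thailandproperty.py."""

    site = "thailand-property.com"

    def scrape(self) -> list[dict]:
        # Stub, see FazwazScraper.scrape.
        return []


def run_all(scrapers: list[BaseScraper]) -> dict[str, list[dict]]:
    """Hypothetical counterpart of main.py: run each parser and collect results by site."""
    return {scraper.site: scraper.scrape() for scraper in scrapers}


if __name__ == "__main__":
    results = run_all([FazwazScraper(), ThailandPropertyScraper()])
    for site, listings in results.items():
        print(f"{site}: {len(listings)} listings")
```

With a layout like this, supporting another site would mean adding one more subclass and passing it to run_all; whether the real code is organized this way should be checked against base.py and main.py.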

Deploying the Application Elsewhere:

  • Ensure Python is installed (preferably version 3.11)
  • Create a virtual environment: python -m venv env
  • Activate the virtual environment: source env/bin/activate
  • Install required dependencies: pip install -r requirements.txt
  • Configure AWS: aws configure
  • Run the main script: python main.py

Note: The application should also work with earlier and later versions of Python, though incompatibilities may arise in extreme cases.
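
Before running main.py on a new machine, the interpreter version and the virtual environment can be checked with a few lines of Python. The sketch below is an optional aid and not part of the project; describe_python and in_virtualenv are hypothetical helper names, and 3.11 is taken from the recommendation above.

```python
import sys

# Version recommended in the deployment steps.
RECOMMENDED = (3, 11)


def describe_python(version_info=sys.version_info[:2], recommended=RECOMMENDED) -> str:
    """Compare a (major, minor) version with the recommended one."""
    current = tuple(version_info[:2])
    if current == recommended:
        return "matches the recommended version"
    if current < recommended:
        return "older than the recommended version"
    return "newer than the recommended version"


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside an environment created by `python -m venv`, sys.prefix points to
    # the environment while sys.base_prefix points to the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {describe_python()}")
    print("Virtual environment active:", in_virtualenv())
```

If the second line reports False, the environment was probably not activated with source env/bin/activate.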

Feel free to reach out for further assistance or enhancements!
