
docker-redis-cicd

Project Title

Description

This project was created to learn how to scrape 'messy' data and clean it automatically through a pipeline of functions. Some features may be implemented imperfectly or be missing entirely. The main goals are clean, functional, decoupled code and learning how to work with both traditional relational databases and NoSQL databases.
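A "pipeline of functions" can be as simple as a list of small pure functions applied in order, each taking a listing and returning a cleaned copy. The sketch below is a minimal illustration, not the project's actual code: the step names (strip_whitespace, parse_price, normalize_city), the dict-based listing shape, and the logger name are all hypothetical. It also logs each step, which is one simple way to get the event logging listed under Features.

```python
import logging
from functools import reduce
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("etl_housing")  # hypothetical logger name

Listing = Dict[str, str]
Step = Callable[[Listing], Listing]


def strip_whitespace(listing: Listing) -> Listing:
    """Trim stray spaces and newlines left over from the scraped HTML."""
    return {key: value.strip() for key, value in listing.items()}


def parse_price(listing: Listing) -> Listing:
    """Keep only the digits of a whole-dollar price such as '$1,450/mo'."""
    digits = "".join(ch for ch in listing["price"] if ch.isdigit())
    return {**listing, "price": digits}


def normalize_city(listing: Listing) -> Listing:
    """Title-case the city name so 'austin' and 'AUSTIN' group together."""
    return {**listing, "city": listing["city"].title()}


def run_pipeline(listing: Listing, steps: List[Step]) -> Listing:
    """Apply each step in order, logging every step as an event."""

    def apply(current: Listing, step: Step) -> Listing:
        logger.info("applying %s", step.__name__)
        return step(current)

    return reduce(apply, steps, listing)


raw = {"price": " $1,450/mo ", "city": "austin\n"}
clean = run_pipeline(raw, [strip_whitespace, parse_price, normalize_city])
print(clean)  # {'price': '1450', 'city': 'Austin'}
```

Because every step has the same signature, steps can be tested in isolation and reordered or swapped without touching the others, which is the decoupling the project aims for.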

Features

  • Scrapes all listings for a search term
  • Cleans data for analysis
  • Stores data in a database
  • Event logging
  • NLP of descriptions and amenities
  • Machine learning and visualization of price influencers
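For the "Stores data in a database" feature, Python's built-in sqlite3 module is enough for a relational store. The following is a sketch under assumptions: the table name, its three columns, and the "housing.db" filename are hypothetical, and the real project's schema may differ. Using address as the primary key with INSERT OR REPLACE means re-running a daily scrape updates existing listings instead of duplicating them.

```python
import sqlite3
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]


def save_listings(conn: sqlite3.Connection, listings: List[Row]) -> None:
    """Create the (hypothetical) listings table if needed and upsert rows by address."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS listings ("
            " address TEXT PRIMARY KEY,"
            " price INTEGER,"
            " bedrooms INTEGER)"
        )
        # Named placeholders are filled from each dict; never format SQL by hand.
        conn.executemany(
            "INSERT OR REPLACE INTO listings (address, price, bedrooms) "
            "VALUES (:address, :price, :bedrooms)",
            listings,
        )


conn = sqlite3.connect(":memory:")  # pass a file path such as "housing.db" to persist
save_listings(conn, [
    {"address": "123 Main St", "price": 1450, "bedrooms": 1},
    {"address": "456 Oak Ave", "price": 2100, "bedrooms": 2},
])
print(conn.execute("SELECT COUNT(*), AVG(price) FROM listings").fetchone())  # (2, 1775.0)
```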

Future Features

  • Load different file formats into the database
  • MongoDB integration for articles
  • Visualize data
  • Machine learning algorithms to find key price predictors
  • Command-line options
  • Web interface with more options

File Descriptions

trulia_scrape.py - can be run from the command line; it automatically scrapes apartment listings for the Austin area and saves them to a CSV file in the daily_scrape_files folder.
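Saving each run to its own file in daily_scrape_files can be done with the standard csv module. This is only a sketch: the function name write_daily_csv and the <city>_<YYYY-MM-DD>.csv naming scheme are assumptions for illustration, not necessarily what trulia_scrape.py does.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path
from typing import Dict, List, Optional


def write_daily_csv(
    listings: List[Dict[str, str]],
    out_dir: str = "daily_scrape_files",
    city: str = "austin",
    day: Optional[date] = None,
) -> Path:
    """Write one day's scrape to <out_dir>/<city>_<YYYY-MM-DD>.csv and return the path."""
    day = day or date.today()
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{city}_{day.isoformat()}.csv"
    # Union of keys across rows so a listing missing a field still gets a column.
    fieldnames = sorted({key for row in listings for key in row})
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing fields become ""
        writer.writeheader()
        writer.writerows(listings)
    return path


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = write_daily_csv(
        [{"address": "123 Main St", "price": "1450"}, {"address": "456 Oak Ave"}],
        out_dir=tmp,
        day=date(2021, 3, 1),
    )
    print(saved.name)  # austin_2021-03-01.csv
    print(saved.read_text(encoding="utf-8"))
```

Putting the date in the filename keeps each scrape separate, so a later step can load any subset of days into the database.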

How To Use

# Clone this repository
$ git clone https://github.com/datapointchris/etl_housing

# Go into the repository
$ cd etl_housing

# Run the app
$ python scraper.py

The program will begin scraping Trulia for rental listings. Currently only Austin rentals have been tested; other cities and search terms will be supported in future versions.

Jupyter notebooks are also included in the repo; in them you can run the program and change page_url to scrape different cities.

Requirements

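Of the packages below, only sqlite3 ships with Python's standard library. NumPy, pandas, Requests, and Beautiful Soup are third-party packages; install any you are missing with pip (Beautiful Soup's package name is beautifulsoup4).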

  • NumPy
  • pandas
  • Requests
  • Beautiful Soup
  • sqlite3
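To see which of these are missing before running the scraper, a small check with importlib.util.find_spec works without importing anything heavy. A minimal sketch; the REQUIREMENTS mapping and the missing_packages helper are hypothetical names, and the mapping simply mirrors the list above (note that Beautiful Soup is imported as bs4 but installed as beautifulsoup4).

```python
import importlib.util
from typing import Dict, List, Optional

# import name -> name to pass to pip (None = normally bundled with CPython)
REQUIREMENTS: Dict[str, Optional[str]] = {
    "numpy": "numpy",
    "pandas": "pandas",
    "requests": "requests",
    "bs4": "beautifulsoup4",
    "sqlite3": None,
}


def missing_packages(requirements: Dict[str, Optional[str]] = REQUIREMENTS) -> List[str]:
    """Return the pip names of requirements that cannot be imported."""
    missing = []
    for module, pip_name in requirements.items():
        # find_spec checks importability without actually importing the module
        if importlib.util.find_spec(module) is None:
            missing.append(pip_name or module)
    return missing


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All requirements are available.")
```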

Credits

License

MIT

How To

Redis Docker

# Run the Redis server container in the background
docker run -d --name redis-server -p 6379:6379 redis
# Create a user-defined network
docker network create page-tracker-network
# Attach the running Redis container to the network
docker network connect page-tracker-network redis-server
# Open redis-cli in a temporary container on the same network
docker run -it --network page-tracker-network --rm redis redis-cli -h redis-server
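The network name suggests these containers back a page-tracker app. The README does not include that app, so the following is a hedged sketch of how Python code might use this Redis server through the redis-py package (pip install redis). The page_views key and the function names are hypothetical. The counter logic accepts any object with a Redis-style incr(), so it can be tried without a running server.

```python
from typing import Protocol


class CounterStore(Protocol):
    """Anything with a Redis-style incr(), such as a redis.Redis client."""

    def incr(self, name: str, amount: int = 1) -> int: ...


def record_page_view(store: CounterStore, key: str = "page_views") -> int:
    """Increment the view counter (INCR is atomic in Redis) and return the new total."""
    return store.incr(key)


def make_client(host: str = "redis-server", port: int = 6379):
    """Connect with the redis-py package.

    Use host="redis-server" from a container attached to page-tracker-network,
    or host="localhost" from the host machine, since port 6379 is published.
    """
    from redis import Redis  # imported lazily so the rest runs without redis-py

    return Redis(host=host, port=port)
```

With the containers above running, record_page_view(make_client("localhost")) from the host should return an increasing count, and GET page_views in the redis-cli session shows the same value.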


Contributors

datapointchris
