groceries's Introduction

Groceries

The goal of this project is to scrape grocery prices from different websites, calculate derived values such as price per ounce, store the results in a database over time, and allow queries against them.

Installation instructions

  • Clone the repository
  • cd into Groceries
  • Run "startcore.sh" to start the core containers
  • Create a database called "groceries" in Adminer (port 8081)
  • Run "scripts/init_db.sh", which runs the flask commands that create the database structure
  • Spin up the various crawlers by running the start scripts in the main directory
  • Stop the various crawlers by running the drop scripts in the main directory

Database updates (after the "groceries" database is initialized)

Updates to the data structure are made by changing models.py in flask/code/app. Use the helper script scripts/update_db.sh to upgrade your database using the flask db commands.

groceries's People

Contributors

gobfink, kgustafson, alexkayser, akayser2

groceries's Issues

Add urls to each item

We should add a link to the page where each item is found, so that we can see more details about it.

Switch to High Performance Database

We want a database that can support the ingestion of massive amounts of information. Cassandra seems like a good option for this task.

Initial success will look like being able to spin up a container running an instance of Cassandra.

Add feedback to show user what is being searched and sorted

Add feedback and code to show the user what is being searched and sorted. The message should be human readable, for example: Showing all groceries, or Showing groceries sorted by Grocery Name, Descending, Where Store Name like 'weg' and Grocery Name like 'tuna'.
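
One way the message could be assembled is sketched below. The function name describe_query and its parameters are hypothetical and not taken from the project; the wording follows the examples in this issue.

```python
from typing import Mapping, Optional


def describe_query(filters: Mapping[str, str],
                   sort_by: Optional[str] = None,
                   descending: bool = False) -> str:
    """Build a human-readable summary of the active search and sort."""
    if not filters and sort_by is None:
        return "Showing all groceries"

    message = "Showing groceries"
    if sort_by is not None:
        direction = "Descending" if descending else "Ascending"
        message += f" sorted by {sort_by}, {direction}"
    if filters:
        # One "<column> like '<term>'" clause per active filter.
        clauses = [f"{column} like '{term}'" for column, term in filters.items()]
        separator = ", Where " if sort_by is not None else " where "
        message += separator + " and ".join(clauses)
    return message


print(describe_query({"Store Name": "weg", "Grocery Name": "tuna"},
                     sort_by="Grocery Name", descending=True))
```

Keeping the text generation in one function means the frontend only has to pass in the same filter and sort values it already sends to the query.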

Set up some automated tests

It would be great to have a test framework that helps us debug problems with the backend, the frontend, the scrapers, etc.

Make price per unit comparable

The websites report price per unit in different formats. This makes the prices hard to compare and makes it hard to see which item is the better deal.
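
A minimal sketch of one possible normalization follows, assuming the scraped values are strings such as "$3.99/lb" or "12.5¢/oz". These input formats, the function name price_per_ounce, and the unit table are assumptions for illustration; the real site formats would need to be checked. Only weight units are handled, since volume units cannot be converted to ounces of weight without knowing the product.

```python
import re

# Ounces per unit of weight (1 lb = 16 oz, 1 oz = 28.349523125 g).
OUNCES_PER_UNIT = {
    "oz": 1.0,
    "lb": 16.0,
    "g": 1 / 28.349523125,
    "kg": 1000 / 28.349523125,
}

# Matches e.g. "$3.99/lb", "$0.25 / oz", "12.5¢/oz", "$8.80 per kg".
PRICE_PER_UNIT = re.compile(
    r"^\s*\$?\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<cents>¢)?\s*(?:/|per)\s*"
    r"(?P<unit>[a-z]+)\.?\s*$",
    re.IGNORECASE,
)


def price_per_ounce(text: str) -> float:
    """Convert a price-per-unit string to dollars per ounce."""
    match = PRICE_PER_UNIT.match(text)
    if match is None:
        raise ValueError(f"unrecognized price format: {text!r}")
    unit = match.group("unit").lower()
    if unit not in OUNCES_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    amount = float(match.group("amount"))
    if match.group("cents"):
        amount /= 100  # a value marked with ¢ is in cents
    return amount / OUNCES_PER_UNIT[unit]


for raw in ("$3.99/lb", "$0.25/oz", "12.5¢/oz"):
    print(raw, "->", round(price_per_ounce(raw), 4))
```

Storing the normalized value next to the raw string would keep the original text available if a format is later found to be parsed incorrectly.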

Add urls to the frontend

It would be nice to have a URL (or a clickable button) that takes you to the item's page when looking through the frontend.

Look into elasticsearch

I was wondering whether we could use Elasticsearch to improve our searches. It would be worth spending a few cycles to at least check it out and see whether it makes sense for us.

Figure out why walmart is blocking us

The Walmart scraper gets redirected to walmart.com/blocked* when started with docker-compose up walmart-urls. However, it seems that running it this way:

[dev] $ docker-compose up
[walmart] $ scrapy runspider urlScraper

works just fine, so we need to figure out what is different.

Create a way to ingest data from a webpage

We need to follow each website's robots.txt when ingesting data from it.

I think an easy first step would be to manually download some HTML and prototype a scraper.
Then move that into a Chrome extension.
Then work on automating it, paying careful attention to the site's robots.txt file.
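
As a rough illustration of the robots.txt check, the sketch below uses Python's standard urllib.robotparser on an inline sample file. The sample rules, the user agent name "groceries-bot", and the example.com URLs are made up for the example. In a Scrapy-based crawler, the ROBOTSTXT_OBEY setting covers the same concern.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, used instead of a network fetch.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Allow: /
"""


def build_robots_checker(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a checker without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


checker = build_robots_checker(SAMPLE_ROBOTS_TXT)
for url in ("https://example.com/produce", "https://example.com/cart"):
    allowed = checker.can_fetch("groceries-bot", url)
    print(url, "allowed" if allowed else "blocked")
```

For a live site, RobotFileParser.set_url() followed by read() would fetch the real file instead of the inline sample.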

Track section quantity

Some stores report the number of items in each section. That seems like useful data to gather, because it would show how effective the scraper is.
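
One possible use of that number is a per-section coverage ratio, sketched below. The function name section_coverage and the dictionary layout (section name mapped to item count) are assumptions, not the project's schema.

```python
from typing import Dict, Mapping


def section_coverage(reported: Mapping[str, int],
                     scraped: Mapping[str, int]) -> Dict[str, float]:
    """Return scraped/reported per section, skipping sections with no reported count."""
    coverage = {}
    for section, expected in reported.items():
        if expected <= 0:
            continue  # a ratio is undefined without a positive reported count
        coverage[section] = scraped.get(section, 0) / expected
    return coverage


print(section_coverage({"Dairy": 200, "Produce": 50}, {"Dairy": 150}))
```

A ratio well below 1.0 for a section would suggest the scraper is missing pages or items there.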

Create Pagination System

Show only 25 or 50 items at a time, and provide a paging system to step through all the pages of grocery information.
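
The paging arithmetic could look roughly like the sketch below, which slices an in-memory list. The function name paginate and the 1-based page numbering are assumptions. In practice the slicing would more likely happen in the database query (for example with LIMIT and OFFSET) than in Python.

```python
from math import ceil
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], page: int, per_page: int = 25) -> Tuple[List[T], int]:
    """Return the items on a 1-based page and the total number of pages."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    total_pages = max(1, ceil(len(items) / per_page))
    start = (page - 1) * per_page
    return list(items[start:start + per_page]), total_pages


groceries = [f"item {n}" for n in range(60)]
page_items, total_pages = paginate(groceries, page=3)
print(len(page_items), "items on page 3 of", total_pages)
```

A page number past the end returns an empty list, so the caller can decide whether to show an empty page or redirect to the last one.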

Make sure they can run concurrently

We will need to change how we do the Splash middleware so that different ones can be set up to run independently without stepping on each other.

Implement authors

We need a way for users to sign in, and a way to perform user management in the database.
