flywhy's Introduction

👋 I'm mikelor

IT leader by day, Malformed URI by night. This is my public-facing repository, where I work on personal projects and learn new things. I'm currently combining my development skills with my passion for travel.

flywhy's People

Contributors

dependabot[bot], mikelor, ssweens

Forkers

ssweens

flywhy's Issues

Move getReviews.py to src/data folder and write output file to /data/raw folder

To conform with the Data Science template, we should move the getReviews.py file to the src/data folder. This folder is used for populating the data/* folders.

In this case, the script's output file should go into the data/raw folder.

Acceptance

  • getReviews.py resides in /src/data folder
  • outputs data file to /data/raw folder
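
A minimal sketch of how the relocated script might find the data/raw folder, assuming it lives at src/data/getReviews.py under the project root. The function name raw_output_path and the default file name reviews.csv are hypothetical, not taken from the repository.

```python
from pathlib import Path


def raw_output_path(script_file, filename="reviews.csv"):
    """Return <project root>/data/raw/<filename> for a script in src/data.

    Assumption: script_file is <project root>/src/data/<script>.py, so the
    project root is the third parent of the resolved script path.
    """
    project_root = Path(script_file).resolve().parents[2]
    return project_root / "data" / "raw" / filename
```

Inside getReviews.py this could be called as raw_output_path(__file__), followed by out.parent.mkdir(parents=True, exist_ok=True) before the file is opened for writing.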

Add Ability To Start Run at Specific Review

By default the application starts at the main review page. However, due to Internet connection and other issues, it's very difficult to get the process to run to completion on all reviews successfully.

A way to make this easier is to start a process at a particular review and continue processing.

In addition, the process should log the last URL reviewed so that the next one in line can be easily found.
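
One way this could look, as a rough sketch rather than the actual getReviews.py logic: skip ahead to a given start URL, then log each URL once it has been handled. The names urls_from, process_reviews, handle, and start_url are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("getReviews")


def urls_from(urls, start_url=None):
    """Yield URLs in order, skipping everything before start_url if given."""
    started = start_url is None
    for url in urls:
        if not started and url == start_url:
            started = True
        if started:
            yield url


def process_reviews(urls, handle, start_url=None):
    """Run handle(url) on each URL and log the last one that completed.

    Returns the last URL processed, or None if nothing was processed.
    """
    last_url = None
    for url in urls_from(urls, start_url):
        handle(url)  # hypothetical per-review scraping step
        last_url = url
        logger.info("Last URL reviewed: %s", url)
    return last_url
```

If handle raises partway through, the most recent "Last URL reviewed" log entry identifies where the run stopped, and the URL after it can be passed as start_url on the next run.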

Remove Unused/Deprecated Files

When initially creating this project, I included some sample files to help me get started, or try different ideas. These are no longer valid and should be removed.

scrappy.py

Investigate Tox for Automated Testing

The CookieCutter Data Science template sets up a Tox.ini file. Tox aims to automate and standardize testing in Python. It is part of a larger vision of easing the packaging, testing and release process of Python software.

This looks like it could be valuable. If it is, set up Tox; if not, delete the Tox.ini file.

Parse Reviewer.Id to remove Pre & Post Txt

The Reviewer.Id field is a concatenation of UID_xxx_ReviewId. See the sample below:

  • UID_A455850D086316E0157BE50C4EB2115E-SRC_773635392

We need to determine what TA uses for the actual user id, and only store that portion in the output file.
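
As a starting point, the sample value can be split into its two parts with a regular expression. This sketch assumes every id follows the shape of the single sample above (UID_ plus hex characters, then -SRC_ plus digits); it does not decide which part TA treats as the real user id. The names REVIEWER_ID_PATTERN and split_reviewer_id are hypothetical.

```python
import re

# Pattern inferred from the one sample id; other shapes are not handled.
REVIEWER_ID_PATTERN = re.compile(
    r"^UID_(?P<uid>[0-9A-Fa-f]+)-SRC_(?P<src>\d+)$"
)


def split_reviewer_id(raw_id):
    """Return (uid, src) from a raw Reviewer.Id, or None if it does not match."""
    match = REVIEWER_ID_PATTERN.match(raw_id.strip())
    if match is None:
        return None
    return match.group("uid"), match.group("src")
```

Returning None for anything that does not match makes unexpected id formats visible instead of silently storing a partial value.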

Add DrawIo Diagram illustrating current solution

Draw.io, aka https://app.diagrams.net/, is a great tool for creating diagrams. Because it stores its files as .xml, it is very version control friendly. There is also a Visual Studio Code plugin that makes it easy to display diagrams while you're working on code.

Add a diagram illustrating the screen scraping flow.

Not required, but issue #4 should be done before working on this to avoid the need to move another file around.

Stream Output to CSV File Instead of All at Once

Right now the getReviews.py file loads the entire review structure into memory before writing out the file. Instead, we should look at streaming the reviews out to the file a page at a time. This should reduce the memory footprint of the Python application.
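
A minimal sketch of page-at-a-time output using the standard csv module. It assumes each page arrives as a list of dicts keyed by column name; the function name write_reviews_by_page and that data shape are assumptions, not the current getReviews.py structure.

```python
import csv


def write_reviews_by_page(pages, handle, fieldnames):
    """Write each page of review rows to an open text file as it arrives.

    pages: any iterable (ideally a generator) of lists of dicts.
    handle: a file opened for writing with newline="".
    Returns the number of rows written.
    """
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    rows_written = 0
    for page in pages:
        writer.writerows(page)  # only the current page is held in memory
        handle.flush()          # push the page to disk before fetching more
        rows_written += len(page)
    return rows_written
```

The caller would open the target with open(path, "w", newline="", encoding="utf-8") and pass a generator that scrapes one page per iteration. Flushing after each page also means a run that fails midway still leaves the completed pages in the file.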
