
bismol's People

Contributors

austinpgraham, cegme, jaredbond, joshimbriani, ninacesare, smalgireddy

Forkers

beartell

bismol's Issues

Fix interfaces

With the jobandworker branch merged, all interface fields previously called 'url' should be renamed to 'id'.

Collecting tweets over the continent of Africa

We can collect tweets for the continent of Africa, but we need to make a few decisions:

  • Which languages will we accept? English only?
  • Which bounding box(es) count as Africa?
  • Should we include (0°, 0°)?
  • Should we use multiple ingestors for different regions?

Thanks!
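Here is a minimal Python sketch of a filter an ingestor could apply to each geotagged tweet, to make the decisions above concrete. It assumes each tweet provides a longitude, latitude, and language code. The function names, box values, and language set are hypothetical placeholders, because choosing the real boxes for Africa is exactly what this issue needs to settle. Boxes are written southwest corner first, in (longitude, latitude) order, which is the convention Twitter's streaming locations parameter uses.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical box layout: (west_lon, south_lat, east_lon, north_lat),
# i.e. the southwest corner first, then the northeast corner.
BoundingBox = Tuple[float, float, float, float]


def in_any_box(lon: float, lat: float, boxes: Iterable[BoundingBox]) -> bool:
    """Return True if (lon, lat) lies inside at least one box (edges inclusive)."""
    return any(w <= lon <= e and s <= lat <= n for w, s, e, n in boxes)


def keep_tweet(
    lon: float,
    lat: float,
    lang: Optional[str],
    boxes: Iterable[BoundingBox],
    languages: Optional[Set[str]] = None,
    include_null_island: bool = False,
) -> bool:
    """Decide whether a geotagged tweet belongs in the collection.

    languages=None accepts every language. include_null_island controls
    whether the (0, 0) coordinate, often a placeholder for missing
    location data, is kept.
    """
    if not include_null_island and lon == 0 and lat == 0:
        return False
    if languages is not None and lang not in languages:
        return False
    return in_any_box(lon, lat, boxes)
```

With this shape, running multiple ingestors amounts to giving each one its own subset of boxes.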

Checking for unknown file

When running on a local system, I encounter this error:
FileNotFoundError: [Errno 2] No such file or directory: '/data/tweetsdb/tweet_health_20161231151722.json'
Should the code be checking for this file at all?
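One way to avoid this kind of crash is sketched below in Python. It assumes the loader currently opens a fixed path. The sketch discovers the tweet dumps that actually exist under a configurable data directory and treats a missing file as a warning rather than a fatal error. The glob pattern and both function names are hypothetical.

```python
import glob
import logging
import os
from typing import List, Optional

logger = logging.getLogger(__name__)


def find_tweet_files(data_dir: str, pattern: str = "tweet_*.json") -> List[str]:
    """List tweet dumps that actually exist under data_dir instead of assuming a fixed filename."""
    return sorted(glob.glob(os.path.join(data_dir, pattern)))


def read_if_exists(path: str) -> Optional[str]:
    """Return the file's text, or None (with a warning) if the file is missing.

    Opening and catching FileNotFoundError avoids a race between an
    existence check and the open call.
    """
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        logger.warning("Skipping missing tweet file: %s", path)
        return None
```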

Verbs and exclusion terms

We may want to link the following verbs to the following exclusion terms in the exercise data collection queries:

Run/running/ran (-wild, -late, -errands, -water)
Hike/hiking/hiked (-rent, -wage, -fare)
Yoga (-pants)
Pool (-union, -uber)
Walk/walking/walked (-dead, into in, ins)
Surf/surfing/surfed (-internet, -web, -crowd, -channel)

I think these exclusions should handle most of the ambiguous terms without returning much irrelevant data!
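The following minimal Python sketch applies these pairings as a post-filter on collected tweet text. A tweet is kept if it contains a verb form from some row and none of that row's exclusion terms. The rule table mirrors the list above except for the walk row. Only -dead is included there, because the other walk terms are unclear. The function names and the simple word tokenizer are assumptions. If the collection query syntax supports a leading '-' for exclusion, the same table could generate query strings instead.

```python
import re
from typing import Dict, List, Set, Tuple

# Verb forms -> exclusion terms, following the proposal in this issue.
RULES: Dict[Tuple[str, ...], List[str]] = {
    ("run", "running", "ran"): ["wild", "late", "errands", "water"],
    ("hike", "hiking", "hiked"): ["rent", "wage", "fare"],
    ("yoga",): ["pants"],
    ("pool",): ["union", "uber"],
    # Only "dead" is included; the other walk exclusions in the issue are unclear.
    ("walk", "walking", "walked"): ["dead"],
    ("surf", "surfing", "surfed"): ["internet", "web", "crowd", "channel"],
}


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens (letters and apostrophes only)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def is_exercise_tweet(text: str) -> bool:
    """Keep a tweet if some verb row matches and none of that row's exclusions appear."""
    words = tokens(text)
    for verbs, excluded in RULES.items():
        if words & set(verbs) and not words & set(excluded):
            return True
    return False
```

Because exclusions apply per row, "ran late" is rejected by the run row but the tweet can still be kept if it also matches another row, such as yoga.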

How does a user distinguish points of interest

Points of interest are ambiguous or ill-defined points that the algorithm cannot cleanly cluster. Humans can help cluster these points to reduce the total clustering time.
To implement this, we can do the following:

  • Keep track of the distance each point travels during the clustering process.
  • Identify the points that have traveled the most and the least. We can select them by a fixed number from the top and bottom, by a percentage, or by quartile.
  • In addition to moving points, we can allow the user to (1) scatter a point or (2) accept its current position. Scatter randomizes the point's position so the clustering process restarts for it; accept stops that data point from moving.

The scatter and accept functions could also be a first step toward supporting streaming data.
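The bookkeeping above could look roughly like the following Python sketch. It uses a hypothetical TravelTracker class and assumes 2-D positions. It accumulates each point's Euclidean displacement per iteration and returns the k most- and least-traveled points. It implements accept as freezing a point, and scatter as moving the point to a random position and resetting its travel. In a real integration, frozen and scattered positions would also need to be fed back to the clustering algorithm, which this sketch does not do.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


class TravelTracker:
    """Hypothetical helper: accumulates how far each point moves across clustering iterations."""

    def __init__(self, positions: Sequence[Point]):
        self.positions: List[Point] = list(positions)
        self.traveled: List[float] = [0.0] * len(self.positions)
        self.accepted: Set[int] = set()

    def update(self, new_positions: Sequence[Point]) -> None:
        """Record one iteration; accepted points keep their position and accrue no distance."""
        for i, new in enumerate(new_positions):
            if i in self.accepted:
                continue
            self.traveled[i] += math.dist(self.positions[i], new)
            self.positions[i] = new

    def extremes(self, k: int) -> Tuple[List[int], List[int]]:
        """Return (k most-traveled, k least-traveled) point indices."""
        order = sorted(range(len(self.traveled)), key=self.traveled.__getitem__)
        return order[::-1][:k], order[:k]

    def accept(self, i: int) -> None:
        """User accepts point i: freeze it at its current position."""
        self.accepted.add(i)

    def scatter(self, i: int, bounds: Tuple[float, float], rng=random) -> None:
        """User scatters point i: random new position, travel reset, no longer frozen."""
        lo, hi = bounds
        self.positions[i] = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        self.traveled[i] = 0.0
        self.accepted.discard(i)
```

A percentage or quartile cutoff reduces to choosing k. For example, k = max(1, round(0.25 * n)) selects the top and bottom quartile of n points.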

Link commits related to this issue by adding the issue number to the commit message.

Particles are moving quickly

The particles in the t-SNE animation move quickly, and we need to understand their movement better.

  • Show visual trails of the particles moving in t-SNE.
  • Measure, and possibly show, the delay between changes and updates.
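Both tasks need a little state on the client side. The Python sketch below keeps a bounded history of positions per particle (a deque with maxlen) for drawing trails. It also records the delay between a data change and the matching display update. The class names, trail length, and event keys are hypothetical. The delay measurement assumes both events are timestamped by the same process, since perf_counter values from different machines are not comparable.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Hashable, List, Optional, Tuple

Point = Tuple[float, float]


class TrailBuffer:
    """Keep the last `length` positions of each particle so a renderer can draw a trail."""

    def __init__(self, length: int = 10):
        self.length = length
        self.trails: Dict[Hashable, Deque[Point]] = {}

    def record(self, particle_id: Hashable, pos: Point) -> None:
        trail = self.trails.setdefault(particle_id, deque(maxlen=self.length))
        trail.append(pos)  # the oldest position drops off once maxlen is reached

    def trail(self, particle_id: Hashable) -> List[Point]:
        """Oldest-to-newest positions for one particle (empty if unknown)."""
        return list(self.trails.get(particle_id, ()))


class UpdateDelayMeter:
    """Measure the delay between a data change and the matching display update."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self.clock = clock
        self._pending: Dict[Hashable, float] = {}
        self.delays: List[float] = []

    def changed(self, key: Hashable) -> None:
        self._pending[key] = self.clock()

    def updated(self, key: Hashable) -> Optional[float]:
        """Return the delay for key, or None if no change was recorded for it."""
        start = self._pending.pop(key, None)
        if start is None:
            return None
        delay = self.clock() - start
        self.delays.append(delay)
        return delay
```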

Design Experiment for Users

To understand how users perceive assisting with the clustering process, we will need to design an experiment. It will test whether humans can assist the clustering process and let us understand whether they are actually helping and how much they think they are helping.

  • We will also need to design a small training module to familiarize users with the clustering method.

Remove large objects from the repo

We committed some very large objects to the repo; I believe they are the RethinkDB logs. We can remove them using something like git filter-branch --tree-filter 'rm -f filename' HEAD, where filename is the name of the large file. The -f keeps the filter from failing on commits where the file does not exist. But you should double-check that this command does what we expect (and doesn't delete the whole repo) before running it.
Also, it is good practice to add files one at a time (git add filename) so you don't include any stray log or tmp files.
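A short script can list files that deserve a second look before staging. This Python sketch walks the working tree, skipping .git, and flags files that match log/tmp patterns or exceed a size threshold. The patterns and the 10 MiB limit are hypothetical defaults. The script only reports candidates and does not rewrite history.

```python
import fnmatch
import os
from typing import List, Sequence

SUSPECT_PATTERNS = ("*.log", "*.tmp", "*~")  # hypothetical; extend as needed
MAX_BYTES = 10 * 1024 * 1024                 # hypothetical 10 MiB threshold


def files_to_review(
    root: str,
    patterns: Sequence[str] = SUSPECT_PATTERNS,
    max_bytes: int = MAX_BYTES,
) -> List[str]:
    """List paths (relative to root) that look like logs/tmp files or exceed max_bytes."""
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # prune .git in place
        for name in filenames:
            path = os.path.join(dirpath, name)
            if any(fnmatch.fnmatch(name, p) for p in patterns) or os.path.getsize(path) > max_bytes:
                flagged.append(os.path.relpath(path, root))
    return sorted(flagged)
```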

Support methods of highlighting and gesturing

If we want to select multiple items:

  • If you select an item, temporarily grey out everything else so we can see its new movement.
  • When clicking and holding, the capture radius for selecting items should grow with how long the screen is held.
  • Show comet trails so we can see the direction of the particles.
  • Add a gesture for multi-select and un-selecting.
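The hold-to-grow selection and the multi-select toggle could be modeled as below, using Python for illustration. The linear growth rate, base radius, and cap (in pixels) are hypothetical parameters. The toggle implements multi-select and un-select as a symmetric difference, so selecting an already-selected item removes it.

```python
import math
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Point = Tuple[float, float]


def capture_radius(hold_seconds: float, base: float = 5.0,
                   growth: float = 20.0, max_radius: float = 100.0) -> float:
    """Radius grows linearly with hold time, capped at max_radius (hypothetical pixel values)."""
    return min(base + growth * max(hold_seconds, 0.0), max_radius)


def select_within(center: Point, points: Dict[Hashable, Point], radius: float) -> List[Hashable]:
    """IDs of points whose distance from center is at most radius."""
    cx, cy = center
    return [pid for pid, (x, y) in points.items() if math.hypot(x - cx, y - cy) <= radius]


def toggle_selection(selected: Set[Hashable], ids: Iterable[Hashable]) -> Set[Hashable]:
    """Multi-select sketch: already-selected ids are removed, new ids are added."""
    return selected ^ set(ids)
```

Greying out everything else is then just the complement, set(points) - selected.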

Others?

Define functional relation between Job and worker

Worker should manage:

  • input and output of data
  • model training with job
  • classification with job
  • other functions as needed (training cycles, etc.)

And Job should provide functions:

  • training(**kwargs), where kwargs is a dictionary of named arguments and their values
  • run classifier(iterator, stop_condition), where the iterator is "live", i.e. it can be modified and added to while it is being iterated over. The iterator should yield message objects, and the worker can change the stop condition in real time. This should return another iterator of classified message objects.
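The following minimal Python sketch shows this split. It assumes a simple text Message object and spells "run classifier" as run_classifier. The live iterator is a deque that can be appended to while it is being consumed. The stop condition is a threading.Event that the worker can set at any time. The keyword-matching Job is a toy stand-in for a real model, and every class name here is hypothetical.

```python
import threading
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Set


@dataclass
class Message:
    """Hypothetical message object; the real interface may differ."""
    text: str
    label: Optional[str] = None


class LiveIterator:
    """Iterator over a deque that may be appended to while it is being consumed.

    It raises StopIteration whenever it is momentarily empty; a production
    worker would block or poll for new messages instead.
    """

    def __init__(self, items: Iterable[Message] = ()):
        self._items = deque(items)

    def add(self, msg: Message) -> None:
        self._items.append(msg)

    def __iter__(self) -> "LiveIterator":
        return self

    def __next__(self) -> Message:
        if not self._items:
            raise StopIteration
        return self._items.popleft()


class Job:
    """Toy job: labels a message 'match' if it contains a trained keyword."""

    def __init__(self) -> None:
        self.keywords: Set[str] = set()

    def training(self, **kwargs) -> None:
        # Hypothetical kwarg: keywords=[...] lists the positive terms.
        self.keywords.update(w.lower() for w in kwargs.get("keywords", []))

    def run_classifier(self, iterator: Iterator[Message],
                       stop_condition: threading.Event) -> Iterator[Message]:
        # Check the stop condition before pulling each message, so the worker
        # can halt classification between messages without dropping one.
        while not stop_condition.is_set():
            try:
                msg = next(iterator)
            except StopIteration:
                return
            words = set(msg.text.lower().split())
            msg.label = "match" if words & self.keywords else "other"
            yield msg


class Worker:
    """Owns data input/output and drives the job's training and classification."""

    def __init__(self, job: Job) -> None:
        self.job = job
        self.inbox = LiveIterator()
        self.stop = threading.Event()

    def train(self, **kwargs) -> None:
        self.job.training(**kwargs)

    def classify(self) -> Iterator[Message]:
        return self.job.run_classifier(self.inbox, self.stop)
```

Messages added to the inbox after classify() has started are still picked up, as long as the generator has not yet hit an empty inbox.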

Create a function for animation speed

Measure the time between t-SNE running on the server and the resulting client updates, for different data sizes, in order to approximate a function for the animation speed.
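One way to turn such measurements into a function assumes the delay grows roughly linearly with data size. The real relationship may not be linear, so the fit should be checked against the measurements. The approach is to time each round trip, then fit delay ≈ intercept + slope * size by ordinary least squares. The function names in this Python sketch are hypothetical.

```python
import time
from typing import Callable, Sequence, Tuple


def time_call(fn: Callable[[], object]) -> float:
    """Wall-clock seconds for one call of fn (e.g. one server t-SNE step through to client update)."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def fit_linear(sizes: Sequence[float], delays: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of delay = intercept + slope * size."""
    n = len(sizes)
    if n < 2 or n != len(delays):
        raise ValueError("need at least two (size, delay) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(delays) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("need at least two distinct sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, delays))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predicted_delay(size: float, intercept: float, slope: float) -> float:
    """Expected delay for a data size, e.g. to set the animation transition duration."""
    return intercept + slope * size
```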

Dockerized tweet collector and tweet store

To start fetching tweets on the UW server, we need to dockerize the process. That is, we need a PostgreSQL Docker setup plus a separate container that pulls the tweets. Below are the items that need to be developed to get this system running.

  • Allow the PostgreSQL Docker server to use persistent external volumes.
  • Create a backup system for the tweet database.
  • Implement public networking between PostgreSQL and any other web host or Docker container.
  • Upload the customized PostgreSQL Docker service to Docker Hub.
  • Create a Docker service for the tweet collector.
  • Upload the tweet collector Docker image to Docker Hub.
  • Update passwords and security settings.
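For the backup item, this minimal Python sketch builds and runs a pg_dump command that produces a timestamped custom-format archive, which pg_restore can restore. The host, port, user, and database names are hypothetical. Scheduling (cron, a separate container, etc.) is left open. The password is expected to come from PGPASSWORD or a .pgpass file rather than the command line.

```python
import datetime as dt
import os
import subprocess
from typing import List


def backup_command(host: str, port: int, user: str, dbname: str,
                   backup_dir: str, now: dt.datetime) -> List[str]:
    """Build a pg_dump command that writes a timestamped custom-format archive."""
    stamp = now.strftime("%Y%m%d%H%M%S")
    outfile = os.path.join(backup_dir, f"{dbname}_{stamp}.dump")
    return [
        "pg_dump",
        f"--host={host}",
        f"--port={port}",
        f"--username={user}",
        "--format=custom",  # compressed archive, restorable with pg_restore
        f"--file={outfile}",
        dbname,
    ]


def run_backup(cmd: List[str]) -> None:
    """Run the dump and raise CalledProcessError if pg_dump fails."""
    subprocess.run(cmd, check=True)
```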
