oudalab / bismol
The Bismol classification system prototype
License: Other
With the jobandworker branch merge, all interface fields previously called 'url' should be changed to 'id'
Message object will need a list of tags to be modified for learning/training/classification
The USA/Africa collectors are fine; however, the illness and exercise collectors are getting HTTP 420s from Twitter.
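One way the collectors could react to 420s is to wait longer after each failed attempt. The sketch below assumes the 420s are rate-limit responses; the starting wait of 60 seconds, the doubling factor, the cap, and the `connect` callable are all illustrative assumptions and are not taken from the bismol code.

```python
import time


def backoff_delays(first_wait=60.0, factor=2.0, max_wait=960.0):
    """Yield an endless series of waits that grows by `factor` up to `max_wait`."""
    wait = first_wait
    while True:
        yield min(wait, max_wait)
        wait *= factor


def connect_with_backoff(connect, max_attempts=5, sleep=time.sleep):
    """Call `connect()` until it stops returning 420 or attempts run out.

    `connect` is a hypothetical callable that returns an HTTP status code.
    """
    delays = backoff_delays()
    for _ in range(max_attempts):
        status = connect()
        if status != 420:
            return status
        # Rate limited: pause before the next attempt.
        sleep(next(delays))
    return 420
```

Passing `sleep` as a parameter keeps the function testable without real waiting.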
We can collect tweets for the continent of Africa but we need to make a few decisions:
Thanks!
When running on local system, I encounter this error:
FileNotFoundError: [Errno 2] No such file or directory: '/data/tweetsdb/tweet_health_20161231151722.json'
Shouldn't we be checking whether this file exists?
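A guard along these lines would avoid the FileNotFoundError on machines that do not have the /data/tweetsdb directory. This is only a sketch: the function name `load_tweets` is hypothetical, and treating a missing file as "no tweets yet" is an assumption about the intended behavior.

```python
import json
from pathlib import Path


def load_tweets(path):
    """Return parsed JSON from `path`, or an empty list if the file is missing."""
    file_path = Path(path)
    if not file_path.is_file():
        # Assumption: a missing dump means "no tweets yet", not a fatal error.
        return []
    with file_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```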
We may want to link the following verbs to the following exclusion terms in the exercise data collection queries:
Run/running/ran (-wild, -late, -errands, -water)
Hike/hiking/hiked (-rent, -wage, -fare)
Yoga (-pants)
Pool (-union, -uber)
Walk/walking/walked (-dead, -into, -in, -ins)
Surf/surfing/surfed (-internet, -web, -crowd, -channel)
I think that should capture most of the more ambiguous terms without returning a lot of irrelevant data!
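The pairing of verbs and exclusion terms could be kept in one table and turned into query strings. The sketch below covers a few of the verb groups and assumes the collectors use a search endpoint that accepts `OR` and `-term` operators; `EXCLUSIONS` and `build_query` are hypothetical names.

```python
# Verb forms mapped to the terms that should be excluded alongside them.
EXCLUSIONS = {
    ("run", "running", "ran"): ["wild", "late", "errands", "water"],
    ("hike", "hiking", "hiked"): ["rent", "wage", "fare"],
    ("yoga",): ["pants"],
    ("pool",): ["union", "uber"],
    ("surf", "surfing", "surfed"): ["internet", "web", "crowd", "channel"],
}


def build_query(verbs, excluded):
    """Join verb forms with OR and append each excluded term with a leading '-'."""
    include = " OR ".join(verbs)
    if len(verbs) > 1:
        include = f"({include})"
    exclude = " ".join(f"-{term}" for term in excluded)
    return f"{include} {exclude}".strip()


QUERIES = [build_query(verbs, excluded) for verbs, excluded in EXCLUSIONS.items()]
```

For example, `build_query(("yoga",), ["pants"])` gives `yoga -pants`.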
Points of interest are ambiguous or ill-defined points which the algorithm cannot cleanly cluster. Humans can help cluster these points to reduce the total clustering time.
To implement this we can do the following.
The scatter and accept functions can lead us into the beginning of streaming data.
Link commits related to this point by adding the issue number to the commit message.
The particles in the t-SNE animation are moving quickly and we need to understand a little more about the movement.
In order to understand user perceptions of assisting with the clustering process we will need to design an experiment. This experiment will test the ability of humans to assist the clustering process and allow us to understand if humans are helping and how much they think they are helping.
We want to make bismol a package that can be built and installed with pip or easy_install.
Here is a guide for achieving this: https://python-packaging.readthedocs.org/en/latest/
We committed some very large objects to the repo. I believe they are the RethinkDB logs. We can remove them using something like git filter-branch --tree-filter 'rm -f filename' HEAD, where filename is the name of the large file. But you should double check to make sure this command does this (and doesn't delete the whole repo).
Also, it is good practice to git add files one by one so you don't include any extra log or tmp files.
If we want to select multiple
Others?
Worker should manage:
And Job should provide functions:
training(**kwargs), where kwargs is a dictionary of named arguments and values
run classifier(iterator, stop_condition), where the iterator is "live", i.e. it can be modified and added to while it's being iterated over. It should yield message objects, and the stop condition can be changed in real time by the worker. This should return another iterator of classified message objects.
I found that the issue with the labeler pulling the same tweet is that our database contains tweets with the same text but different ids, sometimes upwards of 20-30 times.
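The Job functions described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the method name `run_classifier`, the use of a `deque` as the "live" iterator, messages as plain dicts, and the placeholder labeling rule. A real worker would probably block and wait for new messages instead of stopping when the queue is empty.

```python
from collections import deque


class Job:
    def __init__(self):
        self.params = {}

    def training(self, **kwargs):
        # Placeholder: just remember the named arguments.
        self.params.update(kwargs)

    def classify_one(self, message):
        # Placeholder rule: attach a fixed label taken from the training arguments.
        label = self.params.get("default_label", "unknown")
        return {**message, "label": label}

    def run_classifier(self, messages, stop_condition):
        """Yield classified messages from a deque that may grow while we iterate.

        `stop_condition` is a callable checked before each message, so the
        worker can change its answer in real time.
        """
        while messages and not stop_condition():
            yield self.classify_one(messages.popleft())
```

Popping from the left of the deque, instead of looping over it with `for`, is what lets the worker append messages while classification is in progress.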
The first column of the neel data is actually an id. Tweets also have a unique id, see:
http://tkang.blogspot.com/2011/01/tweepy-twitter-api-status-object.html
so maybe it would be a good idea to have that in the message object? Idk, just an idea.
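A message object that carries the tweet id might look like the sketch below. The field names are assumptions, not the existing bismol Message class. Because the database holds tweets with the same text but different ids, the sketch also shows one way to collapse them by normalized text before labeling.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    id: str                                    # tweet id (or the id column of the data set)
    text: str
    tags: list = field(default_factory=list)   # labels used for training/classification


def unique_by_text(messages):
    """Keep the first message for each distinct text, ignoring case and spacing."""
    seen = set()
    unique = []
    for message in messages:
        key = " ".join(message.text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(message)
    return unique
```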
The code contains a few plots, but after executing the file no plots were created and no error was thrown.
Measure time between tsne running on server and client updates with different data sizes in order to approximate a function for the animation speed.
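The measurement could start from something like the following: time a step for several data sizes, then fit a function to the results. Here `fn` is a stand-in for whatever runs t-SNE and pushes an update to the client, and the straight-line fit is only a first guess, since the real cost may not grow linearly with data size.

```python
import time


def measure(fn, sizes):
    """Return the wall-clock seconds `fn(n)` takes for each n in `sizes`."""
    timings = []
    for n in sizes:
        start = time.perf_counter()
        fn(n)
        timings.append(time.perf_counter() - start)
    return timings


def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With `sizes` and the output of `measure`, `fit_line(sizes, timings)` gives a rough seconds-per-point estimate that the animation speed could be derived from.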
To start fetching tweets on the UW server we need to dockerize the process. That is, we need a PostgreSQL docker setup and also another container to pull the tweets. Below are the items that need to be developed to get this system running.