planrich / tu_aic_13
License: GNU General Public License v2.0
use tabs or similar to display the graphs (not all of them on one page),
like Google Analytics
don't run it every minute, an hour or so is enough
list the open tasks with pagination
write a small Selenium test project where we can simulate a lot of workers solving tasks
Both applications (main, crowd) should be deployable in a simple, mostly automated manner, preferably into a stable, production-ready state. This is not only to make development easier but also so that the TA can easily get the application running locally.
This means optionally pre-populating the database with known-good data, using Apache and mod_wsgi instead of the built-in application server, not relying on Cron, and logging properly. It also means the applications should be deployable outside and independently of OpenShift and work just as well (or better?).
Issue #17 is related to this, as are issues #23 and #24.
I'd start with making the applications deployable using Vagrant, so that anyone interested can just do a vagrant up and a few minutes later have a working deployment of our main and crowd applications running in two small virtual machines.
highlight searched keyword in the solve_task page
maybe in javascript
change the crowdsourcing application so that clicking 'solve a random task' really
takes a random task, not the first one.
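A minimal sketch of the random pick, assuming open tasks are available as a list of dicts with an "id" key and that already solved task IDs are known (for example from the session). The function name and data shape are hypothetical, not taken from the project:

```python
import random


def pick_random_task(open_tasks, solved_ids=(), rng=random):
    """Return a random open task the worker has not solved yet, or None.

    open_tasks: list of dicts with an "id" key (assumed shape).
    solved_ids: IDs to exclude, e.g. those stored in the worker's session.
    rng: object with a choice() method; injectable for testing.
    """
    candidates = [t for t in open_tasks if t["id"] not in solved_ids]
    if not candidates:
        return None
    return rng.choice(candidates)
```

With a very large task table it would likely be cheaper to let the database pick the random row instead of loading all open tasks into memory.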
Instead of Cron the applications should use Advanced Python Scheduler to run periodic jobs like the scraper or garbage collector. This offers greater control over running job instances, makes deployment easier (no external components) and allows for better cross-platform compatibility.
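As a rough sketch of what this could look like, assuming the APScheduler 3.x API (BackgroundScheduler and add_job with an interval trigger). The run_scraper function is a hypothetical stand-in for the real job, and the one-hour interval is only an example value:

```python
import logging

log = logging.getLogger("scheduler")


def run_scraper():
    """Hypothetical stand-in for the real scraper job."""
    log.info("scraper run started")
    return "scraped"


def start_scheduler(interval_hours=1):
    """Start a background scheduler that runs the scraper periodically."""
    # Imported lazily so this module still loads where APScheduler
    # is not installed (assumes APScheduler 3.x when it is).
    from apscheduler.schedulers.background import BackgroundScheduler

    scheduler = BackgroundScheduler()
    scheduler.add_job(
        run_scraper,
        "interval",
        hours=interval_hours,
        id="scraper",
        max_instances=1,  # never run two scraper instances in parallel
        coalesce=True,    # collapse missed runs into a single run
    )
    scheduler.start()
    return scheduler
```

The application would call start_scheduler() once at startup and scheduler.shutdown() on exit.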
It would be nice to keep track of the date since which we have been analyzing a keyword and the date since which a worker has been in our system. I suggest we add a datetime column to the worker and keywords tables.
The open_tasks list in the crowd app crashes when you want to show all open tasks (more than 100000) at once. We should only show a limited number of open_tasks on one page and allow scrolling through these pages.
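The page slicing could be sketched as follows. The function name, the default page size of 50 and the returned dict are assumptions for illustration:

```python
def paginate(items, page, per_page=50):
    """Return one page of items plus the data needed for prev/next links."""
    # Ceiling division; an empty list still counts as one (empty) page.
    total_pages = max(1, -(-len(items) // per_page))
    # Clamp out-of-range page numbers instead of failing.
    page = min(max(1, page), total_pages)
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "page": page,
        "total_pages": total_pages,
        "has_prev": page > 1,
        "has_next": page < total_pages,
    }
```

With more than 100000 rows the same arithmetic would better be pushed into the database query (LIMIT and OFFSET), so that only one page is ever loaded.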
we should provide a button on the list_tasks page with which we can solve a task directly
Reason: right now we have the problem that we always get random numbers and can't finish the rating because of the amount of data we get all the time.
so a simple button called "Solve" with a link to the ID will be enough.
Based on Richard's approach of marking already solved tasks in the session, we should not display these tasks on the list_tasks page (I don't know if we already implemented it)
do research on the crowd sourcing topic
produce a paper
"Start with the updates to your crowdsourcing model. Firstly, think about how you can measure the quality of your human workers. Once you have a metric for this, blocking notoriously bad workers is trivial over the MobileWorks API"
define the metrics! implement it!
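One possible metric, sketched here only as a starting point, is the share of a worker's answers that agree with the majority answer for the same task. The tuple format and function name are hypothetical, and majority agreement is just one candidate metric, not something the assignment prescribes:

```python
from collections import Counter, defaultdict


def worker_agreement(answers):
    """Compute, per worker, the fraction of answers matching the majority.

    answers: iterable of (worker_id, task_id, rating) tuples (assumed shape).
    Tasks with fewer than two answers are skipped because they have
    no meaningful majority. Ties are resolved arbitrarily.
    """
    by_task = defaultdict(list)
    for worker, task, rating in answers:
        by_task[task].append((worker, rating))

    agreed = Counter()
    total = Counter()
    for votes in by_task.values():
        if len(votes) < 2:
            continue
        majority, _ = Counter(r for _, r in votes).most_common(1)[0]
        for worker, rating in votes:
            total[worker] += 1
            agreed[worker] += rating == majority
    return {w: agreed[w] / total[w] for w in total}
```

Workers whose score stays below some threshold after a minimum number of answers would then be candidates for blocking.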
this is an important task for the 28th of November.
it needs to be an accurate, interesting presentation that
is split into 3 parts:
"Afterwards design your dynamic pricing solution. Think about what your general approach is going to be, and implement it."
This is an optional feature. I would still recommend implementing it.
Is there a presentation yet? If not, now would be a good time to create one.
Also I think we should focus on presenting the tool well. (@KonradMSteiner already started to do cleanup and add data)
Do we meet before the presentation? I think that would not be a bad idea, but only if there is already a presentation and a concept for presenting the tool.
Is there someone to coordinate this?
load the statistics from the local database and display them
currently the data is just random
e.g. http://main-tuaic13.rhcloud.com/query/Apple?d=1
apple (lowercase) should also be possible
"Add a “garbage collection” mechanism that tracks which tasks are never being picked up, and removes them."
Define what the reasonable decision is here and implement it.
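One conceivable rule, given purely as an assumption, is to treat a task as garbage when it has received no answers after a fixed age. The field names and the seven-day default below are hypothetical:

```python
from datetime import datetime, timedelta


def stale_task_ids(tasks, now, max_age=timedelta(days=7)):
    """Return IDs of tasks that were never picked up within max_age.

    tasks: iterable of dicts with "id", "created_at" (datetime) and
    "answers" (number of answers received); assumed shape.
    """
    return [
        t["id"]
        for t in tasks
        if t["answers"] == 0 and now - t["created_at"] > max_age
    ]
```

A periodic job could call this with the current time and delete the returned tasks.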
or at least delete the keyword
use \b in python
http://docs.python.org/2/library/re.html
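A short sketch of keyword matching with \b word boundaries from the re module. The helper names are hypothetical, and the case-insensitive flag is an assumption, added so that Apple and apple match alike:

```python
import re


def keyword_pattern(keyword):
    """Compile a whole-word, case-insensitive pattern for a keyword."""
    # re.escape keeps characters such as "." or "+" in a keyword literal.
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


def mentions(text, keyword):
    """Return True if text contains keyword as a whole word."""
    return keyword_pattern(keyword).search(text) is not None
```

Note that \b only behaves as expected when the keyword itself starts and ends with a word character (letter, digit or underscore).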
fill the keywords in the dbs on startup
one cannot solve a task on http://crowd-tuaic13.rhcloud.com/solve_task?r=t
Internal error 4! Could not finish task. Here is a new one!
As far as I remember this is due to the fact that the main application does not take the answer!
"Finally, build a simple Web interface for your whole service. The Web interface should be both, a management interface for inspecting the sentiment data and the quality of your workers, as well as a GUI for issuing queries to your service and visualize the results (e.g., via diagrams or plots). You may use your choice of platform for building this Web interface."
we should limit the text, so that we don't see the whole text (which is quite long sometimes)
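Shortening the text for display could be sketched like this. The function name, the default limit of 200 characters and the "..." suffix are assumptions:

```python
def truncate(text, limit=200, suffix="..."):
    """Shorten text to at most `limit` characters, cutting at a space."""
    if len(text) <= limit:
        return text
    # Leave room for the suffix, then drop the last (possibly partial) word.
    cut = text[:limit - len(suffix)].rsplit(" ", 1)[0]
    return cut.rstrip() + suffix
```

The full text can still be shown on the task's own page or on demand.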
It seems no proper logging facilities are utilized to output informational or debug messages. Instead we use quite a few print statements throughout.
Python offers a standard logging module, which we should use. Of interest: http://victorlin.me/posts/2012/08/good-logging-practice-in-python/
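A minimal sketch of replacing a print statement with the standard logging module. The logger name "crowd" and the submit_answer function are hypothetical examples, not code from the project:

```python
import logging

# One named logger per module instead of bare print statements.
log = logging.getLogger("crowd")


def configure_logging(level=logging.INFO):
    """Configure root logging once, at application startup."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def submit_answer(task_id, answer):
    """Hypothetical handler showing where a print would have been."""
    # Arguments are passed separately so formatting only happens
    # when the message is actually emitted.
    log.info("task %s answered: %s", task_id, answer)
    return True
```

Calling configure_logging(logging.DEBUG) during development and leaving the default in production changes verbosity without touching the code.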
The tutor should be able to set up our system easily