csc-869-mlog's Issues

Change Loaders so they will ignore .svn folder

Change SentenceBasedTextDirectoryLoader and TextDirectoryLoader so that they ignore
the .svn directory and the files it contains.

Original issue reported on code.google.com by markus.neubrand on 5 Apr 2011 at 11:50
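A minimal sketch of how the loaders could skip Subversion metadata, assuming they walk the input directory recursively with java.io.File. The class SvnIgnoringFilter and its helper methods are hypothetical and not part of the existing loaders. Any entry named .svn is rejected, and because rejected directories are never entered, the files inside them are skipped as well.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter: rejects any entry named ".svn". */
class SvnIgnoringFilter implements FileFilter {
    static final String SVN_DIR = ".svn";

    @Override
    public boolean accept(File f) {
        return !SVN_DIR.equals(f.getName());
    }

    /** Recursively collects regular files under root, never descending into .svn. */
    static List<File> listFiles(File root) {
        List<File> out = new ArrayList<>();
        collect(root, new SvnIgnoringFilter(), out);
        return out;
    }

    private static void collect(File dir, FileFilter filter, List<File> out) {
        File[] children = dir.listFiles(filter);
        if (children == null) return; // not a directory, or an I/O error occurred
        for (File child : children) {
            if (child.isDirectory()) collect(child, filter, out);
            else out.add(child);
        }
    }
}
```

The same filter can be passed directly to File.listFiles(FileFilter) wherever a loader currently lists a directory.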

Play around with different settings of the StringToWordVector

Play around with different settings of the StringToWordVector:

- Different tokenizer
- Different stemmer
- Different stopword list or no stopword list
- Different minimum word frequency
- Different number of words to keep
- Different pruning
...

Document how each of these settings influences the results.

Original issue reported on code.google.com by markus.neubrand on 5 Apr 2011 at 11:58
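To make the direction of some of these settings concrete, here is a simplified, hypothetical vocabulary builder covering three of them: the stopword list, the minimum word frequency, and the number of words to keep. It is not Weka's StringToWordVector implementation (for example, Weka applies the words-to-keep limit per class when a class attribute is set), so it only illustrates what each knob does in principle.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical illustration only; not Weka's StringToWordVector. */
class ToyVocabulary {
    static List<String> build(List<String> docs, Set<String> stopwords,
                              int minFreq, int wordsToKeep) {
        Map<String, Integer> freq = new HashMap<>();
        for (String doc : docs) {
            // Crude tokenizer: split on anything that is not a letter, digit, '#' or '@'.
            // Tokens are lowercased, so the stopword list is assumed to be lowercase too.
            for (String tok : doc.toLowerCase().split("[^\\p{L}\\p{N}#@]+")) {
                if (tok.isEmpty() || stopwords.contains(tok)) continue;
                freq.merge(tok, 1, Integer::sum);
            }
        }
        return freq.entrySet().stream()
                .filter(e -> e.getValue() >= minFreq)          // minimum word frequency
                // Most frequent first; ties broken alphabetically for determinism
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(wordsToKeep)                            // number of words to keep
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Raising minFreq or lowering wordsToKeep shrinks the vocabulary; passing an empty stopword set lets function words such as "the" back into it.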

Find Twitter accounts of non-politicians whose political opinions are known

Ideally these would be ordinary people: does anybody personally know people who
use Twitter and would be willing to share whether they are Republican or Democrat?

Another possibility: look at organizations whose members are strongly
affiliated with either the Republicans or the Democrats (e.g. NRA, ...)

Original issue reported on code.google.com by markus.neubrand on 6 Apr 2011 at 12:03

Implement different weighting schemes

Implement different weighting schemes and try out the ones implemented in
StringToWordVector, e.g.:

- In-party-frequency / Total-frequency
- Higher/Lower weight of special attributes like #hashtag or @user
...

Original issue reported on code.google.com by markus.neubrand on 6 Apr 2011 at 12:00
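One possible reading of the "in-party frequency / total frequency" scheme, sketched under assumptions: tweets are already tokenized and grouped by party, and the weight of word w for party p is the number of occurrences of w in p's tweets divided by its occurrences in all tweets. The specialBoost multiplier for #hashtags and @user mentions is a hypothetical parameter to tune, and PartyWeights is not existing project code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an in-party / total frequency weight. */
class PartyWeights {
    /**
     * tokensByParty maps a party name to all tokens from that party's tweets.
     * Returns weight(w) = count of w in party / count of w overall,
     * multiplied by specialBoost for tokens starting with '#' or '@'.
     */
    static Map<String, Double> weights(Map<String, List<String>> tokensByParty,
                                       String party, double specialBoost) {
        Map<String, Integer> total = new HashMap<>();
        Map<String, Integer> inParty = new HashMap<>();
        for (Map.Entry<String, List<String>> e : tokensByParty.entrySet()) {
            for (String tok : e.getValue()) {
                total.merge(tok, 1, Integer::sum);
                if (e.getKey().equals(party)) inParty.merge(tok, 1, Integer::sum);
            }
        }
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : total.entrySet()) {
            String tok = e.getKey();
            double ratio = inParty.getOrDefault(tok, 0) / (double) e.getValue();
            boolean special = tok.startsWith("#") || tok.startsWith("@");
            w.put(tok, special ? ratio * specialBoost : ratio);
        }
        return w;
    }
}
```

A weight near 1 means a word is used almost exclusively by that party; for ordinary (unboosted) words, the weights of one word across all parties sum to 1.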

Exclude author-identifying attributes from classification

Exclude the attributes added by SentenceBasedTextDirectoryLoader.java, such as
democrats_03152011/JamesMcDermott or republicans_03152011/MarioDiaz-Balart, from
the classification. Alternatively, simply disable their generation, since they are
not used at the moment anyway.

Original issue reported on code.google.com by markus.neubrand on 5 Apr 2011 at 11:53
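If the author attributes are still generated, they could be located by name. This sketch assumes every such attribute follows the pattern of the two examples above, <party>_<8-digit date>/<author>; the class AuthorAttributeFilter is hypothetical. The returned zero-based indices could then, for example, be handed to Weka's Remove filter (weka.filters.unsupervised.attribute.Remove) through its setAttributeIndicesArray method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper that finds author-identifying attributes by name. */
class AuthorAttributeFilter {
    // Assumed naming scheme, inferred from the examples in the issue:
    // <party>_<8-digit date>/<AuthorName>
    static final Pattern AUTHOR_ATTR =
            Pattern.compile("(democrats|republicans)_\\d{8}/.+");

    /** Returns the zero-based indices of attributes whose names identify an author. */
    static int[] authorAttributeIndices(List<String> attributeNames) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < attributeNames.size(); i++) {
            if (AUTHOR_ATTR.matcher(attributeNames.get(i)).matches()) idx.add(i);
        }
        return idx.stream().mapToInt(Integer::intValue).toArray();
    }
}
```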

Add new classifier

As Lorenzo already did with J48, add new classifiers to the app (SVM, kNN, ...)
and document the results.

Original issue reported on code.google.com by markus.neubrand on 5 Apr 2011 at 11:59
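Weka already ships classifiers for both suggestions (for instance SMO for SVMs and IBk for k-nearest neighbours), so adding them to the app is mostly a matter of wiring. Purely to illustrate what kNN does with tweet vectors, here is a minimal, hypothetical cosine-similarity kNN with a majority vote. It is not Weka's IBk, which has its own distance and search settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical k-nearest-neighbour classifier over sparse term vectors. */
class CosineKnn {
    private final int k;
    private final List<Map<String, Double>> vectors = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    CosineKnn(int k) { this.k = k; }

    /** Builds a term-count vector from tokens (convenience for examples). */
    static Map<String, Double> countVector(String... tokens) {
        Map<String, Double> v = new HashMap<>();
        for (String t : tokens) v.merge(t, 1.0, Double::sum);
        return v;
    }

    void add(Map<String, Double> vector, String label) {
        vectors.add(vector);
        labels.add(label);
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    String classify(Map<String, Double> query) {
        int n = vectors.size();
        double[] sims = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            sims[i] = cosine(query, vectors.get(i));
            order[i] = i;
        }
        // Most similar training tweets first
        Arrays.sort(order, (i, j) -> Double.compare(sims[j], sims[i]));
        // Majority vote among the k nearest; on a tie, the label ranked first wins
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (int r = 0; r < Math.min(k, n); r++) votes.merge(labels.get(order[r]), 1, Integer::sum);
        String best = null;
        int bestVotes = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) { best = e.getKey(); bestVotes = e.getValue(); }
        }
        return best;
    }
}
```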

Fix dependent cross-validation

From Oren's email:

one quick question, in the 10-fold cross validation, did we make sure
that there are no shared people between two sections? i mean, if we
just divide into 10 sections according to tweets, then we may have
tweet1 and tweet2 of the same congressmanA in two different sections.
in this case, we may get a good result in the cross validation simply
because the classifier can find similarity between tweets of
congressmanA (e.g. if tweet1 is in the verification section, and
tweet2 is in one of the 9 training sections, it may simply learn that
tweet1 and tweet2 are similar in language and we'll get a good
misleading score...).

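This needs to be fixed with custom code that distributes the tweets across the folds
by author, so that all tweets of one person end up in the same fold.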

Original issue reported on code.google.com by markus.neubrand on 5 Apr 2011 at 11:55
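A sketch of such a distribution, assuming each tweet's author is known (for example from the loader's directory names): all tweets of one author go to the same fold, so no congressman appears in both the training and the test part. Authors are placed greedily, largest first, into the fold with the fewest tweets so far, which keeps fold sizes similar. GroupedFolds is a hypothetical name, and greedy balancing is one simple choice among several.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical author-grouped k-fold assignment. */
class GroupedFolds {
    /** authorOfTweet.get(t) is the author of tweet t; returns the fold (0..k-1) of each tweet. */
    static int[] assignFolds(List<String> authorOfTweet, int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        // Count tweets per author (insertion order keeps the result deterministic)
        Map<String, Integer> tweetsPerAuthor = new LinkedHashMap<>();
        for (String a : authorOfTweet) tweetsPerAuthor.merge(a, 1, Integer::sum);

        // Place prolific authors first so the greedy step balances fold sizes well
        List<String> authors = new ArrayList<>(tweetsPerAuthor.keySet());
        authors.sort((x, y) -> Integer.compare(tweetsPerAuthor.get(y), tweetsPerAuthor.get(x)));

        // Greedy: give each author, with all of their tweets, to the currently smallest fold
        int[] foldSize = new int[k];
        Map<String, Integer> foldOfAuthor = new HashMap<>();
        for (String a : authors) {
            int smallest = 0;
            for (int f = 1; f < k; f++) {
                if (foldSize[f] < foldSize[smallest]) smallest = f;
            }
            foldOfAuthor.put(a, smallest);
            foldSize[smallest] += tweetsPerAuthor.get(a);
        }

        int[] folds = new int[authorOfTweet.size()];
        for (int t = 0; t < folds.length; t++) {
            folds[t] = foldOfAuthor.get(authorOfTweet.get(t));
        }
        return folds;
    }
}
```

For fold f, the test set is every tweet t with folds[t] == f and the training set is the rest. Party balance across folds is not enforced here; running the assignment separately for each party's authors would keep the class ratio more even. Weka's built-in cross-validation splits individual instances, so the folds would have to be built manually from these assignments.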

Implement parameter to limit input

Some classifiers have a runtime on the whole data set that is unacceptable for fast
development (e.g. J48 takes 12 h on Lorenzo's machine). Implement a command-line
argument that cuts down the input size to speed up development.

Original issue reported on code.google.com by markus.neubrand on 6 Apr 2011 at 12:04
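A minimal sketch of such an argument, assuming a hypothetical --limit N option (the issue does not specify the option name or how the app parses its arguments). Instead of taking the first N tweets, it draws a reproducible random subset: the input directories appear to be grouped by party, so taking the head of the list could yield a one-sided sample.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical input limiting for faster development runs. */
class InputLimit {
    /** Parses a hypothetical "--limit N" option; returns -1 (no limit) if absent. */
    static int parseLimit(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (!args[i].equals("--limit")) continue;
            if (i + 1 >= args.length) throw new IllegalArgumentException("--limit needs a value");
            int n = Integer.parseInt(args[i + 1]);
            if (n < 0) throw new IllegalArgumentException("--limit must be >= 0");
            return n;
        }
        return -1;
    }

    /** Returns a reproducible random subset of at most limit items (all items if limit < 0). */
    static <T> List<T> limit(List<T> items, int limit, long seed) {
        if (limit < 0 || limit >= items.size()) return new ArrayList<>(items);
        List<T> copy = new ArrayList<>(items);
        Collections.shuffle(copy, new Random(seed)); // fixed seed: same subset every run
        return new ArrayList<>(copy.subList(0, limit));
    }
}
```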
