
gemstat's People

Contributors

bryan-lunt

gemstat's Issues

Prediction Scaling

Once a model is created, later predictions using that model do not create output curves at the same scale.

Copious IO inlined in main function.

There is a lot of IO inlined in the main function that could be moved out to an IO.cpp / IO.h file.

The various parsers should be DRY, and should also be absolutely pedantic. (Excess data at the end of a file should cause an error!)
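As a rough illustration of what "pedantic" could mean for one record, here is a minimal sketch of a line parser that rejects both missing values and trailing data. The function name `parse_exact_doubles` and the whitespace-separated record layout are assumptions made for this example; they are not taken from the GEMSTAT code.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of GEMSTAT): parse exactly `expected`
// whitespace-separated doubles from `line`, and fail loudly otherwise.
std::vector<double> parse_exact_doubles(const std::string& line, std::size_t expected)
{
    std::istringstream in(line);
    std::vector<double> values;
    double v = 0.0;

    // Read at most `expected` numbers; stop early on malformed input.
    while (values.size() < expected && (in >> v)) {
        values.push_back(v);
    }
    if (values.size() != expected) {
        throw std::runtime_error("too few or malformed values in: " + line);
    }

    // Anything left over after the expected fields is an error.
    std::string extra;
    if (in >> extra) {
        throw std::runtime_error("unexpected trailing data: " + extra);
    }
    return values;
}
```

A single helper like this could be shared by every file reader, which also addresses the DRY concern.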

Problem in Corr

The correlation coefficient objective function returns 'nan', probably because of a division by zero.

Bryan came up with these changes quickly:

Tools.cpp - line 994 : double corr_xy = cov_xy / sqrt( x_var * y_var + 0.000001 );
ObjFunc.cpp - line 36 : totalSim += abs( corr( prediction[i], ground_truth[i] ) );
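To make the intent of the first change concrete, the following is a self-contained sketch of a Pearson correlation with the same small constant added under the square root. The name `safe_corr` and the way the means and variances are accumulated are assumptions for illustration; the real `corr` in Tools.cpp may compute them differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for corr() in Tools.cpp, not the GEMSTAT code itself.
// Returns the Pearson correlation of x and y. The small constant under the
// square root keeps the result finite (0.0) when either input is constant.
double safe_corr(const std::vector<double>& x, const std::vector<double>& y)
{
    const std::size_t n = x.size();
    if (n == 0 || y.size() != n) {
        return 0.0;  // assumption: treat unusable input as "no correlation"
    }

    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);

    double cov_xy = 0.0, x_var = 0.0, y_var = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        cov_xy += dx * dy;
        x_var += dx * dx;
        y_var += dy * dy;
    }

    // Same constant as in the quoted change to Tools.cpp.
    return cov_xy / std::sqrt(x_var * y_var + 0.000001);
}
```

Note that the constant slightly shrinks every correlation, so a perfectly correlated pair yields a value just under 1 rather than exactly 1.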

-et option (default factor threshold) and Hassan's factor thr file are (effectively) ignored.

Users should resolve by vote which order these get resolved in.

Currently, the factor thresholds will be input if they are provided in the .par file, but the other input locations are, in effect, ignored.

So we need to decide which order to resolve these in.

My preference is:
-ft (the factor threshold file) overrides the input from
-et (the overall default factor threshold), which overrides the input from
the .par file

These inputs are effectively ignored because the values from neither of the first two sources make their way into the parameters that get fed into the ExprPredictor. (They do get used for the initial annotation.)
Unfortunately ExprPredictor re-annotates every sequence each time it's called on to make a prediction.
(That's fine if you want to learn the annotation threshold, but because it never got the values that were specified earlier, it means it starts at the default.)
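The proposed precedence can be sketched as a single resolution step whose result would then be handed to the predictor, so that re-annotation starts from the values the user actually specified. Everything named here (`ThresholdSources`, `resolve_thresholds`, the built-in default argument) is hypothetical and only illustrates the ordering; it is not how GEMSTAT currently stores these values.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical container for the three places a threshold can come from.
struct ThresholdSources {
    std::map<std::string, double> par_file;  // per-factor values from the .par file
    bool has_et;                             // true when -et was given
    double et;                               // the -et default threshold
    std::map<std::string, double> ft_file;   // per-factor values from the -ft file
    ThresholdSources() : has_et(false), et(0.0) {}
};

// Resolve one threshold per factor: -ft overrides -et, which overrides .par.
// `built_in_default` is used only when no source supplies a value.
std::map<std::string, double> resolve_thresholds(
    const std::vector<std::string>& factors,
    const ThresholdSources& src,
    double built_in_default)
{
    std::map<std::string, double> resolved;
    for (const std::string& name : factors) {
        double value = built_in_default;

        std::map<std::string, double>::const_iterator par = src.par_file.find(name);
        if (par != src.par_file.end()) value = par->second;  // lowest priority

        if (src.has_et) value = src.et;                      // overrides .par

        std::map<std::string, double>::const_iterator ft = src.ft_file.find(name);
        if (ft != src.ft_file.end()) value = ft->second;     // highest priority

        resolved[name] = value;
    }
    return resolved;
}
```

One consequence of this ordering is that passing -et replaces every value from the .par file, which may or may not be what users expect and is worth settling in the vote.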

Incorrect sine-based infty_transform

The current infty_transform seems to assume that sine takes values in [0, +1], but its range is really [-1, +1].
Because the inverse_infty_transform is the correct inverse of the bad infty_transform, it somehow still worked.
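For reference, here is one way a sine-based pair could be written so that the full [-1, +1] range of sine is used. This is a sketch under assumptions: the signatures (a value plus lower and upper bounds `a` and `b`) and the function names are chosen for illustration and may not match the existing infty_transform and inverse_infty_transform.

```cpp
#include <cmath>

// Hypothetical sketch, not the GEMSTAT functions.
// Map a bounded value x in [a, b] to an unconstrained angle.
double sine_infty_transform(double x, double a, double b)
{
    const double y = (x - a) / (b - a);  // y in [0, 1]
    return std::asin(2.0 * y - 1.0);     // rescale to [-1, 1], the domain of asin
}

// Map any real z back into [a, b].
double sine_inverse_infty_transform(double z, double a, double b)
{
    const double y = (std::sin(z) + 1.0) / 2.0;  // sin(z) in [-1, 1] -> y in [0, 1]
    return a + y * (b - a);
}
```

Because the inverse is periodic, any real z produced by the optimizer maps back into [a, b], while the forward transform only ever returns angles in [-pi/2, +pi/2].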

Fix the ExprPar::load function

Modify the ExprPar::load function to be more defensive. Currently it silently corrupts things on bad input.
We probably need to define a new file format altogether, really.

Memory Leak

There is a big memory leak in ExprPredictor::evalObjective( const ExprPar& par ), but it is easily fixed.

Bad exception handling in gradient_minimize.

gradient_minimize partially rolls back the optimization if any kind of exception occurs.
It does this even if the previous parameter vector was acceptable and good.
When the exception happens, it does not save the objective function value that corresponds to the parameter vector that it does save (whatever that is). Instead, it saves the value from the parameter vector that it didn't like.

This issue is twofold:

  1. The best parameter vector seen is not saved.
  2. The objective value reported is not that of the parameter vector that was saved.

IND Gemstat is sensitive to the order of TFs (and who knows what else?)

In ExprFunc::predictExpr(...) (ExprPredictor.cpp) we see at least one place where the code assumes that 'cic' will be factor number 2 (0-indexed, so the third entry in the factor_expression file).

This silently and secretly requires the user to have their transcription factors in identical order to what was used in the paper.
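One way to remove the hidden ordering requirement would be to look the factor up by name and fail loudly when it is absent. The helper below is a sketch; `factor_index` and the idea that the factor names are available as a vector of strings are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: find a factor's position by name instead of
// hard-coding an index such as 2.
std::size_t factor_index(const std::vector<std::string>& factor_names,
                         const std::string& name)
{
    std::vector<std::string>::const_iterator it =
        std::find(factor_names.begin(), factor_names.end(), name);
    if (it == factor_names.end()) {
        throw std::runtime_error("factor not found: " + name);
    }
    return static_cast<std::size_t>(it - factor_names.begin());
}
```

Code that currently assumes index 2 could then call something like `factor_index(names, "cic")` once at setup, and a missing factor would produce an error instead of silently using the wrong TF.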

Additional Parameters

There should be a facility in the ExprPar and ParFactory to hold additional parameters more easily. Perhaps a first step would be making the functions that convert between a std::vector<double> and an ExprPar virtual, so that it is easier to override and extend them.

My suggestion is that each TF be allowed to have any number of parameters, and instead of storing them in vectors per parameter (column oriented) store them with the description of the TF (row oriented). When an ExprFunc is created, it can reformat them for its own use as it likes anyway.

Likewise, each promoter/enhancer should have its own parameters, with some facility for storing global parameters.
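A row-oriented layout along these lines might look like the sketch below, with a conversion to and from the flat std::vector<double> that an optimizer works on. All type and function names (`FactorParams`, `ModelParams`, `flatten`, `unflatten`) and the example parameter names in the usage note are hypothetical; per-promoter parameters are omitted to keep the sketch short, but would follow the same pattern as the factors.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical row-oriented storage: each TF carries its own named parameters.
struct FactorParams {
    std::string name;
    std::map<std::string, double> values;  // any number of parameters per TF
};

struct ModelParams {
    std::vector<FactorParams> factors;
    std::map<std::string, double> globals;  // parameters not tied to one TF
};

// Flatten for the optimizer. std::map iterates in sorted key order, so the
// layout is deterministic: each factor's values in key order, then globals.
std::vector<double> flatten(const ModelParams& m)
{
    std::vector<double> out;
    for (const FactorParams& f : m.factors)
        for (const std::pair<const std::string, double>& kv : f.values)
            out.push_back(kv.second);
    for (const std::pair<const std::string, double>& kv : m.globals)
        out.push_back(kv.second);
    return out;
}

// Write a flat vector back into the same layout; reject a size mismatch
// before touching anything.
void unflatten(ModelParams& m, const std::vector<double>& flat)
{
    if (flatten(m).size() != flat.size()) {
        throw std::runtime_error("parameter vector has the wrong length");
    }
    std::size_t i = 0;
    for (FactorParams& f : m.factors)
        for (std::pair<const std::string, double>& kv : f.values)
            kv.second = flat[i++];
    for (std::pair<const std::string, double>& kv : m.globals)
        kv.second = flat[i++];
}
```

Adding a new parameter to one TF, for example `values["cooperativity"] = 1.0`, then requires no change to the conversion functions.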

Similarly, in the ExprModel, we should have a more general facility for storing indicators, rather than the current ExprModel::actIndicators and ExprModel::repIndicators.

pi parameter

The pi parameter is not used in the code; as a temporary solution, it needs to be fixed inside the code.
In the long term, we would like to remove pi.
