
cobrexa.jl's People

Contributors

bertonoronha, cylon-x, exaexa, github-actions[bot], hettiec, htpusa, josepereiro, laurentheirendt, marvinvanaalst, stelmo, syarra


cobrexa.jl's Issues

Implement function callbacks to make function modifications uniform and streamlined

@exaexa had a great idea of using callbacks as function arguments to modify the function applied to a model, e.g. applying user-defined bounds to some reactions in FBA. In fba(...), fva(...) and pfba(...) I have a bunch of keyword arguments that are largely repeats and should all be wrapped in some way. This is likely a feature the average user will end up using often, I think.
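A minimal sketch of the idea, with illustrative names (not COBREXA's final API): the analysis function takes a list of modification callbacks and applies each to the optimization problem before solving, so the repeated keyword arguments disappear.

```julia
# Each "modification" is a function that tweaks the problem; these two
# constructors return such callbacks. All names here are hypothetical.
change_bound(rid, lb, ub) = problem -> (problem[:bounds][rid] = (lb, ub); problem)
change_objective(rid) = problem -> (problem[:objective] = rid; problem)

function analyze(problem::Dict; modifications = [])
    for mod! in modifications
        mod!(problem)   # each callback tweaks the problem in place
    end
    problem             # a real implementation would now call the solver
end

p = Dict(:bounds => Dict("EX_glc" => (-10.0, 0.0)), :objective => "biomass")
analyze(p; modifications = [change_bound("EX_glc", -8.0, -8.0)])
```

The same callback vector could then be passed unchanged to fba, fva, or pfba.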

Some SBML formats (level 3?) are not read correctly

For example:

download("https://www.vmh.life/files/reconstructions/AGORA/1.03/reconstructions/sbml/Abiotrophia_defectiva_ATCC_49176.xml", "testModel.xml");
model = readSBML("testModel.xml");

findall(getOCs(model) .!= 0)
# empty -- all objective coefficients are read as zero
getLBs(model)
# returns -Inf where it should be -1000

Automatic file loading

I still don't like the automatic file loading thing because it introduces alphabetical-ordering issues with package loading; e.g. I have to prepend "a" to reaction.jl to make sure it loads before cobraModel.jl... I think it solves the slight inconvenience of listing all the files but introduces a bigger one: having to invent creative (less descriptive) file names...

So far this only seems to affect me, but what do y'all think?
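For comparison, here is a tiny runnable sketch of the explicit-include alternative: listing the includes in dependency order removes the need for alphabetical file-name tricks entirely. The file names are hypothetical stand-ins.

```julia
# cobraModel.jl depends on reaction.jl, so we include it second,
# regardless of how the names sort alphabetically.
mktempdir() do dir
    write(joinpath(dir, "reaction.jl"), "struct Reaction; id::String; end")
    write(joinpath(dir, "cobraModel.jl"),
          "struct CobraModel; rxns::Vector{Reaction}; end")
    include(joinpath(dir, "reaction.jl"))
    include(joinpath(dir, "cobraModel.jl"))
end
```

The cost is one explicit list of includes in the top-level module file; the gain is that file names can stay descriptive.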

SBMLModel should just be CobraModel

SBML.jl imports the dense version of the model from the file; we should have only one struct that stores this information (to avoid clutter). I think this should be CobraModel, since model construction will likely happen in it...?

Community model + tutorial

  • load a heap of whatever models
  • have the model structure re-ID them correctly and add exchange reactions

Implement macros for all functions (where appropriate)

Let's make macros to really make the user interface clean. I have implemented fba macros in #53 (see the last few commits) and @marvinvanaalst has implemented a macro for reaction adding in #59 . @exaexa and I were also talking about this extensively on slack, it would be really cool to have something like this (from slack @exaexa):

vs = @variants begin
  knockout(123)
  knockout(4345)
  ...
end

@mod_variants! vs begin
  remove_reaction(123)
end

@combine_variants! vs begin
  no_modification()
  add_random_ATP()
  add_some_toxin()
end

Currently I have a mini FBA version of this working, as shown below:

using COBREXA
using Tulip

model = read_model("e_coli_core.json")
biomass = findfirst(model.reactions, "BIOMASS_Ecoli_core_w_GAM")
glucose = findfirst(model.reactions, "EX_glc__D_e")

vec = @flux_balance_analysis_vec model Tulip.Optimizer begin
    modify_objective(biomass)
    modify_constraint(glucose, -8.0, -8.0)
end

Overview issue: documentation structure

The docs are currently more a collection of ideas than a guide; we should give them a clear tutorial structure.

My idea for the structure is as follows:

  • Introduction, what does COBREXA solve and what it does not solve
  • Totally primitive example ((down)load 1 model and do 1 FBA)
  • Section with tutorials
    • Model loading and conversion
    • Running FBA, FVA and sampling on small models
    • Running FVA on HPCs
    • Modifying and saving the models (Serialize, "export" through conversion to JSON/MAT and saving.)
  • Section with advanced tutorials
    • Using modifications to scan for many model properties at once (this still needs to be implemented, but it's hopefully 50LOC :D )
    • Using a custom model structure
    • Writing own reconstructions and modifications and running them on HPC
    • Using the extra information in StdModel to screen through many model variants, e.g. knockouts or something (I guess some of the original tutorials may sink in here)
  • Examples (aka backing notebooks)
    • Loading and saving
    • Simple FBA and FVA
    • Sampling and seeing the results
    • Parallel FVA
    • Knockouts
    • Custom models
  • Function documentation (REFERENCE, this should correspond to the structure in src/ as much as possible. I'd still separate it into some roughly consistent functions ordered by kinda bottom-up structure)
    • Types
    • IO
    • Analysis functions
    • Sampling
    • Modifications and reconstruction functions
    • Utils+misc

Feel free to edit/suggest.

Fix samplers and create good tests for them

Currently the samplers are not super robust and the testing leaves much to be desired.

  1. Fix ACHR
  2. Add better tests
  3. Add projections to ensure robust sampling in case the samplers go out of bounds
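For item 3, the box-bounds part of the projection is just an elementwise clamp; a minimal sketch (the full fix would also need to project back onto the stoichiometric subspace Sx = 0, which this does not do):

```julia
# Project a sample back into the variable bounds when a sampler step
# overshoots them. Names are illustrative.
project_to_bounds(x, lb, ub) = clamp.(x, lb, ub)

x  = [-5.0, 0.5, 12.0]
lb = [-1.0, 0.0,  0.0]
ub = [ 1.0, 1.0, 10.0]
project_to_bounds(x, lb, ub)  # → [-1.0, 0.5, 10.0]
```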

Implement knockouts in an efficient way

We should definitely have some way of doing knockouts on models. This makes the most sense for StandardModel. Currently the plan is to add a field to StandardModel:

mutable struct StandardModel <: MetabolicModel
    id::String
    reactions::Array{Reaction,1}
    metabolites::Array{Metabolite,1}
    genes::Array{Gene,1}
    gene_reaction::Dict{Gene, Array{Reaction,1}}
end

This should make looking up the reactions affected by a deletion quicker than looping over all reactions for each gene.

Then a new function,

knockout_modification = knockout(model, gene1, gene2, ..., geneN)

needs to be made; it will intelligently create a callback that can be passed to the modifications argument of the analysis functions to actually perform the knockout. WIP
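A hedged sketch of how this could fit together, with simplified stand-in types (a real version would also have to evaluate AND/OR gene-reaction rules instead of knocking out every reaction that merely touches a gene):

```julia
struct Rxn; id::String; lb::Float64; ub::Float64; genes::Vector{String}; end
struct Model; reactions::Vector{Rxn}; end

function knockout(model::Model, genes::String...)
    # Build the gene => reaction-indices lookup once.
    lookup = Dict{String,Vector{Int}}()
    for (i, r) in enumerate(model.reactions), g in r.genes
        push!(get!(lookup, g, Int[]), i)
    end
    # Return a callback that zeroes the bounds of every affected reaction.
    return function (rxns::Vector{Rxn})
        hit = unique(vcat((get(lookup, g, Int[]) for g in genes)...))
        for i in hit
            r = rxns[i]
            rxns[i] = Rxn(r.id, 0.0, 0.0, r.genes)
        end
        rxns
    end
end
```

The returned closure matches the modifications-callback shape discussed above, so the same knockout can be reused across analysis functions.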

Clean-up downloading of test models

  • check if the file exists before downloading
  • always check against a hash and print an error if the hashes do not match, so that we can quickly spot that something fishy happened to the models
  • preferably wrap the Download.download in something that does all this automagically
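A sketch of such a wrapper, using only the Downloads and SHA standard libraries (the function name and API are hypothetical): it skips the download when the file already exists and always verifies the SHA-256 checksum, erroring loudly on a mismatch.

```julia
using Downloads, SHA  # both are Julia standard libraries

function download_checked(url::String, path::String, sha256hex::String)
    isfile(path) || Downloads.download(url, path)
    actual = bytes2hex(open(sha256, path))   # hash the file contents
    actual == sha256hex ||
        error("hash mismatch for $path: expected $sha256hex, got $actual")
    path
end
```

Because the existence check comes first, re-running the test suite never re-downloads models, while a corrupted or tampered file is still caught by the hash comparison.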

Homogenize tests

After #92 the package structure will change somewhat. Let's fix the test layout to match the src/ directory structure better, and perhaps implement better tests (@stelmo, I am looking at you)

Regroup tests to files that match src/

Lots of tests should be homogenized, e.g. test/io/io_test.jl tests reading and writing of StandardModel and test/io/writer.jl does the same thing in a different way. I think we can get rid of test/testing_functions.jl by making the testing style more uniform. Will add more comments as I spot things. Also, #64 will change all the file names in src/io, and this should be reflected in test/io.

Reorganize file contents

Transcript:

Mo  1:03 PM
we should change the file names of "modeling.jl" and "model_manipulations.jl" to something like "manipulate_linearmodel.jl" and "manipulate_fullmodel.jl"
mirek  1:04 PM
or modeling/fullmodel and modeling/linearmodel
would also make it easier to split into doc sections
Mo  1:05 PM
also I think find_exchange_metabolites should be near my exchange_reactions and we should just have one name and dispatch on model type
mirek  1:05 PM
yeah
Mo  1:05 PM
this is of course not for this PR but later
should I make an issue to remind us?
mirek  1:05 PM
same thing for the prettyprinting&misc functions probably, they are now mixed with the model docs
I'll make one

Consistent model variable naming

It is a bit ugly that the JSON and SBML model types store their data in .m while MATModel uses .mat. Either clean up to .m everywhere, or use .json and .sbml.

The same goes for model variable names in function parameters: there's m, model, and a, with occasional excesses.

Ongoing design considerations

From our discussion today via Slack (@exaexa @laurentheirendt):

  1. LinearModel -> CoreModel
  2. Creation of SBMLModel, MATModel, JSONModel, (maybe YAMLModel?) types to store models read in from those files. Use all fields supported by the various file types.
  3. Analysis functions should work on all model types. Input: model type. Output: numbers etc.
  4. Reconstruction functions will output StandardModel or CoreModel depending on the input type, with restrictions on the admissible input types depending on the purpose of the reconstruction function
  5. Squashing models: for example, combining a CoreModel and an SBMLModel can at most output a CoreModel
  6. Reconstruction functions will accept StandardModel or CoreModel as input
  7. accessors rather than deep copies when converting model types
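Item 7 can be sketched as a thin wrapper type whose accessors hand out references into the wrapped model rather than copies; types and field names below are illustrative, not COBREXA's.

```julia
abstract type MetabolicModel end

struct CoreData
    S::Matrix{Float64}
    rxn_ids::Vector{String}
end

struct CoreView <: MetabolicModel
    data::CoreData        # wrapped, not copied
end

reactions(m::CoreView) = m.data.rxn_ids   # accessor returns the same array
stoichiometry(m::CoreView) = m.data.S     # no deep copy on "conversion"
```

Since the accessors return the underlying arrays themselves, converting between model views costs O(1) instead of duplicating the whole stoichiometric matrix.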

Make sure the accessors for MetabolicModel are kinda feature-complete

State:

  • reactions 🆗
  • n_reactions 🆗 (efficient "just tell me how many reaccs you have")
  • metabolites 🆗
  • n_metabolites 🆗
  • genes 🆗
  • n_genes -- this would be great to have done in #101
  • stoichiometry 🆗
  • bounds 🆗
  • objective 🆗
  • balance 🆗
  • gene_associations 🆗 (basically GRRs)
  • metabolite_chemistry 🆗 (basically formulas+charges)
  • metabolite_annotations -- TODO, is there any expected standard to pull out e.g. the standardized identifiers? I'd go for metabolite_identifiers directly, tbh. Done in #102 together with the two below
  • reaction_annotations
  • gene_annotations
  • reaction_subsystem -- TODO
  • metabolite_compartment -- TODO
  • reaction/gene/metabolite_notes -- notes are extremely random, we might postpone them. If there's no conveyable structure for notes, I'd suggest not having them in the generic MetabolicModel at all.
  • reaction/gene/metabolite_name

Bounds should not be sparse vectors, maybe a new data type...

Bound vectors are typically not sparse in the usual sense. They are populated with max/min bounds and not very many zeros. It might make sense to make our own "sparse" vector format where the zeros are actually the max or min bounds. This might save significant storage at the exa-scale...
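A minimal sketch of such a type: store only the entries that deviate from a default bound (e.g. -1000 or 1000), so memory scales with the number of exceptions rather than the number of reactions. The name and layout are illustrative.

```julia
struct DefaultSparseVector{T} <: AbstractVector{T}
    default::T                 # the value "most" entries take
    n::Int                     # logical length
    exceptions::Dict{Int,T}    # only the entries that differ
end

Base.size(v::DefaultSparseVector) = (v.n,)
Base.getindex(v::DefaultSparseVector, i::Int) = get(v.exceptions, i, v.default)

lb = DefaultSparseVector(-1000.0, 5, Dict(3 => 0.0))  # only index 3 differs
lb[1], lb[3]  # → (-1000.0, 0.0)
```

Subtyping AbstractVector means the solvers' generic vector code keeps working, while storage is O(number of non-default bounds).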

`loadModel` is broken

There's no haskey() method for MAT file handles:

using MAT
file = matopen("test/data/toyModel1.mat")
haskey(file, "model")
ERROR: MethodError: no method matching haskey(::MAT.MAT_v5.Matlabv5File, ::String)
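One possible workaround until this is fixed upstream: MAT.jl's matread returns a plain Dict{String,Any}, which does support haskey, at the cost of reading the whole file into memory.

```julia
using MAT

# Read everything into a Dict, then haskey works as expected.
vars = matread("test/data/toyModel1.mat")
haskey(vars, "model")  # true if the file stores a "model" variable
```

For the small test models this is cheap; for large models loadModel would still want a proper haskey on the file handle.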

Add Codecov

We need to add Codecov and Coveralls to the testing pipeline.
