
ecmtool - Uncover organisms' metabolic blueprints

With this tool you can calculate Elementary Conversion Modes (ECMs) from metabolic networks. Combinations of ECMs comprise all metabolic influences an organism can exert on its environment.

ecmtool can be used in two different modes: either as a standalone command line tool, or as a Python library for your own scripts. We will describe how to install and use both modes.

Prerequisites

  • Download and install Python. ecmtool is compatible with Python 3.x and is tested on 3.10. Ensure that both python and its package manager pip are added to your PATH environment variable. If this last step is omitted, an error like the following will be thrown when you try to run python: 'python' is not recognized as an internal or external command [..].
  • Download and install Java. ecmtool is tested with OpenJDK 17. Make sure you have a 64-bit version; you can check this with java -version (a combined check is also sketched below this list). Otherwise, you might get an error like Invalid maximum heap size.
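
The following is a minimal sketch (not part of ecmtool) for verifying both prerequisites from Python; it assumes java is on your PATH and simply inspects the interpreter version and the report printed by java -version.

import subprocess
import sys

# ecmtool is tested on Python 3.10; sys.maxsize reveals whether the interpreter is 64-bit.
bits = 64 if sys.maxsize > 2**32 else 32
print(f"Python {sys.version_info.major}.{sys.version_info.minor} ({bits}-bit)")

# `java -version` writes its report to stderr; a 64-bit JVM mentions "64-Bit" there.
report = subprocess.run(["java", "-version"], capture_output=True, text=True).stderr
print("64-bit Java found" if "64-Bit" in report else "Warning: no 64-bit Java detected")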

Mode 1: standalone command line tool

In this mode, you can call ecmtool like a normal program from your command line. It reads metabolic networks in the SBML format, and writes resulting ECMs into a CSV file for later analysis. Most researchers will use this method. For running ecmtool on computing clusters efficiently, see the Advanced Usage section in this readme.

Installation

  • Download the latest ecmtool source through git clone, or as a zip file from https://github.com/tjclement/ecmtool.
  • Open a command prompt, and navigate to the ecmtool directory (e.g. cd C:\Users\You\Git\ecmtool, where the path should be replaced with the path ecmtool was downloaded to).
  • Install the dependencies in requirements.txt inside the ecmtool directory (e.g. by running pip install -r requirements.txt).
  • Linux only: install redund from the lrslib package (e.g. by running apt install lrslib).

Installing ecmtool using Docker

For convenience, there is a Dockerfile that builds an image with all dependencies already installed, allowing you to run ecmtool directly. Open a terminal with the ecmtool project as its working directory, and run:

docker build -t ecmtool -f docker/Dockerfile .
docker run -ti ecmtool bash

Installing ecmtool using Singularity

To be continued.

Running

Ecmtool can be run by executing python main.py --model_path <path/to/model.xml> [arguments] from the command line, after navigating to the ecmtool directory as described above. The possible arguments and their default values are printed when you run python main.py --help. After execution is done, the conversions that were found are written to a file (default: conversions.csv). The first row of this CSV file contains the metabolite IDs as read from the SBML model; a sketch of how to load this file for analysis is given below the example.

Example

python main.py --model_path models/e_coli_core.xml --auto_direction true --out_path core_conversions.csv
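
The resulting CSV (core_conversions.csv in this example) can be loaded for further analysis with standard tools. Below is a minimal sketch using Python's csv and fractions modules together with numpy, assuming the default comma-separated layout described above (a header row of metabolite IDs followed by one conversion per row):

import csv
from fractions import Fraction

import numpy as np

with open('core_conversions.csv') as f:
    reader = csv.reader(f)
    metabolite_ids = next(reader)  # first row: metabolite IDs from the SBML model
    # Fraction parses both plain and fractional number strings; convert to float for numpy.
    ecms = np.array([[float(Fraction(value)) for value in row] for row in reader])

print(f'Loaded {ecms.shape[0]} ECMs over {len(metabolite_ids)} metabolites')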

Benefiting from optional arguments of ecmtool

For an elaborate discussion of all optional arguments that can be used when ecmtool is run as a command line tool, please see the extensive manual that was uploaded as a Supplementary File with the ecmtool-publication at: https://doi.org/10.1016/j.patter.2020.100177

Mode 2: Python library

ecmtool can also be used as a library from within your own Python code. To do so, install ecmtool using pip (e.g. pip install ecmtool). The most crucial method is ecmtool.conversion_cone:get_conversion_cone(), which returns the ECMs of a given stoichiometric matrix. For information on how to use advanced features like SBML parsing, network compression, and metabolite direction estimation, please see ecmtool/main.py.

We strongly advise users to either use ecmtool as a command line tool, or to pay close attention to reproducing the order of steps from ecmtool/main.py.

Example

from ecmtool.network import extract_sbml_stoichiometry
from ecmtool.conversion_cone import get_conversion_cone
from ecmtool.helpers import unsplit_metabolites, print_ecms_direct
import numpy as np

DETERMINE_INPUTS_OUTPUTS = False # Determines whether ecmtool tries to infer directionality (input/output/both)
PRINT_CONVERSIONS = True # Prints the resulting ECMs on the console

network = extract_sbml_stoichiometry('models/sxp_toy.xml', add_objective=True, determine_inputs_outputs=DETERMINE_INPUTS_OUTPUTS)

# Some steps of compression only work when cone is in one orthant, so we need to split external metabolites with
# direction "both" into two metabolites, one of which is output, and one is input
network.split_in_out(only_rays=False)

# It is generally a good idea to compress the network before computation
network.compress(verbose=True, SCEI=True, cycle_removal=True, remove_infeasible=True)

stoichiometry = network.N

ecms = get_conversion_cone(stoichiometry, network.external_metabolite_indices(),
                           network.reversible_reaction_indices(), network.input_metabolite_indices(),
                           network.output_metabolite_indices(), verbose=True)
 
# Since we have split the "both" metabolites, we now need to unsplit them again
cone_transpose, ids = unsplit_metabolites(np.transpose(ecms), network)
cone = np.transpose(cone_transpose)

# We can remove all internal metabolites, since their values are zero in the conversions (by definition of internal)
internal_ids = []
for metab in network.metabolites:
    if not metab.is_external:
        id_ind = [ind for ind, id in enumerate(ids) if id == metab.id]
        if len(id_ind):
            internal_ids.append(id_ind[0])

ids = list(np.delete(ids, internal_ids))
cone = np.delete(cone, internal_ids, axis=1)

# If you wish, one can print the ECM results:
if PRINT_CONVERSIONS:
    print_ecms_direct(np.transpose(cone), ids)
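
If you want to store these results in the same shape as the command line tool's CSV output (a header row of metabolite IDs followed by one conversion per row), a sketch like the following can be appended to the script above; the output filename is only an example:

import csv

with open('library_conversions.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(ids)    # header row: metabolite IDs
    writer.writerows(cone)  # one conversion (ECM) per row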

Example scripts

See the scripts in the folder examples_and_results for examples on how to use ecmtool as a library. In particular: ECM_calc_script.py, compare_efms_ecms_number.py.

Enumerating ECMs without an SBML-file

See the script examples_and_results/minimal_run_wo_sbml.py for an example on how to compute ECMs starting from a stoichiometric matrix, and some additional information.
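
As a rough, self-contained illustration (independent of that script, and assuming the get_conversion_cone interface shown in the library example above), the conversion cone can also be computed directly from a hand-written stoichiometric matrix. The toy network below has one input metabolite A, one internal metabolite B, and one output metabolite C, connected by the irreversible reactions A -> B and B -> C; treat this as a sketch rather than a reference implementation.

import numpy as np

from ecmtool.conversion_cone import get_conversion_cone

# Rows are the metabolites A, B, C; columns are the reactions R1: A -> B and R2: B -> C.
stoichiometry = np.array([[-1,  0],
                          [ 1, -1],
                          [ 0,  1]])

external = [0, 2]  # A and C are external metabolites
reversible = []    # both reactions are irreversible
inputs = [0]       # A is taken up from the environment
outputs = [2]      # C is excreted into the environment

ecms = get_conversion_cone(stoichiometry, external, reversible, inputs, outputs, verbose=True)
print(ecms)  # expected: a single conversion that consumes A and produces C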

Advanced usage

After testing how the tool works, most users will want to run their workloads on computing clusters instead of on single machines. This section describes some of the steps that are useful for running on clusters.

Parallel computing with OpenMPI

On Linux or Mac, ecmtool can make use of OpenMPI for running in parallel on a computing cluster. To make use of this feature, OpenMPI, mpi4py, and mplrs are required in addition to the dependencies in requirements.txt. OpenMPI and mplrs are installed via:

apt install libopenmpi-dev
wget http://cgm.cs.mcgill.ca/~avis/C/lrslib/archive/lrslib-071a.tar.gz
tar -xzf lrslib-071a.tar.gz
cd lrslib-071a
make && make mplrs && make install
ln -s `pwd`/mplrs /usr/local/bin/mplrs
ln -s `pwd`/redund /usr/local/bin/redund
cd ..

The installation of mpi4py is done via:

pip3 install mpi4py==3.1.4
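
To verify that OpenMPI and mpi4py work together before launching ecmtool, a quick sanity check is to save the following as a small script (the name mpi_check.py is just an example):

from mpi4py import MPI

# Each MPI process reports its rank within the communicator.
comm = MPI.COMM_WORLD
print(f"Hello from rank {comm.Get_rank()} of {comm.Get_size()}")

Running it with mpiexec -n 2 python3 mpi_check.py should print one line per process.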

Running ecmtool on a cluster using the indirect enumeration method is now as simple as running:

python3 main.py --processes <number of processes for enumeration> --model_path models/e_coli_core.xml

Note that this performs preprocessing steps like network compression on the node you run this command on, and not on the compute cluster.

For direct enumeration, the number of processes for enumeration is passed to mpiexec instead:

mpiexec -n <number of processes for enumeration> python3 main.py --direct true --model_path models/e_coli_core.xml

In this mode, preprocessing steps are run on the compute cluster too.

Advanced ECM-computation on a computing cluster

Installation of ecmtool when the user does not have root privileges on the cluster (a case report)

On some computing clusters, it is not easy to install OpenMPI and mplrs. One method that was successful is outlined here. This cluster already had OpenMPI available as a module that could be loaded. The available versions can be listed with

module av OpenMPI

For the installation of mplrs, we also need GMP; check the available versions with

module av GMP

It is important that the versions of OpenMPI and GMP match. In this case, we used

module load OpenMPI/4.1.1-GCC-10.3.0
module load GMP/6.2.1-GCCcore-10.3.0

where the last number indicates that both were built with a compatible version of GCC. Now we are ready to install mplrs. This can be done via:

wget http://cgm.cs.mcgill.ca/~avis/C/lrslib/archive/lrslib-071a.tar.gz
tar -xzf lrslib-071a.tar.gz
cd lrslib-071a
make && make mplrs && make install

Now we need to tell the cluster where to find the installed mplrs. We can do this by adding the path to mplrs to the search path:

export LD_LIBRARY_PATH=/scicore/home/nimwegen/degroo0000/ecmtool/lrslib-071a:$LD_LIBRARY_PATH
export PATH=/scicore/home/nimwegen/degroo0000/ecmtool/lrslib-071a:$PATH

Now using the command

mplrs

should give some output that indicates that mplrs is working and can be found.

Running ecmtool using separate runs for the non-parallel and parallel parts, with an .sh script (on a Slurm cluster)

To fully exploit parallel computation on a cluster, ecmtool should be run in separate steps, as outlined below. (The ecmtool folder also contains an example script for a computing cluster that uses Slurm: examples_and_results/launch_separate_mmsyn_newest.sh.)

  1. Preprocessing and compression of the model on a compute node (instead of a login node). For this, run
srun --ntasks=1 --nodes=1 python3 main.py all_until_mplrs --model_path ${MODEL_PATH} --auto_direction ${AUTO_DIRECT} --hide "${HIDE}" --prohibit "${PROHIBIT}" --tag "${TAG}" --inputs "${INPUTS}" --outputs "${OUTPUTS}" --use_external_compartment "${EXT_COMP}" --add_objective_metabolite "${ADD_OBJ}" --compress "${COMPRESS}" --hide_all_in_or_outputs "${HIDE_ALL_IN_OR_OUTPUTS}"

where the arguments in curly brackets should be replaced by your choices for these arguments.

  2. First vertex enumeration step with mplrs in parallel. For this, run
mpirun -np <number of processes> mplrs -redund ecmtool/tmp/mplrs.ine ecmtool/tmp/redund.ine
mpirun -np <number of processes> mplrs ecmtool/tmp/redund.ine ecmtool/tmp/mplrs.out
  3. Processing of the results from the first vertex enumeration step, adding steady-state constraints and removing redundant rays using a parallelized redundancy check:
mpirun -np <number of processes> python3 main.py all_between_mplrs
  4. Second vertex enumeration step with mplrs in parallel:
mpirun -np <number of processes> mplrs -redund ecmtool/tmp/mplrs.ine ecmtool/tmp/redund.ine
mpirun -np <number of processes> mplrs ecmtool/tmp/redund.ine ecmtool/tmp/mplrs.out
  5. Processing of the results from the second vertex enumeration step, unsplitting of metabolites, ensuring that the results are unique, and saving the ECMs to file:
srun --ntasks=1 --nodes=1 python3 main.py all_from_mplrs --out_path ${OUT_PATH}

Doubling direct enumeration method speed

The direct enumeration method can be sped up by compiling our LU decomposition code with Cython. The following describes the steps needed on Linux, but the same concept also applies to Mac OS and Windows. First make sure all dependencies are satisfied. Then execute:

python3 cython_setup.py build_ext --inplace

mv _bglu* ecmtool/

ℹ️ Note that in the Docker script, this optimisation has already been done. You don't need to compile anything there.

Automatically testing ecmtool and contributing to ecmtool

When ecmtool is installed properly, its functioning with various parameter settings can be tested with the predefined tests:

python3 -m pytest tests/test_conversions.py

When contributing to ecmtool, please make sure that these tests pass before opening a pull request.

Citing ecmtool

Please refer to the following papers when using ecmtool:

Initial version - https://www.cell.com/patterns/fulltext/S2666-3899(20)30241-5.

mplrs improved version - https://doi.org/10.1093/bioinformatics/btad095.

Acknowledgements

The original source code with indirect enumeration was written by Tom Clement. Erik Baalhuis later expanded the code with a direct enumeration method that improved parallelisation. Daan de Groot helped with many new features, bug fixes, and code reviews. Bianca Buchner added support for mplrs, which raises the maximal size of networks you can enumerate with ecmtool.

License

ecmtool is released with the liberal MIT license. You are free to use it for any purpose. We hope others will contribute to the field by making derived work publicly available too.


ecmtool's Issues

Solution differences between Windows and Linux (CentOS 7)

When I use a slightly modified GEM of iMG746 from https://onlinelibrary.wiley.com/doi/10.1002/biot.201200266, I get significantly different solutions when I run ecmtool on Windows versus CentOS 7. For instance, the Windows run produces 3084 ECMs, while the Linux run produces 57724 ECMs. The Windows solution does not include any ECMs with the objective function (which I don't understand), but the Linux solution produces many of those. I'm using the most recent version of ecmtool on both machines. The GEM can be grabbed from http://eco37.mbl.edu/GEMs/iMG746_ExchangeFix.xml if you would like to test.

cheers,
-joe

sympy version check in cbmpy package

Hi,

(Me again.) Another little issue I ran into is the sympy version check used in the cbmpy package.

In multiple files, the version check is done using

if int(sympy.__version__.split('.')[1]) >= 7 and int(sympy.__version__.split('.')[2]) >= 4:

As the latest version of sympy is 1.8, whose version string no longer has a third component, this leads to an indexing problem.
I fixed it by simply removing the second part of the statement, but if you want to ensure that versions below 1.7.4 are not used, you will need to introduce a different check.

Maybe I should contact O. Brett directly on this topic, but I'd rather let you know that this prevents running the test code and examples.

best

File missing

The clustering R script mentioned in the published paper refers to the files metab_info_ecolicore.csv and metab_info_bacteroid.csv inside a data directory, but they are not included here.

`get_conversion_cone()` function missing

Hi!
Thanks a lot for this amazing work of yours!

I was wondering whether there's a mistake with your readme for the python library case.
You mention that "The most crucial method is ecmtool.conversion_cone:get_conversion_cone(), which returns the ECMs of a given stoichiometric matrix."

from ecmtool.conversion_cone import get_conversion_cone

but I could not find a get_conversion_cone() at all.

Am I missing something?

Thanks a lot again!

Running redund; Access is denied.

Hi, I tried to test ecmtool in the ecmtool folder with the command python main.py --model_path .\models\e_coli_core_rounded.xml, and I get this error:

  File "C:\Users\a.antonakoudis\OneDrive - Sartorius Corporate Administration GmbH\Documents\Projects\Media Design\ecmtool\ecmtool\helpers.py", line 478, in redund
    raise ValueError('An error occurred during removal of redundant vectors from an input matrix: '
ValueError: An error occurred during removal of redundant vectors from an input matrix: redund did not write an output file after being presented input file "%s". Please check if your input matrix contains erroneous data, and let us know via https://github.com/SystemsBioinformatics/ecmtool/issues if you think the input matrix seems fine. It helps if you attach the matrix file mentioned above when creating an issue.

I've attached the tmp/matrix0.ine
matrix0.txt

lrslib on CentOS 7

This is an OS issue and not an ecmtool issue, but if you have a quick answer, that would be great. I'm trying to run ecmtool on a CentOS 7 box, but I can't find any repo for lrslib, so I grabbed the source from http://cgm.cs.mcgill.ca/~avis/C/lrslib/archive/ and built a redund executable, but when I run the ecmtool example
"python main.py --model_path models/e_coli_core.xml --auto_direction true --out_path core_conversions.csv"

ecmtool errors with this from redund:

Iteration 24/25
Adding 8 candidates
Removing 6 rays

Running redund

*lrs:overflow possible: restarting with 128 bit arithmetic

Traceback (most recent call last):
  File "main.py", line 297, in <module>
    remove_infeasible=args.remove_infeasible)
  File "/home/jvallino/Software/ecmtool/ecmtool-master/ecmtool/network.py", line 409, in compress
    remove_infeasible=remove_infeasible)
  File "/home/jvallino/Software/ecmtool/ecmtool-master/ecmtool/network.py", line 444, in compress_inner
    self.cancel_clementine(verbose=verbose)
  File "/home/jvallino/Software/ecmtool/ecmtool-master/ecmtool/network.py", line 702, in cancel_clementine
    self.output_metabolite_indices())
  File "/home/jvallino/Software/ecmtool/ecmtool-master/ecmtool/network.py", line 103, in clementine_equality_compression
    G = redund(G, verbose=verbose)
  File "/home/jvallino/Software/ecmtool/ecmtool-master/ecmtool/helpers.py", line 275, in redund
    matrix_nored = np.append(matrix_nored, [row], axis=0)
  File "<__array_function__ internals>", line 6, in append
  File "/home/jvallino/Software/ecmtool/ecmtool-master/python3-virtualenv/lib64/python3.6/site-packages/numpy/lib/function_base.py", line 4671, in append
    return concatenate((arr, values), axis=axis)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 54 and the array at index 1 has size 8

I've tried lrslib versions 71a and 71b, which generate the above, and version 72 generates a syntax-like error. Assuming I'm doing this correctly (probably not), what version of lrslib does ecmtool use?

thanks!
-joe

ECMs without biomass, remove_infeasible_irreversible_reactions bug

Hi,

I have a small problem with calculating the ECMs for my (relatively large) model.
It seems that the function 'remove_infeasible_irreversible_reactions' mistakenly throws out reactions necessary for the ECMs with biomass. If I run ecmtool without the 'remove_infeasible_irreversible_reactions' step during compression, it does return ECMs containing biomass (objective).

I attached my run script and the model I used.
data_ECMs_bug.zip

Thanks in advance,
Pjotr van der Jagt
University of Amsterdam

Error with network.compress()

Hi,

I keep running into the same error when using network.compress()

Traceback is as follows:

compress
compress_inner
cancel_clementine
clementine_equality_compression
redund
with open(matrix_nonredundant_path) as file:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\...\ecmtool\ecmtool\tmp\matrix_nored0.ine'

Any suggestions?

Thanks in advance

ECM results missing key metabolism or biomass synthesis

Hi,

I've been running ecmtool on a few GEMs from BiGG (as well as a couple others), but I'm getting results that appear incomplete and I've yet to figure out why. Here is an example of using the latest version of ecmtool on a methanogen:

Methanosarcina barkeri str. Fusaro GEM iAF692 http://bigg.ucsd.edu/models/iAF692
This is a methanogen, yet there is no ECM where hydrogen is consumed, only produced, even though one of its primary catabolic pathways is
4 H2 + CO2 -> CH4 + 2 H2O
which is not found in the list of ECMs.

I have another GEM where flux analysis shows biomass synthesis; however, no ECMs are found that involve biomass synthesis. Unfortunately, I can't point to that GEM. I also didn't get any biomass ECM for a Synechocystis GEM (http://bigg.ucsd.edu/models/iJN678) when I ran an earlier version of ecmtool; however, when I run the latest version of ecmtool on that GEM now, it finds a problem with one of the reactions (R_EX_photon_e) that has both lower and upper bounds of 0. I can remove that reaction, but that makes it harder for you to verify.

I realize this is not much info to go on, but do you see a reason why there are no hydrogen-consuming ECMs for the iAF692 GEM above?

Thanks!
-joe

Negative "possible adjacencies" printed

Strange output"

Iteration 12 (internal metabolite = 11: M_nadph_c) of 25
Possible LP amounts for this step:
7296, 7891, 7296, 87084, 9150, 6765, 92448, 67680, 76612, 1144, 582, 29646, 31726, 65912
Total: 491232
Possible adjacencies added for this step:
60, 60, 60, -44, 60, 60, -44, -44, -44, -12, -12, -44, -44, -44

Command:
python3 main.py --model_path models/iIT341.xml --inputs 22,33,35,93,294,300,306,314,334,356,231,262,139,28 --hide 16,26,29,33,39,40,59,65,75,81,90,93,100,110,145,171,174,212,223,224,232,234,235,239,252,253,255,259,261,262,263,265,269,271,276,277,279,280,283,284,286,291,293,296,302,308,312,319,320,323,325,329,331,336,341,342,344,345,350,352,358,361,366,368,370,372

polco leads to 0 ECMs with `e_coli_core`

Hi there!

I am running the following command:

python main.py  --model_path  e_coli_core.xml  --out_path core_conversions.csv --polco True

and I am getting 0 ECMs. Do you have any idea why this might be happening?

Regards!

GLPK is not recognized

Hello!

I'm trying to install ecmtool on a HPC environment on my campus in a singularity container. So far, I'm having a problem with the GLPK solver: Even though GLPK is installed and loaded in the environment, the program states that it is not available. I'm not sure whether the problem is the installation or loading of the GLPK solver or if it is a problem with ecmtool. It does seem ecmtool is able to use CPLEX, but only the python interface is installed in the environment, so it is likely not actually used.

If anyone would like to look into this, I would be happy to send the singularity definition file, the SLURM script and the log-files.
