MiNAA: Microbiome Network Alignment Algorithm

Description

MiNAA takes as input a pair of node-edge networks and finds a correspondence between them such that each node in one is mapped to its most similar node in the other. MiNAA can use both topological (structural) information about the networks and biological information about the taxa each node represents to produce a good approximation of the optimal alignment. Because of the complexity of this task, an approximation is the best that can be achieved in an efficient runtime. Network alignment in this setting is done primarily for comparative purposes. For example, an alignment might map clusters of taxa to each other, revealing conserved or analogous functions between microbial communities. See our software note (preprint) for additional details.

Requirements

This program requires C++20 or higher, and g++.

Compilation

Unix

make

Windows

mkdir obj
make

In addition to C++20 and g++, Windows requires a way to run the provided makefile. The MinGW Package Manager provides a lightweight make utility. It is recommended to download MinGW here and follow this guide for installation; however, any method for compiling C++ using g++ should suffice.

Usage

This utility has the form ./minaa.exe <G> <H> [-B=bio] [-a=alpha] [-b=beta].

Required Arguments (ordered)

  1. G; a network to align.
  2. H; a network to align.
  • Require:
    • The networks are represented by adjacency matrices in CSV format, with labels in both the first column and row.
    • The CSV delimiter must be one of {comma, semicolon, space, tab}, and will be detected automatically.
    • |G| is less than or equal to |H|.
  • Notes:
    • Any nonzero entry is considered an edge.
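
The input rules above can be illustrated with a minimal sketch. The helper names detect_delimiter and is_edge are hypothetical and are not taken from MiNAA's source; the detection strategy shown (pick the candidate that occurs most often in the header row) is an assumption, and MiNAA's actual logic may differ.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not MiNAA's actual code): guess the CSV delimiter by
// counting each allowed candidate in the header row and keeping the most
// frequent one. Ties go to the earlier candidate; a row with no candidate
// at all falls back to a comma.
char detect_delimiter(const std::string& header_line)
{
    assert(!header_line.empty());
    const std::array<char, 4> candidates = {',', ';', ' ', '\t'};
    char best = ',';
    std::ptrdiff_t best_count = 0;
    for (char c : candidates) {
        const std::ptrdiff_t count =
            std::count(header_line.begin(), header_line.end(), c);
        if (count > best_count) {
            best_count = count;
            best = c;
        }
    }
    return best;
}

// Any nonzero adjacency entry counts as an edge, regardless of its weight.
bool is_edge(double entry)
{
    return entry != 0.0;
}
```

A frequency count like this can be fooled by labels that contain spaces, so it is only a starting point for validating input files before running an alignment.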

Optional Arguments (unordered)

Common

  • -B=: the path to the biological cost matrix file.
    • Require: a CSV adjacency matrix where the first column consists of the labels of G, in order, and first row consists of the labels of H, in order.
    • Default: the algorithm will run using only topological calculations.
    • Notes:
      • The input matrix is normalized by MiNAA such that all entries are in range [0, 1].
      • The input is assumed to be a cost matrix. If it is a similarity matrix, use the -s option detailed below.
  • -a=: alpha; the GDV-edge weight balancer.
    • Require: a real number in range [0, 1].
    • Default: 1 (100% GDV data).
  • -b=: beta; the topological-biological cost matrix balancer.
    • Require: a real number in range [0, 1].
    • Default: 1 (100% topological data).
  • -st=: similarity threshold; the similarity value above which aligned pairs are included in the output.
    • Require: a real number in range [0, 1].
    • Default: 0.
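
To make the role of beta concrete, here is a minimal sketch of one way the two cost matrices could be balanced. It assumes a convex combination in which beta weights the topological cost and (1 - beta) weights the biological cost, which is consistent with the default of 1 meaning 100% topological data; the exact formula MiNAA uses is not stated here, and the name combine_costs is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Assumed formula (not confirmed against MiNAA's source):
//   overall = beta * topological + (1 - beta) * biological
// Both matrices must have the same shape, with entries already in [0, 1].
Matrix combine_costs(const Matrix& top, const Matrix& bio, double beta)
{
    assert(beta >= 0.0 && beta <= 1.0);
    assert(top.size() == bio.size());
    Matrix overall(top.size());
    for (std::size_t i = 0; i < top.size(); ++i) {
        assert(top[i].size() == bio[i].size());
        overall[i].resize(top[i].size());
        for (std::size_t j = 0; j < top[i].size(); ++j) {
            overall[i][j] = beta * top[i][j] + (1.0 - beta) * bio[i][j];
        }
    }
    return overall;
}
```

Under this assumption, beta = 1 reproduces the topological matrix unchanged and beta = 0 reproduces the biological matrix.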

Uncommon

  • -Galias=: an alias for the G file.
    • Require: a valid file name.
    • Default: the G file keeps its original name.
  • -Halias=: an alias for the H file.
    • Require: a valid file name.
    • Default: the H file keeps its original name.
  • -Balias=: an alias for the B file.
    • Require: a valid file name.
    • Default: the B file keeps its original name.
  • -p: passthrough; whether or not to write the input files into the output folder.
    • Require: none.
    • Default: the files are not passed through to the output folder.
    • Note: the output reflects the input data after it has been processed by the algorithm; it is not a verbatim copy of the input files.
  • -t: timestamp; the output folder's name includes the date and time of execution.
    • Require: none.
    • Default: the output folder's name does not include date and time.
  • -g: greekstamp; the output folder's name includes the values for alpha and beta.
    • Require: none.
    • Default: the output folder's name does not include the values for alpha and beta.
  • -s: similarity conversion; for each entry in the given biological matrix, the value (post normalization) is replaced with 1 - value.
    • Require: none.
    • Default: the given biological matrix is left as is.
    • Note: use this if and only if the provided biological matrix is a similarity matrix.
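
The similarity conversion performed by -s can be sketched as follows. This is an illustration rather than MiNAA's code: the function name similarity_to_cost is hypothetical, and the matrix is assumed to have already been normalized to [0, 1].

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Replace every normalized similarity value v with the cost 1 - v, in place.
// Precondition (assumed): all entries already lie in [0, 1].
void similarity_to_cost(Matrix& bio)
{
    for (auto& row : bio) {
        for (double& v : row) {
            assert(v >= 0.0 && v <= 1.0);
            v = 1.0 - v;
        }
    }
}
```

After the conversion, a pair with similarity 1 has cost 0 and a pair with similarity 0 has cost 1, which is why applying -s to a matrix that is already a cost matrix would invert its meaning.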

Outputs

  • G-H/: (where G, H are the input networks) The folder containing the output files specified below.
  • log.txt: record of the important details from the alignment.
  • G_gdvs.csv: (where G is the input network) the Graphlet Degree Vectors for network G.
  • H_gdvs.csv: (where H is the input network) the Graphlet Degree Vectors for network H.
  • top_costs.csv: the topological cost matrix.
  • bio_costs.csv: the biological cost matrix (as input). Not created unless biological input is given.
  • overall_costs.csv: the combination of the topological and biological cost matrices. Not created unless biological input is given.
  • alignment_list.csv: a complete list of all aligned nodes, with rows in the format g_node,h_node,similarity, sorted in descending order of similarity. The first row in this list is the total cost of the alignment, that is, the sum of (1 - similarity) over all aligned pairs.
  • alignment_matrix.csv: a matrix form of the same alignment, where the first column and row are the labels from the two input networks, respectively.
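
The total cost reported in the first row of alignment_list.csv can be reproduced from the remaining rows. The sketch below assumes the rows have already been parsed into a hypothetical AlignedPair struct; the struct and function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory form of one g_node,h_node,similarity row.
struct AlignedPair {
    std::string g_node;
    std::string h_node;
    double similarity;
};

// Total alignment cost: the sum of (1 - similarity) over all aligned pairs.
double total_cost(const std::vector<AlignedPair>& pairs)
{
    double cost = 0.0;
    for (const auto& p : pairs) {
        assert(p.similarity >= 0.0 && p.similarity <= 1.0);
        cost += 1.0 - p.similarity;
    }
    return cost;
}

// Order the pairs the way the list is described: most similar first.
void sort_by_similarity(std::vector<AlignedPair>& pairs)
{
    std::sort(pairs.begin(), pairs.end(),
              [](const AlignedPair& a, const AlignedPair& b) {
                  return a.similarity > b.similarity;
              });
}
```

Recomputing the total this way is a quick consistency check when post-processing an alignment, for example after filtering pairs by similarity.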

Examples

Examples of MiNAA's usage with real data and in-depth explanations can be found in the examples/ directory.

Simulations in the Manuscript

All scripts and instructions to reproduce the analyses in the manuscript can be found in the simulations/ directory.

Contributions, Questions, Issues, and Feedback

Users interested in expanding the functionality of MiNAA are welcome to do so. Issue reports are encouraged through GitHub's issue tracker. See details on how to contribute and report issues in CONTRIBUTING.md.

License

MiNAA is licensed under the MIT license. © SolisLemus lab (2024).

Citation

If you use MiNAA in your work, we kindly ask that you cite the following paper:

@ARTICLE{Nelson2022,
  title         = "MiNAA: Microbiome Network Alignment Algorithm",
  author        = "Nelson, Reed and Aghdam, Rosa and
                   Solis-Lemus, Claudia",
  year          =  2022,
  archivePrefix = "arXiv",
  primaryClass  = "q-bio.PE",
  eprint        = "xxx"
}

minaa's Issues

makefile error

Hi!

I was trying to install minaa on Windows 10.
I tried with Ubuntu LTS (seemed less complicated than figuring out mingw) and got this error:

~/minaa$ make
g++ -O3 -g -c -Wall -Wextra -ansi -pedantic -std=c++20 -Iinclude -o obj/minaa.o src/minaa.cpp
make: g++: No such file or directory
make: *** [Makefile:26: obj/minaa.o] Error 127

Sometimes Linux on Windows has a hard time with paths, so I tried with MinGW and got this error:

~\minaa>make
g++ -O3 -g -c -Wall -Wextra -ansi -pedantic -std=c++20 -Iinclude -o obj/minaa.o src/minaa.cpp
process_begin: CreateProcess(NULL, g++ -O3 -g -c -Wall -Wextra -ansi -pedantic -std=c++20 -Iinclude -o obj/minaa.o src/minaa.cpp, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [Makefile:26: obj/minaa.o] Error 2

Turns out the file not found was g++...

So apparently, having or installing 'make' on Ubuntu doesn't mean you also have g++ already. I installed it with sudo apt install g++.
For the Windows install, the link from the guide in the readme wasn't working for me, so I got mingw-get from https://sourceforge.net/projects/mingw/. Also, since I got MinGW with RTools, I had to make sure make.exe and g++.exe were installed in the same MinGW folder; otherwise they don't find each other.
After that, it worked with both MinGW and Ubuntu LTS.

In the compilation section it says 'any method for compiling C++ should suffice', but I'd suggest adding a note that g++ specifically is needed, since that's what your makefile uses. I would also add it to the Unix method.

Output name error on Windows

So running the example command: ./minaa.exe example/G.csv example/H.csv
I got this error:

ERROR:
 Unable to create output folder alignments\example/G-example/H-2023_11_26-18_56_14\

The problem is at line 39 of file_io.cpp:

auto si = file.find_last_of("\\") + 1;

On Windows you're looking for the last backslash, but your example command uses forward slashes.

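If I run ./minaa.exe example\\G.csv example\\H.csv instead it works, but to avoid this you could simply search for the last backslash or forward slash instead, and then you could use the same command regardless of OS. Since find_last_of takes a set of characters, "\\/" covers both separators.

    /**
     * Returns the truncated name of the file.
     * 
     * @param file The file to truncate.
     * 
     * @return The truncated name of the file.
     */
    std::string name_file(std::string file)
    {
        // find_last_of matches any single character in the set, so "\\/"
        // finds the last backslash or forward slash; if neither is present,
        // npos + 1 wraps around to 0 and the name starts at the beginning.
        auto si = file.find_last_of("\\/") + 1;
        auto ei = file.find_last_of(".");
        auto file_name = file.substr(si, ei - si);
        return file_name;
    }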

Rebuilt with this code ^ and got no error.

Reproducing simulations: can not install SpiecEasi

Hi! 😄 I cannot install the SpiecEasi package on macOS Ventura.

The https://github.com/zdk123/SpiecEasi repo mentions the problem but the proposed solution (running xcode-select --install) did not work for me.

Would you know how to solve this?

> install_github("zdk123/SpiecEasi")
Downloading GitHub repo zdk123/SpiecEasi@HEAD
── R CMD build ─────────────────────────────────────────────────────────────────────────────
✔  checking for file ‘/private/var/folders/22/scsyyfyx7dxfxfzv6gptfws00000gn/T/RtmpLUOwLs/remotes29936fe1d57b/zdk123-SpiecEasi-bc33288/DESCRIPTION’ ...
─  preparing ‘SpiecEasi’:
✔  checking DESCRIPTION meta-information ...
─  cleaning src
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
   Removed empty directory ‘SpiecEasi/inst’
─  looking to see if a ‘data/datalist’ file should be added
─  building ‘SpiecEasi_1.1.2.tar.gz’
   
* installing *source* package ‘SpiecEasi’ ...
** using staged installation
** libs
clang++ -arch arm64 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/RcppArmadillo/include' -I/opt/R/arm64/include   -fPIC  -falign-functions=64 -Wall -g -O2  -c ADMM.cpp -o ADMM.o
clang++ -arch arm64 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/RcppArmadillo/include' -I/opt/R/arm64/include   -fPIC  -falign-functions=64 -Wall -g -O2  -c RcppExports.cpp -o RcppExports.o
clang++ -arch arm64 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/RcppArmadillo/include' -I/opt/R/arm64/include   -fPIC  -falign-functions=64 -Wall -g -O2  -c matops.cpp -o matops.o
clang++ -arch arm64 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/RcppArmadillo/include' -I/opt/R/arm64/include   -fPIC  -falign-functions=64 -Wall -g -O2  -c sqrtNewton.cpp -o sqrtNewton.o
clang++ -arch arm64 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/RcppArmadillo/include' -I/opt/R/arm64/include   -fPIC  -falign-functions=64 -Wall -g -O2  -c svthresh.cpp -o svthresh.o
clang++ -arch arm64 -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/opt/R/arm64/lib -o SpiecEasi.so ADMM.o RcppExports.o matops.o sqrtNewton.o svthresh.o -L/Library/Frameworks/R.framework/Resources/lib -lRlapack -L/Library/Frameworks/R.framework/Resources/lib -lRblas -L/opt/R/arm64/gfortran/lib/gcc/aarch64-apple-darwin20.6.0/12.0.1 -L/opt/R/arm64/gfortran/lib -lgfortran -lemutls_w -lquadmath -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
ld: warning: directory not found for option '-L/opt/R/arm64/gfortran/lib/gcc/aarch64-apple-darwin20.6.0/12.0.1'
ld: warning: directory not found for option '-L/opt/R/arm64/gfortran/lib'
ld: library not found for -lgfortran
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [SpiecEasi.so] Error 1
ERROR: compilation failed for package ‘SpiecEasi’
* removing ‘/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/SpiecEasi’
Warning message:
In i.p(...) :
  installation of package ‘/var/folders/22/scsyyfyx7dxfxfzv6gptfws00000gn/T//RtmpLUOwLs/file29933da0cb77/SpiecEasi_1.1.2.tar.gz’ had non-zero exit status

Thank you ! :D
