
beeline's People

Contributors

adyprat, amoghpj, dependabot[bot], eniktab, jlaw9, ktakers, sahasradude, tmmurali, yiqisu


beeline's Issues

Not able to run SCRIBE

Hi,
I reinstalled BEELINE, and it seems that BEELINE has been installed successfully. The command line output is as below:

zengyp@ubuntu:~/Single_cell/1/Beeline-master$ . setupAnacondaVENV.sh 
Setting up Anaconda Python virtual environment...
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/zengyp/anaconda3/envs/BEELINE

  added / updated specs:
    - matplotlib==3.0.2
    - networkx==2.2
    - numpy==1.15.4
    - pandas==0.23.4
    - python=3.7.1
    - pyyaml==5.1.1
    - r=3.5.0
    - rpy2==2.9.1
    - scikit-learn==0.21.2
    - seaborn==0.9.0
    - tqdm==4.28.1


The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  _r-mutex           pkgs/r/linux-64::_r-mutex-1.0.0-anacondar_1
  binutils_impl_lin~ pkgs/main/linux-64::binutils_impl_linux-64-2.33.1-he6710b0_7
  binutils_linux-64  pkgs/main/linux-64::binutils_linux-64-2.33.1-h9595d00_15
  blas               pkgs/main/linux-64::blas-1.0-mkl
  bwidget            pkgs/main/linux-64::bwidget-1.9.11-1
  bzip2              pkgs/main/linux-64::bzip2-1.0.8-h7b6447c_0
  ca-certificates    pkgs/main/linux-64::ca-certificates-2021.1.19-h06a4308_0
  cairo              pkgs/main/linux-64::cairo-1.14.12-h8948797_3
  certifi            pkgs/main/linux-64::certifi-2020.12.5-py37h06a4308_0
  curl               pkgs/main/linux-64::curl-7.67.0-hbc83047_0
  cycler             pkgs/main/linux-64::cycler-0.10.0-py37_0
  dbus               pkgs/main/linux-64::dbus-1.13.18-hb2f20db_0
  decorator          pkgs/main/noarch::decorator-4.4.2-pyhd3eb1b0_0
  expat              pkgs/main/linux-64::expat-2.2.10-he6710b0_2
  fontconfig         pkgs/main/linux-64::fontconfig-2.13.1-h6c09931_0
  freetype           pkgs/main/linux-64::freetype-2.10.4-h5ab3b9f_0
  fribidi            pkgs/main/linux-64::fribidi-1.0.10-h7b6447c_0
  gcc_impl_linux-64  pkgs/main/linux-64::gcc_impl_linux-64-7.3.0-habb00fd_1
  gcc_linux-64       pkgs/main/linux-64::gcc_linux-64-7.3.0-h553295d_15
  gfortran_impl_lin~ pkgs/main/linux-64::gfortran_impl_linux-64-7.3.0-hdf63c60_1
  gfortran_linux-64  pkgs/main/linux-64::gfortran_linux-64-7.3.0-h553295d_15
  glib               pkgs/main/linux-64::glib-2.63.1-h5a9c865_0
  graphite2          pkgs/main/linux-64::graphite2-1.3.14-h23475e2_0
  gsl                pkgs/main/linux-64::gsl-2.4-h14c3975_4
  gst-plugins-base   pkgs/main/linux-64::gst-plugins-base-1.14.0-hbbd80ab_1
  gstreamer          pkgs/main/linux-64::gstreamer-1.14.0-hb453b48_1
  gxx_impl_linux-64  pkgs/main/linux-64::gxx_impl_linux-64-7.3.0-hdf63c60_1
  gxx_linux-64       pkgs/main/linux-64::gxx_linux-64-7.3.0-h553295d_15
  harfbuzz           pkgs/main/linux-64::harfbuzz-1.8.8-hffaf4a1_0
  icu                pkgs/main/linux-64::icu-58.2-he6710b0_3
  intel-openmp       pkgs/main/linux-64::intel-openmp-2020.2-254
  jinja2             pkgs/main/noarch::jinja2-2.11.3-pyhd3eb1b0_0
  joblib             pkgs/main/noarch::joblib-1.0.0-pyhd3eb1b0_0
  jpeg               pkgs/main/linux-64::jpeg-9b-h024ee3a_2
  kiwisolver         pkgs/main/linux-64::kiwisolver-1.3.1-py37h2531618_0
  krb5               pkgs/main/linux-64::krb5-1.16.4-h173b8e3_0
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.33.1-h53a641e_7
  libcurl            pkgs/main/linux-64::libcurl-7.67.0-h20c2e04_0
  libedit            pkgs/main/linux-64::libedit-3.1.20191231-h14c3975_1
  libffi             pkgs/main/linux-64::libffi-3.2.1-hf484d3e_1007
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
  libgfortran-ng     pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
  libpng             pkgs/main/linux-64::libpng-1.6.37-hbc83047_0
  libssh2            pkgs/main/linux-64::libssh2-1.9.0-h1ba5d50_1
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
  libtiff            pkgs/main/linux-64::libtiff-4.1.0-h2733197_1
  libuuid            pkgs/main/linux-64::libuuid-1.0.3-h1bed415_2
  libxcb             pkgs/main/linux-64::libxcb-1.14-h7b6447c_0
  libxml2            pkgs/main/linux-64::libxml2-2.9.10-hb55368b_3
  lz4-c              pkgs/main/linux-64::lz4-c-1.9.3-h2531618_0
  make               pkgs/main/linux-64::make-4.2.1-h1bed415_1
  markupsafe         pkgs/main/linux-64::markupsafe-1.1.1-py37h14c3975_1
  matplotlib         pkgs/main/linux-64::matplotlib-3.0.2-py37h5429711_0
  mkl                pkgs/main/linux-64::mkl-2020.2-256
  mkl-service        pkgs/main/linux-64::mkl-service-2.3.0-py37he8ac12f_0
  mkl_fft            pkgs/main/linux-64::mkl_fft-1.2.0-py37h23d657b_0
  mkl_random         pkgs/main/linux-64::mkl_random-1.1.1-py37h0573a6f_0
  ncurses            pkgs/main/linux-64::ncurses-6.2-he6710b0_1
  networkx           pkgs/main/linux-64::networkx-2.2-py37_1
  numpy              pkgs/main/linux-64::numpy-1.15.4-py37h7e9f1db_0
  numpy-base         pkgs/main/linux-64::numpy-base-1.15.4-py37hde5b4d6_0
  openssl            pkgs/main/linux-64::openssl-1.1.1i-h27cfd23_0
  pandas             pkgs/main/linux-64::pandas-0.23.4-py37h04863e7_0
  pango              pkgs/main/linux-64::pango-1.42.4-h049681c_0
  patsy              pkgs/main/linux-64::patsy-0.5.1-py37_0
  pcre               pkgs/main/linux-64::pcre-8.44-he6710b0_0
  pip                pkgs/main/linux-64::pip-20.3.3-py37h06a4308_0
  pixman             pkgs/main/linux-64::pixman-0.40.0-h7b6447c_0
  pyparsing          pkgs/main/noarch::pyparsing-2.4.7-pyhd3eb1b0_0
  pyqt               pkgs/main/linux-64::pyqt-5.9.2-py37h05f1152_2
  python             pkgs/main/linux-64::python-3.7.1-h0371630_7
  python-dateutil    pkgs/main/noarch::python-dateutil-2.8.1-pyhd3eb1b0_0
  pytz               pkgs/main/noarch::pytz-2021.1-pyhd3eb1b0_0
  pyyaml             pkgs/main/linux-64::pyyaml-5.1.1-py37h7b6447c_0
  qt                 pkgs/main/linux-64::qt-5.9.7-h5867ecd_1
  r                  pkgs/r/linux-64::r-3.5.0-r350_0
  r-assertthat       pkgs/r/linux-64::r-assertthat-0.2.0-r350h912f1d8_0
  r-base             pkgs/r/linux-64::r-base-3.5.0-h1e0a451_1
  r-bh               pkgs/r/linux-64::r-bh-1.66.0_1-r350h912f1d8_0
  r-bindr            pkgs/r/linux-64::r-bindr-0.1.1-r350h912f1d8_0
  r-bindrcpp         pkgs/r/linux-64::r-bindrcpp-0.2.2-r350hebe7666_0
  r-bit              pkgs/r/linux-64::r-bit-1.1_12-r350hb353451_0
  r-bit64            pkgs/r/linux-64::r-bit64-0.9_7-r350hb353451_0
  r-blob             pkgs/r/linux-64::r-blob-1.1.1-r350h912f1d8_0
  r-boot             pkgs/r/linux-64::r-boot-1.3_20-r350h912f1d8_0
  r-class            pkgs/r/linux-64::r-class-7.3_14-r350hb353451_4
  r-cli              pkgs/r/linux-64::r-cli-1.0.0-r350h912f1d8_0
  r-cluster          pkgs/r/linux-64::r-cluster-2.0.7_1-r350h6ecb4d7_0
  r-codetools        pkgs/r/linux-64::r-codetools-0.2_15-r350h912f1d8_0
  r-crayon           pkgs/r/linux-64::r-crayon-1.3.4-r350h912f1d8_0
  r-dbi              pkgs/r/linux-64::r-dbi-0.8-r350h912f1d8_0
  r-dbplyr           pkgs/r/linux-64::r-dbplyr-1.2.1-r350h912f1d8_0
  r-digest           pkgs/r/linux-64::r-digest-0.6.15-r350hb353451_0
  r-dplyr            pkgs/r/linux-64::r-dplyr-0.7.4-r350hebe7666_0
  r-foreign          pkgs/r/linux-64::r-foreign-0.8_70-r350hb353451_0
  r-glue             pkgs/r/linux-64::r-glue-1.2.0-r350hb353451_0
  r-kernsmooth       pkgs/r/linux-64::r-kernsmooth-2.23_15-r350h6ecb4d7_4
  r-lattice          pkgs/r/linux-64::r-lattice-0.20_35-r350hb353451_0
  r-magrittr         pkgs/r/linux-64::r-magrittr-1.5-r350h912f1d8_4
  r-mass             pkgs/r/linux-64::r-mass-7.3_49-r350hb353451_0
  r-matrix           pkgs/r/linux-64::r-matrix-1.2_14-r350hb353451_0
  r-memoise          pkgs/r/linux-64::r-memoise-1.1.0-r350h912f1d8_0
  r-mgcv             pkgs/r/linux-64::r-mgcv-1.8_23-r350hb353451_0
  r-nlme             pkgs/r/linux-64::r-nlme-3.1_137-r350h6ecb4d7_0
  r-nnet             pkgs/r/linux-64::r-nnet-7.3_12-r350hb353451_0
  r-pillar           pkgs/r/linux-64::r-pillar-1.2.1-r350h912f1d8_0
  r-pkgconfig        pkgs/r/linux-64::r-pkgconfig-2.0.1-r350h912f1d8_0
  r-plogr            pkgs/r/linux-64::r-plogr-0.2.0-r350h912f1d8_0
  r-prettyunits      pkgs/r/linux-64::r-prettyunits-1.0.2-r350h912f1d8_0
  r-purrr            pkgs/r/linux-64::r-purrr-0.2.4-r350hb353451_0
  r-r6               pkgs/r/linux-64::r-r6-2.2.2-r350h912f1d8_0
  r-rcpp             pkgs/r/linux-64::r-rcpp-0.12.16-r350hebe7666_0
  r-recommended      pkgs/r/linux-64::r-recommended-3.5.0-r350_0
  r-rlang            pkgs/r/linux-64::r-rlang-0.2.0-r350hb353451_0
  r-rpart            pkgs/r/linux-64::r-rpart-4.1_13-r350hb353451_0
  r-rsqlite          pkgs/r/linux-64::r-rsqlite-2.1.0-r350hebe7666_0
  r-spatial          pkgs/r/linux-64::r-spatial-7.3_11-r350hb353451_4
  r-survival         pkgs/r/linux-64::r-survival-2.42_3-r350hb353451_0
  r-tibble           pkgs/r/linux-64::r-tibble-1.4.2-r350hb353451_0
  r-tidyselect       pkgs/r/linux-64::r-tidyselect-0.2.4-r350hebe7666_0
  r-utf8             pkgs/r/linux-64::r-utf8-1.1.3-r350hb353451_0
  readline           pkgs/main/linux-64::readline-7.0-h7b6447c_5
  rpy2               pkgs/r/linux-64::rpy2-2.9.1-py37r350h035aef0_0
  scikit-learn       pkgs/main/linux-64::scikit-learn-0.21.2-py37hd81dba3_0
  scipy              pkgs/main/linux-64::scipy-1.5.2-py37h0b6359f_0
  seaborn            pkgs/main/noarch::seaborn-0.9.0-pyh91ea838_1
  setuptools         pkgs/main/linux-64::setuptools-52.0.0-py37h06a4308_0
  sip                pkgs/main/linux-64::sip-4.19.8-py37hf484d3e_0
  six                pkgs/main/linux-64::six-1.15.0-py37h06a4308_0
  sqlite             pkgs/main/linux-64::sqlite-3.33.0-h62c20be_0
  statsmodels        pkgs/main/linux-64::statsmodels-0.11.1-py37h7b6447c_0
  tk                 pkgs/main/linux-64::tk-8.6.10-hbc83047_0
  tktable            pkgs/main/linux-64::tktable-2.10-h14c3975_0
  tornado            pkgs/main/linux-64::tornado-6.1-py37h27cfd23_0
  tqdm               pkgs/main/linux-64::tqdm-4.28.1-py37h28b3542_0
  wheel              pkgs/main/noarch::wheel-0.36.2-pyhd3eb1b0_0
  xz                 pkgs/main/linux-64::xz-5.2.5-h7b6447c_0
  yaml               pkgs/main/linux-64::yaml-0.1.7-had09818_2
  zlib               pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3
  zstd               pkgs/main/linux-64::zstd-1.4.5-h9ceee32_0


Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate BEELINE
#
# To deactivate an active environment, use
#
#     $ conda deactivate


R version 3.5.0 (2018-04-23) -- "Joy in Playing"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> install.packages('https://cran.r-project.org/src/contrib/PRROC_1.3.1.tar.gz', type = 'source')
inferring 'repos = NULL' from 'pkgs'
trying URL 'https://cran.r-project.org/src/contrib/PRROC_1.3.1.tar.gz'
Content type 'application/x-gzip' length 335708 bytes (327 KB)
==================================================
downloaded 327 KB

* installing *source* package ‘PRROC’ ...
** package ‘PRROC’ successfully unpacked and MD5 sums checked
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (PRROC)
> 
> 

When I try to run 'python BLRunner.py --config config-files/config.yaml' for SCRIBE, it outputs an error; the command-line output is as below:

(BEELINE) zengyp@ubuntu:~/Single_cell/1/Beeline-master$ python BLRunner.py --config config-files/config.yaml
Skipping PIDC
Skipping GRNVBEM
Skipping GENIE3
Skipping GRNBOOST2
Skipping PPCOR
Skipping SCODE
Skipping SCNS
Skipping SINCERITIES
Skipping LEAP
Skipping GRISLI
Skipping SINGE
<BLRun.BLRun object at 0x7fc6bb52c208>
Evaluation started
docker run --rm -v /home/zengyp/Single_cell/1/Beeline-master:/data/ scribe:base /bin/sh -c "time -v -o data/outputs/example/GSD/SCRIBE/time0.txt Rscript runScribe.R -e data/inputs/example/GSD/SCRIBE/ExpressionData0.csv -c data/inputs/example/GSD/SCRIBE/CellData0.csv -g data/inputs/example/GSD/SCRIBE/GeneData.csv -o data/outputs/example/GSD/SCRIBE/ -d 5 -l 0 -m ucRDI -x uninormal --outFile outFile0.csv -i"
Error in library(monocle, warn.conflicts = FALSE, quietly = TRUE) : 
  there is no package called ‘monocle’
Calls: suppressPackageStartupMessages -> withCallingHandlers -> library
Execution halted
docker run --rm -v /home/zengyp/Single_cell/1/Beeline-master:/data/ scribe:base /bin/sh -c "time -v -o data/outputs/example/GSD/SCRIBE/time1.txt Rscript runScribe.R -e data/inputs/example/GSD/SCRIBE/ExpressionData1.csv -c data/inputs/example/GSD/SCRIBE/CellData1.csv -g data/inputs/example/GSD/SCRIBE/GeneData.csv -o data/outputs/example/GSD/SCRIBE/ -d 5 -l 0 -m ucRDI -x uninormal --outFile outFile1.csv -i"
Error in library(monocle, warn.conflicts = FALSE, quietly = TRUE) : 
  there is no package called ‘monocle’
Calls: suppressPackageStartupMessages -> withCallingHandlers -> library
Execution halted
outputs/example/GSD/SCRIBE/outFile0.csv does not exist, skipping...
Evaluation complete

Demonstrate performance difference between dynverse models and GNW output

We have shown that GNW output does not display any meaningful qualitative structure (like linear or bifurcating trajectories) for a given dynverse network. This is our argument for using the dynverse and literature-curated models in order to see "meaningful" trajectories. However, we have not yet demonstrated that giving the network inference methods such biologically interpretable trajectories leads to better results. For this we have to compare the performance of the methods on dynverse and GNW data for a given model.
cc @adyprat

Curate Boolean models from literature

Curate Boolean GRN models with 10, 30, and 50 nodes, and use our code to generate input data that includes ExpressionData.csv (a gene-by-cell matrix) and a pseudotime ordering of cells.

Will you upload the Supplementary Section?

Hello.
In the paper (https://www.biorxiv.org/content/10.1101/642926v3.full), you write:
"We simulate these networks using BoolODE, a method we have developed to convert Boolean models into systems of differential equations (Supplementary Section S1)", but I can't find Supplementary Section S1.
So I can't work out how to create the input data for the synthetic networks.
I'd like to use the synthetic data that was used in this paper.
Could you upload Supplementary Section S1 and Table 2?
Thank you.

using dynverse/dyngen to simulate the synthetic datasets

Hey all!

I read your bioRxiv on "Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data" and was quite intrigued. I've been toying around with various forms of NI myself, and think that a benchmark for scNI methods is highly warranted.

I read that the synthetic networks are generated by taking the module networks from dynverse/dyngen, converting them to ODEs using BoolODE, and then running GeneNetWeaver. I was wondering why GNW is being used at all, since dyngen is also able to perform all of these steps. One of the benefits of dyngen is that it uses Gillespie's SSA instead of ODEs. SSA simulations keep track of the number of molecules in your cell (mRNAs, proteins) and simulate at each step which reaction takes place (e.g. transcribing a new mRNA). This way there is no need to generate random noise at each time step in order to simulate stochasticity; instead, the stochasticity comes from which reactions are triggered at each time step.
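To make the contrast concrete, here is a minimal Gillespie SSA sketch in Python for a single-gene birth-death process (transcription at rate k, degradation at rate g per molecule); the model and rates are illustrative assumptions, not dyngen's actual implementation:

import random

def gillespie(k=10.0, g=1.0, t_end=10.0, seed=0):
    # Stochasticity comes from sampling which reaction fires and when,
    # not from adding noise to an ODE at fixed time steps.
    rng = random.Random(seed)
    t, mrna, trace = 0.0, 0, []
    while t < t_end:
        birth, death = k, g * mrna          # reaction propensities
        total = birth + death
        t += rng.expovariate(total)         # waiting time to the next reaction
        if rng.random() < birth / total:    # choose which reaction fires
            mrna += 1
        else:
            mrna -= 1
        trace.append((t, mrna))
    return trace

print(gillespie()[-1])  # final (time, mRNA count)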

Have you tried dyngen instead of GNW in order to perform the simulations?

What are your thoughts on this?
Robrecht

curated datasets

Hi,

Thank you very much for this tool!!
I am wondering if one can generate the curated datasets (Fig. 3 of your paper) with less noise.

Best

Problems running GRNBoost and GENIE3 on the example data

Hello,

I tried running all the methods supported in BEELINE on your example GSD data following the steps you provide in the documentation. I am getting this error for GRNBoost2 and GENIE3:

docker run --rm -v /Users/sotolm/Documents/PR-scGRN/Beeline:/data pidc:base /bin/sh -c "time -v -o data/outputs/example/GSD/PIDC/time.txt julia runPIDC.jl data/inputs/example/GSD/PIDC/ExpressionData.csv data/outputs/example/GSD/PIDC/outFile.txt "
  4.710384 seconds (11.83 M allocations: 3.117 GiB, 7.00% gc time)
  3.156385 seconds (10.83 M allocations: 531.753 MiB, 7.40% gc time)
docker run --rm -v /Users/sotolm/Documents/Beeline:/data/ --expose=41269 arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/GSD/GENIE3/ExpressionData.csv --outFile=data/outputs/example/GSD/GENIE3/outFile.txt "
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF, client_or_address = client)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 73, in genie3
    limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 115, in diy
    expression_matrix, gene_names, tf_names = _prepare_input(expression_data, gene_names, tf_names)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 214, in _prepare_input
    expression_matrix = expression_data.as_matrix()
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'as_matrix'

docker run --rm -v /Users/sotolm/Documents/PR-scGRN/Beeline:/data/ --expose=41269 arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GRNBOOST2/time.txt python runArboreto.py --algo=GRNBoost2 --inFile=data/inputs/example/GSD/GRNBOOST2/ExpressionData.csv --outFile=data/outputs/example/GSD/GRNBOOST2/outFile.txt "
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 36, in main
    network = grnboost2(inDF, client_or_address = client)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 41, in grnboost2
    early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 115, in diy
    expression_matrix, gene_names, tf_names = _prepare_input(expression_data, gene_names, tf_names)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 214, in _prepare_input
    expression_matrix = expression_data.as_matrix()
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'as_matrix'

Additionally, I get this error for GENIE3:

Traceback (most recent call last):
  File "BLRunner.py", line 77, in <module>
    main()
  File "BLRunner.py", line 71, in main
    evaluation.runners[idx].parseOutput()
  File "/Users/sotolm/Documents/Beeline/BLRun/runner.py", line 90, in parseOutput
    OutputParser[self.name](self)
  File "/Users/sotolm/Documents/Beeline/BLRun/genie3Runner.py", line 60, in parseOutput
    OutDF = pd.read_csv(outDir+'outFile.txt', sep = '\t', header = 0)
  File "/Users/sotolm/usr/local/bin/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/sotolm/usr/local/bin/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/Users/sotolm/usr/local/bin/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/Users/sotolm/usr/local/bin/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/sotolm/usr/local/bin/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 673, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File outputs/example/GSD/GENIE3/outFile.txt does not exist: 'outputs/example/GSD/GENIE3/outFile.txt'

I believe the first two are pandas version issues: the DataFrame.as_matrix() method was deprecated in version 0.23.0 and later removed, while the version listed in requirements.txt is 0.23.4. I tried changing the pandas version in requirements.txt, but it clashes with the versions of other packages.
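For what it's worth, one workaround (a sketch against the runArboreto.py lines in the traceback above, not an official fix) is to hand arboreto a plain NumPy array plus explicit gene names, so that its _prepare_input() never calls the removed DataFrame accessor; .values exists in every pandas version involved:

# runArboreto.py, main(): pass an array instead of the DataFrame
network = genie3(inDF.values, client_or_address=client, gene_names=inDF.columns)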

Nan value in PIDC results

Hi Beeline team,
I am currently using your neat pipeline, but I have encountered a very weird bug in the rankedEdges.csv file of the PIDC results. It seems that on my datasets, the edge weights measured by PIDC are all NaN values, like this:
(screenshot: rankedEdges.csv with all-NaN edge weights)

But after applying a search algorithm I developed (it requires running PIDC repeatedly, so it cannot be applied to large-scale scRNA-seq datasets), I found that simply deleting some of the genes (in my case, the 441st, 865th, and 866th genes) brings the edge weights back to normal:
(screenshot: rankedEdges.csv with normal edge weights after removing those genes)

I originally thought that maybe these genes have some bad statistical characteristics, but regrettably I didn't find any special properties of those genes (e.g. average expression, variance, coefficient of variation, etc.).
This happens in most of my datasets, so I think it is really important to figure out, but I have no idea how to solve it.

To let your team check this bug, I have created a repo and uploaded the ExpressionData.csv: https://github.com/WWXkenmo/PIDC_bug

Best,
Ken
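A diagnostic sketch that may help narrow this down (my own guess at what to look for, not a confirmed explanation): genes with zero variance, or with almost no distinct values, can break information-theoretic estimators such as the ones PIDC relies on, so it may be worth listing such genes before running it. This assumes ExpressionData.csv is a gene-by-cell matrix, as elsewhere in BEELINE:

import pandas as pd

expr = pd.read_csv("ExpressionData.csv", index_col=0)   # genes x cells
zero_var = expr.index[expr.var(axis=1) == 0]            # constant genes
few_unique = expr.index[expr.nunique(axis=1) <= 2]      # near-constant genes
print("zero-variance genes:", list(zero_var))
print("genes with <= 2 distinct values:", list(few_unique))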

docker image

Hi,
I am doing an internship, and my supervisor told me to read your paper and try to compare your results with newer gene regulatory network inference algorithms like CellOracle on the same datasets.
The cluster I'm currently working with only supports Singularity, not Docker.
But I found out that Singularity is capable of pulling Docker images.
After many attempts it always says:
FATAL: While making image from oci registry: failed to get checksum for
docker://grnbeeline/pidc:latest: Error reading manifest latest in
docker.io/grnbeeline/pidc: manifest unknown: manifest unknown

I wanted to ask whether anyone has had the same problems working with Singularity, or whether there is a solution for my problem.
Many thanks!
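One thing worth checking (a guess based on the tags used elsewhere on this page, not a confirmed fix): the BEELINE images referenced in other issues use the base tag, e.g. grnbeeline/arboreto:base, so the "manifest unknown" error for :latest may simply mean that no latest tag was pushed. Pulling with an explicit tag, e.g. singularity pull docker://grnbeeline/pidc:base, might work.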

Data availability from paper

Hello again, I was wondering if you've posted the ground truth networks for the experimental single-cell RNA-seq datasets from the paper anywhere. Thanks for your help!

Error in inputPath derivation in runner scripts

Hi,

Thanks for the interesting package and collection of GRN methods! I ran into the following error when trying to run the eval.py script on the included sample data:

$ pwd
/media/data/chris/beeline/Beeline
$ python eval.py --config config-files/config.yaml
eval.py:126: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config_map = yaml.load(config_file_handle)
<__main__.Evaluation object at 0x7fee34158cf8>
Evaluation started
Input folder for PIDC does not exist, creating input folder...
Input folder for GRNVBEM does not exist, creating input folder...
Input folder for GENIE3 does not exist, creating input folder...
Input folder for GRNBOOST2 does not exist, creating input folder...
Input folder for PPCOR does not exist, creating input folder...
Input folder for SCODE does not exist, creating input folder...
Input folder for SCNS does not exist, creating input folder...
Input folder for SINCERITIES does not exist, creating input folder...
Input folder for LEAP does not exist, creating input folder...
Input folder for GRISLI does not exist, creating input folder...
Input folder for SCINGE does not exist, creating input folder...
Input folder for SCRIBE does not exist, creating input folder...
Traceback (most recent call last):
  File "eval.py", line 215, in <module>
    main()
  File "eval.py", line 206, in main
    evaluation.runners[idx].run()
  File "/media/data/chris/beeline/Beeline/src/runner.py", line 86, in run
    AlgorithmMapper[self.name](self)
  File "/media/data/chris/beeline/Beeline/src/pidcRunner.py", line 26, in run
    inputPath = "data/" + str(RunnerObj.inputDir).split("RNMethods/")[1] + \
IndexError: list index out of range

It's not clear to me where RNMethods is supposed to come from, but I was able to work around this by changing the inputPath derivation in each of the runner files:

inputPath = "data/" + str(RunnerObj.inputDir).split(str(Path.cwd()))[1] + "/PIDC/ExpressionData.csv"

The methods appear to be running on the sample data after this change, so I'm not sure if I missed a step somewhere.
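A slightly more robust variant of that workaround (a sketch resting on the same assumption, namely that inputDir lives underneath the current working directory) uses pathlib's relative_to instead of string splitting:

from pathlib import Path

relPath = Path(RunnerObj.inputDir).relative_to(Path.cwd())  # e.g. inputs/example/GSD
inputPath = "data/" + str(relPath) + "/PIDC/ExpressionData.csv"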

Not able to run some algorithms

Hi,
I have successfully installed all the images, but I have some problems when I run BLRunner.py with 'should_run = [True]' for SCRIBE, GRISLI, SINGE, GRNVBEM, GENIE3, and GRNBOOST2. The problems are shown in the attached screenshots. I do not know how to solve them; I hope you can give me some advice. Thank you very much!
Best,
Yanping
(attached error screenshots: genie3 grnboost2-1, genie3 grnboost2-2, grisli, grnvbem, scribe, singe-1, singe-2)

Not able to run GENIE3 and GRNBOOST2

Hi,
When I try to run GENIE3 and GRNBOOST2, they output the error AttributeError: 'DataFrame' object has no attribute 'to_numpy'. The command-line output is as below:

(BEELINE) zengyp@ubuntu:~/Single_cell/1/Beeline-master$ python BLRunner.py --config config-files/config.yaml
Skipping PIDC
Skipping GRNVBEM
Skipping PPCOR
Skipping SCODE
Skipping SCNS
Skipping SINCERITIES
Skipping LEAP
Skipping GRISLI
Skipping SINGE
Skipping SCRIBE
<BLRun.BLRun object at 0x7f15d1eae240>
Evaluation started
docker run --rm -v /home/zengyp/Single_cell/1/Beeline-master:/data/ --expose=41269 arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/GSD/GENIE3/ExpressionData.csv --outFile=data/outputs/example/GSD/GENIE3/outFile.txt "
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF.to_numpy(), client_or_address = client, gene_names = inDF.columns)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'to_numpy'
docker run --rm -v /home/zengyp/Single_cell/1/Beeline-master:/data/ --expose=41269 arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GRNBOOST2/time.txt python runArboreto.py --algo=GRNBoost2 --inFile=data/inputs/example/GSD/GRNBOOST2/ExpressionData.csv --outFile=data/outputs/example/GSD/GRNBOOST2/outFile.txt "
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 36, in main
    network = grnboost2(inDF.to_numpy(), client_or_address = client, gene_names = inDF.columns)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'to_numpy'
Traceback (most recent call last):
  File "BLRunner.py", line 79, in <module>
    main()
  File "BLRunner.py", line 73, in main
    evaluation.runners[idx].parseOutput()
  File "/home/zengyp/Single_cell/1/Beeline-master/BLRun/runner.py", line 90, in parseOutput
    OutputParser[self.name](self)
  File "/home/zengyp/Single_cell/1/Beeline-master/BLRun/genie3Runner.py", line 60, in parseOutput
    OutDF = pd.read_csv(outDir+'outFile.txt', sep = '\t', header = 0)
  File "/home/zengyp/anaconda3/envs/BEELINE/lib/python3.7/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/zengyp/anaconda3/envs/BEELINE/lib/python3.7/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/zengyp/anaconda3/envs/BEELINE/lib/python3.7/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/home/zengyp/anaconda3/envs/BEELINE/lib/python3.7/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/zengyp/anaconda3/envs/BEELINE/lib/python3.7/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'outputs/example/GSD/GENIE3/outFile.txt' does not exist
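For reference, a version-agnostic conversion helper (a sketch, not code from the repo) would sidestep both this error and the as_matrix one reported above, since .values exists on every pandas version involved:

import pandas as pd

def to_array(df: pd.DataFrame):
    # .to_numpy() only exists from pandas 0.24 onward; .values also works
    # on the older pandas inside the arboreto image.
    return df.to_numpy() if hasattr(df, "to_numpy") else df.values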

SCODE Docker container failing to build

Hello,

I encountered the following error when building the SCODE docker container:

Some packages could not be installed. This may mean that you have                                                                  
requested an impossible situation or if you are using the unstable                                                                 
distribution that some required packages have not yet been created                                                                 
or been moved out of Incoming.                                                                                                     
The following information may help to resolve the situation:                                                                       
                                                                                                                                   
The following packages have unmet dependencies:                                                                                    
 libc6-dev : Breaks: libgcc-8-dev (< 8.4.0-2~) but 8.3.0-3 is to be installed                                                      
E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.

To resolve it, I had to modify the Dockerfile to install gcc-8-base after running apt-get update.

Does this program consider edge direction when calculating the ROC curve?

Hello. Thank you for answering my question before.

Could you tell me whether the program considers the sign of an edge (activation vs. repression) when calculating the ROC curve?
In /BLEval/computeDGAUC.py, line 221, I can see only 1 or 0 in the dataframe (outDF["TrueEDges"]),
but in the paper (https://www.biorxiv.org/content/10.1101/642926v3.full#ref-10) the synthetic data contains both activation and repression edges.
I think some algorithms report the edge sign and some do not, and perhaps that is why the evaluation ignores it.
I'm sorry if I'm being too direct or impolite.
I appreciate you and this program.
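To illustrate what those 0/1 labels imply (an illustration of the behavior described above, not the evaluator's actual code): collapsing a signed reference network to binary labels keeps the direction of each (Gene1, Gene2) pair but drops the '+'/'-' Type column entirely:

import itertools
import pandas as pd

refDF = pd.read_csv("refNetwork.csv")            # columns: Gene1, Gene2, Type
genes = sorted(set(refDF.Gene1) | set(refDF.Gene2))
trueEdges = set(zip(refDF.Gene1, refDF.Gene2))   # '+' and '-' both count as an edge
labels = {(u, v): int((u, v) in trueEdges)       # 1 for true edges, 0 otherwise
          for u, v in itertools.permutations(genes, 2)}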

BLRunner stuck on "OSError: Timed out trying to connect to 'inproc://172.17.0.2/10/1'" while running arboreto

Hi,
I was trying to run grnbeeline/arboreto:base through BLRunner.py with the following command:

docker run --rm -v /home/abc/projects/Beeline:/data/ --expose=41269 grnbeeline/arboreto:base /bin/sh -c "time -v -o data/outputs/Synthetic/dyn-LI/dyn-LI-100-1/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/Synthetic/dyn-LI/dyn-LI-100-1/GENIE3/ExpressionData.csv --outFile=data/outputs/Synthetic/dyn-LI/dyn-LI-100-1/GENIE3/outFile.txt "

However, an error occurred and the program got stuck.

Task exception was never retrieved
future: <Task finished coro=<connect.<locals>._() done, defined at /opt/conda/lib/python3.7/site-packages/distributed/comm/core.py:288> exception=CommClosedError()>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 297, in _
    handshake = await asyncio.wait_for(comm.read(), 1)
  File "/opt/conda/lib/python3.7/asyncio/tasks.py", line 435, in wait_for
    await waiter
concurrent.futures._base.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 304, in _
    raise CommClosedError() from e
distributed.comm.core.CommClosedError
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f4a22da7250>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py:320> exception=OSError("Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: connect() didn't finish in time")>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 322, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: connect() didn't finish in time

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 401, in _close
    await self._correct_state()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 328, in _correct_state_internal
    await self.scheduler_comm.retire_workers(workers=list(to_close))
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 810, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 772, in live_comm
    **self.connection_args,
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 334, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: connect() didn't finish in time

The error is not stable: across multiple attempts it occurs only sometimes, and in different places.

Additionally, the containers are running under docker's bridge network.
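One workaround that sometimes avoids this kind of inproc:// handshake timeout is to build the Dask client explicitly with threads instead of separate worker processes. This is a sketch of how runArboreto.py could be adapted, under the assumption that the threaded scheduler is acceptable for GENIE3; it is not a guaranteed fix:

import pandas as pd
from distributed import Client, LocalCluster
from arboreto.algo import genie3

inDF = pd.read_csv("ExpressionData.csv", index_col=0).T   # cells x genes
client = Client(LocalCluster(processes=False))            # workers run as threads in one process
network = genie3(inDF.values, gene_names=inDF.columns, client_or_address=client)
client.close()
network.to_csv("outFile.txt", sep="\t", index=False)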

Improving SINGE compatibility

Hi All,
Thanks for using SINGE (rebranded from SCINGE) in this paper. We read your paper recently, and some of the benchmarking figures are visually very appealing. We have been updating SINGE quite a lot in the past couple of months and wanted to update you on that work.
In the recent versions, we have also added guidance about parameter combinations and ensembling. This includes an updated default_hyperparameters.txt file, which ensures that a new user's first run replicates the hyperparameters used in our paper. For users who would like to generate their own hyperparameter files, we have added a few scripts in the scripts folder.
Based on your workflow, as well as dynverse, we were inspired to support our own Docker image, which is nearing completion. We'd like to coordinate so that BEELINE users can have access to the latest version of SINGE as we continue our development.
By the way, one key attribute of SINGE is the aggregation stage applied to the generated ensemble. Do you think providing the aggregated scores in matrix form (in addition to the ranked lists) would make it easier for users to evaluate SINGE using BEELINE?
(I realize that the last couple of points may also be pertinent to #15 )
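Regarding the last question: converting an aggregated regulator-by-target score matrix into the ranked list BEELINE consumes is a small step either way. A sketch, assuming a hypothetical scores.csv and the Gene1/Gene2/EdgeWeight columns used by rankedEdges.csv:

import pandas as pd

scores = pd.read_csv("scores.csv", index_col=0)   # regulators x targets
edges = (scores.stack()
               .rename_axis(["Gene1", "Gene2"])
               .reset_index(name="EdgeWeight"))
edges = edges[edges.Gene1 != edges.Gene2]         # drop self-loops
edges.sort_values("EdgeWeight", ascending=False).to_csv(
    "rankedEdges.csv", sep="\t", index=False)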

Not able to run example data with most of the algorithms

Hello,

I followed your instructions and have successfully installed BEELINE. Yet when I run python BLRunner.py --config config-files/config.yaml, only two algorithms, PIDC and GRNVBEM, run successfully. The others produce error messages such as:

(BEELINE) raphael830102@beeline-cwh:~/Beeline$ python BLRunner.py --config=config-files/new_config.yaml
Skipping PIDC
Skipping GRNVBEM
Skipping GRNBOOST2
Skipping PPCOR
Skipping SCODE
Skipping SCNS
Skipping LEAP
Skipping GRISLI
Skipping SINGE
Skipping SCRIBE
<BLRun.BLRun object at 0x7f29f320a080>
Evaluation started
docker run --rm -v /home/raphael830102/Beeline:/data/ --expose=41269 arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/GSD/GENIE3/ExpressionData.csv --outFile=data/outputs/example/GSD/GENIE3/outFile.txt "
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF, client_or_address = client)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 73, in genie3
    limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 115, in diy
    expression_matrix, gene_names, tf_names = _prepare_input(expression_data, gene_names, tf_names)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 214, in _prepare_input
    expression_matrix = expression_data.as_matrix()
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py", line 5130, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'as_matrix'
docker run --rm -v /home/raphael830102/Beeline:/SINCERITIES/data/ sincerities:base /bin/sh -c "time -v -o data/outputs/example/GSD/SINCERITIES/time0.txt Rscript MAIN.R data/inputs/example/GSD/SINCERITIES/ExpressionData0.csv data/outputs/example/GSD/SINCERITIES/outFile0.txt "
Loading required package: SuppDists
Loading required package: Matrix
Loaded glmnet 4.0-2
Loading required package: MASS
[1] "data/inputs/example/GSD/SINCERITIES/ExpressionData0.csv"
[2] "data/outputs/example/GSD/SINCERITIES/outFile0.txt"
Error in coef.cv.glmnet(CV_results, s = "lambda.min") :
  could not find function "coef.cv.glmnet"
Calls: SINCERITITES
In addition: There were 21 warnings (use warnings() to see them)
Execution halted
docker run --rm -v /home/raphael830102/Beeline:/SINCERITIES/data/ sincerities:base /bin/sh -c "time -v -o data/outputs/example/GSD/SINCERITIES/time1.txt Rscript MAIN.R data/inputs/example/GSD/SINCERITIES/ExpressionData1.csv data/outputs/example/GSD/SINCERITIES/outFile1.txt "
Loading required package: SuppDists
Loading required package: Matrix
Loaded glmnet 4.0-2
Loading required package: MASS
[1] "data/inputs/example/GSD/SINCERITIES/ExpressionData1.csv"
[2] "data/outputs/example/GSD/SINCERITIES/outFile1.txt"
Error in coef.cv.glmnet(CV_results, s = "lambda.min") :
  could not find function "coef.cv.glmnet"
Calls: SINCERITITES
In addition: There were 21 warnings (use warnings() to see them)
Execution halted
Traceback (most recent call last):
  File "BLRunner.py", line 77, in <module>
    main()
  File "BLRunner.py", line 71, in main
    evaluation.runners[idx].parseOutput()
  File "/home/raphael830102/Beeline/BLRun/runner.py", line 90, in parseOutput
    OutputParser[self.name](self)
  File "/home/raphael830102/Beeline/BLRun/genie3Runner.py", line 60, in parseOutput
    OutDF = pd.read_csv(outDir+'outFile.txt', sep = '\t', header = 0)
  File "/home/raphael830102/anaconda3/envs/BEELINE/lib/python3.7/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/raphael830102/anaconda3/envs/BEELINE/lib/python3.7/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/raphael830102/anaconda3/envs/BEELINE/lib/python3.7/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/home/raphael830102/anaconda3/envs/BEELINE/lib/python3.7/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/raphael830102/anaconda3/envs/BEELINE/lib/python3.7/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'outputs/example/GSD/GENIE3/outFile.txt' does not exist

I wonder if I have missed a critical step; it would be nice if you could provide some suggestions.
Thanks a lot.

Issues with Beeline on Fedora 27

Hi all,
I'm trying to run the latest iteration of Beeline using the pre-built Docker containers on a Linux server running Fedora 27. Unfortunately, I'm still running into a few errors even when trying the simple example "GSD" provided in the repository.
First off, from GENIE3 and GRNBoost2 (i.e., the arboreto Docker container), I'm getting the following error:

docker run --rm -v /localscratch/home/ies/niclas.popp/BEELINE:/data/ --expose=41269 grnbeeline/arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/GSD/GENIE3/ExpressionData.csv --outFile=data/outputs/example/GSD/GENIE3/outFile.txt "
docker: Error response from daemon: linux spec user: unable to find user root: no matching entries in passwd file.
docker run --rm -v /localscratch/home/ies/niclas.popp/BEELINE:/data/ --expose=41269 grnbeeline/arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GRNBOOST2/time.txt python runArboreto.py --algo=GRNBoost2 --inFile=data/inputs/example/GSD/GRNBOOST2/ExpressionData.csv --outFile=data/outputs/example/GSD/GRNBOOST2/outFile.txt "
docker: Error response from daemon: linux spec user: unable to find user root: no matching entries in passwd file.
ERRO[0000] error waiting for container: context canceled

We already tried this but could not resolve the problem.
Also, the output file "rankedEdges.csv" is not produced for some of the remaining algorithms (SCODE, SINCERITIES, LEAP, GRISLI, SINGE, and SCRIBE) when running the config file provided in the repository, and therefore the subsequent evaluation fails.

Any help from your side on how to fix this would be greatly appreciated, thank you very much in advance!

Lack of "Type" in ground truth network

Hi Aditya,

I would like to ask: what if the ground-truth network I have does not have "+" (activation) or "-" (repression) labels? Is it still possible to run the evaluation process? I only know that a regulatory relationship exists between Gene1 and Gene2, but I am not sure about each gene pair's interaction type.

Thank you.

Best,
Che-Wei
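If it turns out the evaluation only needs the presence of an edge (consistent with the 0/1 labels discussed in an earlier issue on this page), one pragmatic option is to fill the Type column with a placeholder. This is a guess, not official BEELINE guidance, and my_network.csv is a hypothetical input file:

import pandas as pd

refDF = pd.read_csv("my_network.csv")   # columns: Gene1, Gene2
refDF["Type"] = "+"                     # placeholder sign for every edge
refDF.to_csv("refNetwork.csv", index=False)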

adding new files

Hi,
I recently downloaded Beeline for my Bachelor thesis; it already comes with example datasets that I could run.
I want to use my own datasets, or those from https://zenodo.org/record/3701939.
I wanted to make sure which of the two zips listed there I should download, where to place them, and in which files I need to replace the example datasets with the new ones.
Thanks for your help

Error with Monocle

In runScribe.R, I'm getting this error at estimateDispersions:
https://github.com/Murali-group/Beeline/blob/master/Algorithms/SCRIBE/runScribe.R#L119

Error in `[.data.frame`(`*tmp*`, res$mu == 0) : 
  undefined columns selected
Calls: estimateDispersions ... eval_tidy -> disp_calc_helper_NB -> [ -> [.data.frame

Looks like this is the line in their code where the problem is happening:
https://github.com/cole-trapnell-lab/monocle-release/blob/7df105006756801a305ff43321b26d289cd6e890/R/expr_models.R#L470
According to this Stack Overflow question, I think they forgot a comma when indexing the dataframe.

I tried removing cells with all 0s and genes with all 0s, but that didn't help. It looks like there are a couple of issues on Monocle with a similar problem, but there's no response from the developers.

This is the command I was running on csb2:

docker run --rm -v /data/jeff-law/projects/2019-04-single-cell/Beeline:/data/ monocle:base /bin/sh -c "time -v -o data/inputs/datasets/li-2017/SCRIBE/time0.txt Rscript runMonocle.R -e data/inputs/datasets/li-2017/SCRIBE/ExpressionData0.csv -c data/inputs/datasets/li-2017/SCRIBE/CellData0.csv -g data/inputs/datasets/li-2017/SCRIBE/GeneData.csv -o data/inputs/datasets/li-2017/SCRIBE/ -d 5 -l 0 -m ucRDI -x negbinomial.size --outFile outFile0.csv"

Run-time analysis

Check how long each method takes to run on different sizes of input: 10, 30, and 50 nodes.
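Since BLRunner already records per-run timing via GNU time -v -o ... time.txt (see the docker commands in the issues above), a small parser over those files could drive this analysis. A sketch that assumes the outputs/ directory layout shown elsewhere on this page:

import re
from pathlib import Path

def wall_clock_seconds(time_file: Path) -> float:
    # GNU `time -v` writes a line like:
    #   Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.23
    m = re.search(r"Elapsed \(wall clock\) time.*?:\s*([\d:.]+)", time_file.read_text())
    if not m:
        return float("nan")
    seconds = 0.0
    for part in m.group(1).split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

for f in sorted(Path("outputs").rglob("time*.txt")):
    print(f, wall_clock_seconds(f))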

bioRxiv Figure 3

I quite liked Figure 3 from the bioRxiv paper:

(screenshot of Figure 3 from the bioRxiv paper)

I had a few remarks / questions:

  • The different plotting styles (pie / circle / square / progress bar) add no value to the visualization. To me, it seems the figure would be easier to interpret if AUROC, AUPRC, and the Jaccard index all used the same shape (as was the case in our figures as well).
  • I'm missing a summary of the results. According to this figure, which methods perform well in what areas? This is discussed in the Discussion, but it's not apparent to me in this figure.

Duplicated rows on refNetwork.csv

Hi!

I am quite interested in the evaluation of different GRN inference models. However, I have a question. What is the point of having duplicated rows in the refNetwork.csv example file?

For example, the first 3 interactions are duplicated (SOX9 is also autoregulated). Sometimes there are also interactions that appear twice in a different order; are the interactions in this file directional, meaning the two genes regulate each other?

Thanks!
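While waiting for an answer on the intent, the exact duplicates at least are easy to drop without losing information. A sketch (reversed pairs are kept on purpose, since the file lists directed edges and mutual regulation is plausible):

import pandas as pd

refDF = pd.read_csv("refNetwork.csv")
deduped = refDF.drop_duplicates()        # removes rows repeated verbatim
deduped.to_csv("refNetwork.csv", index=False)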

Experimental single cell expression datasets

  • THP1
    • Currently uses the expression of only the 45 TFs in the evaluation network. What if we used the expression of all genes?
  • mESC (SCODE)
  • hESC (SCODE)
  • mES-PE (SCODE)
  • zHSC (GRNVBEM)
    • Differential expression
  • mES-retinoic (SCINGE)
  • mRetina (GRNBoost2, GENIE3)
  • mHSC (SCRIBE)
    • No processed files available
  • mDC (LEAP)
    • No processed files available

Use Docker hub image for algorithms, instead of building from Dockerfile

Building docker images from Dockerfile no longer works easily. There are several reasons for this:

  1. Possibly the biggest reason: installing older versions of software using BiocManager in R is not easy (or at least unclear to me)! Many algorithms built in R are bound to fail eventually due to the lack of an easy way to specify the versions of the dependencies used. Moreover, many new builds can fail if some nested dependencies of certain packages are no longer compatible (depending on how they are set up internally). One year in, setting up Scribe from its Dockerfile already doesn't work (see issue #45) 🤷.
  2. Some algorithms used in BEELINE come from GitHub repos. If for some reason the owner of one of those repos updates or removes it (even if we point to an older commit), the Docker set-up is not going to work (!).
  3. Any other system commands might pose a problem too (for example: the line RUN apt-get update in the Dockerfiles is needed to install some packages/libraries, and can potentially break things down the line).

The only way I can think of to get around this issue is to make the images built at the time of manuscript preparation available via Docker Hub. While building an image from a Dockerfile should work (until it won't), the initial set-up going forward is to pull an existing image instead of relying on Dockerfiles.

An example image is already available here: https://hub.docker.com/repository/docker/grnbeeline/scribe

Here are the steps needed to push existing images (for future reference):

  1. Login docker login --username <your username>
  2. Find image ID for the pre-built docker image using docker images
  3. Tag a local image: docker tag <image ID> <docker hub username>/<algorithm name>:<tag>
  4. Alternative to steps 2-3 is to just append your docker hub username to existing images docker tag <algorithm name>:<tag> <docker hub username>/<algorithm name>:<tag>
  5. Push to docker hub: docker push <docker hub username>/<algorithm name>:<tag>

If for some reason you need to update an existing image to create new versions:

  1. Run the docker to make changes: docker run -it <docker hub username>/<algorithm name>:<tag> /bin/bash
  2. Once inside the container, make your changes; exit the docker. Since we did not use --rm flag while running docker in the previous step, the modified container will be available for later use, even after exiting.
  3. Access the modified container id by using docker ps -a
  4. Commit modified image docker commit <container ID> <docker hub username>/<algorithm name>:<new tag>
  5. Push to docker hub: docker push <docker hub username>/<algorithm name>:<new tag>

Running without trueEdges (refNetwork.csv)

Hi,

Thanks for this great tool.

In your documentation it says that it is possible to run the base pipeline without a refNetwork.csv file. However, when I try to run:
python BLRunner.py --config config-files/config_d0.yaml

I get the following error:

Traceback (most recent call last):
  File "BLRunner.py", line 77, in <module>
    main()
  File "BLRunner.py", line 59, in main
    evaluation = br.ConfigParser.parse(conf)
  File "/scratch/projects/GRN/Beeline/BLRun/__init__.py", line 137, in parse
    config_map['output_settings']))
  File "/scratch/projects/GRN/Beeline/BLRun/__init__.py", line 67, in __init__
    self.runners: Dict[int, Runner] = self.__create_runners()
  File "/scratch/projects/GRN/Beeline/BLRun/__init__.py", line 88, in __create_runners
    data['trueEdges'] = dataset['trueEdges']
KeyError: 'trueEdges'

Any help with this would be greatly appreciated.

Thanks
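A minimal sketch of a possible patch (hypothetical; whether every downstream step tolerates a missing reference network would need testing): in BLRun/__init__.py, around the line shown in the traceback, fall back to None instead of raising KeyError:

data['trueEdges'] = dataset.get('trueEdges')   # None when the config omits trueEdges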
