
3dchromatin_replicateqc's People

Contributors

msauria, oursu, oursu-broad

3dchromatin_replicateqc's Issues

Potential issue in wrappers/QuASAR/data_to_hifive.py

This may be an issue specific to me, but I had float values in my contact map, which caused a problem similar to bxlab/hifive#12, where no valid read data were found.

I worked around it by modifying line 83 of wrappers/QuASAR/data_to_hifive.py to handle float values:
count = int(temp[4]) -> count = int(float(temp[4]))

Additionally, could exceptions be printed to make debugging easier?
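A minimal sketch of the workaround above combined with the suggestion to print exceptions: counts are parsed with int(float(...)), and lines that cannot be parsed are reported on stderr instead of being dropped silently. The function names and the whitespace-separated, five-column example input are assumptions for illustration; the real data_to_hifive.py may split its lines differently.

```python
import sys


def parse_count(field):
    """Convert a contact-count field to int, accepting float strings such as '3.0'.

    Same idea as the workaround: int(float(x)) instead of int(x).
    int() truncates, so '2.7' becomes 2.
    """
    return int(float(field))


def read_counts(lines, count_column=4):
    """Yield (line_number, count) pairs and report unparseable lines on stderr.

    count_column=4 matches temp[4] in the issue; splitting on whitespace is an
    assumption about the input format.
    """
    for line_number, line in enumerate(lines, start=1):
        temp = line.split()
        try:
            yield line_number, parse_count(temp[count_column])
        except (IndexError, ValueError) as err:
            # Print the problem instead of silently skipping the line.
            sys.stderr.write("line %d skipped: %r (%s)\n" % (line_number, line.rstrip(), err))


if __name__ == "__main__":
    example = ["chr1 0 chr1 10000 5\n", "chr1 0 chr1 20000 2.0\n", "bad line\n"]
    print(list(read_counts(example)))  # → [(1, 5), (2, 2)]
```

Note that truncation silently changes non-integer values; whether truncating, rounding, or rejecting such counts is appropriate depends on how the contact map was produced.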

Summary tables not created when HiCRep returns NaN

Hi,

For some reason, HiCRep fails at one of my resolutions (50 kb), while everything works at 10 kb.
Here is the result file SMPL1.vs.SMPL2.txt in the reproducibility/HiCRep folder:

SMPL1   SMPL2   chr2L   NA
SMPL1   SMPL2   chr2R   0.961
SMPL1   SMPL2   chr3L   0.975
SMPL1   SMPL2   chr3R   0.964
SMPL1   SMPL2   chrX    0.938

Weirdly, I have 6 comparisons, and it consistently fails for chr2L in all 6 (Drosophila melanogaster data). The result above was obtained with corrected counts. With raw counts I see the same problem, but there all chromosomes fail:

SMPL1   SMPL2   chr2L   NA
SMPL1   SMPL2   chr2R   NA
SMPL1   SMPL2   chr3L   NA
SMPL1   SMPL2   chr3R   NA
SMPL1   SMPL2   chrX    NA

I manually executed your HiCRep_wrapper.R code, and this is what get.scc() returns:

> get.scc(m1_big[,-c(1:3)], m2_big[,-c(1:3)], resol, h, 0, 100000)
$corr
logical(0)

$wei
logical(0)

$scc
     [,1]
[1,]  NaN

$std
[1] NaN

This ends up as NA and, unfortunately, causes the generation of the summary results in the scores folder to fail.
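The summary step for the scores folder is not shown in this report, so the following is only a hypothetical sketch of how a summary could tolerate NA rows like the chr2L line above: NA or NaN scores are excluded from the average and the affected chromosomes are reported, rather than the whole summary failing. The four-column row layout follows the result file shown above; the function names are made up for illustration.

```python
import math


def parse_score(text):
    """Return a float score, or None for 'NA'/'NaN' entries from a failed comparison."""
    if text.strip().upper() in ("NA", "NAN"):
        return None
    value = float(text)
    return None if math.isnan(value) else value


def summarize(lines):
    """Average 'sample1 sample2 chrom score' rows, skipping NA scores.

    Returns (mean or None, list of chromosomes whose score was NA).
    """
    scores, skipped = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore blank or malformed rows
        score = parse_score(fields[3])
        if score is None:
            skipped.append(fields[2])
        else:
            scores.append(score)
    mean = sum(scores) / len(scores) if scores else None
    return mean, skipped


if __name__ == "__main__":
    rows = ["SMPL1 SMPL2 chr2L NA", "SMPL1 SMPL2 chr2R 0.961", "SMPL1 SMPL2 chr3L 0.975",
            "SMPL1 SMPL2 chr3R 0.964", "SMPL1 SMPL2 chrX 0.938"]
    print(summarize(rows))
```

Returning the skipped chromosomes keeps the gap visible; silently averaging over the remaining chromosomes would make a partially failed comparison look complete.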

Rscript: command not found

Hi, I am trying to run HiCRep using your setup, but can't figure out why I keep getting this error in the .e files (I am running it on the cluster):

/var/spool/gridscheduler/execd/node2c19/job_scripts/1028343: line 3: Rscript: command not found
cat: /gpfs/igmmfs01/eddie/wendy-lab/ilia/coolers/QCRep/output/results/reproducibility/HiCRep/chr1.WT-1.vs.KO-2.scores.txt: No such file or directory

I can run Rscript normally in the conda environment I am using, but for some reason it doesn't work in the jobs submitted by 3DChromatin_ReplicateQC concordance. Any thoughts would be appreciated.
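"command not found" means the shell running the job script could not find Rscript in its PATH. One possible explanation, which is an assumption and not confirmed in this thread, is that jobs submitted to the scheduler do not inherit the PATH of the activated conda environment. The hypothetical helper below looks an executable up in a given PATH string, so the PATH printed from inside a submitted job (for example, by adding `echo $PATH` to a test job) can be compared with the interactive one.

```python
import os


def find_on_path(name, path_string):
    """Return the first executable called `name` in a PATH-style string, or None.

    Mimics a shell lookup: directories are searched left to right and the
    file must exist and be executable.
    """
    for directory in path_string.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Replace os.environ["PATH"] with the PATH captured inside a cluster job.
    print(find_on_path("Rscript", os.environ.get("PATH", "")))
```

If the lookup succeeds with the interactive PATH but returns None with the job's PATH, the job environment is the likely culprit rather than the R installation itself.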

Suggestions to improve the installation

Hi,

I just installed your package. I wanted to install it centrally so that colleagues could also use the tool, but I ran into several issues:

  1. sklearn and psutil (needed by GenomeDISCO) were missing after the install.
  2. The --rlib option did not work for me. The R packages were installed in a default, user-specific location, so the tool could not be shared.
  3. ${pythondir}conda install -c anaconda mpi4py failed.

To solve these issues and successfully share the package, I encapsulated the install into a conda environment.

conda create --prefix /path/to/condaenv/3DChromatin_ReplicateQC python=2.7
conda activate /path/to/condaenv/3DChromatin_ReplicateQC

Then:

  1. was solved by simply running:
pip install sklearn
pip install psutil
  2. was a little trickier. I solved it as follows:
  • bypass your R library installation altogether, i.e. comment out lines 108-109:
#cmd="${PATHTOR}script ${dir_of_script}/install_R_packages.R"
#eval "${cmd}"
  • pre-install R and the needed packages using conda:
    conda install -c r r=3.5.1 r-reshape2 r-rmarkdown r-testthat
    conda install -c bioconda bioconductor-rhdf5 r-pheatmap
  • Note that hicrep can also easily be installed with conda (v1.2 as of today), but v1.2 did not work with your code (it would be great to fix this), so I had to stick with v1.0.1. To get it installed in the right place (i.e. in the conda env), I had to install it manually and pass lib=/path/to/condaenv/3DChromatin_ReplicateQC/lib/R/library/:
# launch R (the one from the conda env)
$ R
> source("https://bioconductor.org/biocLite.R")
> install.packages("Supplemental_hicrep_1.0.1.tar.gz", lib="/path/to/condaenv/3DChromatin_ReplicateQC/lib/R/library/")
  3. was solved by replacing ${pythondir}conda install -c anaconda mpi4py with ${pythondir}conda install mpi4py.
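To confirm that a shared environment like this is complete before colleagues use it, one option is to check that the Python modules from issues 1 and 3 can be imported by the environment's own python. This is a minimal sketch; the helper name is hypothetical and the module list only covers the packages named in this report.

```python
import importlib


def check_modules(names):
    """Return {module_name: True/False} depending on whether each module imports.

    Only ImportError is treated as "missing"; any other error (e.g. a broken
    MPI library behind mpi4py) is raised so it is not mistaken for absence.
    """
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Run with /path/to/condaenv/3DChromatin_ReplicateQC/bin/python.
    for name, ok in sorted(check_modules(["sklearn", "psutil", "mpi4py"]).items()):
        print("%-8s %s" % (name, "ok" if ok else "MISSING"))
```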

I hope this feedback will help.

ValueError when running 3DChromatin_ReplicateQC concordance

When running 3DChromatin_ReplicateQC step by step, I came across the following problem.

Both 3DChromatin_ReplicateQC preprocess and 3DChromatin_ReplicateQC qc produce reasonable results, but 3DChromatin_ReplicateQC concordance fails with a ValueError:

Step: concordance | Fri Nov  1 15:24:48 2019 | computing concordance between Control_Rep1_1000_10kb and Control_Rep2_1000_10kb
GenomeDISCO | Fri Nov  1 15:25:00 2019 | :::::::::: Starting reproducibility analysis
GenomeDISCO | Fri Nov  1 15:25:00 2019 | processing: Loading genomic regions from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/nodes/nodes.chr1.gz
GenomeDISCO | Fri Nov  1 15:25:00 2019 | Loading contact maps
GenomeDISCO | Fri Nov  1 15:25:00 2019 | processing: Loading interaction data from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/edges/Control_Rep1_1000_10kb/Control_Rep1_1000_10kb.chr1.gz
GenomeDISCO | Fri Nov  1 15:25:06 2019 | processing: Loading interaction data from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/edges/Control_Rep2_1000_10kb/Control_Rep2_1000_10kb.chr1.gz
GenomeDISCO | Fri Nov  1 15:25:15 2019 | Subsampling depth = 7517561.0
GenomeDISCO | Fri Nov  1 15:25:25 2019 | Normalizing with sqrtvc
GenomeDISCO | Fri Nov  1 15:25:26 2019 | Distance dependence analysis
GenomeDISCO | Fri Nov  1 15:25:26 2019 | Computing reproducibility score
GenomeDISCO | Fri Nov  1 15:25:27 2019 | done t=1 (not included in score calculation)
GenomeDISCO | Fri Nov  1 15:25:54 2019 | done t=2 (not included in score calculation)
GenomeDISCO | Fri Nov  1 15:27:03 2019 | done t=3 | score=0.938
GenomeDISCO | Fri Nov  1 15:27:09 2019 | :::::::::: Starting reproducibility analysis
GenomeDISCO | Fri Nov  1 15:27:09 2019 | processing: Loading genomic regions from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/nodes/nodes.chr2.gz
GenomeDISCO | Fri Nov  1 15:27:09 2019 | Loading contact maps
GenomeDISCO | Fri Nov  1 15:27:09 2019 | processing: Loading interaction data from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/edges/Control_Rep1_1000_10kb/Control_Rep1_1000_10kb.chr2.gz
GenomeDISCO | Fri Nov  1 15:27:12 2019 | processing: Loading interaction data from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/edges/Control_Rep2_1000_10kb/Control_Rep2_1000_10kb.chr2.gz
GenomeDISCO | Fri Nov  1 15:27:16 2019 | Subsampling depth = 6256585.0
GenomeDISCO | Fri Nov  1 15:27:21 2019 | Normalizing with sqrtvc
GenomeDISCO | Fri Nov  1 15:27:22 2019 | Distance dependence analysis
GenomeDISCO | Fri Nov  1 15:27:22 2019 | Computing reproducibility score
GenomeDISCO | Fri Nov  1 15:27:22 2019 | done t=1 (not included in score calculation)
GenomeDISCO | Fri Nov  1 15:27:33 2019 | done t=2 (not included in score calculation)
GenomeDISCO | Fri Nov  1 15:27:52 2019 | done t=3 | score=0.947
GenomeDISCO | Fri Nov  1 15:27:55 2019 | :::::::::: Starting reproducibility analysis
GenomeDISCO | Fri Nov  1 15:27:55 2019 | processing: Loading genomic regions from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/nodes/nodes.chr3.gz
GenomeDISCO | Fri Nov  1 15:27:55 2019 | Loading contact maps
GenomeDISCO | Fri Nov  1 15:27:55 2019 | processing: Loading interaction data from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/edges/Control_Rep1_1000_10kb/Control_Rep1_1000_10kb.chr3.gz
GenomeDISCO | Fri Nov  1 15:27:59 2019 | processing: Loading interaction data from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/edges/Control_Rep2_1000_10kb/Control_Rep2_1000_10kb.chr3.gz
GenomeDISCO | Fri Nov  1 15:28:05 2019 | Subsampling depth = 5682122.0
GenomeDISCO | Fri Nov  1 15:28:11 2019 | Normalizing with sqrtvc
GenomeDISCO | Fri Nov  1 15:28:12 2019 | Distance dependence analysis
GenomeDISCO | Fri Nov  1 15:28:12 2019 | Computing reproducibility score
GenomeDISCO | Fri Nov  1 15:28:12 2019 | done t=1 (not included in score calculation)
GenomeDISCO | Fri Nov  1 15:28:26 2019 | done t=2 (not included in score calculation)
GenomeDISCO | Fri Nov  1 15:28:57 2019 | done t=3 | score=0.946
GenomeDISCO | Fri Nov  1 15:29:01 2019 | :::::::::: Starting reproducibility analysis
Traceback (most recent call last):
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 209, in <module>
    main()
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 202, in main
    get_reproducibility(M1,M2,int(sys.argv[6]))
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 157, in get_reproducibility
    a1, b1=eigsh(M1b_L,k=num_evec,which="SM")
  File "/home/liuyi/.python2.7.15/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1585, in eigsh
    return eigh(A, b=M, eigvals_only=not return_eigenvectors)
  File "/home/liuyi/.python2.7.15/lib/python2.7/site-packages/scipy/linalg/decomp.py", line 432, in eigh
    iu=a1.shape[0], overwrite_a=overwrite_a)
ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,)
Traceback (most recent call last):
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 209, in <module>
    main()
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 202, in main
    get_reproducibility(M1,M2,int(sys.argv[6]))
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 157, in get_reproducibility
    a1, b1=eigsh(M1b_L,k=num_evec,which="SM")
  File "/home/liuyi/.python2.7.15/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1585, in eigsh
    return eigh(A, b=M, eigvals_only=not return_eigenvectors)
  File "/home/liuyi/.python2.7.15/lib/python2.7/site-packages/scipy/linalg/decomp.py", line 432, in eigh
    iu=a1.shape[0], overwrite_a=overwrite_a)
ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,)
Traceback (most recent call last):
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 209, in <module>
    main()
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 202, in main
    get_reproducibility(M1,M2,int(sys.argv[6]))
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 157, in get_reproducibility
    a1, b1=eigsh(M1b_L,k=num_evec,which="SM")
  File "/home/liuyi/.python2.7.15/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1585, in eigsh
    return eigh(A, b=M, eigvals_only=not return_eigenvectors)
  File "/home/liuyi/.python2.7.15/lib/python2.7/site-packages/scipy/linalg/decomp.py", line 432, in eigh
    iu=a1.shape[0], overwrite_a=overwrite_a)
ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,)
Traceback (most recent call last):
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 209, in <module>
    main()
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 202, in main
    get_reproducibility(M1,M2,int(sys.argv[6]))
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 157, in get_reproducibility
    a1, b1=eigsh(M1b_L,k=num_evec,which="SM")
  File "/home/liuyi/.python2.7.15/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1585, in eigsh
    return eigh(A, b=M, eigvals_only=not return_eigenvectors)
  File "/home/liuyi/.python2.7.15/lib/python2.7/site-packages/scipy/linalg/decomp.py", line 432, in eigh
    iu=a1.shape[0], overwrite_a=overwrite_a)
ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,)
Traceback (most recent call last):
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 209, in <module>
    main()
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 202, in main
    get_reproducibility(M1,M2,int(sys.argv[6]))
  File "/data2/usr/liuy/software/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 163, in get_reproducibility
    b1_extend[i_nz1,i]=b1[:,i]
IndexError: index 2 is out of bounds for axis 1 with size 2
Loading required package: hicrep
Loading required package: reshape2
Loading required package: hicrep
Loading required package: reshape2
Loading required package: hicrep
Loading required package: reshape2
Loading required package: hicrep
Loading required package: reshape2
Loading required package: hicrep
Loading required package: reshape2
GenomeDISCO | Fri Nov  1 15:29:01 2019 | processing: Loading genomic regions from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/nodes/nodes.chr4.gz
GenomeDISCO | Fri Nov  1 15:29:01 2019 | Loading contact maps
GenomeDISCO | Fri Nov  1 15:29:01 2019 | processing: Loading interaction data from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/edges/Control_Rep1_1000_10kb/Control_Rep1_1000_10kb.chr4.gz
GenomeDISCO | Fri Nov  1 15:29:04 2019 | processing: Loading interaction data from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/edges/Control_Rep2_1000_10kb/Control_Rep2_1000_10kb.chr4.gz
GenomeDISCO | Fri Nov  1 15:29:07 2019 | Subsampling depth = 4125211.0
GenomeDISCO | Fri Nov  1 15:29:12 2019 | Normalizing with sqrtvc
GenomeDISCO | Fri Nov  1 15:29:13 2019 | Distance dependence analysis
GenomeDISCO | Fri Nov  1 15:29:13 2019 | Computing reproducibility score
GenomeDISCO | Fri Nov  1 15:29:13 2019 | done t=1 (not included in score calculation)
GenomeDISCO | Fri Nov  1 15:29:22 2019 | done t=2 (not included in score calculation)
GenomeDISCO | Fri Nov  1 15:29:37 2019 | done t=3 | score=0.945
GenomeDISCO | Fri Nov  1 15:29:40 2019 | :::::::::: Starting reproducibility analysis
GenomeDISCO | Fri Nov  1 15:29:40 2019 | processing: Loading genomic regions from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/nodes/nodes.chr5.gz
GenomeDISCO | Fri Nov  1 15:29:41 2019 | Loading contact maps
GenomeDISCO | Fri Nov  1 15:29:41 2019 | processing: Loading interaction data from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/edges/Control_Rep1_1000_10kb/Control_Rep1_1000_10kb.chr5.gz
GenomeDISCO | Fri Nov  1 15:29:45 2019 | processing: Loading interaction data from /data2/usr/liuy/Project/hicQC/hicQCOUTPUT/Control_Rep1VSControl_Rep2.10kb/data/edges/Control_Rep2_1000_10kb/Control_Rep2_1000_10kb.chr5.gz
GenomeDISCO | Fri Nov  1 15:29:53 2019 | Subsampling depth = 6519023.0
GenomeDISCO | Fri Nov  1 15:30:01 2019 | Normalizing with sqrtvc
GenomeDISCO | Fri Nov  1 15:30:01 2019 | Distance dependence analysis
GenomeDISCO | Fri Nov  1 15:30:01 2019 | Computing reproducibility score
GenomeDISCO | Fri Nov  1 15:30:02 2019 | done t=1 (not included in score calculation)
GenomeDISCO | Fri Nov  1 15:30:21 2019 | done t=2 (not included in score calculation)
GenomeDISCO | Fri Nov  1 15:31:06 2019 | done t=3 | score=0.942

I only specified the --metadata_pairs and --outdir options for this command. It seems that something in my data is unusual and triggers errors in scipy.

Do you know how to deal with such problem? Thanks!
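The traceback hints at the cause, although this is an interpretation rather than a confirmed diagnosis. scipy's eigsh requires the number of requested eigenvectors k to be smaller than the matrix dimension; when it is not, the scipy version used here falls back to dense eigh (the `return eigh(...)` frame above), and a matrix of dimension 0 then produces the ValueError. The later IndexError (`index 2 is out of bounds for axis 1 with size 2`) likewise points to fewer eigenvectors than requested, and the name `i_nz1` suggests the wrapper keeps only non-empty bins. A hypothetical guard that a wrapper could run before calling eigsh, so that chromosomes with too few non-empty bins are skipped, might look like this (plain lists stand in for the real matrices):

```python
def nonzero_bins(matrix):
    """Indices of rows with at least one non-zero contact (matrix: list of lists)."""
    return [i for i, row in enumerate(matrix) if any(value != 0 for value in row)]


def can_compute_eigenvectors(matrix, num_evec):
    """True if enough non-empty bins remain to request `num_evec` eigenvectors.

    eigsh needs k strictly smaller than the dimension of the (reduced) matrix.
    """
    return len(nonzero_bins(matrix)) > num_evec


if __name__ == "__main__":
    contacts = [[0, 0, 0], [0, 4, 1], [0, 1, 3]]
    print(can_compute_eigenvectors(contacts, 2))  # → False: only 2 non-empty bins
```

Whether such a chromosome should be skipped, scored as NA, or analyzed with fewer eigenvectors is a design decision for the wrapper; the guard only makes the condition explicit instead of letting scipy fail.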

Problems installing

I am trying to install 3DChromatin_ReplicateQC and am having issues. I have all the requirements installed, but when I run the installation script I get the following errors:


sudo ./install_3DChromatin_ReplicateQC.sh --pathtopython /home/John/anaconda2/bin/ --pathtor /home/John/R-3.4.0/bin/R --rlib /home/John/R-3.4.0/library/ --pathtobedtools /usr/bin/bedtools

Cloning into '/mnt/d/3DChromatin_ReplicateQC/software/genomedisco'...
remote: Counting objects: 1273, done.
remote: Total 1273 (delta 0), reused 0 (delta 0), pack-reused 1273
Receiving objects: 100% (1273/1273), 281.90 MiB | 2.78 MiB/s, done.
Resolving deltas: 100% (624/624), done.
Checking connectivity... done.

R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> install.packages("pheatmap",repos="http://cran.rstudio.com/")
trying URL 'http://cran.rstudio.com/src/contrib/pheatmap_1.0.8.tar.gz'
Content type 'application/x-gzip' length 13759 bytes (13 KB)
==================================================
downloaded 13 KB

* installing *source* package ‘pheatmap’ ...
** package ‘pheatmap’ successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (pheatmap)

The downloaded source packages are in
        ‘/tmp/RtmpWxevWx/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
>
>
Bioconductor version 3.5 (BiocInstaller 1.26.1), ?biocLite for help
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.5 (BiocInstaller 1.26.1), R 3.4.0 (2017-04-21).
Installing package(s) ‘hicrep’
trying URL 'https://bioconductor.org/packages/3.5/bioc/src/contrib/hicrep_1.0.0.tar.gz'
Content type 'application/x-gzip' length 193621 bytes (189 KB)
==================================================
downloaded 189 KB

* installing *source* package ‘hicrep’ ...
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (hicrep)

The downloaded source packages are in
        ‘/tmp/Rtmp90vL1z/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Old packages: 'boot', 'foreign', 'Matrix', 'mgcv'
Error in path.expand(new) : object 'new' not found
Calls: print ... .libPaths -> Sys.glob -> path.expand -> path.expand
Execution halted
Cloning into '/mnt/d/3DChromatin_ReplicateQC/software/HiC-spector'...
remote: Counting objects: 201, done.
remote: Total 201 (delta 0), reused 0 (delta 0), pack-reused 201
Receiving objects: 100% (201/201), 42.57 KiB | 0 bytes/s, done.
Resolving deltas: 100% (101/101), done.
Checking connectivity... done.
Cloning into '/mnt/d/3DChromatin_ReplicateQC/software/hifive'...
remote: Counting objects: 2191, done.
remote: Total 2191 (delta 0), reused 0 (delta 0), pack-reused 2191
Receiving objects: 100% (2191/2191), 2.86 MiB | 4.12 MiB/s, done.
Resolving deltas: 100% (1337/1337), done.
Checking connectivity... done.
./install_3DChromatin_ReplicateQC.sh: line 112: /home/John/anaconda2/python: No such file or directory
./install_3DChromatin_ReplicateQC.sh: line 113: /home/John/anaconda2/pip: No such file or directory
./install_3DChromatin_ReplicateQC.sh: line 114: /home/John/anaconda2/conda: No such file or directory

Do you know what the issue here is or how I can fix it?
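The last three errors show the installer looking for python, pip, and conda directly under /home/John/anaconda2/, although --pathtopython was given as /home/John/anaconda2/bin/. The log does not show why the bin/ part was lost, but checking which directory actually contains the tools can narrow it down. The helper below is a hypothetical sketch, not part of the installer.

```python
import os


def missing_tools(prefix, tools=("python", "pip", "conda")):
    """Return the names in `tools` that are not executable files directly inside `prefix`."""
    missing = []
    for tool in tools:
        path = os.path.join(prefix, tool)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(tool)
    return missing


if __name__ == "__main__":
    # The directory the installer used (per the log) versus the one passed on the command line.
    for prefix in ("/home/John/anaconda2/", "/home/John/anaconda2/bin/"):
        print("%s: missing %s" % (prefix, missing_tools(prefix)))
```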

KeyError: "Unable to open object (Object 'dist.1.1000000' doesn't exist)"

I was able to install the program, and it ran fine on the example data. Now I am trying to run it on my own Hi-C data, and I get the following output and errors:

john@john-VirtualBox:~/3DChromatin_ReplicateQC$ python 3DChromatin_ReplicateQC.py run_all --metadata_samples brain/metadata.samples --metadata_pairs brain/metadata.pairs --bins brain/dplfc_1000000_abs.bed.gz --outdir output
4161718 validly-mapped reads pairs loaded.        
4161718 total validly-mapped read pairs loaded. 325 valid fend pairs
Parsing fend pairs... Done  221904 cis reads, 3939814 trans reads
Filtering fends... Removed 3078 of 3103 bins
Traceback (most recent call last):                                                                                      
  File "/home/john/3DChromatin_ReplicateQC/software/genomedisco/reproducibility_analysis/plot_quasar_transform.py", line 68, in <module>
    main()
  File "/home/john/3DChromatin_ReplicateQC/software/genomedisco/reproducibility_analysis/plot_quasar_transform.py", line 47, in main
    data1 = load_data(infile1, chroms, resolutions)
  File "/home/john/3DChromatin_ReplicateQC/software/genomedisco/reproducibility_analysis/plot_quasar_transform.py", line 24, in load_data
    dist = infile['dist.%s.%i' % (chrom, res)][...]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/john/anaconda2/lib/python2.7/site-packages/h5py/_hl/group.py", line 169, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (Object 'dist.1.1000000' doesn't exist)"
4235673 validly-mapped reads pairs loaded.        
4235673 total validly-mapped read pairs loaded. 325 valid fend pairs
Parsing fend pairs... Done  222852 cis reads, 4012821 trans reads
Filtering fends... Removed 3078 of 3103 bins
Traceback (most recent call last):                                                                                      
  File "/home/john/3DChromatin_ReplicateQC/software/genomedisco/reproducibility_analysis/plot_quasar_transform.py", line 68, in <module>
    main()
  File "/home/john/3DChromatin_ReplicateQC/software/genomedisco/reproducibility_analysis/plot_quasar_transform.py", line 47, in main
    data1 = load_data(infile1, chroms, resolutions)
  File "/home/john/3DChromatin_ReplicateQC/software/genomedisco/reproducibility_analysis/plot_quasar_transform.py", line 24, in load_data
    dist = infile['dist.%s.%i' % (chrom, res)][...]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/john/anaconda2/lib/python2.7/site-packages/h5py/_hl/group.py", line 169, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (Object 'dist.1.1000000' doesn't exist)"
3DChromatin_ReplicateQC | Tue Oct 10 13:15:56 2017 | Splitting nodes chr1
3DChromatin_ReplicateQC | Tue Oct 10 13:15:56 2017 | Splitting dplfc_1_1mb.txt chr1
3DChromatin_ReplicateQC | Tue Oct 10 13:16:01 2017 | Splitting dplfc_2_1mb.txt chr1
3DChromatin_ReplicateQC | Tue Oct 10 13:16:07 2017 | Splitting nodes chr10
3DChromatin_ReplicateQC | Tue Oct 10 13:16:07 2017 | Splitting dplfc_1_1mb.txt chr10
3DChromatin_ReplicateQC | Tue Oct 10 13:16:12 2017 | Splitting dplfc_2_1mb.txt chr10
3DChromatin_ReplicateQC | Tue Oct 10 13:16:17 2017 | Splitting nodes chr11
3DChromatin_ReplicateQC | Tue Oct 10 13:16:17 2017 | Splitting dplfc_1_1mb.txt chr11
3DChromatin_ReplicateQC | Tue Oct 10 13:16:22 2017 | Splitting dplfc_2_1mb.txt chr11
3DChromatin_ReplicateQC | Tue Oct 10 13:16:27 2017 | Splitting nodes chr12
3DChromatin_ReplicateQC | Tue Oct 10 13:16:27 2017 | Splitting dplfc_1_1mb.txt chr12
3DChromatin_ReplicateQC | Tue Oct 10 13:16:32 2017 | Splitting dplfc_2_1mb.txt chr12
3DChromatin_ReplicateQC | Tue Oct 10 13:16:37 2017 | Splitting nodes chr13
3DChromatin_ReplicateQC | Tue Oct 10 13:16:37 2017 | Splitting dplfc_1_1mb.txt chr13
3DChromatin_ReplicateQC | Tue Oct 10 13:16:43 2017 | Splitting dplfc_2_1mb.txt chr13
3DChromatin_ReplicateQC | Tue Oct 10 13:16:48 2017 | Splitting nodes chr14
3DChromatin_ReplicateQC | Tue Oct 10 13:16:48 2017 | Splitting dplfc_1_1mb.txt chr14
3DChromatin_ReplicateQC | Tue Oct 10 13:16:53 2017 | Splitting dplfc_2_1mb.txt chr14
3DChromatin_ReplicateQC | Tue Oct 10 13:16:58 2017 | Splitting nodes chr15
3DChromatin_ReplicateQC | Tue Oct 10 13:16:58 2017 | Splitting dplfc_1_1mb.txt chr15
3DChromatin_ReplicateQC | Tue Oct 10 13:17:03 2017 | Splitting dplfc_2_1mb.txt chr15
3DChromatin_ReplicateQC | Tue Oct 10 13:17:08 2017 | Splitting nodes chr16
3DChromatin_ReplicateQC | Tue Oct 10 13:17:08 2017 | Splitting dplfc_1_1mb.txt chr16
3DChromatin_ReplicateQC | Tue Oct 10 13:17:14 2017 | Splitting dplfc_2_1mb.txt chr16
3DChromatin_ReplicateQC | Tue Oct 10 13:17:19 2017 | Splitting nodes chr17
3DChromatin_ReplicateQC | Tue Oct 10 13:17:19 2017 | Splitting dplfc_1_1mb.txt chr17
3DChromatin_ReplicateQC | Tue Oct 10 13:17:24 2017 | Splitting dplfc_2_1mb.txt chr17
3DChromatin_ReplicateQC | Tue Oct 10 13:17:30 2017 | Splitting nodes chr18
3DChromatin_ReplicateQC | Tue Oct 10 13:17:30 2017 | Splitting dplfc_1_1mb.txt chr18
3DChromatin_ReplicateQC | Tue Oct 10 13:17:34 2017 | Splitting dplfc_2_1mb.txt chr18
3DChromatin_ReplicateQC | Tue Oct 10 13:17:40 2017 | Splitting nodes chr19
3DChromatin_ReplicateQC | Tue Oct 10 13:17:40 2017 | Splitting dplfc_1_1mb.txt chr19
3DChromatin_ReplicateQC | Tue Oct 10 13:17:45 2017 | Splitting dplfc_2_1mb.txt chr19
3DChromatin_ReplicateQC | Tue Oct 10 13:17:50 2017 | Splitting nodes chr2
3DChromatin_ReplicateQC | Tue Oct 10 13:17:50 2017 | Splitting dplfc_1_1mb.txt chr2
3DChromatin_ReplicateQC | Tue Oct 10 13:17:55 2017 | Splitting dplfc_2_1mb.txt chr2
3DChromatin_ReplicateQC | Tue Oct 10 13:18:01 2017 | Splitting nodes chr20
3DChromatin_ReplicateQC | Tue Oct 10 13:18:01 2017 | Splitting dplfc_1_1mb.txt chr20
3DChromatin_ReplicateQC | Tue Oct 10 13:18:05 2017 | Splitting dplfc_2_1mb.txt chr20
3DChromatin_ReplicateQC | Tue Oct 10 13:18:11 2017 | Splitting nodes chr21
3DChromatin_ReplicateQC | Tue Oct 10 13:18:11 2017 | Splitting dplfc_1_1mb.txt chr21
3DChromatin_ReplicateQC | Tue Oct 10 13:18:16 2017 | Splitting dplfc_2_1mb.txt chr21
3DChromatin_ReplicateQC | Tue Oct 10 13:18:21 2017 | Splitting nodes chr22
3DChromatin_ReplicateQC | Tue Oct 10 13:18:21 2017 | Splitting dplfc_1_1mb.txt chr22
3DChromatin_ReplicateQC | Tue Oct 10 13:18:26 2017 | Splitting dplfc_2_1mb.txt chr22
3DChromatin_ReplicateQC | Tue Oct 10 13:18:31 2017 | Splitting nodes chr3
3DChromatin_ReplicateQC | Tue Oct 10 13:18:31 2017 | Splitting dplfc_1_1mb.txt chr3
3DChromatin_ReplicateQC | Tue Oct 10 13:18:36 2017 | Splitting dplfc_2_1mb.txt chr3
3DChromatin_ReplicateQC | Tue Oct 10 13:18:42 2017 | Splitting nodes chr4
3DChromatin_ReplicateQC | Tue Oct 10 13:18:42 2017 | Splitting dplfc_1_1mb.txt chr4
3DChromatin_ReplicateQC | Tue Oct 10 13:18:47 2017 | Splitting dplfc_2_1mb.txt chr4
3DChromatin_ReplicateQC | Tue Oct 10 13:18:52 2017 | Splitting nodes chr5
3DChromatin_ReplicateQC | Tue Oct 10 13:18:52 2017 | Splitting dplfc_1_1mb.txt chr5
3DChromatin_ReplicateQC | Tue Oct 10 13:18:57 2017 | Splitting dplfc_2_1mb.txt chr5
3DChromatin_ReplicateQC | Tue Oct 10 13:19:03 2017 | Splitting nodes chr6
3DChromatin_ReplicateQC | Tue Oct 10 13:19:03 2017 | Splitting dplfc_1_1mb.txt chr6
3DChromatin_ReplicateQC | Tue Oct 10 13:19:08 2017 | Splitting dplfc_2_1mb.txt chr6
3DChromatin_ReplicateQC | Tue Oct 10 13:19:14 2017 | Splitting nodes chr7
3DChromatin_ReplicateQC | Tue Oct 10 13:19:14 2017 | Splitting dplfc_1_1mb.txt chr7
3DChromatin_ReplicateQC | Tue Oct 10 13:19:19 2017 | Splitting dplfc_2_1mb.txt chr7
3DChromatin_ReplicateQC | Tue Oct 10 13:19:25 2017 | Splitting nodes chr8
3DChromatin_ReplicateQC | Tue Oct 10 13:19:25 2017 | Splitting dplfc_1_1mb.txt chr8
3DChromatin_ReplicateQC | Tue Oct 10 13:19:29 2017 | Splitting dplfc_2_1mb.txt chr8
3DChromatin_ReplicateQC | Tue Oct 10 13:19:37 2017 | Splitting nodes chr9
3DChromatin_ReplicateQC | Tue Oct 10 13:19:37 2017 | Splitting dplfc_1_1mb.txt chr9
3DChromatin_ReplicateQC | Tue Oct 10 13:19:42 2017 | Splitting dplfc_2_1mb.txt chr9
3DChromatin_ReplicateQC | Tue Oct 10 13:19:48 2017 | Splitting nodes chrM
3DChromatin_ReplicateQC | Tue Oct 10 13:19:48 2017 | Splitting dplfc_1_1mb.txt chrM
3DChromatin_ReplicateQC | Tue Oct 10 13:19:53 2017 | Splitting dplfc_2_1mb.txt chrM
3DChromatin_ReplicateQC | Tue Oct 10 13:19:58 2017 | Splitting nodes chrX
3DChromatin_ReplicateQC | Tue Oct 10 13:19:58 2017 | Splitting dplfc_1_1mb.txt chrX
3DChromatin_ReplicateQC | Tue Oct 10 13:20:03 2017 | Splitting dplfc_2_1mb.txt chrX
3DChromatin_ReplicateQC | Tue Oct 10 13:20:08 2017 | Splitting nodes chrY
3DChromatin_ReplicateQC | Tue Oct 10 13:20:08 2017 | Splitting dplfc_1_1mb.txt chrY
3DChromatin_ReplicateQC | Tue Oct 10 13:20:14 2017 | Splitting dplfc_2_1mb.txt chrY
/home/john/3DChromatin_ReplicateQC/software/hifive/bin/find_quasar_quality_score:68: RuntimeWarning: invalid value encountered in double_scalars
  results[i, -1] = temp[0] / temp[1] - temp[2] / temp[3]
Traceback (most recent call last):
  File "/home/john/3DChromatin_ReplicateQC/software/genomedisco/reproducibility_analysis/quasar_split_by_chromosomes_qc.py", line 27, in <module>
    main()
  File "/home/john/3DChromatin_ReplicateQC/software/genomedisco/reproducibility_analysis/quasar_split_by_chromosomes_qc.py", line 23, in main
    outfile.write(samplename+'\t'+scorelist[d[chromo]]+'\n')
IndexError: list index out of range
Traceback (most recent call last):
  File "3DChromatin_ReplicateQC.py", line 712, in <module>
    main()
  File "3DChromatin_ReplicateQC.py", line 708, in main
    command_methods[command](**args)
  File "3DChromatin_ReplicateQC.py", line 687, in run_all
    get_qc(metadata_samples,methods,parameters_file,outdir,running_mode,concise_analysis,subset_chromosomes)
  File "3DChromatin_ReplicateQC.py", line 390, in get_qc
    quasar_qc_wrapper(outdir,None,samplename,running_mode)
  File "3DChromatin_ReplicateQC.py", line 279, in quasar_qc_wrapper
    run_script(script_comparison_file,running_mode)
  File "3DChromatin_ReplicateQC.py", line 222, in run_script
    output=subp.check_output(['bash','-c',script_name])
  File "/home/john/anaconda2/lib/python2.7/subprocess.py", line 219, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '-c', 'output/scripts/QuASAR-QC/dplfc_1_1mb.txt/dplfc_1_1mb.txt.QuASAR-QC.sh']' returned non-zero exit status 1

I think I have all the inputs in the correct format but they are here if you would like to compare:
https://github.com/jstansfield0/brain/tree/master/brain

Do you know what is causing these errors?

AttributeError when running example script

Hello, I am running into the following error after installing the pipeline and testing it on the example data. Any insights into what might be causing the issue? Thanks!

-bash-4.1$ python 3DChromatin_ReplicateQC.py run_all --metadata_samples examples/metadata.samples --metadata_pairs examples/metadata.pairs --bins examples/Nodes.w40000.bed.gz --outdir examples/output
Traceback (most recent call last):
  File "3DChromatin_ReplicateQC.py", line 709, in <module>
    main()
  File "3DChromatin_ReplicateQC.py", line 705, in main
    command_methods[command](**args)
  File "3DChromatin_ReplicateQC.py", line 683, in run_all
    split_by_chromosome(metadata_samples,bins,re_fragments,methods,outdir,running_mode,subset_chromosomes)
  File "3DChromatin_ReplicateQC.py", line 156, in split_by_chromosome
    subp.check_output(['bash','-c','mkdir -p '+outdir+'/scripts'])
AttributeError: 'module' object has no attribute 'check_output'
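subprocess.check_output first appeared in Python 2.7, so this AttributeError usually means the pipeline was started with an older interpreter, or that a file named subprocess.py somewhere on the path is shadowing the standard-library module. A minimal diagnostic sketch (the helper name describe_subprocess is hypothetical) to run with the same python binary used for 3DChromatin_ReplicateQC.py:

```python
import subprocess
import sys


def describe_subprocess():
    """Report the interpreter version and where `subprocess` was imported from."""
    return {
        # check_output exists only in Python >= 2.7
        "python_version": sys.version.split()[0],
        # a path outside the standard library suggests a shadowing subprocess.py
        "subprocess_path": getattr(subprocess, "__file__", "<built-in>"),
        "has_check_output": hasattr(subprocess, "check_output"),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_subprocess().items()):
        print("%s: %s" % (key, value))
```

If has_check_output is False, pointing the command at a Python 2.7 interpreter (for example the anaconda2 one) is the first thing to try.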

installing error: Couldn't find index page for 'setuptools_cython' (maybe misspelled?)

Hi,

I am trying to install 3DChromatin_ReplicateQC, but I get the following error:

$ ~/software/anaconda2/bin/pip install 3DChromatin_ReplicateQC/
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./3DChromatin_ReplicateQC
Requirement already satisfied: numpy>=1.9 in ./anaconda2/lib/python2.7/site-packages (from 3DChromatin-ReplicateQC==0.0.1) (1.16.4)
Requirement already satisfied: matplotlib>=1.5.0 in ./anaconda2/lib/python2.7/site-packages (from 3DChromatin-ReplicateQC==0.0.1) (1.5.3)
Requirement already satisfied: h5py in ./anaconda2/lib/python2.7/site-packages (from 3DChromatin-ReplicateQC==0.0.1) (2.9.0)
Collecting hifive==1.5.6
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/87/db/352ffd43f2ac26a6071b42ba7ad489d1c4328c536dc1099e0fa99b6fa43a/hifive-1.5.6.tar.gz (1.3 MB)
    ERROR: Command errored out with exit status 1:
     command: /home/niuyw/software/anaconda2/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-nk4ypx/hifive/setup.py'"'"'; __file__='"'"'/tmp/pip-install-nk4ypx/hifive/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-Uv0KGo
         cwd: /tmp/pip-install-nk4ypx/hifive/
    Complete output (26 lines):
    Couldn't find index page for 'setuptools_cython' (maybe misspelled?)
    No local packages or download links found for setuptools-cython
    install_dir .
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-nk4ypx/hifive/setup.py", line 189, in <module>
        setup_package()
      File "/tmp/pip-install-nk4ypx/hifive/setup.py", line 146, in setup_package
        setup(**metadata)
      File "/home/niuyw/software/anaconda2/lib/python2.7/distutils/core.py", line 111, in setup
        _setup_distribution = dist = klass(attrs)
      File "/home/niuyw/software/anaconda2/lib/python2.7/site-packages/distribute-0.6.14-py2.7.egg/setuptools/dist.py", line 221, in __init__
        self.fetch_build_eggs(attrs.pop('setup_requires'))
      File "/home/niuyw/software/anaconda2/lib/python2.7/site-packages/distribute-0.6.14-py2.7.egg/setuptools/dist.py", line 245, in fetch_build_eggs
        parse_requirements(requires), installer=self.fetch_build_egg
      File "/home/niuyw/software/anaconda2/lib/python2.7/site-packages/distribute-0.6.14-py2.7.egg/pkg_resources.py", line 544, in resolve
        dist = best[req.key] = env.best_match(req, self, installer)
      File "/home/niuyw/software/anaconda2/lib/python2.7/site-packages/distribute-0.6.14-py2.7.egg/pkg_resources.py", line 786, in best_match
        return self.obtain(req, installer) # try and download/install
      File "/home/niuyw/software/anaconda2/lib/python2.7/site-packages/distribute-0.6.14-py2.7.egg/pkg_resources.py", line 798, in obtain
        return installer(requirement)
      File "/home/niuyw/software/anaconda2/lib/python2.7/site-packages/distribute-0.6.14-py2.7.egg/setuptools/dist.py", line 293, in fetch_build_egg
        return cmd.easy_install(req)
      File "/home/niuyw/software/anaconda2/lib/python2.7/site-packages/distribute-0.6.14-py2.7.egg/setuptools/command/easy_install.py", line 576, in easy_install
        raise DistutilsError(msg)
    distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('setuptools-cython')
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Any suggestions would be highly appreciated.

can't run concordance

Hi,
I have run the example successfully.
With my own data, the preprocess and QC steps run successfully, but I can't get the concordance results.
The error is below:

Step: qc | Tue Mar 14 15:42:24 2023 | running QuASAR-QC | computing QC for CRC-T_1
Step: qc | Tue Mar 14 15:42:41 2023 | running QuASAR-QC | computing QC for CRC-P_1
Step: qc | Tue Mar 14 15:42:43 2023 | running QuASAR-QC | computing QC for CRC-A_1
Step: qc | Tue Mar 14 15:42:45 2023 | running QuASAR-QC | computing QC for CRC-N_1
Step: concordance | Tue Mar 14 15:42:46 2023 | computing concordance between CRC-T_1 and CRC-P_1
Step: concordance | Tue Mar 14 15:42:59 2023 | computing concordance between CRC-T_1 and CRC-A_1
Step: concordance | Tue Mar 14 15:43:12 2023 | computing concordance between CRC-T_1 and CRC-N_1
Step: concordance | Tue Mar 14 15:43:25 2023 | computing concordance between CRC-P_1 and CRC-A_1
Step: concordance | Tue Mar 14 15:43:38 2023 | computing concordance between CRC-P_1 and CRC-N_1
Step: concordance | Tue Mar 14 15:43:50 2023 | computing concordance between CRC-A_1 and CRC-N_1
Traceback (most recent call last):
  File "./3DChromatin_ReplicateQC", line 11, in <module>
    load_entry_point('3DChromatin-ReplicateQC', 'console_scripts', '3DChromatin_ReplicateQC')()
  File "/share/Adata/hic/drawing/software/3DChromatin_ReplicateQC/3DChromatin_ReplicateQC/main.py", line 17, in main
    command_methods[command](**args)
  File "/share/Adata/hic/drawing/software/3DChromatin_ReplicateQC/software/genomedisco/genomedisco/concordance_utils.py", line 849, in run_all
    concordance(metadata_pairs,methods,outdir,running_mode,concise_analysis,subset_chromosomes,timing)
  File "/share/Adata/hic/drawing/software/3DChromatin_ReplicateQC/software/genomedisco/genomedisco/concordance_utils.py", line 454, in concordance
    samplename1,samplename2=items[0],items[1]
IndexError: list index out of range
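The failing line, samplename1,samplename2=items[0],items[1], can only raise IndexError when some line of the --metadata_pairs file yields fewer than two fields. A blank trailing line, or sample names separated by spaces when the parser expects tabs, would both do that. A minimal checker sketch, assuming tab-delimited lines (the delimiter concordance_utils.py actually splits on is not shown in this log); find_bad_pair_lines is a hypothetical helper:

```python
def find_bad_pair_lines(lines):
    """Return (line_number, text) for lines with fewer than two tab-separated fields.

    Assumes "sample1<TAB>sample2" lines; a blank line also counts as bad.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # mirrors items[0], items[1]: at least two fields are required
        if len(text.split("\t")) < 2:
            bad.append((lineno, text))
    return bad
```

Usage: open the pairs file and pass the handle, e.g. find_bad_pair_lines(open("metadata.pairs")), then fix or delete every reported line.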


cannot install HiCRep

Hi,
I cannot install HiCRep using your directions (I am using R 3.6.0).
My command:
3DChromatin_ReplicateQC/install_scripts/install_3DChromatin_ReplicateQC.sh --pathtopython ~/programs/anaconda2/bin/python --modules R/3.6.0

I get this error message:
========= installing HiCRep ============================================

========================================================================
Error: With R version 3.5 or greater, install Bioconductor packages using BiocManager; see https://bioconductor.org/install
Execution halted

How can I fix this? I can install it directly in R, but then 3DChromatin_ReplicateQC doesn't find it and I cannot run the example data test.
Thanks,
Alessandra

QuASAR-QC output folder

QuASAR-QC always prepends $HOME to the path of the output folder. For example, I ran

3DChromatin_ReplicateQC qc --running_mode sge --metadata_samples QCRep_metadata.samples --outdir ./output --methods QuASAR-QC

but it tried to save to

/home/s1529682/./output/scripts/QuASAR-QC/WT-1/WT-1.QuASAR-QC.sh.o

This folder structure of course doesn't exist, so the job fails.
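A likely explanation, not verified against the pipeline's SGE code: Grid Engine typically starts a job in the submitting user's home directory unless it is submitted with -cwd, so a relative --outdir such as ./output is resolved against $HOME inside the job. The sketch below reproduces that resolution; resolve_outdir and /exports/project/output are hypothetical. Passing an absolute path instead (for example --outdir "$PWD/output") should make the location independent of where the job starts.

```python
import posixpath


def resolve_outdir(outdir, job_cwd):
    """Resolve outdir the way a process started in job_cwd would see it."""
    # join() leaves outdir unchanged when it is already absolute
    return posixpath.normpath(posixpath.join(job_cwd, outdir))


# A relative path follows the job's working directory ($HOME under default SGE):
print(resolve_outdir("./output", "/home/s1529682"))
# An absolute path does not depend on where the job starts:
print(resolve_outdir("/exports/project/output", "/home/s1529682"))
```

normpath only tidies /home/s1529682/./output into /home/s1529682/output; both name the same directory.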

HiC-Spector

HiC-Spector doesn't work for me either; it fails with the following error:

Traceback (most recent call last):
  File "/exports/eddie3_homes_local/s1529682/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 209, in <module>
    main()
  File "/exports/eddie3_homes_local/s1529682/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 202, in main
    get_reproducibility(M1,M2,int(sys.argv[6]))
  File "/exports/eddie3_homes_local/s1529682/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 163, in get_reproducibility
    b1_extend[i_nz1,i]=b1[:,i]
IndexError: index 2 is out of bounds for axis 1 with size 2

My input files have the same format as your example files, but it still fails.
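The traceback shows the wrapper filling b1_extend[i_nz1,i] column by column, which suggests it drops all-zero bins, computes the leading eigenvectors of the reduced matrix, and maps them back. If a chromosome has only two non-empty bins, at most two eigenvectors can come back, which would match "index 2 is out of bounds for axis 1 with size 2". This is an inference from the traceback, not a confirmed diagnosis. A sketch (hypothetical helpers) that flags chromosomes with no more non-empty bins than the number of eigenvectors r requested:

```python
def count_nonempty_bins(matrix):
    """Number of rows (bins) with at least one non-zero contact."""
    return sum(1 for row in matrix if any(value != 0 for value in row))


def chromosomes_too_small(matrices, num_evec):
    """Return {chrom: n_bins} for chromosomes where n_bins <= num_evec.

    Assumes the wrapper discards all-zero bins before the eigendecomposition,
    as inferred from the traceback.
    """
    flagged = {}
    for chrom, matrix in matrices.items():
        n_bins = count_nonempty_bins(matrix)
        if n_bins <= num_evec:
            flagged[chrom] = n_bins
    return flagged
```

Chromosomes reported here (often small or unplaced contigs) could be excluded, for example via --subset_chromosomes, before rerunning HiC-Spector.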

0 validly-mapped read pairs loaded.
No valid data was loaded.
Traceback (most recent call last):
  File "anaconda3/envs/3DC_env/bin/hifive", line 849, in <module>
    main()
  File "anaconda3/envs/3DC_env/bin/hifive", line 93, in main
    run(args)
  File "anaconda3/envs/3DC_env/lib/python2.7/site-packages/hifive/commands/find_quasar_scores.py", line 114, in run
    coverages=args.coverages, seed=args.seed)
  File "anaconda3/envs/3DC_env/lib/python2.7/site-packages/hifive/quasar.py", line 190, in find_transformation
    elif hic.data['cis_indices'][chr_indices[i + 1]] - hic.data['cis_indices'][chr_indices[i]] == 0:
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "anaconda3/envs/3DC_env/lib/python2.7/site-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'cis_indices' doesn't exist)"
0 validly-mapped read pairs loaded.
No valid data was loaded.
Traceback (most recent call last):
  File "anaconda3/envs/3DC_env/bin/hifive", line 849, in <module>
    main()
  File "anaconda3/envs/3DC_env/bin/hifive", line 93, in main
    run(args)
  File "anaconda3/envs/3DC_env/lib/python2.7/site-packages/hifive/commands/find_quasar_scores.py", line 114, in run
    coverages=args.coverages, seed=args.seed)
  File "anaconda3/envs/3DC_env/lib/python2.7/site-packages/hifive/quasar.py", line 190, in find_transformation
    elif hic.data['cis_indices'][chr_indices[i + 1]] - hic.data['cis_indices'][chr_indices[i]] == 0:
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "anaconda3/envs/3DC_env/lib/python2.7/site-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'cis_indices' doesn't exist)"
Step: preprocess | Tue Dec 15 20:57:10 2020 | Splitting nodes chr18
Step: preprocess | Tue Dec 15 20:57:10 2020 | Splitting hr chr18
Step: preprocess | Tue Dec 15 20:58:22 2020 | Splitting lr chr18
Step: qc | Tue Dec 15 20:59:19 2020 | running QuASAR-QC | computing QC for hr
Quasar file appears incomplete. Rerun with HiC project argument.
Traceback (most recent call last):
  File "3DChromatin_ReplicateQC/wrappers/QuASAR/quasar_split_by_chromosomes_qc.py", line 29, in <module>
    main()
  File "3DChromatin_ReplicateQC/wrappers/QuASAR/quasar_split_by_chromosomes_qc.py", line 10, in main
    scorefile = open( sys.argv[1], 'r')
IOError: [Errno 2] No such file or directory: 'data/output/results/qc/hr/QuASAR-QC/hr.QuASAR-QC.scores.txt'
Traceback (most recent call last):
  File "envs/3DC_env/bin/3DChromatin_ReplicateQC", line 11, in <module>
    load_entry_point('3DChromatin-ReplicateQC', 'console_scripts', '3DChromatin_ReplicateQC')()
  File "3DChromatin_ReplicateQC/3DChromatin_ReplicateQC/main.py", line 17, in main
    command_methods[command](**args)
  File "3DChromatin_ReplicateQC/software/genomedisco/genomedisco/concordance_utils.py", line 848, in run_all
    get_qc(metadata_samples,methods,outdir,running_mode,concise_analysis,subset_chromosomes,timing)
  File "3DChromatin_ReplicateQC/software/genomedisco/genomedisco/concordance_utils.py", line 548, in get_qc
    quasar_qc_wrapper(outdir,parameters,samplename,running_mode,timing)
  File "3DChromatin_ReplicateQC/software/genomedisco/genomedisco/concordance_utils.py", line 335, in quasar_qc_wrapper
    run_script(script_comparison_file,running_mode,parameters)
  File "3DChromatin_ReplicateQC/software/genomedisco/genomedisco/concordance_utils.py", line 266, in run_script
    output=subp.check_output(['bash','-c',script_name])
  File "anaconda3/envs/3DC_env/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['bash', '-c', 'data/output/scripts/QuASAR-QC/hr/hr.QuASAR-QC.sh']' returned non-zero exit status 1
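These logs show the same symptom as the earlier issue about wrappers/QuASAR/data_to_hifive.py: 0 validly-mapped read pairs are loaded, after which HiFive cannot find cis_indices. In that issue the cause was non-integer counts hitting int(temp[4]); whether the same applies here is an assumption. A sketch (hypothetical helper) that lists lines whose count field is not a plain integer, assuming the five-column chr1 bin1 chr2 bin2 count layout:

```python
def non_integer_counts(lines, column=4):
    """Return (line_number, value) for lines whose count field is not an integer.

    value is None when the line has too few fields to contain a count.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) <= column:
            bad.append((lineno, None))
            continue
        try:
            int(fields[column])  # mirrors int(temp[4]) in data_to_hifive.py
        except ValueError:
            bad.append((lineno, fields[column]))
    return bad
```

If lines such as "1.5" are reported, either round the counts before running the pipeline or apply the int(float(temp[4])) change described in that earlier issue.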

low BR similarity scores for HiC-Spector

Hello,
I am having difficulties reproducing the similarity scores reported in the paper for HiC-Spector and would be grateful for any clarification.

For example, take samples G401 and A549.
I downloaded the BAM files of both biological replicates (BRs) for G401 and A549 from ENCODE:

ENCFF649MAY, ENCFF758WUD
ENCFF867DCM, ENCFF532XBC

Then I generated pairs files with pairtools parse and aggregated them into 40 kb binned cooler files with cooler cload pairs. Finally, I converted them into the format required by the wrapper:

zcat ./ENCSR444WCZ_A549/PreProcessed/BioRep1/ENCSR444WCZ.BioRep1.40kb.3DChrom_REP.Gint.gz | head
chr1	0	chr1	218840000	1
chr1	0	chr8	139960000	1
chr1	0	chr11	123720000	1
chr1	40000	chr1	40000	4
chr1	40000	chr1	80000	1
chr1	40000	chr1	47320000	1
chr1	40000	chr1	155920000	1
chr1	40000	chr1	186120000	1
chr1	40000	chr2	24920000	1
chr1	40000	chr2	123360000	1
.
.
.

I left the number of leading eigenvectors (EVs) at the default of 20. My input files are:

cat meta.pairs.A549-G401.txt
G401_BR1 G401_BR2
G401_BR1 A549_BR1
G401_BR1 A549_BR2
G401_BR2 A549_BR1
G401_BR2 A549_BR2
A549_BR1 A549_BR2
cat meta.samples.notDownSamp.A549-G401.norm.txt
A549_BR1 ./BioRep1/ENCSR444WCZ.BioRep1.40kb.3DChrom_REP.Gint.gz
A549_BR2 ./BioRep2/ENCSR444WCZ.BioRep2.40kb.3DChrom_REP.Gint.gz
G401_BR1 ./BioRep1/ENCSR079VIJ.BioRep1.40kb.3DChrom_REP.Gint.gz
G401_BR2 ./BioRep2/ENCSR079VIJ.BioRep2.40kb.3DChrom_REP.Gint.gz
zcat $REF_DIR/40kb.BINS.3DChrom_REP.bed.gz | head
chr1	0	40000	0
chr1	40000	80000	40000
chr1	80000	120000	80000
chr1	120000	160000	120000
chr1	160000	200000	160000
chr1	200000	240000	200000
chr1	240000	280000	240000
chr1	280000	320000	280000
chr1	320000	360000	320000
chr1	360000	400000	360000

Finally, I ran the wrapper with the default parameters file:

FILE1="meta.pairs.A549-G401.txt"
FILE2="meta.samples.notDownSamp.A549-G401.txt"

3DChromatin_ReplicateQC run_all --metadata_samples $FILE2 \
        --metadata_pairs $FILE1 --bins $REF_DIR/"$RES"kb.BINS.3DChrom_REP.bed.gz \
        --outdir OUT \
        --parameters_file parameters.txt --methods GenomeDISCO,HiCRep,HiC-Spector,QuASAR-Rep

The other methods' scores look reasonable compared with Figure 2B, but HiC-Spector scores only 0.19 (G401) and 0.482 (A549) for the BR-vs-BR comparisons:

cat reproducibility.genomewide.txt
#Sample1	Sample2	GenomeDISCO	HiC-Spector	HiCRep	QuASAR-Rep
G401_BR1	G401_BR2	0.886	0.19	0.972	0.748
G401_BR1	A549_BR1	0.763	0.21	0.802	0.571
G401_BR1	A549_BR2	0.748	0.188	0.799	0.545
G401_BR2	A549_BR1	0.767	0.149	0.801	0.578
G401_BR2	A549_BR2	0.754	0.159	0.799	0.553
A549_BR1	A549_BR2	0.852	0.482	0.963	0.604

I tried normalizing the matrices beforehand, but then HiC-Spector throws this error:

Step: concordance | Wed Feb 28 12:36:00 2024 | computing concordance between G401_BR1 and G401_BR2
Step: concordance | Wed Feb 28 12:36:00 2024 | computing concordance between G401_BR1 and A549_BR1
Step: concordance | Wed Feb 28 12:36:00 2024 | computing concordance between G401_BR1 and A549_BR2
Step: concordance | Wed Feb 28 12:36:00 2024 | computing concordance between G401_BR2 and A549_BR1
Step: concordance | Wed Feb 28 12:36:01 2024 | computing concordance between G401_BR2 and A549_BR2
Step: concordance | Wed Feb 28 12:36:01 2024 | computing concordance between A549_BR1 and A549_BR2
Traceback (most recent call last):
  File "/HDD1/DocumentsHDD1/DISSERTATION/07_2022-/scripts/3C_SIMILARITY/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 209, in <module>
    main()
  File "/HDD1/DocumentsHDD1/DISSERTATION/07_2022-/scripts/3C_SIMILARITY/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 202, in main
    get_reproducibility(M1,M2,int(sys.argv[6]))
  File "/HDD1/DocumentsHDD1/DISSERTATION/07_2022-/scripts/3C_SIMILARITY/3DChromatin_ReplicateQC/wrappers/HiC-Spector/run_reproducibility_v1.py", line 157, in get_reproducibility
    a1, b1=eigsh(M1b_L,k=num_evec,which="SM")
  File "/home/xenia/anaconda3/envs/3DChromatin_Replicate/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1596, in eigsh
    return eigh(A, b=M, eigvals_only=not return_eigenvectors)
  File "/home/xenia/anaconda3/envs/3DChromatin_Replicate/lib/python2.7/site-packages/scipy/linalg/decomp.py", line 432, in eigh

In the issue "HiC-Spector #11", chesi suggested lowering r. I reduced it to 10, but it still throws the same error.
Any help is appreciated, thanks.
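Two hypotheses from the (truncated) traceback. First, eigsh is handing off to the dense scipy.linalg.eigh, which SciPy does when the number of requested eigenvectors is not smaller than the matrix dimension, so at least one chromosome matrix is very small after filtering. Second, if the normalized matrices were dumped from balanced cooler values, bins masked during balancing come out as NaN, and NaN or infinite entries would likely make eigh fail. A sketch (hypothetical helper) that flags non-finite or unparsable counts in the five-column input files:

```python
import math


def non_finite_counts(lines, column=4):
    """Return (line_number, text) for lines whose count is NaN, infinite, or unparsable."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        try:
            value = float(fields[column])
        except (IndexError, ValueError):
            # too few fields, or a token such as "NA" that float() rejects
            bad.append((lineno, line.rstrip("\n")))
            continue
        if math.isnan(value) or math.isinf(value):
            bad.append((lineno, line.rstrip("\n")))
    return bad
```

Dropping the reported lines (or writing 0 in their place) before rerunning would separate the NaN hypothesis from the small-matrix one.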

Contact map and bins

Hi!
Can you recommend software for generating the contact map and bins files? I couldn't find any references. Thanks.

Won
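The bins file is a BED-like table with one row per fixed-size bin. A minimal sketch that generates one from chromosome sizes (for example, read from a UCSC-style chrom.sizes file), assuming the layout of the 40 kb bins file shown in the HiC-Spector issue above: chrom, start, end, and a bin name equal to the start. Clipping the last bin at the chromosome end is also an assumption; compare the output with examples/Nodes.w40000.bed.gz from the repository before relying on it. make_bins and write_bins are hypothetical helpers.

```python
import gzip


def make_bins(chrom_sizes, binsize):
    """Return (chrom, start, end, name) rows tiling each chromosome.

    The bin name is the start coordinate (assumed); the last bin is
    clipped to the chromosome length.
    """
    rows = []
    for chrom, length in chrom_sizes:
        for start in range(0, length, binsize):
            rows.append((chrom, start, min(start + binsize, length), start))
    return rows


def write_bins(path, rows):
    """Write rows as a gzipped, tab-separated file."""
    with gzip.open(path, "wb") as out:
        for chrom, start, end, name in rows:
            out.write(("%s\t%d\t%d\t%d\n" % (chrom, start, end, name)).encode("ascii"))
```

The contact map then lists one contact per line as chrom1, bin1 name, chrom2, bin2 name, count, as in the example file shown in the same issue.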

Support cooler files

Hi, thank you for this tool.

I just wanted to give it a try, and I'd like to suggest supporting .cool files as input. They contain exactly the information that is required, including the bin table, and reading data from an HDF5-based .cool file would be vastly faster than decompressing and parsing a text file (it would also save the time spent dumping .cool files to text).

Thank you!
Ilya
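Until native support exists, a .cool file can be converted to the five-column text layout. A sketch under stated assumptions: it reads a single-resolution .cool file with h5py (already a dependency of this pipeline) using the cooler HDF5 layout (chroms/name; bins/chrom, start, end; pixels/bin1_id, bin2_id, count), loads the whole pixel table into memory, and names each bin by its start coordinate as in the bins file shown above. For .mcool files the same groups sit under resolutions/<binsize>/. The cooler CLI's cooler dump --join is an alternative that prints chrom/start/end columns directly. Both helper names are hypothetical.

```python
def pixels_to_lines(bins, pixels):
    """Convert cooler-style pixels to "chrom1 start1 chrom2 start2 count" lines.

    bins: list of (chrom, start, end) indexed by bin ID.
    pixels: iterable of (bin1_id, bin2_id, count).
    """
    lines = []
    for bin1, bin2, count in pixels:
        chrom1, start1, _ = bins[bin1]
        chrom2, start2, _ = bins[bin2]
        lines.append("%s\t%d\t%s\t%d\t%s" % (chrom1, start1, chrom2, start2, count))
    return lines


def read_cool(path):
    """Read bins and pixels from a single-resolution .cool file (layout assumed)."""
    import h5py  # imported here so pixels_to_lines works without h5py

    with h5py.File(path, "r") as f:
        names = [n.decode() if isinstance(n, bytes) else str(n) for n in f["chroms/name"][:]]
        # bins/chrom stores integer indices into chroms/name
        bins = [
            (names[c], int(s), int(e))
            for c, s, e in zip(f["bins/chrom"][:], f["bins/start"][:], f["bins/end"][:])
        ]
        pixels = list(zip(f["pixels/bin1_id"][:], f["pixels/bin2_id"][:], f["pixels/count"][:]))
    return bins, pixels
```

For very large maps, reading the pixel table in chunks would avoid holding it all in memory.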
