xiaotaowang / eaglec

A deep-learning framework for predicting a full range of structural variations from bulk and single-cell contact maps

License: Other

eaglec's People

antecede andresweitzel skelviper yfarjoun zuevval hzauleibowen zm19911213

eaglec's Issues

On the problem of standard sets

Hi Xiaotao,

I am trying to compare the performance of EagleC with other software. Could you tell me how your standard set was created, and what kind of standard set is most suitable for EagleC? Thank you for providing such a great tool!

Thanks for your help,
Wang

Non human genomes

Hi!

Amazing tool! However, I was wondering: can it be used with non-human samples, such as other mammalian species?

Thanks

UnboundLocalError: local variable 'cnn_models' referenced before assignment

Hi Xiaotao,

Thanks for your work. I was trying to use the predictSV-single-resolution command, but got the following error:

Traceback (most recent call last):
  File "/XXX/miniconda3/envs/EagleC/bin/predictSV-single-resolution", line 275, in <module>
    run()
  File "/XXX/miniconda3/envs/EagleC/bin/predictSV-single-resolution", line 226, in run
    intra_expected_count = intraPredict(clr, cnn_models, chroms, cache_folder, seq_depth,
UnboundLocalError: local variable 'cnn_models' referenced before assignment

I think there may be a bug in the scoreUtils.cpython-38-x86_64-linux-gnu.so file, but since it is a compiled file I can't edit it to fix the problem. Could you please help me deal with it?

Cheers,
Yimin

Capture HiC data

Hello @XiaoTaoWang.

Thanks for maintaining such an organized site for installing and running EagleC.

I noticed that the paper says EagleC can run on Capture Hi-C data, but I didn't see any detailed instructions on how to actually do this. In particular, how does one get around the fact that capture data has a particular pattern due to the capture technology? Does normalization help with that? If so, which normalization should be used, CNV or ICE? (As an aside, what does "ICE" stand for?)

Related: what methods, scripts, or functions did you use to evaluate performance? Clearly there are many ways to compare a call set to a truth set, and the details matter, so I was wondering whether your evaluation scripts are publicly available.

Thanks!

Yossi
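To illustrate why the details matter: a common approach treats each SV as a pair of breakpoints and counts a call as correct when both ends fall within a tolerance window of a truth SV. The sketch below shows that idea only; it is not the evaluation used in the EagleC paper, and the tuple layout (chrom1, pos1, chrom2, pos2) and the 50 kb default tolerance are hypothetical choices.

```python
from typing import List, Tuple

# Hypothetical SV representation: (chrom1, pos1, chrom2, pos2).
SV = Tuple[str, int, str, int]


def normalize(sv: SV) -> SV:
    """Order the two breakpoints so (A, B) and (B, A) compare equal."""
    a, b = (sv[0], sv[1]), (sv[2], sv[3])
    lo, hi = min(a, b), max(a, b)
    return (lo[0], lo[1], hi[0], hi[1])


def same_event(call: SV, truth: SV, tolerance: int) -> bool:
    """True if both breakpoints are on the same chromosomes and within `tolerance` bp."""
    c, t = normalize(call), normalize(truth)
    return (c[0] == t[0] and c[2] == t[2]
            and abs(c[1] - t[1]) <= tolerance
            and abs(c[3] - t[3]) <= tolerance)


def precision_recall(calls: List[SV], truths: List[SV],
                     tolerance: int = 50_000) -> Tuple[float, float]:
    """Precision: fraction of calls matching any truth SV.
    Recall: fraction of truth SVs matched by any call."""
    hit_calls = sum(any(same_event(c, t, tolerance) for t in truths) for c in calls)
    hit_truths = sum(any(same_event(c, t, tolerance) for c in calls) for t in truths)
    precision = hit_calls / len(calls) if calls else 0.0
    recall = hit_truths / len(truths) if truths else 0.0
    return precision, recall
```

Because this matching is many-to-one, several nearby calls can all count toward precision for a single truth SV; a stricter evaluation would enforce one-to-one matching, and the choice of tolerance changes the numbers considerably.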

No module named 'eaglec.scoreUtils'

I installed the package using pip install eaglec, and running predictSV -h works fine. However, it fails when I run it on my data:
predictSV --hic-5k 1881_BME.mcool::/resolutions/5000 --hic-10k 1881_BME.mcool::/resolutions/10000 --hic-50k 1881_BME.mcool::/resolutions/50000 -O SK-N-AS -g other --balance-type CNV --output-format full --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999

The error is below:
root INFO @ 07/26/22 14:34:12:
ARGUMENT LIST:
Cool URI at 5kb = 1881_BME.mcool::/resolutions/5000
Cool URI at 10kb = 1881_BME.mcool::/resolutions/10000
Cool URI at 50kb = 1881_BME.mcool::/resolutions/50000
Balance Type = CNV
Reference Genome = other
Included Chromosomes = ['#', 'X']
Probability Cutoff for 5kb SVs = 0.8
Probability Cutoff for 10kb SVs = 0.8
Probability Cutoff for 50kb SVs = 0.99999
Output File Prefix = SK-N-AS
Output Format = full
Log file name = eaglec.log

root INFO @ 07/26/22 14:34:12: Predict SVs at 5kb resolution ...
Traceback (most recent call last):
  File "/home/dguan/.local/bin/predictSV-single-resolution", line 276, in <module>
    run()
  File "/home/dguan/.local/bin/predictSV-single-resolution", line 110, in run
    from eaglec.scoreUtils import intraPredict, interPredict
ModuleNotFoundError: No module named 'eaglec.scoreUtils'
Traceback (most recent call last):
  File "/home/dguan/.local/bin/predictSV", line 176, in <module>
    run()
  File "/home/dguan/.local/bin/predictSV", line 112, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/share/apps/python3/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'predictSV-single-resolution -H 1881_BME.mcool::/resolutions/5000 --balance-type CNV -O SK-N-AS.CNN_SVs.5K.txt --genome other --output-format full -C "#" "X" --prob-cutoff 0.8 --logFile eaglec.log' returned non-zero exit status 1.

Could you help me figure out what happened? Thanks so much!

specify gpu number to use

Hi Xiaotao,
Thanks for this wonderful tool. I am currently running EagleC on a machine with 4 Quadro RTX 5000 GPUs (I hope this will be faster than running on CPU only), but one of the GPUs is largely occupied by another job. When I start an EagleC run, it encounters an error:
could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I assume this is caused by the low available memory on the occupied GPU. Is there a way, similar to specifying the number of threads, to specify how many GPUs (or which ones) to use?

Best,
Yang
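EagleC runs its CNN models through TensorFlow (as the TensorFlow log lines elsewhere on this page show), so one generic workaround, independent of any EagleC option, is to control which GPUs TensorFlow can see through the standard CUDA_VISIBLE_DEVICES environment variable. Setting TF_FORCE_GPU_ALLOW_GROWTH=true additionally asks TensorFlow to allocate GPU memory on demand instead of reserving it up front. Whether this resolves the cuDNN error above is an assumption, not something confirmed in this thread. A minimal sketch:

```python
import os
from typing import Dict, Iterable


def gpu_restricted_env(gpu_ids: Iterable[int], allow_growth: bool = True) -> Dict[str, str]:
    """Return a copy of the current environment exposing only the given GPUs.

    The calling process's own environment is left untouched.
    """
    env = dict(os.environ)
    # Only these physical devices will be visible; TensorFlow renumbers them from 0.
    # An empty list hides every GPU, which forces CPU execution.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    if allow_growth:
        # Let TensorFlow grow its GPU memory allocation as needed.
        env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


if __name__ == "__main__":
    # Example: skip the busy GPU 0 and use GPUs 1-3.
    env = gpu_restricted_env([1, 2, 3])
    print(env["CUDA_VISIBLE_DEVICES"])
```

The returned dict can be passed as env= to subprocess.run when launching predictSV; exporting CUDA_VISIBLE_DEVICES=1,2,3 in the shell before calling predictSV has the same effect.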

output of EagleC as input of Neoloopfinder

Hi Xiaotao,

Is there any way to use the EagleC output (..CNN_SVs.5K_combined.txt) as input to NeoLoopFinder?
NeoLoopFinder requires a data frame with 6 columns, but it looks like the definitions of "++", "+-", etc. in NeoLoopFinder are not exactly the same as in EagleC, especially for inter-chromosomal SVs.
Thank you!

How much resources are needed

Hi,

I am trying to run EagleC on the sample file provided with the tutorial. I am running the sbatch script on my HPC exactly as described in the guide, with 16 parallel jobs each allocated 16 GB. The tutorial says the 16 parallel jobs can be completed in under 2 hours. In my case, even after 5 hours the jobs are terminated by the time limit, and I have only one output file (SKNAS.CNN_SVs.10K.txt), which I assume means that generating all six files would take more than 24 hours. Since my HPC time is limited, I was wondering whether this is normal, or whether I am doing something wrong and could change some parameters to improve performance. I am using the command exactly as described in the tutorial, with the following parameters, and submitting it 16 times:
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=05:00:00
#SBATCH --mem=16G

Thanks

Select SV type before running

Hi,

Could EagleC add a parameter to detect only selected SV types? INS/DEL events are very abundant in plant genomes, but Hi-C may have more power to detect inversions and translocations, which should have few candidates.

EagleC HiChIP with Neoloopfinder

Hello,

This is more a question than an issue. I have used EagleC to predict SVs in my HiChIP data and converted the results to NeoLoopFinder format. Can I use this as input to NeoLoopFinder? I know EagleC supports HiChIP data, but I'm not sure whether NeoLoopFinder does.

Thanks

error - eaglec.scoreUtils Not found

Hi,

I installed eaglec with pip (Python 3.7) using pip install --user eagleC and tried running it on my Hi-C data, but got the following error:

predictSV --hic-5k ./mcools_files/PASMXM.mcool::/resolutions/5000 --hic-10k ./mcools_files/PASMXM.mcool::/resolutions/10000 --hic-50k ./mcools_files/PASMXM.mcool::/resolutions/50000 -O PASMXM -g hg38 --balance-type Raw --output-format full
root                      INFO    @ 03/24/23 16:03:54:
# ARGUMENT LIST:
# Cool URI at 5kb = ./mcools_files/PASMXM.mcool::/resolutions/5000
# Cool URI at 10kb = ./mcools_files/PASMXM.mcool::/resolutions/10000
# Cool URI at 50kb = ./mcools_files/PASMXM.mcool::/resolutions/50000
# Balance Type = Raw
# Reference Genome = hg38
# Included Chromosomes = ['#', 'X']
# Probability Cutoff for 5kb SVs = 0.8
# Probability Cutoff for 10kb SVs = 0.8
# Probability Cutoff for 50kb SVs = 0.99999
# Output File Prefix = PASMXM
# Output Format = full
# Log file name = PASMXM.log
root                      INFO    @ 03/24/23 16:03:54: Predict SVs at 5kb resolution ...
Traceback (most recent call last):
  File "/home/abhijit/.local/bin/predictSV-single-resolution", line 276, in <module>
    run()
  File "/home/abhijit/.local/bin/predictSV-single-resolution", line 110, in run
    from eaglec.scoreUtils import intraPredict, interPredict
ModuleNotFoundError: No module named 'eaglec.scoreUtils'
Traceback (most recent call last):
  File "/home/abhijit/.local/bin/predictSV", line 176, in <module>
    run()
  File "/home/abhijit/.local/bin/predictSV", line 112, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/home/abhijit/miniconda3/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'predictSV-single-resolution -H ./mcools_files/PASMXM.mcool::/resolutions/5000 --balance-type Raw -O PASMXM.CNN_SVs.5K.txt --genome hg38 --output-format full -C "#" "X" --prob-cutoff 0.8 --logFile PASMXM.log' returned non-zero exit status 1.

I can import eaglec and eaglec.utilities, but not eaglec.scoreUtils, in the Python 3.7 environment:

$python3.7
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import eaglec
>>> import eaglec.utilities
>>> import eaglec.scoreUtils
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'eaglec.scoreUtils'
>>> print(eaglec.__file__)
/home/abhijit/.local/lib/python3.7/site-packages/eaglec/__init__.py
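The tracebacks on this page refer to eaglec/scoreUtils.pyx, and an earlier report mentions scoreUtils.cpython-38-x86_64-linux-gnu.so, which suggests scoreUtils is a compiled Cython extension built for a specific Python version. Assuming that, a quick check is whether the installed package directory contains a scoreUtils extension file whose name the running interpreter would accept. A standard-library sketch (the helper names are hypothetical):

```python
import glob
import importlib.machinery
import importlib.util
import os
from typing import List


def extension_files(package_dir: str, module: str) -> List[str]:
    """List compiled-extension candidates for `module` (e.g. scoreUtils*.so)."""
    found = []
    for pattern in (f"{module}*.so", f"{module}*.pyd"):
        found.extend(glob.glob(os.path.join(package_dir, pattern)))
    return sorted(found)


def loadable_by_this_interpreter(path: str, module: str) -> bool:
    """True if the file name is `module` plus one of this interpreter's extension suffixes."""
    name = os.path.basename(path)
    return any(name == module + suffix
               for suffix in importlib.machinery.EXTENSION_SUFFIXES)


def diagnose(package: str = "eaglec", module: str = "scoreUtils") -> str:
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return f"{package} is not importable"
    package_dir = os.path.dirname(spec.origin)
    files = extension_files(package_dir, module)
    if not files:
        return f"no compiled {module} file in {package_dir}"
    usable = [f for f in files if loadable_by_this_interpreter(f, module)]
    if not usable:
        return f"{module} appears to be built for another Python version: {files}"
    return f"found loadable extension: {usable[0]}"


if __name__ == "__main__":
    print(diagnose())
```

Running it in the same interpreter that fails (here Python 3.7) shows whether the extension is absent or present but tagged for a different Python version.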

Your help is greatly appreciated.

Optimization loop failed

I was trying to run the predictSV step on the example data, but I got "Optimization loop failed" messages. Can you give me some advice on this issue? Thanks.

predictSV --hic-5k SKNAS-MboI-allReps-filtered.mcool::/resolutions/5000 --hic-10k SKNAS-MboI-allReps-filtered.mcool::/resolutions/10000 --hic-50k SKNAS-MboI-allReps-filtered.mcool::/resolutions/50000 -O SK-N-AS -g hg38 --balance-type ICE --output-format full --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
root INFO @ 05/18/23 15:29:26:
ARGUMENT LIST:
Cool URI at 5kb = SKNAS-MboI-allReps-filtered.mcool::/resolutions/5000
Cool URI at 10kb = SKNAS-MboI-allReps-filtered.mcool::/resolutions/10000
Cool URI at 50kb = SKNAS-MboI-allReps-filtered.mcool::/resolutions/50000
Balance Type = ICE
Reference Genome = hg38
Included Chromosomes = ['#', 'X']
Probability Cutoff for 5kb SVs = 0.8
Probability Cutoff for 10kb SVs = 0.8
Probability Cutoff for 50kb SVs = 0.99999
Output File Prefix = SK-N-AS
Output Format = full
Log file name = SK-N-AS.log

root INFO @ 05/18/23 15:29:26: Predict SVs at 5kb resolution ...
root INFO @ 05/18/23 15:29:27: matched sequencing depth in human at 10Kb: 80181600.06574439
root INFO @ 05/18/23 15:29:27: Load CNN models from /disk3/users/dittman/.conda/envs/py38/lib/python3.8/site-packages/eaglec/data/bulk/50M-100M ...
2023-05-18 15:29:27.987789: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-18 15:29:27.992225: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
root INFO @ 05/18/23 15:29:29: Done
root INFO @ 05/18/23 15:29:29: Interemediate results at the 5kb resolution will be cached to .SKNAS-MboI-allReps-filtered.mcool.91006977.ICE.None.100000.None

eaglec.scoreUtils INFO @ 05/18/23 16:39:38: (chr1, chr1): Total 303446 candidates left after filtering
2023-05-18 16:40:15.548456: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2023-05-18 16:40:15.569822: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2100000000 Hz
tensorflow WARNING @ 05/18/23 16:41:39: 5 out of the last 1004 calls to <function Model.make_predict_function..predict_function at 0x7efff41989d0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
tensorflow WARNING @ 05/18/23 16:41:40: 6 out of the last 1005 calls to <function Model.make_predict_function..predict_function at 0x7efff4198940> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2023-05-18 16:43:59.907971: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled

eaglec.scoreUtils INFO @ 05/18/23 17:52:42: (chr2, chr2): Total 294222 candidates left after filtering
eaglec.scoreUtils INFO @ 05/18/23 18:28:21: (chr3, chr3): Total 334652 candidates left after filtering
2023-05-18 18:32:30.953895: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 18:33:10.529717: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 18:33:10.541688: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 18:33:10.578712: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 18:33:10.590651: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 18:33:10.631463: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 18:33:10.685669: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 18:33:10.697107: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 18:33:10.960220: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 18:33:10.995985: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 18:33:11.124616: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
eaglec.scoreUtils INFO @ 05/18/23 19:11:34: (chr4, chr4): Total 172673 candidates left after filtering
eaglec.scoreUtils INFO @ 05/18/23 19:40:44: (chr5, chr5): Total 203266 candidates left after filtering
eaglec.scoreUtils INFO @ 05/18/23 20:00:29: (chr6, chr6): Total 369756 candidates left after filtering
2023-05-18 20:05:41.496671: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
eaglec.scoreUtils INFO @ 05/18/23 20:25:16: (chr7, chr7): Total 195819 candidates left after filtering
2023-05-18 20:28:10.832702: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 20:28:10.932666: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 20:28:10.962664: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 20:28:10.974770: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 20:28:10.994697: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 20:28:11.031622: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 20:28:11.069519: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 20:28:11.307348: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 20:28:11.365171: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 20:28:11.804032: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
2023-05-18 20:28:11.827570: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
eaglec.scoreUtils INFO @ 05/18/23 20:40:54: (chr8, chr8): Total 180975 candidates left after filtering
2023-05-18 20:42:53.817573: W tensorflow/core/kernels/data/model_dataset_op.cc:205] Optimization loop failed: Cancelled: Operation was cancelled
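In this log the run keeps advancing from chr1 to chr8 while the warnings appear. Because each chromosome's "candidates left after filtering" line is timestamped, the time between consecutive messages gives a rough per-chromosome runtime, which helps tell a slow run from a stalled one. A small sketch; the line format is taken from the excerpt above and may differ between EagleC versions, and the function names are hypothetical:

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Matches lines like:
# eaglec.scoreUtils INFO @ 05/18/23 16:39:38: (chr1, chr1): Total 303446 candidates ...
LINE = re.compile(
    r"@ (\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}): \((\w+), (\w+)\): Total (\d+) candidates"
)


def candidate_records(lines: Iterable[str]) -> List[Tuple[str, int, datetime]]:
    """Extract (chromosome pair, candidate count, timestamp) from log lines."""
    records = []
    for line in lines:
        m = LINE.search(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
            pair = f"{m.group(2)}-{m.group(3)}"
            records.append((pair, int(m.group(4)), stamp))
    return records


def minutes_between(records: List[Tuple[str, int, datetime]]) -> List[Tuple[str, str, float]]:
    """Wall-clock minutes between consecutive per-chromosome messages."""
    return [(prev[0], cur[0], (cur[2] - prev[2]).total_seconds() / 60)
            for prev, cur in zip(records, records[1:])]
```

Lines that do not match, such as the TensorFlow warnings, are simply skipped.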

Failed to run "download-pretrained-models"

_expected.pkl is not found

Hi Xiaotao,

Thanks for all your help! Sorry to bother you again, but I got another error saying that ".231043-HC01_5000.mcool.175822081.CNV.None.100000.None/chr4_expected.pkl" does not exist. However, when I list the directory .231043-HC01_5000.mcool.175822081.CNV.None.100000.None, I do see that 'chr4_expected.pkl' is there.
I'm not sure what the problem is.
Thanks.

FileNotFoundError: [Errno 2] No such file or directory: '.231043-HC01_5000.mcool.175822081.CNV.None.100000.None/chr4_expected.pkl'
Traceback (most recent call last):
  File "/rsrch4/home/genomic_med/bzhao2/miniconda3/envs/EagleC/bin/predictSV", line 176, in <module>
    run()
  File "/rsrch4/home/genomic_med/bzhao2/miniconda3/envs/EagleC/bin/predictSV", line 159, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/rsrch4/home/genomic_med/bzhao2/miniconda3/envs/EagleC/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'merge-multiple-resolutions --hic-10k ../CatalystHiCmcool/231045-HC01_5000.mcool::resolutions/10000 --hic-5k ../CatalystHiCmcool/231045-HC01_5000.mcool::resolutions/5000 --balance-type CNV -C "#" "X" --full-sv-files 231045-HC01_5000.CNN_SVs.5K.txt 231045-HC01_5000.CNN_SVs.10K_highres.txt 231045-HC01_5000.CNN_SVs.50K_highres.txt -O 231045-HC01_5000.CNN_SVs.5K_combined.txt --buff-size 50000 --output-format NeoLoopFinder --cache-10k .231043-HC01_5000.mcool.175822081.CNV.None.100000.None --cache-5k .235206-HC01_5000.mcool.131736076.CNV.None.100000.None' returned non-zero exit status 1.
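In the failing command, the input file is 231045-HC01_5000.mcool, but the folders passed to --cache-10k and --cache-5k begin with 231043-HC01_5000.mcool and 235206-HC01_5000.mcool. Judging from the cache paths printed in the logs on this page, a cache folder name appears to start with the name of the .mcool file it was built from, so a mismatch like this is worth checking before rerunning. A sketch under that assumption (the naming pattern is inferred from the logs, not documented, and the helper is hypothetical):

```python
import os


def cache_matches_cool(cache_folder: str, cool_uri: str) -> bool:
    """Check that a cache folder name starts with the name of its .mcool file.

    Assumed naming pattern (inferred from logs): [.]<mcool file name>.<number>.<balance>...
    """
    cool_path = cool_uri.split("::", 1)[0]  # drop the '::/resolutions/N' part
    cool_name = os.path.basename(cool_path)
    cache_name = os.path.basename(cache_folder.rstrip("/")).lstrip(".")
    return cache_name.startswith(cool_name + ".")
```

Applied to the command above, cache_matches_cool(".231043-HC01_5000.mcool.175822081.CNV.None.100000.None", "../CatalystHiCmcool/231045-HC01_5000.mcool::resolutions/10000") returns False.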

FileNotFoundError: [Errno 2] No such file or directory: 'Sample.CNN_SVs.10K.txt'

Hi,

I ran EagleC on most of my samples and they finished successfully, but for some samples I received the following error:

Traceback (most recent call last):
  File "/home/abhijit/.local/bin/predictSV-single-resolution", line 276, in <module>
    run()
  File "/home/abhijit/.local/bin/predictSV-single-resolution", line 227, in run
    intra_expected_count = intraPredict(clr, cnn_models, chroms, cache_folder, seq_depth,
  File "eaglec/scoreUtils.pyx", line 1237, in eaglec.scoreUtils.intraPredict
  File "/home/abhijit/.local/lib/python3.8/site-packages/eaglec/utilities.py", line 450, in load_SVs_full
    with open(fil, 'r') as source:
FileNotFoundError: [Errno 2] No such file or directory: 'Sample.CNN_SVs.10K.txt'
Traceback (most recent call last):
  File "/home/abhijit/.local/bin/predictSV", line 176, in <module>
    run()
  File "/home/abhijit/.local/bin/predictSV", line 130, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/home/abhijit/miniconda3/envs/py38/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'predictSV-single-resolution -H ./mcools_files/Sample_1Kb.mcool::/resolutions/5000 --balance-type Raw -O Sample.CNN_SVs.10K_highres.txt --genome hg38 --low-resolution-breaks Sample.CNN_SVs.10K.txt --region-size 25000 -C "#" "X" --output-format full --prob-cutoff 0 --logFile Sample.log --cache-folder Sample_1Kb.mcool.296756349.Raw.None.100000.None' returned non-zero exit status 1.

This is the EagleC command:

+ predictSV --hic-5k ./mcools_files/Sample_1Kb.mcool::/resolutions/5000 --hic-10k ./mcools_files/Sample_1Kb.mcool::/resolutions/10000 --hic-50k ./mcools_files/Sample_1Kb.mcool::/resolutions/50000 -O Sample -g hg38 --balance-type Raw --output-format NeoLoopFinder
root                      INFO    @ 04/24/23 11:44:50:
# ARGUMENT LIST:
# Cool URI at 5kb = ./mcools_files/Sample_1Kb.mcool::/resolutions/5000
# Cool URI at 10kb = ./mcools_files/Sample_1Kb.mcool::/resolutions/10000
# Cool URI at 50kb = ./mcools_files/Sample_1Kb.mcool::/resolutions/50000
# Balance Type = Raw
# Reference Genome = hg38
# Included Chromosomes = ['#', 'X']
# Probability Cutoff for 5kb SVs = 0.8
# Probability Cutoff for 10kb SVs = 0.8
# Probability Cutoff for 50kb SVs = 0.99999
# Output File Prefix = Sample
# Output Format = NeoLoopFinder
# Log file name = Sample.log

This is the mcool file conversion log:

+ hic2cool convert Sample_1Kb.hic Sample_1Kb.mcool
##########################
### hic2cool / convert ###
##########################
### Header info from hic
... Chromosomes:  ['ALL', 'M', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y']
... Resolutions:  [2500000, 1000000, 500000, 250000, 100000, 50000, 25000, 10000, 5000]
... Normalizations:  ['VC', 'VC_SQRT', 'KR']
... Genome:  hg38
### Converting
... Resolution 2500000 took: 3.0350983142852783 seconds.
... Resolution 1000000 took: 11.69859790802002 seconds.
... Resolution 500000 took: 42.468045711517334 seconds.
... Resolution 250000 took: 154.32829523086548 seconds.
... Resolution 100000 took: 385.3717827796936 seconds.
... Resolution 50000 took: 533.3046452999115 seconds.
... Resolution 25000 took: 654.9617240428925 seconds.
... Resolution 10000 took: 823.1433110237122 seconds.
... Resolution 5000 took: 960.6333713531494 seconds.
### Finished! Output written to: Sample_1Kb.mcool
... This file is higlass compatible.

It seems that the mcool file contains the 10 kb data, but EagleC is unable to use it.
Your help is much appreciated.
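To confirm which resolutions an .mcool file contains, its group paths (for example /resolutions/10000) can be compared against the three bin sizes predictSV is given. Note that the missing Sample.CNN_SVs.10K.txt is passed via --low-resolution-breaks in the failing command, so it seems to be an output of an earlier step rather than part of the .mcool. A minimal sketch; the helper names are hypothetical:

```python
import re
from typing import Iterable, List

RESOLUTION_GROUP = re.compile(r"^/?resolutions/(\d+)$")


def resolutions_in(groups: Iterable[str]) -> List[int]:
    """Bin sizes found in mcool group paths like '/resolutions/10000', sorted."""
    sizes = []
    for group in groups:
        m = RESOLUTION_GROUP.match(group)
        if m:
            sizes.append(int(m.group(1)))
    return sorted(sizes)


def missing_resolutions(groups: Iterable[str],
                        required=(5000, 10000, 50000)) -> List[int]:
    """Required bin sizes (bp) absent from the file's group list."""
    present = set(resolutions_in(groups))
    return [r for r in required if r not in present]
```

With cooler installed, the group list can come from cooler.fileops.list_coolers("Sample_1Kb.mcool"); for a file converted as in the hic2cool log above, missing_resolutions would be expected to return an empty list.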

unable to run the example on linux server (redhat)

Hi,
I followed the documentation to install EagleC, but I get an error that seems to be related to the NumPy version: AttributeError: module 'numpy.random' has no attribute 'Generator'

I set up a conda environment and used mamba for package installation (it is faster). Is a specific NumPy version required? TensorFlow 2.3.0 requires a specific one, which seems to differ from what some other packages need.

This is the only tool that is supposed to work on Capture Hi-C data, so I really need to test it.

thanks!
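numpy.random.Generator was introduced in NumPy 1.17, so this AttributeError usually means an older NumPy is the one actually being imported, even if a newer version was requested. Which exact NumPy version EagleC needs is not stated here; the sketch below only checks whether the active NumPy is new enough to provide Generator and prints where it was imported from:

```python
from typing import Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Parse '1.19.5' or '2.0.0rc1' into (1, 19, 5) / (2, 0, 0), ignoring suffixes."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def has_generator_api(numpy_version: str) -> bool:
    """numpy.random.Generator exists from NumPy 1.17.0 onward."""
    return version_tuple(numpy_version) >= (1, 17)


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed")
    else:
        # The file path reveals which installation is imported inside this environment.
        print(numpy.__version__, numpy.__file__, has_generator_api(numpy.__version__))
```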

[Issue] Error when constructing DIA matrix and running predictSV script

Bug Description:
I encountered an error while running the predictSV script. The error message indicates an issue with constructing a DIA matrix and a subsequent error in the script execution. Here are the details of the error:

warn("Constructing a DIA matrix with 6209 diagonals "
/home/fjx/ENTER/envs/EagleC/lib/python3.8/site-packages/scipy/sparse/_coo.py:428:
SparseEfficiencyWarning: Constructing a DIA matrix with %d diagonals is inefficient
warn("Constructing a DIA matrix with %d diagonals "
root INFO @ 08/12/23 20:26:35: Locate 10kb SV coordinates on the 5kb matrix ...
Traceback (most recent call last):
  File "/home/fjx/ENTER/envs/EagleC/bin/predictSV", line 176, in <module>
    run()
  File "/home/fjx/ENTER/envs/EagleC/bin/predictSV", line 130, in run
    subprocess.check_call(' '.join(command), shell=True)
TypeError: sequence item 18: expected str instance, NoneType found

Key Error: 'sweight' in the predictSV

Hi,

I followed the runHiC pipeline with chromap to generate the Hi-C contact map in mcool format. It looks good in HiGlass, but when I run EagleC, the script throws an error:

predictSV --hic-5k ./C001-MboI-R1-filtered.mcool::/resolutions/5000 --hic-10k ./C001-MboI-R1-filtered.mcool::/resolutions/10000 --hic-50k ./C001-MboI-R1-filtered.mcool::/resolutions/50000 -O C001 -g other --balance-type CNV --output-format full --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
root                      INFO    @ 06/21/22 13:54:52:
# ARGUMENT LIST:
# Cool URI at 5kb = ./C001-MboI-R1-filtered.mcool::/resolutions/5000
# Cool URI at 10kb = ./C001-MboI-R1-filtered.mcool::/resolutions/10000
# Cool URI at 50kb = ./C001-MboI-R1-filtered.mcool::/resolutions/50000
# Balance Type = CNV
# Reference Genome = other
# Included Chromosomes = ['#', 'X']
# Probability Cutoff for 5kb SVs = 0.8
# Probability Cutoff for 10kb SVs = 0.8
# Probability Cutoff for 50kb SVs = 0.99999
# Output File Prefix = C001
# Output Format = full
# Log file name = eaglec.log
root                      INFO    @ 06/21/22 13:54:52: Predict SVs at 5kb resolution ...
root                      INFO    @ 06/21/22 13:57:59: matched sequencing depth in human at 10Kb: 266781118.61029372
root                      INFO    @ 06/21/22 13:57:59: Load CNN models from /home/baozhigui/software/miniconda3/envs/scaffold/lib/python3.8/site-packages/eaglec/data/bulk/200M-300M ...
2022-06-21 13:57:59.504833: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-21 13:57:59.962777: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2100000000 Hz
2022-06-21 13:58:00.066879: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x562f27043c80 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-06-21 13:58:00.073798: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
root                      INFO    @ 06/21/22 13:58:09: Done
root                      INFO    @ 06/21/22 13:58:09: Interemediate results at the 5kb resolution will be cached to .C001-MboI-R1-filtered.mcool.78688160.CNV.None.100000.None
Traceback (most recent call last):
  File "/home/baozhigui/software/miniconda3/envs/scaffold/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'sweight'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/baozhigui/software/miniconda3/envs/scaffold/bin/predictSV-single-resolution", line 276, in <module>
    run()
  File "/home/baozhigui/software/miniconda3/envs/scaffold/bin/predictSV-single-resolution", line 227, in run
    intra_expected_count = intraPredict(clr, cnn_models, chroms, cache_folder, seq_depth,
  File "eaglec/scoreUtils.pyx", line 1263, in eaglec.scoreUtils.intraPredict
  File "eaglec/scoreUtils.pyx", line 861, in eaglec.scoreUtils._intra_global_core
  File "/home/baozhigui/software/miniconda3/envs/scaffold/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/baozhigui/software/miniconda3/envs/scaffold/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'sweight'
Traceback (most recent call last):
  File "/home/baozhigui/software/miniconda3/envs/scaffold/bin/predictSV", line 176, in <module>
    run()
  File "/home/baozhigui/software/miniconda3/envs/scaffold/bin/predictSV", line 112, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/home/baozhigui/software/miniconda3/envs/scaffold/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'predictSV-single-resolution -H ./C001-MboI-R1-filtered.mcool::/resolutions/5000 --balance-type CNV -O C001.CNN_SVs.5K.txt --genome other --output-format full -C "#" "X" --prob-cutoff 0.8 --logFile eaglec.log' returned non-zero exit status 1.
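The KeyError is raised by pandas inside _intra_global_core when a 'sweight' column is missing, which suggests that with --balance-type CNV EagleC reads CNV-normalized weights from a bins-table column named sweight. Assuming that, the bins table at each resolution can be checked before running. In the sketch below, the mapping from balance type to column name is an assumption drawn from this traceback and from cooler's default weight column, not from EagleC documentation:

```python
from typing import Iterable, List

# Assumed mapping from --balance-type to the bin-table column EagleC reads.
# 'sweight' is inferred from the KeyError above; 'weight' is the column
# written by `cooler balance`. 'Raw' needs no weight column.
REQUIRED_COLUMNS = {"CNV": ["sweight"], "ICE": ["weight"], "Raw": []}


def missing_weight_columns(columns: Iterable[str], balance_type: str) -> List[str]:
    """Weight columns the chosen balance type needs but the bins table lacks."""
    present = set(columns)
    return [c for c in REQUIRED_COLUMNS.get(balance_type, []) if c not in present]
```

With cooler installed, the column names can be read with list(cooler.Cooler("C001-MboI-R1-filtered.mcool::/resolutions/5000").bins()[:1].columns) and passed in together with the balance type.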

TypeError: sequence item 18: expected str instance, NoneType found

Hi Xiaotao,

I'm trying to evaluate the accuracy of scaffolding with EagleC, but I got an error:

Traceback (most recent call last):
  File "/work/home/zengxiaofei/miniforge3/envs/EagleC/bin/predictSV", line 176, in <module>
    run()
  File "/work/home/zengxiaofei/miniforge3/envs/EagleC/bin/predictSV", line 130, in run
    subprocess.check_call(' '.join(command), shell=True)
TypeError: sequence item 18: expected str instance, NoneType found

Below are my commands:

hic2cool convert yahs.hic yahs.mcool -p 28

cooler balance yahs.mcool::/resolutions/50000
cooler balance yahs.mcool::/resolutions/10000
cooler balance yahs.mcool::/resolutions/5000

source /work/home/zengxiaofei/miniforge3/bin/activate EagleC

predictSV --hic-5k yahs.mcool::/resolutions/50000 \
          --hic-10k yahs.mcool::/resolutions/10000 \
          --hic-50k yahs.mcool::/resolutions/5000 \
          -O yahs_sv -g other --balance-type ICE --output-format full \
          -C 'scaffold_1' 'scaffold_2' 'scaffold_3' 'scaffold_4' 'scaffold_5' \
          --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999

conda deactivate
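In the predictSV call above, --hic-5k points to yahs.mcool::/resolutions/50000 and --hic-50k points to yahs.mcool::/resolutions/5000, so the 5 kb and 50 kb inputs are swapped relative to the flag names. A small check that the bin size in each URI matches its flag can catch this kind of mix-up; the flag-to-resolution table is read off the option names, and the helper names are hypothetical:

```python
from typing import Dict, List, Optional

# Bin size (bp) implied by each predictSV input flag.
EXPECTED_BIN_SIZE = {"--hic-5k": 5000, "--hic-10k": 10000, "--hic-50k": 50000}


def uri_bin_size(uri: str) -> Optional[int]:
    """Bin size from a URI such as 'yahs.mcool::/resolutions/5000', or None."""
    _, _, group = uri.partition("::")
    tail = group.rstrip("/").rsplit("/", 1)[-1]
    return int(tail) if tail.isdigit() else None


def flag_mismatches(flag_to_uri: Dict[str, str]) -> List[str]:
    """Describe every flag whose URI resolution differs from the expected bin size."""
    problems = []
    for flag, uri in flag_to_uri.items():
        expected = EXPECTED_BIN_SIZE.get(flag)
        actual = uri_bin_size(uri)
        if expected is not None and actual != expected:
            problems.append(f"{flag}: expected {expected} bp, URI has {actual}")
    return problems
```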

I checked the source code of predictSV and found that this could be because the cache directories were not generated correctly and their names were not written to the log file.

yahs_sv.log
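The TypeError itself comes from ' '.join(command) in predictSV: str.join accepts only strings, so element 18 of the command list is None, meaning some value predictSV expected to fill in was never set (possibly a cache folder name, as suspected above). The sketch below reproduces that failure mode and shows a defensive join that reports which positions are None; the function names are hypothetical and not part of EagleC.

```python
from typing import List, Optional, Sequence


def none_positions(command: Sequence[Optional[str]]) -> List[int]:
    """Indices of items that are None and would break ' '.join()."""
    return [i for i, item in enumerate(command) if item is None]


def join_command(command: Sequence[Optional[str]]) -> str:
    """Join a command list, failing with a descriptive error instead of a bare TypeError."""
    bad = none_positions(command)
    if bad:
        raise ValueError(f"command items {bad} are None: {list(command)!r}")
    return " ".join(command)


if __name__ == "__main__":
    cmd = ["merge-multiple-resolutions", "--cache-10k", None]
    try:
        " ".join(cmd)
    except TypeError as err:
        print(err)  # sequence item 2: expected str instance, NoneType found
    print(none_positions(cmd))  # [2]
```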

Best regards,
Xiaofei

OSError: Unable to open file and subprocess.CalledProcessError:

Hi, I got the OSError and CalledProcessError.

The file in bulk/ is 50M-100M.zip, but this file cannot be unzipped.

I first downloaded SKNAS-MboI-allReps-filtered.mcool and then submitted a job to the cluster that runs only predictSV:
predictSV --hic-5k SKNAS-MboI-allReps-filtered.mcool::/resolutions/5000 \
          --hic-10k SKNAS-MboI-allReps-filtered.mcool::/resolutions/10000 \
          --hic-50k SKNAS-MboI-allReps-filtered.mcool::/resolutions/50000 \
          -O SK-N-AS -g hg38 --balance-type CNV --output-format full \
          --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999

It looks like predictSV-single-resolution was also called.
Could you help to figure out the errors? Thanks in advance!

OSError: Unable to open file (unable to open file: name = '/rsrch4/home/genomic_med/bzhao2/.conda/envs/EagleC/lib/python3.8/site-packages/eaglec/data/bulk/50M-100M/CNN-weights.0.1.0.4.0.6.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
Traceback (most recent call last):
  File "/home/bzhao2/.conda/envs/EagleC/bin/predictSV", line 176, in <module>
    run()
  File "/home/bzhao2/.conda/envs/EagleC/bin/predictSV", line 112, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/rsrch4/home/genomic_med/bzhao2/.conda/envs/EagleC/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'predictSV-single-resolution -H SKNAS-MboI-allReps-filtered.mcool::/resolutions/5000 --balance-type CNV -O SK-N-AS.CNN_SVs.5K.txt --genome hg38 --output-format full -C "#" "X" --prob-cutoff 0.8 --logFile eaglec.log' returned non-zero exit status 1.
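The `OSError` says the CNN weight file under `eaglec/data/bulk/50M-100M/` does not exist, which is consistent with an archive that could not be unzipped. A stdlib sketch for checking a downloaded model archive before installing it; `check_model_archive` is a hypothetical helper, and looking for `.h5` members is an assumption based on the `CNN-weights.*.h5` file name in the traceback:

```python
import zipfile


def check_model_archive(path):
    """Return the .h5 members of a model archive, or raise ValueError.

    Hypothetical check: a truncated download (or an HTML page saved under a
    .zip name) fails zipfile.is_zipfile(); a damaged member fails testzip().
    """
    if not zipfile.is_zipfile(path):
        raise ValueError(f'{path} is not a zip archive (incomplete download?)')
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # name of the first member with a bad CRC, or None
        if bad is not None:
            raise ValueError(f'corrupt member in {path}: {bad}')
        weights = [name for name in zf.namelist() if name.endswith('.h5')]
    if not weights:
        raise ValueError(f'no .h5 weight files found in {path}')
    return weights
```

For example, `check_model_archive('50M-100M.zip')` either lists the weight files or explains why the archive is unusable, in which case re-downloading it is the likely next step.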

Changing the --cnv-max-value parameter for plot-intraSVs does not work

Hello,

the --cnv-max-value parameter for plot-intraSVs does not seem to work: only the y-axis labels change, not the values on the plot. Pictures below:

or

Best wishes,
Kuba

No module named 'eaglec.scoreUtils'

Dear author,

I am trying to run EagleC on tumor data. The .hic file was generated by Juicer and converted with hic2cool to .cool files at 5k, 10k and 50k resolutions.

The error is listed below:

root                      INFO    @ 09/06/23 19:52:32: Predict SVs at 5kb resolution ...
Traceback (most recent call last):
  File "/data/home/jdlin/miniconda3/envs/3dgenome/bin/predictSV-single-resolution", line 276, in <module>
    run()
  File "/data/home/jdlin/miniconda3/envs/3dgenome/bin/predictSV-single-resolution", line 110, in run
    from eaglec.scoreUtils import intraPredict, interPredict
ModuleNotFoundError: No module named 'eaglec.scoreUtils'
Traceback (most recent call last):
  File "/data/home/jdlin/miniconda3/envs/3dgenome/bin/predictSV", line 176, in <module>
    run()
  File "/data/home/jdlin/miniconda3/envs/3dgenome/bin/predictSV", line 112, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/data/home/jdlin/miniconda3/envs/3dgenome/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'predictSV-single-resolution -H /data/home/jdlin/Prostate/A/HiC/inter_5k.cool --balance-type ICE -O /data/home/jdlin/Prostate/A/HiC/eagleC/A.eagleC.CNN_SVs.5K.txt --genome other --output-format full -C "#" "X" --prob-cutoff 0.8 --logFile /data/home/jdlin/Prostate/A/HiC/eagleC/A.eagleC.log' returned non-zero exit status 1.

Looking forward to your reply! Thanks!
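`eaglec.scoreUtils` is a compiled extension (an earlier issue mentions `scoreUtils.cpython-38-x86_64-linux-gnu.so`), while this traceback runs under `python3.7`. Python only tries `module_name + suffix` for the suffixes in `importlib.machinery.EXTENSION_SUFFIXES`, so a binary built for a different interpreter is simply not found. A hedged sketch for checking this; `importable_extension` and `report` are hypothetical helpers:

```python
import importlib.machinery
import os
import sys


def importable_extension(filename, module_name, suffixes=None):
    """Return True if `filename` is a compiled module this interpreter would
    load for `module_name` (the import system only tries module_name + suffix)."""
    if suffixes is None:
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return any(filename == module_name + s for s in suffixes)


def report(package_dir, module_name='scoreUtils'):
    """Print each compiled candidate for `module_name` found in package_dir."""
    print('Running Python', sys.version.split()[0])
    for name in sorted(os.listdir(package_dir)):
        if name.startswith(module_name + '.') and name.endswith(('.so', '.pyd')):
            ok = importable_extension(name, module_name)
            print(f'{name}: {"importable" if ok else "built for another interpreter"}')
```

Call `report()` with the `eaglec` directory inside the environment's `site-packages`; if the only candidate is reported as built for another interpreter, the environment's Python version is the first thing to check.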

inquiries on output files

Hi,

First of all, thank you for the great tool. I am wondering whether there is a way to convert the output files to VCF format, because I want to compare the results of other SV callers with EagleC's. What would be the best option for this? I was thinking about using SURVIVOR after converting them to VCF.
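A minimal sketch of turning one SV call into a VCF 4.2 data line. The input layout (two breakpoints plus an SV type among deletion, duplication, inversion and translocation) is an assumption, not EagleC's documented output format, and positions are written unchanged, so check whether a 0/1-based offset is needed. Intrachromosomal deletions, duplications and inversions use symbolic alleles; everything else uses breakend notation whose `[p[` join orientation is only a placeholder. Because EagleC breakpoints are bin coordinates, comparisons with other callers need a positional tolerance at least as large as the resolution.

```python
VCF_HEADER = '\n'.join([
    '##fileformat=VCFv4.2',
    '##ALT=<ID=DEL,Description="Deletion">',
    '##ALT=<ID=DUP,Description="Duplication">',
    '##ALT=<ID=INV,Description="Inversion">',
    '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">',
    '##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
])


def to_vcf_record(sv_id, chrom1, pos1, chrom2, pos2, svtype):
    """Return one tab-separated VCF data line for an SV call.

    Hypothetical conversion: REF is the placeholder base 'N', and for
    intrachromosomal calls pos1 < pos2 is assumed.
    """
    symbolic = {'deletion': 'DEL', 'duplication': 'DUP', 'inversion': 'INV'}
    if svtype in symbolic and chrom1 == chrom2:
        tag = symbolic[svtype]
        alt = f'<{tag}>'
        info = f'SVTYPE={tag};END={pos2}'
    else:
        # Breakend notation; the '[p[' orientation is a placeholder and should
        # be replaced with the strand information reported by the caller.
        alt = f'N[{chrom2}:{pos2}['
        info = 'SVTYPE=BND'
    return '\t'.join([chrom1, str(pos1), sv_id, 'N', alt, '.', 'PASS', info])
```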

Second, when I plot with the command plot-intraSVs, I get:
Traceback (most recent call last):
  File "/eaglecnglatest/bin/plot-intraSVs", line 75, in <module>
    run()
  File "/eaglecnglatest/bin/plot-intraSVs", line 57, in run
    chrom, interval = args.region.split(':')
AttributeError: 'NoneType' object has no attribute 'split'

and with plot-interSVs:
Traceback (most recent call last):
  File "/eaglecnglatest/bin/plot-interSVs", line 62, in <module>
    run()
  File "/eaglecnglatest/bin/plot-interSVs", line 54, in run
    vis = interChrom(args.cool_uri, args.chroms, correct=correct)
  File "eaglec/visualize.pyx", line 212, in eaglec.visualize.interChrom.__init__
  File "/eaglecnglatest/lib/python3.8/site-packages/cooler/api.py", line 75, in __init__
    self.filename = store.file.filename
AttributeError: 'NoneType' object has no attribute 'file'
Could you give me suggestions?
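In the first traceback `args.region` is `None` when `plot-intraSVs` reaches `args.region.split(':')`, which suggests the region argument was not supplied; in the second, `cooler` receives `None` as its store, which likewise suggests `args.cool_uri` was empty. The code splits on `:` and then parses an interval, so a value like `chrom:start-end` appears to be expected; that format and the `parse_region` helper below are assumptions used for illustration:

```python
def parse_region(region):
    """Parse 'chrom:start-end' (commas allowed) into (chrom, start, end).

    Hypothetical helper: fails with a clear message when the region is
    missing, instead of AttributeError on None.split.
    """
    if region is None:
        raise ValueError('no region given; expected chrom:start-end')
    chrom, interval = region.split(':')
    start, end = (int(x.replace(',', '')) for x in interval.split('-'))
    if start >= end:
        raise ValueError(f'start must be smaller than end in {region!r}')
    return chrom, start, end
```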

Third, I used Raw (no normalization). Would that be fine?
Next, even though the tool detects only four SV types well, can I get information about insertions?

Also, when I compare SVs from EagleC with those from other tools, would it be fine to combine the SVs from all resolutions (1K, 5K, 10K and 50K) and compare them with the calls from the other tools?

And after running EagleC, I found multiple files and folders. What is the difference between highres.txt and .txt? Also, at the 5K resolution there was a combined.txt instead of a highres.txt, and there were multiple folders (the files in them look like intermediate files for the final results). Could you give me some information about them, please?

Inquiry for KeyError:'sweight'

Hello, I am Sanghyun, and I am trying to use EagleC.

I am just an end user of bioinformatics tools.

When using predictSV, I encountered the error described below.

I'm asking here because I couldn't find a similar case among the other issues.

I tried cooler balance, but it did not help.

I would be very grateful if you could tell me how to approach resolving this issue.

(EagleC) sanghyun@ubuntu:/data4/sanghyun/micro-c/chuna$ predictSV --hic-5k C.mcool::/resolutions/5000 --hic-10k C.mcool::/resolutions/10000 --hic-50k C.mcool::/resolutions/50000 -O C.eagle -g other --balance-type CNV --output-format full --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
root INFO @ 07/21/23 09:44:38:

# ARGUMENT LIST:
# Cool URI at 5kb = C.mcool::/resolutions/5000
# Cool URI at 10kb = C.mcool::/resolutions/10000
# Cool URI at 50kb = C.mcool::/resolutions/50000
# Balance Type = CNV
# Reference Genome = other
# Included Chromosomes = ['#', 'X']
# Probability Cutoff for 5kb SVs = 0.8
# Probability Cutoff for 10kb SVs = 0.8
# Probability Cutoff for 50kb SVs = 0.99999
# Output File Prefix = C.eagle
# Output Format = full
# Log file name = C.eagle.log

root INFO @ 07/21/23 09:44:38: Predict SVs at 5kb resolution ...
numexpr.utils INFO @ 07/21/23 09:44:41: Note: detected 256 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
numexpr.utils INFO @ 07/21/23 09:44:41: Note: NumExpr detected 256 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
numexpr.utils INFO @ 07/21/23 09:44:41: NumExpr defaulting to 8 threads.
root INFO @ 07/21/23 09:44:52: matched sequencing depth in human at 10Kb: 260375696.9216156
root INFO @ 07/21/23 09:44:52: Load CNN models from /data2/sanghyun/miniconda3/envs/EagleC/lib/python3.8/site-packages/eaglec/data/bulk/200M-300M ...
2023-07-21 09:44:52.291873: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2023-07-21 09:44:52.308725: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-21 09:44:52.342923: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
root INFO @ 07/21/23 09:44:56: Done
root INFO @ 07/21/23 09:44:56: Interemediate results at the 5kb resolution will be cached to .C.mcool.218585564.CNV.None.100000.None
eaglec.scoreUtils INFO @ 07/21/23 09:44:56: (1, 1): someone else is working on it, skip
eaglec.scoreUtils INFO @ 07/21/23 09:44:56: (10, 10): someone else is working on it, skip
eaglec.scoreUtils INFO @ 07/21/23 09:44:56: (11, 11): someone else is working on it, skip
eaglec.scoreUtils INFO @ 07/21/23 09:44:56: (12, 12): someone else is working on it, skip
Traceback (most recent call last):
  File "/data2/sanghyun/miniconda3/envs/EagleC/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'sweight'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data2/sanghyun/miniconda3/envs/EagleC/bin/predictSV-single-resolution", line 276, in <module>
    run()
  File "/data2/sanghyun/miniconda3/envs/EagleC/bin/predictSV-single-resolution", line 227, in run
    intra_expected_count = intraPredict(clr, cnn_models, chroms, cache_folder, seq_depth,
  File "eaglec/scoreUtils.pyx", line 1263, in eaglec.scoreUtils.intraPredict
  File "eaglec/scoreUtils.pyx", line 861, in eaglec.scoreUtils._intra_global_core
  File "/data2/sanghyun/miniconda3/envs/EagleC/lib/python3.8/site-packages/pandas/core/frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/data2/sanghyun/miniconda3/envs/EagleC/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'sweight'
Traceback (most recent call last):
  File "/data2/sanghyun/miniconda3/envs/EagleC/bin/predictSV", line 176, in <module>
    run()
  File "/data2/sanghyun/miniconda3/envs/EagleC/bin/predictSV", line 112, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/data2/sanghyun/miniconda3/envs/EagleC/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'predictSV-single-resolution -H C.mcool::/resolutions/5000 --balance-type CNV -O C.eagle.CNN_SVs.5K.txt --genome other --output-format full -C "#" "X" --prob-cutoff 0.8 --logFile C.eagle.log' returned non-zero exit status 1.
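`KeyError: 'sweight'` is raised by pandas when the bins table read from the cool file has no column with that name. The reports on this page suggest that `--balance-type CNV` reads a `sweight` column (where NeoLoopFinder's `correct-cnv` stores its CNV-corrected bias) and `--balance-type ICE` a `weight` column (the default name written by `cooler balance`). A sketch for checking which balancing columns a cool file actually has before running `predictSV`; the `REQUIRED_COLUMN` mapping is inferred from these error messages, and both helpers are hypothetical:

```python
# Balance type -> bins-table column it is assumed to need (inferred from the
# KeyError messages in these reports; Raw is assumed to need none).
REQUIRED_COLUMN = {'ICE': 'weight', 'CNV': 'sweight', 'Raw': None}


def missing_balance_column(columns, balance_type):
    """Return the missing column name, or None if the table has what is needed."""
    needed = REQUIRED_COLUMN[balance_type]
    if needed is None or needed in columns:
        return None
    return needed


def check_cool(uri, balance_type):
    """Inspect a cool URI such as 'C.mcool::/resolutions/5000' (requires cooler)."""
    import cooler  # imported lazily so the pure helper above works without it
    clr = cooler.Cooler(uri)
    columns = list(clr.bins()[:1].columns)  # one row is enough to see the header
    return missing_balance_column(columns, balance_type)
```

Running `check_cool` for each resolution in the .mcool shows whether the column is missing at all resolutions or only some of them.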

Failed to download pre-trained models

Dear author, your software is great! But now I'm in trouble: the download address https://www.dropbox.com/s/zcir6ivvwe928yv/5M-10M.zip?dl=0 cannot be accessed with wget. Could you send me the 5M-10M.zip dataset?
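Dropbox share links ending in `?dl=0` typically open a preview page, so `wget` on such a URL tends to save HTML rather than the archive; setting the parameter to `dl=1` requests the file itself. A stdlib sketch of that rewrite (`direct_download_url` is a hypothetical helper):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def direct_download_url(url):
    """Return the share URL with dl=1 so the server sends the file itself."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query['dl'] = '1'
    return urlunsplit(parts._replace(query=urlencode(query)))


print(direct_download_url(
    'https://www.dropbox.com/s/zcir6ivvwe928yv/5M-10M.zip?dl=0'))
# https://www.dropbox.com/s/zcir6ivvwe928yv/5M-10M.zip?dl=1
```

The rewritten URL can then be passed to `wget -O 5M-10M.zip "<url>"`. If the link itself is no longer shared, this will not help.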

How to identify SVs between the genomes of two species

Hello, Professor,
I have genomes from two species (A and B) and a set of Hi-C reads for each genome. I want to identify SVs between the two genomes using the Hi-C reads. I am unsure whether I should align the genome B Hi-C reads to the genome A reference to generate the input cool file for EagleC.
Could you give me any suggestions?
Looking forward to your reply!

Hi-C 3.0 with DdeI+DpnII

Xiaotao,
This looks like an amazing tool, and I would love to give it a shot on our newer Hi-C protocol that uses DdeI and DpnII.
I know your code can work with Arima, so is there something you or I can do to make this work for Hi-C with DdeI and DpnII dual digestion?

Johan

Application to dm6

I am studying possible SVs in dm6 and I managed to do the entire analysis with default parameters at 5, 10, and 100kb.
I was wondering if it is possible to polish my analysis by doing the following:

1 - I would like to filter out from the analysis all the peri-centromeric regions and I have them annotated. Is there a way to provide a list of regions that the software shouldn't consider? The other solution would be to artificially remove all the contacts in these regions from the input .cool file. Would this work?

2 - I used the default probability cutoffs at 5, 10, and 50kb. Is there a way to optimize them, or should I do some kind of manual search for the best parameters?

3 - Is there any a priori reason to give ICE or RAW maps as input, or should the result not depend on this choice?
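For point 1, finding the bins that overlap the annotated peri-centromeric regions (for example, to mask them before balancing) is a plain interval-overlap computation. Whether EagleC then ignores such masked bins is an assumption to verify; `bins_to_mask` is a hypothetical helper, and regions are treated as half-open `[start, end)` coordinates:

```python
def bins_to_mask(chrom_length, resolution, regions):
    """Return sorted indices of fixed-size bins overlapping any [start, end) region."""
    n_bins = -(-chrom_length // resolution)  # ceiling division
    masked = set()
    for start, end in regions:
        first = max(start // resolution, 0)
        last = min((end - 1) // resolution, n_bins - 1)  # bin holding end - 1
        masked.update(range(first, last + 1))  # empty if first > last
    return sorted(masked)
```

For a 100 kb chromosome at 10 kb resolution, regions `[15000, 30000)` and `[95000, 120000)` give bins 1, 2 and 9.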

Thanks for your time,
Marco

an error with 'weight'

Hi @XiaoTaoWang,
Thank you for this tool for identifying SVs from Hi-C data. I processed rat Hi-C data with HiC-Pro and obtained cool files at three resolutions (5K, 10K and 50K). The command I used is below.

predictSV --hic-5k $INPUT_LOCATION/5000/69D_hicpro_0insert600_resolution_5000 \
--hic-10k $INPUT_LOCATION/10000/69D_hicpro_0insert600_resolution_10000 \
--hic-50k $INPUT_LOCATION/50000/69D_hicpro_0insert600_resolution_50000 \
-O 69D_eagleC_hicpro \
-g other \
--balance-type ICE \
--output-format full \
--prob-cutoff-5k 0.8 \
--prob-cutoff-10k 0.8 \
--prob-cutoff-50k 0.99999

When I used "CNV" for the --balance-type option, I got an error related to sweight. Based on a previous issue, I changed the value from "CNV" to "ICE", and then got an error about weight. The error log is below; could you take a look at it, please? Thank you.

Traceback (most recent call last):
  File "/eaglec/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'weight'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/eaglec/bin/predictSV-single-resolution", line 276, in <module>
    run()
  File "/eaglec/bin/predictSV-single-resolution", line 227, in run
    intra_expected_count = intraPredict(clr, cnn_models, chroms, cache_folder, seq_depth,
  File "eaglec/scoreUtils.pyx", line 1263, in eaglec.scoreUtils.intraPredict
  File "eaglec/scoreUtils.pyx", line 861, in eaglec.scoreUtils._intra_global_core
  File "/eaglec/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/eaglec/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'weight'
Traceback (most recent call last):
  File "/eaglec/bin/predictSV", line 176, in <module>
    run()
  File "/eaglec/bin/predictSV", line 112, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/eaglec/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'predictSV-single-resolution -H /resolution_5000 --balance-type ICE -O eagleC_hicpro.CNN_SVs.5K.txt --genome other --output-format full -C "#" "X"

how to generate the file for '--cnv-file' in the plot-intraSVs function?

Hi, I want to use the 'plot-intraSVs' function, but how can I generate such a 'MCF7_merged.dedup.bam_ratio.bw' file for another dataset? I tried using the neoloopfinder 'calculate-cnv' to generate the '.CNV-profile.bedGraph' file, and then used 'bedGraphToBigWig' to convert it into a '.bw' file. However, in the resulting PNG, the CNV track is blank. How should I solve this problem?
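One possible cause of an empty CNV track, offered as a hypothesis to check rather than a diagnosis, is a mismatch between the chromosome names in the bigWig (for example `chr1`) and those in the cool file (for example `1`). A small sketch comparing the two name sets; `compare_chrom_names` is hypothetical, and the name lists would come from your own files:

```python
def compare_chrom_names(track_chroms, cool_chroms):
    """Report chromosome names shared by both files, or a 'chr'-prefix fix."""
    track, cool = set(track_chroms), set(cool_chroms)
    shared = track & cool
    if shared:
        return {'shared': sorted(shared), 'suggestion': None}
    with_prefix = {'chr' + c for c in track}
    without_prefix = {c[3:] if c.startswith('chr') else c for c in track}
    if with_prefix & cool:
        return {'shared': [], 'suggestion': "add 'chr' prefix to track names"}
    if without_prefix & cool:
        return {'shared': [], 'suggestion': "remove 'chr' prefix from track names"}
    return {'shared': [], 'suggestion': 'no simple prefix fix found'}
```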

predictSV raises an error

When I run predictSV, I get an error.
My code: predictSV --hic-5k M1.mcool::/resolutions/5000 --hic-10k M1.mcool::/resolutions/10000 --hic-50k M1.mcool::/resolutions/50000 -O SK-N-AS -g hg38 --balance-type ICE --output-format NeoLoopFinder --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
My error:
  File "/home/marh/miniconda3/envs/EagleC/bin/predictSV-single-resolution", line 276, in <module>
    run()
  File "/home/marh/miniconda3/envs/EagleC/bin/predictSV-single-resolution", line 227, in run
    intra_expected_count = intraPredict(clr, cnn_models, chroms, cache_folder, seq_depth,
  File "eaglec/scoreUtils.pyx", line 1263, in eaglec.scoreUtils.intraPredict
  File "eaglec/scoreUtils.pyx", line 1022, in eaglec.scoreUtils._intra_global_core
  File "/home/marh/miniconda3/envs/EagleC/lib/python3.8/site-packages/eaglec/utilities.py", line 380, in check_gaps_and_decays
    gaps = load_gap(clr, ref_genome=ref, balance=balance)
  File "/home/marh/miniconda3/envs/EagleC/lib/python3.8/site-packages/eaglec/utilities.py", line 326, in load_gap
    if ref_gaps[chromlabel][ref_i]:
IndexError: index 24896 is out of bounds for axis 0 with size 24896
Traceback (most recent call last):
  File "/home/marh/miniconda3/envs/EagleC/bin/predictSV", line 176, in <module>
    run()
  File "/home/marh/miniconda3/envs/EagleC/bin/predictSV", line 112, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/home/marh/miniconda3/envs/EagleC/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'predictSV-single-resolution -H M1.mcool::/resolutions/5000 --balance-type ICE -O SK-N-AS.CNN_SVs.5K.txt --genome hg38 --output-format full -C "#" "X" --prob-cutoff 0.8 --logFile SK-N-AS.log' returned non-zero exit status 1.
How can I solve it? Thank you.
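The `IndexError` is raised while indexing `ref_gaps` for the genome given with `-g`, and the index equals the array size. The size 24896 happens to be the number of 10 kb bins in hg38 chr1 (248,956,422 bp), whereas hg19 chr1 (249,250,621 bp) needs 24,926. Data with more chr1 bins than the hg38 table would therefore be consistent with a matrix built against a different assembly than the one passed with `-g`, though this is only one possible explanation. A sketch of the comparison; the chr1 lengths are the standard UCSC values, and `n_bins` and `assembly_mismatch` are hypothetical helpers:

```python
# chr1 lengths (bp) in two common human assemblies (UCSC).
CHR1_LENGTH = {'hg19': 249_250_621, 'hg38': 248_956_422}


def n_bins(length, resolution):
    """Number of fixed-size bins needed to cover a chromosome (ceiling division)."""
    return -(-length // resolution)


def assembly_mismatch(observed_length, genome, chrom_lengths=CHR1_LENGTH):
    """True if the chr1 length found in the data differs from the reference."""
    return observed_length != chrom_lengths[genome]


for genome, length in CHR1_LENGTH.items():
    print(genome, n_bins(length, 10_000))
# hg19 24926
# hg38 24896
```

With cooler installed, `cooler.Cooler('M1.mcool::/resolutions/5000').chromsizes['chr1']` gives the observed length, assuming the chromosome is named `chr1` in the file.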

OSError file signature not found

Hi,

I'm trying to run EagleC for a non-model organism but I'm getting this error:

predictSV --hic-5k $PAIRS5 --hic-10k $PAIRS10 --hic-50k $PAIRS50 -O uraCya_self -g other --balance-type ICE --output-format full --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999 --logFile uraCya_self_eaglec.log

root                      INFO    @ 09/21/22 15:45:31: 
# ARGUMENT LIST:
# Cool URI at 5kb = uraCya_HiC_matrix_5000_balanced.cool
# Cool URI at 10kb = uraCya_HiC_matrix_10000_balanced.cool
# Cool URI at 50kb = uraCya_HiC_matrix_50000_balanced.cool
# Balance Type = ICE
# Reference Genome = other
# Included Chromosomes = ['#', 'X']
# Probability Cutoff for 5kb SVs = 0.8
# Probability Cutoff for 10kb SVs = 0.8
# Probability Cutoff for 50kb SVs = 0.99999
# Output File Prefix = uraCya_self
# Output Format = full
# Log file name = uraCya_self_eaglec.log
root                      INFO    @ 09/21/22 15:45:31: Predict SVs at 5kb resolution ...
Traceback (most recent call last):
  File "/home/vpeona/.conda/envs/EagleC/bin/predictSV-single-resolution", line 276, in <module>
    run()
  File "/home/vpeona/.conda/envs/EagleC/bin/predictSV-single-resolution", line 116, in run
    clr = cooler.Cooler(args.hic)
  File "/home/vpeona/.conda/envs/EagleC/lib/python3.8/site-packages/cooler/api.py", line 80, in __init__
    self._refresh()
  File "/home/vpeona/.conda/envs/EagleC/lib/python3.8/site-packages/cooler/api.py", line 84, in _refresh
    with open_hdf5(self.store, **self.open_kws) as h5:
  File "/home/vpeona/.conda/envs/EagleC/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/vpeona/.conda/envs/EagleC/lib/python3.8/site-packages/cooler/util.py", line 576, in open_hdf5
    fh = h5py.File(fp, mode, *args, **kwargs)
  File "/home/vpeona/.conda/envs/EagleC/lib/python3.8/site-packages/h5py/_hl/files.py", line 406, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/vpeona/.conda/envs/EagleC/lib/python3.8/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
Traceback (most recent call last):
  File "/home/vpeona/.conda/envs/EagleC/bin/predictSV", line 176, in <module>
    run()
  File "/home/vpeona/.conda/envs/EagleC/bin/predictSV", line 112, in run
    subprocess.check_call(' '.join(command), shell=True)
  File "/home/vpeona/.conda/envs/EagleC/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'predictSV-single-resolution -H uraCya_HiC_matrix_5000_balanced.cool --balance-type ICE -O uraCya_self.CNN_SVs.5K.txt --genome other --output-format full -C "#" "X" --prob-cutoff 0.8 --logFile uraCya_self_eaglec.log' returned non-zero exit status 1.

Can you help me to understand how to fix it?

Thank you!
Valentina
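`OSError: Unable to open file (file signature not found)` comes from HDF5 when the file passed with `-H` does not carry the HDF5 format signature, i.e. it is not readable as a cool/HDF5 file (for example a text file or an incomplete copy). The signature is the 8 bytes `\x89HDF\r\n\x1a\n`, located at offset 0 or, when a user block is present, at 512, 1024, 2048 and so on. A stdlib sketch of that check (`looks_like_hdf5` is a hypothetical helper):

```python
HDF5_SIGNATURE = b'\x89HDF\r\n\x1a\n'


def looks_like_hdf5(path):
    """Return True if the HDF5 signature is found at offset 0 or 512 * 2**k."""
    with open(path, 'rb') as fh:
        fh.seek(0, 2)  # whence=2: move to the end to learn the file size
        size = fh.tell()
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False
```

Running it on `uraCya_HiC_matrix_5000_balanced.cool` shows whether the input is a valid HDF5 container before EagleC tries to open it.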

EagleC detecting too few SVs

Hello,

I am using EagleC to call SVs on HiChIP data from cancer cell lines. For most of my samples it detects fewer than 50 SVs, whereas WGS data for some of these samples shows more than 1000 SVs. I start my analysis from HiC-Pro valid pairs files (~75M pairs each), which I convert to cool, balance, and input to EagleC. I am running everything with default parameters. Do you have any comments, or things I could try?

Thanks

predictSV-ERROR: sequence item 2: expected str instance, NoneType found

Hi,

I got an error when I ran predictSV:

predictSV --hic-50k ./mcools/qpca0322082203.mcool::/resolutions/50000 -O EagleC_output -g hg19 --balance-type Raw --output-format full --prob-cutoff-50k 0.95

root                      INFO    @ 11/29/22 10:56:41: 
# ARGUMENT LIST:
# Cool URI at 5kb = None
# Cool URI at 10kb = None
# Cool URI at 50kb = ./mcools/qpca0322082203.mcool::/resolutions/50000
# Balance Type = Raw
# Reference Genome = hg19
# Included Chromosomes = ['#', 'X']
# Probability Cutoff for 5kb SVs = 0.8
# Probability Cutoff for 10kb SVs = 0.8
# Probability Cutoff for 50kb SVs = 0.95
# Output File Prefix = EagleC_output
# Output Format = full
# Log file name = EagleC_output.log
root                      INFO    @ 11/29/22 10:56:41: Predict SVs at 5kb resolution ...
Traceback (most recent call last):
  File "/share/home/hxie/miniconda3/envs/EagleC/bin/predictSV", line 176, in <module>
    run()
  File "/share/home/hxie/miniconda3/envs/EagleC/bin/predictSV", line 112, in run
    subprocess.check_call(' '.join(command), shell=True)
TypeError: sequence item 2: expected str instance, NoneType found

Thanks for your help!

Same results with ICE and CNV normalized data

Hi Xiaotao,

I wanted to update you on running EagleC on our own datasets.

I used ICE-normalized data (output from cooler balance) and CNV-normalized data (obtained by running NeoLoopFinder's calculate-cnv, segment-cnv, and correct-cnv in sequence), but I got exactly the same predictions. Each ICE run took about 2 h, while each CNV run took about 24 h. Could you help me find the problem?

I ran predictSV on the same data once and 16 times, and both produced the combined prediction. It looks like a single run yields the same predictions as 16 runs. Is this what I should expect?

Thank you!

Download manually for HPC

Hi,

Typically, internet access is disabled on the HPC. How can I download the pretrained models manually on my laptop and then upload them to the offline HPC?
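After downloading the archives on a machine with internet access and copying them to the cluster, comparing checksums on both sides is a simple way to confirm that the transfer was complete. A stdlib sketch (`sha256sum` here is a hypothetical Python helper, not the coreutils command):

```python
import hashlib


def sha256sum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it on the laptop copy and on the HPC copy of each archive; identical digests mean the files match byte for byte.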

"TypeError: sequence item 18: expected str instance, NoneType found", continue with merge-redundant-SVs?

Hi,

I ran into this error while running EagleC with "--output-format NeoLoopFinder"; it does not occur with "--output-format full".
It seems similar to issue #15, but happens at a different step of predictSV.
Input .mcool is CNV balanced by NeoLoopFinder, and this happens for both "hg38" and "other" (mm10)

Input code:

predictSV --hic-5k Sample.mcool::resolutions/5000 \
--hic-10k Sample.mcool::resolutions/10000 \
--hic-50k Sample.mcool::resolutions/50000 \
-O Sample_EagleC_predictSV \
-g hg38 \
--balance-type CNV \
--output-format NeoLoopFinder \
--prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999

After the breakpoints are found for each resolution, I get the *_SVs.5K.txt, *_SVs.10K.txt, and *_SVs.50K.txt files, but predictSV then throws a traceback error when trying to merge 10kb and 5kb.

2023-09-06 02:53:57.050786: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): 
INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype double and shape [2,21,21,1]
         [[{{node Placeholder/_0}}]]
1/1 [==============================] - 0s 5ms/step
root                      INFO    @ 09/06/23 02:53:58: Locate 10kb SV coordinates on the 5kb matrix ...
Traceback (most recent call last):
  File "/home/user/software/anaconda3/envs/eaglec/bin/predictSV", line 176, in <module>
    run()
  File "/home/user/software/anaconda3/envs/eaglec/bin/predictSV", line 130, in run
    subprocess.check_call(' '.join(command), shell=True)
TypeError: sequence item 18: expected str instance, NoneType found

Do I have to re-run predictSV with the cached files to get the combined list, or can I get the equivalent of a completed predictSV run by running merge-redundant-SVs on the three resolution files?

Thanks!
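The general idea of collapsing redundant calls from several resolutions can be sketched as follows. This is not EagleC's `merge-redundant-SVs` implementation; the tuple layout, the tolerance rule (both breakpoints within the coarser resolution) and `merge_calls` are assumptions for illustration:

```python
def merge_calls(calls):
    """Collapse redundant SV calls made at different resolutions.

    Each call is (chrom1, pos1, chrom2, pos2, resolution). Calls are visited
    from finest to coarsest resolution, so every kept call is at least as fine
    as the current one; the current call is dropped when a kept call links the
    same chromosomes with both breakpoints within the current (coarser) resolution.
    """
    kept = []
    for call in sorted(calls, key=lambda c: c[4]):
        c1, p1, c2, p2, res = call
        redundant = any(
            k[0] == c1 and k[2] == c2
            and abs(k[1] - p1) <= res and abs(k[3] - p2) <= res
            for k in kept
        )
        if not redundant:
            kept.append(call)
    return kept
```

Keeping the finest-resolution call of each cluster preserves the most precise breakpoint coordinates.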
