koszullab / chromosight
Computer vision based program for pattern recognition in chromosome (Hi-C) contact maps
Home Page: https://chromosight.readthedocs.io
License: Other
The sparse version of the pipeline is functional as of 03fa26a but a few things could be improved:
Add a command (chromovision create-kernel) to generate a template kernel config to help users make new configs.
Hello,
I had a go with chromosight 1.4.0, enabling the --inter option. Unfortunately I exceeded my RAM limit, which is 400 GB.
I was wondering if there is a way to analyse each individual interchromosomal matrix separately and so try to fit in memory.
My dataset (the .cool file) comes from a Hi-C experiment done in human (GRCh38), and this is how I run chromosight:
/mnt/lustre/scratch/SOFTWARE/miniconda3/bin/chromosight \
detect \
--threads=4 \
--inter \
/mnt/lustre/scratch/results.test/matrix/test.matrix.cool \
loops_all
Any advice will be much appreciated.
Regards
Jorge
Hi:
Thanks for your great chromosight! We have used chromosight widely on a plant's Hi-C. Now I am wondering how to compare the loops. For example, suppose we have two Hi-C samples under different conditions.
Option 1: we can call loops for the A and B Hi-C samples separately, then merge LoopA and LoopB (sometimes the loop results differ too much, which is inconsistent with the Hi-C results in Juicebox), and then use chromosight quantify to calculate the similarity to the kernel. Just like DEG analysis in RNA-Seq, can we identify loops that are significantly gained or lost between conditions?
Option 2: we can merge the two Hi-C files and call loops on the artificially merged sample.
In summary, how should we call loops for two or more Hi-C samples and how do we compare them?
Thank you for your time.
Best wishes!
Linhua
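For option 1, one pragmatic sketch (my own suggestion, not a built-in chromosight feature) is to run chromosight quantify on the union of loop coordinates in both conditions and then compare the per-loop scores with pandas:

```python
import pandas as pd

# Hypothetical quantify outputs for the same union of loop coordinates
# in conditions A and B (columns follow chromosight's quantify output).
quant_a = pd.DataFrame({
    "chrom1": ["chr1", "chr1"], "start1": [110000, 205000],
    "chrom2": ["chr1", "chr1"], "start2": [150000, 260000],
    "score": [0.71, 0.28],
})
quant_b = pd.DataFrame({
    "chrom1": ["chr1", "chr1"], "start1": [110000, 205000],
    "chrom2": ["chr1", "chr1"], "start2": [150000, 260000],
    "score": [0.35, 0.30],
})

# Join the two runs on loop coordinates and compute a per-loop score change.
merged = quant_a.merge(
    quant_b, on=["chrom1", "start1", "chrom2", "start2"],
    suffixes=("_A", "_B"),
)
merged["delta"] = merged["score_B"] - merged["score_A"]

# Flag loops whose score drops by more than an arbitrary cutoff.
lost = merged[merged["delta"] < -0.3]
print(lost[["start1", "start2", "delta"]])
```

The cutoff here is arbitrary; a more principled gain/loss call would need replicates or a null model, which chromosight does not provide out of the box.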
Hello!
I used chromosight quantify to compare two conditions (as described in the documentation) and it worked overall, but I saw that some rows had empty score, pvalue and qvalue fields. Do you know why this happens for some rows? Does this correspond to masked bins in the Hi-C matrix?
Should I remove these rows for the rest of the analysis?
Best,
Perrine
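If those empty fields do correspond to undetectable (masked) bins — an assumption the developers would need to confirm — one common approach is to drop them before downstream analysis, since pandas parses the empty fields as NaN:

```python
import io
import pandas as pd

# A toy chromosight quantify output where one row has empty score fields,
# as can happen when a loop falls on undetectable (masked) bins.
tsv = """chrom1\tstart1\tend1\tchrom2\tstart2\tend2\tscore\tpvalue\tqvalue
chr1\t110000\t115000\tchr1\t150000\t155000\t0.71\t0.001\t0.002
chr1\t205000\t210000\tchr1\t260000\t265000\t\t\t
"""
df = pd.read_csv(io.StringIO(tsv), sep="\t")

# Empty fields are parsed as NaN; drop those rows before downstream analysis.
filtered = df.dropna(subset=["score", "pvalue", "qvalue"])
print(f"kept {len(filtered)} of {len(df)} loops")
```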
Hi
I used chromosight to detect hairpins in our dataset, but I got a very large number of hairpins, which does not seem normal; here is my report. Do you have any suggestions? Did I set any parameters wrong?
detect --threads 10 --pattern hairpins --min-dist 50000 --max-dist 2000000 gm12878_microc.mcool::resolutions/10000 /gshare/xielab/wuhg/published/Micro-C_hairpins
pearson set to 0.1 based on config file.
min_separation set to 5000 based on config file.
max_perc_undetected set to 75.0 based on config file.
max_perc_zero set to 10.0 based on config file.
Matrix already balanced, reusing weights
Found 261150 / 308839 detectable bins
Preprocessing sub-matrices...
[====================] 100.0% chrY-chrY
Sub matrices extracted
Detecting patterns...
[--------------------] 0.0% Kernel: 0, Iteration: 0
[====================] 100.0% Kernel: 0, Iteration: 0
Minimum pattern separation is : 1
186221 patterns detected
Saving patterns in /gshare/xielab/wuhg/published/Micro-C_hairpins.tsv
Saving patterns in /gshare/xielab/wuhg/published/Micro-C_hairpins.json
Saving pileup plots in /gshare/xielab/wuhg/published/Micro-C_hairpins.pdf
Hello! Firstly, thank you for fixing the error with pandas!
I have been trying to understand how exactly you calculate the Pearson correlation. It seems that in the article you write about correlation, but in the code a convolution is implemented. I don't get it...
Why do I need this? I want to know whether the level of detail of the kernel is important. For instance, in one issue you proposed generating the corner of a TAD as a picture of ones and zeroes. But, obviously, the "real" corner is much more complicated. So do I have to generate as detailed a kernel as possible, or can I use a rough one? Will it affect the results?
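There is no contradiction between correlation and convolution: a sliding-window Pearson correlation can be written as a normalised cross-correlation once the kernel is mean-centred. A small self-contained sketch (my own illustration, not chromosight's actual code) showing the two agree:

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
img = rng.random((20, 20))     # stand-in for a detrended contact map
kernel = rng.random((5, 5))    # stand-in for a pattern kernel
m = kernel.size

# Pearson of the kernel with every window, via cross-correlation:
# r = sum(Wc * Kc) / (||Wc|| * ||Kc||), and sum(Wc * Kc) == correlate(img, Kc)
# because Kc has zero mean.
kc = kernel - kernel.mean()
num = correlate2d(img, kc, mode="valid")
win_sum = correlate2d(img, np.ones(kernel.shape), mode="valid")
win_sq = correlate2d(img ** 2, np.ones(kernel.shape), mode="valid")
den = np.sqrt(win_sq - win_sum ** 2 / m) * np.sqrt((kc ** 2).sum())
pearson_conv = num / den

# Brute-force check with np.corrcoef on one window.
w = img[3:8, 4:9]
r_direct = np.corrcoef(w.ravel(), kernel.ravel())[0, 1]
assert np.isclose(pearson_conv[3, 4], r_direct)
```

Because the score is a correlation, what matters is the overall shape of the kernel rather than every pixel value, which is why even a rough ones-and-zeroes template can work.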
Hi!
I ran into the following issue. When I execute Chromosight v1.4.1, I get a slightly different number of loops (<100 differences) each time I run the calculations on the exact same Hi-C matrix. I wonder why this might happen and whether there is a way to make detection deterministic (is there a seed somewhere?).
Best,
Mikhail
Dear chromosighter,
I have a few questions regarding the win-size option for chromosight detect and quantify.
Best regards,
Hello Chromosight developers,
This is not an issue report; it's just that I would like to know whether Chromosight is able to call peaks between different chromosomes.
Thanks so much
Jorge
Hi:
Thank you very much for developing such an excellent piece of software. It's going to be so awesome. I am testing chromosight on Hi-C datasets from plants. I am interested in whether chromosight can be applied to quantify the strength of chromatin interactions on domains (say TADs, TAD-like domains, those local triangle-shaped domains). For example, we could score the chromatin strength of targeted regions in sample A and sample B, and thus integrate the Hi-C datasets with RNA-Seq etc.
Thanks.
Linhua
Currently, the whole matrix is loaded into RAM as a sparse array for balancing. This can be an issue for extremely large datasets. An alternative would be to:
This would allow keeping only as many sub-matrices in memory as there are parallel processes.
It would be more convenient to repurpose the contact_map object to hold only single chromosomes (with their start bin, pixels, undetectable bins and temporary file path) and to have a different object (e.g. genome_object) which stores those contact_map objects (and remembers their original order).
This would make it possible to completely separate the preprocessing step from loading the file, instead of doing the chromosome splitting in the contact_map constructor.
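A minimal sketch of the proposed split (class and attribute names like GenomeMap are illustrative, not existing chromosight classes):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ContactMap:
    """One chromosome's data (names are illustrative, not chromosight's)."""
    name: str
    start_bin: int
    tmp_path: str            # temporary file holding this chromosome's pixels
    undetectable_bins: List[int] = field(default_factory=list)

@dataclass
class GenomeMap:
    """Holds per-chromosome maps and remembers their original order."""
    chroms: List[ContactMap] = field(default_factory=list)

    def add(self, cmap: ContactMap) -> None:
        self.chroms.append(cmap)

    def order(self) -> List[str]:
        return [c.name for c in self.chroms]

genome = GenomeMap()
genome.add(ContactMap("chr1", 0, "/tmp/chr1.npz"))
genome.add(ContactMap("chr2", 24896, "/tmp/chr2.npz"))
print(genome.order())  # ['chr1', 'chr2']
```

With this layout, a worker process only ever materialises the ContactMap objects it is currently processing, which is the memory property the refactor is after.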
Thank you for making this great tool available! I've been trying to apply the quantify module for scoring interaction strength at pairs of transcription-factor binding sites using the supplied loop kernel, but the scores are all strictly positive, unlike what is shown for Rad21 in the tutorial and your manuscript.
Could you please help me troubleshoot what I'm doing wrong? I used the following command on the files here:
chromosight quantify --pattern loops sites.bed2d 5000.cool out
Hello, thank you for this tool. I just tried to use it to call loops with default settings and the results look very good! The only issue is that I have a feeling the reported coordinates are shifted by ~1-2 pixels relative to the actual highest point in the peak. Here are three example calls, all just off the pixel with the highest value. Is it possible chromosight selects a wrong pixel from the neighborhood?
Do you observe anything like this?
If you want to reproduce this: I used the file from https://data.4dnucleome.org/files-processed/4DNFIFLDVASC/, called loops at 5 kb resolution, and looked at the results in HiGlass. I installed chromosight using conda.
Thanks again for developing and sharing this tool!
Hello,
I detected the borders pattern using chromosight detect and then wanted to quantify those border scores across different conditions. However, when running chromosight quantify on the initial matrix, the border scores were much lower than those found with chromosight detect (except for the first border detected). Is this result expected?
Here are the command lines I used (with Chromosight version 1.4.1):
chromosight detect --pattern=borders --threads 6 --min-separation 20000 -p 0.5 -W 9 matrix.cool test/borders_detect
chromosight quantify -W 9 --pattern=borders test/borders_detect.tsv matrix.cool test/borders_quantify
Below are parts of the output files:
Chromosight detect output:
chrom1 start1 end1 chrom2 start2 end2 bin1 bin2 kernel_id iteration score pvalue qvalue
chromosome_ref 1315000 1320000 chromosome_ref 1315000 1320000 263 263 0 0 0.8382847806 0.0000000000 0.0000000000
chromosome_ref 110000 115000 chromosome_ref 110000 115000 22 22 1 0 0.7745989939 0.0000000000 0.0000000001
chromosome_ref 205000 210000 chromosome_ref 205000 210000 41 41 1 0 0.7087168920 0.0000000331 0.0000000777
chromosome_ref 260000 265000 chromosome_ref 260000 265000 52 52 1 0 0.5554020576 0.0000748932 0.0000818600
chromosome_ref 505000 510000 chromosome_ref 505000 510000 101 101 1 0 0.5250500651 0.0001567472 0.0001601547
chromosome_ref 710000 715000 chromosome_ref 710000 715000 142 142 1 0 0.7006494326 0.0000000181 0.0000000448
chromosome_ref 780000 785000 chromosome_ref 780000 785000 156 156 1 0 0.5201083470 0.0001869179 0.0001869179
chromosome_ref 1030000 1035000 chromosome_ref 1030000 1035000 206 206 1 0 0.6466182488 0.0000006142 0.0000011547
Chromosight quantify output:
chrom1 start1 end1 chrom2 start2 end2 bin1 bin2 score pvalue qvalue
chromosome_ref 1315000 1320000 chromosome_ref 1315000 1320000 263 263 0.8382847806 0.0000000000 0.0000000000
chromosome_ref 110000 115000 chromosome_ref 110000 115000 22 22 0.2937926314 0.0497858994 0.1559958182
chromosome_ref 205000 210000 chromosome_ref 205000 210000 41 41 0.3053649535 0.0488575488 0.1559958182
chromosome_ref 260000 265000 chromosome_ref 260000 265000 52 52 0.2115458335 0.1743224566 0.3724161573
chromosome_ref 505000 510000 chromosome_ref 505000 510000 101 101 -0.1663934145 0.2763862300 0.4811167707
chromosome_ref 710000 715000 chromosome_ref 710000 715000 142 142 0.2302008405 0.1287384415 0.3184582499
chromosome_ref 780000 785000 chromosome_ref 780000 785000 156 156 0.2003301451 0.1881451788 0.3770751398
chromosome_ref 1030000 1035000 chromosome_ref 1030000 1035000 206 206 -0.0722279965 0.6391357794 0.7952778302
Could you document basic usage of the 'point and click' mode?
I used the generate-config function, but got the error below:
sora@server:my/path/to/chromosight$ chromosight generate-config --threads 10 --click myfile.mcool::resolutions/5000 my_prefix
Matrix already balanced, reusing weights
Found 500267 / 545114 detectable bins
Preprocessing sub-matrices...
[====================] 100.0% chrY-chrY
Sub matrices extracted
Traceback (most recent call last):
File "/home/sora/.local/bin/chromosight", line 8, in <module>
sys.exit(main())
File "/home/sora/.local/lib/python3.8/site-packages/chromosight/cli/chromosight.py", line 961, in main
cmd_generate_config(args)
File "/home/sora/.local/lib/python3.8/site-packages/chromosight/cli/chromosight.py", line 536, in cmd_generate_config
windows = click_finder(processed_mat, half_w=int((win_size - 1) / 2))
File "/home/sora/.local/lib/python3.8/site-packages/chromosight/utils/plotting.py", line 139, in click_finder
plt.imshow(mat.toarray(), cmap="afmhot_r", vmax=np.percentile(mat.data, 95))
File "/mnt/data0/apps/anaconda/Anaconda2-5.2/envs/py38/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 1029, in toarray
out = self._process_toarray_args(order, out)
File "/mnt/data0/apps/anaconda/Anaconda2-5.2/envs/py38/lib/python3.8/site-packages/scipy/sparse/base.py", line 1185, in _process_toarray_args
return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError: Unable to allocate 2.16 TiB for an array with shape (545114, 545114) and data type float64
I guess there is a memory issue. Could you suggest a solution for this?
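The MemoryError comes from `mat.toarray()` in the plotting step, which densifies the whole-genome sparse matrix. Some quick arithmetic reproduces the 2.16 TiB figure from the traceback:

```python
# Why plotting the whole genome fails: toarray() materialises a dense
# n x n float64 matrix. For n = 545114 bins that is ~2.2 TiB.
n = 545_114
bytes_needed = n * n * 8                  # float64 = 8 bytes per entry
tib = bytes_needed / 2 ** 40
print(f"{tib:.2f} TiB")                   # ~2.16 TiB, matching the traceback
```

A plausible workaround (an assumption on my part, not a documented option) would be to run --click on a single-chromosome or lower-resolution cooler so the densified matrix stays small.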
Hi,
I am trying to compare the results I got from hicExplorer TAD prediction to chromosight. However, I am having trouble using my matrix. I exported the matrix to .cool as described in the instructions, but I get the following error:
"""
Traceback (most recent call last):
File "/home/vtracann/miniconda3/envs/chromosight/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/vtracann/miniconda3/envs/chromosight/lib/python3.10/site-packages/chromosight/cli/chromosight.py", line 610, in _detect_sub_mat
chrom_patterns, chrom_windows = cid.pattern_detector(
File "/home/vtracann/miniconda3/envs/chromosight/lib/python3.10/site-packages/chromosight/utils/detection.py", line 273, in pattern_detector
mat_conv = preproc.diag_trim(mat_conv.tocsr(), contact_map.max_dist)
File "/home/vtracann/miniconda3/envs/chromosight/lib/python3.10/site-packages/chromosight/utils/preprocessing.py", line 119, in diag_trim
trimmed = sp.tril(mat, n, format="csr")
File "/home/vtracann/miniconda3/envs/chromosight/lib/python3.10/site-packages/scipy/sparse/_extract.py", line 100, in tril
mask = A.row + k >= A.col
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
"""
I downloaded the example.cool file and ran it in parallel for debugging.
I did some debugging on my end and got to the keep_distance function within contact_map.py.
Here, self.max_dist is None, therefore mat_max_dist = self.matrix.shape[0] + self.largest_kernel (10607 + 3). This value gets passed around (it goes in as max_dist to preprocess.detrend), but at no point is self.max_dist set to a different value. In the end, the error occurs because matrix.max_dist is never changed from None and gets fed to scipy's tril, which throws the TypeError.
I checked my cool file with cooler info and the main difference compared to example.cool seems to be:
"bin-size": null,
"bin-type": "variable",
versus:
"bin-size": 1000,
"bin-type": "fixed",
for the example.cool file. I am not 100% sure that is the origin, but that is my best guess at the moment.
In order to make the tool work, I changed self.max_dist = self.keep_distance in contact_map.detrend, and it seems to work now. However, I am not sure what the effect of this change is on the overall results, as I am not familiar with the role of max_dist in the tool. Do you think this workaround is likely to affect the final results in a major way?
Cheers,
Vittorio
Thanks for a great piece of software!
For chromosight quantify, is there a way to use multiresolution .mcool files directly (specifying the desired resolution)?
Or does chromosight only take single-resolution .cool files as input?
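Judging by the detect commands elsewhere in this thread (e.g. `SC_035.mcool::/resolutions/10000`), an .mcool file can be addressed at a given resolution with cooler's URI syntax; presumably the same URI form works for quantify. A tiny sketch of building such a URI:

```python
# Multiresolution .mcool files are addressed at a specific resolution using
# the cooler URI syntax <file>::/resolutions/<binsize>, as the detect
# commands elsewhere in this thread do.
path, res = "sample.mcool", 10_000   # example file name and resolution
uri = f"{path}::/resolutions/{res}"
print(uri)  # sample.mcool::/resolutions/10000
```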
Hi: I found chromosight detects loops quite well in my Hi-C datasets. I found that resolution in loops.json and loops_small.json is defined as 2000 in both. I am wondering whether I should adjust this value if I use 5-kb or 10-kb resolution cooler files?
Another question: I found that artificial_template_loops_type1.txt looks just like a pileup heatmap, where each pixel has a specific value. I wonder whether these values in the matrix affect the results of loop calling with the default options? Do we need to refine these values?
I also tried the --click option of generate-config to build the kernel manually by double-clicking on relevant regions in a Hi-C matrix, but it is quite time-consuming, so I gave up.
Thanks!
When the input cool file does not have a constant bin size, Chromosight crashes with an obscure error message.
We should either provide an informative error message, and/or allow overriding this behaviour to support variable bin size, maybe with a warning saying that the detrending and pattern calls may be less reliable.
Hello, thank you for the nice work~
I'm interested in how you evaluate your detected loops. In the paper, you mention that this is done through p-values.
However, have you compared your results with a ground-truth annotation?
For example, say you detect a loop, "loop No.1"; how would you classify this detected "loop No.1" as a true positive? From my perspective, the detected "loop No.1" should be somehow "close" to the ground-truth loop "Ground Truth loop No.1" before it can be classified as a correct detection.
I hope you can help me, thank you!
Now that the software works properly, we should do the following:
Perhaps also make a conda package
Hi
I'm new to Hi-C.
What are the recommended parameters for chromosight detect --pattern borders? Especially min-dist and max-dist.
What I have used is
chromosight detect \
--threads 8 \
--min-dist 10000 \
--min-dist 1000000 \
--pattern $pattern \
$PAIRS_FILE \
$output_prefix
But the contact map plotted from the .tsv file is strange.
Thanks.
Hi,
thanks for the handy tool!
I wanted to ask if you could provide some code for labelling the loops, e.g. based on their order in the output file. This would make it easier to extract from the file the coordinates of a single loop I am interested in.
Also, how do I plot the chromosome coordinates on the axes instead of the bin number?
Thank you in advance!
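A minimal sketch of one way to label loops from the output table (the column names follow the detect output shown elsewhere in this thread; the values are made up):

```python
import pandas as pd

# Toy detect output; chromosight already reports genomic coordinates
# alongside bin indices, so labelling is just adding an index-based ID.
loops = pd.DataFrame({
    "chrom1": ["chr1", "chr1"], "start1": [110000, 205000],
    "chrom2": ["chr1", "chr1"], "start2": [150000, 260000],
    "bin1": [22, 41], "bin2": [30, 52],
})
loops["loop_id"] = [f"loop_{i}" for i in range(len(loops))]

# Look up one loop of interest by its label.
coords = loops.set_index("loop_id").loc["loop_1", ["chrom1", "start1", "start2"]]
print(coords.tolist())  # ['chr1', 205000, 260000]

# For the axes question: with matplotlib's imshow, passing
#   extent=(region_start, region_end, region_end, region_start)
# maps the axes to genomic coordinates instead of bin numbers.
```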
Hello,
I was wondering how the detection step decides which coordinates to report. We have a gold-standard set of manually annotated loops and I am trying to find the best overlap with them. Some loops were not found by detect even though, when running quantify on these sets of coordinates, they had a score that should have passed the Pearson threshold. They are not close to other loops that could have interfered.
I also saw that just changing the Pearson parameter while running detect gives some unexpected behaviour: when I lower the threshold, I get more loops called, but I also lose some of them.
Do you know why this is happening?
Best,
Hello Cyril,
I have run chromosight 1.4.0. It did finish, but it seems to me it only processed one matrix, the EBV contact matrix that corresponds to Human_herpesvirus_4 in the GRCh38 fasta reference file. Is there a setting I am missing?
This is the log:
pearson set to 0.3 based on config file.
min_separation set to 5000 based on config file.
max_perc_undetected set to 50.0 based on config file.
max_perc_zero set to 10.0 based on config file.
Matrix already balanced, reusing weights
Preprocessing sub-matrices...
[====================] 100.0% EBV-EBV
Detecting patterns...
[--------------------] 0.0% Kernel: 0, Iteration: 0
[--------------------] 0.0% Kernel: 0, Iteration: 0
No pattern detected ! Exiting.
This is how I run the program:
/mnt/lustre/scratch/SOFTWARE/miniconda3/bin/chromosight \
detect \
--threads=8 \
--min-dist 20000 --max-dist 200000 \
/mnt/lustre/scratch/results.test/matrix/test.matrix.cool \
loops_H2087_intra
Thanks so much
Jorge
Hi,
I read the Chromosight paper, and I have a question about the loop calling on GM12878 Hi-C data. Below are the two statements about these results from the paper.
Presentation and benchmark of Chromosight.
For instance, Chromosight found 85% of the loops detected by Cooltools, the software with the highest precision in our benchmark, while overall identifying a much larger number of loops (37,955 vs. 6264, respectively) (Supplementary Fig. 3c).
Exploration of various genomes and patterns.
With default parameters, Chromosight identified 18,839 loops (compared to ~10,000 detected in ref. 6) whose anchors fall mostly (~58%, P < 10^-16) into loci enriched in cohesin subunit Rad21 (Fig. 3b).
If I understand correctly, you applied Chromosight to call loops on the same dataset as in Rao et al., Cell 2014. So I was wondering why Chromosight yielded different results (37,955 vs. 18,839). I would be grateful if you could clarify this point.
Thank you so much.
Jinakun
Dear Chromosight developers:
First, I'd like to thank you for developing such an excellent loop-calling tool; it is undoubtedly one of the best and most popular loop callers among users!
It is really important to call valid and precise loops for downstream analysis, so I am trying to apply chromosight to my Micro-C datasets, which contain about 150M contacts for the mouse genome. I know it's a bit of an awkward size, because it is somewhat smaller than the lowest recommended size, but I still want to give it a try. I have read the closed issues and learned that I may need to adjust the parameters (perc-zero, perc-undetected, pearson). But what can I do to assess the quality of the loops called under different parameters? (The only thing I can come up with is visualizing the map and checking by eye.) Can you give me some guidance on fine-tuning the parameters and assessing the outcome, or share some professional experience?
Best wishes!
Woody
When looping over foci in the heatmap, we use np.array and np.where on each iteration. This is a performance bottleneck, and we should use a different method to find local maxima in foci. Look into:
https://docs.scipy.org/doc/scipy-0.16.0/reference/ndimage.html#module-scipy.ndimage.measurements
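A sketch of what the scipy.ndimage approach could look like (an assumption about the refactor, not current chromosight code): label connected foci once, then query each focus' maximum position in a single vectorised call instead of np.where inside a Python loop:

```python
import numpy as np
from scipy import ndimage

# Toy score map with two foci above the detection threshold.
scores = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.9, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.8, 0.9],
])
mask = scores > 0.4

# Label connected foci once, then get each focus' maximum in one call.
labels, n_foci = ndimage.label(mask)
maxima = ndimage.maximum_position(scores, labels, range(1, n_foci + 1))
print(n_foci, maxima)  # 2 foci, maxima at (1, 2) and (3, 4)
```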
Dear chromosighter,
I am running quantify on 1-kb-binned matrices (Drosophila genome) for a set of 140 loops. I tried scaling the pattern by a factor of 2 or 3 to determine the best factor (the normal size works well on 5-kb-binned data).
However, while the quantification finishes after 35 min for the smallest matrices, after 72 h it was still not done for the other ones (they were at 16 or 33% after 35 min according to the log and did not progress after that; the process was killed after 72 h).
I was wondering if you have ever seen such a problem with bigger data, and whether you know the time complexity of the algorithm (to get an idea of the time limit I should set depending on the data). Do you know where the algorithm could be stuck for so long?
Best,
Hello,
I am trying to use chromosight detect in a snakemake pipeline, and I have the problem that there is no way to change the default output file names. I could create a new folder for each analysis, and maybe automatically move/rename the files, but that's not the most convenient approach... Is there a reason why it's impossible to customize the file names (or the file prefix, since chromosight detect always saves at least two files)?
Thanks!
Ilya
Hello,
Thanks for developing such a great tool! When I use chromosight, I have a question in mind. Chromosight uses a loop kernel to detect loops from a Hi-C contact map. But where does this loop kernel come from? Does the loop kernel come from a Hi-C data set or somewhere else?
hi,
I am about to try chromosight on my data and I just want to double-check which Hi-C contact map I should provide. From the publication I understand the raw counts should be given, as the ICE normalization is performed internally. Is this correct? Or should I give my corrected version?
Also, would you have a good rationale to guess the resolution I should use (5K, 10K) ?
NB: My count matrices were generated using the hicexplorer package.
Hi,
I am wondering how you calculate the loop score?
thanks
Hello Cyril,
I have noticed that in my tsv output for chromosight v.1.4.0 the qvalues are always smaller than the pvalues. I would expect the opposite. Should I still filter based on the qvalues?
These are the first lines of my output:
chrom1 start1 end1 chrom2 start2 end2 bin1 bin2 kernel_id iteration score pvalue qvalue
1 9800000 9900000 1 9900000 10000000 98 99 0 0 0.3970866534 0.0000000617
1 12300000 12400000 1 12500000 12600000 123 125 0 0 0.4051226690 0.0000002507
1 20300000 20400000 1 20400000 20500000 203 204 0 0 0.4270293101 0.0000000041
1 33800000 33900000 1 34000000 34100000 338 340 0 0 0.3583274096 0.0000004544
1 39400000 39500000 1 39500000 39600000 394 395 0 0 0.4120016497 0.0000000167
This is how I run chromosight:
/mnt/lustre/scratch/SOFTWARE/miniconda3/bin/chromosight \
detect \
-z 100 -u 100 \
--threads=1 \
--inter \
/mnt/lustre/scratch/results.test/matrix/test.25000.cool \
loops_test_25000
Thanks so much
Jorge
Hi, which inputs are we supposed to give to the simulated-data script?
Particularly in this part of the file:
# Path to positions of borders in the experimental matrix (in bins,
# relative to chromosome start)
borders_pos = np.loadtxt(sys.argv[3])
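The snippet suggests sys.argv[3] is a plain-text file of border positions, one per line, in bins relative to the chromosome start. A hedged example of producing and reading such a file (the file name and values are made up):

```python
import numpy as np

# sys.argv[3] should point to a plain-text file with one border position
# per line, in bins relative to the chromosome start. The file name and
# positions below are just examples.
borders = np.array([22, 41, 52, 101, 142])
np.savetxt("borders.txt", borders, fmt="%d")

# This mirrors the np.loadtxt(sys.argv[3]) call in the script.
borders_pos = np.loadtxt("borders.txt")
print(borders_pos)
```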
Hi:
I am wondering whether chromosight can be used to detect chromatin loops from a restriction-fragment-level (1f, 2f, etc.) Hi-C matrix, i.e. storing the Hi-C matrix in cooler format at restriction-fragment level? I found that using, say, a 1-kb Hi-C matrix in cooler format to identify loops gives inaccurate anchors (when I am interested in small-scale loops). The Hi-C library was generated with MboI.
Thanks!
Best wishes!
Linhua
On the to do list (an easy one!)
Hello Cyril,
I have run chromosight 1.4.1 and encountered the following error; the full log is below. I have used the same input .cool file with 1.4.0. Can you think of something I might be doing wrong?
pearson set to 0.3 based on config file.
min_separation set to 5000 based on config file.
max_perc_undetected set to 50.0 based on config file.
max_perc_zero set to 10.0 based on config file.
WARNING: Detection on interchromosomal matrices is expensive in RAM
Matrix already balanced, reusing weights
Preprocessing sub-matrices...
[====================] 100.0% EBV-EBV
Detecting patterns...
[--------------------] 0.0% Kernel: 0, Iteration: 0
[====================] 100.0% Kernel: 0, Iteration: 0
Traceback (most recent call last):
File "/mnt/lustre/scratch/SOFTWARE/miniconda3/bin/chromosight", line 8, in <module>
sys.exit(main())
File "/mnt/lustre/scratch/SOFTWARE/miniconda3/lib/python3.8/site-packages/chromosight/cli/chromosight.py", line 950, in main
cmd_detect(args)
File "/mnt/lustre/scratch/SOFTWARE/miniconda3/lib/python3.8/site-packages/chromosight/cli/chromosight.py", line 793, in cmd_detect
pval_mask = np.isnan(all_coords.pvalue)
File "/mnt/lustre/scratch/SOFTWARE/miniconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 1935, in __array_ufunc__
return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
File "/mnt/lustre/scratch/SOFTWARE/miniconda3/lib/python3.8/site-packages/pandas/core/arraylike.py", line 358, in array_ufunc
result = getattr(ufunc, method)(*inputs, **kwargs)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
This is how I have run chromosight
export PATH=/mnt/lustre/scratch/SOFTWARE/miniconda3/bin:$PATH
/mnt/lustre/scratch/SOFTWARE/miniconda3/bin/chromosight \
detect \
--threads=1 \
--inter \
--min-dist 50000 --max-dist 200000 \
/mnt/lustre/scratch/test.matrix.cool \
loops_output
Thanks so much
Jorge
(python37) [wuhg@mgt module]$ chromosight detect --pattern hairpins --min-dist 50000 --max-dist 2000000 SC_035.mcool::/resolutions/10000 SC_035_hairpins
pearson set to 0.1 based on config file.
min_separation set to 5000 based on config file.
max_perc_undetected set to 75.0 based on config file.
max_perc_zero set to 10.0 based on config file.
Whole genome matrix balanced
Found 12393 / 272566 detectable bins
Preprocessing sub-matrices...
[====================] 100.0% Y-Y
Sub matrices extracted
Detecting patterns...
[--------------------] 0.0% Kernel: 0, Iteration: 0
[====================] 100.0% Kernel: 0, Iteration: 0
Minimum pattern separation is : 1
Traceback (most recent call last):
File "/share/home/wuhg/.local/bin/chromosight", line 8, in <module>
sys.exit(main())
File "/share/home/wuhg/.local/lib/python3.7/site-packages/chromosight/cli/chromosight.py", line 959, in main
cmd_detect(args)
File "/share/home/wuhg/.local/lib/python3.7/site-packages/chromosight/cli/chromosight.py", line 802, in cmd_detect
pval_mask = np.isnan(all_coords.pvalue)
File "/share/home/wuhg/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 1936, in __array_ufunc__
return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
File "/share/home/wuhg/.local/lib/python3.7/site-packages/pandas/core/arraylike.py", line 358, in array_ufunc
result = getattr(ufunc, method)(*inputs, **kwargs)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Thank you for creating Chromosight, it is such a great tool. Are there any plans to create a fourth pattern option for detection, a TAD (a classic triangle-shaped domain)? Or how easy would it be for a user to create it as a custom pattern?
I think this would be a cool thing to add, since TADs have been shown not to be continuous along the chromosomes of some species. In these cases, simply calling boundaries and calling the space between two boundaries a "TAD" doesn't always have biological meaning.