Comments (5)
Dear User,
I am glad to know cLoops works nicely with typical-sized Hi-C datasets for you.
Sorry for the potential problem. Here are some points I want to mention.
1. cLoops implements a variant of DBSCAN, so the scikit-learn issue you mentioned may not be the solution to this problem.
2. The parallel computing bugs caused by joblib can be fixed by using a specific joblib version: joblib==0.11 works very well for all the data and Linux systems I have used. So there may be no need to wrap all the 'Parallel()' calls in pipe.py.
3. According to the error log, the clustering process finished. May I ask how many CPUs you used and how much memory your machine has? Out-of-memory issues on other chromosomes would also cause the parallel problem.
I hope these suggestions fix this. Please let me know how it goes.
Thank you.
Best,
Yaqiang
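For the CPU and memory question in point 3, a minimal sketch of one way to collect those numbers with the Python standard library. The helper name `machine_summary` is hypothetical, and the memory lookup assumes a POSIX system that exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES` (it falls back to "unknown" otherwise).

```python
import multiprocessing
import os


def machine_summary():
    """Return (cpu_count, total_memory_gib) for the current machine.

    Hypothetical helper, not part of cLoops. Total memory is read through
    POSIX sysconf and is None where those names are unavailable.
    """
    cpus = multiprocessing.cpu_count()
    try:
        total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        total_gib = total_bytes / float(1024 ** 3)
    except (AttributeError, ValueError, OSError):
        total_gib = None
    return cpus, total_gib


cpus, total_gib = machine_summary()
print("CPUs: %d" % cpus)
if total_gib is None:
    print("Total memory: unknown")
else:
    print("Total memory: %.1f GiB" % total_gib)
```

This reports what the machine has, not what a run actually used; peak usage during clustering would need to be watched separately.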
Everything seems to work nicely with typical sized Hi-C datasets but when attempting to run on something larger (e.g., ~4e9 contacts genome-wide) with -eps 5000,10000 -minPts 50,100 -hic, the following sort of issue pops up:

Clustering chr8 and chr8 finished. Estimated 43365022 self-ligation reads and 5506751 inter-ligation reads
Traceback (most recent call last):
  File "/local/anaconda3/envs/cloops/bin/cLoops", line 8, in <module>
    sys.exit(main())
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/cLoops/pipe.py", line 352, in main
    hic, op.washU, op.juice, op.cut, op.plot, op.max_cut)
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/cLoops/pipe.py", line 250, in pipe
    dataI_2, dataS_2, dis_2, dss_2 = runDBSCAN(cfs, ep, m, cut, cpu)
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/cLoops/pipe.py", line 118, in runDBSCAN
    for f in fs)
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/joblib/parallel.py", line 789, in __call__
    self.retrieve()
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/joblib/parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/local/anaconda3/envs/cloops/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[(('chr8', 'chr8'), 'hic/chr8-chr8.jd', ...

Based on scikit-learn/scikit-learn#8920, I wrapped all the Parallel() calls in pipe.py inside with-blocks using the "threading" back-end, and this seems to have gotten around the error. My question is whether this is the right way to go about this problem given the "parallel computating bugs" mentioned in the README.
Thank you for the suggestions. I went back to check and indeed there was an issue with memory usage: without changing joblib's back-end, the run used >700GB of memory (far in excess of the system limit), but only 125GB after the change.
I've been processing each chromosome individually (i.e., splitting the genome-wide bedpe by chromosome), but this shouldn't affect the results, right?
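For readers wanting to reproduce that kind of split, here is a minimal sketch, assuming tab-separated BEDPE with the first chromosome in column 1 and the second in column 4. The function name and the output file naming are hypothetical and not taken from cLoops, and the sketch does not settle whether per-chromosome runs give the same results as a genome-wide run.

```python
import os
import tempfile


def split_bedpe_by_chrom_pair(bedpe_path, out_dir):
    """Write one BEDPE file per (chrom1, chrom2) pair; return {pair: path}.

    Hypothetical helper. Keeps one open handle per pair, which is fine for
    a few hundred pairs but may hit open-file limits beyond that.
    """
    handles = {}
    paths = {}
    try:
        with open(bedpe_path) as src:
            for line in src:
                # Skip blank lines and comment/header lines.
                if not line.strip() or line.startswith("#"):
                    continue
                fields = line.rstrip("\n").split("\t")
                pair = (fields[0], fields[3])
                if pair not in handles:
                    paths[pair] = os.path.join(out_dir, "%s-%s.bedpe" % pair)
                    handles[pair] = open(paths[pair], "w")
                handles[pair].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return paths


# Tiny made-up input, only to exercise the function.
demo_dir = tempfile.mkdtemp()
demo_bedpe = os.path.join(demo_dir, "all.bedpe")
with open(demo_bedpe, "w") as fh:
    fh.write("chr8\t100\t150\tchr8\t5000\t5050\n")
    fh.write("chr1\t200\t250\tchr1\t9000\t9050\n")
    fh.write("chr8\t300\t350\tchr8\t7000\t7050\n")
    fh.write("chr1\t400\t450\tchr8\t100\t150\n")

parts = split_bedpe_by_chrom_pair(demo_bedpe, demo_dir)
for pair in sorted(parts):
    print(pair, parts[pair])
```

Pairs that span two chromosomes land in their own files here, so it is an explicit choice whether to pass them on or leave them out.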
It involved just changing Parallel(n_jobs=cpu) to Parallel(n_jobs=cpu, backend='threading'), but the runtime seems to be a bit longer, though I haven't done any extensive testing.
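A minimal, standalone sketch of that change, using joblib's `Parallel` and `delayed`. The worker `count_reads` and the input list are hypothetical stand-ins and are not the functions in pipe.py.

```python
from joblib import Parallel, delayed


def count_reads(chrom_pair):
    """Hypothetical stand-in for per-chromosome work; returns a small tuple."""
    chrom_a, chrom_b = chrom_pair
    return (chrom_a, chrom_b, len(chrom_a) + len(chrom_b))


pairs = [("chr1", "chr1"), ("chr8", "chr8"), ("chrX", "chrX")]
cpu = 2

# With the threading back-end the workers are threads in the same process,
# so results are returned without being serialized and sent between processes.
results = Parallel(n_jobs=cpu, backend="threading")(
    delayed(count_reads)(pair) for pair in pairs
)
print(results)
```

Threads in CPython share the global interpreter lock, so pure-Python CPU-bound work may not speed up with more threads, which would be consistent with the longer runtime observed.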