
Comments (5)

YaqiangCao commented on June 7, 2024

Dear User,
I am glad to hear that cLoops works nicely for you with typical-sized Hi-C datasets.
Sorry for the problem. Here are a few points I want to mention.
1. cLoops implements a variant of DBSCAN, so the scikit-learn issue you mentioned may not be the solution to this problem.
2. The parallel computing bugs caused by joblib can be fixed by pinning a specific joblib version: joblib==0.11 works very well for all the data and Linux systems I have used. So there may be no need to wrap every 'Parallel()' in pipe.py.
3. According to the error log, the clustering process finished. May I ask how many CPUs you used and how much memory your machine has? If other chromosomes run out of memory, that can also cause the parallel problem.
I hope these suggestions fix the issue. Please let me know how it goes.
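The following is a minimal textbook DBSCAN sketch for readers unfamiliar with the algorithm mentioned in point 1. It is not cLoops's variant and not its code. It only shows how `eps` (the neighbourhood radius) and `minPts` (the minimum neighbours for a core point) interact; these are the values passed as `-eps`/`-minPts`. Two simplifications are assumptions made for illustration:

* Each paired-end tag is treated as a 2-D point (left-end position, right-end position).
* The O(n²) neighbour search is far too slow for real Hi-C data.

```python
from collections import deque
import math


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN on 2-D points; returns one label per point (-1 = noise).

    Illustrative only: neighbour search is brute force, O(n^2).
    """
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        xi, yi = points[i]
        # Includes the point itself (distance 0), as in standard DBSCAN.
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # j is also a core point: keep growing
                queue.extend(j_nbrs)
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Raising `min_pts` or shrinking `eps` makes fewer points qualify as core points, so more points end up labelled as noise.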
Thank you.
Best,
Yaqiang

Everything seems to work nicely with typical-sized Hi-C datasets, but when I try to run on something larger (e.g., ~4e9 contacts genome-wide) with -eps 5000,10000 -minPts 50,100 -hic, the following error appears:

Clustering chr8 and chr8 finished. Estimated 43365022 self-ligation reads and 5506751 inter-ligation reads
Traceback (most recent call last):
  File "/local/anaconda3/envs/cloops/bin/cLoops", line 8, in <module>
    sys.exit(main())
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/cLoops/pipe.py", line 352, in main
    hic, op.washU, op.juice, op.cut, op.plot, op.max_cut)
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/cLoops/pipe.py", line 250, in pipe
    dataI_2, dataS_2, dis_2, dss_2 = runDBSCAN(cfs, ep, m, cut, cpu)
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/cLoops/pipe.py", line 118, in runDBSCAN
    for f in fs)
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/joblib/parallel.py", line 789, in __call__
    self.retrieve()
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/joblib/parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/local/anaconda3/envs/cloops/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[(('chr8', 'chr8'), 'hic/chr8-chr8.jd', ...

Based on scikit-learn/scikit-learn#8920, I wrapped every Parallel() call in pipe.py in a with-block using the "threading" back-end, and that seems to have gotten around the error.

My question is whether this is the right way to approach the problem, given the "parallel computing bugs" mentioned in the README.
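Below is a minimal sketch of the with-block workaround described above. It assumes a joblib version in which `Parallel` accepts `backend="threading"` and can be used as a context manager. `cluster_one` and `run_all` are hypothetical stand-ins for the per-chromosome clustering jobs in `runDBSCAN`, not cLoops code.

With the threading back-end, the workers share the parent process's memory. Results therefore do not have to be pickled and sent back through a pipe, which is the step that raises `MaybeEncodingError` in the traceback.

```python
from joblib import Parallel, delayed


def cluster_one(pair):
    # Hypothetical placeholder for the work done per chromosome pair.
    chrom_a, chrom_b = pair
    return (chrom_a, chrom_b), len(chrom_a) + len(chrom_b)


def run_all(pairs, cpu):
    # Threads share the parent's memory, so each result is returned directly
    # instead of being pickled in a worker process and sent back.
    with Parallel(n_jobs=cpu, backend="threading") as parallel:
        return parallel(delayed(cluster_one)(p) for p in pairs)


if __name__ == "__main__":
    print(run_all([("chr1", "chr1"), ("chr8", "chr8")], cpu=2))
    # [(('chr1', 'chr1'), 8), (('chr8', 'chr8'), 8)]
```

Threads run in one interpreter, so pure-Python work is serialised by the GIL. Code that spends most of its time in GIL-releasing calls (for example, many NumPy operations) can still overlap.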


jessakay commented on June 7, 2024

Thank you for the suggestions. I went back to check, and there was indeed a memory issue: without changing joblib's back-end, the run used >700GB of memory (far in excess of the system limit), but only 125GB after the change.
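One way to compare memory usage between back-ends, and to report the CPU and RAM figures asked about above, is Python's standard library. This sketch makes several assumptions:

* It assumes Linux, where `ru_maxrss` is reported in kilobytes. macOS reports bytes.
* It assumes `os.sysconf` exposes the physical page count.
* The `RUSAGE_CHILDREN` figure only covers children that have already finished and been waited for.
* That figure is the peak of the single largest child, not the sum across workers. It therefore understates the total for a pool of processes running at the same time.

```python
import multiprocessing
import os
import resource


def peak_rss_mb():
    """Peak resident set size in MB: (this process, largest reaped child).

    Assumes Linux, where ru_maxrss is in kilobytes.
    """
    self_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    children_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return self_kb / 1024.0, children_kb / 1024.0


def total_memory_gb():
    # Linux-specific: physical pages multiplied by page size, in GiB.
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024.0 ** 3


if __name__ == "__main__":
    self_mb, child_mb = peak_rss_mb()
    print("CPUs: %d, RAM: %.1f GB" % (multiprocessing.cpu_count(), total_memory_gb()))
    print("peak RSS self: %.1f MB, largest child: %.1f MB" % (self_mb, child_mb))
```

Calling `peak_rss_mb()` at the end of a run under each back-end gives a rough comparison. For the true combined footprint of concurrent workers, an external monitor is more reliable.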

I've been processing each chromosome individually (i.e., splitting the genome-wide bedpe by chromosome), but this shouldn't affect the results, right?
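Below is one way such a split might be done. The sketch rests on these assumptions:

* The input is a tab-separated BEDPE whose first six columns are chrom1, start1, end1, chrom2, start2, end2.
* It writes one file per chromosome pair.
* The `chrA-chrB.bedpe` naming and the `split_bedpe` helper are hypothetical, not part of cLoops.
* It keeps one open file handle per pair. For genome-wide data with many inter-chromosomal pairs, check the open-file limit (`ulimit -n`).

```python
import os
import tempfile
from collections import defaultdict


def split_bedpe(path, outdir):
    """Split a BEDPE file into one file per (chrom1, chrom2) pair.

    Returns a dict mapping each pair to the number of records written.
    """
    if not os.path.isdir(outdir):
        os.makedirs(outdir)
    handles = {}
    counts = defaultdict(int)
    try:
        with open(path) as fin:
            for line in fin:
                if not line.strip() or line.startswith("#"):
                    continue  # skip blank and comment lines
                fields = line.rstrip("\n").split("\t")
                key = (fields[0], fields[3])  # (chrom1, chrom2)
                if key not in handles:
                    name = "%s-%s.bedpe" % key
                    handles[key] = open(os.path.join(outdir, name), "w")
                handles[key].write("\t".join(fields) + "\n")
                counts[key] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "toy.bedpe")
    with open(src, "w") as fh:
        fh.write("chr1\t1\t2\tchr1\t5\t6\n"
                 "chr1\t3\t4\tchr2\t7\t8\n"
                 "chr1\t9\t10\tchr1\t11\t12\n")
    print(split_bedpe(src, os.path.join(workdir, "split")))
    # {('chr1', 'chr1'): 2, ('chr1', 'chr2'): 1}
```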


YaqiangCao commented on June 7, 2024


jessakay commented on June 7, 2024

It just involved changing Parallel(n_jobs=cpu) to Parallel(n_jobs=cpu, backend='threading'), although the runtime seems a bit longer. I haven't done any extensive testing, though.
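The runtime difference can be checked with a small timing harness like the one below. The names `work` and `time_backend` are hypothetical; `work` is a pure-Python, CPU-bound toy task. Because it holds the GIL, the threading back-end cannot run it in parallel and will usually be slower than the process-based `multiprocessing` back-end. That is one plausible reason for the longer runtime reported above. How much this applies to cLoops depends on how much of its clustering time is spent in GIL-releasing code, which this sketch does not measure.

```python
import time

from joblib import Parallel, delayed


def work(n):
    # Pure-Python loop: holds the GIL for its whole duration.
    return sum(i * i for i in range(n))


def time_backend(backend, tasks, n_jobs=2):
    """Run `work` over `tasks` with the given joblib back-end; return (results, seconds)."""
    start = time.perf_counter()
    results = Parallel(n_jobs=n_jobs, backend=backend)(delayed(work)(n) for n in tasks)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    tasks = [200000] * 8
    for backend in ("multiprocessing", "threading"):
        _, secs = time_backend(backend, tasks)
        print("%-16s %.2f s" % (backend, secs))
```

Timings from such a toy task only indicate the trend. Running the real pipeline on one chromosome under each back-end is a more meaningful comparison.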


YaqiangCao commented on June 7, 2024
