Comments (6)

morispi avatar morispi commented on June 4, 2024 1

Hi Haowen,

Just released v1.1 of HG-CoLoR.

Took your issue into account, and no more temporary files are created.
You should now be fine to run HG-CoLoR if you still wish to.
I also added a line in the main script to increase the maximum stack size, which should help with the segfaults you encountered.

Cheers,
Pierre

from hg-color.

morispi avatar morispi commented on June 4, 2024

Hi,

Sorry for taking so long to answer.

Pretty weird error indeed. The only time I ran into that problem was when I did not have enough disk space to store the alignments.

Which HG-CoLoR install are you using? Conda, or cloning and compiling the git repo?

Also, could you please send me the first few lines of your SR/LR alignment file? It should be located in the tmp directory you chose, under SR_on_LR.sam. That could help me see whether there is any issue with your read IDs.

Pierre

haowenz avatar haowenz commented on June 4, 2024

I did some investigation several weeks ago. It seems that HG-CoLoR generates a very large number of files during the correction (one for each long/short read and for each alignment between them, I guess?). There is a limit on the number of files on the server I used to run HG-CoLoR, and I ran HG-CoLoR on several data sets at the same time. I guess that's why I got the quota exceeded error. It would be helpful if all these intermediate files could go into a small number of files.

BTW, I noticed that HG-CoLoR splits the read IDs on stroke or space (I cannot remember the exact separator). So I renamed the reads and dropped all the comments as well.

For installation, I installed Emboss and QuorUM using Conda and then downloaded and built HG-CoLoR from the repo.

Another problem is that I got random segfaults on some data sets during the step "Generating the corrected long reads". One of them is as follows:

[Mon Dec 17 18:52:03 EST 2018] Correcting the short reads
[Mon Dec 17 18:52:42 EST 2018] Removing short reads containing weak K-mers
[Mon Dec 17 18:54:45 EST 2018] Building the graph
[Mon Dec 17 18:57:41 EST 2018] Preparing the raw long reads temporary files
[Mon Dec 17 18:58:42 EST 2018] Aligning the short reads on the long reads
[Mon Dec 17 20:14:22 EST 2018] Preparing the alignments temporary files
[Mon Dec 17 21:03:31 EST 2018] Generating the corrected long reads
/project/HG-CoLoR/HG-CoLoR: line 254: 19493 Segmentation fault (core dumped) $hgf/bin/CLRgen -t "$tmpdir" -K "$K" -d "$seedsdistance" -o "$seedsoverlap" -k "$k" -b "$branches" -s "$seedskips" -m "$mismatches" -j "$nproc" $tmpdir/"$K-mers.fa.pgsa" > "$out.fasta"

Sometimes it could still generate some corrected reads, but far fewer than in the original data set, which indicates that HG-CoLoR was interrupted at some point in that step. Since this seems to be a random error, I am not sure whether you could reproduce it on your machine. But I could send you some data if you want to investigate it a bit more.

Haowen

morispi avatar morispi commented on June 4, 2024

Hi,

Indeed, HG-CoLoR creates one file per LR, and one file per SR/LR alignment. I mainly did it this way to avoid extreme RAM usage, but didn't think it could cause such problems. I guess I could easily get rid of this constraint by loading all the LRs into memory using a 2-bit encoding (that wouldn't affect the RAM usage too much), and by reworking my multithreading process so that it doesn't need to "explode" the alignment file into multiple files.
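
For readers unfamiliar with the 2-bit encoding mentioned above, here is a minimal sketch of the idea, assuming reads contain only A, C, G and T. The function names are hypothetical and this is not HG-CoLoR's actual code; it only illustrates why packing four bases per byte keeps memory usage low.

```python
# Hypothetical helpers illustrating 2-bit nucleotide packing.
# Assumption: sequences contain only A/C/G/T (no N or IUPAC codes).
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        # A KeyError here means a non-ACGT symbol, which would need separate handling.
        out[i // 4] |= _CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed: bytes, length: int) -> str:
    """Recover the first `length` bases from a packed buffer."""
    return "".join(
        _BASE[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )
```

The original length has to be stored alongside the packed bytes, because the last byte may be only partly filled.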

For the read IDs with stroke or space, this is due to BLASR's behavior. It is the tool responsible for the splitting, and I cannot do much to avoid it, except by adding a script at the beginning of the HG-CoLoR pipeline that would reformat the LRs.
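
A reformatting step of the kind described above could look roughly like the following sketch. It is an assumption about what such a script might do, not part of HG-CoLoR: every FASTA header is replaced with a short sequential ID, which avoids separators and comments altogether. The function name and the `read` prefix are hypothetical.

```python
from typing import Iterable, Iterator


def simplify_fasta_ids(lines: Iterable[str], prefix: str = "read") -> Iterator[str]:
    """Replace every FASTA header with a short sequential ID, dropping comments."""
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            # New header contains neither spaces nor slashes.
            yield f">{prefix}_{count}\n"
        else:
            # Sequence lines are passed through unchanged.
            yield line
```

It can be applied to an open file handle, writing the result to a new FASTA file. If the original names matter later, a mapping from new to old IDs would have to be saved separately.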

About the segfault, my best guess is that you exceeded your stack size (the algorithm uses a lot of backtracking). It also happened to me on a few datasets. You could probably get rid of that segfault by increasing your stack size with, e.g., ulimit -s 65536.

I've also just seen and quickly gone through your preprint on bioRxiv. If you are willing to re-run the experiments on which HG-CoLoR failed, please do tell me so I can quickly fix the problems you ran into. :)
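
Whether the stack size can be raised depends on the hard limit set for the account: an unprivileged process may raise its soft limit only up to the hard limit. The sketch below, using Python's standard `resource` module on a Unix system, checks both limits and raises the soft one as far as permitted. The function names are hypothetical, and the default of 65536 KiB simply mirrors the value suggested above (in bash, `ulimit -s` counts in KiB, while `resource` counts in bytes).

```python
import resource


def clamp_to_hard_limit(requested: int, hard: int) -> int:
    """Return the largest soft limit that can be set without extra privileges."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def raise_stack_limit(requested_kib: int = 65536) -> int:
    """Try to raise the soft stack limit; return the soft limit now in effect (bytes)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    target = clamp_to_hard_limit(requested_kib * 1024, hard)
    # Only raise the limit; never lower an unlimited or already larger soft limit.
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_STACK, (target, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)[0]
```

Resource limits are inherited by child processes, so a program launched from the same process after this call would run with the new soft limit. If the hard limit itself is below the requested value, only an administrator can lift it.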

Best,
Pierre

haowenz avatar haowenz commented on June 4, 2024

Thanks for the reply.

We have submitted the manuscript. But if you could fix the problems, I could try to run it and may add new results in a revised version later.

Thanks,
Haowen

haowenz avatar haowenz commented on June 4, 2024

Got the following error:

HG-CoLoR: line 236: ulimit: stack size: cannot modify limit: Operation not permitted

I guess the limit cannot be changed on my machine.
