Comments (12)

cmdoret commented on August 25, 2024

Hi @Karimi-81,

I am not 100% sure, but I believe you need to explicitly mount your current working directory into the container when using singularity. The command would then be:

singularity run \
    -B $PWD:/data \
    --nv instagraal.sif \
    /data/hic_folder \
    /data/hifiasm_ccs.p_ctg.fa \
    /data/output_folder
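
To illustrate what the -B $PWD:/data bind does to paths, here is a minimal Python sketch. The helper names (to_container_path, build_run_command) and the example host directory are hypothetical and not part of instagraal or singularity; only the flags already shown in the command above are used.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, bind_src, bind_dest="/data"):
    """Translate a host path located under bind_src into the path seen
    inside the container when bind_src is mounted at bind_dest.

    Raises ValueError if host_path is not under bind_src, i.e. the file
    would not be reachable through this bind mount.
    """
    rel = PurePosixPath(host_path).relative_to(PurePosixPath(bind_src))
    return str(PurePosixPath(bind_dest) / rel)


def build_run_command(workdir, image, inputs):
    """Assemble the singularity command as an argument list.

    Every entry of `inputs` is a host path under `workdir`; it is rewritten
    to its /data/... equivalent before being passed to the container.
    """
    cmd = ["singularity", "run", "-B", f"{workdir}:/data", "--nv", image]
    cmd += [to_container_path(p, workdir) for p in inputs]
    return cmd


if __name__ == "__main__":
    workdir = "/home/user/project"  # placeholder for $PWD
    print(build_run_command(
        workdir,
        "instagraal.sif",
        [
            f"{workdir}/hic_folder",
            f"{workdir}/hifiasm_ccs.p_ctg.fa",
            f"{workdir}/output_folder",
        ],
    ))
```

A path outside the bound directory makes to_container_path raise ValueError, which mirrors the fact that such a file is not reachable through this mount.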

from instagraal.

Karimi-81 commented on August 25, 2024

Thank you for your guidance. I followed your instructions and it worked. The program started building the pyramids directory and several level folders, but it finished with the following error:
INFO :: mean frag area = 728.0359497070312
INFO :: N frag duplicated = 0
INFO :: MAX ID CONTIG = 219
INFO :: total mem used by sparse data = 2607.0592
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/tools.py", line 470, in wrapper
    return ctx_dict[cur_ctx][cache_key]
KeyError: <pycuda._driver.Context object at 0x2ac3cf21e660>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/instagraal", line 11, in <module>
    load_entry_point('instagraal', 'console_scripts', 'instagraal')()
  File "/src/instagraal/instagraal/instagraal.py", line 1178, in main
    output_folder=output_folder,
  File "/src/instagraal/instagraal/instagraal.py", line 160, in __init__
    output_folder=output_folder,
  File "/src/instagraal/instagraal/simu_single.py", line 168, in __init__
    self.pos,
  File "/src/instagraal/instagraal/cuda_lib_gl_single.py", line 251, in __init__
    self.setup_all_gpu_struct()
  File "/src/instagraal/instagraal/cuda_lib_gl_single.py", line 371, in setup_all_gpu_struct
    self.gpu_counter_select = ga.zeros(1, dtype=np.int32)
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/gpuarray.py", line 1244, in zeros
    result.fill(zero)
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/gpuarray.py", line 659, in fill
    func = elementwise.get_fill_kernel(self.dtype)
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/tools.py", line 474, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/elementwise.py", line 566, in get_fill_kernel
    "fill",
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/elementwise.py", line 193, in get_elwise_kernel
    arguments, operation, name, keep, options, **kwargs
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/elementwise.py", line 178, in get_elwise_kernel_and_types
    mod = module_builder(arguments, operation, name, keep, options, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/elementwise.py", line 82, in get_elwise_module
    no_extern_c=True,
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/compiler.py", line 358, in __init__
    include_dirs,
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/compiler.py", line 298, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/compiler.py", line 87, in compile_plain
    checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
  File "/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/compiler.py", line 59, in preprocess_source
    "nvcc preprocessing of %s failed" % source_path, cmdline, stderr=stderr
pycuda.driver.CompileError: nvcc preprocessing of /tmp/tmpuqy3jz0w.cu failed
[command: nvcc --preprocess -arch sm_70 -I/usr/local/lib/python3.6/dist-packages/pycuda-2021.1-py3.6-linux-x86_64.egg/pycuda/cuda /tmp/tmpuqy3jz0w.cu --compiler-options -P]
[stderr:
b"nvcc fatal : Could not open output file '/localscratch/karimi81.18994435.0/tmpxft_0002653d_00000000'\n"]

How can I solve this problem? Is it related to memory or to pycuda? I used 4 x NVIDIA V100 SXM2 (16 GB memory) on a GPU node for the analysis. Is it important to load a specific version of CUDA before running the code?

cmdoret commented on August 25, 2024

This is the first time I have seen this issue. I assume you are running in an HPC environment and submitting the task via a job scheduler?
It seems instagraal is unable to access /localscratch, which I assume is the node's $TMP directory. You could try mounting it in the container as well, but it's just a wild guess.

singularity run \
    -B $PWD:/data \
    -B /localscratch:/localscratch \
    --nv instagraal.sif \
    /data/hic_folder \
    /data/hifiasm_ccs.p_ctg.fa \
    /data/output_folder
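
A quick way to test this guess before launching a long job is to check, from inside the container, whether the temporary directory is actually writable. The sketch below is a hedged illustration: the function name scratch_dir_is_writable is hypothetical, and it assumes the scheduler exports a TMPDIR pointing under /localscratch, which the nvcc error suggests but does not confirm.

```python
import tempfile


def scratch_dir_is_writable(path=None):
    """Return True if a temporary file can be created in `path`.

    When `path` is None, the directory chosen by Python's tempfile module
    is used; that choice honours the TMPDIR environment variable.
    """
    target = path or tempfile.gettempdir()
    try:
        # The file is created, then deleted again when the block exits.
        with tempfile.NamedTemporaryFile(dir=target):
            return True
    except OSError:
        # Covers a missing directory as well as missing write permission.
        return False


if __name__ == "__main__":
    print("temp dir:", tempfile.gettempdir())
    print("writable:", scratch_dir_is_writable())
```

If this reports False inside the container but True on the host, the directory is probably not mounted, which would be consistent with the "Could not open output file" message.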

cmdoret commented on August 25, 2024

I have been informed by @fabiengir that singularity's --nv option is only compatible with CUDA 11, but instagraal works with CUDA 10.

Apparently it's not currently possible to get instagraal working in singularity due to this incompatibility. We would need to rework instaGRAAL to be compatible with CUDA 11 to fix that (PR welcome).

It works fine in docker, in case you can use it.

Karimi-81 commented on August 25, 2024

I loaded CUDA 9.2 and also added -B /localscratch:/localscratch to my submitted job, as you suggested. It seems to work, but there is a warning message:
2021-05-07 09:03:55,149 :: INFO :: min fragment length = 0.071
2021-05-07 09:03:55,149 :: INFO :: estimation of the parameters of the model
2021-05-07 09:05:28,016 :: WARNING :: /src/instagraal/instagraal/optim_rippe_curve_update.py:47: RuntimeWarning: invalid value encountered in log

  • (d - 2) / ((np.power((lm * x / kuhn), 2) + d))

2021-05-07 09:05:28,034 :: INFO :: p from estimate parameters = [6.6873623556585935, 0.37393176601804595, -1.0670426891647966, 2, 3869.866840076318]
2021-05-07 09:05:28,034 :: INFO :: mean value trans = 0.004494739586123547
2021-05-07 09:05:28,034 :: INFO :: BEWARE!!! : I will lower mean value trans !!!
2021-05-07 09:05:28,035 :: INFO :: estimate max dist cis trans = 148975.29242700464
2021-05-07 09:05:28,052 :: INFO :: cycle = 0

I wonder whether this warning can affect the output. Also, do you have an estimate of the running time? It has been stuck at cycle = 0 for a long time without any progress. My genome is relatively large (2.6 Gb) but the assembly is of high quality (N50 = 30 Mb with 300 contigs). How long should such an analysis take? Do you have any suggestions to speed it up?

nadegeguiglielmoni commented on August 25, 2024

Hello,

The runtime depends on the genome size, the fragmentation of the assembly, the 'level' parameter, and the GPU. Since you have few contigs and a very nice GPU, these are not the culprits. I would suggest you use the parameter -l 5 (rather than the default of 4), which is better suited to genomes > 1 Gb. It's difficult to estimate how long it will take, but I would expect you to have a result within a week. The first cycles are the slowest. Make sure to run the instagraal-polish module at the end (this step is fast).

Karimi-81 commented on August 25, 2024

Thank you for your support. Since I run the program on a server (as a submitted job), I do not have access to the visualization or movie playback to judge how well each step is doing. I currently check the number of contigs and the N50 at each cycle, but I wonder if it is possible to visualize the Hi-C contact map of the final genome, as can be done with Juicebox. For our final report, we need a Hi-C map showing the final scaffolds.
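
For reference, the per-cycle check of contig count and N50 mentioned above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names are hypothetical, the FASTA path is a placeholder, and N50 is taken as the length of the sequence at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total.

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file, in file order."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    import sys

    # Usage: python n50.py path/to/genome.fasta  (path is a placeholder)
    seq_lengths = read_fasta_lengths(sys.argv[1])
    print("sequences:", len(seq_lengths))
    print("N50:", n50(seq_lengths))
```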

In addition, is instagraal-polish a separate module? How can I run it using docker or singularity?
Best wishes

cmdoret commented on August 25, 2024

Unfortunately, there is no built-in way to view the contact map in instagraal. If you want to view it after a cycle, you need to grab genome.fasta in the output folder and regenerate the contact map using a standard pipeline like you would normally do. If you used hicstuff, you could then run hicstuff view on the matrix to visualize it.

The instagraal-polish command is installed along with the instagraal package. Unfortunately, I realized it was not usable from the container due to a config issue in the dockerfile. I have updated the dockerfile in 4650a2a; if you update your container, it should work.

Alternatively, you could install graal with pip and run instagraal-polish without docker/singularity, as it does not require a GPU or CUDA.

Karimi-81 commented on August 25, 2024

Thank you. Yes, I used hicstuff to generate the input files. I am not sure I understood correctly, but I think I have to add --save-matrix to my command line to save a preview of the contact map after each cycle. Then, presumably, it can be used along with the info_frags.txt of the final genome.fasta to view the final map with hicstuff view?

cmdoret commented on August 25, 2024

You would normally be right, but there is currently an unresolved bug preventing the use of --save-matrix, so you'll need to regenerate the matrix instead.

Karimi-81 commented on August 25, 2024

I would like to thank you and share my experience here. I had a genome assembly of 2.6 Gb with an expected chromosome number of 15. I ran instagraal using bowtie2 and minimap2 alignments separately. Both runs converged after 20 cycles (I let the program run for 25 cycles). Although both results were promising, the bowtie2 output led to 16 large scaffolds, while minimap2 produced exactly the expected 15 scaffolds. In addition, the minimap2 scaffolds covered more of the whole genome. The only problem concerns the final contact map, which contains some white lines, apparently mostly in the centromeric regions. There are more of these gaps in the minimap2 contact map. I wonder whether this is a serious issue or whether I can go ahead with the current assembly. Do you think increasing the number of cycles would help? I have attached the contact map obtained with the minimap2 aligner. Do you have any suggestions to improve the map?
[Attached image: map1000k]
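
One way to put a number on "coverage of the whole genome by the scaffolds" when comparing the two runs is the fraction of the assembly contained in the k largest scaffolds (k = 15 here, the expected chromosome number). The sketch below is an illustration only; the function name and the example lengths are made up and do not come from this assembly.

```python
def fraction_in_largest(lengths, k):
    """Fraction of the total assembly length held by the k longest sequences."""
    if not lengths:
        return 0.0
    top = sorted(lengths, reverse=True)[:k]
    return sum(top) / sum(lengths)


if __name__ == "__main__":
    # Made-up scaffold lengths, for illustration only.
    example = [50, 30, 10, 5, 5]
    print(fraction_in_largest(example, 2))
```

Computing this for both assemblies with k = 15 gives a single comparable value per run.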

cmdoret commented on August 25, 2024

Some of these white bands are most likely repeated sequences where short-read alignments are ambiguous. To my knowledge, this is to be expected in many organisms, especially in centromeric regions.

There may also be long-read-induced indels where Hi-C reads do not map. In my experience, if you have shotgun reads, running pilon polishing on top of the instagraal assembly can improve it a bit.
