gsneha26 / segalign
A Scalable GPU-Based Whole Genome Aligner, published in SC20: https://doi.ieeecomputersociety.org/10.1109/SC41405.2020.00043
License: MIT License
When I run the human/chimp test on Terra, I get much smaller output than on AWS (all with a 6G chunk size):
AWS cigar: 844M
Terra cigar: 131M
The runtimes are shorter too. Here are the 5 commands (time at right) on
AWS
2020-05-26 16:05:36.012853: Successfully ran the command: "run_wga_gpu /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpo4v16szp/3d842793-a906-4be8-b69a-45d9e4a37b6e/tmpvfj83knk.tmp /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpo4v16szp/3d842793-a906-4be8-b69a-45d9e4a37b6e/tmpo_g4f3gr.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 276.4431502819061 seconds
2020-05-26 16:42:10.981833: Successfully ran the command: "run_wga_gpu /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpdhim63d0/9c0c4c18-4dc7-4079-8bee-9a0549dc4f64/tmpeqpkq3er.tmp /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpdhim63d0/9c0c4c18-4dc7-4079-8bee-9a0549dc4f64/tmp99ukp9hr.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 2190.4813838005066 seconds
2020-05-26 16:43:31.858975: Successfully ran the command: "run_wga_gpu /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpe9bac_96/d4d206a9-b301-40b9-9445-6857779b849d/tmp_6q4bopn.tmp /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpe9bac_96/d4d206a9-b301-40b9-9445-6857779b849d/tmp_6q4bopn.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 35.484325647354126 seconds
2020-05-26 16:47:27.691592: Successfully ran the command: "run_wga_gpu /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmps5ew_ndz/a453b904-56c6-46e1-8d83-1cc67adb07c0/tmpkc8wp55_.tmp /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmps5ew_ndz/a453b904-56c6-46e1-8d83-1cc67adb07c0/tmpuoyplo0i.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 234.43672251701355 seconds
2020-05-26 17:56:35.524303: Successfully ran the command: "run_wga_gpu /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmppqg4r1ev/3bea06ae-646e-4905-90d2-52c4670b2bac/tmpiwn8h081.tmp /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmppqg4r1ev/3bea06ae-646e-4905-90d2-52c4670b2bac/tmpiwn8h081.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 4140.146150350571 seconds
Terra
2020-05-27 23:38:53.182083: Successfully ran the command: "run_wga_gpu /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmphmdn9ppu/4977b907-7b21-435f-a4af-e1ab28c85726/tmpacxyj1_a.tmp /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmphmdn9ppu/4977b907-7b21-435f-a4af-e1ab28c85726/tmp4pbuo9yw.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 314.20388889312744 seconds
2020-05-27 23:39:57.703589: Successfully ran the command: "run_wga_gpu /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpp9e0skjr/18df358c-c68a-4881-b98d-7af5e666da36/tmpfrewxuu_.tmp /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpp9e0skjr/18df358c-c68a-4881-b98d-7af5e666da36/tmpfrewxuu_.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 50.41674566268921 seconds
2020-05-27 23:45:48.559084: Successfully ran the command: "run_wga_gpu /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpq_kexxzf/2665bd91-b193-405b-8fd0-a03af6075f2d/tmphlklypmc.tmp /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpq_kexxzf/2665bd91-b193-405b-8fd0-a03af6075f2d/tmpb2hm2b86.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 348.9945948123932 seconds
2020-05-27 23:50:45.916521: Successfully ran the command: "run_wga_gpu /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmp_aus820b/79cdbba8-a350-4e48-8865-15c5f848674c/tmpe9gz97pg.tmp /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmp_aus820b/79cdbba8-a350-4e48-8865-15c5f848674c/tmp4gbznbdu.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 283.46080327033997 seconds
2020-05-27 23:55:59.658561: Successfully ran the command: "run_wga_gpu /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpollz8rah/ed503f44-88d1-4f00-9eac-c9605facbee5/tmpsbzg0eao.tmp /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpollz8rah/ed503f44-88d1-4f00-9eac-c9605facbee5/tmpsbzg0eao.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 309.87610483169556 seconds
It could be that there's something at the cactus level that is not passing in the same data. But as far as I can tell, this isn't the case. The input file sizes seem equivalent, and not obviously corrupt. I am continuing to debug. I am specifying similar hardware between the two.
It could also be that something is failing or crashing or being evicted by the host system and run_wga_gpu is not detecting this. Grepping for a 'FAILURE' message from lastz seems particularly fragile. Even though lastz seems very diligent about catching error cases with this message, I don't think it would be possible to catch them all.
For running on the cloud, I would really be much more comfortable if we could somehow verify the exit code of each lastz command.
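As an illustration of the request above, here is a minimal bash sketch that runs several commands in parallel and reports failure if any one of them exits non-zero, instead of searching their output for a message. The helper name run_all is hypothetical and is not part of run_wga_gpu; how the real script launches lastz is not shown in this report.

```shell
#!/usr/bin/env bash
# Sketch: run several commands in parallel and fail if any one of them fails.
# run_all is a hypothetical helper, not part of run_wga_gpu.
run_all() {
    local pids=() failed=0 cmd pid
    for cmd in "$@"; do
        bash -c "$cmd" &          # start each command in the background
        pids+=("$!")              # remember its process ID
    done
    for pid in "${pids[@]}"; do
        # "wait <pid>" returns that job's exit status
        # (128+N if the job was killed by signal N).
        if ! wait "$pid"; then
            failed=1
        fi
    done
    return "$failed"
}
```

A caller could then write something like `run_all "cmd1 ..." "cmd2 ..." || exit 1`, so that a crashed or killed command is never mistaken for success.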
AWS data:
s3://glennhickey/share/cactus-blast-gpu-may26.tar.gz
Terra data:
s3://glennhickey/share/cactus-blast-gpu-may28-terra
s3://glennhickey/share/cactus-blast-gpu-may28-terra.log
Here's a small example
run_wga_gpu sadf asdfasdf
(crashes)
echo $?
0
This makes it very difficult to use this within a larger script (i.e. cactus), as there's no way to detect errors. I think the wga binary is okay now for exit codes, but the run_wga_gpu script gobbles it up and always returns 0.
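One common way a wrapper script loses an exit status is by running cleanup commands after the main program, so that the script ends with the status of the last cleanup step. The sketch below shows the usual fix, assuming that is what happens here: save `$?` immediately, clean up, then return the saved value. The helper run_and_propagate and the SCRATCH_DIR variable are hypothetical and are not taken from the actual run_wga_gpu script.

```shell
#!/usr/bin/env bash
# Sketch: run a command, clean up afterwards, and still report the command's status.
# run_and_propagate and SCRATCH_DIR are hypothetical names used for illustration.
run_and_propagate() {
    local scratch status
    scratch=$(mktemp -d) || return 1   # scratch directory for temporary files
    SCRATCH_DIR="$scratch" "$@"        # run the wrapped command
    status=$?                          # save its status before anything else runs
    rm -rf -- "$scratch"               # cleanup must not overwrite the saved status
    return "$status"
}
```

With this pattern, `run_and_propagate some_binary args...` returns whatever the binary returned, including 128+N when it is killed by signal N.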
On a related note, it will be necessary to be able to specify the number of cores and gpus on the command line for Cactus to properly keep track of resources used. Right now it just uses everything on the system.
Thanks!
I think this has come up before, but it'd be immensely helpful to add a bit of information in the README
Some basic file commands such as grep and rm in the run_segalign script are called without checking if the input files exist. As a result, the script crashes and the Cactus execution fails (log attached below).
I was wondering whether the files named *.err, *.segments, *.plus.*, and *.minus.* must be created every time the segalign script is called.
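A sketch of one way to make such cleanup tolerant of missing files, using bash's nullglob option so that a pattern with no matches expands to nothing. The helper cleanup_outputs is hypothetical; the patterns are the ones named in this report, and the real script may handle them differently.

```shell
#!/usr/bin/env bash
# Sketch: clean up optional output files without failing when none exist.
# cleanup_outputs is a hypothetical helper; it is not part of run_segalign.
cleanup_outputs() {
    local dir=$1 files
    (
        cd "$dir" || exit 1
        shopt -s nullglob                # unmatched patterns expand to nothing
        files=( *.err )
        if (( ${#files[@]} > 0 )); then
            cat -- "${files[@]}" >&2     # surface any recorded error text
        fi
        files=( *.err *.segments *.plus.* *.minus.* )
        if (( ${#files[@]} > 0 )); then
            rm -f -- "${files[@]}"       # only call rm when there is something to remove
        fi
    )
}
```

The subshell keeps both the `cd` and the `nullglob` setting from leaking into the caller.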
[Update]
The input tmp2e_55g2q.tmp and tmprx912f4d.tmp files are available at https://www.dropbox.com/sh/9bf6o0tij7drafm/AACdWCSz6nkbW7hy6zbpDgeea?dl=0
Cheers,
Thiago
Log from job "kind-RunBlast/instance-v7g6i_gt" follows:
=========>
[2021-03-08T18:54:14+0000] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2021-03-08T18:54:14+0000] [MainThread] [I] [toil] Running Toil version 5.2.0-047d0c4f2949c576c80e452a0807c5be6355c63d on host tf-cactus-slurm-compute-3-0.
[2021-03-08T18:54:14+0000] [MainThread] [D] [toil] Configuration: {'workflowID': '5d8e7caf-fad5-4d3b-9303-ae20634083f8', 'workflowAttemptNumber': 0, 'jobStore': 'file:/home/thiagogenez_ebi_ac_uk/viruses/run/jobStore', 'logLevel': 'Debug', 'workDir': None, 'noStdOutErr': False, 'stats': False, 'clean': 'onSuccess', 'cleanWorkDir': 'never', 'clusterStats': None, 'restart': False, 'batchSystem': 'single_machine', 'disableAutoDeployment': False, 'environment': {}, 'statePollingWait': 1, 'maxLocalJobs': 8, 'manualMemArgs': False, 'parasolCommand': 'parasol', 'parasolMaxBatches': 1000, 'scale': 1.0, 'linkImports': True, 'moveExports': False, 'mesosMasterAddress': '10.0.0.12:5050', 'allocate_mem': True, 'kubernetesHostPath': None, 'provisioner': None, 'nodeTypes': [], 'minNodes': None, 'maxNodes': [10], 'targetTime': 1800, 'betaInertia': 0.1, 'scaleInterval': 60, 'preemptableCompensation': 0.0, 'nodeStorage': 50, 'nodeStorageOverrides': [], 'metrics': False, 'maxPreemptableServiceJobs': 9223372036854775807, 'maxServiceJobs': 9223372036854775807, 'deadlockWait': 3600, 'deadlockCheckInterval': 30, 'defaultMemory': 2147483648, 'defaultCores': 1, 'defaultDisk': 2147483648, 'readGlobalFileMutableByDefault': False, 'defaultPreemptable': False, 'maxCores': 9223372036854775807, 'maxMemory': 9223372036854775807, 'maxDisk': 9223372036854775807, 'retryCount': 5, 'enableUnlimitedPreemptableRetries': False, 'doubleMem': False, 'maxJobDuration': 9223372036854775807, 'rescueJobsFrequency': 3600, 'disableCaching': True, 'disableChaining': True, 'disableJobStoreChecksumVerification': False, 'maxLogFileSize': 64000, 'writeLogs': None, 'writeLogsGzip': None, 'writeLogsFromAllJobs': False, 'sseKey': None, 'servicePollingInterval': 60, 'useAsync': True, 'forceDockerAppliance': False, 'runCwlInternalJobsOnWorkers': False, 'statusWait': 3600, 'disableProgress': False, 'debugWorker': False, 'disableWorkerOutputCapture': False, 'badWorker': 0.0, 'badWorkerFailInterval': 0.01, 'cwl': False}
[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.deferred] Running for file /tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/deferred/funceiihipwj
[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.worker] Parsed job description
[2021-03-08T18:54:14+0000] [MainThread] [I] [toil.worker] Working on job 'RunBlast' kind-RunBlast/instance-v7g6i_gt
[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.worker] Got a command to run: _toil files/for-job/kind-RunBlast/instance-v7g6i_gt/cleanup/file-ec7bc1a3a798485ab637ad8f261e43e4/stream /home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages cactus.blast.blast True
[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.job] Loading user module ModuleDescriptor(dirPath='/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages', name='cactus.blast.blast', fromVirtualEnv=True).
[2021-03-08T18:54:14+0000] [MainThread] [I] [toil.worker] Loaded body Job('RunBlast' kind-RunBlast/instance-v7g6i_gt) from description 'RunBlast' kind-RunBlast/instance-v7g6i_gt
[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.deferred] Running orphaned deferred functions
[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.deferred] Running job
[2021-03-08T18:54:14+0000] [MainThread] [I] [cactus.shared.common] Docker work dir: /tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c
[2021-03-08T18:54:14+0000] [MainThread] [I] [cactus.shared.common] Running the command ['singularity', '--silent', 'run', '--nv', '/home/thiagogenez_ebi_ac_uk/viruses/run/cactus.img', 'run_segalign', 'tmp2e_55g2q.tmp', 'tmprx912f4d.tmp', '--format=cigar', '--notrivial', '--step=2', '--ambiguous=iupac,100,100', '--ydrop=3000']
[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.statsAndLogging] Suppressing the following loggers: {'websocket', 'bcdocs', 'urllib3', 'sonLib', 'google', 'requests_oauthlib', 'humanfriendly', 'galaxy', 'dill', 'prov', 'oauthlib', 'kubernetes', 'cactus', 'botocore', 'salad', 'boto', 'cachecontrol', 'rdflib', 'boto3', 'docker', 'requests'}
[2021-03-08T18:54:14+0000] [MainThread] [I] [toil-rt] 2021-03-08 18:54:14.635398: Running the command: "singularity --silent run --nv /home/thiagogenez_ebi_ac_uk/viruses/run/cactus.img run_segalign tmp2e_55g2q.tmp tmprx912f4d.tmp --format=cigar --notrivial --step=2 --ambiguous=iupac,100,100 --ydrop=3000"
/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_GB.UTF-8)
Converting fasta files to 2bit format
Executing: "segalign /tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c/tmp2e_55g2q.tmp /tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c/tmprx912f4d.tmp /tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c/output_11930/data_5519/ --format=cigar --notrivial --step=2 --ambiguous=iupac,100,100 --ydrop=3000"
Using 8 threads
Using 1 GPU(s)
Reading query file ...
Reading target file ...
Start alignment ...
Sending reference block 0 ...
Sending query block 0 with buffer 0 ...
Query block 0, interval 1/1 (0:28184) with buffer 0
real 0m1.184s
user 0m0.106s
sys 0m1.012s
grep: *.err: No such file or directory
rm: cannot remove '*.segments': No such file or directory
[2021-03-08T18:54:16+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2021-03-08T18:54:16+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-BlastSequencesAgainstEachOther/instance-7jk5yz3n/cleanup/file-e798bb1571b344409a9fdef384794920/0' to path '/tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c/tmp2e_55g2q.tmp'
[2021-03-08T18:54:16+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-BlastSequencesAgainstEachOther/instance-7jk5yz3n/cleanup/file-7a446f73b95d4f5f938fc2ba16ebb915/0' to path '/tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c/tmprx912f4d.tmp'
[2021-03-08T18:54:16+0000] [MainThread] [D] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: Job files/for-job/kind-RunBlast/instance-v7g6i_gt/cleanup/file-ec7bc1a3a798485ab637ad8f261e43e4/stream used 0.00% (48.0 KB [49152B] used, 1.0 GB [1677721600B] requested) at the end of its run.
[2021-03-08T18:54:16+0000] [MainThread] [D] [toil.deferred] Running own deferred functions
[2021-03-08T18:54:16+0000] [MainThread] [D] [toil.deferred] Out of deferred functions!
[2021-03-08T18:54:16+0000] [MainThread] [D] [toil.deferred] Running orphaned deferred functions
Traceback (most recent call last):
File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/toil/worker.py", line 394, in workerScript
job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/cactus/shared/common.py", line 1424, in _runner
fileStore=fileStore, **kwargs)
File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/toil/job.py", line 2359, in _runner
returnValues = self._run(jobGraph=None, fileStore=fileStore)
File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/toil/job.py", line 2280, in _run
return self.run(fileStore)
File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/cactus/blast/blast.py", line 466, in run
gpuLastz = self.blastOptions.gpuLastz)
File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/cactus/shared/common.py", line 833, in runLastz
parameters=[lastzCommand, seq1, seq2, "--format=cigar", "--notrivial"] + lastzArguments.split())
File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/cactus/shared/common.py", line 1357, in cactus_call
raise RuntimeError("Command {} exited {}: {}".format(call, process.returncode, out))
RuntimeError: Command ['singularity', '--silent', 'run', '--nv', '/home/thiagogenez_ebi_ac_uk/viruses/run/cactus.img', 'run_segalign', 'tmp2e_55g2q.tmp', 'tmprx912f4d.tmp', '--format=cigar', '--notrivial', '--step=2', '--ambiguous=iupac,100,100', '--ydrop=3000'] exited 1: stdout=None
[2021-03-08T18:54:16+0000] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host tf-cactus-slurm-compute-3-0
<=========
This is the same command line and overall dataset as #53. While the patch for #53 worked and let most genomes go through (thanks!), there's still at least one problem. I will share the inputs offline, but the error message is
Chromosome block 2 interval 329/333 (2982000000:2985000000) with ref (570705458:1143705458) rc (4166672753:4169672753)
Chromosome block 2 interval 331/333 (2988000000:2991000000) with ref (570705458:1143705458) rc (4160672753:4163672753)
Chromosome block 2 interval 330/333 (2985000000:2988000000) with ref (570705458:1143705458) rc (4163672753:4166672753)
Chromosome block 2 interval 333/333 (2994000000:2996999981) with ref (570705458:1143705458) rc (4154672772:4157672753)
Chromosome block 2 interval 332/333 (2991000000:2994000000) with ref (570705458:1143705458) rc (4157672753:4160672753)
terminate called after throwing an instance of 'thrust::system::system_error'
terminate called recursively
terminate called recursively
terminate called recursively
what(): CUDA free failed: cudaErrorIllegalAddress: an illegal memory access was encountered
terminate called recursively
Command terminated by signal 6
Here is a piece of the cactus log. The segalign_repeat_masker fails, but run_segalign_repeat_masker exits with 0 and Cactus never knows. Luckily, cactus_covered_intervals crashed right after in this case, but if it hadn't I might never have known.
Executing: "segalign_repeat_masker /mnt/hdd/node-1a325814-9183-453f-bd00-b76a97486c1a-d4f46d3f-b448-4fcd-b192-6ef2c7deeb7c/tmp2rftgtwd/b438bd93-6b2d-4f7c-90e3-d885f7aefa2c/tmpc8hyqz3a.tmp --lastz_interval=3000000 --markend --neighbor_proportion 0.2 --step=3 --ambiguous=iupac,100,100 --nogapped"
Using 64 threads
Using 8 GPU(s)
Reading target file ...
Start alignment ...
Sending block 0 ...
Chromosome block 0 interval 1/333 (0:3000000) with ref (0:432000000) rc (2155712149:2158712149)
...
Chromosome block 0 interval 108/333 (321000000:324000000) with ref (105000000:537000000) rc (1834712149:1837712149)
Chromosome block 0 interval 109/333 (324000000:327000000) with ref (108000000:540000000) rc (1831712149:1834712149)
/usr/local/bin/run_segalign_repeat_masker: line 120: 461 Killed stdbuf -oL segalign_repeat_masker $refPath $optionalArguments
real 40m4.440s
user 97m8.070s
sys 39m48.293s
real 57m25.945s
user 201m34.776s
sys 47m54.629s
INFO:toil-rt:2020-08-25 23:36:24.984006: Successfully ran the command: "run_segalign_repeat_masker /mnt/hdd/node-1a325814-9183-453f-bd00-b76a97486c1a-d4f46d3f-b448-4fcd-b192-6ef2c7deeb7c/tmp2rftgtwd/b438bd93-6b2d-4f7c-90e3-d885f7aefa2c/tmpc8hyqz3a.tmp --lastz_interval=3000000 --markend --neighbor_proportion 0.2 --step=3 --ambiguous=iupac,100,100 --nogapped" in 7460.357779741287 seconds
FWIW The input file is "https://cgl-assemblies.s3.amazonaws.com/dipOrd1.fa", and I think it's crashing because I didn't give it enough disk.
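For reference, bash reports a child that was killed by signal N with exit status 128+N, so the "Killed" line in the log above (SIGKILL, signal 9) would correspond to status 137 if the wrapper passed it through. A small sketch for decoding a status; describe_status is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: turn a shell exit status into a readable description.
# describe_status is a hypothetical helper used only for illustration.
describe_status() {
    local status=$1
    if (( status == 0 )); then
        echo "ok"
    elif (( status > 128 )); then
        # Shell convention: 128 + signal number for signal-terminated children.
        echo "killed by signal $(( status - 128 ))"
    else
        echo "failed with exit code $status"
    fi
}
```

A workflow engine that receives 137 or 134 rather than 0 can then tell an out-of-memory kill or an abort apart from a clean run.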
Glenn Hickey wanted me to report this issue I'm seeing while running ComparativeGenomicsToolkit/cactus on Terra. It looks like there is an issue in SegAlign causing the crash. I've attached the full log of errors I'm seeing when running.
Hi. I am trying to test the tool but when I run the last command:
run_segalign ce11.fa cb4.fa --output=ce11.cb4.maf
I get the following error:
stdbuf: failed to run command ‘segalign’: No such file or directory
I don't really understand what the problem is.
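This message usually means that stdbuf could not find an executable named segalign on the PATH, which would be consistent with the build or install step not having completed, although the report does not confirm that. A sketch of a check that fails early with a clearer message; require_cmd is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: fail early with a clear message if a required program is not on PATH.
# require_cmd is a hypothetical helper name.
require_cmd() {
    local name
    for name in "$@"; do
        if ! command -v "$name" >/dev/null 2>&1; then
            echo "error: '$name' not found on PATH" >&2
            return 127               # conventional "command not found" status
        fi
    done
}
```

Running `require_cmd segalign stdbuf` before the alignment would show directly which program is missing.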
Here are some rough benchmarks for a human-chimp alignment with gorilla used as an outgroup. They were obtained by using cactus-blast --root hc on a tree like ((hg38:0.01,panTro6:0.01)hc:0.11,gorGor5:0.01)Anc0; with sequences coming from the 200M hal (already masked and preprocessed).
times
# Method / Wall Time / Chunk size / # blast commands / total blast time
CPU / 11.7h / 25M / 63126 / 10.8h
GPU / 2.3h / 3G / 12 / 1.7h
GPU / 2.5h / 6G / 5 / 1.9h
GPU / 2.6h / 1G / 56 / 2h
spot market cost
CPU: 11.7h @ $1.13/hr (r5n.16xlarge) = $13.2
GPU-3G: 2.3h @ $7.30/hr (p3.16xlarge) = $16.8
human-chimp ancestral sequence length
CPU: 2804406856
GPU: 2803489026
coverage of human against chimp (10M bases sampled):
CPU: 8979808
GPU: 8974227
In summary: On these 64 core nodes, GPU is about 5X faster than CPU. It comes at a slight cost increase. There is slightly less aligned on the GPU than CPU, but this could be within the random variance we expect between cactus runs.
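The cost and speedup figures above can be rechecked with a few lines of arithmetic. The inputs are copied from the tables in this post; cost and speedup are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Recompute the cost and speedup figures quoted above (inputs copied from the tables).
cost()    { LC_ALL=C awk -v h="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", h * rate }'; }
speedup() { LC_ALL=C awk -v a="$1" -v b="$2"   'BEGIN { printf "%.1f\n", a / b }'; }

cost 11.7 1.13      # CPU: hours x $/hr on r5n.16xlarge
cost 2.3 7.30       # GPU with 3G chunks: hours x $/hr on p3.16xlarge
speedup 11.7 2.3    # CPU wall time divided by GPU wall time
```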
The fact that the 3G chunksize was faster than both 1G and 6G is somewhat perplexing. It could very well be related to this: 18f53dc
One caveat: I turned off Pecan realignment for all this. It was only in running these big files that I realized there's work to be done in adapting this code for the bigger chunks used by gpu.
WGA_GPU Commits:
3G chunk size:
4a970d3
1G and 6G chunk size:
72930be
Cactus Commit:
ComparativeGenomicsToolkit/cactus@78f7b28
This is with d1a73a0 on an Ubuntu 18.04 p3.16xlarge AWS instance.
It appears to work with d5fd293. So it's definitely a regression related to changes this June for the overflow bugs in the repeatmasker.
Just looking at these commits, it would seem that the changes to the repeat masker here: d1a73a0 would also need to be applied to run_segalign??
It is very quickly reproduced:
wget -q http://public.gi.ucsc.edu/~hickey/debug/segalign_debug/hg38_without_alts_preprocessed.fa.pp.gz
wget -q http://public.gi.ucsc.edu/~hickey/debug/segalign_debug/panTro6_preprocessed.fa.pp.gz
gzip -d hg38_without_alts_preprocessed.fa.pp.gz
gzip -d panTro6_preprocessed.fa.pp.gz
The segalign command
run_segalign panTro6_preprocessed.fa.pp hg38_without_alts_preprocessed.fa.pp --format=paf:minimap2 --step=2 --ambiguous=iupac,100,100 --ydrop=3000 --notransition
The crash
terminate called after throwing an instance of 'thrust::system::system_error'
what(): trivial_device_copy D->H failed: cudaErrorInvalidValue: invalid argument
/usr/local/bin/run_segalign: line 197: 35103 Aborted (core dumped) stdbuf -oL segalign $refPath $queryPath $DATA_FOLDER $optionalArguments
But run_segalign still returns 0!
echo $?
0
The full log
Converting fasta files to 2bit format
Executing: "segalign /home/ubuntu/work/panTro6_preprocessed.fa.pp /home/ubuntu/work/hg38_without_alts_preprocessed.fa.pp /home/ubuntu/work/output_13442/data_23960/ --format=paf:minimap2 --step=2 --ambiguous=iupac,100,100 --ydrop=3000 --notransition"
Using 64 threads
Using 8 GPU(s)
Reading query file ...
Reading target file ...
Start alignment ...
Sending reference block 0 ...
Sending query block 0 with buffer 0 ...
Sending query block 1 with buffer 1 ...
Query block 0, interval 1/52 (0:10000000) with buffer 0
Query block 0, interval 3/52 (20000000:30000000) with buffer 0
Query block 0, interval 7/52 (60000000:70000000) with buffer 0
Query block 0, interval 10/52 (90000000:100000000) with buffer 0
Query block 0, interval 14/52 (130000000:140000000) with buffer 0
Query block 0, interval 17/52 (160000000:170000000) with buffer 0
Query block 0, interval 22/52 (210000000:220000000) with buffer 0
Query block 0, interval 26/52 (250000000:260000000) with buffer 0
Query block 0, interval 30/52 (290000000:300000000) with buffer 0
Query block 0, interval 35/52 (340000000:350000000) with buffer 0
Query block 0, interval 39/52 (380000000:390000000) with buffer 0
Query block 0, interval 42/52 (410000000:420000000) with buffer 0
Query block 0, interval 46/52 (450000000:460000000) with buffer 0
Query block 0, interval 48/52 (470000000:480000000) with buffer 0
Query block 0, interval 50/52 (490000000:500000000) with buffer 0
Query block 0, interval 2/52 (10000000:20000000) with buffer 0
Query block 0, interval 18/52 (170000000:180000000) with buffer 0
Query block 1, interval 4/60 (30000000:40000000) with buffer 1
Query block 0, interval 19/52 (180000000:190000000) with buffer 0
Query block 1, interval 10/60 (90000000:100000000) with buffer 1
Query block 0, interval 21/52 (200000000:210000000) with buffer 0
Query block 0, interval 23/52 (220000000:230000000) with buffer 0
Query block 0, interval 4/52 (30000000:40000000) with buffer 0
Query block 0, interval 24/52 (230000000:240000000) with buffer 0
Query block 0, interval 25/52 (240000000:250000000) with buffer 0
Query block 1, interval 12/60 (110000000:120000000) with buffer 1
Query block 0, interval 28/52 (270000000:280000000) with buffer 0
Query block 0, interval 8/52 (70000000:80000000) with buffer 0
Query block 0, interval 29/52 (280000000:290000000) with buffer 0
Query block 0, interval 31/52 (300000000:310000000) with buffer 0
Query block 0, interval 32/52 (310000000:320000000) with buffer 0
Query block 0, interval 9/52 (80000000:90000000) with buffer 0
Query block 0, interval 33/52 (320000000:330000000) with buffer 0
Query block 0, interval 34/52 (330000000:340000000) with buffer 0
Query block 0, interval 36/52 (350000000:360000000) with buffer 0
Query block 0, interval 5/52 (40000000:50000000) with buffer 0
Query block 0, interval 37/52 (360000000:370000000) with buffer 0
Query block 0, interval 38/52 (370000000:380000000) with buffer 0
Query block 0, interval 40/52 (390000000:400000000) with buffer 0
Query block 0, interval 11/52 (100000000:110000000) with buffer 0
Query block 0, interval 41/52 (400000000:410000000) with buffer 0
Query block 0, interval 43/52 (420000000:430000000) with buffer 0
Query block 0, interval 12/52 (110000000:120000000) with buffer 0
Query block 0, interval 44/52 (430000000:440000000) with buffer 0
Query block 0, interval 45/52 (440000000:450000000) with buffer 0
Query block 0, interval 13/52 (120000000:130000000) with buffer 0
Query block 0, interval 47/52 (460000000:470000000) with buffer 0
Query block 0, interval 15/52 (140000000:150000000) with buffer 0
Query block 0, interval 49/52 (480000000:490000000) with buffer 0
Query block 0, interval 16/52 (150000000:160000000) with buffer 0
Query block 0, interval 51/52 (500000000:510000000) with buffer 0
Query block 0, interval 52/52 (510000000:510113926) with buffer 0
Query block 1, interval 1/60 (0:10000000) with buffer 1
Query block 1, interval 2/60 (10000000:20000000) with buffer 1
Query block 1, interval 3/60 (20000000:30000000) with buffer 1
Query block 1, interval 5/60 (40000000:50000000) with buffer 1
Query block 1, interval 6/60 (50000000:60000000) with buffer 1
Query block 1, interval 7/60 (60000000:70000000) with buffer 1
Query block 0, interval 6/52 (50000000:60000000) with buffer 0
Query block 1, interval 8/60 (70000000:80000000) with buffer 1
Query block 1, interval 9/60 (80000000:90000000) with buffer 1
Query block 0, interval 20/52 (190000000:200000000) with buffer 0
Query block 1, interval 11/60 (100000000:110000000) with buffer 1
Query block 0, interval 27/52 (260000000:270000000) with buffer 0
terminate called after throwing an instance of 'thrust::system::system_error'
what(): trivial_device_copy D->H failed: cudaErrorInvalidValue: invalid argument
/usr/local/bin/run_segalign: line 197: 35103 Aborted (core dumped) stdbuf -oL segalign $refPath $queryPath $DATA_FOLDER $optionalArguments
real 2m19.050s
user 1m27.737s
sys 1m50.123s
real 2m19.072s
user 2m25.897s
sys 1m53.887s
No alignment generated
It now stops with make: *** No rule to make target 'install'. Stop.
I'm working to make cactus run on arbitrary cigar output ComparativeGenomicsToolkit/cactus#178. It would also be nice to be able to just drop in WGA_GPU as a replacement to cPecanLastz in cactus. To do this, it would need to support lastz's interface. Here is an example.
cat small-cactus.tmp
>id=0|simMouse.chr6|0
TTTTTCAGTTGCAATACCCAACCGGGAGAAACTTTCAGTGAGCACACCTCAGGTTCCTATATCAAGCAGGCAGTCTTGCATAGCAAATGGTCTCTGGTAG
ACGGTGCACTCAATCTATGTGAGGTATAGAAAATAAAGGACTACACACATCTCATCAAGTATCCCGTCATATTTGTGGCAAAACACACGTACAAATGCAC
ACTTGATGGTACTTGCCTGGAATATGACTCTAGGTTGATCCCTGGCACACAGGCACATTAATTCCCGAATGATGGTCTGCCTGTCCAGTTCTAGATAATG
>id=1|simRat.chr6|0
TTTTCGGCTGCAATACCCAACCTGAAGACATTTTCAGTGGGCCCACCTCAGGTTCTTATATCAAGCAGACAGTCTTGCGCAACAGATGGTCTCTGATAGA
CAGTGCACTCAATCTATGTGAAGGATAGAAAACAAAGGACTACTCATCTCATCAAGTATCCTGTCGTATTTGTGCTTTAGACTCAGCAAAACAAGTGTGC
AAGTGCACACCTGATTGTACTTGTCTGGAATATGACTCTAGGTTGAGCCCTGGCACACACTCAGGTCAACTCCAATGATGGTCTGCTTGTCCAATTCTAG
~/dev/cactus/bin/cPecanLastz ./small-cactus.tmp[multiple][nameparse=darkspace] ./small-cactus.tmp[nameparse=darkspace] --format=cigar --notrivial --step=1 --ambiguous=iupac,100,100 --ydrop=3000
cigar: id=0|simMouse.chr6|0 1 294 + id=1|simRat.chr6|0 0 300 + 19928 M 141 I 2 M 33 D 11 M 88 I 2 M 27
cigar: id=1|simRat.chr6|0 0 300 + id=0|simMouse.chr6|0 1 294 + 19928 M 141 D 2 M 33 I 11 M 88 D 2 M 27
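When comparing output from a replacement aligner against records like the two above, a quick consistency check may help. In both records shown, the M and I lengths sum to the span of the first sequence and the M and D lengths sum to the span of the second. The sketch below recomputes those sums for each cigar line read from standard input; cigar_spans is a hypothetical helper, and the field positions are assumed from the example lines rather than from a format specification.

```shell
#!/usr/bin/env bash
# Sketch: sum the M/I/D run lengths of each "cigar:" line and compare them
# with the coordinate spans on the same line. cigar_spans is a hypothetical helper.
cigar_spans() {
    LC_ALL=C awk '
    $1 == "cigar:" {
        # Fields 3-4 and 7-8 hold start/end of the first and second sequence.
        qspan = $4 - $3; if (qspan < 0) qspan = -qspan
        tspan = $8 - $7; if (tspan < 0) tspan = -tspan
        m = 0; i = 0; d = 0
        # Operations start at field 11 as (letter, length) pairs.
        for (k = 11; k < NF; k += 2) {
            if ($k == "M") m += $(k + 1)
            else if ($k == "I") i += $(k + 1)
            else if ($k == "D") d += $(k + 1)
        }
        print "query_span=" qspan, "M+I=" (m + i), "target_span=" tspan, "M+D=" (m + d)
    }'
}
```

Usage would be along the lines of `some_aligner ... --format=cigar | cigar_spans`, flagging any line where the paired numbers disagree.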
From what I can see, this requires
[multiple][nameparse=darkspace]
--notrivial
| in sequence names
Right now run_wga_gpu uses $(nproc) threads. This makes it difficult to use from workflows such as toil or cactus. Can you please add an option to specify the threads on the command line? A similar option to specify the number of GPUs would also be helpful. Thanks.
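Until such options exist, a possible workaround (an assumption, not a run_wga_gpu feature) is to restrict what the script can see from the outside. GNU nproc reports only the CPUs in the process's affinity mask, which taskset from util-linux can set, and CUDA only enumerates the devices listed in CUDA_VISIBLE_DEVICES. The helper limit_resources below is hypothetical.

```shell
#!/usr/bin/env bash
# Workaround sketch (not a run_wga_gpu feature): limit the CPUs and GPUs a command sees.
# limit_resources is a hypothetical helper name.
#   $1: CPU list for taskset, e.g. 0-7
#   $2: GPU list for CUDA_VISIBLE_DEVICES, e.g. 0,1
#   remaining arguments: the command to run
limit_resources() {
    local cpus=$1 gpus=$2
    shift 2
    CUDA_VISIBLE_DEVICES="$gpus" taskset -c "$cpus" "$@"
}
```

For example, `limit_resources 0-7 0,1 run_wga_gpu <target> <query> ...` would be expected to make `$(nproc)` evaluate to 8 inside the script and expose two GPUs, provided the script does not override either setting.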
On a p3.8xlarge:
run_wga_gpu tmp9_wlnwgk.tmp tmp9_wlnwgk.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000
gives
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
Splitting query chromosome
Converting chromosome wise fasta to 2bit format
Executing: "wga tmp9_wlnwgk.tmp tmp9_wlnwgk.tmp /home/ubuntu/output_19346/data_26790/ --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000"
Using 32 threads
Using 4 GPU(s)
Reading query file ...
terminate called after throwing an instance of 'thrust::system::system_error'
what(): CUDA free failed: cudaErrorCudartUnloading: driver shutting down
/usr/local/bin/run_wga_gpu: line 115: 12419 Aborted (core dumped) stdbuf -oL wga $refPath $queryPath $DATA_FOLDER $optionalArguments
real 1m4.846s
user 0m0.000s
sys 0m0.000s
real 1m4.857s
user 0m55.655s
sys 0m9.106s
cat: '*.err': No such file or directory
rm: cannot remove '*.segments': No such file or directory
rm: cannot remove '*.err': No such file or directory
The file is at s3://glennhickey/share/tmp9_wlnwgk.tmp
Hi there,
On running cmake I get the following error, any idea what dependency I'm missing? I have tbb installed, of course.
Thanks for any ideas, best wishes,
Mick
(version 0.1.2)
CMake Error at CMakeLists.txt:15 (include):
include could not find requested file:
/cmake/TBBBuild.cmake
CMake Error at CMakeLists.txt:16 (tbb_build):
Unknown CMake command "tbb_build".
It seems to be duplicating the input file argument?
run_segalign_repeat_masker simCow.chr6 --lastz_interval=3000000 --step=3 --ambiguous=iupac,100,100 --nogapped --markend
Executing: "segalign_repeat_masker /home/hickey/dev/WGA_GPU/simCow.chr6 simCow.chr6 --lastz_interval=3000000 --step=3 --ambiguous=iupac,100,100 --nogapped --markend"
real 0m0.000s
user 0m0.000s
sys 0m0.000s
You must specify a sequence file
Usage: run_segalign_repeat_masker seq_file [options]
--seq_file arg sequence file in FASTA format
I'm still having trouble with kangaroo rat. It was running out of disk before, but fixing that in cactus and giving it 3TB on Terra wasn't enough for it to work.
Running on a p3.16xlarge gives the following, as it presumably runs out of memory (it's at 1 TB of output at that point, but I had another TB free):
Command terminated by signal 9
Command being timed: "segalign_repeat_masker dipOrd1.fa --lastz_interval=3000000 --markend --neighbor_proportion 0.2 --step=3 --ambiguous=iupac,100,100 --nogapped"
User time (seconds): 24733.33
System time (seconds): 7181.33
Percent of CPU this job got: 749%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:10:57
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 499902360
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 6278
Minor (reclaiming a frame) page faults: 743073453
Voluntary context switches: 4601999
Involuntary context switches: 401723
Swaps: 0
File system inputs: 872856
File system outputs: 2140512800
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
This is the same genome as referred to in #37.
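As a rough check of the out-of-memory guess, the peak resident set size in the report can be converted to GiB and compared with the host RAM. This is only a sketch: the 488 GiB figure is an assumption about the instance type and is not stated in the report.

```python
# Value copied from the /usr/bin/time report above.
max_rss_kib = 499_902_360  # "Maximum resident set size (kbytes)"

# Assumption (not stated in the report): the host has 488 GiB of RAM.
assumed_host_ram_gib = 488

# GNU time reports the resident set size in units of 1024 bytes.
max_rss_gib = max_rss_kib / 1024**2
headroom_gib = assumed_host_ram_gib - max_rss_gib

print(f"peak RSS: {max_rss_gib:.1f} GiB")
print(f"headroom under the assumed RAM size: {headroom_gib:.1f} GiB")
```

If the assumed RAM size is right, the process was within a few percent of the host limit when it received signal 9, which would fit the out-of-memory explanation.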
I'm reaching out to see if there is interest in adding support for AMD GPUs. I'm working with Pawsey Supercomputing Centre in Australia to port some applications important for their user base and Segalign is on our list :)
I'm opening this to see if you'd have interest in merging this support into the main branch. I'd also like to discuss some approaches to provide this support without breaking support for users that don't have ROCm on their own systems.
I received this error message during the alignment against the final reference block suggesting an illegal memory access. Would there be any quick way to align the final reference block without starting over (i.e. just pulling the fasta headers in ref_block_n.name and potentially all of the query blocks and treating them as a "genome" input)?
Error: cudaMemcpy of 4 bytes for num_anchors failed with error " invalid argument "
terminate called after throwing an instance of 'thrust::system::system_error'
what(): CUDA free failed: cudaErrorCudartUnloading: driver shutting down
/usr/local/bin/run_segalign: line 197: 19477 Aborted stdbuf -oL segalign $refPath $queryPath $DATA_FOLDER $optionalArguments
Hello there,
I have been trying to install the GPU version of cactus on a Linux CentOS 7 cluster. I followed the installation guidelines in "installUbuntu.sh" under scripts. However, I got an error while running cmake.
I am not sure how to install boost so I installed using Conda
module load 7/compiler/cuda/10.0
conda activate segAlign
conda install -c statiskit libboost-dev
conda install -c intel tbb
I also attached the file: SegAlign/build/CMakeFiles/CMakeError.log.
Any insights? thanks a lot, really appreciate your help!
Zhenzhen
-- The CXX compiler identification is GNU 4.8.5
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /public/software/compiler/cuda/7/cuda-10.0/bin/nvcc
-- Check for working CUDA compiler: /public/software/compiler/cuda/7/cuda-10.0/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /public/software/compiler/cuda/7/cuda-10.0 (found suitable version "10.0", minimum required is "9.0")
-- Intel TBB can not be built: Makefile or src directory was not found in /public/home/yangzhzh/tools_zz/SegAlign/build/../tbb2019_20191006oss
CMake Warning (dev) at CMakeLists.txt:17 (find_package):
Policy CMP0074 is not set: find_package uses _ROOT variables.
Run "cmake --help-policy CMP0074" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
CMake variable TBB_ROOT is set to:
/public/home/yangzhzh/tools_zz/SegAlign/build/../tbb2019_20191006oss
For compatibility, CMake is ignoring the variable.
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7")
CMake Error at /public/software/apps/cmake/3.17.2/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:164 (message):
Could NOT find Boost (missing: Boost_INCLUDE_DIR program_options)
Call Stack (most recent call first):
/public/software/apps/cmake/3.17.2/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:445 (_FPHSA_FAILURE_MESSAGE)
/public/software/apps/cmake/3.17.2/share/cmake-3.17/Modules/FindBoost.cmake:2162 (find_package_handle_standard_args)
CMakeLists.txt:23 (find_package)
-- Configuring incomplete, errors occurred!
See also "/public/home/yangzhzh/tools_zz/SegAlign/build/CMakeFiles/CMakeOutput.log".
See also "/public/home/yangzhzh/tools_zz/SegAlign/build/CMakeFiles/CMakeError.log".
Hi there
I was wondering if the run_segalign_repeat_masker file is missing? I do not see it in the /scripts folder, and adapting from the run_segalign script doesn't work for me yet (it runs, but no alignment is produced). Is the code to run the test correct for this?
Thanks a lot
If you could get this working again, it'd be great. It's a showstopper for cactus. Thanks.
run_wga_gpu cowdog.fa cowdog.fa --format=cigar --notrivial --step=1 --ambiguous=iupac,100,100 --ydrop=3000 --wga_chunk 25000 > gpu-cactus.cigar
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
/bin/bash: simCow.chr6: command not found
/bin/bash: 0.fa: command not found
/bin/bash: simCow.chr6: command not found
/bin/bash: 0.2bit: command not found
-id is not a valid option
/bin/bash: simDog.chr6: command not found
/bin/bash: 0.fa: command not found
/bin/bash: simDog.chr6: command not found
/bin/bash: 0.2bit: command not found
-id is not a valid option
I suspect this may be due to fragmentation? There are tens of thousands of contigs in each assembly. This is typical of cactus input, as reference-quality genome assemblies are only available for a small fraction of species.
Anyway, I let this run 10 hours on a p3.8xlarge ($4.50 / hour) before giving up. For comparison, it took well under an hour on 90 CPU cores with chunked lastz.
run_wga_gpu tmp5g8t96mo.tmp tmp5g8t96mo.tmp --format=cigar --notrivial --step=4 --ambiguous=iupac,100,100 --ydrop=3000
The file can be found here: s3://glennhickey/share/tmp5g8t96mo.tmp
I've been having trouble running segalign_repeat_masker on some genomes. They are poor-quality, hardmasked assemblies, but I still don't think that should cause a crash. I've put details to reproduce below, but since the file's not public, I will share it with you offline.
command line
segalign_repeat_masker PDF_0085.fa --lastz_interval=3000000 --markend --neighbor_proportion 0.2 --M 10 --step=3 --ambiguous=iupac,100,100
instance type: p3.8xlarge
segalign commit: d5fd293
output
[...]
Chromosome block 1 interval 84/333 (1248000000:1251000000) with ref (249000000:777000000) rc (1385775517:1388775517)
Chromosome block 1 interval 95/333 (1281000000:1284000000) with ref (282000000:810000000) rc (1352775517:1355775517)
Chromosome block 1 interval 94/333 (1278000000:1281000000) with ref (279000000:807000000) rc (1355775517:1358775517)
Chromosome block 1 interval 92/333 (1272000000:1275000000) with ref (273000000:801000000) rc (1361775517:1364775517)
Chromosome block 1 interval 93/333 (1275000000:1278000000) with ref (276000000:804000000) rc (1358775517:1361775517)
Chromosome block 1 interval 96/333 (1284000000:1287000000) with ref (285000000:813000000) rc (1349775517:1352775517)
Error: cudaMemcpy of 4 bytes for num_anchors failed with error " invalid argument "
terminate called after throwing an instance of 'thrust::system::system_error'
what(): CUDA free failed: cudaErrorCudartUnloading: driver shutting down
Any ideas as to the cause would be most appreciated. Thanks!
First off, thank you so much for this tool. The speed-up compared to lastz is incredible and this is allowing me to accomplish in mere minutes and hours what was taking days or more with lastz alone.
I was hoping you could give me some advice on running this for a whole genome alignment. With the original lastz, you can input your target as a multi-fasta; however, it is strongly recommended that you do not provide a multi-fasta as a query. Does this hold true with SegAlign if you use the run_segalign.sh script? It appears to me that this script breaks the sequences down into segments. So would it be OK to run a fasta with all chromosomes of a genome against another fasta with all chromosomes of another genome, or would you advise breaking the query fasta down by each sequence in the fasta file?
Thank you for your time.
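For readers who do want to try the per-sequence route while waiting for an answer, here is a minimal sketch of splitting a multi-FASTA into individual records. The function name `split_fasta` is hypothetical and not part of SegAlign or lastz; whether splitting is actually needed is a question for the maintainers.

```python
def split_fasta(text):
    """Yield (header, sequence) pairs from multi-FASTA text.

    Hypothetical helper for illustration; not part of SegAlign.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = ">chr1\nACGT\nAC\n>chr2\nGG\n"
records = list(split_fasta(example))
for name, seq in records:
    print(name, len(seq))
```

Each record could then be written to its own file and passed as a separate query, at the cost of launching one alignment per sequence.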
I'm on an AWS g3.8xlarge but can't run a whole-genome alignment. Is there an option I can pass? Do I need a bigger node? Perhaps you could update the README with some examples for whole-genome alignments? Thanks
run_wga_gpu ../inputs/hg38.fa ../inputs/hg38.fa --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000 > human_self.cigar
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence this citation notice: run 'parallel --citation'.
parallel: Warning: Only enough file handles to run 252 jobs in parallel.
parallel: Warning: Running 'parallel -j0 -N 252 --pipe parallel -j0' or
parallel: Warning: raising ulimit -n or /etc/security/limits.conf may help.
Splitting query chromosome
Converting chromosome wise fasta to 2bit format
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence this citation notice: run 'parallel --citation'.
parallel: Warning: Only enough file handles to run 252 jobs in parallel.
parallel: Warning: Running 'parallel -j0 -N 252 --pipe parallel -j0' or
parallel: Warning: raising ulimit -n or /etc/security/limits.conf may help.
Executing: "wga /home/ubuntu/work/inputs/hg38.fa /home/ubuntu/work/inputs/hg38.fa /home/ubuntu/work/cactus/output_24728/data_16836/ --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000"
Using 32 threads
Using 2 GPU(s)
Reading query file ...
Reading target file ...
Start alignment ...
Sending reference chr1 ...
Error: cudaMalloc of 19084000000 bytes failed2!
terminate called after throwing an instance of 'thrust::system::system_error'
what(): CUDA free failed: cudaErrorCudartUnloading: driver shutting down
/usr/local/bin/run_wga_gpu: line 65: 54866 Aborted (core dumped) stdbuf -oL wga $refPath $queryPath $FOLDER $optionalArguments
real 0m28.977s
user 0m24.489s
sys 0m7.593s
rm: cannot remove '*.segments': No such file or directory
cat: 'tmp*': No such file or directory
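The failed allocation size in the log can be put next to the memory of a single GPU to see why the request cannot succeed. This is a sketch under an assumption: the 8 GiB per-GPU figure is my guess for this instance type and does not appear in the log.

```python
# Value copied from "Error: cudaMalloc of 19084000000 bytes failed2!".
requested_bytes = 19_084_000_000

# Assumption (not in the log): each GPU on this node has 8 GiB of memory.
assumed_gpu_mem_gib = 8

requested_gib = requested_bytes / 1024**3
fits = requested_gib <= assumed_gpu_mem_gib

print(f"requested: {requested_gib:.2f} GiB, fits on one GPU: {fits}")
```

Under that assumption the single allocation is more than twice the size of one GPU's memory, so a node with larger GPUs or a smaller chunk of work per allocation would be needed.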
With the current master, I now get an error like
/home/hickey/dev/WGA_GPU/bin/run_wga_gpu: line 92: [: 1: unary operator expected
whenever I run run_wga_gpu. I don't know if it affects the output at all. Here's a way to reproduce:
wget http://s3-us-west-2.amazonaws.com/jcarmstr-misc/testRegions/evolverMammals/simMouse.chr6
wget http://s3-us-west-2.amazonaws.com/jcarmstr-misc/testRegions/evolverMammals/simRat.chr6
run_wga_gpu simMouse.chr6 simRat.chr6 > out
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
Splitting query chromosome
Converting chromosome wise fasta to 2bit format
Executing: "wga /home/hickey/dev/work/cactus-gpu/simMouse.orig /home/hickey/dev/work/cactus-gpu/simRat.orig /home/hickey/dev/work/cactus-gpu/output_19580/data_8325/ "
Using 8 threads
Using 1 GPU(s)
Reading query file ...
Reading target file ...
Start alignment ...
Sending reference simMouse.chr6 ...
Sending query simRat.chr6 with buffer 0 ...
Starting query simRat.chr6 with buffer 0 ...
Chromosome simRat.chr6 interval 1/1 (0:647196) with buffer 0
/home/hickey/dev/WGA_GPU/bin/run_wga_gpu: line 92: [: 1: unary operator expected
real 0m12.514s
user 0m12.297s
sys 0m0.405s
Hi,
I have encountered the following error while trying to align 2 genomes of size ~2-3 Gb. Is it related to the allocation of memory, or is it a bug? Thanks a lot.
Hien
Using 2 threads
Using 1 GPU(s)
Reading query file ...
Reading target file ...
Start alignment ...
Sending reference block 0 ...
Sending query block 0 with buffer 0 ...
Sending query block 1 with buffer 1 ...
Query block 0, interval 1/56 (0:10000000) with buffer 0
Query block 0, interval 2/56 (10000000:20000000) with buffer 0
terminate called after throwing an instance of 'thrust::system::system_error'
what(): CUDA free failed: cudaErrorIllegalAddress: an illegal memory access was encountered
/usr/local/bin/run_segalign: line 197: 30983 Aborted stdbuf -oL segalign $refPath $queryPath $DATA_FOLDER $optionalArguments
I get a failure message on running on some small test data. It produces cigar output, but no alignments between cow and dog (only self alignments). It also does not set the error code.
This is on the current master (f23537b) on a fresh install on a p3.8xlarge. Here is the input:
run_wga_gpu cowdog.fa cowdog.fa --format=cigar --notrivial --step=1 --ambiguous=iupac,100,100 --ydrop=3000 --wga_chunk 25000 > gpu-cactus.cigar
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
Splitting query chromosome
Converting chromosome wise fasta to 2bit format
Executing: "wga /home/hickey/dev/cactus/gpu-work/cowdog_clean.fa /home/hickey/dev/cactus/gpu-work/cowdog_clean.fa /home/hickey/dev/cactus/gpu-work/output_355/data_10714/ --format=cigar --notrivial --step=1 --ambiguous=iupac,100,100 --ydrop=3000 --wga_chunk 25000"
Using 8 threads
Using 1 GPU(s)
Reading query file ...
Reading target file ...
Start alignment ...
Sending reference simCow.chr6 ...
Sending query simCow.chr6 with buffer 0 ...
Sending query simDog.chr6 with buffer 1 ...
Starting query simCow.chr6 with buffer 0 ...
Chromosome simCow.chr6 interval 1/1 (0:602600) with buffer 0
Starting query simDog.chr6 with buffer 1 ...
Chromosome simDog.chr6 interval 1/1 (0:593878) with buffer 1
FAILURE: query interval out of range (tmp1.ref0.query1.segments: line 13, 0<1)
Sending reference simDog.chr6 ...
Sending query simCow.chr6 with buffer 0 ...
Sending query simDog.chr6 with buffer 1 ...
Starting query simCow.chr6 with buffer 0 ...
Chromosome simCow.chr6 interval 1/1 (0:602600) with buffer 0
Starting query simDog.chr6 with buffer 1 ...
Chromosome simDog.chr6 interval 1/1 (0:593878) with buffer 1
FAILURE: target interval out of range (tmp1.ref1.query0.segments: line 13, 0<1)
real 0m6.747s
user 0m12.239s
sys 0m1.874s
no error code
echo $?
0
no non-self alignments (there are 937 when running cPecanLastz on this input)
grep -i cow gpu-cactus.cigar | grep -i dog | wc -l
0
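Because the exit status is 0 even though failures were printed, a caller cannot rely on the return code alone. As a stopgap, one could scan the captured log for `FAILURE:` lines. The sketch below assumes the log text has already been captured to a string; `find_failures` is a hypothetical helper, not something the scripts provide.

```python
from typing import List


def find_failures(log_text: str) -> List[str]:
    """Return every log line that starts with 'FAILURE:' (hypothetical helper)."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if line.lstrip().startswith("FAILURE:")
    ]


# Sample lines copied from the output above.
sample_log = """\
Chromosome simDog.chr6 interval 1/1 (0:593878) with buffer 1
FAILURE: query interval out of range (tmp1.ref0.query1.segments: line 13, 0<1)
Sending reference simDog.chr6 ...
FAILURE: target interval out of range (tmp1.ref1.query0.segments: line 13, 0<1)
"""

failures = find_failures(sample_log)
print(f"{len(failures)} failure line(s) found")
```

A wrapper could treat a non-empty result as an error until the exit code is set properly upstream.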
Hi,
I am having issues running cactus on LSF, and it seems like a SegAlign problem.
I was pointed here from:
ComparativeGenomicsToolkit/cactus#489
Any ideas?
Thanks!
[2023-09-27T10:02:11-0700] [MainThread] [W] [toil.job] Due to failure we are reducing the remaining try count of job 'LastzRepeatMaskJob' kind-LastzRepeatMaskJob/instance-r63tene1 v11 with ID kind-LastzRepeatMaskJob/instance-r63tene1 to 0
...
Log from job "'LastzRepeatMaskJob' kind-LastzRepeatMaskJob/instance-r63tene1 v12" follows:
=========>
...
File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/preprocessor/lastzRepeatMasking/cactus_lastzRepeatMask.py", line 130, in gpuRepeatMask
segalign_messages = cactus_call(parameters=cmd, work_dir=self.work_dir, returnStdErr=True, gpus=self.repeatMaskOptions.gpu,
File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 889, in cactus_call
raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
RuntimeError: Command /usr/bin/time -f "CACTUS-LOGGED-MEMORY-IN-KB: %M" segalign_repeat_masker /tmp/58f5d3ffa02e55c3b06625f0f8626408/0d5a/937a/tmpfg2qo5qy/gSojMU042_0_0.tgt --lastz_interval=10000000 --markend --neighbor_proportion 0.2 --M 10 --step=3 --ambiguous=iupac,100,100 --num_gpu 1 exited 134: stderr=Using 64 threads
...
Error: cudaMemcpy of 4 bytes for num_anchors failed with error " invalid argument "
terminate called after throwing an instance of 'thrust::system::system_error'
what(): CUDA free failed: cudaErrorCudartUnloading: driver shutting down
Command terminated by signal 6
CACTUS-LOGGED-MEMORY-IN-KB: 69902308
My OS is Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-153-generic x86_64).
Some specs for the GPU I'm using:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100 80G... On | 00000000:CA:00.0 Off | 0 |
| N/A 36C P0 48W / 300W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
run_segalign work/Maspalax.fasta work/Mpsiulurus.fasta --output=test.maf
Converting fasta files to 2bit format
Executing: "segalign /home/u220220932211/work/Maspalax.fasta /home/u220220932211/work/Mpsiulurus.fasta /home/u220220932211/output_10819/data_17631/ --output=test.maf"
Using 2 threads
Using 1 GPU(s)
Error: cudaMalloc of 256 bytes for sub_mat failed with error " the provided PTX was compiled with an unsupported toolchain. "
real 0m0.568s
user 0m0.077s
sys 0m0.201s
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
nvidia-smi
Fri Oct 13 17:22:14 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31 Driver Version: 465.31 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA Tesla V1... Off | 00000000:41:00.0 Off | 0 |
| N/A 21C P0 24W / 250W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
So why did this happen, and what can I do to fix it?
This is on a p3.8xlarge again.
run_wga_gpu tmp33kbj2ei.tmp tmp43pqr89c.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000
Here is the output
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
Splitting query chromosome
Converting chromosome wise fasta to 2bit format
Executing: "wga /tmp/toil-c6c4b39a-d37f-462d-923f-48a9cd1c5eb8-c9bfe0044c6e414a88a85526c0401889/tmp9n64pex_/8aac02ed-ad53-4b86-8c33-f79ad6ea6394/tmp33kbj2ei.tmp /tmp/toil-c6c4b39a-d37f-462d-923f-48a9cd1c5eb8-c9bfe0044c6e414a88a85526c0401889/tmp9n64pex_/8aac02ed-ad53-4b86-8c33-f79ad6ea6394/tmp43pqr89c.tmp /home/ubuntu/output_10844/data_6619/ --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000"
Using 32 threads
Using 4 GPU(s)
Reading query file ...
Reading target file ...
terminate called after throwing an instance of 'thrust::system::system_error'
what(): CUDA free failed: cudaErrorCudartUnloading: driver shutting down
/usr/local/bin/run_wga_gpu: line 115: 42399 Aborted (core dumped) stdbuf -oL wga $refPath $queryPath $DATA_FOLDER $optionalArguments
real 0m46.963s
user .m,).227s
sys 0m+*.440s
real 0m46.974s
user 0m38.309s
sys 0m8.576s
cat: '*.err': No such file or directory
rm: cannot remove '*.segments': No such file or directory
rm: cannot remove '*.err': No such file or directory
s3://glennhickey/share/tmp33kbj2ei.tmp
s3://glennhickey/share/tmp43pqr89c.tmp
Command:
run_segalign /tmpoat8dml4.tmp tmp2a87d84z.tmp --format=cigar --notrivial --step=2 --ambiguous=iupac,100,100 --ydrop=3000
Input:
s3://glennhickey/share/tmp2a87d84z.tmp.gz
s3://glennhickey/share/tmpoat8dml4.tmp.gz
Output
Converting fasta files to 2bit format
Executing: "segalign /home/ubuntu/work/blast-fail/work/node-c9637ce3-5d13-43ff-a839-9ca5bfddbbe0-e26b862451c6400c819df9042d6a6135/tmp5bhda67a/9a6da48b-421e-44e6-ac6b-429281f9d00f/tmpoat8dml4.tmp /home/ubuntu/work/blast-fail/work/node-c9637ce3-5d13-43ff-a839-9ca5bfddbbe0-e26b862451c6400c819df9042d6a6135/tmp5bhda67a/9a6da48b-421e-44e6-ac6b-429281f9d00f/tmp2a87d84z.tmp /home/ubuntu/work/blast-fail/output_22643/data_4076/ --format=cigar --notrivial --step=2 --ambiguous=iupac,100,100 --ydrop=3000"
Using 64 threads
Using 8 GPU(s)
Reading query file ...
Reading target file ...
Start alignment ...
Sending reference block 0 ...
Sending query block 0 with buffer 0 ...
Query block 0, interval 1/26 (0:10000000) with buffer 0
Query block 0, interval 2/26 (10000000:20000000) with buffer 0
Query block 0, interval 3/26 (20000000:30000000) with buffer 0
Query block 0, interval 4/26 (30000000:40000000) with buffer 0
[...]
Sending reference block 10 ...
Sending query block 0 with buffer 0 ...
Query block 0, interval 1/26 (0:10000000) with buffer 0
Query block 0, interval 2/26 (10000000:20000000) with buffer 0
Query block 0, interval 3/26 (20000000:30000000) with buffer 0
Query block 0, interval 4/26 (30000000:40000000) with buffer 0
Query block 0, interval 26/26 (250000000:259697043) with buffer 0
Query block 0, interval 5/26 (40000000:50000000) with buffer 0
Query block 0, interval 6/26 (50000000:60000000) with buffer 0
Query block 0, interval 7/26 (60000000:70000000) with buffer 0
Query block 0, interval 8/26 (70000000:80000000) with buffer 0
Query block 0, interval 9/26 (80000000:90000000) with buffer 0
Query block 0, interval 10/26 (90000000:100000000) with buffer 0
Query block 0, interval 11/26 (100000000:110000000) with buffer 0
Query block 0, interval 12/26 (110000000:120000000) with buffer 0
Query block 0, interval 13/26 (120000000:130000000) with buffer 0
Query block 0, interval 14/26 (130000000:140000000) with buffer 0
Query block 0, interval 15/26 (140000000:150000000) with buffer 0
Query block 0, interval 16/26 (150000000:160000000) with buffer 0
Query block 0, interval 17/26 (160000000:170000000) with buffer 0
Query block 0, interval 18/26 (170000000:180000000) with buffer 0
Query block 0, interval 20/26 (190000000:200000000) with buffer 0
Query block 0, interval 19/26 (180000000:190000000) with buffer 0
Query block 0, interval 21/26 (200000000:210000000) with buffer 0
Query block 0, interval 22/26 (210000000:220000000) with buffer 0
Query block 0, interval 23/26 (220000000:230000000) with buffer 0
Query block 0, interval 24/26 (230000000:240000000) with buffer 0
Query block 0, interval 25/26 (240000000:250000000) with buffer 0
real 4m26.554s
user 57m1.447s
sys 19m27.509s
Error in LASTZ process!
minus strand subrange is 1..1159682
FAILURE: query interval out of range (tmp9.block0.r4008435271.plus.segments: line 22, 1482007>1159682)
FAILURE: query interval out of range (tmp9.block0.r4613734159.minus.segments: line 19, 4294647925>1159682)
minus strand subrange is 1..1159682
FAILURE: query interval out of range (tmp9.block0.r4613734159.plus.segments: line 19, 1481984>1159682)
FAILURE: query interval out of range (tmp9.block0.r5128203719.minus.segments: line 31, 4294645060>1159682)
minus strand subrange is 1..1159682
FAILURE: query interval out of range (tmp9.block0.r5128203719.plus.segments: line 9, 1482007>1159682)
FAILURE: query interval out of range (tmp9.block0.r517840468.minus.segments: line 104, 4294614720>1159682)
minus strand subrange is 1..1159682
FAILURE: query interval out of range (tmp9.block0.r517840468.plus.segments: line 144, 1479179>1159682)
FAILURE: query interval out of range (tmp9.block0.r5628944441.minus.segments: line 35, 4294645065>1159682)
minus strand subrange is 1..1159682
FAILURE: query interval out of range (tmp9.block0.r5628944441.plus.segments: line 8, 1482007>1159682)
(exits with code 6)
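One observation on the numbers, offered as a hypothesis rather than a diagnosis: the minus-strand positions in the failure lines all sit just below 2^32, which is what a small negative number looks like when stored in an unsigned 32-bit integer. The sketch below reinterprets the reported values as signed 32-bit integers; the helper name `as_signed_32` is made up for illustration.

```python
UINT32_MODULUS = 2**32

# Minus-strand positions copied from the FAILURE lines above.
reported = [4294647925, 4294645060, 4294614720, 4294645065]


def as_signed_32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - UINT32_MODULUS if value >= 2**31 else value


signed = [as_signed_32(v) for v in reported]
for raw, val in zip(reported, signed):
    print(raw, "->", val)
```

All four values map to negative numbers a few hundred thousand below zero, which would be consistent with a coordinate subtraction underflowing somewhere before the segments file is written. The log alone does not show where that would happen.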
For reference, the cactus command is
cactus-blast ./js 10mammalsplus.txt Anc10.cigar --root Anc10 --pathOverrides tupChi1.fa.pp rheMac8.fa.pp hg38.fa.pp panTro6.fa.pp equCab3.fa.pp --pathOverrideNames Tree_shrew Rhesus Human Chimp Horse --realTimeLogging --logInfo --maxCores 64 --workDir ./work --cleanWorkDir never --configFile config.xml
Does this mean anything to you? I'm getting this on Terra with the latest SegAlign. This is on a job that seems to have worked with an older version:
FAILURE: extra segments in file (tmp9.block7.r4845374754.plus.segments: line 94, id=2|chrUn_JH373295|0/id=2|chr27|0+)
(for this usage segments must appear in the same order as the query file, with
all + strand segments before all - strand segments for each query)
FAILURE: extra segments in file (tmp9.block7.r570655384.plus.segments: line 277, id=0|chrB1|0/id=2|chr27|0+)
(for this usage segments must appear in the same order as the query file, with
all + strand segments before all - strand segments for each query)
Command exited with non-zero status 6
Command being timed: "run_segalign /mnt/hdd/node-9c3117ca-a2f4-418b-9a69-1b9eddc9b16c-577754ab-8162-4c0c-89c8-eb40417d150f/tmp6cjn70hu/e132e508-98b8-45ea-99dc-fde827b00bbc/tmpjey6y6io.tmp /mnt/hdd/node-9c3117ca-a2f4-418b-9a69-1b9eddc9b16c-577754ab-8162-4c0c-89c8-eb40417d150f/tmp6cjn70hu/e132e508-98b8-45ea-99dc-fde827b00bbc/tmpjey6y6io.tmp --format=cigar --notrivial --step=2 --ambiguous=iupac,100,100 --ydrop=3000"