gsneha26 / segalign

A Scalable GPU-Based Whole Genome Aligner, published in SC20: https://doi.ieeecomputersociety.org/10.1109/SC41405.2020.00043

License: MIT License

CMake 1.16% C++ 47.20% Cuda 39.32% C 6.31% Shell 5.31% Dockerfile 0.69%
whole-genome-alignment gpu-acceleration lastz cuda tbb comparative-genomics aws-ec2 genomics genome-alignments genome-aligner

segalign's People

Contributors

glennhickey, gsneha26, yatisht


segalign's Issues

Still worried about exit codes

When I run the human/chimp test on Terra, I get much smaller output than on AWS (all with 6G chunk size):

AWS cigar: 844M
Terra cigar: 131M

The runtimes are shorter too. Here are the 5 commands (time at right) on

AWS

2020-05-26 16:05:36.012853: Successfully ran the command: "run_wga_gpu /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpo4v16szp/3d842793-a906-4be8-b69a-45d9e4a37b6e/tmpvfj83knk.tmp /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpo4v16szp/3d842793-a906-4be8-b69a-45d9e4a37b6e/tmpo_g4f3gr.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 276.4431502819061 seconds
2020-05-26 16:42:10.981833: Successfully ran the command: "run_wga_gpu /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpdhim63d0/9c0c4c18-4dc7-4079-8bee-9a0549dc4f64/tmpeqpkq3er.tmp /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpdhim63d0/9c0c4c18-4dc7-4079-8bee-9a0549dc4f64/tmp99ukp9hr.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 2190.4813838005066 seconds
2020-05-26 16:43:31.858975: Successfully ran the command: "run_wga_gpu /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpe9bac_96/d4d206a9-b301-40b9-9445-6857779b849d/tmp_6q4bopn.tmp /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmpe9bac_96/d4d206a9-b301-40b9-9445-6857779b849d/tmp_6q4bopn.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 35.484325647354126 seconds
2020-05-26 16:47:27.691592: Successfully ran the command: "run_wga_gpu /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmps5ew_ndz/a453b904-56c6-46e1-8d83-1cc67adb07c0/tmpkc8wp55_.tmp /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmps5ew_ndz/a453b904-56c6-46e1-8d83-1cc67adb07c0/tmpuoyplo0i.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 234.43672251701355 seconds
2020-05-26 17:56:35.524303: Successfully ran the command: "run_wga_gpu /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmppqg4r1ev/3bea06ae-646e-4905-90d2-52c4670b2bac/tmpiwn8h081.tmp /tmp/node-88f97fd5-82ad-41c1-8719-a3d68554c0c8-8de34f899cee454a94366f37f14aef03/tmppqg4r1ev/3bea06ae-646e-4905-90d2-52c4670b2bac/tmpiwn8h081.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 4140.146150350571 seconds

Terra

2020-05-27 23:38:53.182083: Successfully ran the command: "run_wga_gpu /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmphmdn9ppu/4977b907-7b21-435f-a4af-e1ab28c85726/tmpacxyj1_a.tmp /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmphmdn9ppu/4977b907-7b21-435f-a4af-e1ab28c85726/tmp4pbuo9yw.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 314.20388889312744 seconds
2020-05-27 23:39:57.703589: Successfully ran the command: "run_wga_gpu /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpp9e0skjr/18df358c-c68a-4881-b98d-7af5e666da36/tmpfrewxuu_.tmp /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpp9e0skjr/18df358c-c68a-4881-b98d-7af5e666da36/tmpfrewxuu_.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 50.41674566268921 seconds
2020-05-27 23:45:48.559084: Successfully ran the command: "run_wga_gpu /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpq_kexxzf/2665bd91-b193-405b-8fd0-a03af6075f2d/tmphlklypmc.tmp /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpq_kexxzf/2665bd91-b193-405b-8fd0-a03af6075f2d/tmpb2hm2b86.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 348.9945948123932 seconds
2020-05-27 23:50:45.916521: Successfully ran the command: "run_wga_gpu /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmp_aus820b/79cdbba8-a350-4e48-8865-15c5f848674c/tmpe9gz97pg.tmp /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmp_aus820b/79cdbba8-a350-4e48-8865-15c5f848674c/tmp4gbznbdu.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 283.46080327033997 seconds
2020-05-27 23:55:59.658561: Successfully ran the command: "run_wga_gpu /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpollz8rah/ed503f44-88d1-4f00-9eac-c9605facbee5/tmpsbzg0eao.tmp /cromwell_root/node-03b5c1da-b87c-4892-a269-2fc4ed90518f-afc6c71f-af35-4532-be8c-ac951907239e/tmpollz8rah/ed503f44-88d1-4f00-9eac-c9605facbee5/tmpsbzg0eao.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000" in 309.87610483169556 seconds

It could be that there's something at the cactus level that is not passing in the same data. But as far as I can tell, this isn't the case. The input file sizes seem equivalent, and not obviously corrupt. I am continuing to debug. I am specifying similar hardware between the two.

It could also be that something is failing, crashing, or being evicted by the host system, and run_wga_gpu is not detecting this. Grepping for a 'FAILURE' message from lastz seems particularly fragile. Even though lastz seems very diligent about catching error cases with this message, I don't think it would be possible to catch them all.

For running on the cloud, I would really be much more comfortable if we could somehow verify the exit code of each lastz command.
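
As a rough illustration of the kind of check being asked for (a hedged sketch only, not the actual run_wga_gpu code; the file names and the lastz invocation below are placeholders), each lastz call could be verified by its exit status rather than by grepping its output for a FAILURE string:

lastz target.2bit query.2bit --format=cigar > part.cigar
status=$?
if [ "$status" -ne 0 ]; then
    echo "lastz exited with code $status" >&2
    exit "$status"
fi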

AWS data:
s3://glennhickey/share/cactus-blast-gpu-may26.tar.gz

Terra data:
s3://glennhickey/share/cactus-blast-gpu-may28-terra
s3://glennhickey/share/cactus-blast-gpu-may28-terra.log

Please set error code to non-zero on crash

Here's a small example

run_wga_gpu sadf asdfasdf
(crashes)
echo $?
0

This makes it very difficult to use within a larger script (i.e. cactus), as there's no way to detect errors. I think the wga binary is okay now for exit codes, but the run_wga_gpu script gobbles it up and always returns 0.
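
A minimal sketch of what the wrapper could do instead (illustrative only, not the actual script; the invocation line below is taken from the logs shown elsewhere on this page): capture the child's exit status and propagate it, so callers such as cactus can detect failures with $?.

stdbuf -oL wga "$refPath" "$queryPath" "$DATA_FOLDER" $optionalArguments
wga_status=$?
# ... any log/cleanup handling ...
exit "$wga_status"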

On a related note, it will be necessary to be able to specify the number of cores and GPUs on the command line for Cactus to properly keep track of the resources used. Right now it just uses everything on the system.

Thanks!

Please add system requirements and limits to README

I think this has come up before, but it'd be immensely helpful to add a bit of information to the README:

  • What are the minimum system requirements (and AWS node types) for whole-genome alignment?
  • Is there a function of RAM / video RAM that can be used to guide node selection based on input size?
  • What are the hardcoded size limits? Is it exactly 3G? From a Cactus standpoint, it doesn't really matter, save for the fact that it needs to be known exactly a priori.

"grep: *.err" and "m: cannot remove '*.segments'" errors

Some basic file commands such as grep and rm in the run_segalign script are called without checking whether the input files exist. As a result, the script crashes and the Cactus execution fails (log attached below).

I was wondering whether the files named *.err, *.segments, *.plus.*, and *.minus.* must be created every time the segalign script is called.
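
One hedged way to avoid these errors (a sketch only, not the actual run_segalign code; the FAILURE pattern mirrors what other issues on this page describe and is an assumption): operate on the glob results only if they actually matched something, and use rm -f so a missing *.segments file is not fatal.

shopt -s nullglob
err_files=( *.err )
if [ "${#err_files[@]}" -gt 0 ]; then
    grep FAILURE "${err_files[@]}"
fi
rm -f -- *.segments    # with nullglob set, rm -f succeeds even when nothing matches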

[Update]
The input tmp2e_55g2q.tmp and tmprx912f4d.tmp files are available at https://www.dropbox.com/sh/9bf6o0tij7drafm/AACdWCSz6nkbW7hy6zbpDgeea?dl=0

Cheers,
Thiago

Log from job "kind-RunBlast/instance-v7g6i_gt" follows:
=========>
	[2021-03-08T18:54:14+0000] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
	[2021-03-08T18:54:14+0000] [MainThread] [I] [toil] Running Toil version 5.2.0-047d0c4f2949c576c80e452a0807c5be6355c63d on host tf-cactus-slurm-compute-3-0.
	[2021-03-08T18:54:14+0000] [MainThread] [D] [toil] Configuration: {'workflowID': '5d8e7caf-fad5-4d3b-9303-ae20634083f8', 'workflowAttemptNumber': 0, 'jobStore': 'file:/home/thiagogenez_ebi_ac_uk/viruses/run/jobStore', 'logLevel': 'Debug', 'workDir': None, 'noStdOutErr': False, 'stats': False, 'clean': 'onSuccess', 'cleanWorkDir': 'never', 'clusterStats': None, 'restart': False, 'batchSystem': 'single_machine', 'disableAutoDeployment': False, 'environment': {}, 'statePollingWait': 1, 'maxLocalJobs': 8, 'manualMemArgs': False, 'parasolCommand': 'parasol', 'parasolMaxBatches': 1000, 'scale': 1.0, 'linkImports': True, 'moveExports': False, 'mesosMasterAddress': '10.0.0.12:5050', 'allocate_mem': True, 'kubernetesHostPath': None, 'provisioner': None, 'nodeTypes': [], 'minNodes': None, 'maxNodes': [10], 'targetTime': 1800, 'betaInertia': 0.1, 'scaleInterval': 60, 'preemptableCompensation': 0.0, 'nodeStorage': 50, 'nodeStorageOverrides': [], 'metrics': False, 'maxPreemptableServiceJobs': 9223372036854775807, 'maxServiceJobs': 9223372036854775807, 'deadlockWait': 3600, 'deadlockCheckInterval': 30, 'defaultMemory': 2147483648, 'defaultCores': 1, 'defaultDisk': 2147483648, 'readGlobalFileMutableByDefault': False, 'defaultPreemptable': False, 'maxCores': 9223372036854775807, 'maxMemory': 9223372036854775807, 'maxDisk': 9223372036854775807, 'retryCount': 5, 'enableUnlimitedPreemptableRetries': False, 'doubleMem': False, 'maxJobDuration': 9223372036854775807, 'rescueJobsFrequency': 3600, 'disableCaching': True, 'disableChaining': True, 'disableJobStoreChecksumVerification': False, 'maxLogFileSize': 64000, 'writeLogs': None, 'writeLogsGzip': None, 'writeLogsFromAllJobs': False, 'sseKey': None, 'servicePollingInterval': 60, 'useAsync': True, 'forceDockerAppliance': False, 'runCwlInternalJobsOnWorkers': False, 'statusWait': 3600, 'disableProgress': False, 'debugWorker': False, 'disableWorkerOutputCapture': False, 'badWorker': 0.0, 'badWorkerFailInterval': 0.01, 'cwl': False}
	[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.deferred] Running for file /tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/deferred/funceiihipwj
	[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.worker] Parsed job description
	[2021-03-08T18:54:14+0000] [MainThread] [I] [toil.worker] Working on job 'RunBlast' kind-RunBlast/instance-v7g6i_gt
	[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.worker] Got a command to run: _toil files/for-job/kind-RunBlast/instance-v7g6i_gt/cleanup/file-ec7bc1a3a798485ab637ad8f261e43e4/stream /home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages cactus.blast.blast True
	[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.job] Loading user module ModuleDescriptor(dirPath='/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages', name='cactus.blast.blast', fromVirtualEnv=True).
	[2021-03-08T18:54:14+0000] [MainThread] [I] [toil.worker] Loaded body Job('RunBlast' kind-RunBlast/instance-v7g6i_gt) from description 'RunBlast' kind-RunBlast/instance-v7g6i_gt
	[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.deferred] Running orphaned deferred functions
	[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.deferred] Running job
	[2021-03-08T18:54:14+0000] [MainThread] [I] [cactus.shared.common] Docker work dir: /tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c
	[2021-03-08T18:54:14+0000] [MainThread] [I] [cactus.shared.common] Running the command ['singularity', '--silent', 'run', '--nv', '/home/thiagogenez_ebi_ac_uk/viruses/run/cactus.img', 'run_segalign', 'tmp2e_55g2q.tmp', 'tmprx912f4d.tmp', '--format=cigar', '--notrivial', '--step=2', '--ambiguous=iupac,100,100', '--ydrop=3000']
	[2021-03-08T18:54:14+0000] [MainThread] [D] [toil.statsAndLogging] Suppressing the following loggers: {'websocket', 'bcdocs', 'urllib3', 'sonLib', 'google', 'requests_oauthlib', 'humanfriendly', 'galaxy', 'dill', 'prov', 'oauthlib', 'kubernetes', 'cactus', 'botocore', 'salad', 'boto', 'cachecontrol', 'rdflib', 'boto3', 'docker', 'requests'}
	[2021-03-08T18:54:14+0000] [MainThread] [I] [toil-rt] 2021-03-08 18:54:14.635398: Running the command: "singularity --silent run --nv /home/thiagogenez_ebi_ac_uk/viruses/run/cactus.img run_segalign tmp2e_55g2q.tmp tmprx912f4d.tmp --format=cigar --notrivial --step=2 --ambiguous=iupac,100,100 --ydrop=3000"
	/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_GB.UTF-8)

	Converting fasta files to 2bit format

	Executing: "segalign /tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c/tmp2e_55g2q.tmp /tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c/tmprx912f4d.tmp /tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c/output_11930/data_5519/  --format=cigar --notrivial --step=2 --ambiguous=iupac,100,100 --ydrop=3000"
	Using 8 threads
	Using 1 GPU(s)

	Reading query file ...

	Reading target file ...

	Start alignment ...

	Sending reference block 0 ...

	Sending query block 0 with buffer 0 ...
	Query block 0, interval 1/1 (0:28184) with buffer 0

	real	0m1.184s
	user	0m0.106s
	sys	0m1.012s
	grep: *.err: No such file or directory
	rm: cannot remove '*.segments': No such file or directory
	[2021-03-08T18:54:16+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
	[2021-03-08T18:54:16+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-BlastSequencesAgainstEachOther/instance-7jk5yz3n/cleanup/file-e798bb1571b344409a9fdef384794920/0' to path '/tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c/tmp2e_55g2q.tmp'
	[2021-03-08T18:54:16+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-BlastSequencesAgainstEachOther/instance-7jk5yz3n/cleanup/file-7a446f73b95d4f5f938fc2ba16ebb915/0' to path '/tmp/node-5d8e7caf-fad5-4d3b-9303-ae20634083f8-9a39eee8-e096-45e7-8d8e-a2a3f2cb7837/tmph97m_giv/ee1594c4-2d84-45b0-b8ef-4ed978d7201c/tmprx912f4d.tmp'
	[2021-03-08T18:54:16+0000] [MainThread] [D] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: Job files/for-job/kind-RunBlast/instance-v7g6i_gt/cleanup/file-ec7bc1a3a798485ab637ad8f261e43e4/stream used 0.00% (48.0 KB [49152B] used, 1.0 GB [1677721600B] requested) at the end of its run.
	[2021-03-08T18:54:16+0000] [MainThread] [D] [toil.deferred] Running own deferred functions
	[2021-03-08T18:54:16+0000] [MainThread] [D] [toil.deferred] Out of deferred functions!
	[2021-03-08T18:54:16+0000] [MainThread] [D] [toil.deferred] Running orphaned deferred functions
	Traceback (most recent call last):
	  File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/toil/worker.py", line 394, in workerScript
	    job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
	  File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/cactus/shared/common.py", line 1424, in _runner
	    fileStore=fileStore, **kwargs)
	  File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/toil/job.py", line 2359, in _runner
	    returnValues = self._run(jobGraph=None, fileStore=fileStore)
	  File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/toil/job.py", line 2280, in _run
	    return self.run(fileStore)
	  File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/cactus/blast/blast.py", line 466, in run
	    gpuLastz = self.blastOptions.gpuLastz)
	  File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/cactus/shared/common.py", line 833, in runLastz
	    parameters=[lastzCommand, seq1, seq2, "--format=cigar", "--notrivial"] + lastzArguments.split())
	  File "/home/thiagogenez_ebi_ac_uk/.local/bin/cactus-bin-v1.3.0/venv/lib/python3.6/site-packages/cactus/shared/common.py", line 1357, in cactus_call
	    raise RuntimeError("Command {} exited {}: {}".format(call, process.returncode, out))
	RuntimeError: Command ['singularity', '--silent', 'run', '--nv', '/home/thiagogenez_ebi_ac_uk/viruses/run/cactus.img', 'run_segalign', 'tmp2e_55g2q.tmp', 'tmprx912f4d.tmp', '--format=cigar', '--notrivial', '--step=2', '--ambiguous=iupac,100,100', '--ydrop=3000'] exited 1: stdout=None
	[2021-03-08T18:54:16+0000] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host tf-cactus-slurm-compute-3-0
<=========

segalign_repeat_masker still crashes

This is the same command line and overall dataset as #53. While the patch for #53 worked and let most genomes go through (thanks!), there's still at least one problem. I will share the inputs offline, but the error message is

Chromosome block 2 interval 329/333 (2982000000:2985000000) with ref (570705458:1143705458) rc (4166672753:4169672753)
Chromosome block 2 interval 331/333 (2988000000:2991000000) with ref (570705458:1143705458) rc (4160672753:4163672753)
Chromosome block 2 interval 330/333 (2985000000:2988000000) with ref (570705458:1143705458) rc (4163672753:4166672753)
Chromosome block 2 interval 333/333 (2994000000:2996999981) with ref (570705458:1143705458) rc (4154672772:4157672753)
Chromosome block 2 interval 332/333 (2991000000:2994000000) with ref (570705458:1143705458) rc (4157672753:4160672753)
terminate called after throwing an instance of 'thrust::system::system_error'
terminate called recursively
terminate called recursively
terminate called recursively
what(): CUDA free failed: cudaErrorIllegalAddress: an illegal memory access was encountered
terminate called recursively
Command terminated by signal 6

run_segalign_repeat_masker can fail without error code

Here is a piece of the cactus log. segalign_repeat_masker fails, but run_segalign_repeat_masker exits with 0 and Cactus never knows. Luckily, cactus_covered_intervals crashed right after in this case, but if it hadn't, I might never have known.

Executing: "segalign_repeat_masker /mnt/hdd/node-1a325814-9183-453f-bd00-b76a97486c1a-d4f46d3f-b448-4fcd-b192-6ef2c7deeb7c/tmp2rftgtwd/b438bd93-6b2d-4f7c-90e3-d885f7aefa2c/tmpc8hyqz3a.tmp --lastz_interval=3000000 --markend --neighbor_proportion 0.2 --step=3 --ambiguous=iupac,100,100 --nogapped"
Using 64 threads
Using 8 GPU(s)

Reading target file ...

Start alignment ...

Sending block 0 ...
Chromosome block 0 interval 1/333 (0:3000000) with ref (0:432000000) rc (2155712149:2158712149)
...
Chromosome block 0 interval 108/333 (321000000:324000000) with ref (105000000:537000000) rc (1834712149:1837712149)
Chromosome block 0 interval 109/333 (324000000:327000000) with ref (108000000:540000000) rc (1831712149:1834712149)
/usr/local/bin/run_segalign_repeat_masker: line 120: 461 Killed stdbuf -oL segalign_repeat_masker $refPath $optionalArguments

real 40m4.440s
user 97m8.070s
sys 39m48.293s

real 57m25.945s
user 201m34.776s
sys 47m54.629s
INFO:toil-rt:2020-08-25 23:36:24.984006: Successfully ran the command: "run_segalign_repeat_masker /mnt/hdd/node-1a325814-9183-453f-bd00-b76a97486c1a-d4f46d3f-b448-4fcd-b192-6ef2c7deeb7c/tmp2rftgtwd/b438bd93-6b2d-4f7c-90e3-d885f7aefa2c/tmpc8hyqz3a.tmp --lastz_interval=3000000 --markend --neighbor_proportion 0.2 --step=3 --ambiguous=iupac,100,100 --nogapped" in 7460.357779741287 seconds

FWIW, the input file is "https://cgl-assemblies.s3.amazonaws.com/dipOrd1.fa", and I think it's crashing because I didn't give it enough disk.
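
A hedged sketch of how the wrapper could at least surface this (placeholders only, not the real run_segalign_repeat_masker script): a child terminated by a signal reports an exit status of 128 plus the signal number (137 for the SIGKILL behind the "Killed" line in the log above), which can be detected and propagated.

stdbuf -oL segalign_repeat_masker "$refPath" $optionalArguments
status=$?
if [ "$status" -ge 128 ]; then
    echo "segalign_repeat_masker killed by signal $((status - 128))" >&2
fi
exit "$status"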

Human/Chimp stats

Here are some rough benchmarks for a human-chimp alignment with gorilla used as an outgroup. They were obtained by running cactus-blast --root hc on a tree like ((hg38:0.01,panTro6:0.01)hc:0.11,gorGor5:0.01)Anc0; with sequences coming from the 200M hal (already masked and preprocessed).

times

# Method / Wall Time / Chunk size / # blast commands / total blast time
CPU / 11.7h / 25M / 63126 / 10.8h
GPU / 2.3h / 3G / 12 / 1.7h
GPU / 2.5h / 6G / 5 / 1.9h
GPU / 2.6h / 1G / 56 / 2h

spot market cost

CPU: 11.7h @ $1.13/hr (r5n.16xlarge) = $13.2
GPU-3G: 2.3h @ $7.30/hr (p3.16xlarge) = $16.8

human-chimp ancestral sequence length

CPU: 2804406856
GPU: 2803489026

coverage of human against chimp (10M bases sampled):

CPU: 8979808
GPU: 8974227

In summary: on these 64-core nodes, the GPU is about 5X faster than the CPU, at a slight cost increase. There is slightly less aligned on the GPU than on the CPU, but this could be within the random variance we expect between cactus runs.

The fact that the 3G chunksize was faster than both 1G and 6G is somewhat perplexing. It could very well be related to this: 18f53dc

One caveat: I turned off Pecan realignment for all of this. It was only in running these big files that I realized there's work to be done in adapting this code for the bigger chunks used by the GPU.

WGA_GPU Commits:
3G chunk size:
4a970d3

1G and 6G chunk size:
72930be

Cactus Commit:
ComparativeGenomicsToolkit/cactus@78f7b28

run_segalign crashes on human-chimp (and exits 0!)

This is with d1a73a0 on a ubuntu 18.04 p3.16xlarge AWS instance

It appears to work with d5fd293. So it's definitely a regression related to changes this June for the overflow bugs in the repeatmasker.

Just looking at these commits, it would seem that the changes to the repeat masker here (d1a73a0) would also need to be applied to run_segalign?

It is very quickly reproduced:

wget -q http://public.gi.ucsc.edu/~hickey/debug/segalign_debug/hg38_without_alts_preprocessed.fa.pp.gz
wget -q http://public.gi.ucsc.edu/~hickey/debug/segalign_debug/panTro6_preprocessed.fa.pp.gz
gzip -d hg38_without_alts_preprocessed.fa.pp.gz 
gzip -d panTro6_preprocessed.fa.pp.gz

The segalign command

run_segalign panTro6_preprocessed.fa.pp hg38_without_alts_preprocessed.fa.pp --format=paf:minimap2 --step=2 --ambiguous=iupac,100,100 --ydrop=3000 --notransition

The crash

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  trivial_device_copy D->H failed: cudaErrorInvalidValue: invalid argument
/usr/local/bin/run_segalign: line 197: 35103 Aborted                 (core dumped) stdbuf -oL segalign $refPath $queryPath $DATA_FOLDER $optionalArguments

But run_segalign still returns 0!

echo $?
0

The full log

Converting fasta files to 2bit format

Executing: "segalign /home/ubuntu/work/panTro6_preprocessed.fa.pp /home/ubuntu/work/hg38_without_alts_preprocessed.fa.pp /home/ubuntu/work/output_13442/data_23960/  --format=paf:minimap2 --step=2 --ambiguous=iupac,100,100 --ydrop=3000
 --notransition"
Using 64 threads
Using 8 GPU(s)

Reading query file ...


Reading target file ...

Start alignment ...

Sending reference block 0 ...

Sending query block 0 with buffer 0 ...

Sending query block 1 with buffer 1 ...
Query block 0, interval 1/52 (0:10000000) with buffer 0
Query block 0, interval 3/52 (20000000:30000000) with buffer 0
Query block 0, interval 7/52 (60000000:70000000) with buffer 0
Query block 0, interval 10/52 (90000000:100000000) with buffer 0
Query block 0, interval 14/52 (130000000:140000000) with buffer 0
Query block 0, interval 17/52 (160000000:170000000) with buffer 0
Query block 0, interval 22/52 (210000000:220000000) with buffer 0
Query block 0, interval 26/52 (250000000:260000000) with buffer 0
Query block 0, interval 30/52 (290000000:300000000) with buffer 0
Query block 0, interval 35/52 (340000000:350000000) with buffer 0
Query block 0, interval 39/52 (380000000:390000000) with buffer 0
Query block 0, interval 42/52 (410000000:420000000) with buffer 0
Query block 0, interval 46/52 (450000000:460000000) with buffer 0
Query block 0, interval 48/52 (470000000:480000000) with buffer 0
Query block 0, interval 50/52 (490000000:500000000) with buffer 0
Query block 0, interval 2/52 (10000000:20000000) with buffer 0
Query block 0, interval 18/52 (170000000:180000000) with buffer 0
Query block 1, interval 4/60 (30000000:40000000) with buffer 1
Query block 0, interval 19/52 (180000000:190000000) with buffer 0
Query block 1, interval 10/60 (90000000:100000000) with buffer 1
Query block 0, interval 21/52 (200000000:210000000) with buffer 0
Query block 0, interval 23/52 (220000000:230000000) with buffer 0
Query block 0, interval 4/52 (30000000:40000000) with buffer 0
Query block 0, interval 24/52 (230000000:240000000) with buffer 0
Query block 0, interval 25/52 (240000000:250000000) with buffer 0
Query block 1, interval 12/60 (110000000:120000000) with buffer 1
Query block 0, interval 28/52 (270000000:280000000) with buffer 0
Query block 0, interval 8/52 (70000000:80000000) with buffer 0
Query block 0, interval 29/52 (280000000:290000000) with buffer 0
Query block 0, interval 31/52 (300000000:310000000) with buffer 0
Query block 0, interval 32/52 (310000000:320000000) with buffer 0
Query block 0, interval 9/52 (80000000:90000000) with buffer 0
Query block 0, interval 33/52 (320000000:330000000) with buffer 0
Query block 0, interval 34/52 (330000000:340000000) with buffer 0
Query block 0, interval 36/52 (350000000:360000000) with buffer 0
Query block 0, interval 5/52 (40000000:50000000) with buffer 0
Query block 0, interval 37/52 (360000000:370000000) with buffer 0
Query block 0, interval 38/52 (370000000:380000000) with buffer 0
Query block 0, interval 40/52 (390000000:400000000) with buffer 0
Query block 0, interval 11/52 (100000000:110000000) with buffer 0
Query block 0, interval 41/52 (400000000:410000000) with buffer 0
Query block 0, interval 43/52 (420000000:430000000) with buffer 0
Query block 0, interval 12/52 (110000000:120000000) with buffer 0
Query block 0, interval 44/52 (430000000:440000000) with buffer 0
Query block 0, interval 45/52 (440000000:450000000) with buffer 0
Query block 0, interval 13/52 (120000000:130000000) with buffer 0
Query block 0, interval 47/52 (460000000:470000000) with buffer 0
Query block 0, interval 15/52 (140000000:150000000) with buffer 0
Query block 0, interval 49/52 (480000000:490000000) with buffer 0
Query block 0, interval 16/52 (150000000:160000000) with buffer 0
Query block 0, interval 51/52 (500000000:510000000) with buffer 0
Query block 0, interval 52/52 (510000000:510113926) with buffer 0
Query block 1, interval 1/60 (0:10000000) with buffer 1
Query block 1, interval 2/60 (10000000:20000000) with buffer 1
Query block 1, interval 3/60 (20000000:30000000) with buffer 1
Query block 1, interval 5/60 (40000000:50000000) with buffer 1
Query block 1, interval 6/60 (50000000:60000000) with buffer 1
Query block 1, interval 7/60 (60000000:70000000) with buffer 1
Query block 0, interval 6/52 (50000000:60000000) with buffer 0
Query block 1, interval 8/60 (70000000:80000000) with buffer 1
Query block 1, interval 9/60 (80000000:90000000) with buffer 1
Query block 0, interval 20/52 (190000000:200000000) with buffer 0
Query block 1, interval 11/60 (100000000:110000000) with buffer 1
Query block 0, interval 27/52 (260000000:270000000) with buffer 0
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  trivial_device_copy D->H failed: cudaErrorInvalidValue: invalid argument
/usr/local/bin/run_segalign: line 197: 35103 Aborted                 (core dumped) stdbuf -oL segalign $refPath $queryPath $DATA_FOLDER $optionalArguments

real    2m19.050s
user    1m27.737s
sys     1m50.123s

real    2m19.072s
user    2m25.897s
sys     1m53.887s
No alignment generated

Better lastz compliance would allow drop-in replacement in cactus

I'm working to make cactus run on arbitrary cigar output (ComparativeGenomicsToolkit/cactus#178). It would also be nice to be able to just drop WGA_GPU in as a replacement for cPecanLastz in cactus. To do this, it would need to support lastz's interface. Here is an example.

cat small-cactus.tmp 
>id=0|simMouse.chr6|0

TTTTTCAGTTGCAATACCCAACCGGGAGAAACTTTCAGTGAGCACACCTCAGGTTCCTATATCAAGCAGGCAGTCTTGCATAGCAAATGGTCTCTGGTAG
ACGGTGCACTCAATCTATGTGAGGTATAGAAAATAAAGGACTACACACATCTCATCAAGTATCCCGTCATATTTGTGGCAAAACACACGTACAAATGCAC
ACTTGATGGTACTTGCCTGGAATATGACTCTAGGTTGATCCCTGGCACACAGGCACATTAATTCCCGAATGATGGTCTGCCTGTCCAGTTCTAGATAATG

>id=1|simRat.chr6|0

TTTTCGGCTGCAATACCCAACCTGAAGACATTTTCAGTGGGCCCACCTCAGGTTCTTATATCAAGCAGACAGTCTTGCGCAACAGATGGTCTCTGATAGA
CAGTGCACTCAATCTATGTGAAGGATAGAAAACAAAGGACTACTCATCTCATCAAGTATCCTGTCGTATTTGTGCTTTAGACTCAGCAAAACAAGTGTGC
AAGTGCACACCTGATTGTACTTGTCTGGAATATGACTCTAGGTTGAGCCCTGGCACACACTCAGGTCAACTCCAATGATGGTCTGCTTGTCCAATTCTAG
~/dev/cactus/bin/cPecanLastz ./small-cactus.tmp[multiple][nameparse=darkspace] ./small-cactus.tmp[nameparse=darkspace] --format=cigar --notrivial --step=1 --ambiguous=iupac,100,100 --ydrop=3000
cigar: id=0|simMouse.chr6|0 1 294 + id=1|simRat.chr6|0 0 300 + 19928 M 141 I 2 M 33 D 11 M 88 I 2 M 27
cigar: id=1|simRat.chr6|0 0 300 + id=0|simMouse.chr6|0 1 294 + 19928 M 141 D 2 M 33 I 11 M 88 D 2 M 27

From what I can see, this requires (see the sketch after this list):

  • support for [multiple][nameparse=darkspace]
  • --notrivial
  • special characters like | in sequence names
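
As a rough illustration of the kind of shim this would enable (a hedged sketch, not part of WGA_GPU/SegAlign; it only covers the bracket-qualifier item, while --notrivial and special characters in names would still need support in the tool itself):

# strip lastz-style bracket qualifiers such as [multiple][nameparse=darkspace]
target="${1%%\[*}"
query="${2%%\[*}"
shift 2
run_wga_gpu "$target" "$query" "$@"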

Command-line option for cpu count

Right now run_wga_gpu uses $(nproc) threads. This makes it difficult to use from workflows such as toil or cactus. Can you please add an option to specify the number of threads on the command line? A similar option to specify the number of GPUs would also be helpful. Thanks.
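
Until such options exist, one hedged workaround sketch (standard Linux/CUDA mechanisms, nothing SegAlign-specific): restrict which GPUs the run can see with CUDA_VISIBLE_DEVICES and restrict the CPU set with taskset; since the script derives its thread count from $(nproc), which respects the affinity mask, this should also cap the threads.

# e.g. limit the run to 2 GPUs and 16 CPU cores (input files are placeholders)
CUDA_VISIBLE_DEVICES=0,1 taskset -c 0-15 run_wga_gpu target.fa query.fa --format=cigar --notrivial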

run_wga_gpu crashes on all-against-all human chimp alignment

On a p3.8xlarge:

run_wga_gpu tmp9_wlnwgk.tmp tmp9_wlnwgk.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000

gives

Splitting reference chromosome   
Converting chromosome wise fasta to 2bit format
Splitting query chromosome
Converting chromosome wise fasta to 2bit format

Executing: "wga tmp9_wlnwgk.tmp tmp9_wlnwgk.tmp /home/ubuntu/output_19346/data_26790/  --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000"
Using 32 threads
Using 4 GPU(s)

Reading query file ...
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  CUDA free failed: cudaErrorCudartUnloading: driver shutting down
/usr/local/bin/run_wga_gpu: line 115: 12419 Aborted                 (core dumped) stdbuf -oL wga $refPath $queryPath $DATA_FOLDER $optionalArguments

real    1m4.846s
user    0m0.000s
sys     0m0.000s

real    1m4.857s
user    0m55.655s
sys     0m9.106s
cat: '*.err': No such file or directory
rm: cannot remove '*.segments': No such file or directory
rm: cannot remove '*.err': No such file or directory

The file is at s3://glennhickey/share/tmp9_wlnwgk.tmp

error during cmake

Hi there,
On running cmake I get the following error; any idea what dependency I'm missing? I have tbb installed, of course.
Thanks for any ideas, best wishes,
Mick

(version 0.1.2)

CMake Error at CMakeLists.txt:15 (include):
  include could not find requested file:

    /cmake/TBBBuild.cmake


CMake Error at CMakeLists.txt:16 (tbb_build):
  Unknown CMake command "tbb_build".
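
A hedged guess at the cause (based only on the CMake output shown in another issue further down this page, not on the SegAlign docs): the build appears to expect an Intel TBB 2019 source tree, containing a Makefile and cmake/TBBBuild.cmake, unpacked next to the SegAlign checkout (e.g. tbb2019_20191006oss), and the empty prefix in "/cmake/TBBBuild.cmake" suggests that tree was not found. A quick check from the build directory:

ls ../tbb2019_20191006oss/Makefile ../tbb2019_20191006oss/cmake/TBBBuild.cmake \
    || echo "TBB source tree not found next to SegAlign; unpack the TBB 2019 sources (or re-run the install script) first" >&2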

run_segalign_repeat_masker doesn't appear to work

It seems to be duplicating the input file argument?

run_segalign_repeat_masker simCow.chr6 --lastz_interval=3000000 --step=3 --ambiguous=iupac,100,100 --nogapped  --markend

Executing: "segalign_repeat_masker /home/hickey/dev/WGA_GPU/simCow.chr6  simCow.chr6 --lastz_interval=3000000 --step=3 --ambiguous=iupac,100,100 --nogapped --markend"

real	0m0.000s
user	0m0.000s
sys	0m0.000s
You must specify a sequence file 
Usage: run_segalign_repeat_masker seq_file [options]
  --seq_file arg        sequence file in FASTA format

segalign_repeat_masker runs out of memory on kangaroo rat

I'm still having trouble with kangaroo rat. It was running out of disk before, but fixing that in cactus and giving it 3TB on Terra wasn't enough for it to work.

Running on a p3.16xlarge gives the following, as it presumably runs out of memory (it's at 1 TB of output at that point, but I had another TB free):

Command terminated by signal 9
        Command being timed: "segalign_repeat_masker dipOrd1.fa --lastz_interval=3000000 --markend --neighbor_proportion 0.2 --step=3 --ambiguous=iupac,100,100 --nogapped"
        User time (seconds): 24733.33
        System time (seconds): 7181.33
        Percent of CPU this job got: 749%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:10:57
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 499902360
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 6278
        Minor (reclaiming a frame) page faults: 743073453
        Voluntary context switches: 4601999
        Involuntary context switches: 401723
        Swaps: 0
        File system inputs: 872856
        File system outputs: 2140512800
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

This is the same genome as referred to in #37.

AMD GPU support

I'm reaching out to see if there is interest in adding support for AMD GPUs. I'm working with Pawsey Supercomputing Centre in Australia to port some applications important for their user base and Segalign is on our list :)

I'm opening this to see if you'd have interest in merging this support into the main branch. I'd also like to discuss some approaches to provide this support without breaking things for users that don't have ROCm on their systems.

segalign crashes while aligning final against final reference block

I received this error message during the alignment against the final reference block, suggesting an illegal memory access. Would there be any quick way to align the final reference block without starting over (i.e. just pulling the fasta headers in ref_block_n.name and potentially all of the query blocks, and treating them as a "genome" input)?

Error: cudaMemcpy of 4 bytes for num_anchors failed with error " invalid argument "
terminate called after throwing an instance of 'thrust::system::system_error'
what(): CUDA free failed: cudaErrorCudartUnloading: driver shutting down
/usr/local/bin/run_segalign: line 197: 19477 Aborted stdbuf -oL segalign $refPath $queryPath $DATA_FOLDER $optionalArguments
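
A hedged sketch of the workaround proposed in the question above (assuming ref_block_n.name lists one FASTA header per line; target.fa and query.fa are placeholders, and this is not a supported SegAlign resume feature):

samtools faidx target.fa
xargs samtools faidx target.fa < ref_block_n.name > last_ref_block.fa
run_segalign last_ref_block.fa query.fa   # plus the original segalign options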

couldn't find boost

Hello there,
I have been trying to install the GPU version of cactus on a Linux CentOS 7 cluster. I followed the installation guidelines in "installUbuntu.sh" under scripts; however, I had an error while running cmake.
I am not sure how to install boost, so I installed it using Conda:

module load 7/compiler/cuda/10.0
conda activate segAlign
conda install -c statiskit libboost-dev
conda install -c intel tbb

I also attached the file: SegAlign/build/CMakeFiles/CMakeError.log.

Any insights? Thanks a lot, I really appreciate your help!
Zhenzhen

CMakeError.log

-- The CXX compiler identification is GNU 4.8.5
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /public/software/compiler/cuda/7/cuda-10.0/bin/nvcc
-- Check for working CUDA compiler: /public/software/compiler/cuda/7/cuda-10.0/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /public/software/compiler/cuda/7/cuda-10.0 (found suitable version "10.0", minimum required is "9.0")
-- Intel TBB can not be built: Makefile or src directory was not found in /public/home/yangzhzh/tools_zz/SegAlign/build/../tbb2019_20191006oss
CMake Warning (dev) at CMakeLists.txt:17 (find_package):
Policy CMP0074 is not set: find_package uses _ROOT variables.
Run "cmake --help-policy CMP0074" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.

CMake variable TBB_ROOT is set to:

/public/home/yangzhzh/tools_zz/SegAlign/build/../tbb2019_20191006oss

For compatibility, CMake is ignoring the variable.
This warning is for project developers. Use -Wno-dev to suppress it.

-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7")
CMake Error at /public/software/apps/cmake/3.17.2/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:164 (message):
Could NOT find Boost (missing: Boost_INCLUDE_DIR program_options)
Call Stack (most recent call first):
/public/software/apps/cmake/3.17.2/share/cmake-3.17/Modules/FindPackageHandleStandardArgs.cmake:445 (_FPHSA_FAILURE_MESSAGE)
/public/software/apps/cmake/3.17.2/share/cmake-3.17/Modules/FindBoost.cmake:2162 (find_package_handle_standard_args)
CMakeLists.txt:23 (find_package)

-- Configuring incomplete, errors occurred!
See also "/public/home/yangzhzh/tools_zz/SegAlign/build/CMakeFiles/CMakeOutput.log".
See also "/public/home/yangzhzh/tools_zz/SegAlign/build/CMakeFiles/CMakeError.log".
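
A hedged suggestion, not taken from the SegAlign docs: FindBoost is failing to locate the program_options component, so installing a Boost build that includes it and pointing CMake at the conda prefix may get past this error. The package and variable names below are ordinary conda-forge / CMake conventions, not SegAlign-specific.

conda install -c conda-forge boost
cmake .. -DBOOST_ROOT="$CONDA_PREFIX"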

run_segalign_repeat_masker file

Hi there

I was wondering if the run_segalign_repeat_masker file is missing? I do not see it in the /scripts folder, and adapting from the run_segalign script doesn't yet work for me (it runs, but no alignment is produced). Is the code to run the test correct for this?

Thanks a lot

Special characters no longer supported

If you could get this working again, it'd be great. It's a showstopper for cactus. Thanks.

cowdog.fa.gz

run_wga_gpu cowdog.fa cowdog.fa  --format=cigar --notrivial --step=1 --ambiguous=iupac,100,100 --ydrop=3000 --wga_chunk 25000  > gpu-cactus.cigar 
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
/bin/bash: simCow.chr6: command not found
/bin/bash: 0.fa: command not found
/bin/bash: simCow.chr6: command not found
/bin/bash: 0.2bit: command not found
-id is not a valid option
/bin/bash: simDog.chr6: command not found
/bin/bash: 0.fa: command not found
/bin/bash: simDog.chr6: command not found
/bin/bash: 0.2bit: command not found
-id is not a valid option

run_wga_gpu very slow on insect assemblies

I suspect this may be due to fragmentation? There are tens of thousands of contigs in each assembly. This is typical of cactus input, as reference-quality genome assemblies are only available for a small fraction of species.

Anyway, I let this run for 10 hours on a p3.8xlarge ($4.50 / hour) before giving up. For comparison, it took well under an hour on 90 CPU cores with chunked lastz.

run_wga_gpu tmp5g8t96mo.tmp tmp5g8t96mo.tmp --format=cigar --notrivial --step=4 --ambiguous=iupac,100,100 --ydrop=3000

The file can be found here: s3://glennhickey/share/tmp5g8t96mo.tmp

segalign_repeat_masker crashes

I've been having trouble running segalign_repeat_masker on some genomes. They are poor-quality, hardmasked assemblies, but I still don't think that should cause a crash. I've put details to reproduce below, but since the file's not public, I will share it with you offline.

command line

 segalign_repeat_masker PDF_0085.fa --lastz_interval=3000000 --markend --neighbor_proportion 0.2 --M 10 --step=3 --ambiguous=iupac,100,100

instance type: p3.8xlarge
segalign commit: d5fd293

output

[...]
Chromosome block 1 interval 84/333 (1248000000:1251000000) with ref (249000000:777000000) rc (1385775517:1388775517)
Chromosome block 1 interval 95/333 (1281000000:1284000000) with ref (282000000:810000000) rc (1352775517:1355775517)
Chromosome block 1 interval 94/333 (1278000000:1281000000) with ref (279000000:807000000) rc (1355775517:1358775517)
Chromosome block 1 interval 92/333 (1272000000:1275000000) with ref (273000000:801000000) rc (1361775517:1364775517)
Chromosome block 1 interval 93/333 (1275000000:1278000000) with ref (276000000:804000000) rc (1358775517:1361775517)
Chromosome block 1 interval 96/333 (1284000000:1287000000) with ref (285000000:813000000) rc (1349775517:1352775517)
Error: cudaMemcpy of 4 bytes for num_anchors failed with error " invalid argument "
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  CUDA free failed: cudaErrorCudartUnloading: driver shutting down

Any ideas as to the cause would be most appreciated. Thanks!

Running on multi-fasta

First off, thank you so much for this tool. The speed-up compared to lastz is incredible and this is allowing me to accomplish in mere minutes and hours what was taking days or more with lastz alone.

I was hoping you could give me some advice on running this for a whole-genome alignment. With the original lastz, you can input your target as a multi-fasta; however, it is strongly recommended that you do not provide a multi-fasta as a query. Does this hold true with SegAlign if you use the run_segalign.sh script? It appears to me that this script breaks the sequences down into segments. So would it be OK to run a fasta with all chromosomes of a genome against another fasta with all chromosomes of another genome, or would you advise breaking the query fasta down by each sequence in the fasta file?

Thank you for your time.

What hardware/settings to use for whole-genome?

I'm on an AWS g3.8xlarge but can't run a whole-genome alignment. Is there an option I can pass? Do I need a bigger node? Perhaps you could update the README with some examples for whole-genome alignments? Thanks

run_wga_gpu ../inputs/hg38.fa ../inputs/hg38.fa --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000 > human_self.cigar
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence this citation notice: run 'parallel --citation'.

parallel: Warning: Only enough file handles to run 252 jobs in parallel.
parallel: Warning: Running 'parallel -j0 -N 252 --pipe parallel -j0' or
parallel: Warning: raising ulimit -n or /etc/security/limits.conf may help.
Splitting query chromosome
Converting chromosome wise fasta to 2bit format
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence this citation notice: run 'parallel --citation'.

parallel: Warning: Only enough file handles to run 252 jobs in parallel.
parallel: Warning: Running 'parallel -j0 -N 252 --pipe parallel -j0' or
parallel: Warning: raising ulimit -n or /etc/security/limits.conf may help.
Executing: "wga /home/ubuntu/work/inputs/hg38.fa /home/ubuntu/work/inputs/hg38.fa /home/ubuntu/work/cactus/output_24728/data_16836/  --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000"
Using 32 threads
Using 2 GPU(s)

Reading query file ...

Reading target file ...

Start alignment ...

Sending reference chr1 ...
Error: cudaMalloc of 19084000000 bytes failed2!
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  CUDA free failed: cudaErrorCudartUnloading: driver shutting down
/usr/local/bin/run_wga_gpu: line 65: 54866 Aborted                 (core dumped) stdbuf -oL wga $refPath $queryPath $FOLDER $optionalArguments

real    0m28.977s
user    0m24.489s
sys     0m7.593s
rm: cannot remove '*.segments': No such file or directory
cat: 'tmp*': No such file or directory

Bash error in run_wga_gpu

With the current master, I now get an error like

/home/hickey/dev/WGA_GPU/bin/run_wga_gpu: line 92: [: 1: unary operator expected

whenever I run run_wga_gpu. I don't know if it affects the output at all. Here's a way to reproduce

wget http://s3-us-west-2.amazonaws.com/jcarmstr-misc/testRegions/evolverMammals/simMouse.chr6
wget http://s3-us-west-2.amazonaws.com/jcarmstr-misc/testRegions/evolverMammals/simRat.chr6
run_wga_gpu simMouse.chr6 simRat.chr6 > out
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
Splitting query chromosome
Converting chromosome wise fasta to 2bit format
Executing: "wga /home/hickey/dev/work/cactus-gpu/simMouse.orig /home/hickey/dev/work/cactus-gpu/simRat.orig /home/hickey/dev/work/cactus-gpu/output_19580/data_8325/ "
Using 8 threads
Using 1 GPU(s)

Reading query file ...

Reading target file ...

Start alignment ...

Sending reference simMouse.chr6 ...

Sending query simRat.chr6 with buffer 0 ...

Starting query simRat.chr6 with buffer 0 ...
Chromosome simRat.chr6 interval 1/1 (0:647196) with buffer 0
/home/hickey/dev/WGA_GPU/bin/run_wga_gpu: line 92: [: 1: unary operator expected

real	0m12.514s
user	0m12.297s
sys	0m0.405s
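
"[: 1: unary operator expected" is the classic bash failure mode when an unquoted, possibly-empty variable ends up inside a [ ] test. A hedged illustration of the pattern and the usual fix (line 92 of run_wga_gpu presumably does something similar; the variables here are placeholders):

count=1
total=""                        # e.g. a step expected to emit a number produced nothing
[ $count -eq $total ]           # expands to [ 1 -eq ] and prints "[: 1: unary operator expected"
[ "$count" -eq "${total:-0}" ]  # quoting plus a default value avoids the error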

cudaErrorIllegalAddress: an illegal memory access was encountered

Hi,
I have encountered the following error while trying to align two genomes of size ~2-3 Gb. Is it related to memory allocation or a bug? Thanks a lot.
Hien

Using 2 threads
Using 1 GPU(s)

Reading query file ...

Reading target file ...

Start alignment ...

Sending reference block 0 ...

Sending query block 0 with buffer 0 ...

Sending query block 1 with buffer 1 ...
Query block 0, interval 1/56 (0:10000000) with buffer 0
Query block 0, interval 2/56 (10000000:20000000) with buffer 0
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  CUDA free failed: cudaErrorIllegalAddress: an illegal memory access was encountered
/usr/local/bin/run_segalign: line 197: 30983 Aborted                 stdbuf -oL segalign $refPath $queryPath $DATA_FOLDER $optionalArguments

Get Failure message on small test data

I get a failure message when running on some small test data. It produces cigar output, but no alignments between cow and dog (only self alignments). It also does not set the error code.

This is on the current master (f23537b) on a fresh install on a p3.8xlarge. Here is the input:

cowdog.fa.gz

run_wga_gpu cowdog.fa cowdog.fa  --format=cigar --notrivial --step=1 --ambiguous=iupac,100,100 --ydrop=3000 --wga_chunk 25000  > gpu-cactus.cigar 
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
Splitting query chromosome
Converting chromosome wise fasta to 2bit format
Executing: "wga /home/hickey/dev/cactus/gpu-work/cowdog_clean.fa /home/hickey/dev/cactus/gpu-work/cowdog_clean.fa /home/hickey/dev/cactus/gpu-work/output_355/data_10714/  --format=cigar --notrivial --step=1 --ambiguous=iupac,100,100 --ydrop=3000 --wga_chunk 25000"
Using 8 threads
Using 1 GPU(s)

Reading query file ...

Reading target file ...

Start alignment ...

Sending reference simCow.chr6 ...

Sending query simCow.chr6 with buffer 0 ...

Sending query simDog.chr6 with buffer 1 ...

Starting query simCow.chr6 with buffer 0 ...
Chromosome simCow.chr6 interval 1/1 (0:602600) with buffer 0

Starting query simDog.chr6 with buffer 1 ...
Chromosome simDog.chr6 interval 1/1 (0:593878) with buffer 1
FAILURE: query interval out of range (tmp1.ref0.query1.segments: line 13, 0<1)

Sending reference simDog.chr6 ...

Sending query simCow.chr6 with buffer 0 ...

Sending query simDog.chr6 with buffer 1 ...

Starting query simCow.chr6 with buffer 0 ...
Chromosome simCow.chr6 interval 1/1 (0:602600) with buffer 0

Starting query simDog.chr6 with buffer 1 ...
Chromosome simDog.chr6 interval 1/1 (0:593878) with buffer 1
FAILURE: target interval out of range (tmp1.ref1.query0.segments: line 13, 0<1)

real	0m6.747s
user	0m12.239s
sys	0m1.874s

no error code

echo $?
0

no non-self alignments (there are 937 when running cPecanLastz on this input)

grep -i cow gpu-cactus.cigar | grep -i dog | wc -l
0

thrust::system::system_error | CUDA free failed: cudaErrorCudartUnloading

[2023-09-27T10:02:11-0700] [MainThread] [W] [toil.job] Due to failure we are reducing the remaining try count of job 'LastzRepeatMaskJob' kind-LastzRepeatMaskJob/instance-r63tene1 v11 with ID kind-LastzRepeatMaskJob/instance-r63tene1 to 0
...
Log from job "'LastzRepeatMaskJob' kind-LastzRepeatMaskJob/instance-r63tene1 v12" follows:
=========>
...
	  File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/preprocessor/lastzRepeatMasking/cactus_lastzRepeatMask.py", line 130, in gpuRepeatMask
	    segalign_messages = cactus_call(parameters=cmd, work_dir=self.work_dir, returnStdErr=True, gpus=self.repeatMaskOptions.gpu,
	  File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 889, in cactus_call
	    raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
	RuntimeError: Command /usr/bin/time -f "CACTUS-LOGGED-MEMORY-IN-KB: %M" segalign_repeat_masker /tmp/58f5d3ffa02e55c3b06625f0f8626408/0d5a/937a/tmpfg2qo5qy/gSojMU042_0_0.tgt --lastz_interval=10000000 --markend --neighbor_proportion 0.2 --M 10 --step=3 --ambiguous=iupac,100,100 --num_gpu 1 exited 134: stderr=Using 64 threads
...
	Error: cudaMemcpy of 4 bytes for num_anchors failed with error " invalid argument " 
	terminate called after throwing an instance of 'thrust::system::system_error'
	  what():  CUDA free failed: cudaErrorCudartUnloading: driver shutting down
	Command terminated by signal 6
	CACTUS-LOGGED-MEMORY-IN-KB: 69902308

My OS is Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-153-generic x86_64).

Some specs for the GPU I'm using:

 +-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  On   | 00000000:CA:00.0 Off |                    0 |
| N/A   36C    P0    48W / 300W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               

Error: cudaMalloc of 256 bytes for sub_mat failed with error " the provided PTX was compiled with an unsupported toolchain. "

run_segalign work/Maspalax.fasta work/Mpsiulurus.fasta --output=test.maf

Converting fasta files to 2bit format

Executing: "segalign /home/u220220932211/work/Maspalax.fasta /home/u220220932211/work/Mpsiulurus.fasta /home/u220220932211/output_10819/data_17631/ --output=test.maf"
Using 2 threads
Using 1 GPU(s)
Error: cudaMalloc of 256 bytes for sub_mat failed with error " the provided PTX was compiled with an unsupported toolchain. "
real 0m0.568s
user 0m0.077s
sys 0m0.201s

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

nvidia-smi
Fri Oct 13 17:22:14 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31       Driver Version: 465.31       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA Tesla V1...  Off  | 00000000:41:00.0 Off |                    0 |
| N/A   21C    P0    24W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

So why did this happen, and what can I do to fix it?

run_wga_gpu crashes when aligning human and chimp to gorilla outgroup

This is on a p3.8xlarge again.

run_wga_gpu tmp33kbj2ei.tmp tmp43pqr89c.tmp --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000

Here is the output

Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
Splitting query chromosome
Converting chromosome wise fasta to 2bit format

Executing: "wga /tmp/toil-c6c4b39a-d37f-462d-923f-48a9cd1c5eb8-c9bfe0044c6e414a88a85526c0401889/tmp9n64pex_/8aac02ed-ad53-4b86-8c33-f79ad6ea6394/tmp33kbj2ei.tmp /tmp/toil-c6c4b39a-d37f-462d-923f-48a9cd1c5eb8-c9bfe0044c6e414a88a85526c0401889/tmp9n64pex_/8aac02ed-ad53-4b86-8c33-f79ad6ea6394/tmp43pqr89c.tmp /home/ubuntu/output_10844/data_6619/  --format=cigar --notrivial --step=3 --ambiguous=iupac,100,100 --ydrop=3000"
Using 32 threads
Using 4 GPU(s)

Reading query file ...

Reading target file ...   
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  CUDA free failed: cudaErrorCudartUnloading: driver shutting down
/usr/local/bin/run_wga_gpu: line 115: 42399 Aborted                 (core dumped) stdbuf -oL wga $refPath $queryPath $DATA_FOLDER $optionalArguments

real    0m46.963s
user    .m,).227s
sys     0m+*.440s

real    0m46.974s
user    0m38.309s
sys     0m8.576s
cat: '*.err': No such file or directory
rm: cannot remove '*.segments': No such file or directory
rm: cannot remove '*.err': No such file or directory

s3://glennhickey/share/tmp33kbj2ei.tmp
s3://glennhickey/share/tmp43pqr89c.tmp

Error in LASTZ process!

Command:

run_segalign /tmpoat8dml4.tmp tmp2a87d84z.tmp --format=cigar --notrivial --step=2 --ambiguous=iupac,100,100 --ydrop=3000

Input:

s3://glennhickey/share/tmp2a87d84z.tmp.gz
s3://glennhickey/share/tmpoat8dml4.tmp.gz

Output

Converting fasta files to 2bit format

Executing: "segalign /home/ubuntu/work/blast-fail/work/node-c9637ce3-5d13-43ff-a839-9ca5bfddbbe0-e26b862451c6400c819df9042d6a6135/tmp5bhda67a/9a6da48b-421e-44e6-ac6b-4
29281f9d00f/tmpoat8dml4.tmp /home/ubuntu/work/blast-fail/work/node-c9637ce3-5d13-43ff-a839-9ca5bfddbbe0-e26b862451c6400c819df9042d6a6135/tmp5bhda67a/9a6da48b-421e-44e6
-ac6b-429281f9d00f/tmp2a87d84z.tmp /home/ubuntu/work/blast-fail/output_22643/data_4076/  --format=cigar --notrivial --step=2 --ambiguous=iupac,100,100 --ydrop=3000"
Using 64 threads
Using 8 GPU(s)

Reading query file ...

Reading target file ...

Start alignment ...

Sending reference block 0 ...

Sending query block 0 with buffer 0 ...
Query block 0, interval 1/26 (0:10000000) with buffer 0
Query block 0, interval 2/26 (10000000:20000000) with buffer 0
Query block 0, interval 3/26 (20000000:30000000) with buffer 0
Query block 0, interval 4/26 (30000000:40000000) with buffer 0
[...]
Sending reference block 10 ...

Sending query block 0 with buffer 0 ...
Query block 0, interval 1/26 (0:10000000) with buffer 0
Query block 0, interval 2/26 (10000000:20000000) with buffer 0
Query block 0, interval 3/26 (20000000:30000000) with buffer 0
Query block 0, interval 4/26 (30000000:40000000) with buffer 0
Query block 0, interval 26/26 (250000000:259697043) with buffer 0
Query block 0, interval 5/26 (40000000:50000000) with buffer 0
Query block 0, interval 6/26 (50000000:60000000) with buffer 0
Query block 0, interval 7/26 (60000000:70000000) with buffer 0
Query block 0, interval 8/26 (70000000:80000000) with buffer 0
Query block 0, interval 9/26 (80000000:90000000) with buffer 0
Query block 0, interval 10/26 (90000000:100000000) with buffer 0
Query block 0, interval 11/26 (100000000:110000000) with buffer 0
Query block 0, interval 12/26 (110000000:120000000) with buffer 0
Query block 0, interval 13/26 (120000000:130000000) with buffer 0
Query block 0, interval 14/26 (130000000:140000000) with buffer 0
Query block 0, interval 15/26 (140000000:150000000) with buffer 0
Query block 0, interval 16/26 (150000000:160000000) with buffer 0
Query block 0, interval 17/26 (160000000:170000000) with buffer 0
Query block 0, interval 18/26 (170000000:180000000) with buffer 0
Query block 0, interval 20/26 (190000000:200000000) with buffer 0
Query block 0, interval 19/26 (180000000:190000000) with buffer 0
Query block 0, interval 21/26 (200000000:210000000) with buffer 0
Query block 0, interval 22/26 (210000000:220000000) with buffer 0
Query block 0, interval 23/26 (220000000:230000000) with buffer 0
Query block 0, interval 24/26 (230000000:240000000) with buffer 0
Query block 0, interval 25/26 (240000000:250000000) with buffer 0


real    4m26.554s
user    57m1.447s
sys     19m27.509s

Error in LASTZ process!

minus strand subrange is 1..1159682
FAILURE: query interval out of range (tmp9.block0.r4008435271.plus.segments: line 22, 1482007>1159682)
FAILURE: query interval out of range (tmp9.block0.r4613734159.minus.segments: line 19, 4294647925>1159682)
minus strand subrange is 1..1159682
FAILURE: query interval out of range (tmp9.block0.r4613734159.plus.segments: line 19, 1481984>1159682)
FAILURE: query interval out of range (tmp9.block0.r5128203719.minus.segments: line 31, 4294645060>1159682)
minus strand subrange is 1..1159682
FAILURE: query interval out of range (tmp9.block0.r5128203719.plus.segments: line 9, 1482007>1159682)
FAILURE: query interval out of range (tmp9.block0.r517840468.minus.segments: line 104, 4294614720>1159682)
minus strand subrange is 1..1159682
FAILURE: query interval out of range (tmp9.block0.r517840468.plus.segments: line 144, 1479179>1159682)
FAILURE: query interval out of range (tmp9.block0.r5628944441.minus.segments: line 35, 4294645065>1159682)
minus strand subrange is 1..1159682
FAILURE: query interval out of range (tmp9.block0.r5628944441.plus.segments: line 8, 1482007>1159682)

(exits with code 6)

For reference, the cactus command is

cactus-blast ./js 10mammalsplus.txt Anc10.cigar --root Anc10 --pathOverrides tupChi1.fa.pp rheMac8.fa.pp hg38.fa.pp panTro6.fa.pp equCab3.fa.pp --pathOverrideNames Tree_shrew Rhesus Human Chimp Horse --realTimeLogging --logInfo --maxCores 64 --workDir ./work --cleanWorkDir never --configFile config.xml

FAILURE: extra segments in file

Does this mean anything to you? I'm getting this on Terra with the latest Segalign. This is on a job that seems to have worked with an older version:

FAILURE: extra segments in file (tmp9.block7.r4845374754.plus.segments: line 94, id=2|chrUn_JH373295|0/id=2|chr27|0+)
(for this usage segments must appear in the same order as the query file, with
all + strand segments before all - strand segments for each query)
FAILURE: extra segments in file (tmp9.block7.r570655384.plus.segments: line 277, id=0|chrB1|0/id=2|chr27|0+)
(for this usage segments must appear in the same order as the query file, with
all + strand segments before all - strand segments for each query)
Command exited with non-zero status 6
Command being timed: "run_segalign /mnt/hdd/node-9c3117ca-a2f4-418b-9a69-1b9eddc9b16c-577754ab-8162-4c0c-89c8-eb40417d150f/tmp6cjn70hu/e132e508-98b8-45ea-99dc-fde827b00bbc/tmpjey6y6io.tmp /mnt/hdd/node-9c3117ca-a2f4-418b-9a69-1b9eddc9b16c-577754ab-8162-4c0c-89c8-eb40417d150f/tmp6cjn70hu/e132e508-98b8-45ea-99dc-fde827b00bbc/tmpjey6y6io.tmp --format=cigar --notrivial --step=2 --ambiguous=iupac,100,100 --ydrop=3000"
