Hi, I am running graphmap2 on the following data:
Reads:
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR598/003/SRR5989373/SRR5989373_1.fastq.gz
Reference GTF:
ftp://ftp.ensembl.org/pub/release-94/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.94.gtf.gz
Reference:
ftp://ftp.ensembl.org/pub/release-94/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz
Using v0.6.3 (pulled from git and built today), I still get segmentation faults:
/mnt/d/dev/git/graphmap2/bin/Linux-x64/graphmap2 align --rebuild-index -x rnaseq --threads 8 -r /home/mjoppich/dev/data/genomes/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.gm2.fa -d fastq/SRR5989373_1.fastq -o graphmap2/SRR5989373_1.unmod.sam
[16:45:22 BuildIndexes] Loading reference sequences.
[16:45:22 SetupIndex_] Building the index for shape: '11110111101111'.
[16:45:22 Create] Allocated memory for a list of 6078553 seeds (128 bits each) (0.00000 sec, diff: 0.03125 sec).
[16:45:22 Create] Memory consumption: [currentRSS = 38 MB, peakRSS = 38 MB]
[16:45:22 Create] Collecting seeds.
[16:45:22 Create] Minimizer seeds will be used. Minimizer window is 5.
[16:45:23 Create] [currentRSS = 170 MB, peakRSS = 223 MB] Sequence: 34/34, len: 1090940, name: 'VII'
[16:45:23 Create] Final memory allocation after collecting seeds: [currentRSS = 176 MB, peakRSS = 223 MB]
[16:45:23 Create] Sorting the seeds using 8 threads.
[16:45:23 Create] Generating the hash table.
[16:45:24 Create] Calculating the distribution statistics for key counts.
[16:45:24 Create] Index statistics: average key count = 2.130736, max key count = 3204.000000, std dev = 3.374890, percentil (99.00%) (count cutoff) = 13.000000
[16:45:25 Create] Memory consumption: [currentRSS = 559 MB, peakRSS = 754 MB]
[16:45:25 SetupIndex_] Finished building index.
[16:45:25 SetupIndex_] Storing the index to file: '/home/mjoppich/dev/data/genomes/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.gm2.fa.gmidx'.
[16:45:26 Index] Memory consumption: [currentRSS = 548 MB, peakRSS = 754 MB]
[16:45:26 Run] Hits will be thresholded at the percentil value (percentil: 99.000000%, frequency: 13).
[16:45:26 Run] Minimizers will be used. Minimizer window length: 5
[16:45:26 Run] Reference genome is assumed to be linear.
[16:45:26 Run] One or more similarly good alignments will be output per mapped read. Will be marked secondary.
[16:45:26 ProcessReads] All reads will be loaded in memory.
[16:45:28 ProcessReads] All reads loaded in 2.56 sec (size around 469 MB). (239685513 bases)
[16:45:28 ProcessReads] Memory consumption: [currentRSS = 1101 MB, peakRSS = 1101 MB]
[16:52:26 ProcessReads] [CPU time: 2825.36 sec, RSS: 5221 MB] Read: 76803/241446 (31.81%) [m: 75674, u: 1122], length = 691, qname: SRR5989373.76804 5859c147-15...
runAlignment.sh: line 11: 2125 Segmentation fault (core dumped) /mnt/d/dev/git/graphmap2/bin/Linux-x64/graphmap2 align --rebuild-index -x rnaseq --threads 8 -r $GENOMEGM2 -d $FASTQ -o graphmap2/$BASENAME.unmod.sam
I can also call graphmap2 with the GTF annotation supplied via --gtf:
/mnt/d/dev/git/graphmap2/bin/Linux-x64/graphmap2 align --rebuild-index -x rnaseq --gtf /home/mjoppich/dev/data/genomes/Saccharomyces_cerevisiae.R64-1-1.94.gtf --threads 8 -r /home/mjoppich/dev/data/genomes/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.gm2.fa -d fastq/SRR5989373_1.fastq -o graphmap2/SRR5989373_1.unmod.sam
Then the segmentation fault occurs much earlier, at read 180 instead of read 76803:
[16:57:43 BuildIndexes] Loading GTF annotations.
[16:57:43 BuildIndexes] Loading genomic sequences.
[16:57:43 BuildIndexes] Generating the transcriptome.
[16:57:43 GenerateTranscriptomeSeqs] Constructing the transcriptome sequences.
[16:57:43 GenerateTranscriptomeSeqs] In total, there are 7036 transcripts.
[16:57:43 SetupIndex_] Building the index for shape: '11110111101111'.
[16:57:43 Create] Allocated memory for a list of 4428323 seeds (128 bits each) (0.00000 sec, diff: 0.03125 sec).
[16:57:43 Create] Memory consumption: [currentRSS = 33 MB, peakRSS = 33 MB]
[16:57:43 Create] Collecting seeds.
[16:57:43 Create] Minimizer seeds will be used. Minimizer window is 5.
[16:57:47 Create] [currentRSS = 133 MB, peakRSS = 169 MB] Sequence: 14072/14072, len: 483, name: 'YGR296C-B_mRNA_VII'''
[16:57:47 Create] Final memory allocation after collecting seeds: [currentRSS = 133 MB, peakRSS = 169 MB]
[16:57:47 Create] Sorting the seeds using 8 threads.
[16:57:48 Create] Generating the hash table.
[16:57:48 Create] Calculating the distribution statistics for key counts.
[16:57:48 Create] Index statistics: average key count = 1.829742, max key count = 273.000000, std dev = 1.723991, percentil (99.00%) (count cutoff) = 9.000000
[16:57:49 Create] Memory consumption: [currentRSS = 325 MB, peakRSS = 424 MB]
[16:57:49 SetupIndex_] Finished building index.
[16:57:49 SetupIndex_] Storing the index to file: '/home/mjoppich/dev/data/genomes/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.gm2.fa.gmidx'.
[16:57:49 Index] Memory consumption: [currentRSS = 325 MB, peakRSS = 424 MB]
[16:57:49 Run] Hits will be thresholded at the percentil value (percentil: 99.000000%, frequency: 9).
[16:57:49 Run] Minimizers will be used. Minimizer window length: 5
[16:57:49 Run] Reference genome is assumed to be linear.
[16:57:49 Run] One or more similarly good alignments will be output per mapped read. Will be marked secondary.
[16:57:49 ProcessReads] All reads will be loaded in memory.
[16:57:52 ProcessReads] All reads loaded in 2.92 sec (size around 469 MB). (239685513 bases)
[16:57:52 ProcessReads] Memory consumption: [currentRSS = 868 MB, peakRSS = 868 MB]
[16:57:53 ProcessReads] [CPU time: 8.52 sec, RSS: 960 MB] Read: 180/241446 (0.07%) [m: 93, u: 80], length = 1743, qname: SRR5989373.181 6b89fcff-b792-4f6a-a00f-...
Segmentation fault (core dumped)
I'd be very grateful if you could look into this issue. Given how few introns it has, the yeast genome might be a useful debugging case :)
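
In case a smaller input helps with debugging: since the --gtf run crashes around read 180 of 241446, a short prefix of the FASTQ may be enough to trigger it. Below is a minimal sketch of how such a subset could be cut out. The function name `fastq_head` and the output file name are hypothetical (not part of graphmap2), and the sketch assumes the usual four-lines-per-record FASTQ layout without wrapped sequence lines. Whether the crash reproduces on the subset is untested.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of graphmap2: copy the first N reads of a
# FASTQ file into a new file. Assumes exactly 4 lines per record
# (header, sequence, '+', qualities), i.e. no wrapped sequence lines.
fastq_head() {
    local in_fastq="$1" n_reads="$2" out_fastq="$3"
    head -n "$(( n_reads * 4 ))" "$in_fastq" > "$out_fastq"
}

# Example (assumed file names): the --gtf run crashed around read 180, and
# with 8 threads the exact offending read is unclear, so take a generous
# prefix of 400 reads.
# fastq_head fastq/SRR5989373_1.fastq 400 fastq/SRR5989373_1.first400.fastq
```

The subset can then be passed via -d to the same graphmap2 command as above. Running it with --threads 1 should make the progress line point at a single read, assuming the crash does not depend on threading.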