Comments (11)

husamia commented on June 3, 2024

@kcleal I got it working after I downloaded the reference, uncompressed it, compressed it with bgzip and indexed it with samtools.

from dysgu.

kcleal commented on June 3, 2024

Hi @husamia, does the reference genome use 17 instead of chr17? The BAM file looks like it uses the chr17 convention, but the reference may not. You can check with head Homo_sapiens_assembly19_1000genomes_decoy.fasta.fai
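
The same comparison can be scripted. Below is a minimal sketch, assuming the standard .fai layout (contig name in the first tab-separated column) and SAM header text as printed by samtools view -H, where each @SQ line carries an SN: tag. The helper names are hypothetical and are not part of dysgu, and the sample data is illustrative only.

```python
def fai_contigs(fai_lines):
    """Return contig names from the lines of a .fai index (first tab-separated column)."""
    return [line.split("\t", 1)[0] for line in fai_lines if line.strip()]


def bam_header_contigs(header_lines):
    """Return contig names from the SN: tags of @SQ lines in SAM header text."""
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def missing_from_reference(bam_names, fai_names):
    """Return contigs that the BAM header declares but the reference index lacks."""
    known = set(fai_names)
    return [name for name in bam_names if name not in known]


# Illustrative data only: an index without the chr prefix and a header with it.
fai = ["1\t249250621\t52\t100\t101\n", "2\t243199373\t251743232\t100\t101\n"]
header = ["@HD\tVN:1.6\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:249250621\n"]
print(missing_from_reference(bam_header_contigs(header), fai_contigs(fai)))
```

With real files, the two lists would come from reading the .fai file line by line and from the captured output of samtools view -H. Any name reported as missing is one that a lookup against the reference would fail on.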

husamia commented on June 3, 2024

@kcleal I've tried both chr17 and just 17 in the bed file and it didn't help. I even ran samtools view -b original.bam chr17 > chr17.bam and removed the --search option, and I'm still getting the same error!

kcleal commented on June 3, 2024

I think the BAM file is consistent with chr17, since the rest of the pipeline works OK. I was wondering about the reference genome. The error message is saying that chr17 is not in the fasta file. The function ref_genome.get_reference_length is throwing the key error. Could you confirm this?

husamia commented on June 3, 2024

@kcleal here is the check

head /mnt/d/Research/Homo_sapiens_assembly19_1000genomes_decoy.fasta.fai
1 249250621 52 100 101
2 243199373 251743232 100 101
3 198022430 497374651 100 101
4 191154276 697377358 100 101
5 180915260 890443229 100 101
6 171115067 1073167694 100 101
7 159138663 1245993964 100 101
8 146364022 1406724066 100 101
9 141213431 1554551781 100 101
10 135534747 1697177401 100 101

kcleal commented on June 3, 2024

Ah OK, that's the problem: the 'chr' prefix is missing from the chromosome names. You can add it using sed/awk and then re-index the genome. The first index entry should then start with chr1 249250621 (the byte offset in the third column will shift slightly because the header line gets longer). Alternatively, you should probably use the same reference genome that the sample was aligned to. You can normally check this with samtools view -H: look at the command recorded for the mapping step, which usually includes the reference genome.
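
As an illustration of the renaming step, here is a minimal Python sketch rather than the sed/awk one-liner mentioned above. It assumes plain-text (uncompressed) FASTA input, and the function name add_chr_prefix is hypothetical. Note that a blanket prefix also renames decoy and unplaced contigs, and that references in this naming style typically call the mitochondrial contig MT where UCSC uses chrM, so some names may still not match the BAM header afterwards.

```python
import io


def add_chr_prefix(src, dst, prefix="chr"):
    """Copy FASTA text from src to dst, prefixing header names that lack the prefix."""
    for line in src:
        # Only header lines start with '>'; sequence lines pass through unchanged.
        if line.startswith(">") and not line.startswith(">" + prefix):
            line = ">" + prefix + line[1:]
        dst.write(line)


# Small in-memory demonstration with made-up sequences.
demo_in = io.StringIO(">1 dna:chromosome\nACGT\n>chr2\nTTGA\n")
demo_out = io.StringIO()
add_chr_prefix(demo_in, demo_out)
print(demo_out.getvalue())
```

For a real genome, src and dst would be file handles opened on the old and new FASTA paths, and the new file would then need to be re-indexed with samtools faidx before use.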

husamia commented on June 3, 2024

@kcleal the reference used is the generic hg19.fasta. I have two copies of it downloaded from different sources, and both are the same. Furthermore, I can open the file in IGV, and the chr17 file created with samtools opens in IGV as well.

kcleal commented on June 3, 2024

I think IGV converts automatically between the two representations. Unfortunately, this is not supported in dysgu.

husamia commented on June 3, 2024

@kcleal this is an issue.

The reference is the generic one named hg19.fasta, which is not provided with the data. I have used this reference with many other tools just fine. Can a feature be added to do the conversion? Otherwise, could you provide me with the awk/sed command to add the chr to the file?

kcleal commented on June 3, 2024

It's a problem I have encountered before in other genomics analyses; the different representations can be a pain. However, automatic conversion can also cause problems. I recommend you download hg19 with the chr representation from the UCSC Table Browser.

kcleal commented on June 3, 2024

Actually from here: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/
