maryawood commented on August 23, 2024

Sorry you're running into this issue! The only time I've seen something like that before was when there were very, very long indel variants being processed - did you happen to use a tool like Pindel to create your VCF file? Either way, if it's possible for you to share your data with me, I'd be happy to try it myself and see if I can recreate the issue/figure out what's causing it!

jfass commented on August 23, 2024

Thanks @maryawood! There are some complex, long variants, up to ~40-60 bp long. This particular set is from VarScan, I believe, but I may also want to test MuTect, Strelka, SomaticSniper...

Are there any guidelines on the lengths or types of variants that neoepiscope may have trouble handling?

maryawood commented on August 23, 2024

Hmm, I don't think variants of that size should be an issue - when I've had trouble in the past, it's been with very long indels of greater than 1000 bp in length. I'm not sure what your restrictions are on data sharing, but would it be possible for you to share your adjusted_haplotype_output file with me so I can try to recreate the problem? You're welcome to send it to the [email protected] account if you don't want to post it here. (I see now that you tried to email about this problem there - I'm sorry that we didn't see it more quickly!)
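If you want to check whether a VCF contains the kind of very long indels that have caused trouble before, one quick screen is to compare REF and ALT allele lengths. This is a minimal sketch, not part of neoepiscope: `long_indels` is a hypothetical helper that assumes an uncompressed, tab-delimited VCF and skips symbolic alleles such as `<DEL>`, so indels reported only in that form won't be caught.

```python
def long_indels(vcf_lines, threshold=1000):
    """Yield (chrom, pos, size) for indels longer than `threshold` bp.

    Hypothetical helper: size is |len(ALT) - len(REF)| for each ALT allele.
    Symbolic alleles (e.g. <DEL>), breakends, '.', and '*' are skipped.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith('#'):
            continue  # skip blank, meta-information, and header lines
        fields = line.rstrip('\n').split('\t')
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(','):
            if alt in ('.', '*') or alt.startswith('<') or '[' in alt or ']' in alt:
                continue  # not a literal sequence allele
            size = abs(len(alt) - len(ref))
            if size > threshold:
                yield chrom, pos, size
```

For example, `for chrom, pos, size in long_indels(open('somatic.vcf')): print(chrom, pos, size)`, where `somatic.vcf` is a placeholder path.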

jfass commented on August 23, 2024

maryawood commented on August 23, 2024

Okay, keep me posted on data sharing!

Re: incorporating germline variants, you can use neoepiscope merge before running the HapCUT2 steps to generate a merged VCF file that has somatic and germline variants together. That way, the haplotypes can be assigned incorporating both variant types simultaneously, and you can call neoepitopes that have the appropriate germline context included! One important thing to note is that if your somatic VCF lists the "normal" column before the "tumor" column, you'll want to run neoepiscope swap first to make sure you're getting tumor sample information for the variants, not normal sample information.
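To make the column-order issue concrete: in a VCF, per-sample columns start at column 10, right after FORMAT, so a somatic VCF written as NORMAL then TUMOR puts normal-sample values where tumor values are expected. The sketch below only illustrates what swapping means; `swap_sample_columns` is a hypothetical helper that assumes exactly two samples, and it is not neoepiscope swap's code, so use the real command on actual data.

```python
def swap_sample_columns(vcf_lines):
    """Yield VCF lines with the two sample columns (10th and 11th) exchanged.

    Illustrative sketch only: assumes a tab-delimited VCF with exactly two
    samples. Meta-information lines (##) pass through unchanged; the #CHROM
    header line is swapped too, so sample names stay aligned with their data.
    """
    for line in vcf_lines:
        if line.startswith('##'):
            yield line
            continue
        fields = line.rstrip('\n').split('\t')
        if len(fields) != 11:
            raise ValueError('expected exactly two sample columns')
        fields[9], fields[10] = fields[10], fields[9]
        yield '\t'.join(fields) + '\n'
```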

The README has information about running the swap and merge modes, but let me know if you have any questions after looking through that!

jfass commented on August 23, 2024

maryawood commented on August 23, 2024

Ahh, yes, that's a good point! neoepiscope doesn't handle that at present, so you would need to filter the variants first to separate out the somatic ones. There aren't any required INFO fields, but if you want to report VAF with your neoepitopes, your VCF will need a FREQ, AF, or FA field. That's optional, though: if none of those is present, neoepiscope will just report NA for VAF. I think most VCFs should work with neoepiscope as long as they have the standard columns and somatic and germline variants are kept separate at the start (and then merged with neoepiscope merge if desired).
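A rough illustration of both points, assuming a VarScan somatic VCF: VarScan records somatic status in the INFO field SS, where 2 means somatic (1 is germline, 3 is LOH), and reports FREQ as a percentage such as `25%`. The helpers below are hypothetical sketches, not neoepiscope's code; in particular, looking up FREQ, AF, and FA in the sample's FORMAT data, and trying them in that order, are assumptions made for illustration.

```python
def is_varscan_somatic(info_field):
    """Return True if the INFO column carries VarScan's SS=2 (somatic) status."""
    for entry in info_field.split(';'):
        key, _, value = entry.partition('=')
        if key == 'SS':
            return value == '2'
    return False


def sample_vaf(format_field, sample_field):
    """Return a VAF in [0, 1] from FREQ, AF, or FA, or 'NA' if none is present.

    Assumption: the value lives in the FORMAT/sample columns; a trailing '%'
    (as in VarScan's FREQ) is converted to a fraction. Only the first ALT
    allele's value is used for multi-allelic records.
    """
    values = dict(zip(format_field.split(':'), sample_field.split(':')))
    for key in ('FREQ', 'AF', 'FA'):
        raw = values.get(key)
        if raw in (None, '', '.'):
            continue
        raw = raw.split(',')[0]
        if raw.endswith('%'):
            return float(raw[:-1]) / 100
        return float(raw)
    return 'NA'
```

Returning the string `'NA'` mirrors the behavior described above, where a missing field simply produces NA in the report.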

As far as the initial problem, we might have to get a little creative if you can't share the data - would you feel comfortable editing the code of your neoepiscope install a bit to add some print statements? If so, I can give you some spots to start adding some so we can try to track down exactly where the problem is.

jfass commented on August 23, 2024

maryawood commented on August 23, 2024

Okay, let's just start with a few in the file transcript.py and see what happens...

First, can you add this import statement after the other import statements?

    from datetime import datetime

After the docstring of the get_peptides_from_transcripts function (line 3108 in the latest commit, though the exact line may differ in your version), could you add the following statements:

    print('Affected transcripts:', len(relevant_transcripts))
    print('Homozygous variants:', len(homozygous_variants))

I just want to get an idea of how many variants/transcripts we're working with.

Next, after the line "for affected_transcript in relevant_transcripts:" (line 3112 in the latest commit), could you add these lines:

    print(datetime.now(), affected_transcript, len(relevant_transcripts[affected_transcript]), 'haplotypes')
    i = 0

Similarly, after the line "for ht in haplotypes:" (line 3146 in the latest commit), could you add these lines:

    i += 1
    print(datetime.now(), 'haplotype', i)

Also, after the line "for transcript in homozygous_variants:" (line 3208 in the latest commit), could you add this line:

    print(datetime.now(), transcript)

I'd like to get a rough idea of how long it takes to process the relevant variants for each transcript, and for each haplotype applied to those transcripts.

Finally, just above the return statement of the get_peptides_from_transcripts function, could you add this line:

    print(datetime.now(), 'Finished getting peptides')

I'm hoping we can get an idea of whether things are just taking a long time to process, or if the program is getting caught on a specific variant. If we get to the last print statement, then we can look elsewhere for the problem. I think these print statements should keep things relatively private (i.e. no variants should be printed), so if you could send me the output that would be great!
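The same timestamped-progress idea, pulled out into a self-contained sketch. Everything here is a stand-in: `process_with_progress`, the dict of transcripts, and the `work` callback are hypothetical placeholders for neoepiscope's real data and per-haplotype processing. One addition worth considering when capturing output to a file is passing `flush=True` to `print`: Python buffers stdout when it is redirected to a file, so if a hung run has to be killed, unflushed lines (possibly the one showing where it stopped) can be lost.

```python
import time
from datetime import datetime


def process_with_progress(transcripts, work):
    """Call work(ht) for each haplotype, printing timestamped progress.

    `transcripts` maps a transcript ID to a list of haplotypes (a hypothetical
    stand-in for relevant_transcripts). Returns elapsed seconds per transcript.
    """
    timings = {}
    for transcript_id, haplotypes in transcripts.items():
        print(datetime.now(), transcript_id, len(haplotypes), 'haplotypes', flush=True)
        start = time.perf_counter()
        # enumerate(..., start=1) plays the role of the manual i counter
        for i, ht in enumerate(haplotypes, start=1):
            print(datetime.now(), 'haplotype', i, flush=True)
            work(ht)
        timings[transcript_id] = time.perf_counter() - start
    print(datetime.now(), 'Finished getting peptides', flush=True)
    return timings
```

For example, `process_with_progress({'TX1': ['a', 'b'], 'TX2': ['c']}, lambda ht: time.sleep(0.01))` prints one line per transcript and haplotype; if a run never reaches the final line, the last printed haplotype points at where it got stuck.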

jfass commented on August 23, 2024

jfass commented on August 23, 2024

maryawood commented on August 23, 2024

Hi Joe! I'm not seeing the stdout/stderr here - would you want to try emailing them to the [email protected] address? GitHub may not like the size or something...
