
Comments (29)

andytyk avatar andytyk commented on June 2, 2024

Hi Weixian,

Can you attach or e-mail me your fragger.params? For non-specific digests, we need to impose further constraints on the search space to reduce memory usage; otherwise, it can quickly get out of control.

Andy

from msfragger.

chhh avatar chhh commented on June 2, 2024

@weixiandeng

  • You can find the fragger.params file in the output directory that you specified in the GUI.
  • Alternatively you can set the same parameters that you used in the GUI and click the "Save" button at the top of MSFragger tab.
  • Please update the GUI to a more recent version: http://github.com/chhh/msfragger-gui/releases/latest
    This will help with updates in the future.


weixiandeng avatar weixiandeng commented on June 2, 2024

Thank you for the reply. Here's the params file.
fragger.params.zip


weixiandeng avatar weixiandeng commented on June 2, 2024

By the way, in this file I didn't delete anything in the enzyme selection tab. I also tried deleting the enzyme name and digestion site, and it reported the same issue.


weixiandeng avatar weixiandeng commented on June 2, 2024


chhh avatar chhh commented on June 2, 2024

@weixiandeng The size of the mzML file is more or less irrelevant; it's the size of the database (FASTA file) that matters the most for RAM usage. How large is your FASTA file?


anesvi avatar anesvi commented on June 2, 2024

In case the problem is that you are running out of memory, I suggest you reduce the maximum peptide length to, say, 25:
digest_min_length = 7
digest_max_length = 25

This will reduce the size of the fragment index and the overall memory requirement (if that's an issue with your run)

Alexey
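
To illustrate why the maximum peptide length matters so much for a non-specific digest, here is a minimal sketch that counts the candidate peptides for a single protein. It assumes that a non-specific digest considers every contiguous window of the allowed lengths; the function name and the protein length of 500 are hypothetical, and this is not MSFragger's actual indexing code.

```python
def count_nonspecific_peptides(protein_length: int, min_len: int, max_len: int) -> int:
    """Count every contiguous window of min_len..max_len residues in one protein."""
    total = 0
    for length in range(min_len, max_len + 1):
        # A protein of n residues has n - length + 1 windows of a given length.
        total += max(0, protein_length - length + 1)
    return total


if __name__ == "__main__":
    protein_length = 500  # hypothetical protein, for illustration only
    for max_len in (25, 50):
        n = count_nonspecific_peptides(protein_length, 7, max_len)
        print(f"lengths 7-{max_len}: {n} candidate peptides")
```

Under these assumptions, going from lengths 7-25 to 7-50 roughly doubles the number of candidates per protein, and the longer peptides also contribute more fragment ions each, so the fragment index grows faster than the peptide count alone suggests.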


weixiandeng avatar weixiandeng commented on June 2, 2024


weixiandeng avatar weixiandeng commented on June 2, 2024


chhh avatar chhh commented on June 2, 2024

@weixiandeng As I said before, it would help if you shared info about your fasta file. What database are you using? How large is the file? Can you share the file?


Dmorgen avatar Dmorgen commented on June 2, 2024

Assuming I'm using the human Swiss-Prot DB and I have access to a 128GB machine, would that be enough? Should I have more?


andytyk avatar andytyk commented on June 2, 2024

That should be more than enough using reasonable parameters for non-specific digests. I tried a non-specific digest on the human UniProt database (with reversed decoys) and peptide lengths 7-25 and managed to fit it all within 32GB of memory (-Xmx32G). You can try reducing max_variable_mods_combinations to 1000 to reduce the number of modified peptides.


weixiandeng avatar weixiandeng commented on June 2, 2024


weixiandeng avatar weixiandeng commented on June 2, 2024


anesvi avatar anesvi commented on June 2, 2024

Have you tried a default closed search with trypsin digestion? Just to make sure the problem is really related to non-specific digestion.
Alexey


weixiandeng avatar weixiandeng commented on June 2, 2024


chhh avatar chhh commented on June 2, 2024

Can you give us the database you're trying to use?


cabarnescabarnes avatar cabarnescabarnes commented on June 2, 2024

I'm trying to do something similar and could imagine this no enzyme searching thing being very useful in the MHC peptides world and the search for undigested peptide hormones. In an idealized setting where you might have MS data for such a thing, do you have an idealized params file that could be tried for no enzyme searches? I'm used to using Comet, where you can define the enzyme with a "0" in the params file for no enzyme searching.


anesvi avatar anesvi commented on June 2, 2024


cabarnescabarnes avatar cabarnescabarnes commented on June 2, 2024

Great. Thanks for the quick response. That is a good summary of the necessary parameters. In my particular case, I am actually interested in longer undigested peptides. Would it reduce the computational burden if I extended the range of the digestion length to like 20-50 or does that matter at all?


anesvi avatar anesvi commented on June 2, 2024


cabarnescabarnes avatar cabarnescabarnes commented on June 2, 2024

Ok. Great. Thanks for the help!


anesvi avatar anesvi commented on June 2, 2024


weixiandeng avatar weixiandeng commented on June 2, 2024


cabarnescabarnes avatar cabarnescabarnes commented on June 2, 2024

Hi. Thanks for all of your help. I'm using a compute cluster that has a lot of resources (up to 1TB per node) and I still can't seem to make this work. I have a single .mzXML file that has an enrichment of non-digested peptides. I have tried many times now to get this to work with this command:

java -Xmx512G -jar MSFragger-20171106.jar fragger_no_enzyme-2.params 20180124_QEp1_CPBA_EASY03_025_30_SA_plasma_endopeps_Joan_1mL_StageTip_1to1_01.mzXML 

and I get the following error:

Peptide index read in 81ms
Selected fragment tolerance 0.02 Da and maximum fragment slice size of 404630.20MB
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
	at e.a(Unknown Source)
	at MSFragger.main(Unknown Source)
	... 5 more

TROUBLESHOOTING: I now went back and took the same .mzXML file in the same directory using the same command as above and just ran it exactly the same using the fragger.params file that came with the MSFragger initial download and it runs past where the error is occurring. I changed each of the parameters individually and found that the following parameter causes this error to occur:

search_enzyme_cutafter = ARNDCQEGHILKMFPSTWYV

If I use all of the parameters that you outlined in the post above for no enzyme searches with leaving the default 'KR' in the 'search_enzyme_cutafter' line, I can make it run through (I don't know if the results are right, but the algorithm doesn't create the error above). This is even leaving the 7-50 amino acid length designations. If I put back the "search_enzyme_cutafter = ARNDCQEGHILKMFPSTWYV" and reduce the amino acid length to 8-25 like you suggested, I also recreate the error above. I even tried changing the length to something arbitrarily defined (in this case 20-21 amino acids) and it also failed. Is there something that I am missing or some other strategy that I should try?

Thanks so much for your help.

Best,

chris

PS - I forgot to mention that I also did this against an extremely oversimplified uniprot database where I deleted all of the proteins except for 2 and then added back the reverse sequences with philosopher.


cabarnescabarnes avatar cabarnescabarnes commented on June 2, 2024

I might have just solved my own problem with a parameter that I wasn't changing and probably should have. How does this parameter play into the rest of the parameter set? I set it to 0 and can make this no-enzyme search get past the previous error at least.

num_enzyme_termini = 0 # 2 for enzymatic, 1 for semi-enzymatic, 0 for nonspecific digestion


anesvi avatar anesvi commented on June 2, 2024


andytyk avatar andytyk commented on June 2, 2024

Yes, that is correct. If you were to leave things as enzymatic cleavage while setting your enzyme to cut at every single point, you would not get any peptides due to the limits on the number of missed cleavages. Non-enzymatic searches should always be done with num_enzyme_termini = 0.

Andy
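
The point about missed cleavages can be sketched with a toy digestion model. This is a simplified illustration, not MSFragger's implementation: the function names, the example sequence, and the missed-cleavage limit of 2 are assumptions made for the example.

```python
ALL_RESIDUES = "ARNDCQEGHILKMFPSTWYV"


def enzymatic_digest(sequence, cut_after, max_missed_cleavages, min_len, max_len):
    """Toy fully enzymatic digest: both peptide ends must sit on cleavage sites."""
    # Allowed boundaries: the protein start, the position after each cut residue,
    # and the protein end.
    boundaries = [0]
    boundaries += [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in cut_after]
    boundaries.append(len(sequence))
    peptides = []
    for i, start in enumerate(boundaries[:-1]):
        # Ending at boundary j skips j - i - 1 cleavage sites (missed cleavages).
        last = min(i + 2 + max_missed_cleavages, len(boundaries))
        for j in range(i + 1, last):
            end = boundaries[j]
            if min_len <= end - start <= max_len:
                peptides.append(sequence[start:end])
    return peptides


def nonspecific_digest(sequence, min_len, max_len):
    """Toy non-specific digest: every window of an allowed length is a candidate."""
    return [
        sequence[start:start + length]
        for length in range(min_len, max_len + 1)
        for start in range(len(sequence) - length + 1)
    ]


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up sequence
    # Cutting after every residue with at most 2 missed cleavages gives peptides
    # of at most 3 residues, all below the minimum length of 7.
    print(len(enzymatic_digest(seq, ALL_RESIDUES, 2, 7, 25)))
    print(len(enzymatic_digest(seq, "KR", 2, 7, 25)))
    print(len(nonspecific_digest(seq, 7, 25)))
```

In this model, an enzyme that cuts after every residue combined with fully enzymatic termini leaves nothing long enough to search, which matches the behaviour described above; dropping the terminus requirement is what makes every window a candidate.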


anesvi avatar anesvi commented on June 2, 2024

Update on the parameters for nonspecific search:

Using MSFragger-GUI, please specify:
Enzyme Name: nonspecific
That way PeptideProphet will automatically recognize that the enzyme was nonspecific and you would not need to add --enzyme nonspecific in PeptideProphet tab

Please also specify Cut After: ARNDCQEGHILKMFPSTWYV
and Not After: empty
and select Cleavages: NON_SPECIFIC

If you edit fragger.params directly, specify the following:

search_enzyme_name = nonspecific
search_enzyme_cutafter = ARNDCQEGHILKMFPSTWYV
search_enzyme_butnotafter = 
num_enzyme_termini = 0

As mentioned before, reduce peptide length to 8-25 (perhaps less) and do not use variable mods other than M+16 (unless you have enough memory). If you want to add more variable mods, e.g. extra variable modifications on Cys (for MHC peptides), and the program crashes, please reduce the peptide length to 8-15. Adding STY+80 with a nonspecific search will certainly require a cluster with a lot of memory. Instead, you can perform searches without variable mods like STY+80 by using the mass_offsets option. Please specify the mass shifts of interest, e.g. for phosphorylation, as mass_offsets: 0/79.9663. Hopefully we can put together a tutorial/website soon explaining these options better, and we will also provide sample parameter files for various scenarios.

Alexey
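
As a rough aid for anyone editing fragger.params by hand, the four settings listed above can be checked for consistency with a small script. This is only a sketch: it assumes the file uses "key = value" lines with optional "#" comments, as in the lines quoted in this thread, and the helper names are made up for illustration rather than being part of MSFragger.

```python
ALL_RESIDUES = set("ARNDCQEGHILKMFPSTWYV")


def parse_params(text):
    """Parse 'key = value  # comment' lines into a dict (assumed file layout)."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def check_nonspecific(params):
    """Return a list of problems if a cut-everywhere enzyme is configured inconsistently."""
    problems = []
    cuts_everywhere = set(params.get("search_enzyme_cutafter", "")) >= ALL_RESIDUES
    if not cuts_everywhere:
        return problems  # not a nonspecific setup, nothing to check here
    if params.get("num_enzyme_termini") != "0":
        problems.append("num_enzyme_termini should be 0 for a nonspecific search")
    if params.get("search_enzyme_butnotafter", ""):
        problems.append("search_enzyme_butnotafter should be empty")
    if params.get("search_enzyme_name") != "nonspecific":
        problems.append("search_enzyme_name should be 'nonspecific'")
    return problems


if __name__ == "__main__":
    example = """
search_enzyme_name = nonspecific
search_enzyme_cutafter = ARNDCQEGHILKMFPSTWYV
search_enzyme_butnotafter =
num_enzyme_termini = 2  # left at the enzymatic value by mistake
"""
    for problem in check_nonspecific(parse_params(example)):
        print(problem)
```

The example deliberately leaves num_enzyme_termini at 2 to reproduce the misconfiguration discussed earlier in the thread.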

