mannlabs / alphabase
Infrastructure of AlphaX ecosystem
Home Page: https://alphabase.readthedocs.io
License: Apache License 2.0
Describe the bug
Currently, non-amino-acid characters such as X, J, and B break the isotope calculation and may cause undefined behaviour in other functions.
Where should we capture these precursors? During the FASTA digest?
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I think it would be best to drop them with a warning.
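One possible shape for the proposed drop-with-warning behavior, e.g. during or right after the FASTA digest. This is a sketch under assumptions: `drop_invalid_sequences` and its warning text are hypothetical, not AlphaBase API; only the `sequence` column convention is taken from the issue context.

```python
import warnings

import pandas as pd

def drop_invalid_sequences(precursor_df: pd.DataFrame) -> pd.DataFrame:
    """Drop precursors containing non-standard residues (X, J, B, ...),
    emitting a warning with the number of removed rows. Hypothetical helper."""
    valid = precursor_df["sequence"].str.match(r"^[ACDEFGHIKLMNPQRSTVWY]+$")
    n_dropped = int((~valid).sum())
    if n_dropped:
        warnings.warn(f"dropped {n_dropped} precursor(s) with non-standard amino acids")
    return precursor_df[valid].reset_index(drop=True)

df = pd.DataFrame({"sequence": ["PEPTIDE", "PEPTXDE", "SEQJENCE"]})
filtered = drop_invalid_sequences(df)  # keeps only "PEPTIDE"
```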
Logs
2024-01-15 18:54:41> Running IsotopeGenerator
0%| | 1/464 [00:05<39:16, 5.09s/it]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
8%|▊ | 35/464 [00:06<00:30, 14.08it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
19%|█▊ | 86/464 [00:08<00:13, 27.12it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
22%|██▏ | 101/464 [00:09<00:13, 27.31it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
27%|██▋ | 126/464 [00:10<00:13, 25.06it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
35%|███▌ | 163/464 [00:12<00:18, 16.47it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
46%|████▌ | 213/464 [00:14<00:11, 22.18it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
62%|██████▏ | 288/464 [00:17<00:07, 24.15it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
74%|███████▎ | 342/464 [00:20<00:06, 19.47it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
93%|█████████▎| 432/464 [00:24<00:01, 19.63it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
100%|██████████| 464/464 [00:26<00:00, 17.76it/s]
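The warning in the log above can be reproduced in isolation: when a precursor's isotope row sums to zero (plausibly what an unknown residue such as X produces, though the exact cause inside alphabase is not confirmed here), the row normalization divides by zero and yields NaN.

```python
import numpy as np

# Second row is all zeros, standing in for a precursor whose composition
# could not be resolved; normalizing it computes 0/0 -> NaN.
precursor_dist = np.array([[0.8, 0.2],
                           [0.0, 0.0]])
with np.errstate(invalid="ignore"):  # suppressed here; the library does not suppress it
    normalized = precursor_dist / np.sum(precursor_dist, axis=1, keepdims=True)
# normalized[1] is now [nan, nan]
```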
Describe the bug
AlphaBase can't generate decoys when the sequence contains a selenocysteine (U).
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Decoy sequences are appended.
Logs
fasta_lib.append_decoy_sequence() fails with: 'substring not found'
Example
for seq in decoy_lib.precursor_df.sequence:
    try:
        (lambda x:
            x[0] + decoy_lib.mutated_AAs[decoy_lib.raw_AAs.index(x[1])] +
            x[2:-2] + decoy_lib.mutated_AAs[decoy_lib.raw_AAs.index(x[-2])] + x[-1])(seq)
    except Exception as e:
        print(seq)
        # print(e)
SUCCHCR
KUEUPSN
GPSSGGUG
RGPSSGGUG
FUIFSSSLK
AEENITESCQUR
SGLDPTVTGCUG
SGASILQAGCUG
SSGLDITQKGCUG
RSGLDPTVTGCUG
GUGCKVPQEALLK
FUIFSSSLKFVPK
RSGASILQAGCUG
SUCCHCRHLIFEK
LYAGAILEVCGUK
KLYAGAILEVCGUK
LANGGEGMEEATVVIEHCTSUR
LPPAAUQISQQLIPTEASASUR
EKLANGGEGMEEATVVIEHCTSUR
LPPAAUQISQQLIPTEASASURUK
ENLPSLCSUQGLRAEENITESCQUR
AEENITESCQURLPPAAUQISQQLIPTEASASUR
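The 'substring not found' message is the standard `str.index` ValueError: if the mutation table only covers the 20 standard residues, looking up U raises it. A minimal sketch (the `raw_AAs` string here is an assumption mirroring the example above, not the library's actual table):

```python
raw_AAs = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical mutation table: 20 standard residues, no U

try:
    raw_AAs.index("U")
    error_message = None
except ValueError as e:
    error_message = str(e)  # "substring not found"
```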
There is still an issue if there fewer datapoints than n_kernels * kernel_size
. Will change this in the future.
Originally posted by @GeorgWa in #41 (comment)
At the moment SpecLibFasta.add_modifications()
fails to include any terminal modifications.
Were the modloss_importance values derived from somewhere or just arbitrary? I noticed they were only set to non-zero values for phospho and glygly. Does this mean that unless manually changed in modification.tsv to be 1, b/y ions with modlosses will be ignored?
Thanks,
Kevin
At the moment, decoy generation does not use multiprocessing by default.
A batched approach could significantly decrease runtime.
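A minimal sketch of what the batched, multiprocessed approach could look like. All names here are hypothetical; `_decoy_batch` stands in for the real per-batch decoy logic (naive sequence reversal for illustration).

```python
import multiprocessing as mp

def _decoy_batch(sequences):
    # Stand-in for the real decoy generation; here: naive sequence reversal.
    return [seq[::-1] for seq in sequences]

def generate_decoys_batched(sequences, batch_size=1000, processes=2):
    """Split the input into batches and hand each batch to a worker process.
    (On spawn-based platforms this would need an if __name__ == "__main__" guard.)"""
    batches = [sequences[i:i + batch_size]
               for i in range(0, len(sequences), batch_size)]
    with mp.Pool(processes) as pool:
        results = pool.map(_decoy_batch, batches)
    # Flatten the per-batch results back into one list.
    return [seq for batch in results for seq in batch]

decoys = generate_decoys_batched(["PEPTIDEK", "SEQENCER"], batch_size=1)
```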
Hi Feng, I observed the following bug when using AlphaBase to create multiplexed DIA libraries. It's technically not a bug but rather incompatible behavior between functions, which might confuse users. Let me know what you think; I'm happy to make a PR.
Describe the bug
When calling calc_precursor_mz on a spectral library, clip_by_precursor_mz_ is always invoked, which breaks the precursor-to-fragment mapping.
To Reproduce
precursor_df = pd.DataFrame([
    {'sequence': 'AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAG', 'mods': 'Dimethyl@Any N-term;Dimethyl@K', 'mod_sites': '0;8', 'charge': 2},
    {'sequence': 'AGHCEWQMKPE', 'mods': 'Dimethyl@Any N-term;Dimethyl@K', 'mod_sites': '0;8', 'charge': 3},
    {'sequence': 'AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAGG', 'mods': '', 'mod_sites': '', 'charge': 2},
])

spec_lib = SpecLibBase(
    ['b_z1', 'b_z2', 'y_z1', 'y_z2'],
    decoy='pseudo_reverse',
    precursor_mz_max=2000,
)
spec_lib._precursor_df = precursor_df
spec_lib.calc_precursor_mz()
spec_lib.calc_fragment_mz_df()
spec_lib.precursor_df
sequence | mods | mod_sites | charge | nAA | precursor_mz | frag_start_idx | frag_stop_idx
---|---|---|---|---|---|---|---
AGHCEWQMKPE | Dimethyl@Any N-term;Dimethyl@K | 0;8 | 3 | 11 | 457.877652 | 0 | 10
AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAG | Dimethyl@Any N-term;Dimethyl@K | 0;8 | 2 | 33 | 1997.896037 | 10 | 42
AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAGG | | | 2 | 34 | 1998.375468 | 42 | 75
spec_lib._precursor_df['mods'] = spec_lib._precursor_df['mods'].str.replace('Dimethyl', 'Dimethyl:2H(6)13C(2)')
spec_lib.calc_precursor_mz()
sequence | mods | mod_sites | charge | nAA | precursor_mz | frag_start_idx | frag_stop_idx
---|---|---|---|---|---|---|---
AGHCEWQMKPE | Dimethyl:2H(6)13C(2)@Any N-term;Dimethyl:2H(6)... | 0;8 | 3 | 11 | 463.240566 | 0 | 10
AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAGG | | | 2 | 34 | 1998.375468 | 42 | 75
spec_lib.calc_fragment_mz_df()
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/var/folders/lc/9594t94d5b5_gn0y04w1jh980000gn/T/ipykernel_6587/3999293303.py", line 1, in <module>
spec_lib.calc_fragment_mz_df()
File "/Users/georgwallmann/Documents/git/alphabase/alphabase/spectral_library/base.py", line 361, in calc_fragment_mz_df
)
File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 818, in create_fragment_mz_dataframe
File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 876, in create_fragment_mz_dataframe
'''
File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 435, in mask_fragments_for_charge_greater_than_precursor_charge
elif frag_type == 'y':
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 815, in __setitem__
indexer = self._get_setitem_indexer(key)
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 698, in _get_setitem_indexer
return self._convert_tuple(key)
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 897, in _convert_tuple
keyidx = [self._convert_to_indexer(k, axis=i) for i, k in enumerate(key)]
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 897, in <listcomp>
keyidx = [self._convert_to_indexer(k, axis=i) for i, k in enumerate(key)]
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 1394, in _convert_to_indexer
key = check_bool_indexer(labels, key)
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 2567, in check_bool_indexer
return check_array_indexer(index, result)
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexers/utils.py", line 553, in check_array_indexer
raise IndexError(
IndexError: Boolean index has wrong length: 43 instead of 75
Expected behavior
calc_fragment_mz_df requires contiguous frag_start_idx, frag_stop_idx columns, which is not compatible with the way clip_by_precursor_mz_ works. It is also not immediately clear that clip_by_precursor_mz_ has been called and that precursors were removed.
Possible Solutions
- Make the clip_by_precursor_mz_ call inside calc_precursor_mz optional instead of performing it by default.
- Warn the user that "n precursors were removed because they were outside the m/z limits".
- Call spec_lib.remove_unused_fragments() right after clip_by_precursor_mz_ so the fragment indices stay contiguous.
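To illustrate the reindexing such a cleanup step has to perform, here is a sketch that keeps only the fragment rows referenced by surviving precursors and rewrites the index columns to be contiguous. `compact_fragments` is an illustrative re-implementation, not AlphaBase's actual remove_unused_fragments code; the column names follow the example above.

```python
import numpy as np
import pandas as pd

def compact_fragments(precursor_df, fragment_df):
    """Keep only the fragment rows referenced by surviving precursors and
    rewrite frag_start_idx / frag_stop_idx so they are contiguous again."""
    keep_rows, new_start, new_stop = [], [], []
    offset = 0
    for start, stop in zip(precursor_df["frag_start_idx"],
                           precursor_df["frag_stop_idx"]):
        keep_rows.append(np.arange(start, stop))
        new_start.append(offset)
        offset += stop - start
        new_stop.append(offset)
    out_prec = precursor_df.copy()
    out_prec["frag_start_idx"] = new_start
    out_prec["frag_stop_idx"] = new_stop
    out_frag = fragment_df.iloc[np.concatenate(keep_rows)].reset_index(drop=True)
    return out_prec, out_frag

# Two surviving precursors whose fragment blocks (rows 0..2 and 5..7) are no
# longer contiguous because the precursor in between was clipped away.
prec = pd.DataFrame({"frag_start_idx": [0, 5], "frag_stop_idx": [2, 7]})
frag = pd.DataFrame({"b_z1": np.arange(7.0)})
prec2, frag2 = compact_fragments(prec, frag)
```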
In certain cases the array returned from dist, mono = isotope_dist.calc_formula_distribution() will start before the monoisotopic mass. In these cases, it needs to be trimmed or tracked.
alphabase.peptide.precursor.calc_precursor_isotope_intensity() needs to be adapted to handle this case.
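The "trimmed" option could look like the sketch below: drop the bins that precede the monoisotopic peak and reset the mono index to 0. The function name and the example distribution are assumptions for illustration, not AlphaBase API.

```python
import numpy as np

def trim_to_mono(dist: np.ndarray, mono_idx: int):
    """Drop isotope bins preceding the monoisotopic peak so that index 0
    corresponds to the monoisotopic mass; return the trimmed distribution
    and the updated mono index."""
    return dist[mono_idx:], 0

# Hypothetical distribution whose first bin lies below the monoisotopic mass.
dist = np.array([0.05, 0.60, 0.30, 0.05])
trimmed, mono = trim_to_mono(dist, 1)
```

The "tracked" alternative would instead keep the full array and carry `mono_idx` through to calc_precursor_isotope_intensity().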
I'd like to use the output of the msgfplus search engine (which I can convert to percolator format) as an input to AlphaPeptDeep's rescoring module. However, I don't think that either msgfplus output or percolator output is supported? Thoughts?
(One idea I had was to write a converter from msgfplus (or percolator) to the "AlphaPeptDeep" format, but I couldn't find an example file for AlphaPeptDeep or documentation of the required fields, so that I could create such a converter. Thoughts?)
Update the isotope calculation for spectral libraries to allow for dense and sparse isotope creation.
Both should be possible by calling a class method like create_dense_isotopes or create_sparse_isotopes.
Define MZ_DTYPE/RELATIVE_INTENSITY_DTYPE in a const file.
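One way the two representations could relate, as a sketch: dense stores one intensity per isotope index, sparse keeps only (index, intensity) pairs above a threshold. The constant names follow the issue's proposal, but the chosen dtypes and the `to_sparse` helper are assumptions.

```python
import numpy as np

# Proposed const-file entries (names from the issue; dtypes are illustrative).
MZ_DTYPE = np.float32
RELATIVE_INTENSITY_DTYPE = np.float32

def to_sparse(dense: np.ndarray, min_intensity: float = 0.01):
    """Convert a dense isotope row into sparse (index, intensity) pairs,
    keeping only peaks at or above the threshold."""
    idx = np.nonzero(dense >= min_intensity)[0]
    return idx, dense[idx].astype(RELATIVE_INTENSITY_DTYPE)

dense = np.array([0.90, 0.08, 0.001, 0.0])
sparse_idx, sparse_int = to_sparse(dense)  # keeps indices 0 and 1
```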
Originally posted by jalew188 January 18, 2023
Since the core of mmh3 is written in C/C++, it depends on the system's build toolchain (e.g. Visual Studio on Windows); it would be better to write our own hash method using numba to replace mmh3.
See pypa/pip#8657
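As an illustration of a dependency-free alternative, here is 64-bit FNV-1a. Note this is a different algorithm than mmh3's MurmurHash3 (so existing hash values would change); it is chosen here only because its byte loop is simple and is exactly the kind of code numba's @njit compiles well when fed a uint8 array.

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash: a pure-Python stand-in for mmh3 (different
    algorithm, shown for illustration). With numba, this loop could be
    @njit-compiled over a uint8 array for speed."""
    h = 0xCBF29CE484222325  # FNV offset basis
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, wrap to 64 bits
    return h
```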
TODO after fixing #120 for the next release