alphabase's People

Contributors

ammarcsj, georgwa, jalew188, mo-sameh, sophiamaedler, swillems, yangkl96, zhouxiexuan

alphabase's Issues

Handling of non-amino-acid FASTA characters

Describe the bug
Currently, non-amino-acid characters such as X, J, and B break the isotope calculation and might cause undefined behaviour in other functions.
Where should we catch these precursors? During the FASTA digest?

To Reproduce
Steps to reproduce the behavior:

  1. Download Isoform FASTA
  2. Run fasta digest

Expected behavior
I think it would be best to drop them with a warning.
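
A minimal sketch of the "drop with a warning" behaviour suggested above, using only the standard library. The function name `drop_non_standard_sequences` and the constant `STANDARD_AAS` are hypothetical and not part of alphabase; the set of accepted letters is an assumption and would need to match whatever the isotope calculation actually supports.

```python
import re
import warnings

# Assumption: only the 20 standard amino acids are supported downstream.
STANDARD_AAS = "ACDEFGHIKLMNPQRSTVWY"
_NON_STANDARD = re.compile(f"[^{STANDARD_AAS}]")


def drop_non_standard_sequences(sequences):
    """Return the sequences made only of standard residues; warn if any were dropped."""
    kept = [seq for seq in sequences if not _NON_STANDARD.search(seq)]
    n_dropped = len(sequences) - len(kept)
    if n_dropped:
        warnings.warn(
            f"Dropped {n_dropped} sequence(s) containing non-standard "
            "amino acid characters."
        )
    return kept
```

Applied right after the digest, a filter like this would keep peptides containing X, J, or B from ever reaching the isotope generator.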

Logs

2024-01-15 18:54:41> Running IsotopeGenerator
  0%|          | 1/464 [00:05<39:16,  5.09s/it]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
  8%|▊         | 35/464 [00:06<00:30, 14.08it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
 19%|█▊        | 86/464 [00:08<00:13, 27.12it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
 22%|██▏       | 101/464 [00:09<00:13, 27.31it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
 27%|██▋       | 126/464 [00:10<00:13, 25.06it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
 35%|███▌      | 163/464 [00:12<00:18, 16.47it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
 46%|████▌     | 213/464 [00:14<00:11, 22.18it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
 62%|██████▏   | 288/464 [00:17<00:07, 24.15it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
 74%|███████▎  | 342/464 [00:20<00:06, 19.47it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
 93%|█████████▎| 432/464 [00:24<00:01, 19.63it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
  precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
100%|██████████| 464/464 [00:26<00:00, 17.76it/s]

DiaNNDecoyLib decoy generation fails with selenocysteine

Describe the bug
Alphabase can't generate decoys when the sequence contains a selenocysteine (U).

To Reproduce
Steps to reproduce the behavior:

  1. Download the reviewed FASTA from UniProt
  2. Generate DIA-NN decoy lib as described in APD notebook

Expected behavior
Decoy sequences are appended.

Logs
fasta_lib.append_decoy_sequence() fails with: 'substring not found'

Example

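# Replay the decoy mutation (second and second-to-last residue) on every
# sequence and print the ones for which it raises.
for seq in decoy_lib.precursor_df.sequence:
    try:
        (lambda x:
            x[0]+decoy_lib.mutated_AAs[decoy_lib.raw_AAs.index(x[1])]+
            x[2:-2]+decoy_lib.mutated_AAs[decoy_lib.raw_AAs.index(x[-2])]+x[-1])(seq)
    except Exception as e:
        print(seq)
        #print(e)

Sequences that fail: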
SUCCHCR
KUEUPSN
GPSSGGUG
RGPSSGGUG
FUIFSSSLK
AEENITESCQUR
SGLDPTVTGCUG
SGASILQAGCUG
SSGLDITQKGCUG
RSGLDPTVTGCUG
GUGCKVPQEALLK
FUIFSSSLKFVPK
RSGASILQAGCUG
SUCCHCRHLIFEK
LYAGAILEVCGUK
KLYAGAILEVCGUK
LANGGEGMEEATVVIEHCTSUR
LPPAAUQISQQLIPTEASASUR
EKLANGGEGMEEATVVIEHCTSUR
LPPAAUQISQQLIPTEASASURUK
ENLPSLCSUQGLRAEENITESCQUR
AEENITESCQURLPPAAUQISQQLIPTEASASUR
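
The failing sequences all have a U among the residues being looked up, and `str.index` raises `ValueError: substring not found` when the character is absent. The sketch below shows one way to guard the lookup. `mutate_sequence` is a hypothetical helper, and the two substitution strings are placeholders for illustration only; the real tables are whatever `decoy_lib.raw_AAs` and `decoy_lib.mutated_AAs` contain.

```python
# Placeholder substitution tables (assumption, for illustration only).
RAW_AAS = "GAVLIFMPWSCTYHKRQEND"
MUTATED_AAS = "LLLVVLLLLTSSSSLLNDQE"


def mutate_sequence(seq, raw_aas=RAW_AAS, mutated_aas=MUTATED_AAS):
    """Mutate the 2nd and 2nd-to-last residue; return None if that is not possible."""
    # With fewer than 4 residues the two mutated positions would overlap.
    if len(seq) < 4:
        return None
    # Check membership first so that str.index cannot raise ValueError.
    if seq[1] not in raw_aas or seq[-2] not in raw_aas:
        return None
    second = mutated_aas[raw_aas.index(seq[1])]
    second_last = mutated_aas[raw_aas.index(seq[-2])]
    return seq[0] + second + seq[2:-2] + second_last + seq[-1]
```

Whether unmappable sequences should be skipped, kept without mutation, or handled by extending the tables to cover U is a design decision this sketch leaves open; it returns None so the caller can choose.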

modloss_importance in modification.tsv

Were the modloss_importance values derived from somewhere or just arbitrary? I noticed they were only set to non-zero values for phospho and glygly. Does this mean that unless manually changed in modification.tsv to be 1, b/y ions with modlosses will be ignored?

Thanks,
Kevin

calc_precursor_mz leads to unexpected behavior

Hi Feng, I observed the following bug when using AB to create multiplexed DIA libraries. It's technically not a bug but rather an incompatibility between functions that might confuse users. Let me know what you think; I'm happy to make a PR.

Describe the bug
When calc_precursor_mz is called on a spectral library, clip_by_precursor_mz_ is always invoked, which breaks the mapping between precursors and fragments.

To Reproduce

  1. Create a library with a precursor just below the upper m/z limit:
import pandas as pd
from alphabase.spectral_library.base import SpecLibBase

precursor_df = pd.DataFrame([
    {'sequence': 'AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAG', 'mods': 'Dimethyl@Any N-term;Dimethyl@K', 'mod_sites': '0;8', 'charge': 2},
    {'sequence': 'AGHCEWQMKPE', 'mods': 'Dimethyl@Any N-term;Dimethyl@K', 'mod_sites': '0;8', 'charge': 3},
    {'sequence': 'AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAGG', 'mods': '', 'mod_sites': '', 'charge': 2},
])

spec_lib = SpecLibBase(
    ['b_z1','b_z2','y_z1','y_z2'],
    decoy='pseudo_reverse',
    precursor_mz_max=2000,
)
spec_lib._precursor_df = precursor_df
spec_lib.calc_precursor_mz()
spec_lib.calc_fragment_mz_df()

spec_lib.precursor_df
sequence                            mods                            mod_sites  charge  nAA  precursor_mz  frag_start_idx  frag_stop_idx
AGHCEWQMKPE                         Dimethyl@Any N-term;Dimethyl@K  0;8        3       11   457.877652    0               10
AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAG   Dimethyl@Any N-term;Dimethyl@K  0;8        2       33   1997.896037   10              42
AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAGG                                             2       34   1998.375468   42              75
  2. Modify the sequence or modification, which changes the m/z of some precursors, then recalculate the precursor m/z:
spec_lib._precursor_df['mods'] = spec_lib._precursor_df['mods'].str.replace('Dimethyl', 'Dimethyl:2H(6)13C(2)')
spec_lib.calc_precursor_mz()
sequence                            mods                                                mod_sites  charge  nAA  precursor_mz  frag_start_idx  frag_stop_idx
AGHCEWQMKPE                         Dimethyl:2H(6)13C(2)@Any N-term;Dimethyl:2H(6)...  0;8        3       11   463.240566    0               10
AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAGG                                                                 2       34   1998.375468   42              75
  3. Recalculate the fragment masses:
    spec_lib.calc_fragment_mz_df()
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/lc/9594t94d5b5_gn0y04w1jh980000gn/T/ipykernel_6587/3999293303.py", line 1, in <module>
    spec_lib.calc_fragment_mz_df()
  File "/Users/georgwallmann/Documents/git/alphabase/alphabase/spectral_library/base.py", line 361, in calc_fragment_mz_df
    )
  File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 818, in create_fragment_mz_dataframe
  File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 876, in create_fragment_mz_dataframe
    '''
  File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 435, in mask_fragments_for_charge_greater_than_precursor_charge
    elif frag_type == 'y':
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 815, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 698, in _get_setitem_indexer
    return self._convert_tuple(key)
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 897, in _convert_tuple
    keyidx = [self._convert_to_indexer(k, axis=i) for i, k in enumerate(key)]
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 897, in <listcomp>
    keyidx = [self._convert_to_indexer(k, axis=i) for i, k in enumerate(key)]
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 1394, in _convert_to_indexer
    key = check_bool_indexer(labels, key)
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 2567, in check_bool_indexer
    return check_array_indexer(index, result)
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexers/utils.py", line 553, in check_array_indexer
    raise IndexError(
IndexError: Boolean index has wrong length: 43 instead of 75

Expected behavior
calc_fragment_mz_df requires continuous frag_start_idx and frag_stop_idx ranges, which is not compatible with the way clip_by_precursor_mz_ works. It is also not immediately clear that clip_by_precursor_mz_ has been called and that precursors were removed.

Possible Solutions

  • Don't call clip_by_precursor_mz_ by default within calc_precursor_mz
  • Issue a warning: n precursors were removed because they were outside the m/z limits
  • Call spec_lib.remove_unused_fragments() right after clip_by_precursor_mz_
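
To illustrate the second and third options together, here is a plain-Python sketch that clips precursors by m/z, warns about how many were removed, and rebuilds the fragment list so that the start/stop indices are gap-free again. It is not alphabase's implementation: `clip_and_compact` is a hypothetical function, and precursors are modelled as simple dicts rather than DataFrame rows.

```python
import warnings


def clip_and_compact(precursors, fragments, mz_min, mz_max):
    """Drop precursors outside [mz_min, mz_max] and re-index their fragments."""
    kept, new_fragments = [], []
    for prec in precursors:
        if not (mz_min <= prec["precursor_mz"] <= mz_max):
            continue
        # Copy this precursor's fragment block to the end of the new list.
        start = len(new_fragments)
        new_fragments.extend(
            fragments[prec["frag_start_idx"]:prec["frag_stop_idx"]]
        )
        kept.append(
            {**prec, "frag_start_idx": start, "frag_stop_idx": len(new_fragments)}
        )
    n_removed = len(precursors) - len(kept)
    if n_removed:
        warnings.warn(
            f"{n_removed} precursors were removed because they were "
            "outside the m/z limits"
        )
    return kept, new_fragments
```

With the index ranges from the example above (0-10, 10-42, 42-75) and the middle precursor clipped, the surviving ranges become 0-10 and 10-43, so a fragment table built afterwards has a consistent length.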

msgfplus rescoring support or AlphaPeptDeep format example

I'd like to use the output of the msgfplus search engine (which I can convert to percolator format) as input to AlphaPeptDeep's rescoring module. However, I don't think that either msgfplus output or percolator output is supported. Thoughts?

(One idea I had was to write a converter from msgfplus (or percolator) to the "AlphaPeptDeep" format, but I couldn't find an example file for AlphaPeptDeep and documentation for what fields are required so that I could create such a converter. Thoughts?)

Update isotope calculation

Update the isotope calculation for spectral libraries to allow for dense and sparse isotope creation.
Both should be possible by calling a class method like create_dense_isotopes, create_sparse_isotopes.
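
One possible reading of "dense" versus "sparse", sketched in plain Python: dense stores a fixed-width row of isotope intensities per precursor, while sparse stores a single flat list of values plus per-precursor start/stop indices. The function names and the layout are assumptions for illustration, not the design the issue has in mind.

```python
def dense_to_sparse(dense):
    """Flatten fixed-width rows into values + (start, stop) indices, dropping trailing zeros."""
    values, starts, stops = [], [], []
    for row in dense:
        n = len(row)
        while n > 0 and row[n - 1] == 0:
            n -= 1
        starts.append(len(values))
        values.extend(row[:n])
        stops.append(len(values))
    return values, starts, stops


def sparse_to_dense(values, starts, stops, width):
    """Rebuild fixed-width rows, padding each one with zeros up to `width`."""
    return [
        values[a:b] + [0.0] * (width - (b - a))
        for a, b in zip(starts, stops)
    ]
```

The start/stop pairs follow the same convention as frag_start_idx and frag_stop_idx in the fragment tables, which is why this layout seemed a natural guess.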

mmh3 installation error

Discussed in #78

Originally posted by jalew188 January 18, 2023
Because mmh3's core is written in C/C++, installing it requires a compiler toolchain (e.g. Visual Studio on Windows). It would be better to write our own hash method using numba to replace mmh3.

See pypa/pip#8657
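
As a rough idea of what a dependency-free replacement could look like, here is a 64-bit FNV-1a hash in pure Python. This swaps in a different algorithm (FNV-1a, not MurmurHash3), so the hash values will not match those produced by mmh3. The function names are hypothetical; a numba-compiled version would presumably use the same loop over a uint8 array, but that is not shown here.

```python
FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = FNV_OFFSET_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64  # keep the state within 64 bits
    return h


def hash_sequence(sequence: str) -> int:
    """Hash a peptide sequence string (hypothetical helper)."""
    return fnv1a_64(sequence.encode("utf-8"))
```

Any switch of hash function changes the stored hash codes, so libraries written with the old hashes would need to be regenerated or handled explicitly.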
