mannlabs / alphabase
Infrastructure of AlphaX ecosystem
Home Page: https://alphabase.readthedocs.io
License: Apache License 2.0
Describe the bug
Currently, non-amino-acid characters such as X, J, and B break the isotope calculation and may cause undefined behaviour in other functions.
Where should we capture these precursors? During the FASTA digest?
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I think it would be best to drop them with a warning.
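One possible shape for the proposed drop-with-warning behavior, e.g. during or right after the FASTA digest. This is a sketch under assumptions: `drop_invalid_sequences` and its warning text are hypothetical, not AlphaBase API; only the `sequence` column convention is taken from the issue context.

```python
import warnings

import pandas as pd

def drop_invalid_sequences(precursor_df: pd.DataFrame) -> pd.DataFrame:
    """Drop precursors containing non-standard residues (X, J, B, ...),
    emitting a warning with the number of removed rows. Hypothetical helper."""
    valid = precursor_df["sequence"].str.match(r"^[ACDEFGHIKLMNPQRSTVWY]+$")
    n_dropped = int((~valid).sum())
    if n_dropped:
        warnings.warn(f"dropped {n_dropped} precursor(s) with non-standard amino acids")
    return precursor_df[valid].reset_index(drop=True)

df = pd.DataFrame({"sequence": ["PEPTIDE", "PEPTXDE", "SEQJENCE"]})
filtered = drop_invalid_sequences(df)  # keeps only "PEPTIDE"
```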
Logs
2024-01-15 18:54:41> Running IsotopeGenerator
0%| | 1/464 [00:05<39:16, 5.09s/it]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
8%|▊ | 35/464 [00:06<00:30, 14.08it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
19%|█▊ | 86/464 [00:08<00:13, 27.12it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
22%|██▏ | 101/464 [00:09<00:13, 27.31it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
27%|██▋ | 126/464 [00:10<00:13, 25.06it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
35%|███▌ | 163/464 [00:12<00:18, 16.47it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
46%|████▌ | 213/464 [00:14<00:11, 22.18it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
62%|██████▏ | 288/464 [00:17<00:07, 24.15it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
74%|███████▎ | 342/464 [00:20<00:06, 19.47it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
93%|█████████▎| 432/464 [00:24<00:01, 19.63it/s]/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/precursor.py:613: RuntimeWarning: invalid value encountered in divide
precursor_dist /= np.sum(precursor_dist, axis=1, keepdims=True)
100%|██████████| 464/464 [00:26<00:00, 17.76it/s]
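The warning in the log above can be reproduced in isolation: when a precursor's isotope row sums to zero (plausibly what an unknown residue such as X produces, though the exact cause inside alphabase is not confirmed here), the row normalization divides by zero and yields NaN.

```python
import numpy as np

# Second row is all zeros, standing in for a precursor whose composition
# could not be resolved; normalizing it computes 0/0 -> NaN.
precursor_dist = np.array([[0.8, 0.2],
                           [0.0, 0.0]])
with np.errstate(invalid="ignore"):  # suppressed here; the library does not suppress it
    normalized = precursor_dist / np.sum(precursor_dist, axis=1, keepdims=True)
# normalized[1] is now [nan, nan]
```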
Describe the bug
AlphaBase can't generate decoys when the sequence contains a selenocysteine (U).
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Decoy sequences are appended.
Logs
fasta_lib.append_decoy_sequence() fails with: 'substring not found'
Example
for seq in decoy_lib.precursor_df.sequence:
    try:
        (lambda x:
            x[0] + decoy_lib.mutated_AAs[decoy_lib.raw_AAs.index(x[1])] +
            x[2:-2] + decoy_lib.mutated_AAs[decoy_lib.raw_AAs.index(x[-2])] + x[-1])(seq)
    except Exception as e:
        print(seq)
        # print(e)
SUCCHCR
KUEUPSN
GPSSGGUG
RGPSSGGUG
FUIFSSSLK
AEENITESCQUR
SGLDPTVTGCUG
SGASILQAGCUG
SSGLDITQKGCUG
RSGLDPTVTGCUG
GUGCKVPQEALLK
FUIFSSSLKFVPK
RSGASILQAGCUG
SUCCHCRHLIFEK
LYAGAILEVCGUK
KLYAGAILEVCGUK
LANGGEGMEEATVVIEHCTSUR
LPPAAUQISQQLIPTEASASUR
EKLANGGEGMEEATVVIEHCTSUR
LPPAAUQISQQLIPTEASASURUK
ENLPSLCSUQGLRAEENITESCQUR
AEENITESCQURLPPAAUQISQQLIPTEASASUR
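The 'substring not found' message is the standard `str.index` ValueError: if the mutation table only covers the 20 standard residues, looking up U raises it. A minimal sketch (the `raw_AAs` string here is an assumption mirroring the example above, not the library's actual table):

```python
raw_AAs = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical mutation table: 20 standard residues, no U

try:
    raw_AAs.index("U")
    error_message = None
except ValueError as e:
    error_message = str(e)  # "substring not found"
```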
There is still an issue if there fewer datapoints than n_kernels * kernel_size
. Will change this in the future.
Originally posted by @GeorgWa in #41 (comment)
At the moment SpecLibFasta.add_modifications()
fails to include any terminal modifications.
Were the modloss_importance values derived from somewhere or just arbitrary? I noticed they were only set to non-zero values for phospho and glygly. Does this mean that unless manually changed in modification.tsv to be 1, b/y ions with modlosses will be ignored?
Thanks,
Kevin
At the moment, decoy generation does not use multiprocessing by default.
A batched approach could significantly decrease runtime.
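A minimal sketch of what the batched, multiprocessed approach could look like. All names here are hypothetical; `_decoy_batch` stands in for the real per-batch decoy logic (naive sequence reversal for illustration).

```python
import multiprocessing as mp

def _decoy_batch(sequences):
    # Stand-in for the real decoy generation; here: naive sequence reversal.
    return [seq[::-1] for seq in sequences]

def generate_decoys_batched(sequences, batch_size=1000, processes=2):
    """Split the input into batches and hand each batch to a worker process.
    (On spawn-based platforms this would need an if __name__ == "__main__" guard.)"""
    batches = [sequences[i:i + batch_size]
               for i in range(0, len(sequences), batch_size)]
    with mp.Pool(processes) as pool:
        results = pool.map(_decoy_batch, batches)
    # Flatten the per-batch results back into one list.
    return [seq for batch in results for seq in batch]

decoys = generate_decoys_batched(["PEPTIDEK", "SEQENCER"], batch_size=1)
```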
Hi Feng, I observed the following bug when using AlphaBase to create multiplexed DIA libraries. It's technically not a bug but rather incompatible behavior between functions, which might confuse users. Let me know what you think; I'm happy to make a PR.
Describe the bug
When calling calc_precursor_mz on a spectral library, clip_by_precursor_mz_ is always invoked, which breaks the precursor-to-fragment mapping.
To Reproduce
precursor_df = pd.DataFrame([
    {'sequence': 'AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAG', 'mods': 'Dimethyl@Any N-term;Dimethyl@K', 'mod_sites': '0;8', 'charge': 2},
    {'sequence': 'AGHCEWQMKPE', 'mods': 'Dimethyl@Any N-term;Dimethyl@K', 'mod_sites': '0;8', 'charge': 3},
    {'sequence': 'AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAGG', 'mods': '', 'mod_sites': '', 'charge': 2},
])

spec_lib = SpecLibBase(
    ['b_z1', 'b_z2', 'y_z1', 'y_z2'],
    decoy='pseudo_reverse',
    precursor_mz_max=2000,
)
spec_lib._precursor_df = precursor_df
spec_lib.calc_precursor_mz()
spec_lib.calc_fragment_mz_df()
spec_lib.precursor_df
sequence | mods | mod_sites | charge | nAA | precursor_mz | frag_start_idx | frag_stop_idx
---|---|---|---|---|---|---|---
AGHCEWQMKPE | Dimethyl@Any N-term;Dimethyl@K | 0;8 | 3 | 11 | 457.877652 | 0 | 10
AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAG | Dimethyl@Any N-term;Dimethyl@K | 0;8 | 2 | 33 | 1997.896037 | 10 | 42
AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAGG | | | 2 | 34 | 1998.375468 | 42 | 75
spec_lib._precursor_df['mods'] = spec_lib._precursor_df['mods'].str.replace('Dimethyl', 'Dimethyl:2H(6)13C(2)')
spec_lib.calc_precursor_mz()
sequence | mods | mod_sites | charge | nAA | precursor_mz | frag_start_idx | frag_stop_idx
---|---|---|---|---|---|---|---
AGHCEWQMKPE | Dimethyl:2H(6)13C(2)@Any N-term;Dimethyl:2H(6)... | 0;8 | 3 | 11 | 463.240566 | 0 | 10
AGHCEWQMKPERGWWWWWPPWWWWGGGGAGGAGG | | | 2 | 34 | 1998.375468 | 42 | 75
spec_lib.calc_fragment_mz_df()
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/var/folders/lc/9594t94d5b5_gn0y04w1jh980000gn/T/ipykernel_6587/3999293303.py", line 1, in <module>
spec_lib.calc_fragment_mz_df()
File "/Users/georgwallmann/Documents/git/alphabase/alphabase/spectral_library/base.py", line 361, in calc_fragment_mz_df
)
File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 818, in create_fragment_mz_dataframe
File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 876, in create_fragment_mz_dataframe
'''
File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 435, in mask_fragments_for_charge_greater_than_precursor_charge
elif frag_type == 'y':
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 815, in __setitem__
indexer = self._get_setitem_indexer(key)
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 698, in _get_setitem_indexer
return self._convert_tuple(key)
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 897, in _convert_tuple
keyidx = [self._convert_to_indexer(k, axis=i) for i, k in enumerate(key)]
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 897, in <listcomp>
keyidx = [self._convert_to_indexer(k, axis=i) for i, k in enumerate(key)]
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 1394, in _convert_to_indexer
key = check_bool_indexer(labels, key)
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 2567, in check_bool_indexer
return check_array_indexer(index, result)
File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexers/utils.py", line 553, in check_array_indexer
raise IndexError(
IndexError: Boolean index has wrong length: 43 instead of 75
Expected behavior
calc_fragment_mz_df requires contiguous frag_start_idx, frag_stop_idx columns, which is not compatible with the way clip_by_precursor_mz_ works. It is also not immediately clear that clip_by_precursor_mz_ has been called and that precursors were removed.
Possible Solutions
- Make the clip_by_precursor_mz_ call inside calc_precursor_mz optional instead of performing it by default.
- Warn the user that "n precursors were removed because they were outside the m/z limits".
- Call spec_lib.remove_unused_fragments() right after clip_by_precursor_mz_ so the fragment indices stay contiguous.
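To illustrate the reindexing such a cleanup step has to perform, here is a sketch that keeps only the fragment rows referenced by surviving precursors and rewrites the index columns to be contiguous. `compact_fragments` is an illustrative re-implementation, not AlphaBase's actual remove_unused_fragments code; the column names follow the example above.

```python
import numpy as np
import pandas as pd

def compact_fragments(precursor_df, fragment_df):
    """Keep only the fragment rows referenced by surviving precursors and
    rewrite frag_start_idx / frag_stop_idx so they are contiguous again."""
    keep_rows, new_start, new_stop = [], [], []
    offset = 0
    for start, stop in zip(precursor_df["frag_start_idx"],
                           precursor_df["frag_stop_idx"]):
        keep_rows.append(np.arange(start, stop))
        new_start.append(offset)
        offset += stop - start
        new_stop.append(offset)
    out_prec = precursor_df.copy()
    out_prec["frag_start_idx"] = new_start
    out_prec["frag_stop_idx"] = new_stop
    out_frag = fragment_df.iloc[np.concatenate(keep_rows)].reset_index(drop=True)
    return out_prec, out_frag

# Two surviving precursors whose fragment blocks (rows 0..2 and 5..7) are no
# longer contiguous because the precursor in between was clipped away.
prec = pd.DataFrame({"frag_start_idx": [0, 5], "frag_stop_idx": [2, 7]})
frag = pd.DataFrame({"b_z1": np.arange(7.0)})
prec2, frag2 = compact_fragments(prec, frag)
```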
In certain cases the array returned from dist, mono = isotope_dist.calc_formula_distribution() will start before the monoisotopic mass. In these cases, it needs to be trimmed or tracked.
alphabase.peptide.precursor.calc_precursor_isotope_intensity() needs to be adapted to handle this case.
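The "trimmed" option could look like the sketch below: drop the bins that precede the monoisotopic peak and reset the mono index to 0. The function name and the example distribution are assumptions for illustration, not AlphaBase API.

```python
import numpy as np

def trim_to_mono(dist: np.ndarray, mono_idx: int):
    """Drop isotope bins preceding the monoisotopic peak so that index 0
    corresponds to the monoisotopic mass; return the trimmed distribution
    and the updated mono index."""
    return dist[mono_idx:], 0

# Hypothetical distribution whose first bin lies below the monoisotopic mass.
dist = np.array([0.05, 0.60, 0.30, 0.05])
trimmed, mono = trim_to_mono(dist, 1)
```

The "tracked" alternative would instead keep the full array and carry `mono_idx` through to calc_precursor_isotope_intensity().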
I'd like to use the output of the msgfplus search engine (which I can convert to percolator format) as an input to AlphaPeptDeep's rescoring module. However, I don't think that either msgfplus output or percolator output is supported? Thoughts?
(One idea I had was to write a converter from msgfplus (or percolator) to the "AlphaPeptDeep" format, but I couldn't find an example file for AlphaPeptDeep or documentation of the required fields, so that I could create such a converter. Thoughts?)
Update the isotope calculation for spectral libraries to allow for dense and sparse isotope creation.
Both should be possible by calling a class method like create_dense_isotopes or create_sparse_isotopes.
Define MZ_DTYPE/RELATIVE_INTENSITY_DTYPE in a const file.
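One way the two representations could relate, as a sketch: dense stores one intensity per isotope index, sparse keeps only (index, intensity) pairs above a threshold. The constant names follow the issue's proposal, but the chosen dtypes and the `to_sparse` helper are assumptions.

```python
import numpy as np

# Proposed const-file entries (names from the issue; dtypes are illustrative).
MZ_DTYPE = np.float32
RELATIVE_INTENSITY_DTYPE = np.float32

def to_sparse(dense: np.ndarray, min_intensity: float = 0.01):
    """Convert a dense isotope row into sparse (index, intensity) pairs,
    keeping only peaks at or above the threshold."""
    idx = np.nonzero(dense >= min_intensity)[0]
    return idx, dense[idx].astype(RELATIVE_INTENSITY_DTYPE)

dense = np.array([0.90, 0.08, 0.001, 0.0])
sparse_idx, sparse_int = to_sparse(dense)  # keeps indices 0 and 1
```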
Originally posted by jalew188 January 18, 2023
Since the core of mmh3 is written in C/C++, it depends on the system's build toolchain (e.g. Visual Studio on Windows); it would be better to write our own hash method using numba to replace mmh3.
See pypa/pip#8657
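As an illustration of a dependency-free alternative, here is 64-bit FNV-1a. Note this is a different algorithm than mmh3's MurmurHash3 (so existing hash values would change); it is chosen here only because its byte loop is simple and is exactly the kind of code numba's @njit compiles well when fed a uint8 array.

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash: a pure-Python stand-in for mmh3 (different
    algorithm, shown for illustration). With numba, this loop could be
    @njit-compiled over a uint8 array for speed."""
    h = 0xCBF29CE484222325  # FNV offset basis
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, wrap to 64 bits
    return h
```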
TODO after fixing #120 for the next release