
mannlabs / alphapeptstats


Python Package for the downstream analysis of mass-spectrometry-based proteomics data

Home Page: https://alphapeptstats.readthedocs.io/en/latest/

License: Apache License 2.0

Languages: Python 1.30%, Shell 0.02%, Jupyter Notebook 98.66%, HTML 0.01%, Inno Setup 0.01%, Dockerfile 0.01%, TeX 0.01%
Topics: mass-spectrometry, maxquant, proteomics, alphapept-ecosystem, dia-nn, fragpipe, msfragger, spectronaut

alphapeptstats's People

Contributors

dependabot[bot], elena-krismer, straussmaximilian


alphapeptstats's Issues

Cut-offs for Volcano Plots greatly shifted from significance

[Image: Cut-offs_VolcanoPlots]

The image above shows the discrepancy I'm seeing. I'm using the stats from this package on log2(LFQ) values by repeatedly calling perform_ttest_analysis inside the following function.

def alphastats_ttest(data, s0, fdr):
    dfs = [df.reset_index() for df in data]
    stat_dfs = []
    t_limits = []
    for df in dfs:
        grp1colnames = df.columns[df.columns.str.contains('WT')].to_list()
        grp2colnames = df.columns[df.columns.str.contains('Test')].to_list()

        stats, tmax = perform_ttest_analysis(
            df,
            grp2colnames,
            grp1colnames,
            s0=s0,                # see Tusher et al. 2001 for the s0 definition
            n_perm=2,
            fdr=fdr,              # e.g. 0.05 for a 5% FDR
            id_col="Uniprot",
            plot_fdr_line=True,
            parallelize=True,
        )

        stat_dfs.append(stats)
        t_limits.append(tmax)

    return stat_dfs, t_limits

I then calculate the cut-off dataframes from the list of returned t_limits before plotting, which gives the volcano plots attached here. It seems there is a disconnect between how the t-test cut-off is calculated in get_MaxS versus the usual perform_ttest_analysis?

Here's the last portion of my code before plotting.

cut_offs = []

for i, df in enumerate(stats):
    n_x, n_y = len(df), len(df)
    s0 = 1.5
    
    cut_off = get_fdr_line(t_limits[i],
                        s0, n_x, n_y, plot=False,
                        fc_s=np.arange(0, 10, 0.05),
                        s_s=np.arange(0.005, 10, 0.05))
    
    cut_off['-logp'] = -np.log10(cut_off['pvals'])  # transform into -log10 space for plotting
    cut_offs.append(cut_off)
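
For completeness, here is a minimal sketch of how the cut-off line could be overlaid on each volcano with matplotlib; the stats column names "fc" and "pvals" (and the "fc" column on the cut-off frame) are placeholders for whatever perform_ttest_analysis and get_fdr_line actually return:

import matplotlib.pyplot as plt
import numpy as np

for stats_df, cut_off in zip(stats, cut_offs):
    fig, ax = plt.subplots()
    ax.scatter(stats_df["fc"], -np.log10(stats_df["pvals"]), s=5)  # placeholder column names
    ax.plot(cut_off["fc"], cut_off["-logp"], color="red")          # FDR cut-off line
    ax.set_xlabel("log2 fold change")
    ax.set_ylabel("-log10(p)")
    plt.show()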

Info for the GUI

[Image: grafik]

This is not clear to me.
The tutorial notebook does not cover the GUI, and the example plots shown there appear to be normal Jupyter notebook plots, independent of the GUI and of the GUI-generated interactive plots shown on the AlphaPeptStats landing page.

Type of FDR correction

Which types of FDR correction are available in AlphaPeptStats? Benjamini-Hochberg?

Thank you
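
For reference, a minimal sketch of Benjamini-Hochberg correction using statsmodels, which is among alphastats' pinned dependencies; whether this exact routine is what AlphaPeptStats calls internally is an assumption:

import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.001, 0.01, 0.03, 0.2, 0.8])
rejected, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(qvals)  # BH-adjusted p-values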

Not working

Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\rs3869\AppData\Local\Programs\AlphaPeptStats\alphastats\gui\AlphaPeptStats.py", line 8, in <module>
    from utils.ui_helper import sidebar_info, img_to_bytes
  File "C:\Users\rs3869\AppData\Local\Programs\AlphaPeptStats\alphastats\gui\utils\ui_helper.py", line 4, in <module>
    from alphastats import __version__
ModuleNotFoundError: No module named 'alphastats'
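
A hedged workaround sketch: the interpreter bundled with the installer apparently cannot see the alphastats package. Assuming pip is available in that interpreter, the following checks for the package and installs it into whichever Python is actually running the GUI:

import importlib.util
import subprocess
import sys

# install alphastats into the interpreter that raised the ModuleNotFoundError
if importlib.util.find_spec("alphastats") is None:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "alphastats"])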

Not able to load FragPipe output data

Greetings,

Describe the bug
I am not able to load my FragPipe output data into your tool via the GUI. Please see the screenshot of the error below.

[Image: error screenshot]

Regards,
Ben

Can't install: Microsoft Visual C++ error message

I picked Python 3.10.9, following the recommended Python versions.
[Image: grafik]

Trying to install:
[Image: grafik]

yields:
Installing collected packages: iteration-utilities, gitdb, click, blinker, tzlocal, sparse, rich, pydeck, pandas, outdated, numba-stats, gitpython, sklearn-pandas, data-cache, altair, streamlit, pandas-flavor, batchglm, pingouin, diffxpy, swifter, alphastats
Running setup.py install for iteration-utilities: started
Running setup.py install for iteration-utilities: finished with status 'error'
Note: you may need to restart the kernel to use updated packages.
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-310
creating build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\_additional_recipes.py -> build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\_classes.py -> build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\_convenience.py -> build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\_recipes.py -> build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\_utils.py -> build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\__init__.py -> build\lib.win-amd64-cpython-310\iteration_utilities
running build_ext
building 'iteration_utilities._iteration_utilities' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for iteration-utilities
error: subprocess-exited-with-error
...
╰─> iteration-utilities

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Whereas I actually have:
[Image: grafik]

Feedback from intro

  • explain the metadata scheme and provide a template
  • down-size metadata when it contains more samples than proteinGroups.txt
  • display gene names in the plotly volcano plot
  • add an option to add labels to the volcano plot
  • implement UMAP

Efficient conversion back and forth to AnnData

Proteomics data typically contain protein intensities across samples (matrix X), with extra variable annotations (var, i.e. protein annotations) and observation annotations (obs, i.e. sample annotations). A very efficient way of storing this in a single object is the AnnData object.

https://anndata.readthedocs.io/en/latest/

I use them a lot and would like to move data efficiently into and out of alphapeptstats (e.g. applying a transformation in alphapeptstats and then exporting the result as a modified AnnData object).

Note that metadata is already perfectly matched in the AnnData object, both for variables and observations.

Can you make input and output between alphapeptstats and AnnData objects efficient?
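
A minimal round-trip sketch, assuming the DataSet's .mat is a samples x proteins DataFrame and its metadata is indexed by sample name; this is an illustration, not an official alphastats API:

import anndata as ad
import pandas as pd

def mat_to_anndata(mat: pd.DataFrame, obs: pd.DataFrame) -> ad.AnnData:
    # mat: samples x proteins intensities; obs: sample annotations indexed by sample name
    return ad.AnnData(X=mat.to_numpy(),
                      obs=obs.loc[mat.index],
                      var=pd.DataFrame(index=mat.columns))

def anndata_to_mat(adata: ad.AnnData) -> pd.DataFrame:
    # back to a plain samples x proteins DataFrame that could replace ds.mat
    return pd.DataFrame(adata.X, index=adata.obs_names, columns=adata.var_names)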

`nbformat` requirement missing

Excited to try alphapeptstats! It's great to see a new open-source proteomics tool out there.

Describe the bug
I'm working through the tutorial in a Jupyter notebook, and plot_sampledistribution threw an error about nbformat.

It looks like nbformat is a missing requirement that is not installed with pip install alphastats.

Version (please complete the following information):

  • OS: macOS
  • Version: 0.6.3
  • Installation Type: pip

sam method in VolcanoPlot checks log2 status with an incorrect condition

Hi, I noticed that this line in the sam method of VolcanoPlot.py checks the normalization status in the preprocessing info instead of the log2 status:

if self.dataset.preprocessing_info["Normalization"] is None:

Here's what I suggest instead:

if self.dataset.preprocessing_info['Log2-transformed'] is False:

Additionally, this method doesn't let the user define s0 and instead hard-codes constants internally. Could a parameter be added for users who want to run a moderated t-test with the SAM FDR method?
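
A sketch of what the requested interface could look like; the name and signature are hypothetical, not the current alphastats API:

def sam(self, s0: float = 0.05, fdr: float = 0.05):
    # guard on the log2 status rather than the normalization status
    if self.dataset.preprocessing_info.get("Log2-transformed") is False:
        raise ValueError("SAM expects log2-transformed intensities.")
    # ... run the moderated t-test with the user-supplied s0 and fdr ...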

Incorrect VST normalization

Describe the bug

While attempting to run VST (variance-stabilizing transformation) normalization in AlphaPeptStats, I encountered several issues suggesting that the normalization is not functioning as intended.

Issue 1: Axis of Normalization
Upon debugging, it appears that the normalization is performed across proteins (columns of ds.mat) rather than across samples (rows). Below is a screenshot of a table that supports this hypothesis:
[Image: table screenshot]

To Reproduce:
I used a standard proteinGroups.txt file and preprocessed it using the following code:

ds.preprocess(
    remove_contaminations=True,
    normalization="vst",
)
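
One way to check the axis hypothesis outside the package is scikit-learn's PowerTransformer; that alphastats' "vst" option is backed by a column-wise scikit-learn transformer is an assumption, but the sketch shows why the matrix orientation matters:

import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
mat = rng.lognormal(size=(6, 4))  # toy samples x proteins matrix
# PowerTransformer normalizes each column independently, i.e. each protein;
# transposing first flips the axis so each sample is normalized instead
per_protein = PowerTransformer().fit_transform(mat)
per_sample = PowerTransformer().fit_transform(mat.T).T
print(np.allclose(per_protein, per_sample))  # False: the axis matters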

Issue 2: Inconsistent PCA Graphs
The PCA graphs generated post-normalization are inconsistent, both in axis scales and in explained variance. Here's a screenshot for reference:
[Image: dim_red_PCA_HealthStatus_group_circle]

Issue 3: VST vs VSN Normalization
Is the VST normalization in AlphaPeptStats intended to perform similarly to the VSN normalization method available in R? For reference, see the VSN documentation.

Additional Information
Operating System: Windows 10
Python Environment: Conda

I would appreciate any guidance or fixes for these issues. Thank you!

Streamlit Windows installer problem

Describe the bug
Hi, I'm trying to install AlphaPeptStats via the Windows installer, but after installation it doesn't start properly.
The prompt window suddenly closes after showing the following message: "'streamlit' is not recognized as an internal or external command, operable program or batch file".

To Reproduce
Steps to reproduce the behavior:
Without any previous installation, try to install alphapeptstats with or without administrator permissions, for all users or for the current user.

Version (please complete the following information):

  • OS: win10 pro for workstations

Installation fails on M1 Mac due to tables dependency

Hi,

I tried installing on my M1 Mac (Ventura 13.5), but the default installation failed:

(alphapeptstats) user@computer programming % conda create --name alphapeptstats python=3.10 -y
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/user/miniconda3/envs/alphapeptstats

  added / updated specs:
    - python=3.10


The following NEW packages will be INSTALLED:

  bzip2              pkgs/main/osx-arm64::bzip2-1.0.8-h620ffc9_4 
  ca-certificates    pkgs/main/osx-arm64::ca-certificates-2023.05.30-hca03da5_0 
  libffi             pkgs/main/osx-arm64::libffi-3.4.4-hca03da5_0 
  ncurses            pkgs/main/osx-arm64::ncurses-6.4-h313beb8_0 
  openssl            pkgs/main/osx-arm64::openssl-3.0.9-h1a28f6b_0 
  pip                pkgs/main/osx-arm64::pip-23.2.1-py310hca03da5_0 
  python             pkgs/main/osx-arm64::python-3.10.12-hb885b13_0 
  readline           pkgs/main/osx-arm64::readline-8.2-h1a28f6b_0 
  setuptools         pkgs/main/osx-arm64::setuptools-68.0.0-py310hca03da5_0 
  sqlite             pkgs/main/osx-arm64::sqlite-3.41.2-h80987f9_0 
  tk                 pkgs/main/osx-arm64::tk-8.6.12-hb8d0fd4_0 
  tzdata             pkgs/main/noarch::tzdata-2023c-h04d1e81_0 
  wheel              pkgs/main/osx-arm64::wheel-0.38.4-py310hca03da5_0 
  xz                 pkgs/main/osx-arm64::xz-5.4.2-h80987f9_0 
  zlib               pkgs/main/osx-arm64::zlib-1.2.13-h5a0b063_0 



Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate alphapeptstats
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(alphapeptstats) user@computer programming % conda activate alphapeptstats 
(alphapeptstats) user@computer programming % pip install alphastats
Collecting alphastats
  Obtaining dependency information for alphastats from https://files.pythonhosted.org/packages/6f/45/f0533a5192f0883102d9b345b5ac3638aa74fd161f82778eef88d36be16d/alphastats-0.6.3-py3-none-any.whl.metadata
  Using cached alphastats-0.6.3-py3-none-any.whl.metadata (6.9 kB)
Collecting pandas==2.0.2 (from alphastats)
  Obtaining dependency information for pandas==2.0.2 from https://files.pythonhosted.org/packages/54/d7/9e8ff0685d3454a13949e0503bdc789b4bc5bb35989c3948101e71b362cd/pandas-2.0.2-cp310-cp310-macosx_11_0_arm64.whl.metadata
  Using cached pandas-2.0.2-cp310-cp310-macosx_11_0_arm64.whl.metadata (18 kB)
Collecting scikit-learn==1.2.2 (from alphastats)
  Using cached scikit_learn-1.2.2-cp310-cp310-macosx_12_0_arm64.whl (8.5 MB)
Collecting data-cache>=0.1.6 (from alphastats)
  Using cached data_cache-0.1.6-py3-none-any.whl (6.7 kB)
Collecting plotly==5.15.0 (from alphastats)
  Obtaining dependency information for plotly==5.15.0 from https://files.pythonhosted.org/packages/a5/07/5bef9376c975ce23306d9217ab69ca94c07f2a3c90b17c03e3ae4db87170/plotly-5.15.0-py2.py3-none-any.whl.metadata
  Using cached plotly-5.15.0-py2.py3-none-any.whl.metadata (7.0 kB)
Collecting statsmodels==0.14.0 (from alphastats)
  Using cached statsmodels-0.14.0-cp310-cp310-macosx_11_0_arm64.whl (9.4 MB)
Collecting sklearn-pandas==2.2.0 (from alphastats)
  Using cached sklearn_pandas-2.2.0-py2.py3-none-any.whl (10 kB)
Collecting pingouin==0.5.3 (from alphastats)
  Using cached pingouin-0.5.3-py3-none-any.whl (198 kB)
Collecting scipy==1.10.1 (from alphastats)
  Using cached scipy-1.10.1-cp310-cp310-macosx_12_0_arm64.whl (28.8 MB)
Collecting tqdm>=4.64.0 (from alphastats)
  Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting diffxpy==0.7.4 (from alphastats)
  Using cached diffxpy-0.7.4-py3-none-any.whl (85 kB)
Collecting anndata==0.9.1 (from alphastats)
  Using cached anndata-0.9.1-py3-none-any.whl (102 kB)
Collecting umap-learn==0.5.3 (from alphastats)
  Using cached umap-learn-0.5.3.tar.gz (88 kB)
  Preparing metadata (setup.py) ... done
Collecting streamlit==1.22.0 (from alphastats)
  Using cached streamlit-1.22.0-py2.py3-none-any.whl (8.9 MB)
Collecting tables==3.7.0 (from alphastats)
  Using cached tables-3.7.0.tar.gz (8.2 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [12 lines of output]
      <string>:18: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
      ld: library not found for -lhdf5
      clang: error: linker command failed with exit code 1 (use -v to see invocation)
      cpuinfo failed, assuming no CPU features: No module named 'cpuinfo'
      * Using Python 3.10.12 (main, Jul  5 2023, 15:02:25) [Clang 14.0.6 ]
      * Found cython 3.0.0
      * USE_PKGCONFIG: False
      * Found conda env: ``/Users/user/miniconda3/envs/alphapeptstats``
      .. ERROR:: Could not find a local HDF5 installation.
         You may need to explicitly state where your local HDF5 headers and
         library can be found by setting the ``HDF5_DIR`` environment
         variable or by using the ``--hdf5`` command-line option.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
(alphapeptstats) user@computer programming % 

I got it to work by first installing tables manually:

conda install -c anaconda pytables

Maybe this can be added to the installation instructions?

Preprocessing GUI Page - Data completeness entry html validation incorrect

Describe the bug
In the preprocessing page, for "Data completeness across samples cut-off (0.7 -> protein has to be detected in at least 70% of the samples)", the input box only accepts the integer values 0 or 1. However, we need a float with a smaller step size.

Verified Fix
Original:

# ...\venv\alphapeptstatsenv\Lib\site-packages\alphastats\gui\pages\03_Preprocessing.py

data_completeness = st.number_input(
    f"Data completeness across samples cut-off \n(0.7 -> protein has to be detected in at least 70% of the samples)",
    value=0, min_value=0, max_value=1
)

Fixed version:

# ...\venv\alphapeptstatsenv\Lib\site-packages\alphastats\gui\pages\03_Preprocessing.py

data_completeness = st.number_input(
    f"Data completeness across samples cut-off \n(0.7 -> protein has to be detected in at least 70% of the samples)",
    value=0.0, min_value=0.0, max_value=1.0, step=0.01,
)

Note:

  1. The values must be floats, so use 0.0 and 1.0, etc.
  2. The step size should be specified; otherwise the HTML input renders with an integer step of 1.

color from plot_sampledistribution

Bug description
The bug happens when using .plot_sampledistribution while selecting a metadata column for coloring. The column has two unique values, "D" and "N", for diseased and normal cells.

The graph output is in black and white; there are still two entries in the legend, but both are grey.

I assume the color palette is somehow not activated or is wrongly referenced in this case, or I might be missing some dependencies. I tried reinstalling the plotly package within my conda environment, but the issue persisted.

Screenshots
[Image: sample distribution plot]

Version:

  • OS: Windows 10
  • Version: 0.6.3
  • Installation Type: Python (within a conda env)
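
To narrow this down, a quick sanity check (a sketch with illustrative column names) of whether plotly itself colors categorical values in this environment, independent of alphastats:

import pandas as pd
import plotly.express as px

df = pd.DataFrame({"sample": ["s1", "s2", "s3", "s4"],
                   "intensity": [1.0, 2.0, 1.5, 2.5],
                   "disease": ["D", "D", "N", "N"]})
fig = px.box(df, x="sample", y="intensity", color="disease")
# should render two distinct colors; if both are grey here too,
# the problem is in plotly or the renderer rather than in alphastats
fig.show()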

Implementing `MissForest` imputation?

Is your feature request related to a problem? Please describe.
This preprint from the Noble Lab shows strong performance for the MissForest method, which differs from plain random forest imputation. I'm wondering if it could be implemented here.

MissForest is written in R, and the preprint uses rpy2 to call it from an IPython notebook. That said, there is a Python implementation without any tests that could perhaps be a starting point: https://github.com/yuenshingyan/MissForest.

Describe the solution you'd like
Offering MissForest as an imputation option.

Describe alternatives you've considered
Using no imputation, or the rpy2 setup -- the latter requires some work to get the R packages in the right place and reference them. The R package supports threading, but that isn't tested in the notebook above, which uses just one thread.
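
Not MissForest itself, but a close scikit-learn analogue (iterative imputation with a random-forest estimator) could serve as a pure-Python starting point; a sketch, not a drop-in for alphastats:

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

# toy samples x proteins matrix with missing values
X = np.array([[1.0, np.nan, 3.0],
              [2.0, 2.5, np.nan],
              [np.nan, 2.0, 3.5],
              [1.5, 2.2, 3.1]])

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0),
    max_iter=10,
    random_state=0,
)
X_imputed = imputer.fit_transform(X)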

'Raw data number of Protein Groups' in `preprocessing_info` reports a filtered count

Is your feature request related to a problem? Please describe.
During loading, rows that contain only zeroes are filtered out, but the ds.preprocessing_info value 'Raw data number of Protein Groups' reports the row count after that filtering.

Describe the solution you'd like
ds.preprocessing_info should report the 'Raw data number of Protein Groups' before filtering only-zero rows, and there should be an additional value (something like 'Raw data number of Protein Groups after removing proteins with only zero intensity') with the filtered count.
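
A sketch of the requested bookkeeping; the second key name is the suggestion above, not an existing alphastats key:

import pandas as pd

# toy raw input: protein groups x samples
rawinput = pd.DataFrame({"id": ["P1", "P2", "P3"],
                         "s1": [0.0, 1.2, 0.0],
                         "s2": [0.0, 3.4, 2.1]})
intensity_cols = ["s1", "s2"]

preprocessing_info = {}
preprocessing_info["Raw data number of Protein Groups"] = len(rawinput)  # before filtering
nonzero = rawinput.loc[~(rawinput[intensity_cols] == 0).all(axis=1)]
preprocessing_info[
    "Raw data number of Protein Groups after removing proteins with only zero intensity"
] = len(nonzero)  # 2 of the 3 toy rows remain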

Imputation considers only NaN; should it also consider zeros?

Is your feature request related to a problem? Please describe.
I was getting some mysterious negative log2-transformed values after imputation, and it took me a while to figure out that zeros are not imputed. After changing them to NaN, they are now imputed.

Describe the solution you'd like
Add a parameter to the preprocessing engine that controls which values are imputed, or change the default behavior so that both zeros and NaN are imputed.

Describe alternatives you've considered
I found a workaround: converting zeros to NaN before running imputation, as sketched below.
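
The workaround as a minimal sketch; the assignment back to the DataSet is commented out because ds is not defined here, and the imputation method name is illustrative:

import numpy as np
import pandas as pd

# toy samples x proteins matrix where 0 means "not detected"
mat = pd.DataFrame({"P1": [0.0, 1.2], "P2": [3.4, 0.0]})
mat = mat.replace(0, np.nan)  # zeros are now missing values and will be imputed
# ds.mat = mat
# ds.preprocess(imputation="knn")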

Grouping replicates from sample names

Dear Developers,
My data consists of treatment vs. control, each with 3 replicates, and I am using output from MaxQuant. How can I group replicates into control and treatment groups during data analysis in AlphaPeptStats? Please help with this query. Thank you in advance.
Best regards,
Muthu
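
One common pattern is to build a metadata table mapping each sample name to a group and pass it when creating the DataSet; the sample and column names below are illustrative, and whether DataSet accepts an in-memory table this way is an assumption:

import pandas as pd

metadata = pd.DataFrame({
    "sample": ["ctrl_1", "ctrl_2", "ctrl_3", "treat_1", "treat_2", "treat_3"],
    "group": ["control"] * 3 + ["treatment"] * 3,
})
# ds = DataSet(loader=maxquant_loader, metadata_path=metadata, sample_column="sample")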

Conversion of pd.DataFrame to a DataSet instance

Sometimes I need to use external tools for data processing, such as batch-effect correction or normalization techniques that are not available in AlphaPeptStats. However, the resulting data frame cannot then be analyzed using the statistical methods provided by alphastats.

Is there a way to convert a data frame to a DataSet instance?
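
In the absence of an official converter, a pragmatic workaround sketch (not an official API): create the DataSet from the original input, then overwrite its matrix with the externally processed values, assuming the processed frame keeps the samples x proteins orientation and labels:

import pandas as pd

def inject_processed_matrix(ds, processed: pd.DataFrame):
    # overwrite ds.mat with an externally processed samples x proteins frame,
    # keeping sample/protein alignment identical to the original matrix
    ds.mat = processed.reindex(index=ds.mat.index, columns=ds.mat.columns)
    return ds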

one-click Windows installer?

In the readme file it says:
[Image: readme screenshot]
However, when following the link, the installers for Linux and macOS are present, but not the installer for Windows:
[Image: release assets screenshot]
Could you please provide the missing installer as well?
Best,
