
mannlabs / alphapeptstats


Python Package for the downstream analysis of mass-spectrometry-based proteomics data

Home Page: https://alphapeptstats.readthedocs.io/en/latest/

License: Apache License 2.0

Languages: Python 1.30%, Shell 0.02%, Jupyter Notebook 98.66%, HTML 0.01%, Inno Setup 0.01%, Dockerfile 0.01%, TeX 0.01%
Topics: mass-spectrometry, maxquant, proteomics, alphapept-ecosystem, dia-nn, fragpipe, msfragger, spectronaut

alphapeptstats's People

Contributors

dependabot[bot], elena-krismer, straussmaximilian


alphapeptstats's Issues

Cut-offs for Volcano Plots greatly shifted from significance

[Image: Cut-offs_VolcanoPlots]

The image above shows the discrepancy I'm seeing. I'm using the stats from this package on log2(LFQ) values by repeatedly calling perform_ttest_analysis inside the following function.

def alphastats_ttest(data, s0, fdr):
    dfs = [df.reset_index() for df in data]
    stat_dfs = []
    t_limits = []
    for df in dfs:
        grp1colnames = df.columns[df.columns.str.contains('WT')].to_list()
        grp2colnames = df.columns[df.columns.str.contains('Test')].to_list()

        stats, tmax = perform_ttest_analysis(
            df,
            grp2colnames,
            grp1colnames,
            s0=s0,                # see Tusher et al. 2001 for the s0 definition
            n_perm=2,
            fdr=fdr,              # e.g. 0.05 for a 5% FDR
            id_col="Uniprot",
            plot_fdr_line=True,
            parallelize=True,
        )

        stat_dfs.append(stats)
        t_limits.append(tmax)

    return stat_dfs, t_limits

I then calculate the cut-off dataframes from the list of returned t_limits before plotting, which gives the volcano plots attached here. It seems there is a disconnect between how the t-test cut-off is calculated in get_MaxS versus the usual perform_ttest_analysis?

Here's the last portion of my code before plotting.

cut_offs = []

for i, df in enumerate(stats):
    n_x, n_y = len(df), len(df)
    s0 = 1.5
    
    cut_off = get_fdr_line(t_limits[i],
                        s0, n_x, n_y, plot=False,
                        fc_s=np.arange(0, 10, 0.05),
                        s_s=np.arange(0.005, 10, 0.05))
    
    cut_off['-logp'] = -np.log10(cut_off['pvals'])  # transform into -log10 space for plotting
    cut_offs.append(cut_off)
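
For completeness, here is a minimal sketch of how the cut-off line could be overlaid on each volcano with matplotlib; the stats column names "fc" and "pvals" (and the "fc" column on the cut-off frame) are placeholders for whatever perform_ttest_analysis and get_fdr_line actually return:

import matplotlib.pyplot as plt
import numpy as np

for stats_df, cut_off in zip(stats, cut_offs):
    fig, ax = plt.subplots()
    ax.scatter(stats_df["fc"], -np.log10(stats_df["pvals"]), s=5)  # placeholder column names
    ax.plot(cut_off["fc"], cut_off["-logp"], color="red")          # FDR cut-off line
    ax.set_xlabel("log2 fold change")
    ax.set_ylabel("-log10(p)")
    plt.show()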

Info for the GUI

[Image: grafik]

This is not clear to me.
The tutorial notebook does not cover the GUI, and the example plots shown there appear to be normal Jupyter notebook plots, independent of the GUI and of the GUI-generated interactive plots shown on the AlphaPeptStats landing page.

Type of FDR correction

Which types of FDR correction are available in AlphaPeptStats? Benjamini-Hochberg?

Thank you
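
For reference, a minimal sketch of Benjamini-Hochberg correction using statsmodels, which is among alphastats' pinned dependencies; whether this exact routine is what AlphaPeptStats calls internally is an assumption:

import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.001, 0.01, 0.03, 0.2, 0.8])
rejected, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(qvals)  # BH-adjusted p-values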

Not working

Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\rs3869\AppData\Local\Programs\AlphaPeptStats\alphastats\gui\AlphaPeptStats.py", line 8, in <module>
    from utils.ui_helper import sidebar_info, img_to_bytes
  File "C:\Users\rs3869\AppData\Local\Programs\AlphaPeptStats\alphastats\gui\utils\ui_helper.py", line 4, in <module>
    from alphastats import __version__
ModuleNotFoundError: No module named 'alphastats'
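
A hedged workaround sketch: the interpreter bundled with the installer apparently cannot see the alphastats package. Assuming pip is available in that interpreter, the following checks for the package and installs it into whichever Python is actually running the GUI:

import importlib.util
import subprocess
import sys

# install alphastats into the interpreter that raised the ModuleNotFoundError
if importlib.util.find_spec("alphastats") is None:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "alphastats"])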

Not able to load FragPipe output data

Greetings,

Describe the bug
I am not able to load my FragPipe output data into your tool via the GUI. Please see the screenshot of the error below.

[Image: error screenshot]

Regards,
Ben

Can't install: Microsoft Visual C++ error message

I picked Python 3.10.9, following the recommended Python versions.
[Image: grafik]

Trying to install:
[Image: grafik]

yields:
Installing collected packages: iteration-utilities, gitdb, click, blinker, tzlocal, sparse, rich, pydeck, pandas, outdated, numba-stats, gitpython, sklearn-pandas, data-cache, altair, streamlit, pandas-flavor, batchglm, pingouin, diffxpy, swifter, alphastats
Running setup.py install for iteration-utilities: started
Running setup.py install for iteration-utilities: finished with status 'error'
Note: you may need to restart the kernel to use updated packages.
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-310
creating build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\_additional_recipes.py -> build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\_classes.py -> build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\_convenience.py -> build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\_recipes.py -> build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\_utils.py -> build\lib.win-amd64-cpython-310\iteration_utilities
copying src\iteration_utilities\__init__.py -> build\lib.win-amd64-cpython-310\iteration_utilities
running build_ext
building 'iteration_utilities._iteration_utilities' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for iteration-utilities
error: subprocess-exited-with-error
...
╰─> iteration-utilities

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Whereas I actually have:
[Image: grafik]

Feedback from intro

  • explain the metadata scheme and provide a template
  • down-size metadata when it contains more samples than proteinGroups.txt
  • display gene names in the plotly volcano plot
  • add an option to add labels to the volcano plot
  • implement UMAP

Efficient conversion back and forth to AnnData

Proteomics data typically contain protein intensities across samples (matrix X), with extra variable annotations (var, i.e. protein annotations) and observation annotations (obs, i.e. sample annotations). A very efficient way of storing this in a single object is the AnnData object.

https://anndata.readthedocs.io/en/latest/

I use them a lot and would like to move data efficiently into and out of alphapeptstats (e.g. applying a transformation in alphapeptstats and then exporting the result as a modified AnnData object).

Note that metadata is already perfectly matched in the AnnData object, both for variables and observations.

Can you make input and output between alphapeptstats and AnnData objects efficient?
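
A minimal round-trip sketch, assuming the DataSet's .mat is a samples x proteins DataFrame and its metadata is indexed by sample name; this is an illustration, not an official alphastats API:

import anndata as ad
import pandas as pd

def mat_to_anndata(mat: pd.DataFrame, obs: pd.DataFrame) -> ad.AnnData:
    # mat: samples x proteins intensities; obs: sample annotations indexed by sample name
    return ad.AnnData(X=mat.to_numpy(),
                      obs=obs.loc[mat.index],
                      var=pd.DataFrame(index=mat.columns))

def anndata_to_mat(adata: ad.AnnData) -> pd.DataFrame:
    # back to a plain samples x proteins DataFrame that could replace ds.mat
    return pd.DataFrame(adata.X, index=adata.obs_names, columns=adata.var_names)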

`nbformat` requirement missing

Excited to try alphapeptstats! It's great to see a new open-source proteomics tool out there.

Describe the bug
I'm working through the tutorial in a Jupyter notebook, and plot_sampledistribution threw an error about nbformat.

It looks like nbformat is a missing requirement that is not installed with pip install alphastats.

Version (please complete the following information):

  • OS: macOS
  • Version: 0.6.3
  • Installation Type: pip

sam method in VolcanoPlot checks log2 status with an incorrect condition

Hi, I noticed that this line in the sam method of VolcanoPlot.py checks the normalization status in the preprocessing info instead of the log2 status:

if self.dataset.preprocessing_info["Normalization"] is None:

Here's what I suggest instead:

if self.dataset.preprocessing_info['Log2-transformed'] is False:

Additionally, this method doesn't let the user define s0 and instead hard-codes constants internally. Could a parameter be added for users who want to run a moderated t-test with the SAM FDR method?
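
A sketch of what the requested interface could look like; the name and signature are hypothetical, not the current alphastats API:

def sam(self, s0: float = 0.05, fdr: float = 0.05):
    # guard on the log2 status rather than the normalization status
    if self.dataset.preprocessing_info.get("Log2-transformed") is False:
        raise ValueError("SAM expects log2-transformed intensities.")
    # ... run the moderated t-test with the user-supplied s0 and fdr ...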

Incorrect VST normalization

Describe the bug

While attempting to run VST (variance-stabilizing transformation) normalization in AlphaPeptStats, I encountered several issues suggesting that the normalization is not functioning as intended.

Issue 1: Axis of Normalization
Upon debugging, it appears that the normalization is performed across proteins (columns of ds.mat) rather than across samples (rows). Below is a screenshot of a table that supports this hypothesis:
[Image: table screenshot]

To Reproduce:
I used a standard proteinGroups.txt file and preprocessed it using the following code:

ds.preprocess(
    remove_contaminations=True,
    normalization="vst",
)
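
One way to check the axis hypothesis outside the package is scikit-learn's PowerTransformer; that alphastats' "vst" option is backed by a column-wise scikit-learn transformer is an assumption, but the sketch shows why the matrix orientation matters:

import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
mat = rng.lognormal(size=(6, 4))  # toy samples x proteins matrix
# PowerTransformer normalizes each column independently, i.e. each protein;
# transposing first flips the axis so each sample is normalized instead
per_protein = PowerTransformer().fit_transform(mat)
per_sample = PowerTransformer().fit_transform(mat.T).T
print(np.allclose(per_protein, per_sample))  # False: the axis matters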

Issue 2: Inconsistent PCA Graphs
The PCA graphs generated post-normalization are inconsistent, both in axis scales and in explained variance. Here's a screenshot for reference:
[Image: dim_red_PCA_HealthStatus_group_circle]

Issue 3: VST vs VSN Normalization
Is the VST normalization in AlphaPeptStats intended to perform similarly to the VSN normalization method available in R? For reference, see the VSN documentation.

Additional Information
Operating System: Windows 10
Python Environment: Conda

I would appreciate any guidance or fixes for these issues. Thank you!

Streamlit Windows installer problem

Describe the bug
Hi, I'm trying to install AlphaPeptStats via the Windows installer, but after installation it doesn't start properly.
The prompt window suddenly closes after showing the following message: "'streamlit' is not recognized as an internal or external command, operable program or batch file".

To Reproduce
Steps to reproduce the behavior:
Without any previous installation, try to install alphapeptstats with or without administrator permissions, for all users or for the current user.

Version (please complete the following information):

  • OS: win10 pro for workstations

Installation fails on M1 Mac due to tables dependency

Hi,

I tried installing on my M1 Mac (Ventura 13.5), but the default installation failed:

(alphapeptstats) user@computer programming % conda create --name alphapeptstats python=3.10 -y
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/user/miniconda3/envs/alphapeptstats

  added / updated specs:
    - python=3.10


The following NEW packages will be INSTALLED:

  bzip2              pkgs/main/osx-arm64::bzip2-1.0.8-h620ffc9_4 
  ca-certificates    pkgs/main/osx-arm64::ca-certificates-2023.05.30-hca03da5_0 
  libffi             pkgs/main/osx-arm64::libffi-3.4.4-hca03da5_0 
  ncurses            pkgs/main/osx-arm64::ncurses-6.4-h313beb8_0 
  openssl            pkgs/main/osx-arm64::openssl-3.0.9-h1a28f6b_0 
  pip                pkgs/main/osx-arm64::pip-23.2.1-py310hca03da5_0 
  python             pkgs/main/osx-arm64::python-3.10.12-hb885b13_0 
  readline           pkgs/main/osx-arm64::readline-8.2-h1a28f6b_0 
  setuptools         pkgs/main/osx-arm64::setuptools-68.0.0-py310hca03da5_0 
  sqlite             pkgs/main/osx-arm64::sqlite-3.41.2-h80987f9_0 
  tk                 pkgs/main/osx-arm64::tk-8.6.12-hb8d0fd4_0 
  tzdata             pkgs/main/noarch::tzdata-2023c-h04d1e81_0 
  wheel              pkgs/main/osx-arm64::wheel-0.38.4-py310hca03da5_0 
  xz                 pkgs/main/osx-arm64::xz-5.4.2-h80987f9_0 
  zlib               pkgs/main/osx-arm64::zlib-1.2.13-h5a0b063_0 



Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate alphapeptstats
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(alphapeptstats) user@computer programming % conda activate alphapeptstats 
(alphapeptstats) user@computer programming % pip install alphastats
Collecting alphastats
  Obtaining dependency information for alphastats from https://files.pythonhosted.org/packages/6f/45/f0533a5192f0883102d9b345b5ac3638aa74fd161f82778eef88d36be16d/alphastats-0.6.3-py3-none-any.whl.metadata
  Using cached alphastats-0.6.3-py3-none-any.whl.metadata (6.9 kB)
Collecting pandas==2.0.2 (from alphastats)
  Obtaining dependency information for pandas==2.0.2 from https://files.pythonhosted.org/packages/54/d7/9e8ff0685d3454a13949e0503bdc789b4bc5bb35989c3948101e71b362cd/pandas-2.0.2-cp310-cp310-macosx_11_0_arm64.whl.metadata
  Using cached pandas-2.0.2-cp310-cp310-macosx_11_0_arm64.whl.metadata (18 kB)
Collecting scikit-learn==1.2.2 (from alphastats)
  Using cached scikit_learn-1.2.2-cp310-cp310-macosx_12_0_arm64.whl (8.5 MB)
Collecting data-cache>=0.1.6 (from alphastats)
  Using cached data_cache-0.1.6-py3-none-any.whl (6.7 kB)
Collecting plotly==5.15.0 (from alphastats)
  Obtaining dependency information for plotly==5.15.0 from https://files.pythonhosted.org/packages/a5/07/5bef9376c975ce23306d9217ab69ca94c07f2a3c90b17c03e3ae4db87170/plotly-5.15.0-py2.py3-none-any.whl.metadata
  Using cached plotly-5.15.0-py2.py3-none-any.whl.metadata (7.0 kB)
Collecting statsmodels==0.14.0 (from alphastats)
  Using cached statsmodels-0.14.0-cp310-cp310-macosx_11_0_arm64.whl (9.4 MB)
Collecting sklearn-pandas==2.2.0 (from alphastats)
  Using cached sklearn_pandas-2.2.0-py2.py3-none-any.whl (10 kB)
Collecting pingouin==0.5.3 (from alphastats)
  Using cached pingouin-0.5.3-py3-none-any.whl (198 kB)
Collecting scipy==1.10.1 (from alphastats)
  Using cached scipy-1.10.1-cp310-cp310-macosx_12_0_arm64.whl (28.8 MB)
Collecting tqdm>=4.64.0 (from alphastats)
  Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting diffxpy==0.7.4 (from alphastats)
  Using cached diffxpy-0.7.4-py3-none-any.whl (85 kB)
Collecting anndata==0.9.1 (from alphastats)
  Using cached anndata-0.9.1-py3-none-any.whl (102 kB)
Collecting umap-learn==0.5.3 (from alphastats)
  Using cached umap-learn-0.5.3.tar.gz (88 kB)
  Preparing metadata (setup.py) ... done
Collecting streamlit==1.22.0 (from alphastats)
  Using cached streamlit-1.22.0-py2.py3-none-any.whl (8.9 MB)
Collecting tables==3.7.0 (from alphastats)
  Using cached tables-3.7.0.tar.gz (8.2 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [12 lines of output]
      <string>:18: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
      ld: library not found for -lhdf5
      clang: error: linker command failed with exit code 1 (use -v to see invocation)
      cpuinfo failed, assuming no CPU features: No module named 'cpuinfo'
      * Using Python 3.10.12 (main, Jul  5 2023, 15:02:25) [Clang 14.0.6 ]
      * Found cython 3.0.0
      * USE_PKGCONFIG: False
      * Found conda env: ``/Users/user/miniconda3/envs/alphapeptstats``
      .. ERROR:: Could not find a local HDF5 installation.
         You may need to explicitly state where your local HDF5 headers and
         library can be found by setting the ``HDF5_DIR`` environment
         variable or by using the ``--hdf5`` command-line option.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
(alphapeptstats) user@computer programming % 

I got it to work by first installing tables manually:

conda install -c anaconda pytables

Maybe this can be added to the installation instructions?

Preprocessing GUI Page - Data completeness entry html validation incorrect

Describe the bug
In the preprocessing page, for "Data completeness across samples cut-off (0.7 -> protein has to be detected in at least 70% of the samples)", the input box only accepts the integer values 0 or 1. However, we need a float with a smaller step size.

Verified Fix
Original:

# ...\venv\alphapeptstatsenv\Lib\site-packages\alphastats\gui\pages\03_Preprocessing.py

data_completeness = st.number_input(
    f"Data completeness across samples cut-off \n(0.7 -> protein has to be detected in at least 70% of the samples)",
    value=0, min_value=0, max_value=1
)

Fixed version:

# ...\venv\alphapeptstatsenv\Lib\site-packages\alphastats\gui\pages\03_Preprocessing.py

data_completeness = st.number_input(
    f"Data completeness across samples cut-off \n(0.7 -> protein has to be detected in at least 70% of the samples)",
    value=0.0, min_value=0.0, max_value=1.0, step=0.01,
)

Note:

  1. The values must be floats, so use 0.0 and 1.0, etc.
  2. The step size should be specified; otherwise the HTML input renders with an integer step of 1.

color from plot_sampledistribution

Bug description
The bug happens when using .plot_sampledistribution while selecting a metadata column for coloring. The column has two unique values, "D" and "N", for diseased and normal cells.

The graph output is in black and white; there are still two entries in the legend, but both are grey.

I assume the color palette is somehow not activated or is wrongly referenced in this case, or I might be missing some dependencies. I tried reinstalling the plotly package within my conda environment, but the issue persisted.

Screenshots
[Image: sample distribution plot]

Version:

  • OS: Windows 10
  • Version: 0.6.3
  • Installation Type: Python (within a conda env)
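
To narrow this down, a quick sanity check (a sketch with illustrative column names) of whether plotly itself colors categorical values in this environment, independent of alphastats:

import pandas as pd
import plotly.express as px

df = pd.DataFrame({"sample": ["s1", "s2", "s3", "s4"],
                   "intensity": [1.0, 2.0, 1.5, 2.5],
                   "disease": ["D", "D", "N", "N"]})
fig = px.box(df, x="sample", y="intensity", color="disease")
# should render two distinct colors; if both are grey here too,
# the problem is in plotly or the renderer rather than in alphastats
fig.show()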

Implementing `MissForest` imputation?

Is your feature request related to a problem? Please describe.
This preprint from the Noble Lab shows strong performance for the MissForest method, which differs from plain random forest imputation. I'm wondering if it could be implemented here.

MissForest is written in R, and the preprint uses rpy2 to call it from an IPython notebook. That said, there is a Python implementation without any tests that could perhaps be a starting point: https://github.com/yuenshingyan/MissForest.

Describe the solution you'd like
Offering MissForest as an imputation option.

Describe alternatives you've considered
Using no imputation, or the rpy2 setup -- the latter requires some work to get the R packages in the right place and reference them. The R package supports threading, but that isn't tested in the notebook above, which uses just one thread.
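
Not MissForest itself, but a close scikit-learn analogue (iterative imputation with a random-forest estimator) could serve as a pure-Python starting point; a sketch, not a drop-in for alphastats:

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

# toy samples x proteins matrix with missing values
X = np.array([[1.0, np.nan, 3.0],
              [2.0, 2.5, np.nan],
              [np.nan, 2.0, 3.5],
              [1.5, 2.2, 3.1]])

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0),
    max_iter=10,
    random_state=0,
)
X_imputed = imputer.fit_transform(X)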

'Raw data number of Protein Groups' in `preprocessing_info` reports a filtered count

Is your feature request related to a problem? Please describe.
During loading, rows that contain only zeroes are filtered out, but the ds.preprocessing_info value 'Raw data number of Protein Groups' reports the row count after that filtering.

Describe the solution you'd like
ds.preprocessing_info should report the 'Raw data number of Protein Groups' before filtering only-zero rows, and there should be an additional value (something like 'Raw data number of Protein Groups after removing proteins with only zero intensity') with the filtered count.
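
A sketch of the requested bookkeeping; the second key name is the suggestion above, not an existing alphastats key:

import pandas as pd

# toy raw input: protein groups x samples
rawinput = pd.DataFrame({"id": ["P1", "P2", "P3"],
                         "s1": [0.0, 1.2, 0.0],
                         "s2": [0.0, 3.4, 2.1]})
intensity_cols = ["s1", "s2"]

preprocessing_info = {}
preprocessing_info["Raw data number of Protein Groups"] = len(rawinput)  # before filtering
nonzero = rawinput.loc[~(rawinput[intensity_cols] == 0).all(axis=1)]
preprocessing_info[
    "Raw data number of Protein Groups after removing proteins with only zero intensity"
] = len(nonzero)  # 2 of the 3 toy rows remain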

Imputation considers only NaN; should it also consider zeros?

Is your feature request related to a problem? Please describe.
I was getting some mysterious negative log2-transformed values after imputation, and it took me a while to figure out that zeros are not imputed. After changing them to NaN, they are now imputed.

Describe the solution you'd like
Add a parameter to the preprocessing engine that controls which values are imputed, or change the default behavior so that both zeros and NaN are imputed.

Describe alternatives you've considered
I found a workaround: converting zeros to NaN before running imputation, as sketched below.
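
The workaround as a minimal sketch; the assignment back to the DataSet is commented out because ds is not defined here, and the imputation method name is illustrative:

import numpy as np
import pandas as pd

# toy samples x proteins matrix where 0 means "not detected"
mat = pd.DataFrame({"P1": [0.0, 1.2], "P2": [3.4, 0.0]})
mat = mat.replace(0, np.nan)  # zeros are now missing values and will be imputed
# ds.mat = mat
# ds.preprocess(imputation="knn")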

Grouping replicates from sample names

Dear Developers,
My data consists of treatment vs. control, each with 3 replicates, and I am using output from MaxQuant. How can I group replicates into control and treatment groups during data analysis in AlphaPeptStats? Please help with this query. Thank you in advance.
Best regards,
Muthu
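
One common pattern is to build a metadata table mapping each sample name to a group and pass it when creating the DataSet; the sample and column names below are illustrative, and whether DataSet accepts an in-memory table this way is an assumption:

import pandas as pd

metadata = pd.DataFrame({
    "sample": ["ctrl_1", "ctrl_2", "ctrl_3", "treat_1", "treat_2", "treat_3"],
    "group": ["control"] * 3 + ["treatment"] * 3,
})
# ds = DataSet(loader=maxquant_loader, metadata_path=metadata, sample_column="sample")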

Conversion of pd.DataFrame to a DataSet instance

Sometimes I need to use external tools for data processing, such as batch-effect correction or normalization techniques that are not available in AlphaPeptStats. However, the resulting data frame cannot then be analyzed using the statistical methods provided by alphastats.

Is there a way to convert a data frame to a DataSet instance?
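
In the absence of an official converter, a pragmatic workaround sketch (not an official API): create the DataSet from the original input, then overwrite its matrix with the externally processed values, assuming the processed frame keeps the samples x proteins orientation and labels:

import pandas as pd

def inject_processed_matrix(ds, processed: pd.DataFrame):
    # overwrite ds.mat with an externally processed samples x proteins frame,
    # keeping sample/protein alignment identical to the original matrix
    ds.mat = processed.reindex(index=ds.mat.index, columns=ds.mat.columns)
    return ds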

one-click Windows installer?

In the readme file it says:
[Image: readme screenshot]
However, when following the link, the installers for Linux and macOS are present, but not the installer for Windows:
[Image: release assets screenshot]
Could you please provide the missing installer as well?
Best,
