mannlabs / alphapept
A modular, python-based framework for mass spectrometry. Powered by nbdev.
Home Page: https://mannlabs.github.io/alphapept/
License: Apache License 2.0
Describe the bug
Currently, people can select 'lfq_quantification' without ever having imported or searched any file. The selection of workflow steps should be linked to whether an HDF file with the required information already exists.
Importantly, in these cases the settings files need to be saved differently, to ensure that users can still trace how the final results were created when multiple different settings were used and stacked on top of each other.
Greetings,
While reviewing the AlphaPept code, I noticed that the io.py module has a number of functions that appear to be repeated, as noted below.
Describe the bug
The Windows GUI crashes and reports "PermissionError: [WinError 5]".
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Expected analysis to start
Version (please complete the following information):
Additional context
After receiving the error I tried to run as admin, but I was unable to load any raw files or fasta files into the GUI.
Describe the bug
The list of paths to fasta files is always empty in the "Settings" tab of the GUI. It can't be modified and is not updated with the paths to the fasta files selected in the "Experiment" tab.
To Reproduce
Screenshots
https://i.gyazo.com/b7962a7232e288fd1a65b003f9101b7c.png
Version (please complete the following information):
Dependabot currently pushes to master. This means that develop is behind master, which should never happen normally. Perhaps it is best to configure Dependabot to push to develop, so that master is always the "correct release version"?
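If the repository uses Dependabot's v2 configuration file, the target branch can be set explicitly. A sketch, assuming a pip ecosystem at the repository root (schedule values are illustrative):

```yaml
# .github/dependabot.yml -- sketch; ecosystem/directory/schedule are assumptions
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    # open update PRs against develop instead of master
    target-branch: "develop"
```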
Describe the bug
[Medium] An "UnboundLocalError: local variable 'file' referenced before assignment" error appears in the Results section when you start AlphaPept for the first time or when the user cleans the "alphapept/finished" folder.
To Reproduce
Expected behavior
It would be nice to avoid showing the error when the folder is still empty.
Screenshots
https://gyazo.com/5d207434bd44156650f1612f8c1fe9e8
Version (please complete the following information):
Is your feature request related to a problem? Please describe.
I find it a bit unintuitive that you first select workflow steps, but then you can still choose settings for parts of the workflow that you didn't select.
Here, matching is not selected in the workflow, but I can still adjust parameters for it.
Describe the solution you'd like
Would it be possible to restrict the settings to the ones that are actually relevant for the workflow steps that were chosen?
Describe the solution you'd like
Now the tab with the tool calls ‘webui – Streamlit’. It would be nice to change it to AlphaPept with the proper logo if it’s possible.
@jalew188 already encountered the following issue, which I can now confirm: line endings of Jupyter notebooks on Windows include a carriage return, which unfortunately means that an nbdev_build_lib or nbdev_clean_nbs run creates massive "differences" that are in fact just line endings. This obscures which code was actually modified, making it difficult to work on Windows and macOS/Linux machines simultaneously.
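One common mitigation is to normalize line endings with a .gitattributes file, so notebooks and sources are always committed with LF regardless of platform; a minimal sketch:

```
# .gitattributes -- commit notebooks and Python sources with LF endings
*.ipynb text eol=lf
*.py    text eol=lf
```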
Is your feature request related to a problem? Please describe.
If a user runs a check of the settings specified for the run in the "Settings" tab and no problems are found, they get the following message:
"Found a total of 0 problems: []" (see the screenshot - https://i.gyazo.com/8a515079cffed3dba4f92ffdce1123f8.png)
Describe the solution you'd like
It would be better to change the message for the case when the check finds no problems to something like:
"No problems were found."
The GUI seems extremely slow compared to the CLI (Thermo quick test: approx. 4 min instead of < 2 min) on a MacBook. Am I the first/only one to notice this, or is this a common issue/bug?
Describe the bug
With h5py==3.2.1, pandas raised an error: ERROR:root:Scoring of file d:\DataSets\APTest\20170518_QEp1_FlMe_SA_BOX0_HeLa12_Ecoli1_Shotgun.ms_data.hdf failed. Exception Can only use .str accessor with string values!
AlphaPept could run smoothly after switching to h5py==2.10.0. I checked the data frame from .ms_data.hdf; it seems that strings are loaded (or stored) as binary strings by HDF_File with h5py==3.2.1.
To Reproduce
Install h5py==3.2.1, and run the whole workflow.
Expected behavior
Error ...... Can only use .str accessor with string values!
Version (master branch, 3.11-dev0):
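Since h5py 3.0, variable-length strings are returned as bytes rather than str, which is exactly what breaks the .str accessor. A defensive decode after loading could look like this (the column name is a made-up placeholder, not the actual ms_data.hdf schema):

```python
import pandas as pd

# Simulate a column read back from HDF5 with h5py >= 3.0: bytes, not str.
df = pd.DataFrame({"sequence": [b"PEPTIDE", b"SEQUENCER"]})

# Decode any object column whose values came back as bytes.
for col in df.columns:
    if df[col].dtype == object and isinstance(df[col].iloc[0], bytes):
        df[col] = df[col].str.decode("utf-8")

print(df["sequence"].tolist())  # ['PEPTIDE', 'SEQUENCER']
```

Alternatively, pinning h5py<3 in requirements.txt sidesteps the issue until the loading code handles both cases.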
I used alphapept import to extract MS1 and MS2 spectra on Windows, and then ran alphapept workflow to identify the extracted ms_data. I got exactly the same identifications as when running the whole workflow on Windows. So the bug should be in pyrawfilereader: the same code behaves differently on Windows and macOS.
Describe the bug
The AlphaPept installation is cancelled after the progress bar has already reached the end. A security warning, as shown in the screenshot, pops up. The error occurs both as user and as administrator.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Installation should result in AlphaPept being installed.
Version (please complete the following information):
The current UI has several bugs, and several ideas for enhancement exist.
Currently, one can select FASTA files and a database file. This isn't very clear:
When are we creating a new database file, when are we using the existing one?
Several steps can become unresponsive in extreme cases, for example when dropping thousands of files in the file dialog or when displaying first_search in the explore tab.
Possible Solutions:
Start the respective steps in a thread and show a wait indicator.
Probably because the fasta is too small to be divided into more than one fasta_block.
```
ValueError                                Traceback (most recent call last)
<ipython-input-3-cdeabb6ebdae> in <module>
      1 from alphapept.runner import run_alphapept
----> 2 run_alphapept(params)

/home/feng/alphapept/alphapept/alphapept/runner.py in run_alphapept(settings, callback)
     67         cb = callback
     68
---> 69     spectra, pept_dict, fasta_dict = generate_database_parallel(settings, callback = cb)
     70     logging.info('Digested {:,} proteins and generated {:,} spectra'.format(len(fasta_dict), len(spectra)))
     71

/home/feng/alphapept/alphapept/alphapept/fasta.py in generate_database_parallel(settings, callback)
    708         spectra_set.append(spectra[-1])
    709
--> 710     pept_dict = merge_pept_dicts(pept_dicts)
    711
    712     return spectra_set, pept_dict, fasta_dict

/home/feng/alphapept/alphapept/alphapept/fasta.py in merge_pept_dicts(list_of_pept_dicts)
    515
    516     if len(list_of_pept_dicts) < 2:
--> 517         raise ValueError('Need to pass at least two elements to merge.')
    518
    519     new_pept_dict = list_of_pept_dicts[0]

ValueError: Need to pass at least two elements to merge.
```
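A possible guard would be to return the single dict instead of raising when only one worker produced a result. This is a sketch of that idea, not the actual alphapept.fasta implementation (the merge semantics for shared keys are an assumption):

```python
def merge_pept_dicts(list_of_pept_dicts):
    """Merge peptide -> protein-index dicts; tolerates a single input.

    Sketch of a guard for the ValueError above -- the real
    alphapept.fasta.merge_pept_dicts may differ in detail.
    """
    if len(list_of_pept_dicts) == 0:
        raise ValueError('Need to pass at least one element to merge.')
    if len(list_of_pept_dicts) == 1:
        # A small fasta may produce only one fasta_block; nothing to merge.
        return list_of_pept_dicts[0]
    new_pept_dict = list_of_pept_dicts[0]
    for pept_dict in list_of_pept_dicts[1:]:
        for key, value in pept_dict.items():
            if key in new_pept_dict:
                # union of protein indices for peptides seen in both blocks
                new_pept_dict[key] = sorted(set(new_pept_dict[key]) | set(value))
            else:
                new_pept_dict[key] = value
    return new_pept_dict
```

An alternative would be to keep the strict check and instead skip the parallel split when the fasta fits in a single block.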
Is your feature request related to a problem? Please describe.
In the "Settings" tab, if a user scrolls through all available options and the mouse cursor happens to hover over one of the IntegerInput/FloatInput fields, the values of these options can be changed without the user noticing.
Describe the solution you'd like
Prevent mouse scrolling from changing the IntegerInput/FloatInput options in the "Settings" tab.
Updating the GitHub workflow before pushing a commit makes it possible to run arbitrary code on the runners. While some changes require modifying the GitHub workflow to actually make the CI pass, this can pose a serious security threat once the repository is publicly accessible. How can/will we deal with this?
I revised the settings file to have some more consistency:
Please find attached the current set of options:
I roughly tried to sort the categories as follows:
calibration: Related to calibration
experiment: The experiment details
fasta: related to creating a theoretical database from fasta
features: related to feature finding
general: general workflow settings
misc: everything else
quantification: related to quantification options
raw: related to handling raw files
search: search options
All options are defined in the settings_template.yaml. This is what is used to create the user interface. It defines the type, some min/max and default values and also a brief description.
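For orientation, one entry in such a template could look roughly like this. This is a hypothetical sketch only; the field names and the option shown are assumptions, not the actual settings_template.yaml schema:

```yaml
# Hypothetical sketch of one settings_template.yaml entry
search:
  prec_tol:
    type: float
    min: 0.1
    max: 100.0
    default: 20.0
    description: "Precursor mass tolerance (ppm)."
```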
As there is always a lot of debate about the naming convention, I would be happy to collect ideas on which options should be renamed and which ones you find missing.
General idea
I would like to suggest a starting page for the GUI which briefly introduces AlphaPept and provides a mini-overview of what can be done and how. This could also be the place to provide a download button for the detailed user guide that's currently in the making. I think this would be good especially for people using AlphaPept for the first time, when there is nothing to see or use on the 'status' page. What do you think?
Currently, Dependabot is not tracking the software versions.
Check why this is.
Ideally, going back to the requirements.txt would be nice.
The current route to create an installer on Windows is to use pyinstaller and inno setup.
While the pyinstaller script (create_installer.bat) runs through on some machines, it does not work on the self-hosted runner.
In general, we should make an installation routine that is platform-specific as PyInstaller does not support cross-platform compilation.
Also, the current installer script create_installer.bat does not test intermediate steps (e.g. was pyinstaller successful?), so we could end up with an exe installer that installs something that does not work; intermediate testing steps are therefore necessary.
This is particularly relevant for the GUI, as PyQT can cause problems, but a GUI is not that straightforward to test.
This issue will be linked to a project so that we can track the ToDo.
- [ ] Loading of ext does not work -> Check installed version
Describe the bug
An unknown exception occurs when the process of DB generation ends for the test HeLa file (Thermo):
/04_hela_testrun/20190402_QX1_SeVW_MA_HeLa_500ng_LC11.raw.
To Reproduce
!!! After getting the exception, the alphapept.exe process(es) still show as active in the Task Manager.
settings.txt
Screenshots
https://i.gyazo.com/9b04df0db8a5c85c492e408928301c03.png
Version (please complete the following information):
Describe the bug
Not really a bug, but I would add units to the axes in the History tab of the GUI.
Specifically, rt_length should, I guess, be in minutes, and timing as well.
Hi,
The current git configs seem not to be in order. Please make sure that your git config is set correctly and you are using the correct credentials.
I am planning on re-writing the git history so that we are all consistent.
For now, I will change @ibludau's and my username. @swillems and @jalew188, should I change yours as well? To the biochem email or the personal one?
Dependabot can't evaluate your Python dependency files.
As a result, Dependabot couldn't check whether any of your dependencies are out-of-date.
The error Dependabot encountered was:
Illformed requirement ["==.0.71.0"]
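The stray dot in "==.0.71.0" is what makes the specifier ill-formed; a well-formed pin in requirements.txt would look like this (the package name is hypothetical, as the offending line is not shown):

```
somepackage==0.71.0
```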
To have a maintainable package, automated tests and performance benchmarks are crucial.
I have the following tests in mind, considering a versioning scheme where we are using (X-Y-Z): X-Major, Y-Minor, Z-Patch. In terms of branching we would have a master/dev and feature branches.
Simple function tests within nbdev. They should be run for every push on every branch. Duration ~ minutes
Tests to run a full pipeline (i.e., perform a search on HeLa Thermo / Bruker data). We could run them for every version (even minor) on dev. Duration <1h
We would auto-create a settings template for the current version and replace the file_path with the respective filenames.
Test to try all possible (or at least most) combinations of settings.
This is something we could do for every Minor version. Duration will be several hours.
I think shipping is crucial, and we should have one-click installers ready for each patch. Compiling an installer takes less than 10 minutes, so this could be done for every push on the dev branch.
This is a very difficult test but very important to keep a userbase: it can easily happen that a new feature is implemented and then the GUI doesn't work anymore. The current settings scheme is very flexible, so the core functionality should be covered by the workflow test. A proper GUI test would probably use a tool like pyautogui that automatically "clicks" through workflows. If we want to be fancy, we could also use this to automatically create screenshotted documentation for each version. Ideally, I think this would run for every push on the dev branch.
The workflow test from above will not give us a good estimate of performance. We will get execution time and protein and peptide counts, but we should also consider metrics like quantification accuracy. For this, we should use multi-species samples with known mixing ratios that are computationally more demanding, and which I would hence consider a different kind of test.
The idea would be to have a set of PRIDE datasets like PXD010012, which we always re-run. As the analysis results are available from the repository, we would also have a baseline to compare our results to.
Depending on the number of datasets, this could take a considerable amount of time.
This is something we could potentially do for every minor version.
For running those tests, I will use GitHub-Actions self-hosted runners. This would allow us to use powerful workstations to run the tests.
Ideally, we can also set up runners for each Windows / Linux and Mac.
At some point, one could also make the testing results more explorable, i.e., pushing the results to a db and having a little dashboard app that shows performance over version/time.
Also, note that we can always trigger the tests manually.
Let me know if you would suggest additional tests or think the current test set should be optimized.
We should also add an automatic style test.
Hi,
After investigating the latest performance runs, I noticed that we only have approx. 40k peptides for Thermo runs.
In the past, before the automated tracking, when making the performance runs we had approx. 50k, so it seems that we lost 20% somewhere...
I checked the settings, and they did not change (same FASTA / file, tolerances, etc.).
I presume that we have introduced a bug along the way (raw conversion? feature calibration? mapping MS1 to MS2?) and should investigate.
Describe the bug
In the GUI, for all parameters in the "Settings" tab that have a dropdown menu, there is always a big empty space, or one option from the menu is incomparably big.
To Reproduce
Can be reproduced for the following dropdown menus:
Expected behavior
All dropdown menu options should have the same size.
Screenshots
https://i.gyazo.com/b2ef0ef36aa23bbb006e1aaf83d76404.jpg
https://i.gyazo.com/26d6244c747c468a7f1a86ee481a385a.jpg
Version (please complete the following information):
For the CI main, we apparently use Python 3.6, while the sample files use Python 3.8.
This is very strange and turns out to mess up some things with the CI, since dependencies might differ. We should probably make a single install script where e.g. a fixed conda setup (updates are relatively unreliable otherwise) installs Python and alphapept consistently. Calling this install script uniformly in all the .github/workflows should then be far more consistent.
Is your feature request related to a problem? Please describe.
The overall progress during DB generation only starts several minutes after launching the tool.
Describe the solution you'd like
Maybe it would be better to show 0.1% from the beginning, right after the user starts the tool. Otherwise, it could look like nothing is happening. (see the screenshot - https://i.gyazo.com/cb1c9bb4d2721d4074147183881ce3a8.jpg)
There is a bug in the user interface where the log is displayed: when there are too many log updates in a short time, some characters appear to disappear. This seems to be related to the QTextEditLogger widget.
This thread could be related: https://forum.qt.io/topic/76958/how-to-efficiently-display-almost-real-time-text-logging-without-freezing-an-ui
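As the linked thread suggests, one common fix is to batch log records and append them to the widget on a timer, instead of appending each record individually. A minimal, Qt-free sketch of the buffering part (the class name is made up; the actual QTextEditLogger is untouched here):

```python
import time

class BatchedLogBuffer:
    """Collects log lines and hands them out in batches, so a text
    widget receives one large append per interval instead of many
    small ones that can overwhelm the event loop."""

    def __init__(self, flush_interval=0.2):
        self.flush_interval = flush_interval
        self._lines = []
        self._last_flush = time.monotonic()

    def append(self, line):
        """Called from the logging handler; cheap, no widget access."""
        self._lines.append(line)

    def flush_due(self):
        """True if there is buffered text and the interval has elapsed."""
        return bool(self._lines) and (
            time.monotonic() - self._last_flush >= self.flush_interval
        )

    def flush(self):
        """Return all buffered lines as one string and reset the buffer."""
        text = "\n".join(self._lines)
        self._lines.clear()
        self._last_flush = time.monotonic()
        return text
```

A QTimer could then call flush() periodically and perform a single appendPlainText with the result, which avoids dropped characters from rapid successive updates.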
Describe the bug
I want to install alphapept on my Macbook with M1 chip (ARM). This does not work out-of-the-box. I want to share my learnings and open issues here for others to be more efficient in solving them with new dependency versions available.
Several packages will not work, such as numba, PyQt5 and pythonnet. One error message (regarding numba) is:
FileNotFoundError: [Errno 2] No such file or directory: 'llvm-config'
To Reproduce
Steps to reproduce the behavior:
```shell
conda create --name alphapept python=3.8
conda activate alphapept
cd
git clone https://github.com/MannLabs/alphapept.git
cd alphapept
pip install -r requirements.txt
pip install .
```
Expected behavior
Installation runs through without an error, and within the Python console I can run:
import alphapept
Version (please complete the following information):
Currently, the quantification accuracy with MaxLFQ is not yet satisfactory.
These are the results when running the species test from PXD010012
Notably, no clear distinction between the species is possible. The original data from the paper looks like this:
When testing only the algorithm part on the data, the following results can be observed.
The following two observations will be investigated:
To Reproduce
Expected behavior
To update the section when a file path is typed even if it's the same path.
Version (please complete the following information):
Is your feature request related to a problem? Please describe.
If you realise that you selected inappropriate settings or similar, there should be a possibility to cancel a job. Also, in case alphapept gets stuck (as happened for me - see below), you should be able to cancel.
Describe the solution you'd like
There could be a cancellation button in the 'queue' tab on the 'status' page.
Nice to have
There could also be an option to prioritise jobs in the queue. In case an urgent analysis comes up you could move it up in the list.
Additional context
Specifically, I have an issue with a job that (I guess) got stuck; it restarts and gets stuck at the same point again when I relaunch alphapept (it has been stuck for >20 min already). I will write a separate issue for this.
There are quite some "stale" branches, and some of the HDF branches were only merged and not properly deleted after merging/pulling.
I am not sure how it works if multiple people work on the same branch, but we should probably try to keep the GitHub repository as clean as possible. The stale branches can probably be deleted, unless someone is actively working on them?
Merged/pulled branches can probably be deleted easily with a "merge and delete" instead of just merging, although I am not sure how good this is if other contributors are still working on such a branch locally...
For what it is worth, I only consider the following branches active; all others can be deleted:
The NPZ format to store data files is only a temporary solution. @swillems has some fast implementations that use indexing and are being used in the ion-networks project. This would allow saving even the query data in the HDF file.
Key steps would be:
In terms of architecture, several design ideas can be utilized.
So far, a major approach was to use numba-optimized functions for the core code. Numba allows OOP via jitclasses. A downside here is that you need to type the variables, which compromises the clear and easy Python syntax.
As discussed, we could use a combination of having regular Python classes with Numba functions.
The UI implementation relies much more on OOP, and basically no numba functions are employed.
This issue is intended to collect ideas on where we should revise the code so that we have more flexibility for further modules.
Describe the bug
If you terminate the analysis process at any point (e.g. during feature finding) and then try to rerun the analysis using the default workflow steps, the exception "Processing of D:\04_hela_testrun\20190402_QX1_SeVW_MA_HeLa_500ng_LC11.ms_data.hdf for step raw_conversion failed. Exception File extension .hdf not understood." occurs in the terminal and the program freezes.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
It should be possible to rerun any terminated processes again.
As a suggestion, maybe it makes sense not to show the .ms_data.hdf files in the Raw files section.
Version (please complete the following information):
Attached log file.
2021_05_27_thermo_hela_run2.log
Is your feature request related to a problem? Please describe.
I think the settings selection is not optimal, in that you cannot distinguish what is a setting and what opens a drop-down menu (see screenshot).
Here, 'raw' opens a drop-down selection, 'use_profile_ms1' is a setting and 'fasta' is a drop-down again.
Describe the solution you'd like
I would generally prefer if the overall drop-down categories weren't selected with a tick box but rather with a plus on the right side, as is done for the settings in general. Another option would be to indent the actual settings or to use another background color.
To Reproduce
Expected behavior
It would be nice to preserve the checking/unchecking of this check box when switching between the sections.
Version (please complete the following information):
Describe the bug
If you run a second analysis in the same folder as a previous analysis, the result files are overwritten. In the GUI, however, you can still select the yaml of the initial analysis and see the initial parameters, but with non-matching results in the tables.
To Reproduce
Steps to reproduce the behaviour:
Expected behavior
I think it would be good to append the name of the yaml file to all result files as a general suffix for an analysis. This way, alternative analysis results can be stored in the same folder.
Alternatively, if only one set of results should be available per folder, I would suggest restructuring the results panel so that the results.yaml settings are shown, and not any settings that were saved by the user at some point, which might not match the shown results.
Version (please complete the following information):
I used a wrong raw file (a DIA raw) while testing, and the weird error shown below appeared. It may be because some spectra are empty in the DIA run. Should we check whether idxs_lower and idxs_higher are empty during the run?
```
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      1 from alphapept.runner import run_alphapept
----> 2 run_alphapept(params)

~/opt/anaconda3/lib/python3.8/site-packages/alphapept/runner.py in run_alphapept(settings, callback)
    140         cb = callback
    141
--> 142         fasta_dict, pept_dict = search_parallel_db(settings, callback=cb)
    143
    144     else:

~/opt/anaconda3/lib/python3.8/site-packages/alphapept/search.py in search_parallel_db(settings, calibration, callback)
   1062         file_npz, settings_ = to_process[0]
   1063         settings_['search']['parallel'] = True
-> 1064         search_db((file_npz, settings_))
   1065     else:
   1066         with Pool(n_processes) as p:

~/opt/anaconda3/lib/python3.8/site-packages/alphapept/search.py in search_db(to_process)
   1022         features = pd.read_hdf(base+'.hdf', 'features')
   1023
-> 1024     psms, num_specs_compared = get_psms(query_data, db_data, features, **settings["search"])
   1025     if len(psms) > 0:
   1026         psms, num_specs_scored = get_score_columns(psms, query_data, db_data, features, **settings["search"])

~/opt/anaconda3/lib/python3.8/site-packages/alphapept/search.py in get_psms(query_data, db_data, features, parallel, m_tol, m_offset, ppm, min_frag_hits, callback, m_offset_calibrated, **kwargs)
    249     idxs_lower, idxs_higher = get_idxs(db_masses, query_masses, m_offset, ppm)
    250     frag_hits = np.zeros(
--> 251         (len(query_masses), np.max(idxs_higher - idxs_lower)), dtype=int
    252     )
    253

<__array_function__ internals> in amax(*args, **kwargs)

~/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py in amax(a, axis, out, keepdims, initial, where)
   2665     5
   2666     """
-> 2667     return _wrapreduction(a, np.maximum, 'max', axis, None, out,
   2668                           keepdims=keepdims, initial=initial, where=where)
   2669

~/opt/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     88             return reduction(axis=axis, out=out, **passkwargs)
     89
---> 90     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     91
     92

ValueError: zero-size array to reduction operation maximum which has no identity
```
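A minimal guard, following the variable names in the traceback, could supply np.max with an identity value so empty index ranges no longer raise. This is a sketch, not the actual alphapept.search code:

```python
import numpy as np

def safe_max_range(idxs_lower, idxs_higher):
    """Widest candidate window across all query masses; 0 when there
    are no candidates (plain np.max raises on zero-size arrays)."""
    widths = idxs_higher - idxs_lower
    # initial=0 supplies the identity element that np.max lacks
    return int(np.max(widths, initial=0))
```

get_psms could then skip the frag_hits allocation (and the whole spectrum) whenever the result is 0.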
For phosphorylation as an example, there may be many phospho sites on a sequence, resulting in a lot of isoforms. If max_isoforms is large enough, there may be too many isoforms to be considered; if max_isoforms is too small, modifications on left-side AAs may be excluded due to the behavior of itertools.product.
```python
# get_isoforms is presumably the function from alphapept.fasta
from alphapept.fasta import get_isoforms
from time import perf_counter
import sys

seq = "SBCDMFSSSSSSSSSSSMMFD"  # maybe more phospho sites when the sequence gets longer
mod_dict = dict(zip("S,M".split(","), "(ph)S,(ox)M".split(",")))
max_isoforms = 1000000

peplist = []
start = perf_counter()
for i in range(1000):
    peptides = get_isoforms(mod_dict, seq, max_isoforms)
    peplist.extend(peptides)
end = perf_counter()

print(peptides[:100])
print(f'Memory usage and running time for 1000 repeats: '
      f'{sys.getsizeof(peplist)/10**6:2f} MB, {end-start:2f} s, '
      f'# of isoforms without repeat: {len(peptides)}')
```
Outputs:
['SBCDMFSSSSSSSSSSSMMFD', 'SBCDMFSSSSSSSSSSSM(ox)MFD', 'SBCDMFSSSSSSSSSSS(ox)MMFD', 'SBCDMFSSSSSSSSSSS(ox)M(ox)MFD', 'SBCDMFSSSSSSSSSS(ph)SMMFD', 'SBCDMFSSSSSSSSSS(ph)SM(ox)MFD', 'SBCDMFSSSSSSSSSS(ph)S(ox)MMFD', 'SBCDMFSSSSSSSSSS(ph)S(ox)M(ox)MFD', 'SBCDMFSSSSSSSSS(ph)SSMMFD', 'SBCDMFSSSSSSSSS(ph)SSM(ox)MFD', ..., 'SBCDMFSSSSSS(ph)SS(ph)S(ph)S(ph)S(ox)MMFD', 'SBCDMFSSSSSS(ph)SS(ph)S(ph)S(ph)S(ox)M(ox)MFD', 'SBCDMFSSSSSS(ph)S(ph)SSSSMMFD', 'SBCDMFSSSSSS(ph)S(ph)SSSSM(ox)MFD', 'SBCDMFSSSSSS(ph)S(ph)SSSS(ox)MMFD', 'SBCDMFSSSSSS(ph)S(ph)SSSS(ox)M(ox)MFD']
Memory usage and running time for 1000 repeats: 271.614064 MB, 14.134513 s, # of isoforms without repeat: 32768
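The left-side exclusion can be seen with a toy example: itertools.product varies the rightmost position fastest, so truncating the stream at max_isoforms leaves the leftmost residues stuck at their first (unmodified) form:

```python
from itertools import islice, product

# Three residues, each unmodified or modified ('*'); keep only 4 isoforms,
# mimicking a small max_isoforms cutoff.
options = [["A", "A*"], ["B", "B*"], ["C", "C*"]]
isoforms = ["".join(p) for p in islice(product(*options), 4)]
print(isoforms)  # ['ABC', 'ABC*', 'AB*C', 'AB*C*'] -- 'A*' never appears
```

A fairer truncation would need to sample the product space instead of taking its prefix, e.g. by limiting the number of modified sites per peptide.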
Describe the bug
If I uninstall AlphaPept, this does not delete the .alphapept folder.
Should this be the case?