kusterlab / curve_curator
Analysis platform for large-scale dose-dependent data
License: Apache License 2.0
Bokeh version ~3.4 changed scatterplot API. This manifests in all data points disappearing upon selection.
https://docs.bokeh.org/en/latest/docs/user_guide/basic/scatters.html#ug-basic-scatters
https://docs.bokeh.org/en/latest/docs/releases.html
Adapt the dashboard to these changes. This will be covered in 0.4.1.
To prevent older versions from failing, pin the Bokeh version in 0.4.0.
Apply the same pin in the Poetry builds.
https://python-poetry.org/docs/configuration/
A TOML parser (tomllib) has been part of the standard library since Python 3.11. It can replace the toml package and at the same time enable richer TOML syntax, e.g., inf values.
https://docs.python.org/3/library/tomllib.html#module-tomllib
This raises the minimum supported Python version to 3.11.
Incorrect calculation of the total sum of squares.
It should of course be:
ss_total = np.sum((y - np.mean(y)) ** 2)
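For context, a short sketch of how the corrected total sum of squares typically enters a goodness-of-fit measure. The issue does not say where ss_total is used, so the R² function below is an assumption for illustration and not the project's code.

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination using the corrected total sum of squares."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)        # residual sum of squares
    ss_total = np.sum((y - np.mean(y)) ** 2)  # deviations from the mean of y
    return 1.0 - ss_res / ss_total

y_obs = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_obs, y_obs)             # predictions equal observations
mean_only = r_squared(y_obs, [2.5] * 4)       # predictions equal the mean
print(perfect, mean_only)
```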
Protein and Peptide from DIA should be fine for now...
Here the fixed parameters are used even though a model has been fitted, which then raises an error...
curve_curator/curve_curator/models.py
Line 864 in b87b212
Example:
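# imports (module path assumed from curve_curator/models.py)
import numpy as np
import pandas as pd
from curve_curator.models import LogisticModel
# data
line_x = np.linspace(-10.3, -4.3, 1000)
x = np.log10([0.1, 1.0, 10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0, 10000.0, 30000.0]) - 9
y = pd.Series([1, 1, 4, 6, 7, 16, 55, 105, 147, 160])
# Define the logistic Model
logistic_model = LogisticModel()
# Fit the unrestricted model with ordinary least squares (ols)
logistic_model.find_best_guess_ols(x, y)
logistic_model.fit_ols(x, y)
# Evaluate the fitted model on the dense grid (this call raises the error)
logistic_model(line_x)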
Error message:
TypeError: LogisticModel.core() missing 4 required positional arguments: 'pec50', 'slope', 'front', and 'back'
Hello!
I would like to know how to derive the estimated 'Curve Fold Change' given in the output curves file?
I have tried taking the log2-transformed average response for the samples with highest concentration, minus the log2-transformed average response for samples with lowest concentration (not control samples). Even if I get somewhat close to the output 'Curve Fold Change' value, they are not identical.
Not yet implemented...
So far, one can add empty intensity columns. This can mess up correct decoy formation and FDR estimation.
Add a filter system that removes all these empty columns and warns the user.
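A possible shape for such a filter, sketched with pandas. The function name, the column names, and the definition of "empty" as all-NaN are assumptions for illustration.

```python
import warnings

import numpy as np
import pandas as pd

def drop_empty_columns(df, intensity_cols):
    """Drop intensity columns that contain only NaN and warn about them.

    Returns the cleaned frame and the list of intensity columns that remain.
    """
    empty = [col for col in intensity_cols if df[col].isna().all()]
    if empty:
        warnings.warn(f"Removing empty intensity columns: {empty}")
    kept = [col for col in intensity_cols if col not in empty]
    return df.drop(columns=empty), kept

# Hypothetical example data with one empty intensity column.
example = pd.DataFrame({
    "Name": ["a", "b"],
    "Raw 1": [1.0, 2.0],
    "Raw 2": [np.nan, np.nan],
})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cleaned, kept_cols = drop_empty_columns(example, ["Raw 1", "Raw 2"])
print(list(cleaned.columns), kept_cols, len(caught))
```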
Hi, I am unsure how to interpret this error or what to do to fix it.
Uncaught exception
Traceback (most recent call last):
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Modified sequence'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "", line 198, in run_module_as_main
File "", line 88, in run_code
File "C:\Users.conda\envs\CurveCuratorEnv\Scripts\CurveCurator.exe_main.py", line 7, in
sys.exit(main())
^^^^^^
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\curve_curator_main.py", line 99, in main
data = data_parser.load(config)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\curve_curator\data_parser.py", line 427, in load
df = load_mq_tmt_peptides(path, search_engine_version, unique_cols=unique_cols, sum_cols=raw_cols, first_cols=first_cols, max_cols=max_cols)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\curve_curator\data_parser.py", line 156, in load_mq_tmt_peptides
df['Modified sequence'] = clean_modified_sequence(df['Modified sequence'])
~~^^^^^^^^^^^^^^^^^^^^^
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\pandas\core\frame.py", line 4090, in getitem
indexer = self.columns.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
raise KeyError(key) from err
KeyError: 'Modified sequence'
The RMSE of the curve fit can be an interesting parameter for fold-change analysis.
It is currently not reported in the curves.txt file.
Make this available to the user.
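A minimal sketch of the quantity in question. It assumes the RMSE is computed over the non-missing observations and divides by the number of points; a variant that corrects for the number of fitted parameters would divide by a smaller denominator. The function name is illustrative.

```python
import math

import numpy as np

def rmse(y, y_pred):
    """Root-mean-square error between observations and curve predictions.

    NaN residuals (missing observations) are ignored.
    """
    residuals = np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.nanmean(residuals ** 2)))

value = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
with_missing = rmse([1.0, np.nan], [1.0, 2.0])
print(value, with_missing)
```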
The current interpolation only works for single observations per dose because it is filling in between adjacent data points. However, this can lead to weird distortions of the interpolated estimate when the adjacent data points are the upper or lower points in both cases.
curve_curator/curve_curator/quantification.py
Lines 274 to 282 in 01b22bc
Todo:
Hi,
Please could you give an example of how a TOML and input file with replicates would look?
I have my control in triplicate, and my inhibitor treatments in duplicates.
If possible I want to do one curve fit based off the replicates without averaging them prior to the analysis.
Is it possible to generate a curve with both duplicates present and error bars?
Also, is there any way to change the size of the text on the axis of the generated figures? It is very small.
Thank you
Report the number of curves that cannot be processed due to missing values in the controls.
Currently, using the generic matrix upload, the "Name" duplicates are not aggregated.
TOML:
measurement_type = 'OTHER'
data_type = 'OTHER'
search_engine = 'OTHER'
However, they should be aggregated to enforce the "Name" column being unique to enable the three different modes of aggregation of viability data. Make this also clearer with more elaborate example data (CYL viability).
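The aggregation step could look roughly like the sketch below. The column names and the choice of the mean are assumptions; the three aggregation modes mentioned above are not specified in this issue and are not reproduced here.

```python
import pandas as pd

# Hypothetical generic matrix with a duplicated "Name" entry.
df = pd.DataFrame({
    "Name": ["drug_A", "drug_A", "drug_B"],
    "Raw 1": [1.0, 3.0, 5.0],
    "Raw 2": [2.0, 4.0, 6.0],
})

# Collapse duplicates so that "Name" becomes unique (mean chosen for illustration).
aggregated = df.groupby("Name", as_index=False).mean()
print(aggregated)
```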
Currently, missing values are changed to 0 after median normalization, which is undesired behavior and an unexpected imputation step.
NaNs should be conserved and dealt with elsewhere.
Good morning,
First of all congratulations on this useful tool :)
I have data output from PD with triplicates. While looking into your TOML examples, I noticed there's an example for triplicates, but unfortunately, I couldn't find any example specific to PD. Could you give an example for PD output data? I've been trying to implement it but haven't succeeded so far. Any help would be greatly appreciated!
Thank you,
Paula
PD: For the control setting in the TOML file, should we add all the sample names of the control curve or only the 0 value?
CurveCurator breaks if the data has N <= 4 data points because the wrong number of NaN values is returned: it returns 19 where 20 are expected.
Some users got trapped in a while loop during decoy simulation because there was a high degree of data imputation, leading to 0 variance, which in turn made the min_noise threshold 0 and prevented a successful while loop exit.
This can be fixed by removing 0 variance estimates from the empirical noise distribution.
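The filtering itself is a one-liner; a sketch under the assumption that the empirical noise distribution is available as an array of variance estimates. How min_noise is derived from that distribution is not shown in this issue, so the minimum below is only a stand-in.

```python
import numpy as np

def remove_zero_variances(variances):
    """Keep only finite, strictly positive variance estimates."""
    v = np.asarray(variances, dtype=float)
    return v[np.isfinite(v) & (v > 0)]

# Hypothetical estimates; zeros arise from heavily imputed rows.
estimates = [0.0, 0.2, 0.0, 0.5, np.nan]
filtered = remove_zero_variances(estimates)
print(filtered, filtered.min())
```

With the zeros removed, any threshold taken from the lower end of the distribution stays above 0, so a loop that waits for the noise to exceed it can terminate.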
New ProteomeDiscoverer (3.1.0.622) output files do not show the flanking amino acids in the Annotated Sequence column, i.e. APEPTIDE instead of [K].APEPTIDE.[R] previously. This results in an error in the following line:
Should be an easy fix to check if the flanking amino acids exist before removing them.
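One way to make the removal conditional is a regular expression that only matches when both flanking residues are present. This is a sketch with a hypothetical helper name, not the parser's actual code.

```python
import re

# Matches "[X].SEQUENCE.[Y]" and captures SEQUENCE; plain sequences do not match.
FLANKING = re.compile(r"^\[[^\]]*\]\.(.+)\.\[[^\]]*\]$")

def strip_flanking(sequence):
    """Remove flanking amino acids if present, otherwise return the input unchanged."""
    match = FLANKING.match(sequence)
    return match.group(1) if match else sequence

old_style = strip_flanking("[K].APEPTIDE.[R]")
new_style = strip_flanking("APEPTIDE")
print(old_style, new_style)
```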
There are situations where many controls are used. This can distort the intent of the max missing parameter. Since the controls are aggregated anyway, it makes sense to apply the parameter only to the observations covering the dose range and not to the controls.
With the newer bokeh version, one needed to double-click to re-select other dots via text search.
This was caused by the re-selection not being triggered with:
source.change.emit()
The trick is to correctly emit selection changes with:
source.selected.change.emit()
which invokes in turn the table update via:
source.selected.js_on_change('indices', CustomJS(args=dict(source=source, view=source_view_table), code=code))
CurveCurator expects all values to be positive (since a negative signal is also impossible).
2 new features:
Add test suites for these methods for both sigmoidal and mean models:
A warning is triggered if fewer than 10k decoys are generated, but the message states that fewer than 1k decoys were generated:
curve_curator/curve_curator/data_simulator.py
Lines 166 to 167 in 0760d7d
The pandas aggregation function is silently overwriting NaNs to 0.
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.sum.html
Fix this by introducing the min_count parameter.
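The effect of min_count can be seen in a small example. The column names are hypothetical; the behavior of the sum itself is standard pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["a", "a", "b", "b"],
    "Raw 1": [1.0, np.nan, np.nan, np.nan],
})

# Default: a group consisting only of NaNs sums to 0.
default_sum = df.groupby("Name")["Raw 1"].sum()

# min_count=1: at least one valid value is required, otherwise the result is NaN.
strict_sum = df.groupby("Name")["Raw 1"].sum(min_count=1)

print(default_sum["b"], strict_sum["b"])
```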
Dear Florian Bayer,
I would be highly grateful if you could make the search function in the dashboard case INsensitive. My pinky already hurts from pressing down the shift key.
Thanks a lot!
Stephan Eckert