curve_curator's People

Contributors

flobay, matthewthe

curve_curator's Issues

Bokeh version ~3.4 changed scatterplot API

Bokeh version ~3.4 changed the scatter plot API. As a result, all data points disappear from the dashboard when a selection is made.
https://docs.bokeh.org/en/latest/docs/user_guide/basic/scatters.html#ug-basic-scatters
https://docs.bokeh.org/en/latest/docs/releases.html

Adapt the dashboard to these changes. This will be covered in 0.4.1.

To prevent older versions from failing, pin the current Bokeh version in 0.4.0.

Adapt this in the poetry builds as well.
https://python-poetry.org/docs/configuration/
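
A minimal sketch of how dashboard code could branch on the installed Bokeh version while both code paths need to be supported. The helper names are hypothetical, and the (3, 4) threshold is taken from the approximate version named above, not from a verified changelog entry.

```python
import re


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '3.4.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def uses_new_scatter_api(bokeh_version):
    """True if the version is at or past the assumed ~3.4 API change."""
    return parse_major_minor(bokeh_version) >= (3, 4)
```

In the dashboard this could be called as `uses_new_scatter_api(bokeh.__version__)` after `import bokeh`.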

Only fixed params of logistic model are used for calling the core method after fitting

After a model has been fitted, the core method is still called with only the fixed parameters instead of the fitted ones, which raises an error. The relevant method is:

def build_model(self):

Example:

# imports (the LogisticModel import path is not shown in the issue)
import numpy as np
import pandas as pd

# data
line_x = np.linspace(-10.3, -4.3, 1000)
x = np.log10([0.1, 1.0, 10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0, 10000.0, 30000.0]) - 9
y = pd.Series([1, 1, 4, 6, 7, 16, 55, 105, 147, 160])

# Define the logistic model
logistic_model = LogisticModel()

# Fit the unrestricted model with ordinary least squares (ols)
logistic_model.find_best_guess_ols(x, y)
logistic_model.fit_ols(x, y)

# Evaluate the fitted model; this call raises the error below
logistic_model(line_x)

Error message:
TypeError: LogisticModel.core() missing 4 required positional arguments: 'pec50', 'slope', 'front', and 'back'
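
One way to avoid this class of error is to merge fixed and fitted parameters before calling the core function. The class below is a hypothetical stand-in, not the curve_curator implementation; the attribute names, the merge rule, and the exact form of the four-parameter curve are assumptions made for illustration.

```python
import numpy as np


class SketchLogisticModel:
    """Hypothetical stand-in for LogisticModel (not the curve_curator class)."""

    PARAM_NAMES = ("pec50", "slope", "front", "back")

    def __init__(self, **fixed_params):
        self.fixed_params = dict(fixed_params)  # set by the user up front
        self.fitted_params = {}                 # filled in after fitting

    @staticmethod
    def core(x, pec50, slope, front, back):
        """Assumed four-parameter log-logistic curve on log10 doses."""
        return (front - back) / (1 + 10 ** (slope * (x + pec50))) + back

    def get_all_parameters(self):
        # Fitted values fill the gaps; fixed values take precedence.
        return {**self.fitted_params, **self.fixed_params}

    def __call__(self, x):
        params = self.get_all_parameters()
        missing = [name for name in self.PARAM_NAMES if name not in params]
        if missing:
            raise ValueError(f"Parameters not set: {missing}")
        return self.core(np.asarray(x, dtype=float), **params)
```

With this layout, calling the model after fitting passes all four parameters to `core`, and an unfitted model fails with a message that names the missing parameters.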

Fold change calculation

Hello!

How is the estimated 'Curve Fold Change' in the output curves file derived?

I have tried taking the log2-transformed average response of the samples with the highest concentration minus the log2-transformed average response of the samples with the lowest concentration (not the control samples). Although the result is close to the reported 'Curve Fold Change', the values are not identical.
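
For reference, the calculation described in this question can be written as below. This reproduces the poster's manual estimate only; the function name is hypothetical and the issue does not confirm how CurveCurator computes the 'Curve Fold Change' column. A value read off a fitted curve would generally differ from one computed on raw means, which may be why the numbers are close but not identical.

```python
import math
from statistics import mean


def observed_log2_fold_change(high_dose_responses, low_dose_responses):
    """log2(mean response at highest dose) - log2(mean response at lowest dose)."""
    return math.log2(mean(high_dose_responses)) - math.log2(mean(low_dose_responses))
```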

CurveCurator

Hi, I am unsure how to interpret this error or how to fix it.

Uncaught exception

Traceback (most recent call last):
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Modified sequence'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users.conda\envs\CurveCuratorEnv\Scripts\CurveCurator.exe\__main__.py", line 7, in <module>
sys.exit(main())
^^^^^^
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\curve_curator\__main__.py", line 99, in main
data = data_parser.load(config)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\curve_curator\data_parser.py", line 427, in load
df = load_mq_tmt_peptides(path, search_engine_version, unique_cols=unique_cols, sum_cols=raw_cols, first_cols=first_cols, max_cols=max_cols)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\curve_curator\data_parser.py", line 156, in load_mq_tmt_peptides
df['Modified sequence'] = clean_modified_sequence(df['Modified sequence'])
~~^^^^^^^^^^^^^^^^^^^^^
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\pandas\core\frame.py", line 4090, in __getitem__
indexer = self.columns.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
raise KeyError(key) from err
KeyError: 'Modified sequence'
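
The traceback shows that the loaded table has no 'Modified sequence' column. A small guard like the one below would turn the bare KeyError into a message that lists what is missing and what was found. It is a sketch only: `require_columns` is a hypothetical helper, not part of curve_curator.

```python
import pandas as pd


def require_columns(df, required, source="input file"):
    """Raise a descriptive error if any required column is absent."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(
            f"{source} is missing required column(s) {missing}; "
            f"found columns: {list(df.columns)}"
        )
    return df
```

Called as `require_columns(df, ['Modified sequence'])` before the column is accessed, this points the user to a mismatch between the input file and the configured search engine or file type.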

Add Curve RMSE to the output file

The RMSE of the curve fit can be an interesting parameter for fold-change analysis.
It is currently not reported in the curves.txt file.
Make this available to the user.
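
A minimal sketch of the quantity in question, assuming the RMSE is taken over all finite observations and divides by the number of observations (not by the residual degrees of freedom). The function name is hypothetical.

```python
import numpy as np


def curve_rmse(y_observed, y_predicted):
    """Root-mean-square error between observations and curve predictions."""
    y_observed = np.asarray(y_observed, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    # Ignore positions where either value is missing or infinite.
    valid = np.isfinite(y_observed) & np.isfinite(y_predicted)
    residuals = y_observed[valid] - y_predicted[valid]
    return float(np.sqrt(np.mean(residuals ** 2)))
```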

Linear interpolation option only works well for non replicated data

The current interpolation only works well for a single observation per dose because it fills in values between adjacent data points. With replicates, this can distort the interpolated estimate, for example when the adjacent data points are both the upper or both the lower replicates of their doses.

f_linear = interpolate.interp1d(x_data, y_data, kind='linear')
x_linear = fit_params['x_interpolated']
# Mask terminal missing values
if not all(finite_values):
    x_linear = x_linear[(x_linear >= x_data.min()) & (x_linear <= x_data.max())]
# Add the values to the fit variables
x_fit = np.append(x, x_linear)
y_fit = np.append(y, f_linear(x_linear))
weights = None

Todo:

  • Make it a function
  • Make it compatible with aggregations per dose such that the linear interpolation estimate is relative to the means (robust)
  • Unit test it
  • Think about weights again and how to construct them (if possible).
  • Warn users that weights are not applied when interpolation is activated.
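
The first two to-do items could be sketched as a standalone function that interpolates between per-dose means instead of individual replicates. This is an assumption-laden sketch, not the planned implementation: the function name is hypothetical, the mean is used as the aggregate, and `np.interp` replaces `interpolate.interp1d`.

```python
import numpy as np


def interpolate_on_dose_means(x, y, x_interpolated):
    """Append linearly interpolated points computed from per-dose means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    finite = np.isfinite(x) & np.isfinite(y)
    x, y = x[finite], y[finite]

    # Aggregate replicates: one mean response per unique dose (sorted).
    doses, inverse = np.unique(x, return_inverse=True)
    means = np.bincount(inverse, weights=y) / np.bincount(inverse)

    # Keep only interpolation positions inside the observed dose range.
    x_new = np.asarray(x_interpolated, dtype=float)
    x_new = x_new[(x_new >= doses.min()) & (x_new <= doses.max())]
    y_new = np.interp(x_new, doses, means)

    return np.append(x, x_new), np.append(y, y_new)
```

Because the interpolation runs on the means, the added points no longer depend on the order of replicates within a dose.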

Replicates

Hi,

Please could you give an example of how a TOML and input file with replicates would look?
I have my control in triplicate, and my inhibitor treatments in duplicates.
If possible I want to do one curve fit based off the replicates without averaging them prior to the analysis.
Is it possible to generate a curve with both duplicates present and error bars?
Also, is there any way to change the size of the text on the axis of the generated figures? It is very small.

Thank you

Add max_imputation parameter

  • Make the maximal imputation parameter available to the user. For backward compatibility, the default value is equivalent to the max missing value.
  • Report how many curves were filtered out because they had too many imputed values.
  • Add this parameter to the TOML file and the README.
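
A sketch of the filter and the report described above, assuming a boolean matrix that marks imputed values (rows are curves, columns are observations). The function name and the mask representation are hypothetical.

```python
import numpy as np


def filter_by_max_imputation(imputed_mask, max_imputation):
    """Return a keep-mask per curve and the number of curves filtered out."""
    imputed_mask = np.asarray(imputed_mask, dtype=bool)
    n_imputed = imputed_mask.sum(axis=1)   # imputed values per curve
    keep = n_imputed <= max_imputation
    return keep, int((~keep).sum())
```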

In generic mode, no duplicates are aggregated

Currently, using the generic matrix upload, the "Name" duplicates are not aggregated.

TOML:

measurement_type = 'OTHER'
data_type = 'OTHER'
search_engine = 'OTHER'

However, they should be aggregated to enforce the "Name" column being unique to enable the three different modes of aggregation of viability data. Make this also clearer with more elaborate example data (CYL viability).
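
A minimal sketch of such an aggregation with pandas, assuming all columns other than "Name" are numeric. The function name is hypothetical, and the three aggregation modes are not named in this issue, so `how` simply accepts any pandas aggregation name.

```python
import pandas as pd


def aggregate_duplicate_names(df, name_col="Name", how="mean"):
    """Collapse rows sharing the same name so that the name column is unique."""
    # sort=False keeps names in order of first appearance.
    return df.groupby(name_col, sort=False, as_index=False).agg(how)
```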

PD Example question

Good morning,

First of all congratulations on this useful tool :)

I have data output from PD with triplicates. While looking into your TOML examples, I noticed there's an example for triplicates, but unfortunately, I couldn't find any example specific to PD. Could you give an example for PD output data? I've been trying to implement it but haven't succeeded so far. Any help would be greatly appreciated!

Thank you,
Paula

P.S. For the control setting in the TOML file, should we add all the sample names of the control curve or only the 0 value?

High degree of imputed data can cause while loop trap

Some users got trapped in a while loop during decoy simulation because there was a high degree of data imputation, leading to 0 variance, which in turn made the min_noise threshold 0 and prevented a successful while loop exit.

This can be fixed by removing 0 variance estimates from the empirical noise distribution.
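
The proposed fix amounts to a filter like the following. It is a sketch under assumptions: the function name is hypothetical, and the issue does not show how min_noise is derived from the distribution.

```python
import numpy as np


def drop_zero_variance(variances):
    """Keep only finite, strictly positive variance estimates."""
    variances = np.asarray(variances, dtype=float)
    keep = np.isfinite(variances) & (variances > 0)
    return variances[keep]
```

Any threshold derived from the filtered array, such as its minimum, is then strictly positive as long as at least one non-zero estimate remains.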

New PD output has no flanking amino acids

New ProteomeDiscoverer (3.1.0.622) output files do not show the flanking amino acids in the Annotated Sequence column, i.e. APEPTIDE instead of [K].APEPTIDE.[R] previously. This results in an error in the following line:

df['Modified sequence'] = df['Modified sequence'].str.split('.', expand=True)[1].str.upper()

An easy fix would be to check whether the flanking amino acids exist before removing them.
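
One possible way to handle both formats is to strip the flanking residues only when they are present. The function name and the regular expression are illustrative and assume that flanking residues always appear as bracketed groups separated from the peptide by a dot.

```python
import pandas as pd


def strip_flanking_residues(sequences):
    """'[K].APEPTIDE.[R]' -> 'APEPTIDE'; 'APEPTIDE' stays 'APEPTIDE'."""
    pattern = r"^(?:\[[^\]]*\]\.)?(.*?)(?:\.\[[^\]]*\])?$"
    core = sequences.str.extract(pattern, expand=False)
    return core.str.upper()
```

Applied as `df['Modified sequence'] = strip_flanking_residues(df['Modified sequence'])`, this works on a column that mixes old and new output styles.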

Apply the max_missing parameter without the controls

There are situations where many controls are used. This can undermine the purpose of the max_missing parameter. As the controls are aggregated anyway, it makes sense to apply the parameter only to the observations covering the dose range and not to the controls.
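
A sketch of counting missing values over the dose columns only, assuming a numeric matrix (rows are curves, columns are samples) and a boolean flag per column that marks controls. The function name and both inputs are hypothetical.

```python
import numpy as np


def passes_max_missing(values, is_control, max_missing):
    """True per curve if missing values among non-control samples <= max_missing."""
    values = np.asarray(values, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)
    # Control columns are excluded before counting missing values.
    n_missing = np.isnan(values[:, ~is_control]).sum(axis=1)
    return n_missing <= max_missing
```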

Correct index update from string selection with newer bokeh version

With the newer bokeh version, one needed to double-click to re-select other dots via text search.

This was caused by the re-selection not being triggered by:

source.change.emit()

The trick is to correctly emit selection changes with:

source.selected.change.emit()

which in turn invokes the table update via:

source.selected.js_on_change('indices', CustomJS(args=dict(source=source, view=source_view_table), code=code))

Add negative value warning and clipping option

CurveCurator expects all values to be positive, since a negative signal is impossible.

Two new features:

  • If negative values are present in the ratio data, warn the user and count how many are present in the data.
  • Give the option to clip negative (and positive) values to a pre-specified range as a new optional parameter to the TOML file.
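
Both features could be sketched together as follows. The function name and the `clip_range` argument are hypothetical; the name of the eventual TOML parameter is not given in this issue.

```python
import warnings

import numpy as np


def warn_and_clip(ratios, clip_range=None):
    """Count negative ratios, warn if any exist, and optionally clip."""
    ratios = np.asarray(ratios, dtype=float)
    n_negative = int(np.sum(ratios < 0))  # NaN compares as False
    if n_negative:
        warnings.warn(f"{n_negative} negative value(s) found in the ratio data.")
    if clip_range is not None:
        lower, upper = clip_range
        ratios = np.clip(ratios, lower, upper)
    return ratios, n_negative
```

Leaving `clip_range` as `None` keeps the data unchanged, so the warning and the clipping can be enabled independently.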
