HaDeX's People

Contributors

dominikrafacz, michbur, werpuc


HaDeX's Issues

Input file validation

Package function to check what kind of file is uploaded by the user and change it if needed (calculate and aggregate).

DynamX can produce different files - HaDeX should be able to work with all of them.
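A minimal sketch of such a check, assuming DynamX-style cluster-file columns; the function name and the exact column set are only illustrative, not the final API:

validate_hdx_file <- function(dat) {
  # columns expected in a DynamX cluster file (assumed set, adjust as needed)
  required_cols <- c("Protein", "Start", "End", "Sequence",
                     "State", "Exposure", "File", "z", "Inten", "Center")
  missing_cols <- setdiff(required_cols, colnames(dat))
  if (length(missing_cols) > 0) {
    stop("Unsupported or incomplete file, missing columns: ",
         paste(missing_cols, collapse = ", "))
  }
  # if the file is not yet aggregated, the calculation/aggregation step would follow here
  invisible(dat)
}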

Kinetic plots

  • research
  • function to calculate + plot results
  • tab in app

Mass calculation

The calculation of the mass (aggregation of measurements for different charge values) should be available as a separate function, not only as a part of calculating state deuteration.
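A rough sketch of the idea, assuming DynamX-style columns Center (measured centroid m/z), z (charge) and Inten (intensity); the function name, grouping columns and the intensity-weighted mean are assumptions, not the exact HaDeX implementation:

library(dplyr)

calculate_exp_mass <- function(dat) {
  proton_mass <- 1.00727647
  dat %>%
    # measured peptide mass from each charge state
    mutate(exp_mass = Center * z - z * proton_mass) %>%
    # aggregate the charge states of one measurement into a single mass value
    group_by(Sequence, Start, End, State, Exposure, File) %>%
    summarise(avg_exp_mass = weighted.mean(exp_mass, Inten), .groups = "drop")
}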

Multiple protein files

  • analyse two different proteins from one file (overlapping sequences) plus others
  • two or more different proteins from one file

Additional information for the user

Add a basic description of how the theoretical calculations work (theoretical ins and outs). This information is included in the vignette, but we are not sure whether a potential user will check it out or just get discouraged. How do we do this cleverly?

Back-exchange included in calculations

As an option, include back-exchange effects in the experimental calculations, as proposed in "Recommendations for ... HDX-MS experiments" by Rand and others.

The results should be similar to the theoretical calculations, but it would be nice to compare them.

dplyr 1.0.0

We're seeing these errors when testing HaDeX against the soon-to-be-released version of dplyr (1.0.0):

[master*] 126.1 MiB ❯ revdepcheck::revdep_details(, "HaDeX")
══ Reverse dependency check ═══════════════════════════════════════ HaDeX 1.1 ══

Status: BROKEN

── Still failing

✖ checking dependencies in R code ... NOTE

── Newly failing

✖ checking examples ... ERROR
✖ checking tests ...

── Before ──────────────────────────────────────────────────────────────────────
❯ checking dependencies in R code ... NOTE
  Namespaces in Imports field not imported from:
    ‘DT’ ‘gsubfn’ ‘stringr’
    All declared Imports should be used.

0 errors ✔ | 0 warnings ✔ | 1 note ✖

── After ───────────────────────────────────────────────────────────────────────
❯ checking examples ... ERROR
  Running examples in ‘HaDeX-Ex.R’ failed
  The error most likely occurred in:

  > ### Name: quality_control
  > ### Title: Experiment quality control
  > ### Aliases: quality_control
  >
  > ### ** Examples
  >
  > # load example data
  > dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))
  >
  > # calculate mean uncertainty
  > (result <- quality_control(dat = dat,
  +                            state_first = "CD160",
  +                            state_second = "CD160_HVEM",
  +                            chosen_time = 1,
  +                            in_time = 0.001))
  Error in `[.data.table`(dat, "Exposure") :
    When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
  Calls: quality_control -> unique -> [ -> [.data.table
  Execution halted

❯ checking tests ...
  See below...

❯ checking dependencies in R code ... NOTE
  Namespaces in Imports field not imported from:
    ‘DT’ ‘gsubfn’ ‘stringr’
    All declared Imports should be used.

── Test failures ───────────────────────────────────────────────── testthat ────

> library(testthat)
> library(HaDeX)
>
> test_check("HaDeX")
── 1. Error: class is right  ───────────────────────────────────────────────────
When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
Backtrace:
 1. testthat::expect_is(...)
 4. HaDeX::quality_control(...)
 7. data.table:::`[.data.table`(dat, "Exposure")

── 2. Error: size is right  ────────────────────────────────────────────────────
When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
Backtrace:
 1. testthat::expect_equal(...)
 6. HaDeX::quality_control(...)
 9. data.table:::`[.data.table`(dat, "Exposure")

══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 16 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 2 ]
1. Error: class is right
2. Error: size is right

Error: testthat unit tests failed
Execution halted

2 errors ✖ | 0 warnings ✔ | 1 note ✖

Rework code base

Woods plot computations and plotting functions should be moved into the package.

Protein Sequence Function

Code for reconstructing the covered protein sequence from the DynamX file should be available as a function.

Uncertainty function - quality control

Uncertainty - treat each time point (aside from t_0) as t_100. Returns a data.frame where each row represents a single time point and its mean uncertainty.
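For reference, a usage sketch based on the call already shown in the package examples (the example data file ships with HaDeX):

library(HaDeX)

dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))

# one row per time point with its mean uncertainty
result <- quality_control(dat = dat,
                          state_first = "CD160",
                          state_second = "CD160_HVEM",
                          chosen_time = 1,
                          in_time = 0.001)
head(result)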

Houde test implementation

Before our intended semiparametric test implementation:

there should be an option in the app to show the Houde test for all of the time points (it is currently calculated only for the selected time point), as the semiparametric test uses all time points as well.

S3 objects

S3 objects and methods (print, summary) for

  1. Input data.
  2. Results of computations (e.g., Woods plot).

Butterfly plots

There should be the following elements:

  • function to create butterfly plot
  • function to create butterfly plot data
  • tab in GUI for butterfly plot with parameterization and tooltip
  • description in the documentation
  • helpers in GUI
  • plot generation in report

The same for the butterfly differential plot.
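A minimal ggplot2 sketch of the intended plot, assuming a data frame with one row per peptide and exposure time and columns ID, Exposure and deut_uptake (names are illustrative, not the final API):

library(ggplot2)

plot_butterfly_sketch <- function(butterfly_dat) {
  ggplot(butterfly_dat,
         aes(x = ID, y = deut_uptake, color = factor(Exposure))) +
    geom_point() +
    geom_line() +
    labs(x = "Peptide ID", y = "Deuterium uptake [Da]",
         color = "Exposure [min]")
}

The same skeleton, with the difference in uptake between two states on the y axis, would serve the differential variant.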

Major documentation changes

The reworking of the application and the package must be accompanied by major changes in the documentation.

The changes should cover:

  • #68
  • coherent use of terms in the documentation
  • #66
  • #75
  • remove current vignette
  • precise description of the data processing

I suggest several short articles rather than one long vignette.

Changes in the UI according to the glossary

As we rework the application, the following changes in the UI should be made:

  • default values should be Daltons, with fractional as an option
  • labels and descriptions should be changed accordingly
  • helpers should be updated, with links added to the improved documentation
  • ...

All of these changes should also be included in the report!

Thank you and comment

First of all, thank you very much for your package. It has been really cool to see an R package for HDX data.
I have just started using it, so I would like to thank you all.

I have one or two comments.
Disclaimer: I am not a programmer. I just know some R and do HDX, and I was using your software to plot some HDX data. Since I like it, I wanted to share these ideas with you (I guess you probably already thought of them, but just in case).

In order to directly process the kinetics for all the peptides, I wrote this:

library(HaDeX)
library(dplyr)
library(purrr)
library(tidyr)

dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))

# check the states and proteins in the experiment - this is only for me, to list the different states and proteins
States <- unique(dat[["State"]])
Proteins <- unique(dat[["Protein"]])
Exposure <- unique(dat[["Exposure"]])

# this function calculates the kinetics for a list of peptides
kin_state <- function(state, protein, time_in, time_out, start, end, sequence, data) {

  # states in which this peptide was measured
  peptide_state <- data %>%
    group_by(Sequence) %>%
    filter(Sequence == sequence) %>%
    group_by(State) %>%
    summarise() %>%
    as_vector()

  if (!state %in% peptide_state) {
    # this filters out the peptides that are not present in the indicated state
    print(peptide_state)
    print(paste(as.character(sequence), "does not belong to", as.character(state)))
    kin <- NA
  } else {
    # calculate the kinetics for all the peptides for a given state in the table
    kin <- calculate_kinetics(data,
                              protein = protein,
                              state = state,
                              sequence = sequence,
                              start = start,
                              end = end,
                              time_in = time_in,
                              time_out = time_out)
  }

  return(kin)
}

so that you can then have a tidy data frame as follows:

kin_data <- dat %>%
  filter(!Sequence == "") %>%
  mutate("Seq_peptide" = Sequence) %>%
  group_by(Sequence) %>%
  nest() %>%
  mutate("kin_State1" = map(.x = data,
                            .f = ~kin_state(state = States[1],
                                            protein = Proteins[1],
                                            time_in = 0,
                                            time_out = 1500,
                                            start = .x$Start[1],
                                            end = .x$End[1],
                                            sequence = .x$Seq_peptide[1],
                                            data = dat)),
         "kin_State2" = map(.x = data,
                            .f = ~kin_state(state = States[2],
                                            protein = Proteins[1],
                                            time_in = 0,
                                            time_out = 1500,
                                            start = .x$Start[1],
                                            end = .x$End[1],
                                            sequence = .x$Seq_peptide[1],
                                            data = dat)))

And now, this is the main reason why I wrote. Regarding plot_kinetics, I think it would be a good idea to have color/fill = State as well (I did it on my PC), but it would be useful to have it out of the box, so that you can easily do things like:

bind_rows(kin_data$kin_State1[1:8], kin_data$kin_State2[1:8]) %>%
  plot_kinetics_State(theoretical = FALSE,
                      relative = FALSE) +
  facet_grid(~ Sequence)

[attached plot: 210311_example_kin_state]

And then you have many peptides in panels in one plot, with the same colors for easy comparison.

Thank you very much again.

Best,
Alonso

Download deuterium uptake curves

There should be a possibility to download the deuterium uptake curves of all the peptides (one uptake curve per peptide, with the biological states), as they are usually included in the supplement of a publication.

There should be two options: download a zipped folder with all the plots separately, or one file with all the plots arranged in a grid. Each plot should have the peptide sequence and its position in the protein sequence in the title.

This is a request from MD.
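A sketch of the zipped-folder variant; plot_uptake_curve() is a hypothetical helper returning a ggplot object for a single peptide, and the output paths are placeholders:

library(ggplot2)

export_uptake_curves <- function(dat, out_zip = "uptake_curves.zip") {
  out_dir <- tempfile("uptake_curves_")
  dir.create(out_dir)
  for (seq in unique(dat[["Sequence"]])) {
    # hypothetical helper; the title should carry the sequence and its position
    p <- plot_uptake_curve(dat, sequence = seq)
    ggsave(file.path(out_dir, paste0(seq, ".png")), plot = p,
           width = 6, height = 4)
  }
  utils::zip(out_zip, files = list.files(out_dir, full.names = TRUE), flags = "-j")
  out_zip
}

The grid variant could instead combine the same plots with facet_wrap(~ Sequence) or a grid-arranging package and save a single file.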

Supplement: deuteration

The supplement says:

After obtaining the mass of the peptide, we can compute the deuteration level depending on the chosen maximum deuteration level. The maximum deuteration can also be computed in two different ways: either as theoretical (where the maximum deuteration depends on the theoretical deuteration levels) or experimental (where the maximum deuteration is assumed to be equal to the deuteration measured at the last time point).
Experimental deuteration level
The experimental deuteration level is computed as the deuteration level of the peptide from a protein in a specific state after incubation time 𝑡, compared to the deuteration level measured at the start of the incubation (𝑡0). It yields a value for the chosen state and chosen time 𝑡.

It would be nice to have a formula explaining how exactly we go from average mass to deuteration, just for the sake of clarity.
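For the discussion, a sketch of the usual convention (not necessarily the exact formulas used in HaDeX): m_t is the average peptide mass measured after incubation time t, m_0 the mass at the start of the incubation, and m_100 the mass at the chosen maximum (theoretical or the last time point).

# sketch of the usual convention, not necessarily HaDeX's exact formula
deut_uptake      <- function(m_t, m_0) m_t - m_0                       # in Da
frac_deut_uptake <- function(m_t, m_0, m_100) (m_t - m_0) / (m_100 - m_0)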

reconstruct_sequence improvement

Add a parameter "length" to the reconstruct_sequence function to manually correct the sequence length read from the experimental data. This functionality is already implemented in the app.
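A sketch of how the length parameter could work (the default is guessed from the data and can be overridden by the user; the body is illustrative, not the actual implementation):

reconstruct_sequence_sketch <- function(dat, protein_length = max(dat[["End"]])) {
  # start from an uncovered sequence of the requested length
  residues <- rep("x", protein_length)
  # fill in residues from every peptide found in the file
  for (i in seq_len(nrow(dat))) {
    aa <- strsplit(dat[["Sequence"]][i], "")[[1]]
    residues[dat[["Start"]][i] + seq_along(aa) - 1] <- aa
  }
  paste(residues, collapse = "")
}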

Logic of the app to the package

The manipulation of the data to produce data frames for later use should be put into package functions.

The reactives that should contain a function call instead of inline data processing (see the sketch below):

  • dat
  • aminoDist_out
  • stateOverlap_data
  • stateOverlap_out - plot on stateOverlap_data
  • stateOverlapDist_data
  • stateOverlapDist - plot on stateOverlapDist_data
  • comparison_plot_data_* - these four elements can be in one function
  • comparison_plot_* - these four elements can be in one function - plot on the data above
  • woods_plot_dat
  • differential_plot_data_* - these four elements can be in one function
  • differential_plot_* - these four elements can be in one function - plot on the data above
  • kin_dat
  • kin_plot_*_data
  • kin_plot_*
  • quality_control_dat
  • quality_control_plot_data_out
  • qc_out - plot on quality_control_plot_data_out
  • summary_data

Test data should be generated for the tests - the data behind the plots in the current version (available to download from the app) should be stored in separate CSV files.
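The intended pattern, sketched for the dat reactive (everything except read_hdx, including the input ID, is illustrative):

library(shiny)
library(HaDeX)

server <- function(input, output, session) {
  # before: many lines of cleaning and aggregation written out inside the reactive
  # after: the reactive only calls a package function
  dat <- reactive({
    read_hdx(input[["data_file"]][["datapath"]])
  })
}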

Assumptions about the task:

  • Test Driven Development (checkmate advised)
  • ensure coherency of the names within the functions
  • data.table instead of dplyr
  • all the commits related to this task should be mentioned below
  • working changes on branch dev_1.5, pull request from fork
  • documentation is crucial

Renaming variables according to glossary

The original code is hard to read due to messily named variables (that's on me). Once we reach an agreement on the glossary project, the variables should be renamed for consistency.

Add tables to charts

Add tables as tabs for the charts (nested tabs). Tables should be created using a DT template.
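A minimal sketch of the nested-tab layout with a DT table; the tab labels, output IDs and the comparison_plot_data() reactive are illustrative:

library(shiny)
library(DT)

# UI: a chart tab with a nested data tab
ui_fragment <- tabPanel("Comparison plot",
                        tabsetPanel(
                          tabPanel("Plot", plotOutput("comparison_plot")),
                          tabPanel("Data", DTOutput("comparison_table"))
                        ))

# server: render the same data that feeds the plot as a DT table
server_fragment <- function(input, output, session) {
  output[["comparison_table"]] <- renderDT({
    datatable(comparison_plot_data(),  # hypothetical reactive with the plot data
              options = list(pageLength = 10))
  })
}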

Article about data visualization

An article for the documentation about our methods of data visualization.
It should include the description, construction, and examples of:

  • comparison plot
  • Woods plot
  • deuterium uptake
  • quality control
  • butterfly plot
  • butterfly differential plot
  • volcano plot

Expand vignette

Add all equations along with exemplary code and plots. The vignette should render to PDF.

Production-ready plots

The parametrization of the label size is useful and allows the production of readable descriptions.

Every text input for a label (axis/title) should be accompanied by a select input for the size of that object. The size of the legend is the same as for the x axis.

Code upgrade

  • make sure that variable names are coherent
  • remove unnecessary code
  • look for simpler solutions

Personalize plot with color and label

Additional personalization: the user can choose the colors of the plot and change the labels if needed.

This is not the default - it is enabled by an action button.

Volcano plot

There should be:

  • a function to generate volcano plot
  • a function to generate volcano plot data
  • a tab in GUI to present this plot with parameterization and tooltips
  • description in the documentation
  • helpers in the GUI
  • option in the report

Report generation

An .Rmd template for HTML reports:

  1. File name + MD5 sum
  2. Checkboxes for parts of the report.
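A sketch of the parameterized .Rmd header and the file-info chunk; the parameter names are illustrative, and the MD5 sum can come from tools::md5sum():

---
title: "HaDeX report"
output: html_document
params:
  data_file: NULL
  include_comparison_plot: TRUE
---

```{r file-info}
# 1. file name + MD5 sum of the uploaded data file
basename(params$data_file)
tools::md5sum(params$data_file)
```

```{r comparison-plot, eval = params$include_comparison_plot}
# 2. this chunk is rendered only when the corresponding checkbox is ticked
```

The app would then call rmarkdown::render() with a params list built from the checkbox inputs.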

Comparison of replicates

There should be a comparison of the data measured in repetitions of the experiment.

The comparison should be performed for a specific time point, as the measurements are repeated n times at each time point and can be treated as somewhat separate experiments.

This comparison should be implemented as a function and made available in the GUI, as it allows the user to spot differences between replicates and possibly disqualify a replicate.
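A sketch of the per-time-point comparison, assuming replicates are distinguished by the File column (as in DynamX exports) and that uptake_dat already carries a deut_uptake value per replicate; all names are illustrative:

library(dplyr)

compare_replicates <- function(uptake_dat, state, chosen_time) {
  uptake_dat %>%
    filter(State == state, Exposure == chosen_time) %>%
    group_by(Sequence, Start, End) %>%
    summarise(n_replicates = n(),
              mean_uptake  = mean(deut_uptake),
              sd_uptake    = sd(deut_uptake),
              .groups = "drop")
  # a replicate whose values sit far from the others could then be flagged
}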

Rework package functions

For consistency, a few small changes should be made:

  • create_uptake_dataset() and create_diff_uptake_dataset() functions to produce data.frames based on the supplied parameters. This is the input for all possible uptake plots and integrates the generate_*_dataset() functions.
  • generate_plot should be renamed to plot, as it is the basic functionality of the package.
  • there should be one (or maybe two, for diff) function to generate the data frame for the data tabs in the UI. It is all the same data, no matter the plot type, and an additional ID column does no harm. Then the generate_*_data() functions are not necessary.

I want to keep the generate_* functions internal and keep only the necessary ones. Some of the already existing ones should not be internal.

Update report generation

At the end of all changes, the report template should be updated to include all new features.

Reworking functions in the package to employ data.table and stringi

All of the existing functions (except for the deprecated ones) should be re-written with data.table and stringi instead of dplyr and stringr. This is also an opportunity to brush up the documentation of each function and provide complete test coverage (aim for the green).

The current summary of the situation can be found in the document.

This issue also covers updates of tests.
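For a concrete flavour of the rewrite, the same aggregation done both ways (column names taken from the example data; the actual pipelines in the package will differ):

library(HaDeX)
library(dplyr)
library(data.table)

dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))

# dplyr version
dat %>%
  group_by(Sequence, State, Exposure) %>%
  summarise(avg_center = weighted.mean(Center, Inten), .groups = "drop")

# data.table version
dat_dt <- as.data.table(dat)
dat_dt[, .(avg_center = weighted.mean(Center, Inten)),
       by = .(Sequence, State, Exposure)]

# stringr -> stringi, e.g. stringr::str_sub(x, 1, 3) becomes stringi::stri_sub(x, 1, 3)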
