HaDeX's People

Contributors

dominikrafacz, michbur, werpuc


HaDeX's Issues

Input file validation

Package function to check what kind of file is uploaded by the user and change it if needed (calculate and aggregate).

DynamX can produce different files - HaDeX should be able to work with all of them.
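A minimal sketch of such a check, assuming DynamX-style cluster-file columns; the function name and the exact column set are only illustrative, not the final API:

validate_hdx_file <- function(dat) {
  # columns expected in a DynamX cluster file (assumed set, adjust as needed)
  required_cols <- c("Protein", "Start", "End", "Sequence",
                     "State", "Exposure", "File", "z", "Inten", "Center")
  missing_cols <- setdiff(required_cols, colnames(dat))
  if (length(missing_cols) > 0) {
    stop("Unsupported or incomplete file, missing columns: ",
         paste(missing_cols, collapse = ", "))
  }
  # if the file is not yet aggregated, the calculation/aggregation step would follow here
  invisible(dat)
}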

Kinetic plots

  • research
  • function to calculate + plot results
  • tab in app

Mass calculation

The calculation of the mass (aggregation of measurements for different charge values) should be available as a separate function, not only as a part of calculating state deuteration.
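A rough sketch of the idea, assuming DynamX-style columns Center (measured centroid m/z), z (charge) and Inten (intensity); the function name, grouping columns and the intensity-weighted mean are assumptions, not the exact HaDeX implementation:

library(dplyr)

calculate_exp_mass <- function(dat) {
  proton_mass <- 1.00727647
  dat %>%
    # measured peptide mass from each charge state
    mutate(exp_mass = Center * z - z * proton_mass) %>%
    # aggregate the charge states of one measurement into a single mass value
    group_by(Sequence, Start, End, State, Exposure, File) %>%
    summarise(avg_exp_mass = weighted.mean(exp_mass, Inten), .groups = "drop")
}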

Multiple protein files

  • analyse two different proteins from one file (overlapping sequences) plus others
  • two or more different proteins from one file

Additional information for the user

Add a basic description of how the theoretical calculations work (theoretical ins and outs). This information is included in the vignette, but we are not sure whether a potential user will check it out or just get discouraged. How do we do this cleverly?

Back-exchange included in calculations

As an option, include back-exchange effects in the experimental calculations, as proposed in "Recommendations for ... HDX-MS experiments" by Rand and others.

The results should be similar to the theoretical calculations, but it would be nice to compare them.

dplyr 1.0.0

We're seeing these errors when testing HaDeX against the soon-to-be-released version of dplyr (1.0.0):

[master*] 126.1 MiB ❯ revdepcheck::revdep_details(, "HaDeX")
══ Reverse dependency check ═══════════════════════════════════════ HaDeX 1.1 ══

Status: BROKEN

── Still failing

✖ checking dependencies in R code ... NOTE

── Newly failing

✖ checking examples ... ERROR
✖ checking tests ...

── Before ──────────────────────────────────────────────────────────────────────
❯ checking dependencies in R code ... NOTE
  Namespaces in Imports field not imported from:
    ‘DT’ ‘gsubfn’ ‘stringr’
    All declared Imports should be used.

0 errors ✔ | 0 warnings ✔ | 1 note ✖

── After ───────────────────────────────────────────────────────────────────────
❯ checking examples ... ERROR
  Running examples in ‘HaDeX-Ex.R’ failed
  The error most likely occurred in:

  > ### Name: quality_control
  > ### Title: Experiment quality control
  > ### Aliases: quality_control
  >
  > ### ** Examples
  >
  > # load example data
  > dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))
  >
  > # calculate mean uncertainty
  > (result <- quality_control(dat = dat,
  +                            state_first = "CD160",
  +                            state_second = "CD160_HVEM",
  +                            chosen_time = 1,
  +                            in_time = 0.001))
  Error in `[.data.table`(dat, "Exposure") :
    When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
  Calls: quality_control -> unique -> [ -> [.data.table
  Execution halted

❯ checking tests ...
  See below...

❯ checking dependencies in R code ... NOTE
  Namespaces in Imports field not imported from:
    ‘DT’ ‘gsubfn’ ‘stringr’
    All declared Imports should be used.

── Test failures ───────────────────────────────────────────────── testthat ────

> library(testthat)
> library(HaDeX)
>
> test_check("HaDeX")
── 1. Error: class is right  ───────────────────────────────────────────────────
When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
Backtrace:
 1. testthat::expect_is(...)
 4. HaDeX::quality_control(...)
 7. data.table:::`[.data.table`(dat, "Exposure")

── 2. Error: size is right  ────────────────────────────────────────────────────
When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
Backtrace:
 1. testthat::expect_equal(...)
 6. HaDeX::quality_control(...)
 9. data.table:::`[.data.table`(dat, "Exposure")

══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 16 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 2 ]
1. Error: class is right
2. Error: size is right

Error: testthat unit tests failed
Execution halted

2 errors ✖ | 0 warnings ✔ | 1 note ✖

Rework code base

Woods plot computations and plotting functions should be moved into the package.

Protein Sequence Function

Code for reconstructing the covered protein sequence from the DynamX file should be available as a function.

Uncertainty function - quality control

Uncertainty - treat each time point (aside from t_0) as t_100. Returns a data.frame where each row represents a single time point and its mean uncertainty.
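For reference, a usage sketch based on the call already shown in the package examples (the example data file ships with HaDeX):

library(HaDeX)

dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))

# one row per time point with its mean uncertainty
result <- quality_control(dat = dat,
                          state_first = "CD160",
                          state_second = "CD160_HVEM",
                          chosen_time = 1,
                          in_time = 0.001)
head(result)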

Houde test implementation

Before our intended semiparametric test implementation:

there should be an option in the app to show the Houde test for all of the time points (it is currently calculated only for the selected time point), as the semiparametric test uses all time points as well.

S3 objects

S3 objects and methods (print, summary) for

  1. Input data.
  2. Results of computations (e.g., Woods plot).

Butterfly plots

There should be the following elements:

  • function to create butterfly plot
  • function to create butterfly plot data
  • tab in GUI for butterfly plot with parameterization and tooltip
  • description in the documentation
  • helpers in GUI
  • plot generation in report

The same for the butterfly differential plot.
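A minimal ggplot2 sketch of the intended plot, assuming a data frame with one row per peptide and exposure time and columns ID, Exposure and deut_uptake (names are illustrative, not the final API):

library(ggplot2)

plot_butterfly_sketch <- function(butterfly_dat) {
  ggplot(butterfly_dat,
         aes(x = ID, y = deut_uptake, color = factor(Exposure))) +
    geom_point() +
    geom_line() +
    labs(x = "Peptide ID", y = "Deuterium uptake [Da]",
         color = "Exposure [min]")
}

The same skeleton, with the difference in uptake between two states on the y axis, would serve the differential variant.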

Major documentation changes

The reworking of the application and the package must be accompanied by major changes in the documentation.

The changes should cover:

  • #68
  • coherent use of terms in the documentation
  • #66
  • #75
  • remove current vignette
  • precise description of the data processing

I suggest several short articles rather than one long vignette.

Changes in the UI according to the glossary

As we rework the application, the following changes in the UI should be made:

  • default values should be Daltons, with fractional as an option
  • labels and descriptions should be changed accordingly
  • helpers should be updated, with links added to the improved documentation
  • ...

All of these changes should also be included in the report!

Thank you and comment

First of all, thank you very much for your package. It has been really cool to see an R package for HDX data.
I have just started using it, so I would like to thank you all.

I have one or two comments.
Disclaimer: I am not a programmer. I just know some R and do HDX, and I was using your software to plot some HDX data. Since I like it, I wanted to share these ideas with you (I guess you probably already thought of them, but just in case).

In order to directly process the kinetics for all the peptides, I wrote this:

library(HaDeX)
library(dplyr)
library(purrr)
library(tidyr)

dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))

# check the states and proteins in the experiment - this is only for me, to list the different states and proteins
States <- unique(dat[["State"]])
Proteins <- unique(dat[["Protein"]])
Exposure <- unique(dat[["Exposure"]])

# this function calculates the kinetics for a list of peptides
kin_state <- function(state, protein, time_in, time_out, start, end, sequence, data) {

  # states in which this peptide was measured
  peptide_state <- data %>%
    group_by(Sequence) %>%
    filter(Sequence == sequence) %>%
    group_by(State) %>%
    summarise() %>%
    as_vector()

  if (!state %in% peptide_state) {
    # this filters out the peptides that are not present in the indicated state
    print(peptide_state)
    print(paste(as.character(sequence), "does not belong to", as.character(state)))
    kin <- NA
  } else {
    # calculate the kinetics for all the peptides for a given state in the table
    kin <- calculate_kinetics(data,
                              protein = protein,
                              state = state,
                              sequence = sequence,
                              start = start,
                              end = end,
                              time_in = time_in,
                              time_out = time_out)
  }

  return(kin)
}

so that you can then have a tidy data frame as follows:

kin_data <- dat %>%
  filter(!Sequence == "") %>%
  mutate("Seq_peptide" = Sequence) %>%
  group_by(Sequence) %>%
  nest() %>%
  mutate("kin_State1" = map(.x = data,
                            .f = ~kin_state(state = States[1],
                                            protein = Proteins[1],
                                            time_in = 0,
                                            time_out = 1500,
                                            start = .x$Start[1],
                                            end = .x$End[1],
                                            sequence = .x$Seq_peptide[1],
                                            data = dat)),
         "kin_State2" = map(.x = data,
                            .f = ~kin_state(state = States[2],
                                            protein = Proteins[1],
                                            time_in = 0,
                                            time_out = 1500,
                                            start = .x$Start[1],
                                            end = .x$End[1],
                                            sequence = .x$Seq_peptide[1],
                                            data = dat)))

And now, this is the main reason why I wrote. Regarding plot_kinetics, I think it would be a good idea to have color/fill = State as well (I did it on my PC), but it would be useful to have it out of the box, so that you can easily do things like:

bind_rows(kin_data$kin_State1[1:8], kin_data$kin_State2[1:8]) %>%
  plot_kinetics_State(theoretical = FALSE,
                      relative = FALSE) +
  facet_grid(~ Sequence)

[attached plot: 210311_example_kin_state]

And then you have many peptides in panels in one plot, with the same colors for easy comparison.

Thank you very much again.

Best,
Alonso

Download deuterium uptake curves

There should be a possibility to download the deuterium uptake curves of all the peptides (one uptake curve per peptide, with the biological states), as they are usually included in the supplement of a publication.

There should be two options: download a zipped folder with all the plots separately, or one file with all the plots arranged in a grid. Each plot should have the peptide sequence and its position in the protein sequence in the title.

This is a request from MD.
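A sketch of the zipped-folder variant; plot_uptake_curve() is a hypothetical helper returning a ggplot object for a single peptide, and the output paths are placeholders:

library(ggplot2)

export_uptake_curves <- function(dat, out_zip = "uptake_curves.zip") {
  out_dir <- tempfile("uptake_curves_")
  dir.create(out_dir)
  for (seq in unique(dat[["Sequence"]])) {
    # hypothetical helper; the title should carry the sequence and its position
    p <- plot_uptake_curve(dat, sequence = seq)
    ggsave(file.path(out_dir, paste0(seq, ".png")), plot = p,
           width = 6, height = 4)
  }
  utils::zip(out_zip, files = list.files(out_dir, full.names = TRUE), flags = "-j")
  out_zip
}

The grid variant could instead combine the same plots with facet_wrap(~ Sequence) or a grid-arranging package and save a single file.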

Supplement: deuteration

The supplement says:

After obtaining the mass of the peptide, we can compute the deuteration level depending on the chosen maximum deuteration level. The maximum deuteration can also be computed in two different ways: either as theoretical (where the maximum deuteration depends on the theoretical deuteration levels) or experimental (where the maximum deuteration is assumed to be equal to the deuteration measured at the last time point).
Experimental deuteration level
The experimental deuteration level is computed as the deuteration level of the peptide from a protein in a specific state after incubation time 𝑡, compared to the deuteration level measured at the start of the incubation (𝑡0). It yields a value for the chosen state and chosen time 𝑡.

It would be nice to have a formula explaining how exactly we go from average mass to deuteration, just for the sake of clarity.
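For the discussion, a sketch of the usual convention (not necessarily the exact formulas used in HaDeX): m_t is the average peptide mass measured after incubation time t, m_0 the mass at the start of the incubation, and m_100 the mass at the chosen maximum (theoretical or the last time point).

# sketch of the usual convention, not necessarily HaDeX's exact formula
deut_uptake      <- function(m_t, m_0) m_t - m_0                       # in Da
frac_deut_uptake <- function(m_t, m_0, m_100) (m_t - m_0) / (m_100 - m_0)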

reconstruct_sequence improvement

Add a parameter "length" to the reconstruct_sequence function to manually correct the sequence length read from the experimental data. This functionality is already implemented in the app.
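A sketch of how the length parameter could work (the default is guessed from the data and can be overridden by the user; the body is illustrative, not the actual implementation):

reconstruct_sequence_sketch <- function(dat, protein_length = max(dat[["End"]])) {
  # start from an uncovered sequence of the requested length
  residues <- rep("x", protein_length)
  # fill in residues from every peptide found in the file
  for (i in seq_len(nrow(dat))) {
    aa <- strsplit(dat[["Sequence"]][i], "")[[1]]
    residues[dat[["Start"]][i] + seq_along(aa) - 1] <- aa
  }
  paste(residues, collapse = "")
}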

Logic of the app to the package

The manipulation of the data to produce data frames for later use should be put into package functions.

The reactives that should contain a function call instead of inline data processing (see the sketch below):

  • dat
  • aminoDist_out
  • stateOverlap_data
  • stateOverlap_out - plot on stateOverlap_data
  • stateOverlapDist_data
  • stateOverlapDist - plot on stateOverlapDist_data
  • comparison_plot_data_* - these four elements can be in one function
  • comparison_plot_* - these four elements can be in one function - plot on the data above
  • woods_plot_dat
  • differential_plot_data_* - these four elements can be in one function
  • differential_plot_* - these four elements can be in one function - plot on the data above
  • kin_dat
  • kin_plot_*_data
  • kin_plot_*
  • quality_control_dat
  • quality_control_plot_data_out
  • qc_out - plot on quality_control_plot_data_out
  • summary_data

Test data should be generated for the tests - the data behind the plots in the current version (available to download from the app) should be stored in separate CSV files.
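The intended pattern, sketched for the dat reactive (everything except read_hdx, including the input ID, is illustrative):

library(shiny)
library(HaDeX)

server <- function(input, output, session) {
  # before: many lines of cleaning and aggregation written out inside the reactive
  # after: the reactive only calls a package function
  dat <- reactive({
    read_hdx(input[["data_file"]][["datapath"]])
  })
}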

Assumptions about the task:

  • Test Driven Development (checkmate advised)
  • ensure coherency of the names within the functions
  • data.table instead of dplyr
  • all the commits related to this task should be mentioned below
  • working changes on branch dev_1.5, pull request from fork
  • documentation is crucial

Renaming variables according to glossary

The original code is hard to read due to messily named variables (that's on me). Once we reach an agreement on the glossary project, the variables should be renamed for consistency.

Add tables to charts

Add tables as tabs for the charts (nested tabs). Tables should be created using a DT template.
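A minimal sketch of the nested-tab layout with a DT table; the tab labels, output IDs and the comparison_plot_data() reactive are illustrative:

library(shiny)
library(DT)

# UI: a chart tab with a nested data tab
ui_fragment <- tabPanel("Comparison plot",
                        tabsetPanel(
                          tabPanel("Plot", plotOutput("comparison_plot")),
                          tabPanel("Data", DTOutput("comparison_table"))
                        ))

# server: render the same data that feeds the plot as a DT table
server_fragment <- function(input, output, session) {
  output[["comparison_table"]] <- renderDT({
    datatable(comparison_plot_data(),  # hypothetical reactive with the plot data
              options = list(pageLength = 10))
  })
}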

Article about data visualization

An article for the documentation about our methods of data visualization.
It should include the description, construction, and examples of:

  • comparison plot
  • Woods plot
  • deuterium uptake
  • quality control
  • butterfly plot
  • butterfly differential plot
  • volcano plot

Expand vignette

Add all equations along with exemplary code and plots. The vignette should render to PDF.

Production-ready plots

The parametrization of the label size is useful and allows the production of readable descriptions.

Every text input for a label (axis/title) should be accompanied by a select input for the size of that object. The size of the legend is the same as for the x axis.

Code upgrade

  • make sure that variable names are coherent
  • remove unnecessary code
  • look for simpler solutions

Personalize plot with color and label

Additional personalization: the user can choose the colors of the plot and change the labels if needed.

This is not the default - it is enabled by an action button.

Volcano plot

There should be:

  • a function to generate volcano plot
  • a function to generate volcano plot data
  • a tab in GUI to present this plot with parameterization and tooltips
  • description in the documentation
  • helpers in the GUI
  • option in the report

Report generation

An .Rmd template for HTML reports:

  1. File name + MD5 sum
  2. Checkboxes for parts of the report.
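A sketch of the parameterized .Rmd header and the file-info chunk; the parameter names are illustrative, and the MD5 sum can come from tools::md5sum():

---
title: "HaDeX report"
output: html_document
params:
  data_file: NULL
  include_comparison_plot: TRUE
---

```{r file-info}
# 1. file name + MD5 sum of the uploaded data file
basename(params$data_file)
tools::md5sum(params$data_file)
```

```{r comparison-plot, eval = params$include_comparison_plot}
# 2. this chunk is rendered only when the corresponding checkbox is ticked
```

The app would then call rmarkdown::render() with a params list built from the checkbox inputs.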

Comparison of replicates

There should be a comparison of the data measured in repetitions of the experiment.

The comparison should be performed for a specific time point, as the measurements are repeated n times at each time point and can be treated as somewhat separate experiments.

This comparison should be implemented as a function and made available in the GUI, as it allows the user to spot differences between replicates and possibly disqualify a replicate.
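A sketch of the per-time-point comparison, assuming replicates are distinguished by the File column (as in DynamX exports) and that uptake_dat already carries a deut_uptake value per replicate; all names are illustrative:

library(dplyr)

compare_replicates <- function(uptake_dat, state, chosen_time) {
  uptake_dat %>%
    filter(State == state, Exposure == chosen_time) %>%
    group_by(Sequence, Start, End) %>%
    summarise(n_replicates = n(),
              mean_uptake  = mean(deut_uptake),
              sd_uptake    = sd(deut_uptake),
              .groups = "drop")
  # a replicate whose values sit far from the others could then be flagged
}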

Rework package functions

For consistency, a few small changes should be made:

  • create_uptake_dataset() and create_diff_uptake_dataset() functions to produce data.frames based on the supplied parameters. This is the input for all possible uptake plots and integrates the generate_*_dataset() functions.
  • generate_plot should be renamed to plot, as it is the basic functionality of the package.
  • there should be one (or maybe two, for diff) function to generate the data frame for the data tabs in the UI. It is all the same data, no matter the plot type, and an additional ID column does no harm. Then the generate_*_data() functions are not necessary.

I want to keep the generate_* functions internal and keep only the necessary ones. Some of the already existing ones should not be internal.

Update report generation

At the end of all changes, the report template should be updated to include all new features.

Reworking functions in the package to employ data.table and stringi

All of the existing functions (except for the deprecated ones) should be re-written with data.table and stringi instead of dplyr and stringr. This is also an opportunity to brush up the documentation of each function and provide complete test coverage (aim for the green).

The current summary of the situation can be found in the document.

This issue also covers updates of tests.
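For a concrete flavour of the rewrite, the same aggregation done both ways (column names taken from the example data; the actual pipelines in the package will differ):

library(HaDeX)
library(dplyr)
library(data.table)

dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))

# dplyr version
dat %>%
  group_by(Sequence, State, Exposure) %>%
  summarise(avg_center = weighted.mean(Center, Inten), .groups = "drop")

# data.table version
dat_dt <- as.data.table(dat)
dat_dt[, .(avg_center = weighted.mean(Center, Inten)),
       by = .(Sequence, State, Exposure)]

# stringr -> stringi, e.g. stringr::str_sub(x, 1, 3) becomes stringi::stri_sub(x, 1, 3)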
