NMdata

Lifecycle: experimental R-CMD-check Codecov test coverage CRAN status

A fast R package for efficient data preparation, consistency-checking and post-processing in PK/PD modeling

Pharmacometrics and PK/PD modeling offer unique information for decision-making in several steps of drug development. However, it often takes a lot of work to get there, and there are many pitfalls along the way. NMdata helps simplify this work and steer around the pitfalls, or at least make sure we didn't fall into them.

Automate bookkeeping and focus on modeling

Preparing data sets - and, if you use NONMEM, reading the results data - can be tedious, and mistakes can lead to hours of frustration. NMdata provides useful tools (including automated checks) for these repetitive tasks.

NMdata is not a silo

Any functionality in NMdata can be used independently of the rest of the package, and NMdata is not intended to force you to change any habits or preferences. Instead, NMdata tries to fit in with how you (or the colleague who worked on the project before you) do things. It likely provides helpful additions no matter what other tools you already use.

More information

The best place to browse information about the package is the package website (https://philipdelff.github.io/NMdata/). The quickest way in is the cheatsheet.


How to install

NMdata is on CRAN and MPN. To install from the package archive you are already using, do:

install.packages("NMdata")
library(NMdata)

See further below for instructions on how to install from sources other than your default archive, if need be.

Prepare, check, and export PK/PD data

On the data-generation side, functionality is provided for documenting the datasets while generating them. Check out this vignette on the topic. There are functions for automatic checks of (some) data merges, handling and counting of exclusion flags, final preparations for ensuring readability in NONMEM, and ensuring traceability of datasets back to data-generation scripts.
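As a concrete illustration of what exclusion-flag accounting looks like, here is a base-R sketch of the concept only. The column names and values are made up, and this is not NMdata's API; see the vignette for the actual functions:

```r
## Toy PK dataset. FLAG = 0 means the row is included in the analysis;
## nonzero values code exclusion reasons, with `flag` as the readable label.
dat <- data.frame(
    ID   = c(1, 1, 1, 2, 2, 2),
    TIME = c(0, 1, 2, 0, 1, 2),
    DV   = c(NA, 4.2, 3.1, NA, 0.05, 2.8),
    FLAG = c(0, 0, 0, 0, 10, 0),        # 10: below LLOQ
    flag = c("Analysis set", "Analysis set", "Analysis set",
             "Analysis set", "Below LLOQ", "Analysis set")
)

## Count rows (N) and distinct subjects (Nid) per exclusion reason
tab <- aggregate(cbind(N = ID) ~ flag, data = dat, FUN = length)
tab$Nid <- sapply(tab$flag, function(f) length(unique(dat$ID[dat$flag == f])))
tab
```

NMdata's own flag handling additionally applies the flags in a defined order and keeps the counts consistent with the final analysis set.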

Check data as read by NONMEM

The NMcheckData function runs an extensive, fully automated set of checks of the data before you run NONMEM. And did NONMEM not behave? NMcheckData can debug the data as seen by NONMEM. That's right - it has never been easier to find data bugs.
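To illustrate the kind of row-level check involved, here is a minimal base-R sketch of one of them: finding rows that collide on the columns identifying an event. This is an illustration of the idea, not NMcheckData's implementation:

```r
## Toy data with a duplicated observation record
dat <- data.frame(
    ID   = c(1, 1, 1, 2),
    TIME = c(0, 1, 1, 0),
    EVID = c(1, 0, 0, 1),
    CMT  = c(1, 2, 2, 1)
)

## Flag every row involved in a duplication on the event-identifying columns
key  <- c("ID", "TIME", "EVID", "CMT")
dups <- duplicated(dat[key]) | duplicated(dat[key], fromLast = TRUE)
dat[dups, ]   # rows 2 and 3 collide on ID/TIME/EVID/CMT
```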

Automated and general reader of NONMEM results data

Reading the results data from NONMEM can require a few manual steps, especially because every modeler seems to do things a little differently. NMscanData can return all data output ($TABLE) from NONMEM combined and, if wanted, enriched with additional columns and rows from the input data. It's as simple as:

res <- NMscanData("xgxr014.lst",recover.rows=TRUE)
#> Model:  xgxr014 
#> 
#> Used tables, contents shown as used/total:
#>               file      rows columns     IDs
#>    xgxr014_res.txt   905/905   12/12 150/150
#>  xgxr2.rds (input) 1502/1502   22/24 150/150
#>           (result)      1502    34+2     150
#> 
#> Input and output data merged by: ROW 
#> 
#> Distribution of rows on event types in returned data:
#>  EVID CMT input-only output result
#>     0   1          2      0      2
#>     0   2        595    755   1350
#>     1   1          0    150    150
#>   All All        597    905   1502
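Conceptually, when a unique row identifier is available, the merge and row recovery above can be sketched in base R like this (hypothetical toy data; NMscanData does this, plus many checks, automatically):

```r
## Input data has a unique row identifier ROW; NONMEM only outputs a subset
## of rows (e.g. it ignored a sample below LLOQ on ROW 2).
input  <- data.frame(ROW = 1:4, ID = c(1, 1, 2, 2),
                     BLQ = c(0, 1, 0, 0), trtact = "10 mg")
output <- data.frame(ROW = c(1, 3, 4), PRED = c(2.1, 1.8, 0.9))

## Join output back to input columns; all.x=TRUE recovers input-only rows,
## corresponding to recover.rows=TRUE in NMscanData.
res <- merge(input, output, by = "ROW", all.x = TRUE)
res$nmout <- !is.na(res$PRED)   # TRUE for rows NONMEM actually output
res
```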

And we are ready to plot (a subset of) the result:

res.plot <- subset(res,ID%in%c(113,135)&EVID==0)
library(ggplot2)
ggplot(res.plot,aes(TIME))+
    geom_point(aes(y=DV,colour=flag))+
    geom_line(aes(y=PRED))+
    facet_wrap(~trtact)+
    labs(y="Concentration (unit)",colour="Observations",
         subtitle="NOTICE:\nObservations are coloured by a character column fetched from input data.\nSamples below LLOQ are rows added from input data.\nPlots are correctly sorted because factor levels of dose are preserved from input data.")+
    theme_bw()+theme(legend.position="bottom")
#> Warning: Removed 2 rows containing missing values (`geom_line()`).

Want a tibble instead? Easy:

res.tibble <- NMscanData("xgxr001.lst",as.fun=tibble::as_tibble,quiet=TRUE)

Or a data.table? This time, we’ll configure NMdata to return data.tables by default:

NMdataConf(as.fun="data.table")
res.dt <- NMscanData("xgxr001.lst",quiet=TRUE)

NMscanData is very general and should work with all kinds of models, other software, and configurations. Take a look at this vignette for more info on the NONMEM data reader. There you will also learn how to access the metadata that lets you trace every step taken in combining the data, as well as the many checks performed along the way.

Meta analysis made really easy

Since NMscanData is so general and will figure out where to find input and output data on its own, let’s use the NMscanMultiple wrapper to read multiple models and compare their predictions.

res <- NMscanMultiple(dir=system.file("examples/nonmem", package="NMdata"),
                      file.pattern="xgxr.*\\.lst",as.fun="data.table",quiet=TRUE)
#> Input and output data were searched for candidate unique row identifiers. None
#> found. To skip this check, please use merge.by.row=TRUE or merge.by.row=FALSE.
#> 
#> 
#> Overview of model scanning results:
#>                                                                            lst
#> 1: /tmp/Rtmp82nZom/temp_libpath1370126d22f6/NMdata/examples/nonmem/xgxr001.lst
#> 2: /tmp/Rtmp82nZom/temp_libpath1370126d22f6/NMdata/examples/nonmem/xgxr002.lst
#> 3: /tmp/Rtmp82nZom/temp_libpath1370126d22f6/NMdata/examples/nonmem/xgxr003.lst
#> 4: /tmp/Rtmp82nZom/temp_libpath1370126d22f6/NMdata/examples/nonmem/xgxr014.lst
#> 5: /tmp/Rtmp82nZom/temp_libpath1370126d22f6/NMdata/examples/nonmem/xgxr018.lst
#>    nrows ncols success warning
#> 1:   905    40    TRUE   FALSE
#> 2:   905    34    TRUE   FALSE
#> 3:   905    34    TRUE   FALSE
#> 4:   905    36    TRUE   FALSE
#> 5:   905    33    TRUE   FALSE
gmean <- function(x)exp(mean(log(x)))
res.mean <- res[,.(gmeanPRED=gmean(PRED)),by=.(model,NOMTIME)]
obs.all <- unique(res[,.(ID,NOMTIME,TIME,DV)])
ggplot(res.mean,aes(NOMTIME,gmeanPRED,colour=model))+geom_line()+
    geom_point(aes(TIME,DV),data=obs.all,inherit.aes=FALSE)+
    scale_y_log10()+
    labs(x="Time",y="Concentration",subtitle="Comparison of population predictions")
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Transformation introduced infinite values in continuous y-axis

Get the most recent version

If your archive has not been updated since July 2021, you may not find NMdata if you try to install with install.packages (option 1). In that case, you have two other options. You can explicitly select CRAN for the installation. Or, if you want a version that has not yet reached CRAN, installing from GitHub is easy too.

## Option 2: Install explicitly from CRAN
install.packages("NMdata",repos="https://cloud.r-project.org")
library(NMdata)

## Option 3: Install from github
library(remotes)
install_github("philipdelff/NMdata")
library(NMdata)

If you use the GitHub version, you may want to see the FAQ for how to install specific releases from GitHub (ensuring reproducibility).

Questions?

Check the FAQ, or ask on GitHub.

Issues?

The best way to report a bug or request a feature is on GitHub.

Code of Conduct

Please note that the NMdata project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.


NMdata's Issues

more precise duplicate checking

Hello,

My name is Eric Anderson and I work at Metrum Research Group as a Data Scientist. For starters, I just want to mention that I find this package very interesting and useful.

The function I have been exploring the most is NMdata::NMcheckData().

I have a potential feature request regarding the duplicate checking. It seems like right now the function checks across these columns: ID, CMT, EVID, TIME.

Sometimes I work with data sets that have additional columns that define unique rows (e.g. DVID, DRUGID, etc). Have you considered adding an argument to the function that allows for this?

Feature Request: NMscanData() to give more info on error

I just got the error "After applying filters to input data, the resulting number of rows differ from the number of rows in output data"

It would help track down the source of the error if the number of rows in the input and output were reported. (I realize that the NMdata-preferred solution is to use a ROWID column. I'm trying to work within someone else's data management for the moment.)

My preferred error would look something like the following:

After applying filters to input data, the resulting number of rows differ (input = 123 rows) from the number of rows in output data (output = 456 rows)
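A minimal sketch of what such a message could look like (hypothetical function and argument names; not NMdata code):

```r
## Toy check that reports both row counts in the error message,
## as requested in the issue above.
check_rows <- function(n.input, n.output) {
    if (n.input != n.output) {
        stop(sprintf(
            paste("After applying filters to input data, the resulting number",
                  "of rows differs (input = %d rows) from the number of rows",
                  "in output data (output = %d rows)"),
            n.input, n.output))
    }
    invisible(TRUE)
}

try(check_rows(123, 456))   # demonstrates the informative error
```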

Feature Request: Add `col.dv`, `col.mdv`, and `col.amt` arguments to `NMcheckData()`

This is somewhat related to #30 (in that it's about column naming in NMcheckData()).

I like how NMcheckData() does not require the default column names as stated in #30, I often use names that are closer to the SDTM and ADaM source data names for easier tracking back to the origin.

With that, it appears that NMcheckData() has column renaming for many but not all of the columns that are checked. The columns for which I have not found renaming arguments so far are col.dv, col.mdv, and col.amt. col.evid also doesn't appear to exist, but that doesn't seem like a name I'd use something different for. (So, col.evid may be of interest for completeness, but it's not a real need for me.)

A simple workaround that I'm doing right now is simply renaming the columns as they go into NMcheckData() which is not a significant hardship.

`col.id` not used with `NMcheckData()`

As you can tell, I'm using NMdata on some real projects now. So, I'm having lots of good thoughts about it! :)

I often use nonstandard names for NONMEM column names because I prefer keeping them closer to the SDTM and ADaM-like names in source data. With that, I found that the col.id argument does not appear to be used with NMcheckData() based on the fact that it says the ID column is not found (when I think it should not be expected):

library(NMdata)
#> Warning: package 'NMdata' was built under R version 4.1.3
#> Welcome to NMdata. Best place to browse NMdata documentation is
#> https://philipdelff.github.io/NMdata/
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.1.3
#> Warning: package 'tibble' was built under R version 4.1.3
#> Warning: package 'dplyr' was built under R version 4.1.3
dat <- readRDS(system.file("examples/data/xgxr2.rds", package="NMdata"))

dat2 <-
  dat %>%
  rename(
    USUBJIDN=ID
  )
NMcheckData(dat2, col.id="USUBJIDN")
#>  column              check  N Nid
#>    EVID Subject has no obs 19   0
#>      ID   Column not found  1   0
#>     MDV   Column not found  1   0

Created on 2022-05-17 by the reprex package (v2.0.1)

new test failure using data.table github master

hi @philipdelff
When I run R CMD check on NMdata with the new version of data.table from github master, I get the following new error which is not present when using data.table release version from CRAN:

* checking tests ...
  Running 'spelling.R'
  Running 'testthat.R'
 ERROR
Running the tests in 'tests/testthat.R' failed.
Last 13 lines of output:
   2.   \-testthat::expect_known_value(..., update = update)
  -- Failure ('test_NMscanInput.R:152'): CYCLE=DROP ------------------------------
  `nm1` has changed from known value recorded in 'testReference/NMscanInput_7.rds'.
  Component "input.colnames": Attributes: < Names: 1 string mismatch >
  Component "input.colnames": Attributes: < Length mismatch: comparison on first 2 components >
  Component "input.colnames": Attributes: < Component 2: Attributes: < target is NULL, current is list > >
  Component "input.colnames": Attributes: < Component 2: Numeric: lengths (24, 0) differ >
  Backtrace:
      x
   1. \-testthat::expect_equal_to_reference(nm1, fileRef, version = 2) at test_NMscanInput.R:152:4
   2.   \-testthat::expect_known_value(..., update = update)
  
  [ FAIL 6 | WARN 0 | SKIP 0 | PASS 210 ]
  Error: Test failures
  Execution halted

Looking at your tests, it seems that you expect that some computed value from your code is equal to the value stored in a RDS file. The input.colnames table has a new attribute named "index" which is causing the error, since the stored RDS value has no such attribute.
Can you please update your tests and/or code so that this ERROR goes away, and then submit a new version to CRAN?
This will help facilitate releasing a new version of data.table to CRAN. (data.table devs must make sure all reverse dependencies do not break, before submitting a new version to CRAN)

Cannot use NMscanData with `use.input=FALSE`

I'm trying to make a reprex for this, but the quick issue is:

When I use the following to load several data files, I get the following error:

NMdata::NMscanData("NONMEM/PK_akrv2/nash18.lst", use.input=FALSE)
# Warning in NMreadTab(meta[I, file], quiet = TRUE, tab.count = tab.count,  :
#   Duplicated column names found: DV. Cleaning.
# Warning in NMreadTab(meta[I, file], quiet = TRUE, tab.count = tab.count,  :
#   Duplicated column names found: ET_QCENTP. Cleaning.
# Error in file.exists(file.mod) : invalid 'file' argument

When I traced the error a bit, it appears that file.mod is NULL. There are several places within NMscanData() where file.exists(file.mod) is called, and I don't immediately see which is causing the problem.

Feature request: NMcheckData look for periods

Most of the NONMEM data that I work with codes missing data as a period ("."). When I make an error with how I'm loading the data in R (mainly, when I'm not thinking about the loading carefully and just use read.csv() without modification), I will have those periods in the data.

I think it would be a useful feature to have NMcheckData() check and see if cells in the data have periods and suggest that maybe those should be converted to NA with code like type_convert(data, na=c(".", "NA")).
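A base-R sketch of the suggested check (an illustration of the idea only, not NMcheckData code):

```r
## After a naive read.csv(), NONMEM-style missing values may survive as "."
dat <- data.frame(ID  = c("1", "2"),
                  DV  = c("4.2", "."),
                  AMT = c(".", "100"),
                  stringsAsFactors = FALSE)

## Report which columns contain a bare period - candidates for NA conversion
cols.with.dot <- names(dat)[sapply(dat, function(x) any(x == "."))]
cols.with.dot
```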

Extract subsets of data related to NMcheckData findings

After running NMcheckData, it would be convenient to have a way to extract subsets of data for plotting or "data scrolling". I am starting a discussion on what such a function could look like. For a "row-level" finding, one may want to extract all data related to the subjects affected, and plot involved columns.

Inputs appreciated!
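One possible shape for such a helper, sketched in base R with hypothetical toy data: given the subjects flagged by a row-level finding, pull all their rows for plotting or scrolling.

```r
## Toy data and a set of subject IDs flagged by some finding
dat <- data.frame(ID = c(1, 1, 2, 3), TIME = c(0, 1, 0, 0), DV = c(1, 2, 3, 4))
ids.affected <- c(1, 3)

## Extract all rows belonging to the affected subjects
dat.affected <- subset(dat, ID %in% ids.affected)
dat.affected
```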
