adibender / pammtools
Piece-wise exponential Additive Mixed Modeling tools
Home Page: https://adibender.github.io/pammtools/
License: Other
get_cumu_coef doesn't work for cox.aalen models.

The way include_last is currently implemented, the argument doesn't make sense, as it is only used when cut is unspecified, which leads to the last interval cut point being set to the last censored event time. The last interval will then by default be empty (without events) and could potentially disrupt estimation.
add_cumu_hazard: ..., otherwise keep columns. The same holds for add_surv_prob and cumu_hazard columns.
... should not have to be a quosure. Users should be able to specify a character string or a bare variable name.
Use .data pronouns instead of utils::globalVariables.
aes_string where possible.
Use ~mean(., na.rm = TRUE) instead of ~funs(mean(., na.rm = TRUE)).
It would be nice to have a suite of functions that facilitate visualization.
For simulated data, functions could directly access the stored true effects (if these are stored separately in the data frame), or access the simulation formula stored in the attributes of the simulated data and apply it to new data (possibly created by make_newdata).
Models are usually estimated with baseline hazards evaluated at interval end points,
thus predictions and fits also return the hazard at interval endpoints.
For plotting purposes, however, the hazard and cumulative hazard should start at (x = 0, y = 0).
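A minimal base-R sketch of one fix: prepend a (0, 0) row to the prediction data before plotting. The column names tend and cumu_hazard are illustrative assumptions, not pammtools' actual output format.

```r
# Hypothetical prediction output: cumulative hazard at interval endpoints.
hazard_df <- data.frame(tend = c(1, 2, 4), cumu_hazard = c(0.2, 0.5, 0.9))

# Prepend a (0, 0) row so the plotted curve starts at the origin
# instead of at the first interval's endpoint.
plot_df <- rbind(data.frame(tend = 0, cumu_hazard = 0), hazard_df)
plot_df$tend  # 0 1 2 4
```

With grouped data (multiple subjects or slices) the same row would need to be prepended per group.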
@fabian-s I think it's time to think more generally about how we call the functions for data transformation.
Right now we use
Surv(time, status) ~ ... + ... | cumulative(...),
although the Surv(time, status) part is only a mirage, as we don't really do anything Surv-specific with it except extract the event time and status variables, while the usual functionality of Surv() is not supported, e.g.
Surv(time, status == 2) ~ ... (see also #31) or
Surv(time1, time2, status) ~ ..., e.g. for left-truncated data,
etc.
This will also be relevant when/if we extend the functionality to competing risks/multi-state models, in which case we need to support calls like
as_ped(Surv(time, event1) | Surv(time, event2) ~ lin_pred_event1 | lin_pred_event2)
This could be done nicely using the Formula functionality, but we already use | on the RHS to differentiate between cumulative effects and "normal" effects (~ ... + ... | cumulative()).
The latter may not be necessary, as we can simply extract cumulative terms via the specials functionality?
gg_slice / gg_cumu_eff:
- the length.out argument cannot be set
- lapply to purrr::map
- tidyeval
- a ci logical argument that controls whether CIs should be returned

@adibender @hoarzpassey:
Michel (the checkmate author) pointed out to me that the tutorial paper we link in the README is behind a paywall -- what do you think about also uploading the preprint to arXiv and linking to that instead...?
Can use \Sexpr[results=rd, stage=render]{rlang:::lifecycle("maturing")} in the description field of the roxygen documentation of each function to produce the respective badges in the man pages.
See https://www.tidyverse.org/lifecycle/ for the possibilities.
Inconsistent naming, e.g. ci_lower in add_hazard vs. low in tidy_smooth.
as_ped(.... cumulative(... works for simulated data, because sim_pexp returns the complete exposure history regardless of the simulated event time.

NA column names + entries.

Replicate the default visualization of Aalen models, e.g., plot.aalen from the timereg package.
Currently, vignettes, examples etc. use different functions to create PED data.
Extend as_ped to be able to create PED data with concurrent TDCs, currently implemented in split_tdc (see https://adibender.github.io/pammtools/articles/tdcovar.html#analysis-of-the-pbc-data). Could be implemented by introducing a new special concurrent, i.e.
list(pbc, pbcseq) %>% as_ped(Surv(time, status) ~ . | concurrent(bili, protime, te_var = "day"), ...)
- [ ] Adjust all examples
- [ ] Adjust vignettes
- [ ] In as_ped, need better handling of cut if unspecified. Currently the cut points selected for TDC data are not the same as those selected by split_data.
- [ ] Maybe have as_ped return nested dfs by default? -> better printing, etc.
Rename func to cumulative.
Dependencies that could be reduced:
- prodlim: only needed by pec, which is currently not used in pammtools (also pec)
- survival: only needed for survSplit (which could be done manually), to load data in examples (which could be replaced by internal data) and in vignettes, which won't be submitted to CRAN anyway (only the homepage via pkgdown)
- msm: only needed to simulate data from the PEXP; however, I'd prefer to make a separate package for survival data simulation
- modelr: not developed anymore and only needed for seq_range (seq_range could be copied into the package)
- RColorBrewer, scam, coxme, knitr, rmarkdown: only needed for vignettes (which will not be submitted to CRAN)

Usually, PED data is evaluated at tend or intmid, thus when plotting with ggplot2's geom_line, for example, the line will start at tend of the first interval; however, it should start at t = 0 (geom_surv, geom_hazard, a cumulative geom_hazard?).
Is geom_step(..., direction = "hv") sufficient? Or should it start with a vertical line from (t = 0, h = 0) to (t = 0, h = h(t_1))?

Rename te to avoid confusion with mgcv::te.
E.g., it looks like pkgdown doesn't print tibbles properly. Do I need to import something or is it a pkgdown bug?
@fabian-s Do you know what's going on?
The warning in add_hazard when the provided time variable doesn't match the times used for fitting is too explicit. It clutters the output and omits the informative text at the end of the warning.
Put the values at the end of the warning and only show the first 10 values.
Currently,
get_func(data, ~ func(t - te, x) + func(t - te, z))
creates matrices Latency.x, LL.x, etc., which potentially uses up a lot of memory.
If ll_fun is equal across all func terms, only one representation of each of T, TE, LL and Latency should be created.
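To illustrate why sharing is possible: when ll_fun is the same for every func term, the T, TE, Latency and LL matrices only depend on the follow-up and exposure time grids, not on the covariate. A base-R sketch with illustrative grids (the matrix names mirror the issue, not pammtools internals):

```r
t_grid  <- c(1, 2, 3, 4)  # follow-up times (one per row)
te_grid <- c(0, 1, 2)     # exposure times (one per column)

# Each row repeats the follow-up time; each column repeats the exposure time.
T_mat  <- matrix(t_grid,  nrow = length(t_grid), ncol = length(te_grid))
TE_mat <- matrix(te_grid, nrow = length(t_grid), ncol = length(te_grid), byrow = TRUE)

Latency_mat <- T_mat - TE_mat      # t - te
LL_mat      <- (TE_mat <= T_mat) * 1  # lag/lead window indicator: te <= t
```

All func terms with the same ll_fun could then reference this single set of matrices instead of per-covariate copies like Latency.x, Latency.z.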
tbl_df is deprecated, replace with tibble::as_tibble.

The ci argument cannot be changed b/c it is hard-coded.

Make cum* and surv* names consistent by adding a separating _.
There are mainly 3 different types of data transformation required to fit PA(M)Ms:
Ideally, after data transformation the user should be able to fit the model directly.
The first two are basically solved and described in the time-dependent covariates vignette.
For the third, we need a general function that should be able to create data for different model types/types of cumulative effects, e.g.
... + s(t - te, by = z * sex * LL) + ...
For the latter, the sex covariate would also need to be transformed into a matrix.
Thus, a general trafo function would have to have a formula interface or similar, similar to the mgcv formula, e.g.
data %>% as_functional("x1", x_eff(t - te, x1, by = sex, ll_fun = function(t, te) {te <= t}))
which would create the necessary columns.
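The required transformation of a scalar covariate like sex into a matrix could look like the following base-R sketch (scalar_to_matrix is a hypothetical helper, not a pammtools function): each subject's value is repeated across all exposure time points so it conforms with the other matrix columns.

```r
# Expand a per-subject scalar to a (subjects x exposure-times) matrix.
scalar_to_matrix <- function(x, n_exposure) {
  matrix(rep(x, times = n_exposure), nrow = length(x), ncol = n_exposure)
}

sex     <- c(0, 1, 1)                      # one value per subject
sex_mat <- scalar_to_matrix(sex, n_exposure = 4)
dim(sex_mat)  # 3 x 4: rows = subjects, cols = exposure time points
```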
Currently, as_ped is the main function for data transformation, but it has only two documented arguments:
data
formula
All other arguments are passed to split_data via ellipsis, and split_data is an external function.
Currently the ll_fun argument to concurrent is ignored. It should be possible to specify lag and lead times for concurrent effects, similarly to cumulative.
Currently:
sample_info doesn't work on PED data with matrix columns.
make_newdata also doesn't work and must be redefined to work usefully for data with list/matrix columns.
See also #29, which would benefit from a useful make_newdata implementation for such data.
In func, use the actual variable names for t and te.
Benefits:
- avoids clashes with base::t and mgcv::te
Disadvantage:
- potentially long names (e.g. PatientDied)

Currently, calls like
data(TRACE, package = "timereg")
as_ped(Surv(time, status == 9) ~ ., data = TRACE)
do not work.
Currently, one cannot specify func(..., suffix = ""), as this is the default and will be ignored, with the te_var argument used as suffix instead.
- [ ] use suffix = NULL as default
Also, maybe check the number of func components in the formula, such that the suffix is only appended if suffix is specified or when there is more than one func component in the formula.
I've recently discovered and fixed a bug w.r.t. missing checks in lists in mllg/checkmate#146. The bugfix now tests lists for elements being identical to NULL which comes closest to a "missing" in lists, but this unfortunately breaks a unit test in your package. Before the fix, the test was defunct and did not trigger any check at all.
Could you please check your assertions accordingly with the devel release of checkmate? I'll prepare a new release of checkmate and want to upload soon.
This is the relevant part of the check log:
── 1. Failure: Formula special 'func' works as expected (@test-specials.R#6) ──
Check on cumu1 isn't true.
Contains missing values (element 4)
── 2. Failure: Formula special 'concurrent' works as expected (@test-specials.R#
Check on ccr1 isn't true.
Contains missing values (element 3)
══ testthat results ═══════════════════════════════════════════════════════════
OK: 279 SKIPPED: 0 FAILED: 2
1. Failure: Formula special 'func' works as expected (@test-specials.R#6)
2. Failure: Formula special 'concurrent' works as expected (@test-specials.R#18)
Sorry for the inconvenience.
For example, make_newdata.ped returns the specified newdata with the first interval.
One would expect (and it's probably more useful) a data set with all originally specified intervals as well as the newly specified data, such that predict functions (or the add_ family of functions) can be applied directly.
sample_info.ped could be an exception, as it should return info about the data sample, not the PED data, which has a different distribution when applying mean, for example.
For example, when
mod <- gam(... + te(karno, tend), ...)
then
mod %>% slice_1d(time, karno = c(20, 50, 90))
would create a tibble with predicted values over a grid of time
and specified karno
values.
E.g., tidy_smooth, gg_smooth etc.

Extend functions that work on gam objects to also work on bam objects.

@fabian-s In many functions we use dplyr functions, which however are not visible or apparent to the user. Nevertheless, messages from left_join are returned from a call to make_newdata etc.
Do you know a way to suppress messages from other function calls within a function?
In some cases, on the other hand, such messages could be helpful. How do we decide, and is it possible to suppress individual evaluations?
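One base-R option is suppressMessages() around the individual internal call, which silences only message() conditions from that call while leaving warnings and errors untouched. A minimal sketch (noisy_helper stands in for an internal left_join call; it is not a real pammtools/dplyr function):

```r
noisy_helper <- function(x) {
  message("Joining, by = 'id'")  # stand-in for dplyr's left_join() message
  x
}

quiet_wrapper <- function(x) {
  # suppressMessages() swallows message() conditions emitted inside the
  # wrapped call only; the return value passes through unchanged.
  suppressMessages(noisy_helper(x))
}

quiet_wrapper(42)  # returns 42, prints no message
```

Because suppressMessages() wraps one expression at a time, it also allows suppressing individual evaluations rather than everything inside a function.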
Currently, when constructing matrices for cumulative effects, integration weights are not calculated. Thus estimation will only be correct for exposure time grids with unit spacing between exposure time points.
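The missing weights can be sketched in base R, assuming a simple rectangle rule where each weight is the width of the corresponding exposure interval (for a unit-spaced grid all widths are 1, which is why estimation currently works only in that case):

```r
# Non-equidistant exposure time grid.
exposure_grid <- c(0, 1, 3, 6, 10)

# Rectangle-rule integration weights = interval widths.
int_weights <- diff(exposure_grid)  # 1 2 3 4

# The cumulative effect would then be a weighted sum over exposure times,
# i.e. the weights multiply the TDC matrix entries before summation.
```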
Currently, when using cumulative, the constructed time matrix will have the name of the original time variable in Surv(time, status) ...
This may be misleading, as what's actually constructed is a matrix column with entries t_j.
This will make everything much smoother + avoids clogging commits with homepage updates:
r-lib/pkgdown#428
see also here: https://www.r-bloggers.com/continuous-deployment-of-package-documentation-with-pkgdown-and-travis-ci/
Would this also enable different pages for different branches?
ped %>% make_newdata(unique(sex), seq_range(age, 20), karno = c(20, 50, 60))
would create a new data set with all values of the sex variable, an age grid of length 20, and karno values of 20, 50, and 60.
This can be implemented relatively easily for the simple case using purrr::cross etc., but is more complicated for data with cumulative effects.
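For the simple case, the intended semantics amount to a full cross product of the specified values, which base R's expand.grid() already provides (variable values below are illustrative):

```r
sex   <- c("m", "f")
age   <- seq(20, 80, length.out = 20)
karno <- c(20, 50, 60)

# Every combination of the specified values, one row each.
nd <- expand.grid(sex = sex, age = age, karno = karno)
nrow(nd)  # 2 * 20 * 3 = 120
```

The hard part is what make_newdata should do with list/matrix columns for cumulative effects, where crossing rows is no longer well-defined.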
@fabian-s I heard that you're now a hex sticker expert (tidyfun) :-D
Ideas?