d-score / dscore

D-score for Child Development

Home Page: https://d-score.org/dscore/

License: GNU Affero General Public License v3.0

R 85.54% C++ 5.94% TeX 5.70% CSS 2.82%

child-development d-score daz developmental-trajectories growth-charts rasch-model

dscore's Issues

dscore estimation under different transformations

I would expect ability estimation to be insensitive to a linear transformation of the ability scale, but this turns out to hold only approximately. The following script compares two scales (one D-score and one logit) that produce (slightly) different estimates.

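library(dscore)
library(testthat)

# linear transform between scales: d = transform[1] + transform[2] * logit
transform <- c(41.10, 2.23)

# test data: 10 children of the same age, 3 items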
data <- data.frame(
  age = rep(round(21/365.25, 4), 10),
  GSFIXEYE = c(NA, NA, 0, 0, 0, 1, 0, 1, 1, 1),
  GSRSPCH =  c(NA, NA, 0, 0, 1, 0, 1, 0, 1, 1),
  GSMLEG =   c(NA,  0, 0, 1, 0, 0, 1, 1, 0, 1))
items <- c("GSFIXEYE", "GSRSPCH", "GSMLEG")

keyd <- data.frame(item = items,
                   delta = gettau(items = items),
                   stringsAsFactors = FALSE)

zd <- ability(data, items = items, dec = 4, metric = "dscore", 
              key = keyd)$b

qpl <- ((-10:100) - transform[1]) / transform[2]
keyl <- data.frame(item = items,
                   delta = gettau(items = items),
                   stringsAsFactors = FALSE)
keyl$delta <- (keyl$delta - transform[1]) / transform[2]
zl <- ability(data, items = items, dec = 4, transform = transform, 
              qp = qpl, metric = "logit", key = keyl)$b

test_that("logit and dscore are identical", {
  expect_identical(zl, (zd - transform[1])/transform[2])
})

When tracking down differences between the two methods, I found that taking out the division by (qp[2] - qp[1]) in normalize() produces the same prior on both scales. After that, the next divergence appears in cpc <- t(exp(outer(0:m, qp) + c(0, -cumsum(delta)))) in posterior(). This suggests that the exponential transform here introduces instability. I have no time to dive in further and smooth out the differences, and have put (qp[2] - qp[1]) back into normalize(), but evidently this is somewhat fishy.
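If the exponential is indeed the culprit, one standard guard is to subtract the row-wise maximum before exponentiating. The sketch below is not the package's posterior() code: pcm_probs() is a hypothetical helper that rebuilds the same outer(0:m, qp) expression and normalises each row to category probabilities. Whether this removes the divergence reported above has not been checked.

```r
# Hypothetical helper: category probabilities on a quadrature grid,
# computed with the "subtract the maximum" trick to avoid overflow in exp().
pcm_probs <- function(qp, delta) {
  m <- length(delta)
  # log numerators: rows = quadrature points, columns = categories 0..m
  logits <- t(outer(0:m, qp) + c(0, -cumsum(delta)))
  # subtracting the row maximum leaves the normalised probabilities unchanged
  logits <- logits - apply(logits, 1, max)
  num <- exp(logits)
  num / rowSums(num)
}

# For a dichotomous item (m = 1) the second column is the Rasch probability
qp <- c(-5, 0, 5, 800)          # 800 would overflow a naive exp()
p <- pcm_probs(qp, delta = 0.5)
all.equal(p[, 2], plogis(qp - 0.5))
```

The values of qp and delta are made up for illustration only.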

Some options to pursue:

Perhaps we can bypass quadrature methods altogether, and use a normal approximation everywhere. The prior is normal anyway, and I remember seeing that a normal prior in combination with a logistic model produces a normal posterior (Albert and Chib? Gelman's BDA book?). If so, this would considerably speed up and simplify calculations.
Study what happens in ltm, sirt or similar packages that can estimate EAP.
Choose one scale, and derive the other by a linear transform. I would then choose the D-score scale, and derive the logit form from that.
Or just live with the difference? It's not big, and in practice it may not matter.
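For reference, the expected invariance can be reproduced in a few lines of base R. This is a minimal sketch, not the dscore implementation: eap_rasch() is a hypothetical helper, and the item difficulties, responses and prior are made up. It assumes that the grid, the prior and the difficulties are all transformed consistently, and that the prior weights are normalised by their sum, so that the grid spacing cancels.

```r
# Hypothetical helper: EAP for a Rasch model on a quadrature grid.
# 'scale' is the number of scale units per logit (1 on the logit scale).
eap_rasch <- function(scores, delta, qp, prior_mean, prior_sd, scale = 1) {
  w <- dnorm(qp, mean = prior_mean, sd = prior_sd)   # unnormalised prior
  for (j in seq_along(scores)) {
    if (is.na(scores[j])) next                       # skip missing responses
    p <- plogis((qp - delta[j]) / scale)             # P(pass) at each grid point
    lik <- if (scores[j] == 1) p else 1 - p
    w <- w * lik
  }
  w <- w / sum(w)                                    # normalise to sum 1
  sum(w * qp)                                        # posterior mean
}

transform <- c(41.10, 2.23)
delta_logit <- c(-1.5, -0.5, 0.4)   # made-up difficulties (logit scale)
scores <- c(1, 0, 1)                # made-up responses

qp_d <- -10:100
qp_l <- (qp_d - transform[1]) / transform[2]

eap_l <- eap_rasch(scores, delta_logit, qp_l, prior_mean = 0, prior_sd = 2)
eap_d <- eap_rasch(scores, transform[1] + transform[2] * delta_logit, qp_d,
                   prior_mean = transform[1], prior_sd = 2 * transform[2],
                   scale = transform[2])

# equal up to floating-point tolerance, not bit-for-bit identical
all.equal(eap_d, transform[1] + transform[2] * eap_l)
```

Note that the comparison uses all.equal() rather than identical(): even when the two computations agree mathematically, floating-point rounding will generally prevent exact equality.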

Suggestion for creation of new renaming functions to facilitate GSED D-score calculations

With the release of GSED (and particularly the HF), there will be increasing demand to score data, particularly from less technically savvy users. The app currently makes scoring GSED data much simpler, but still requires the user to first rename variables to the GSED 9-character variable name lexicon. Deriving D-scores from GSED-HF data also requires some slightly complicated data manipulation, because overlapping variables on different forms must be named differently.

To simplify this process, I'm proposing some changes/enhancements to the app and package:

The addition of a rename_sf_lf function, similar to the rename_gcdg_gsed function. This function would convert standardized SF and LF variable names (that come directly from the app) to the GSED lexicon. This would be a very simple function that could be added to the Shiny app to allow users to upload data scored from the tool with minimal manipulation.
The addition of a rename_hf function, again similar to the rename_gcdg_gsed function. The difference is that this function would also restructure/combine variables across age forms. For example, HF001 (gh1sec001) appears on both the 0-2 month and the 3-5 month age band forms. It's not yet clear how these will be differentiated in published materials (something like age00_02_hf001 and age03_05_hf001), but prior to scoring these duplicated variables should be collapsed into a single variable and renamed accordingly.
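The collapsing step of the proposed rename_hf could look roughly like the sketch below. Everything here is an assumption for illustration: collapse_hf() is a hypothetical helper, the age00_02_ / age03_05_ prefixes follow the "something like" naming mentioned above, and taking the first non-missing value across forms is only one possible rule.

```r
# Hypothetical helper: collapse age-band copies of the same HF item
# into one column, taking the first non-missing value per row.
collapse_hf <- function(data, prefix_pattern = "^age[0-9]{2}_[0-9]{2}_") {
  is_banded <- grepl(prefix_pattern, names(data))
  stems <- sub(prefix_pattern, "", names(data))
  out <- data[, !is_banded, drop = FALSE]
  for (stem in unique(stems[is_banded])) {
    cols <- data[, is_banded & stems == stem, drop = FALSE]
    # fold the columns left to right, filling gaps from later forms
    out[[stem]] <- Reduce(function(a, b) ifelse(is.na(a), b, a), cols)
  }
  out
}

hf <- data.frame(
  id = 1:3,
  age00_02_hf001 = c(1, NA, NA),
  age03_05_hf001 = c(NA, 0, NA)
)
collapsed <- collapse_hf(hf)

# rename to the GSED lexicon (mapping hf001 -> gh1sec001 as given above)
names(collapsed)[names(collapsed) == "hf001"] <- "gh1sec001"
collapsed
```

A real implementation would also need a rule for rows where two forms hold conflicting non-missing values; this sketch silently keeps the first.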

Alternative names for D-score and sem estimates in the logit scale

Split off from #46 as a separate issue.

Currently both metric = "dscore" (default) and metric = "logit" create a data frame with names d, sem and daz. Depending on the metric option, the columns called d can contain either a score in the D-score scale (usually positive values) or in the logit scale (both positive and negative values). It would be better to use different column names for these, to avoid confusion. The same holds for sem. The daz in the logit scale has no meaning and should be removed.

Originally posted by @stefvanbuuren in #46 (comment)
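A minimal sketch of what metric-specific names might look like, applied after the fact to a result data frame. The helper rename_by_metric() and the names d_logit and sem_logit are hypothetical, chosen only to illustrate the proposal; the input is a toy data frame, not actual dscore() output.

```r
# Hypothetical post-processing: give logit-scale results their own
# column names and drop daz, which has no meaning on the logit scale.
rename_by_metric <- function(result, metric = c("dscore", "logit")) {
  metric <- match.arg(metric)
  if (metric == "logit") {
    names(result)[names(result) == "d"] <- "d_logit"
    names(result)[names(result) == "sem"] <- "sem_logit"
    result$daz <- NULL
  }
  result
}

# toy result with the current column names (values are made up)
res <- data.frame(d = c(-1.2, 0.3), sem = c(0.9, 0.8), daz = c(NA, NA))
names(rename_by_metric(res, "logit"))
names(rename_by_metric(res, "dscore"))
```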

Wrong number of rows in dscore_posterior()

The dscore_posterior() function should return the same number of rows as the input data. Here's a case where that's not true.

> library(dscore)
> data_david <- data.frame(
   age = rep(15/12, 6),
   ddifmd001 = c(NA,  1,  1,  1,  1,  1),
   ddifmm012 = c(NA, NA,  1,  1,  1,  1),
   ddicmm037 = c(NA, NA, NA,  1,  1, 1),
   ddigmm066 = c(NA, NA, NA, NA,  1,  1),
   ddigmm067 = c(NA, NA, NA, NA, NA, NA)
 )
> post <- dscore_posterior(data = data_david, qp = -10:80, prior_mean = ".dutch")
> nrow(data_david)
[1] 6
> nrow(post)
[1] 5
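In the example, the first row has no observed item at all. One plausible reading, not verified against the package source, is that such rows are dropped before the posterior is computed. The sketch below shows how a result could be padded back to the input length under that assumption; toy_posterior() is a stand-in that mimics the row-dropping behaviour and pad_to_input() is a hypothetical wrapper.

```r
data_david <- data.frame(
  age = rep(15 / 12, 6),
  ddifmd001 = c(NA, 1, 1, 1, 1, 1),
  ddifmm012 = c(NA, NA, 1, 1, 1, 1),
  ddicmm037 = c(NA, NA, NA, 1, 1, 1),
  ddigmm066 = c(NA, NA, NA, NA, 1, 1),
  ddigmm067 = c(NA, NA, NA, NA, NA, NA)
)
items <- setdiff(names(data_david), "age")

# Stand-in for a routine that silently drops rows without any response
# (returns a flat distribution per retained row; values are placeholders)
toy_posterior <- function(data, items, qp) {
  keep <- rowSums(!is.na(data[, items, drop = FALSE])) > 0
  matrix(1 / length(qp), nrow = sum(keep), ncol = length(qp))
}

# Hypothetical wrapper: put the computed rows back at their original
# positions and leave NA rows where nothing could be computed
pad_to_input <- function(data, items, qp) {
  keep <- rowSums(!is.na(data[, items, drop = FALSE])) > 0
  post <- toy_posterior(data, items, qp)
  out <- matrix(NA_real_, nrow = nrow(data), ncol = length(qp))
  out[keep, ] <- post
  out
}

padded <- pad_to_input(data_david, items, qp = -10:80)
nrow(padded) == nrow(data_david)
```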

Append selected columns from the original data to the result of dscore()

The dscore() function returns the same number of rows in the same sequence as the input argument data.

A user requested the option to append selected columns from the input data to the result of the dscore() function. These additional columns are useful for preserving IDs and preventing merge errors in cases where the user sends a subset of rows from a larger data set, e.g. as in dscore(data[idx, ], ...).
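Because the result has the same number of rows in the same order as the input, the requested behaviour can be sketched as a simple column bind. The wrapper score_with_ids() and its keep argument are hypothetical, and toy_score() is a placeholder for the real scoring call.

```r
# Hypothetical wrapper: bind selected input columns to the scoring result.
# Relies on the result having one row per input row, in the same order.
score_with_ids <- function(data, keep, score_fun) {
  res <- score_fun(data)
  stopifnot(nrow(res) == nrow(data))
  cbind(data[, keep, drop = FALSE], res)
}

full <- data.frame(
  id = c("a", "b", "c", "d"),
  age = c(0.5, 1, 1.5, 2),
  item1 = c(1, 0, 1, NA),
  stringsAsFactors = FALSE
)
idx <- c(2, 4)

# placeholder scorer, NOT the real D-score calculation
toy_score <- function(data) data.frame(d = data$age * 10)

out <- score_with_ids(full[idx, ], keep = "id", score_fun = toy_score)
out
```

With the ID carried along, the scored subset can be joined back to the larger data set without relying on row positions.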

Add key field to builtin_references

Reference values are valid only for a specific key. For example, reference pop = "phase1" requires key = "gsed2212". It would be useful to attach the required key name to builtin_references.

What needs to be done:

Extend data-raw/R/save_builtin_references.R with a key field
Check any automatic reference choices made within the package, and simplify where needed
Devise a naming policy for the pop field
Extend tests where needed
Update documentation
Update version
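As a rough sketch of how an attached key could be used, the lookup below checks that a requested pop and key belong together. The table refs and the function check_reference_key() are hypothetical; the only pairing taken from this issue is pop = "phase1" with key = "gsed2212", and the actual layout of builtin_references may differ.

```r
# Hypothetical reference table with the proposed key field
refs <- data.frame(
  pop = "phase1",
  key = "gsed2212",
  stringsAsFactors = FALSE
)

# Hypothetical check: stop if the reference is unknown or is paired
# with a key it was not estimated for
check_reference_key <- function(pop, key, refs) {
  required <- unique(refs$key[refs$pop == pop])
  if (length(required) == 0L) {
    stop("Unknown reference population: ", pop)
  }
  if (!key %in% required) {
    stop("Reference '", pop, "' requires key '",
         paste(required, collapse = "' or '"), "', not '", key, "'")
  }
  invisible(TRUE)
}

check_reference_key("phase1", "gsed2212", refs)     # passes silently
ok <- tryCatch(check_reference_key("phase1", "other", refs),
               error = function(e) FALSE)
ok
```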
