dscore's People

Contributors

arjanhuizing, iriseekhout, stefvanbuuren

dscore's Issues

dscore estimation under different transformations

I would expect ability estimation to be invariant under a linear transformation of the ability scale, but this turns out to hold only approximately. The following script compares two scales (one D-score, one logit) and produces slightly different estimates.

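library(dscore)
library(testthat)

# linear map between the scales: d = transform[1] + transform[2] * logit
transform <- c(41.10, 2.23)

# ten response patterns on three items, all at age 21 days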
data <- data.frame(
  age = rep(round(21/365.25, 4), 10),
  GSFIXEYE = c(NA, NA, 0, 0, 0, 1, 0, 1, 1, 1),
  GSRSPCH =  c(NA, NA, 0, 0, 1, 0, 1, 0, 1, 1),
  GSMLEG =   c(NA,  0, 0, 1, 0, 0, 1, 1, 0, 1))
items <- c("GSFIXEYE", "GSRSPCH", "GSMLEG")

keyd <- data.frame(item = items,
                   delta = gettau(items = items),
                   stringsAsFactors = FALSE)

zd <- ability(data, items = items, dec = 4, metric = "dscore", 
              key = keyd)$b

qpl <- ((-10:100) - transform[1]) / transform[2]
keyl <- data.frame(item = items,
                   delta = gettau(items = items),
                   stringsAsFactors = FALSE)
keyl$delta <- (keyl$delta - transform[1]) / transform[2]
zl <- ability(data, items = items, dec = 4, transform = transform, 
              qp = qpl, metric = "logit", key = keyl)$b

test_that("logit and dscore are identical", {
  expect_identical(zl, (zd - transform[1])/transform[2])
})

When tracking down the differences between the two methods, I found that removing the division by (qp[2] - qp[1]) in normalize() produces the same prior. After that, the next divergence appears in cpc <- t(exp(outer(0:m, qp) + c(0, -cumsum(delta)))) in posterior(). This suggests that the exponential transform introduces instability there. I have no time to dig further and smooth out the differences, and have put (qp[2] - qp[1]) back into normalize(), but evidently something is fishy here.

Some options to pursue:

  1. Perhaps we can bypass quadrature methods altogether and use a normal approximation everywhere. The prior is normal anyway, and I remember seeing that a normal prior in combination with a logistic model produces a normal posterior (Albert and Chib? Gelman's BDA book?). If so, this would considerably speed up and simplify the calculations.
  2. Study what happens in ltm, sirt or similar packages that can estimate the EAP.
  3. Choose one scale and derive the other by a linear transform. I would then choose the D-score scale and derive the logit form from it.
  4. Or just live with the difference? It's not big, and in practice it may not matter.
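
As a reference point for the comparison above, here is a minimal sketch in base R of a grid-based EAP for a plain Rasch model. It is not the package's ability() or posterior() code: eap_grid() and its arguments, the responses, the item difficulties and the prior are all made up for illustration. The log-likelihood is shifted by its maximum before exponentiating, and the weights are normalised by their plain sum, so the grid spacing cancels out of the posterior mean.

```r
# Hypothetical helper, not part of dscore: EAP on a quadrature grid
# for a Rasch model. `scale` is the number of grid units per logit.
eap_grid <- function(resp, delta, qp, prior_mean, prior_sd, scale = 1) {
  keep <- !is.na(resp)
  # linear predictor in logits: rows = quadrature points, cols = items
  eta <- outer(qp, delta[keep], "-") / scale
  # Rasch log-likelihood of the observed responses at each grid point
  loglik <- drop(eta %*% resp[keep]) - rowSums(log1p(exp(eta)))
  # shift by the maximum before exp() so the weights do not underflow
  w <- dnorm(qp, prior_mean, prior_sd) * exp(loglik - max(loglik))
  sum(qp * w) / sum(w)
}

transform <- c(41.10, 2.23)
resp <- c(1, 0, NA)        # made-up responses
delta_d <- c(10, 14, 18)   # made-up difficulties on the D-score scale
qp_d <- -10:100

# estimate on the D-score scale
eap_d <- eap_grid(resp, delta_d, qp_d, prior_mean = 12, prior_sd = 5,
                  scale = transform[2])

# the same problem expressed on the logit scale
to_logit <- function(x) (x - transform[1]) / transform[2]
eap_l <- eap_grid(resp, to_logit(delta_d), to_logit(qp_d),
                  prior_mean = to_logit(12), prior_sd = 5 / transform[2])

# compare after mapping the logit estimate back to the D-score scale
all.equal(eap_d, eap_l * transform[2] + transform[1])
```

If a formulation like this agrees across scales while the package does not, that would point at the normalisation or the exp() step rather than at the quadrature approach itself.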

Suggestion for creation of new renaming functions to facilitate GSED D-score calculations

With the release of GSED (and particularly the HF), there will be increasing demand to score data, particularly from less technically savvy users. The app currently makes scoring GSED data much simpler, but still requires the user to first rename variables to the GSED 9-character variable name lexicon. Deriving D-scores from GSED-HF data also requires some slightly complicated data manipulation, because overlapping variables on different forms must be named differently.

To simplify this process, I'm proposing some changes/enhancements to the app and package:

  1. The addition of a rename_sf_lf function, similar to the rename_gcdg_gsed function. This function would convert standardized SF and LF variable names (that come directly from the app) to the gsed lexicon. This would be a very simple function that could be added to the Shiny app to allow users to upload data scored from the tool with minimal manipulation.

  2. The addition of a rename_hf function, again similar to the rename_gcdg_gsed function. The difference is that this function would also restructure and combine variables across age forms. For example, HF001 (gh1sec001) appears on both the 0-2 month and the 3-5 month age band forms. It's not yet clear how these will be differentiated in published materials (something like age00_02_hf001 and age03_05_hf001), but prior to scoring these duplicated variables should be collapsed into a single variable and renamed accordingly.
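
A rough sketch of the collapsing step in base R, to make the second proposal concrete. collapse_hf() is a hypothetical name, and the age00_02_hf001 / age03_05_hf001 column names are only the tentative ones mentioned above. It assumes each child answers an item on at most one form and warns otherwise.

```r
# Hypothetical helper: collapse columns that hold the same item on
# different age-band forms into one column named per the GSED lexicon.
collapse_hf <- function(data, sources, target) {
  vals <- data[sources]
  # flag rows where more than one form has a response for this item
  if (any(rowSums(!is.na(vals)) > 1)) {
    warning("multiple responses for ", target, "; keeping the first")
  }
  # first non-missing value across the source columns, row by row
  data[[target]] <- Reduce(function(a, b) ifelse(is.na(a), b, a), vals)
  # drop the form-specific columns, keep everything else
  data[setdiff(names(data), sources)]
}

hf <- data.frame(
  id = 1:3,
  age00_02_hf001 = c(1, NA, NA),
  age03_05_hf001 = c(NA, 0, NA)
)
out <- collapse_hf(hf, c("age00_02_hf001", "age03_05_hf001"), "gh1sec001")
names(out)
```

A real rename_hf would presumably loop this over a lookup table of all duplicated HF items.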

Alternative names for D-score and sem estimates in the logit scale

Separate issue, split off from #46

Currently both metric = "dscore" (default) and metric = "logit" create a data frame with names d, sem and daz. Depending on the metric option, the column called d can contain either a score in the D-score scale (usually positive values) or in the logit scale (both positive and negative values). It would be better to use different column names for these, to avoid confusion. The same holds for sem. The daz in the logit scale has no meaning and should be removed.

Originally posted by @stefvanbuuren in #46 (comment)
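
To illustrate the kind of change meant here, a small sketch that post-processes a result data frame. relabel_by_metric() and the replacement names logit and sem_logit are placeholders invented for this example; the issue does not settle on names.

```r
# Hypothetical post-processing step: give logit-scale results their
# own column names and drop daz, which has no meaning on that scale.
relabel_by_metric <- function(result, metric = c("dscore", "logit")) {
  metric <- match.arg(metric)
  if (metric == "logit") {
    result$daz <- NULL
    names(result)[names(result) == "d"] <- "logit"       # placeholder name
    names(result)[names(result) == "sem"] <- "sem_logit" # placeholder name
  }
  result
}

# made-up values standing in for a scored data frame
res <- data.frame(d = c(-1.2, 0.4), sem = c(0.5, 0.6), daz = c(NA, NA))
names(relabel_by_metric(res, "logit"))
names(relabel_by_metric(res, "dscore"))
```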

Wrong number of rows in dscore_posterior()

The dscore_posterior() function should return the same number of rows as the input data. Here's a case where that's not true.

> library(dscore)
> data_david <- data.frame(
   age = rep(15/12, 6),
   ddifmd001 = c(NA,  1,  1,  1,  1,  1),
   ddifmm012 = c(NA, NA,  1,  1,  1,  1),
   ddicmm037 = c(NA, NA, NA,  1,  1, 1),
   ddigmm066 = c(NA, NA, NA, NA,  1,  1),
   ddigmm067 = c(NA, NA, NA, NA, NA, NA)
 )
> post <- dscore_posterior(data = data_david, qp = -10:80, prior_mean = ".dutch")
> nrow(data_david)
[1] 6
> nrow(post)
[1] 5
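
The report does not say which row goes missing or why. Assuming (unconfirmed) that it is the first row, which has no observed item, one way to keep the row count is to compute on the scorable rows only and write the result back by row index. expand_rows() is a hypothetical helper, and the constant matrix below is just a stand-in for the real posterior.

```r
# Hypothetical helper: place results computed on a subset of rows
# back into a matrix with one row per input row.
expand_rows <- function(partial, row_index, n) {
  full <- matrix(NA_real_, nrow = n, ncol = ncol(partial),
                 dimnames = list(NULL, colnames(partial)))
  full[row_index, ] <- partial
  full
}

items <- data.frame(
  ddifmd001 = c(NA, 1, 1),
  ddifmm012 = c(NA, NA, 1)
)
# rows with at least one observed item
scorable <- which(rowSums(!is.na(items)) > 0)
# stand-in for a per-row posterior over 4 quadrature points
partial <- matrix(0.25, nrow = length(scorable), ncol = 4)
post <- expand_rows(partial, scorable, nrow(items))
nrow(post) == nrow(items)
```

Whether an unscorable row should come back as NA or as the prior is a separate design choice; the sketch uses NA only to show the row bookkeeping.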

Append selected columns from the original data to the result of dscore()

The dscore() function returns the same number of rows in the same sequence as the input argument data.

A user requested to append selected columns from the input data to the result of the dscore() function. These additional columns are useful for preserving IDs and preventing merge errors in cases where the user sends a subset of rows from a larger dataset, e.g. as in dscore(data[idx, ], ...).
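
Until something like this is built in, the request can be approximated with a thin wrapper. The sketch below relies on the row-order guarantee stated above; score_with_ids() and its keep argument are hypothetical, and fake_score() merely stands in for dscore() so the example runs without the package.

```r
# Hypothetical wrapper: `score_fun` stands in for dscore(); it must
# return one row per input row, in the same order.
score_with_ids <- function(data, keep, score_fun, ...) {
  result <- score_fun(data, ...)
  stopifnot(nrow(result) == nrow(data))
  out <- cbind(data[keep], result)
  rownames(out) <- NULL
  out
}

# stand-in scorer, for illustration only
fake_score <- function(data) data.frame(d = data$x * 2)

full <- data.frame(id = c("a", "b", "c", "d"), x = 1:4,
                   stringsAsFactors = FALSE)
idx <- c(2, 4)
out <- score_with_ids(full[idx, ], keep = "id", score_fun = fake_score)
out
```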

Add key field to builtin_references

Reference values are valid only for a specific key. For example, reference pop = "phase1" requires key = "gsed2212". It would be useful to attach the required key name to builtin_references.

What needs to be done:

  • Extend data-raw/R/save_builtin_references.R with a key field
  • Check any automatic reference choices made within the package, and simplify where needed
  • Devise a naming policy for the pop field
  • Extend tests, where needed
  • Update documentation
  • Update version
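
A minimal sketch of how an attached key could be used once it exists. The refs table is a one-row stand-in for builtin_references, not its real structure; only the phase1 / gsed2212 pairing comes from this issue, and check_reference_key() is a hypothetical name.

```r
# Illustrative stand-in for builtin_references with an added key column.
# Only the phase1 / gsed2212 pairing is taken from the issue text.
refs <- data.frame(
  pop = "phase1",
  key = "gsed2212",
  stringsAsFactors = FALSE
)

# Hypothetical helper: check that a pop is used with its required key
check_reference_key <- function(pop, key, references = refs) {
  required <- unique(references$key[references$pop == pop])
  if (length(required) == 0) {
    stop("unknown reference population: ", pop)
  }
  if (!key %in% required) {
    stop("pop = '", pop, "' requires key = '", required[1], "'")
  }
  invisible(TRUE)
}

check_reference_key("phase1", "gsed2212")
```

The same lookup could also run in the other direction, picking a default pop for a given key, which is where the automatic reference choices mentioned in the list would simplify.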
