alexanderrobitzsch / sirt Goto Github PK

View Code? Open in Web Editor NEW

22.0 7.0 11.0 6.66 MB

Supplementary Item Response Theory Models

Home Page: https://alexanderrobitzsch.github.io/sirt/

R 85.46% C++ 14.54%

item-response-theory

sirt's People

Contributors

Stargazers

Watchers

Forkers

strategist922 sanverhavert rosefu79 xuxin636 iago-contributedforks hyunsooseol mmj9090 isoyturk seonghobae martinabader olivroy

sirt's Issues

TIBBLES ARE NO DATA FRAMES | lsem.estimate(): error when input is tibble data frame

If one inputs a tibble data frame, one gets a very cryptic error about data types:

> admixture <- lsem.estimate(
+   data = d %>% filter(!is.na(EUR)) %>% as_tibble(),
+   moderator = "EUR",
+   moderator.grid = (0.18:0.76),
+   lavmodel = tcpg.model,
+   h = 2,
+   meanstructure = TRUE,
+   residualize = FALSE,
+   standardized = TRUE
+ )
Error in is.data.frame(x) : 
  (list) object cannot be coerced to type 'double'

The bug is in the sub-call to sirt:::lsem_residualize(), because this relies on an implicit conversion that doesn't happen with tibbles:

> sirt:::lsem_residualize
function (data, moderator, moderator.grid, lavmodel, h = 1.1, 
    residualize = TRUE, eps = 1e-10, verbose = TRUE) 
{
    lavaanstr <- lavaan::lavaanify(lavmodel)
    vars <- unique(c(lavaanstr$rhs, lavaanstr$lhs))
    vars <- intersect(colnames(data), vars)
    data.mod <- data[, moderator]
    res <- lsem_local_weights(data.mod = data.mod, moderator.grid = moderator.grid, 
        h = h)

The line data[, moderator] returns a vector if it's a standard data frame but a data frame if it's a tibble, hence the data type error when lsem_local_weights() is called and tries to convert. It's a simple fix, just change it to data[[moderator]].

comparing rasch.jml to winsteps part 2

We have tried using rasch.jml with constraints to fix some items. We are finding that compared to Winsteps the item difficulties are off by +/- .02 logits. We've also tried tam.jml, which has slightly different numbers but has a similar margin of error.

Settings:

sirt::rasch.jml(dat, method="WLE", wle.adj=0.3, constraints=anchor)
tam.jml(tamdat, bias=FALSE, xsi.fixed=anchor)

We don't need the numbers to match Winsteps exactly but they need to be closer than they are now. If you can shed any insight on this it would be greatly appreciated.

lsem.estimate(): Error on missing data in moderator

This function gives a non-informative error when moderator has missing data.

#with missing
> admixture <- lsem.estimate(
+   data = d,
+   moderator = "EUR",
+   moderator.grid = (0.18:0.76),
+   lavmodel = tcpg.model,
+   h = 2,
+   meanstructure = TRUE,
+   residualize = FALSE,
+   standardized = TRUE
+ )
Error in density.default(data.mod, from = min(moderator.grid), to = max(moderator.grid),  : 
  'x' contains missing values

#filter missing
> admixture <- lsem.estimate(
+   data = d %>% filter(!is.na(EUR)),
+   moderator = "EUR",
+   moderator.grid = (0.18:0.76),
+   lavmodel = tcpg.model,
+   h = 2,
+   meanstructure = TRUE,
+   residualize = FALSE,
+   standardized = TRUE
+ )
Error in JAC %*% OUT : requires numeric/complex matrix/vector arguments
In addition: Warning messages:
1: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    Could not compute standard errors! The information matrix could
    not be inverted. This may be a symptom that the model is not
    identified.
2: In lav_model_vcov(lavmodel = object@Model, lavsamplestats = object@SampleStats,  :
  lavaan WARNING:
    Could not compute standard errors! The information matrix could
    not be inverted. This may be a symptom that the model is not
    identified.
** Fit lavaan model
|*|
|-|
** Parameter summary 

Warning message:
In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    The variance-covariance matrix of the estimated parameters (vcov)
    does not appear to be positive definite! The smallest eigenvalue
    (= -2.254770e-05) is smaller than zero. This may be a symptom that
    the model is not identified.

You should add a check for missing data initially and throw an informative error.

Also note that the error Error in JAC %*% OUT : requires numeric/complex matrix/vector arguments does not actually halt the call, but isn't being suppressed either. It's related to the model issues in my case here.

availability of `regpolca`

Hi, Thanks for writing a useful package. I was searching the literature for work in SEMs with fusion penalties and encountered your paper on regularized LCMs (Robitzsch 2020).

I downloaded the development branch of sirt 3.12-41 (2022-02-25 23:07:24), but don't see regpolca in the exported namespace. Has its functionality been incorporated elsewhere or is it not yet visible in 3.12?

Thanks!

issues with multidimensional rasch.mirtlc and "MLC2"

Hi,
I keep getting this error when I run rasch.mirtlc with MLC2 and more than 1 dimension.
I tried with your syntax:
mod33 <- sirt::rasch.mirtlc( dat, dimensions=dimensions, theta.k=theta.k, modeltype="MLC2")

but I keep getting the same error.
Error in r.jk[, , gg] * log(pjk.M) : non-conformable arrays
Thanks,
Andrea

modifying mcmc.3pno.testlet to make it 3pl model

Hi Prof.Robitzsch,
What all changes need to be made to mcmc.3pno.testlet to make it a logistic testlet model? One would be to change the link function from pnorm to plogis in #draw latent responses Z (mcmc.3pno.testlet_alg.R). I guess the conditional distribution from which parameters are drawn would change but I don't know how. Can you point to the literature from which you coded MCMC Gibbs Sampler? Do you have plans to publish 3pl model for testlets?

Rasch infit mnsq error?

comparing rasch.jml to winsteps

I'm developing software for a client who want IRT item difficulties to match Winsteps. My understanding is that Winsteps centers item difficulty at 0. Is there a way to use rasch.jml to get numbers similar to Winsteps?

Continuous beta item response models

Is it possible to fit continuous beta item response models? The documentation for brm-Methods demonstrates simulation and model fitting for discretized beta item response models, and it is clear how to simulate continuous bounded beta item response models, but I seem to be unable to fit continuous beta item response models.

DETECT explaratory procedure `nclusters` argument

Hi again,

Sorry to bother you with so many questions. As I indicated before I am trying to make sure this package does what original DETECT program does.

There is an argument nclusters for the expl.detect() function. It seems that the user is expected to specify the number of clusters before running the analysis.

This is a little bit confusing. Based on my understanding, the purpose of running the exploratory DETECT procedure is to explore and decide the number of clusters. So, it is unexpected to ask the user to specify that before the analysis.

In the original DETECT program, you only specify the maximum possible cluster allowed (say 10). Then, when the exploratory DETECT runs, it searches the whole space for nclusters=2, nclusters=3, nclusters = 4, ..., nclusters= 10. In the end, it returns the solution with the maximum DETECT value. So, the user can estimate how many potential clusters with the items assigned to clusters for the optimized solution. It is like deciding the number of factors.

The way sirt runs the exploratory DETECT seems different. Please let me know if I am missing anything, or I am interpreting inaccurately.

Thank you.

using btm function with eps = 0 returns error

Executing the following line of code (with data from data.txt with specs: sep = ";", dec = "." )
Ability_sirt <- btm( data = Data, fix.eta = 0, fix.delta = 0, eps = 0 )
returns the error and warning

Error in while ((iter < maxiter) & (max.change > conv)) { : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In max(theta_ch, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

The problem is located in the file btm.r on line 62. Calculating the mean of a vector containing INF values results in an INF value. If there is no epsilon correction (i.e. eps = 0) line 60 produces an -INF value (in the example data) because there is a 0 score.

A possible solution, in my opinion, is to insert the block from line 65 to line 76 between line 60 and 61 and add the line of code theta[elim_persons_index] <- NA after the line theta_elim <- theta[ elim_persons_index ] in the moved block.

That part of code should look as follows

theta <- 0.8 * stats::qlogis(propscore)
elim_persons <- FALSE
if (sum(propscore %in% c(0, 1)) > 0) {
  elim_persons <- TRUE
  elim_persons_index <- which(propscore %in% c(0, 1))
  theta_elim <- theta[elim_persons_index]
  theta[elim_persons_index] <- NA
  indicator_elim <- dat0
  indicator_elim[, 3] <- 0
  indicator_elim[indicator_elim[, 1] %in% elim_persons_index, 
                 3] <- 1
  indicator_elim[indicator_elim[, 2] %in% elim_persons_index, 
                 3] <- 1
}
if (center.theta) {
  theta <- theta - mean(theta, na.rm = T )
}
some.fixed.theta <- FALSE

data.txt

lavaan2mirt unexpectedly estimates a covariance matrix

Could you add an option to stop lavaan2mirt from estimating a covariance matrix when one is not specified in the lavaan model?

It seems like an odd default and it caused problems for me, exemplified by this cryptic error:

> lavaan2mirt(data, lavaan_model, poly.itemtype="graded", est.mirt=TRUE)
Error in chol.default(gp$gcov) : 
  the leading minor of order 1 is not positive definite

where lavaan_model is simply

[1] "# Latent variables\nML2 =~ dss_3 + dss_10 + dss_11 + dss_12 + dss_13 + dss_15 + dss_16 + dss_18 + dss_19 + dss_20\nML4 =~ dss_4 + dss_6 + dss_7 + dss_14 + dss_17\nML3 =~ dss_1 + dss_2 + dss_8 + dss_9\nML1 =~ dss_5"

The $mirt.syntax with est.mirt=FALSE is

> lavaan2mirt(data, lavaan_model %>% gsub("_", "", .), est.mirt=FALSE)$mirt.syntax
$x
     Type  Parameters                       
[1,] "ML2" "1,2,3,4,5,6,7,8,9,10"           
[2,] "ML4" "11,12,13,14,15"                 
[3,] "ML3" "16,17,18,19"                    
[4,] "ML1" "20"                             
[5,] "COV" "ML2*ML2,ML4*ML4,ML3*ML3,ML1*ML1"

I added the %>% gsub("_", "", .) to both the model and the dataset because I get Error in TAM::lavaanify.IRT(lavmodel, data = dat) : Please do not use _ in variable names in tamaan() function! without it.

Now that I look at it, it also seems odd that ML3 for example is [3,] "ML3" "16,17,18,19", when it is ML3 =~ dss_1 + dss_2 + dss_8 + dss_9 in lavaan form. The dataset has them ordered from col 1 to 20, so I would expect it to be [3,] "ML3" "1,2,8,9". Wouldn't it be easier to just the variable names since mirt.model accepts this? Then itemnames= could be passed as an argument too.

If it's any use, I generated the above lavaan model using psych::efa_to_cfa()

Thanks for the great and otherwise very useful package!

estimating error in function pgenlogis

hi, I am using the pgenlogis and xxirt funcion to estimate generalized logistic item response model (with the fisher tranformation you suggested), and something strange happened.
Using the same data, the estimating progress was fine on my own computer, but shows error as in if ((alpha1 > 0)) { :missing value where TRUE/FALSE needed, when I using another computer. This message indicates this parameter is NA. It is strange because I defined the parameter with xxirt_createDiscItem already. And when I set verbose = T in xxirt, this error jump out in the first iteration. When I increase the sample size to test this model, this error shows almost everytime.
I am really confuesd and can't find where the problem is for two days.
Thank you!

Calculations of scores - Example 2 - conf.detect()

I am working with a polytomous dataset with a 5-level LiKert scaled response (1=strongly disagree to 5=strongly agree), and I'm trying to follow the logic of example 2 for the conf.detect() function. I am interested in running the polyDETECT analysis tests. However, I'm not following how the scores were calculated?

score <- stats::qnorm( ( rowMeans( dat )+.5 )  / ( 30 + 1 ) )

I'm not familiar with this equation. Why is 0.5 added to the row means? And why is this product then divided by 30+1. I'm guessing the 30 comes from the 30 columns of the dat dataset?

Please add some additional details for this example in the sirt::conf.detect() help pages and/or a reference on this equation for calculating scores using stats::qnorm() and how to adjust this equation for other datasets.

Thank you for your time and clarification!

====================================
Full example code:

## Not run: 
#############################################################################
# EXAMPLE 2: Big 5 data set (polytomous data)
#############################################################################

# attach Big5 Dataset
data(data.big5)

# select 6 items of each dimension
dat <- data.big5
dat <- dat[, 1:30]

# estimate person score by simply using a transformed sum score
score <- stats::qnorm( ( rowMeans( dat )+.5 )  / ( 30 + 1 ) )

# extract item cluster (Big 5 dimensions)
itemcluster <- substring( colnames(dat), 1, 1 )

# DETECT Item cluster
detect1 <- sirt::conf.detect( data=dat, score=score, itemcluster=itemcluster )
  ##        unweighted weighted
  ## DETECT      1.256    1.256
  ## ASSI        0.384    0.384
  ## RATIO       0.597    0.597

# Exploratory DETECT
detect5 <- sirt::expl.detect( data=dat, score=score,
                     nclusters=9, N.est=nrow(dat)  )
  ## DETECT (unweighted)
  ## Optimal Cluster Size is  6  (Maximum of DETECT Index)
  ##   N.Cluster N.items N.est N.val      size.cluster DETECT.est ASSI.est RATIO.est
  ## 1         2      30   500     0              6-24      1.073    0.246     0.510
  ## 2         3      30   500     0           6-10-14      1.578    0.457     0.750
  ## 3         4      30   500     0         6-10-11-3      1.532    0.444     0.729
  ## 4         5      30   500     0        6-8-11-2-3      1.591    0.462     0.757
  ## 5         6      30   500     0       6-8-6-2-5-3      1.610    0.499     0.766
  ## 6         7      30   500     0     6-3-6-2-5-5-3      1.557    0.476     0.740
  ## 7         8      30   500     0   6-3-3-2-3-5-5-3      1.540    0.462     0.732
  ## 8         9      30   500     0 6-3-3-2-3-5-3-3-2      1.522    0.444     0.724

# Plot Cluster solution
pl <- graphics::plot( detect5$clusterfit, main="Cluster solution" )
stats::rect.hclust(detect5$clusterfit, k=6, border="red")

Documentation around ccov.np

Hi!

It would be great if there was more documentation around ccov.np, at least explaining the values and how it is calculated, since the referenced papers give distinct methods to estimate the conditional covariances.

Thank you!

DETECT Exploratory analysis

Hi,

In the explaratory analysis, it seems that the function does cross-validation no matter what the user does. It has an argument N.est which the user can specify the sample size for splitting the data if cross-validation is wanted. However, we should be able to run the exploratory procedure without cross-validation using the full sample (as something possible in the original software).

I was expecting this behavior from the function when I set the N.est=NULL which is the default setting. It seems, the function just does a 50-50 split and runs cross-validation although we set N.est=NULL.

Can you check this? Would you be able allow the user to run the exploratory procedure with the full sample without cross-validation when N.est=NULL?

I put below the example from the package.


data(data.timss)
dat <- data.timss$data
dat <- dat[, substring( colnames(dat),1,1)=="M" ]
iteminfo <- data.timss$item

expl.detect(data      = dat, 
            score     = rowSums(dat), 
            nclusters = 2,
            bwscale   = 1.1,
            N.est     = NULL,
            seed      = NULL,
            use_sum_score=TRUE)

Output indicates that the function automatically made a 50-50 split (N.est = 172, N.val=173).

DETECT (unweighted)

Optimal Cluster Size is  2  (Maximum of DETECT Index)

  N.Cluster N.items N.est N.val size.cluster DETECT.est ASSI.est RATIO.est
1         2      25   172   173        15-10        0.3     0.04     0.119
  MADCOV100.est MCOV100.est DETECT.val ASSI.val RATIO.val MADCOV100.val
1         2.522      -2.493      0.092    0.027     0.036         2.542
  MCOV100.val
1      -2.521
$detect.unweighted
    DETECT.val   ASSI.val  RATIO.val MADCOV100.val MCOV100.val
Cl2 0.09231698 0.02666667 0.03631547      2.542084   -2.520976

$detect.weighted
    DETECT.val   ASSI.val  RATIO.val MADCOV100.val MCOV100.val
Cl2 0.09231698 0.02666667 0.03631547      2.542084   -2.520976

Thank you.

rasch.jml. itemdiff.correction incorrect when using constraints

Presumably when using constraints the final values of itemdiff and itemdiff.correction should be equal to the constraint value. Using tam.jml with xsi.fixed and bias=TRUE and bias=FALSE the xsi values always match the xsi.fixed. We expect the same of sirt::rasch.jml for itemdiff and itemdiff.correction to match constraints.

MCMC for 2-factor MIRT/Ising

Hi Prof Robitzsch,

First this definitely isn't an issue with the package, but more of a general question... I'm working on a class project that involves MCMC methods for estimation in the Ising/2-factor MIRT model, and I came across your package as a result.

My question is (1) what MCMC method your 2pno() function uses, (2) how might one use the output from 2pno()to create a graph in qgraph, and (3) do you know how to extend MCMC to a 2-factor MIRT model?

Thanks so much! Very new to the psychometrics field...

bug: cfa_meas_inv() returns incorrect mod_pi lavaan-object (not so for mod_mi)

File Name: cfa_meas_inv.R

File Version: 0.157

Given the programming code and variable/object assignment in the function cfa_meas_inv(), the end of line 67 should return "mod_mi=mod0, mod_pi=mod2" as part of the list object, rather than "mod_mi=mod0, mod_pi=mod1" as is currently the case.

Explanation:
During the "while (free)"-loop, the objects partable1 and mimod1 get updated until hitting the final (tested) partial MI model. However, the last lavaan-object is not stored in object mod1, but in mod2 (see line 50). Thus, returning mod_pi as mod1 at the end of the function returns the full MI model (mod1 = mod_mi; see line 26) rather than the last (tested/accepted) partial MI model (mod2 = mod_pi).

Consequently, when comparing the model parameters via pars_mi and pars_pi, any model differences are evident (and the function appears functional). However, when further exploring model fit and model parameters straight from the returned lavaan-object mod_pi - e.g., with summary(mod$mod_pi, standardized = TRUE, fit.measures = TRUE - , then the reported mod_pi is incorrectly identical to mod_mi.

Computing standard errors with invariance.alignment()

I am following along to example 8 from the manual to compute standard errors with invariance.alignment() . To my understanding, the meth parameter isn't defined before the function call, so I tried calling the function with the default value of 1 as well as 2-4 as mentioned in the function documentation. I am getting the following error:
#> Error in (function (x, lambda, nu, overparam, eps, meth_) : argument "meth_" is missing, with no default #variance matrix and standard errors
I am pasting the reprex with the error below for your reference. I'm not sure if this is a bug or if I'm missing something, and I would appreciate if you could let me know how to proceed. Thank you very much in advance!

#############################################################################
# EXAMPLE 8: Computation of standard errors
#############################################################################
G <- 3 # number of groups
I <- 5 # number of items
# define lambda and nu parameters
lambda <- matrix(1, nrow=G, ncol=I)
nu <- matrix(0, nrow=G, ncol=I)
# define size of noninvariance
dif <- 1
mu1 <- c(0,.3,.8)
sigma1 <- c(1,1.25,1.1)

lambda[1,3] <- 1+dif*.4; nu[1,5] <- dif*.5

gg <- 2
lambda[gg,5] <- 1-.5*dif; nu[gg,1] <- -.5*dif

gg <- 3
lambda[gg,4] <- 1-.7*dif; nu[gg,2] <- -.5*dif

dat <- sirt::invariance_alignment_simulate(nu=nu, 
                                          lambda=lambda,
                                          err_var=1+0*lambda,
                                          mu=mu1, 
                                          sigma=sigma1, 
                                          N=500, 
                                          output="data", 
                                          exact=TRUE)
# estimate CFA
res <- sirt::invariance_alignment_cfa_config(dat=dat[,-1], group=dat$group)
#> Compute CFA for group 1 | model 2PM
#> Loading required namespace: MASS
#> Compute CFA for group 2 | model 2PM
#> Compute CFA for group 3 | model 2PM

eps <- .001 
align.pow <- 0.5*rep(1,2)
(lambda <- res$lambda)
#>     I1   I2   I3   I4    I5
#> 1 1.00 1.00 1.40 1.00 1.000
#> 2 1.25 1.25 1.25 1.25 0.625
#> 3 1.10 1.10 1.10 0.33 1.100
(nu <- res$nu)
#>              I1           I2          I3           I4   I5
#> 1 -4.218847e-17 4.263256e-17 2.04281e-17 1.554312e-17 0.50
#> 2 -2.000000e-01 3.000000e-01 3.00000e-01 3.000000e-01 0.15
#> 3  8.000000e-01 3.000000e-01 8.00000e-01 2.400000e-01 0.80

mod1 <- sirt::invariance.alignment(lambda=lambda,
                                  nu=nu,
                                  eps=eps,
                                  optimizer="optim",
                                  align.pow=align.pow,
                                  meth=1,
                                  vcov=res$vcov)
#> Error in (function (x, lambda, nu, overparam, eps, meth_) : argument "meth_" is missing, with no default
#variance matrix and standard errors
mod1$vcov
#> Error in eval(expr, envir, enclos): object 'mod1' not found
sqrt(diag(mod1$vcov))
#> Error in eval(expr, envir, enclos): object 'mod1' not found

^{Created on 2024-03-15 with reprex v2.1.0}

DETECT outcomes are inconsistent with the original DETECT software

Hi,

I just emailed you but then realize there is a github repository for the package. So, I thought it would be more efficient to put this here as well.

I am struggling to reproduce the results. I compare the DETECT output from my previous research (using the original software) by analyzing the same dataset using the sirt package, and the results were quite different. Also, I analyzed the TIMSS data using the original DETECT program, and again the results were different. I attached the DETECT output from the original software using the same TIMSS dataset available in your package.

detect.txt

I copied the R syntax I used below (same as in your example, the only difference is that I am using the sum scores for conditioning).

data(data.timss)
dat <- data.timss$data
dat <- dat[, substring( colnames(dat),1,1)=="M" ]
iteminfo <- data.timss$item

a <- conf.detect(data=dat,
            score=rowSums(dat),
            itemcluster=iteminfo$Content.Domain)

summary(a)

-----------------------------------------------------------------
sirt 3.9-4 (2020-02-17 12:57:09)
R version 4.0.2 (2020-06-22) x86_64, mingw32 | nodename=COE-L-102R-WL01 | login=cengiz

Call:
conf.detect(data = dat, score = rowSums(dat), itemcluster = iteminfo$Content.Domain)

Confirmatory DETECT Analysis with 3 Item Clusters
Bandwidth Scale: 1.1
-----------------------------------------------------------------
Dimensionality Statistics
          unweighted weighted
DETECT         0.311    0.311
ASSI           0.273    0.273
RATIO          0.350    0.350
MADCOV100      0.888    0.888
MCOV100       -0.551   -0.551

As you can see the statistics are different. This is just an example for a confirmatory DETECT for the dataset available in your package. I did also try different datasets I have. The results are always (substantially) different for both confirmatory and exploratory DETECT procedure.

I understand that there may be nuances in algorithms used; however, these differences are not something we can ignore. Having these in R is great and provides a number of opportunities for research and make it easy to teach. On the other hand, before I use these functions for my research and recommend it to my students, I wonder if you have any insights about why the results are different, or what adjustments (if possible) to make so the results from R function is similar to the ones we get from the original DETECT program.

Thank you so much for your time and effort in developing this package.

gamma.testlet

Hi!
Thank you for your efforts!
Is there some reason to not include the gamma.testlet vector to the results of the mcmc.3pno.testlet function?