alexanderrobitzsch / sirt Goto Github PK
View Code? Open in Web Editor NEWSupplementary Item Response Theory Models
Home Page: https://alexanderrobitzsch.github.io/sirt/
Supplementary Item Response Theory Models
Home Page: https://alexanderrobitzsch.github.io/sirt/
If one inputs a tibble data frame, one gets a very cryptic error about data types:
> admixture <- lsem.estimate(
+ data = d %>% filter(!is.na(EUR)) %>% as_tibble(),
+ moderator = "EUR",
+ moderator.grid = (0.18:0.76),
+ lavmodel = tcpg.model,
+ h = 2,
+ meanstructure = TRUE,
+ residualize = FALSE,
+ standardized = TRUE
+ )
Error in is.data.frame(x) :
(list) object cannot be coerced to type 'double'
The bug is in the sub-call to sirt:::lsem_residualize()
, because this relies on an implicit conversion that doesn't happen with tibbles:
> sirt:::lsem_residualize
function (data, moderator, moderator.grid, lavmodel, h = 1.1,
residualize = TRUE, eps = 1e-10, verbose = TRUE)
{
lavaanstr <- lavaan::lavaanify(lavmodel)
vars <- unique(c(lavaanstr$rhs, lavaanstr$lhs))
vars <- intersect(colnames(data), vars)
data.mod <- data[, moderator]
res <- lsem_local_weights(data.mod = data.mod, moderator.grid = moderator.grid,
h = h)
The line data[, moderator]
returns a vector if it's a standard data frame but a data frame if it's a tibble, hence the data type error when lsem_local_weights()
is called and tries to convert. It's a simple fix, just change it to data[[moderator]]
.
We have tried using rasch.jml with constraints to fix some items. We are finding that compared to Winsteps the item difficulties are off by +/- .02 logits. We've also tried tam.jml, which has slightly different numbers but has a similar margin of error.
Settings:
sirt::rasch.jml(dat, method="WLE", wle.adj=0.3, constraints=anchor)
tam.jml(tamdat, bias=FALSE, xsi.fixed=anchor)
We don't need the numbers to match Winsteps exactly but they need to be closer than they are now. If you can shed any insight on this it would be greatly appreciated.
This function gives a non-informative error when moderator has missing data.
#with missing
> admixture <- lsem.estimate(
+ data = d,
+ moderator = "EUR",
+ moderator.grid = (0.18:0.76),
+ lavmodel = tcpg.model,
+ h = 2,
+ meanstructure = TRUE,
+ residualize = FALSE,
+ standardized = TRUE
+ )
Error in density.default(data.mod, from = min(moderator.grid), to = max(moderator.grid), :
'x' contains missing values
#filter missing
> admixture <- lsem.estimate(
+ data = d %>% filter(!is.na(EUR)),
+ moderator = "EUR",
+ moderator.grid = (0.18:0.76),
+ lavmodel = tcpg.model,
+ h = 2,
+ meanstructure = TRUE,
+ residualize = FALSE,
+ standardized = TRUE
+ )
Error in JAC %*% OUT : requires numeric/complex matrix/vector arguments
In addition: Warning messages:
1: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING:
Could not compute standard errors! The information matrix could
not be inverted. This may be a symptom that the model is not
identified.
2: In lav_model_vcov(lavmodel = object@Model, lavsamplestats = object@SampleStats, :
lavaan WARNING:
Could not compute standard errors! The information matrix could
not be inverted. This may be a symptom that the model is not
identified.
** Fit lavaan model
|*|
|-|
** Parameter summary
Warning message:
In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING:
The variance-covariance matrix of the estimated parameters (vcov)
does not appear to be positive definite! The smallest eigenvalue
(= -2.254770e-05) is smaller than zero. This may be a symptom that
the model is not identified.
You should add a check for missing data initially and throw an informative error.
Also note that the error Error in JAC %*% OUT : requires numeric/complex matrix/vector arguments
does not actually halt the call, but isn't being suppressed either. It's related to the model issues in my case here.
Hi, Thanks for writing a useful package. I was searching the literature for work in SEMs with fusion penalties and encountered your paper on regularized LCMs (Robitzsch 2020).
I downloaded the development branch of sirt 3.12-41 (2022-02-25 23:07:24)
, but don't see regpolca
in the exported namespace. Has its functionality been incorporated elsewhere or is it not yet visible in 3.12?
Thanks!
Hi,
I keep getting this error when I run rasch.mirtlc with MLC2 and more than 1 dimension.
I tried with your syntax:
mod33 <- sirt::rasch.mirtlc( dat, dimensions=dimensions, theta.k=theta.k, modeltype="MLC2")
but I keep getting the same error.
Error in r.jk[, , gg] * log(pjk.M) : non-conformable arrays
Thanks,
Andrea
Hi Prof.Robitzsch,
What all changes need to be made to mcmc.3pno.testlet
to make it a logistic testlet model? One would be to change the link function from pnorm to plogis in #draw latent responses Z (mcmc.3pno.testlet_alg.R). I guess the conditional distribution from which parameters are drawn would change but I don't know how. Can you point to the literature from which you coded MCMC Gibbs Sampler? Do you have plans to publish 3pl model for testlets?
I'm developing software for a client who want IRT item difficulties to match Winsteps. My understanding is that Winsteps centers item difficulty at 0. Is there a way to use rasch.jml
to get numbers similar to Winsteps?
Is it possible to fit continuous beta item response models? The documentation for brm-Methods
demonstrates simulation and model fitting for discretized beta item response models, and it is clear how to simulate continuous bounded beta item response models, but I seem to be unable to fit continuous beta item response models.
Hi again,
Sorry to bother you with so many questions. As I indicated before I am trying to make sure this package does what original DETECT program does.
There is an argument nclusters
for the expl.detect()
function. It seems that the user is expected to specify the number of clusters before running the analysis.
This is a little bit confusing. Based on my understanding, the purpose of running the exploratory DETECT procedure is to explore and decide the number of clusters. So, it is unexpected to ask the user to specify that before the analysis.
In the original DETECT program, you only specify the maximum possible cluster allowed (say 10). Then, when the exploratory DETECT runs, it searches the whole space for nclusters=2, nclusters=3, nclusters = 4, ..., nclusters= 10. In the end, it returns the solution with the maximum DETECT value. So, the user can estimate how many potential clusters with the items assigned to clusters for the optimized solution. It is like deciding the number of factors.
The way sirt
runs the exploratory DETECT seems different. Please let me know if I am missing anything, or I am interpreting inaccurately.
Thank you.
Executing the following line of code (with data from data.txt with specs: sep = ";", dec = "." )
Ability_sirt <- btm( data = Data, fix.eta = 0, fix.delta = 0, eps = 0 )
returns the error and warning
Error in while ((iter < maxiter) & (max.change > conv)) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In max(theta_ch, na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
The problem is located in the file btm.r on line 62. Calculating the mean of a vector containing INF values results in an INF value. If there is no epsilon correction (i.e. eps = 0
) line 60 produces an -INF value (in the example data) because there is a 0 score.
A possible solution, in my opinion, is to insert the block from line 65 to line 76 between line 60 and 61 and add the line of code theta[elim_persons_index] <- NA
after the line theta_elim <- theta[ elim_persons_index ]
in the moved block.
That part of code should look as follows
theta <- 0.8 * stats::qlogis(propscore)
elim_persons <- FALSE
if (sum(propscore %in% c(0, 1)) > 0) {
elim_persons <- TRUE
elim_persons_index <- which(propscore %in% c(0, 1))
theta_elim <- theta[elim_persons_index]
theta[elim_persons_index] <- NA
indicator_elim <- dat0
indicator_elim[, 3] <- 0
indicator_elim[indicator_elim[, 1] %in% elim_persons_index,
3] <- 1
indicator_elim[indicator_elim[, 2] %in% elim_persons_index,
3] <- 1
}
if (center.theta) {
theta <- theta - mean(theta, na.rm = T )
}
some.fixed.theta <- FALSE
Could you add an option to stop lavaan2mirt
from estimating a covariance matrix when one is not specified in the lavaan model?
It seems like an odd default and it caused problems for me, exemplified by this cryptic error:
> lavaan2mirt(data, lavaan_model, poly.itemtype="graded", est.mirt=TRUE)
Error in chol.default(gp$gcov) :
the leading minor of order 1 is not positive definite
where lavaan_model is simply
[1] "# Latent variables\nML2 =~ dss_3 + dss_10 + dss_11 + dss_12 + dss_13 + dss_15 + dss_16 + dss_18 + dss_19 + dss_20\nML4 =~ dss_4 + dss_6 + dss_7 + dss_14 + dss_17\nML3 =~ dss_1 + dss_2 + dss_8 + dss_9\nML1 =~ dss_5"
The $mirt.syntax
with est.mirt=FALSE
is
> lavaan2mirt(data, lavaan_model %>% gsub("_", "", .), est.mirt=FALSE)$mirt.syntax
$x
Type Parameters
[1,] "ML2" "1,2,3,4,5,6,7,8,9,10"
[2,] "ML4" "11,12,13,14,15"
[3,] "ML3" "16,17,18,19"
[4,] "ML1" "20"
[5,] "COV" "ML2*ML2,ML4*ML4,ML3*ML3,ML1*ML1"
I added the %>% gsub("_", "", .)
to both the model and the dataset because I get Error in TAM::lavaanify.IRT(lavmodel, data = dat) : Please do not use _ in variable names in tamaan() function!
without it.
Now that I look at it, it also seems odd that ML3 for example is [3,] "ML3" "16,17,18,19"
, when it is ML3 =~ dss_1 + dss_2 + dss_8 + dss_9
in lavaan form. The dataset has them ordered from col 1 to 20, so I would expect it to be [3,] "ML3" "1,2,8,9"
. Wouldn't it be easier to just the variable names since mirt.model accepts this? Then itemnames=
could be passed as an argument too.
If it's any use, I generated the above lavaan model using psych::efa_to_cfa()
Thanks for the great and otherwise very useful package!
hi, I am using the pgenlogis
and xxirt
funcion to estimate generalized logistic item response model (with the fisher tranformation you suggested), and something strange happened.
Using the same data, the estimating progress was fine on my own computer, but shows error as in if ((alpha1 > 0)) { :missing value where TRUE/FALSE needed
, when I using another computer. This message indicates this parameter is NA
. It is strange because I defined the parameter with xxirt_createDiscItem
already. And when I set verbose = T
in xxirt
, this error jump out in the first iteration. When I increase the sample size to test this model, this error shows almost everytime.
I am really confuesd and can't find where the problem is for two days.
Thank you!
I am working with a polytomous dataset with a 5-level LiKert scaled response (1=strongly disagree to 5=strongly agree), and I'm trying to follow the logic of example 2 for the conf.detect()
function. I am interested in running the polyDETECT analysis tests. However, I'm not following how the scores were calculated?
score <- stats::qnorm( ( rowMeans( dat )+.5 ) / ( 30 + 1 ) )
I'm not familiar with this equation. Why is 0.5 added to the row means? And why is this product then divided by 30+1. I'm guessing the 30 comes from the 30 columns of the dat
dataset?
Please add some additional details for this example in the sirt::conf.detect()
help pages and/or a reference on this equation for calculating scores using stats::qnorm()
and how to adjust this equation for other datasets.
Thank you for your time and clarification!
====================================
Full example code:
## Not run:
#############################################################################
# EXAMPLE 2: Big 5 data set (polytomous data)
#############################################################################
# attach Big5 Dataset
data(data.big5)
# select 6 items of each dimension
dat <- data.big5
dat <- dat[, 1:30]
# estimate person score by simply using a transformed sum score
score <- stats::qnorm( ( rowMeans( dat )+.5 ) / ( 30 + 1 ) )
# extract item cluster (Big 5 dimensions)
itemcluster <- substring( colnames(dat), 1, 1 )
# DETECT Item cluster
detect1 <- sirt::conf.detect( data=dat, score=score, itemcluster=itemcluster )
## unweighted weighted
## DETECT 1.256 1.256
## ASSI 0.384 0.384
## RATIO 0.597 0.597
# Exploratory DETECT
detect5 <- sirt::expl.detect( data=dat, score=score,
nclusters=9, N.est=nrow(dat) )
## DETECT (unweighted)
## Optimal Cluster Size is 6 (Maximum of DETECT Index)
## N.Cluster N.items N.est N.val size.cluster DETECT.est ASSI.est RATIO.est
## 1 2 30 500 0 6-24 1.073 0.246 0.510
## 2 3 30 500 0 6-10-14 1.578 0.457 0.750
## 3 4 30 500 0 6-10-11-3 1.532 0.444 0.729
## 4 5 30 500 0 6-8-11-2-3 1.591 0.462 0.757
## 5 6 30 500 0 6-8-6-2-5-3 1.610 0.499 0.766
## 6 7 30 500 0 6-3-6-2-5-5-3 1.557 0.476 0.740
## 7 8 30 500 0 6-3-3-2-3-5-5-3 1.540 0.462 0.732
## 8 9 30 500 0 6-3-3-2-3-5-3-3-2 1.522 0.444 0.724
# Plot Cluster solution
pl <- graphics::plot( detect5$clusterfit, main="Cluster solution" )
stats::rect.hclust(detect5$clusterfit, k=6, border="red")
Hi!
It would be great if there was more documentation around ccov.np
, at least explaining the values and how it is calculated, since the referenced papers give distinct methods to estimate the conditional covariances.
Thank you!
Hi,
In the explaratory analysis, it seems that the function does cross-validation no matter what the user does. It has an argument N.est which the user can specify the sample size for splitting the data if cross-validation is wanted. However, we should be able to run the exploratory procedure without cross-validation using the full sample (as something possible in the original software).
I was expecting this behavior from the function when I set the N.est=NULL
which is the default setting. It seems, the function just does a 50-50 split and runs cross-validation although we set N.est=NULL
.
Can you check this? Would you be able allow the user to run the exploratory procedure with the full sample without cross-validation when N.est=NULL
?
I put below the example from the package.
data(data.timss)
dat <- data.timss$data
dat <- dat[, substring( colnames(dat),1,1)=="M" ]
iteminfo <- data.timss$item
expl.detect(data = dat,
score = rowSums(dat),
nclusters = 2,
bwscale = 1.1,
N.est = NULL,
seed = NULL,
use_sum_score=TRUE)
Output indicates that the function automatically made a 50-50 split (N.est = 172, N.val=173).
DETECT (unweighted)
Optimal Cluster Size is 2 (Maximum of DETECT Index)
N.Cluster N.items N.est N.val size.cluster DETECT.est ASSI.est RATIO.est
1 2 25 172 173 15-10 0.3 0.04 0.119
MADCOV100.est MCOV100.est DETECT.val ASSI.val RATIO.val MADCOV100.val
1 2.522 -2.493 0.092 0.027 0.036 2.542
MCOV100.val
1 -2.521
$detect.unweighted
DETECT.val ASSI.val RATIO.val MADCOV100.val MCOV100.val
Cl2 0.09231698 0.02666667 0.03631547 2.542084 -2.520976
$detect.weighted
DETECT.val ASSI.val RATIO.val MADCOV100.val MCOV100.val
Cl2 0.09231698 0.02666667 0.03631547 2.542084 -2.520976
Thank you.
Presumably when using constraints the final values of itemdiff and itemdiff.correction should be equal to the constraint value. Using tam.jml with xsi.fixed and bias=TRUE and bias=FALSE the xsi values always match the xsi.fixed. We expect the same of sirt::rasch.jml
for itemdiff and itemdiff.correction to match constraints.
Hi Prof Robitzsch,
First this definitely isn't an issue with the package, but more of a general question... I'm working on a class project that involves MCMC methods for estimation in the Ising/2-factor MIRT model, and I came across your package as a result.
My question is (1) what MCMC method your 2pno()
function uses, (2) how might one use the output from 2pno()
to create a graph in qgraph
, and (3) do you know how to extend MCMC to a 2-factor MIRT model?
Thanks so much! Very new to the psychometrics field...
Given the programming code and variable/object assignment in the function cfa_meas_inv(), the end of line 67 should return "mod_mi=mod0, mod_pi=mod2" as part of the list object, rather than "mod_mi=mod0, mod_pi=mod1" as is currently the case.
Explanation:
During the "while (free)"-loop, the objects partable1 and mimod1 get updated until hitting the final (tested) partial MI model. However, the last lavaan-object is not stored in object mod1, but in mod2 (see line 50). Thus, returning mod_pi as mod1 at the end of the function returns the full MI model (mod1 = mod_mi; see line 26) rather than the last (tested/accepted) partial MI model (mod2 = mod_pi).
Consequently, when comparing the model parameters via pars_mi and pars_pi, any model differences are evident (and the function appears functional). However, when further exploring model fit and model parameters straight from the returned lavaan-object mod_pi - e.g., with summary(mod$mod_pi, standardized = TRUE, fit.measures = TRUE - , then the reported mod_pi is incorrectly identical to mod_mi.
mb
I am following along to example 8 from the manual to compute standard errors with invariance.alignment() . To my understanding, the meth parameter isn't defined before the function call, so I tried calling the function with the default value of 1 as well as 2-4 as mentioned in the function documentation. I am getting the following error:
#> Error in (function (x, lambda, nu, overparam, eps, meth_) : argument "meth_" is missing, with no default #variance matrix and standard errors
I am pasting the reprex with the error below for your reference. I'm not sure if this is a bug or if I'm missing something, and I would appreciate if you could let me know how to proceed. Thank you very much in advance!
#############################################################################
# EXAMPLE 8: Computation of standard errors
#############################################################################
G <- 3 # number of groups
I <- 5 # number of items
# define lambda and nu parameters
lambda <- matrix(1, nrow=G, ncol=I)
nu <- matrix(0, nrow=G, ncol=I)
# define size of noninvariance
dif <- 1
mu1 <- c(0,.3,.8)
sigma1 <- c(1,1.25,1.1)
lambda[1,3] <- 1+dif*.4; nu[1,5] <- dif*.5
gg <- 2
lambda[gg,5] <- 1-.5*dif; nu[gg,1] <- -.5*dif
gg <- 3
lambda[gg,4] <- 1-.7*dif; nu[gg,2] <- -.5*dif
dat <- sirt::invariance_alignment_simulate(nu=nu,
lambda=lambda,
err_var=1+0*lambda,
mu=mu1,
sigma=sigma1,
N=500,
output="data",
exact=TRUE)
# estimate CFA
res <- sirt::invariance_alignment_cfa_config(dat=dat[,-1], group=dat$group)
#> Compute CFA for group 1 | model 2PM
#> Loading required namespace: MASS
#> Compute CFA for group 2 | model 2PM
#> Compute CFA for group 3 | model 2PM
eps <- .001
align.pow <- 0.5*rep(1,2)
(lambda <- res$lambda)
#> I1 I2 I3 I4 I5
#> 1 1.00 1.00 1.40 1.00 1.000
#> 2 1.25 1.25 1.25 1.25 0.625
#> 3 1.10 1.10 1.10 0.33 1.100
(nu <- res$nu)
#> I1 I2 I3 I4 I5
#> 1 -4.218847e-17 4.263256e-17 2.04281e-17 1.554312e-17 0.50
#> 2 -2.000000e-01 3.000000e-01 3.00000e-01 3.000000e-01 0.15
#> 3 8.000000e-01 3.000000e-01 8.00000e-01 2.400000e-01 0.80
mod1 <- sirt::invariance.alignment(lambda=lambda,
nu=nu,
eps=eps,
optimizer="optim",
align.pow=align.pow,
meth=1,
vcov=res$vcov)
#> Error in (function (x, lambda, nu, overparam, eps, meth_) : argument "meth_" is missing, with no default
#variance matrix and standard errors
mod1$vcov
#> Error in eval(expr, envir, enclos): object 'mod1' not found
sqrt(diag(mod1$vcov))
#> Error in eval(expr, envir, enclos): object 'mod1' not found
Created on 2024-03-15 with reprex v2.1.0
Hi,
I just emailed you but then realize there is a github repository for the package. So, I thought it would be more efficient to put this here as well.
I am struggling to reproduce the results. I compare the DETECT output from my previous research (using the original software) by analyzing the same dataset using the sirt package, and the results were quite different. Also, I analyzed the TIMSS data using the original DETECT program, and again the results were different. I attached the DETECT output from the original software using the same TIMSS dataset available in your package.
I copied the R syntax I used below (same as in your example, the only difference is that I am using the sum scores for conditioning).
data(data.timss)
dat <- data.timss$data
dat <- dat[, substring( colnames(dat),1,1)=="M" ]
iteminfo <- data.timss$item
a <- conf.detect(data=dat,
score=rowSums(dat),
itemcluster=iteminfo$Content.Domain)
summary(a)
-----------------------------------------------------------------
sirt 3.9-4 (2020-02-17 12:57:09)
R version 4.0.2 (2020-06-22) x86_64, mingw32 | nodename=COE-L-102R-WL01 | login=cengiz
Call:
conf.detect(data = dat, score = rowSums(dat), itemcluster = iteminfo$Content.Domain)
Confirmatory DETECT Analysis with 3 Item Clusters
Bandwidth Scale: 1.1
-----------------------------------------------------------------
Dimensionality Statistics
unweighted weighted
DETECT 0.311 0.311
ASSI 0.273 0.273
RATIO 0.350 0.350
MADCOV100 0.888 0.888
MCOV100 -0.551 -0.551
As you can see the statistics are different. This is just an example for a confirmatory DETECT for the dataset available in your package. I did also try different datasets I have. The results are always (substantially) different for both confirmatory and exploratory DETECT procedure.
I understand that there may be nuances in algorithms used; however, these differences are not something we can ignore. Having these in R is great and provides a number of opportunities for research and make it easy to teach. On the other hand, before I use these functions for my research and recommend it to my students, I wonder if you have any insights about why the results are different, or what adjustments (if possible) to make so the results from R function is similar to the ones we get from the original DETECT program.
Thank you so much for your time and effort in developing this package.
Hi!
Thank you for your efforts!
Is there some reason to not include the gamma.testlet vector to the results of the mcmc.3pno.testlet
function?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.