Scale linking vs Score equating
Scale linking
Notation
\(\xi\) : the set of item parameters (e.g. difficulty, …)
Suppose we have a response dataset \(\mathbf{X}\) containing responses to Scale A items and Scale B items
Through item parameter calibration on \(\mathbf{X}\), we can obtain item parameter estimates \(\xi_a\) and \(\xi_b\) (for Scale A and Scale B items, respectively)
Suppose we have another, separately collected dataset \(\mathbf{X'}\) containing responses to the same items
Let the item parameters from this dataset be denoted by \(\xi'_a\) and \(\xi'_b\)
Without conversion, \(\xi'_a\) is on a different metric than \(\xi_a\)
Scale linking is achieved when \(\xi'_a\) is placed on the same metric as \(\xi_a\)
Scale linking methods include:
- Linear transformation
- Fixed-parameter calibration
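The linear-transformation approach can be sketched with the mean/sigma method; every number below (the anchor-item difficulties) is made up for illustration:

```r
# mean/sigma linking: find A and B such that A * b_new + B
# places the new calibration on the old metric (illustrative values only)
b_old <- c(-1.2, -0.4, 0.3, 1.1)   # anchor item difficulties, old metric
b_new <- c(-0.9, -0.1, 0.6, 1.4)   # same items, new calibration

A <- sd(b_old) / sd(b_new)         # slope of the transformation
B <- mean(b_old) - A * mean(b_new) # intercept of the transformation
b_linked <- A * b_new + B          # new-calibration difficulties, rescaled
```

By construction, the linked difficulties have the same mean and standard deviation as the old-metric difficulties.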
Score equating
Suppose that we have observed scores \(x_a\) from Scale A and \(x_b\) from Scale B
Scores \(x_a\) and \(x_b\) are on different metrics
Given a score \(x_a = 5\), score equating is the process of determining the equivalent score on Scale B
Equipercentile equating is a method of score equating
The process does not involve item parameters
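A minimal base-R sketch of the equipercentile idea, using synthetic raw scores (all values here are simulated, not real data): a Scale A score is mapped to the Scale B score at the same percentile rank.

```r
# equipercentile mapping on synthetic raw scores:
# a Scale A score maps to the Scale B score with the same percentile rank
set.seed(1)
score_a <- rbinom(1000, size = 10, prob = 0.5)  # synthetic Scale A raw scores
score_b <- rbinom(1000, size = 44, prob = 0.5)  # synthetic Scale B raw scores

# percentile rank of x_a = 5 on Scale A
p <- mean(score_a <= 5)
# Scale B score at the same percentile rank
x_b <- quantile(score_b, probs = p, names = FALSE)
```

Note that only the observed score distributions are used; no item parameters appear anywhere.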
Equipercentile equating may be modified to produce standardized scores
To accomplish this, the equated raw scores are further mapped to \(\theta\) values and then to T-scores (\(T = 10\theta + 50\))
The end product of score equating is a crosswalk table
| Scale A (raw) | Scale B (raw) | Scale B (theta) | Scale B (T-score) |
|---|---|---|---|
| 0 | 5.0 | -0.781 | 42.189 |
| 1 | 14.1 | -0.416 | 45.835 |
| 2 | 23.2 | -0.108 | 48.918 |
| 3 | 32.3 | 0.159 | 51.589 |
| 4 | 41.4 | 0.394 | 53.944 |
| 5 | 50.5 | 0.605 | 56.052 |
| 6 | 59.6 | 0.796 | 57.958 |
| 7 | 68.7 | 0.970 | 59.698 |
| 8 | 77.8 | 1.130 | 61.299 |
| 9 | 86.9 | 1.278 | 62.781 |
| 10 | 96.0 | 1.416 | 64.161 |
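The theta and T-score columns of the table are related by the standard T-score transformation \(T = 10\theta + 50\), which can be checked directly against the table values:

```r
# T-scores in the crosswalk table are theta estimates rescaled to
# mean 50, SD 10: T = 10 * theta + 50
theta  <- c(-0.781, -0.416, -0.108, 0.159, 0.394, 0.605,
            0.796, 0.970, 1.130, 1.278, 1.416)
tscore <- 10 * theta + 50
# matches the table's T-score column up to rounding of theta
```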
Scale linking is about the metrics of item parameters
Score equating is about the metrics of observed scores
Calibrated projection [CP; Thissen et al. (2011)] is a procedure for mapping the score levels between two scales
Suppose that we have a response dataset \(\mathbf{X}\) containing responses to Scale A items (items 1-10) and Scale B items (items 11-20)
A 2-factor IRT model is fitted to the response dataset \(\mathbf{X}\)
library(mirt)

# F1 loads the Scale A items, F2 loads the Scale B items;
# COV = F1*F2 frees the correlation between the two factors
model <- mirt.model("
  F1 = 1-10
  F2 = 11-20
  COV = F1*F2
")
cp_calib <- mirt(X, model, itemtype = "graded")
The first discrimination parameter (loading on F1) is estimated for Scale A items only
The second discrimination parameter (loading on F2) is estimated for Scale B items only
Other item parameters are freely estimated as usual
The correlation between the two factors is freely estimated
The calibrated model can then be used to produce a crosswalk table
In Thissen et al. (2011), the authors presented a crosswalk table (their Table 4)
The steps below reproduce Table 4
Step 1. Read in item parameters
# demo/CP_demo_read.r
# read original item parameter tables and create item objects
library(mirt)

d2 <- read.csv(file.path(root, "data/table2.csv"))
d3 <- read.csv(file.path(root, "data/table3.csv"))

# match rows on the item identifier in column 1, then combine
d <- cbind(
  d2[order(d2[, 1]), ],
  d3[order(d3[, 1]), -1]
)

# keep the discrimination and intercept columns; rename intercepts to d1-d4
ipar <- d[, -c(1, 14)]
colnames(ipar)[9:12] <- paste0("d", 1:4)
ipar <- ipar[, c(1:2, 9:12)]

# build a mirt object holding the fixed item parameters
itempool <- generate.mirt_object(ipar, itemtype = "graded")
Step 2. Initialize theta grid for multidimensional integration
# module/module_grid.r
# creates quadrature points over two-dimensional space
nd <- 2
theta <- seq(-4.5, 4.5, .2)
theta_grid <- as.matrix(expand.grid(theta, theta))
n_grid <- dim(theta_grid)[1]
Step 3. Function for getting category probability
# module/module_computeResponseProbability.r
# function for computing category response probability
# at a given set of theta points
computeResponseProbability <- function(
  itempool, theta, item_idx, score_level
) {
  # category probabilities for all items at each theta point
  probs <- mirt::probtrace(itempool, Theta = theta)
  # probtrace columns are named "<item>.P.<category>", with 1-based categories
  itemname <- colnames(itempool@Data$data)[item_idx]
  use_these <- sprintf("%s.P.%s", itemname, score_level + 1)
  probs[, use_these]
}
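For intuition, the category probabilities that `mirt::probtrace` returns for a graded-response item can also be computed by hand in base R; the discrimination and intercept values below are made up for a single 5-category item:

```r
# graded response model category probabilities for one item, by hand:
# P*(k) = plogis(a * theta + d_k) is the probability of scoring k or above
a <- 1.5                        # discrimination (illustrative)
d <- c(2.0, 0.8, -0.5, -1.7)    # intercepts for categories 1..4 (illustrative)
theta <- 0.3

# boundary probabilities, padded with 1 (score >= 0) and 0 (score >= 5)
p_star <- c(1, plogis(a * theta + d), 0)
# category probabilities are adjacent differences of the boundaries
p_cat <- p_star[-length(p_star)] - p_star[-1]
```

The five category probabilities are non-negative and sum to one, as the telescoping differences guarantee.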
Step 4. Lord-Wingersky recursion (multidimensional extension)
# module/module_LWrecursion.r
# function for performing Lord-Wingersky recursion
# this obtains likelihoods of each score level over quadrature points
LWrecursion <- function(itempool, use_items, theta_grid) {
  L_init <- TRUE
  for (item_idx in use_items) {
    # possible score levels for the current item
    new_max_value_of_item <- itempool@Data$K[item_idx] - 1
    new_possible_values <- 0:new_max_value_of_item
    # category probabilities over the grid, one list entry per score level
    P <- list()
    for (v in new_possible_values) {
      P[[as.character(v)]] <-
        computeResponseProbability(itempool, theta_grid, item_idx, v)
    }
    if (L_init) {
      L <- P
      old_possible_values <- new_possible_values
      L_init <- FALSE
    } else {
      # all combinations of (running total score, current item score)
      map_values <- expand.grid(old_possible_values, new_possible_values)
      map_L <- do.call(rbind, L[as.character(map_values[, 1])])
      map_P <- do.call(rbind, P[as.character(map_values[, 2])])
      map_lls <- map_L * map_P
      # sum the likelihoods of combinations that yield the same total score
      tmp <- aggregate(map_lls, by = list(apply(map_values, 1, sum)), sum)
      tmp_lls <- tmp[, -1]
      tmp_value <- tmp[, 1]
      L <- list()
      for (i in seq_len(nrow(tmp))) {
        L[[as.character(tmp_value[i])]] <- tmp_lls[i, ]
      }
      old_possible_values <- tmp_value
    }
  }
  return(L)
}
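As a sanity check on the recursion logic, here is a standalone base-R version for three dichotomous items at a single theta point (the response probabilities are made up); each pass convolves the running score distribution with the next item's outcomes:

```r
# Lord-Wingersky recursion at a single theta point, toy version:
# p[i] is the probability of answering item i correctly (made-up values)
p <- c(0.7, 0.4, 0.9)

L <- c(1 - p[1], p[1])      # score distribution after item 1
for (i in 2:length(p)) {
  # convolve the current distribution with item i's {incorrect, correct}
  L <- c(L * (1 - p[i]), 0) + c(0, L * p[i])
}
# L[k + 1] is now the likelihood of raw score k, and L sums to 1
```

The extreme scores match direct enumeration: score 3 has probability \(0.7 \times 0.4 \times 0.9\) and score 0 has probability \(0.3 \times 0.6 \times 0.1\).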
# demo/CP_demo_LW.r
# likelihood values for 11 items in PedsQL instrument
# the test score ranges from 0-44
pedsql_items <- 18:28
L <- LWrecursion(itempool, pedsql_items, theta_grid)
Step 5. Compute EAP estimates from likelihoods
# module/module_LtoEAP.r
# converts likelihoods obtained from Lord-Wingersky recursion
# into two-dimensional EAP estimates and covariances
# dmvn() is a multivariate normal density (e.g., from the mvnfast package)
LtoEAP <- function(L, theta_grid, sigma) {
  nd <- dim(theta_grid)[2]
  n_grid <- dim(theta_grid)[1]
  tmp <- list()
  for (i in seq_along(L)) {
    # EAP estimate: mean of the grid points,
    # weighted by likelihood x prior density
    num <- matrix(0, 1, nd)
    den <- 0
    for (j in 1:n_grid) {
      term_T <- theta_grid[j, , drop = FALSE]
      term_L <- as.numeric(L[[i]][j])
      term_W <- dmvn(term_T, rep(0, nd), sigma)
      num <- num + (term_T * term_L * term_W)
      den <- den + (term_L * term_W)
    }
    th <- num / den
    # EAP covariance: weighted mean of squared deviations from the EAP
    num <- matrix(0, nd, nd)
    den <- 0
    for (j in 1:n_grid) {
      term_T <- theta_grid[j, , drop = FALSE]
      term_C <- term_T - th
      term_V <- t(term_C) %*% term_C
      term_L <- as.numeric(L[[i]][j])
      term_W <- dmvn(term_T, rep(0, nd), sigma)
      num <- num + (term_V * term_L * term_W)
      den <- den + (term_L * term_W)
    }
    COV <- num / den
    tmp[[names(L)[i]]]$EAP <- th
    tmp[[names(L)[i]]]$COV <- COV
  }
  return(tmp)
}
Given a \(k\)-dimensional vector \(\theta\),
the \(k\)-dimensional EAP estimate given a score level \(x\) is
\[\mathrm{E}(\theta|x) = \frac{\int{\theta \, \mathrm{L}(x|\theta) f(\theta,\Sigma) \, d\theta}} {\int{\mathrm{L}(x|\theta) f(\theta,\Sigma) \, d\theta}}\]
approximated by
\[\mathrm{E}(\theta|x) \approx \frac{\sum{\theta \, \mathrm{L}(x|\theta) f(\theta,\Sigma)}} {\sum{\mathrm{L}(x|\theta) f(\theta,\Sigma)}}\]
where the summation is taken over all \(\theta\) grid points
Given a \(k\)-dimensional vector \(\theta\),
the \(k \times k\) EAP covariance given a score level \(x\) is
\[\mathrm{C}(\theta|x) = \frac{\int{\mathrm{C}(\theta) \, \mathrm{L}(x|\theta) f(\theta,\Sigma) \, d\theta}} {\int{\mathrm{L}(x|\theta) f(\theta,\Sigma) \, d\theta}}\]
where \(\mathrm{C}(\theta) = (\theta - \mathrm{E}(\theta|x))(\theta - \mathrm{E}(\theta|x))^\top\), approximated by
\[\mathrm{C}(\theta|x) \approx \frac{\sum{\mathrm{C}(\theta) \, \mathrm{L}(x|\theta) f(\theta,\Sigma)}} {\sum{\mathrm{L}(x|\theta) f(\theta,\Sigma)}}\]
where the summation is taken over all \(\theta\) grid points
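The quality of this grid approximation can be checked in one dimension, where the normal likelihood with a standard normal prior has a closed-form posterior mean; the observed value and SD below are illustrative:

```r
# 1D check of the grid-based EAP formula against the closed-form
# normal-normal posterior mean (illustrative observation and SD)
theta <- seq(-4.5, 4.5, 0.01)            # quadrature grid
x     <- 1.2                             # "observed" value (made up)
s     <- 0.8                             # likelihood SD (made up)

lik   <- dnorm(x, mean = theta, sd = s)  # L(x | theta)
prior <- dnorm(theta, 0, 1)              # f(theta): standard normal prior

# grid-based EAP: weighted mean of theta under likelihood x prior
eap <- sum(theta * lik * prior) / sum(lik * prior)

# conjugate result: posterior mean is x / (1 + s^2)
```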
Step 6. Aggregate into a table
# module/module_EAPtoTABLE.r
# function for converting EAP estimates into a table
EAPtoTABLE <- function(EAP, dimension, tscore) {
  # pull the EAP estimate and its SE for the requested dimension
  eap <- lapply(EAP, function(x) x$EAP[dimension])
  se  <- lapply(EAP, function(x) sqrt(x$COV[dimension, dimension]))
  eap <- do.call(c, eap)
  se  <- do.call(c, se)
  if (tscore) {
    # rescale theta to the T-score metric (mean 50, SD 10)
    eap <- (eap * 10) + 50
    se  <- se * 10
  }
  x <- as.numeric(names(eap))
  o <- as.data.frame(cbind(x, eap, se))
  return(o)
}
It should be emphasized that the focus of the CP method is not on the item parameters
The correlation between the two factors represents the relationship between the two scales
Advantage of CP over equipercentile equating (EQP):
If the latent correlation is perfect, CP and EQP should perform similarly in producing T-scores
As the latent correlation decreases, CP should perform better than EQP
Question: by how much?
Equipercentile equating method vs. calibrated projection method
Derived from the PROMIS Depression - CES-D dataset in PROsetta
A 1D IRT model was fitted to the dataset
Obtained 1D parameters were converted to 2D parameters
to be used in response data generation
CES-D items were loaded onto dimension 1
PROMIS items were loaded onto dimension 2
1000 2D \(\theta\) values were sampled from MVN with specified correlation
Response dataset \(\mathbf{X}\) was generated from item parameters and 2D theta values
Equipercentile method was performed on \(\mathbf{X}\)
Calibrated projection method was performed on \(\mathbf{X}\)
From \(\mathbf{X}\), CES-D raw score was computed for each simulee
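The \(\theta\)-sampling step can be sketched in base R with a Cholesky factor; the correlation value 0.8 is only an example:

```r
# draw 1000 two-dimensional thetas from MVN(0, Sigma) with correlation rho
# (rho = 0.8 is only an example value)
set.seed(123)
rho   <- 0.8
sigma <- matrix(c(1, rho, rho, 1), 2, 2)

z     <- matrix(rnorm(1000 * 2), 1000, 2)  # independent standard normals
theta <- z %*% chol(sigma)                 # impose the target correlation
```

Since `chol(sigma)` returns the upper-triangular factor \(R\) with \(R^\top R = \Sigma\), the rows of `theta` have covariance \(\Sigma\), and the empirical correlation is close to `rho`.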
Calibrated projection explicitly accounts for latent correlation between measured constructs
CP provides a better crosswalk table
Since CP uses multidimensional modeling, it can equate more than two scales simultaneously
Technical issue: multidimensional integration is time-consuming