library(mirt)
library(mirtCAT)
library(PROsetta)
library(kableExtra)
library(mvnfast)

root <- rprojroot::find_rstudio_root_file()

1 Introduction

Calibrated projection [CP; Thissen et al. (2011)] is a multidimensional procedure for mapping the score levels between two scales. The ultimate objective of the procedure is to produce a crosswalk table between the score levels of two scales. Because this objective concerns only the metric of scores and not the metric of item parameters, CP can be considered a score equating method rather than a scale linking method.

To justify this, the distinction between scale linking and score equating is first reviewed.

1.1 Scale linking

Let \(\xi\) refer to a set of item parameters (e.g., discrimination, difficulty, and pseudo-guessing). Scale link is achieved when a set of item parameters \(\xi\) is on the same metric as another set of anchor item parameters \(\xi'\).

Suppose that we have a response dataset \(\mathbf{X}\) from items \(a_1 ... a_{10}\) from scale \(a\). Through item parameter calibration (i.e. item parameter estimation) on \(\mathbf{X}\), one can obtain item parameters \(\xi_a\) for items \(a_1 ... a_{10}\).

Then, suppose that we have another response dataset \(\mathbf{X'}\) from items \(a_1 ... a_{10}\) from scale \(a\), and items \(b_1 ... b_{10}\) from scale \(b\). Let the item parameters from this dataset be denoted by \(\xi'_a\) and \(\xi'_b\). Without conversion, the parameters \(\xi'_a\) are on a different metric compared to \(\xi_a\), because the dataset \(\mathbf{X'}\) comes from a different ability range compared to \(\mathbf{X}\).

In this example, scale link is achieved when some procedure places \(\xi'_a\) on the same metric as \(\xi_a\). This makes \(\xi'_a\) comparable with \(\xi_a\). It also makes \(\xi'_b\) from \(\mathbf{X}'\) interpretable on the metric of \(\xi_b\), as if it had been obtained from \(\mathbf{X}\).

Scale linking methods include:

Linear transformation. Here, a function \(f: \xi' \rightarrow \xi\) that converts \(\xi'_a\) to the metric of \(\xi_a\) is sought. Once \(f\) is determined, it can also be used to convert \(\xi'_b\) to the metric of \(\xi_b\). Linear transformation methods include the Haebara method (1980) and the Stocking-Lord method (1983).
Fixed-parameter calibration. Here, the item calibration phase on the response dataset \(\mathbf{X'}\) is modified, so that \(\xi' = \{\xi'_a, \xi'_b\}\) is estimated subject to the constraint \(\xi'_a = \xi_a\). The solution \(\xi' = \{\xi'_a, \xi'_b\}\) is thereby obtained with \(\xi'_a\) on the same metric as \(\xi_a\), achieving scale link. No further conversion should be applied to \(\xi'\) obtained this way, because it would alter the metric and break the link.
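
As a rough illustration of the linear transformation idea, the sketch below uses the simpler mean/sigma method, not the Haebara or Stocking-Lord methods named above. The anchor parameter values are made up, and the slope `A` and intercept `B` are names introduced here for the example only.

```r
# Hypothetical anchor-item parameters on the target metric (xi_a)
a_old <- c(1.2, 0.8, 1.5, 1.0)
b_old <- c(-1.0, 0.0, 0.5, 1.5)

# The same items on another metric (xi'_a), constructed so that
# theta_old = 1.25 * theta_new + 0.4 holds exactly
A_true <- 1.25
B_true <- 0.4
a_new  <- a_old * A_true
b_new  <- (b_old - B_true) / A_true

# Mean/sigma method: match the mean and SD of the anchor difficulties
A <- sd(b_old) / sd(b_new)
B <- mean(b_old) - A * mean(b_new)

# Apply f to the anchor items; the same A and B would be applied to xi'_b
a_linked <- a_new / A
b_linked <- A * b_new + B
```

Because the toy data are an exact linear rescaling, `A` and `B` recover the constants used to build them, and the linked parameters coincide with the target-metric values. With estimated parameters the match would only be approximate.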

1.2 Score equating

Score equating is achieved when a set of score levels on one scale is mapped to corresponding score levels on another scale.

Suppose that we have scale \(a\) with its raw scores \(x_a\) ranging in \([0, 10]\), and scale \(b\) with its raw scores \(x_b\) ranging in \([0, 100]\). The raw scores \(x_a\) and \(x_b\) are on different metrics. Then, for example, given a raw score level \(x_a = 5\), one may define a corresponding score level \(\hat{x}_b\) on scale \(b\) so that \(\hat{x}_b\) can be compared with \(x_b\) on the same raw score metric. Score equating is the process of determining the map \(f: x_a \rightarrow \hat{x}_b\) for all possible \(x_a\) levels. A practical example of this is deriving SAT score equivalents of ACT scores.

Equipercentile equating is a method of score equating. The equipercentile method first maps the raw scores of scale \(a\) onto the percentile \(p\) metric and then onto the raw score metric of scale \(b\), so that \(x_a \rightarrow p \rightarrow \hat{x}_b\). Equipercentile equating between raw scores does not involve item parameters; it only involves the raw scores \(x_a\) and \(x_b\).
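
The chain \(x_a \rightarrow p \rightarrow \hat{x}_b\) can be sketched in a few lines of base R. This is a simplified, unsmoothed illustration on simulated scores, not the procedure used later in this document: the data, the mid-percentile-rank convention, and the object names are assumptions made for the example.

```r
set.seed(1)

# Simulated raw scores (hypothetical data): scale a in 0..10, scale b in 0..100
x_a <- rbinom(2000, size = 10,  prob = 0.5)
x_b <- rbinom(2000, size = 100, prob = 0.5)

# Step 1: x_a -> p, using the mid-percentile-rank convention
levels_a <- 0:10
p <- sapply(levels_a, function(x) mean(x_a < x) + 0.5 * mean(x_a == x))

# Step 2: p -> x_b, reading off the matching quantile of the scale b scores
x_b_hat <- as.numeric(quantile(x_b, probs = p))

crosswalk <- data.frame(x_a = levels_a, p = p, x_b_hat = x_b_hat)
```

Each row of `crosswalk` pairs one scale \(a\) score level with its percentile rank and the scale \(b\) score at the same percentile, so the mapping is non-decreasing by construction.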

Equipercentile equating may be modified to produce standardized scores, such as \(\theta\) values and T-scores. The scores \(x_a\) are first mapped to the percentile \(p\) metric as before. Then, instead of mapping onto the raw \(b\) metric, the \(p\) values are mapped onto the \(\theta\) metric, so that \(x_a \rightarrow p \rightarrow \theta\). To accomplish this, the scale \(b\) scores are mapped onto the \(\theta\) metric using a presupplied set of item parameters for scale \(b\). The standard method is to use Lord-Wingersky recursion (1984) to obtain the likelihood of each possible score level, from which Expected A Posteriori (EAP) \(\theta\) estimates are computed. The item parameters used in this process may be obtained from free calibration. If scale link to external anchor item parameters is needed, one may instead use either the fixed-parameter calibration method or the linear transformation method to obtain metric-matched item parameters.

Score equating methods often aim to produce a crosswalk table (Table 1.1).

Table 1.1: An example crosswalk table. Scale A raw scores (range 0 to 10) are mapped onto corresponding raw scores in scale B (range 0 to 100), and onto corresponding \(\theta\) and T-scores.
Scale A (raw)	Scale B (raw)	Scale B (theta)	Scale B (T-score)
0	5.0	-0.781	42.189
1	14.1	-0.416	45.835
2	23.2	-0.108	48.918
3	32.3	0.159	51.589
4	41.4	0.394	53.944
5	50.5	0.605	56.052
6	59.6	0.796	57.958
7	68.7	0.970	59.698
8	77.8	1.130	61.299
9	86.9	1.278	62.781
10	96.0	1.416	64.161

1.3 Calibrated projection

The CP procedure is now explained. Suppose that we have scale \(a\) with items \(a_1 ... a_{10}\), and scale \(b\) with items \(b_1 ... b_{10}\). The response dataset \(\mathbf{X}\) contains responses on both scales. Essentially, calibrated projection models the constructs measured by the two scales as a multidimensional latent structure. Because there are two scales here, the structure is two-dimensional, with one dimension per scale.

The item parameter estimation is performed as follows. First, a two-factor IRT model is fitted to the response dataset \(\mathbf{X}\) with additional constraints. The first discrimination parameter is freely estimated for scale \(a\) items and fixed at \(0\) for all other items. Likewise, the second discrimination parameter is freely estimated for scale \(b\) items and fixed at \(0\) for all other items. Other item parameters are freely estimated as usual. The correlation between the two factors is also freely estimated. This estimated correlation becomes a critical component in the later steps that produce a crosswalk table.
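
Written out for a graded item \(j\) with ordered categories, the constrained model can be sketched as follows. This assumes the slope-intercept parameterization suggested by the `a1`, `a2`, `d1`, ... parameter labels that appear in the code later in this document; the indices \(j\) and \(k\) and the symbol \(\rho\) are introduced here for illustration.

\[\mathrm{P}(X_j \geq k \mid \theta_1, \theta_2) = \frac{1}{1 + \exp\{-(a_{j1}\theta_1 + a_{j2}\theta_2 + d_{jk})\}}, \qquad (\theta_1, \theta_2)' \sim \mathrm{N}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)\]

with \(a_{j2} = 0\) for scale \(a\) items, \(a_{j1} = 0\) for scale \(b\) items, and \(\rho\) freely estimated.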

In the one-dimensional case introduced above for equipercentile equating, each possible score level was mapped to a corresponding \(\theta\) value using Lord-Wingersky recursion. Here, using the calibrated model, each possible score level can be mapped to a corresponding two-dimensional \(\theta\) value, using a multidimensional extension of Lord-Wingersky recursion. The obtained two-dimensional values can be used to retrieve a single \(\theta\) value on each scale. Thissen et al. (2011) presented a table that maps the raw scores on the PedsQL scale to T-scores on the PAIS scale, using a two-dimensional structure between the PedsQL and PAIS constructs.

It should be emphasized that the focus of the CP method is not the item parameters themselves, but the estimation of the correlation between the two factors and its use in producing a score map. Here, the correlation between the two factors represents the relationship between the constructs measured by the two scales. Using this information, a multidimensional EAP \(\theta\) estimate yields \(\theta\) estimates on scale \(a\) and scale \(b\) simultaneously.

Table 4 of Thissen et al. (2011) is now reproduced to demonstrate that the description above is correct.

First, the item parameters from Table 2 and Table 3 in Thissen et al. (2011) were used. All 28 items were used, and dimensions 3 to 8 were discarded.

# demo/CP_demo_read.r
# read the original tables and create item objects

d2 <- read.csv(file.path(root, "data/table2.csv"))
d3 <- read.csv(file.path(root, "data/table3.csv"))
d  <- cbind(
  d2[order(d2[, 1]), ],
  d3[order(d3[, 1]), -1]
)

ipar <- d[, -c(1, 14)]
colnames(ipar)[9:12] <- paste0("d", 1:4)
ipar <- ipar[, c(1:2, 9:12)]

itempool <- generate.mirt_object(ipar, itemtype = "graded")

A two-dimensional theta grid was created by crossing \(\theta = -4.5(0.2)4.5\) (46 points from \(-4.5\) to \(4.5\) in steps of \(0.2\)) with itself, to be used in later steps.

# module/module_grid.r
# creates quadrature points over two-dimensional space

nd         <- 2
theta      <- seq(-4.5, 4.5, .2)
theta_grid <- as.matrix(expand.grid(theta, theta))
n_grid     <- dim(theta_grid)[1]

This created a total of 2116 quadrature points over two-dimensional space.

A function computeResponseProbability() was created to obtain the response probability for a specified item and a specified response category, given a two-dimensional \(\theta\) vector. This is necessary to implement the multidimensional extension of Lord-Wingersky algorithm.

# module/module_computeResponseProbability.r
# function for computing category response probability
# at a given theta point

computeResponseProbability <- function(
  itempool, theta, item_idx, score_level
) {

  # category probabilities for every item, one row per theta point
  probs     <- mirt::probtrace(itempool, Theta = theta)

  # columns are named "<item>.P.<category>" with categories counted from 1,
  # so score level 0 corresponds to ".P.1"
  itemname  <- colnames(itempool@Data$data)[item_idx]
  use_these <- sprintf("%s.P.%s", itemname, score_level + 1)
  probs     <- probs[, use_these]

  return(probs)

}

An example output of computeResponseProbability() is shown below. For item 10 with the following item parameters,

item_idx <- 10
coef(itempool)[[item_idx]]

##       a1 a2   d1    d2    d3    d4
## par 2.03  0 0.92 -0.16 -2.31 -3.58

the probability of obtaining a score of 3 on the polytomous item for a \(\theta = (0, 0)\) person is:

score <- 3
theta <- matrix(c(0, 0), 1, 2)
computeResponseProbability(itempool, theta, item_idx, score)

## Item.10.P.4 
##  0.06317843
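
This value can be checked by hand from the printed parameters. The sketch below assumes the graded model in slope-intercept form, where the cumulative probability of scoring at or above category \(k\) is a logistic function of \(a'\theta + d_k\); the object names are introduced here. Because the printed parameters are rounded, the result should agree with the output above only up to rounding.

```r
# Parameters of item 10 as printed above
a     <- c(2.03, 0)
d     <- c(0.92, -0.16, -2.31, -3.58)
theta <- c(0, 0)

# Cumulative probabilities P(X >= k) for k = 1..4,
# padded with P(X >= 0) = 1 and P(X >= 5) = 0
p_star <- c(1, plogis(sum(a * theta) + d), 0)

# Category probabilities for scores 0..4 are successive differences
p_cat <- -diff(p_star)
names(p_cat) <- 0:4

p_cat[["3"]]
```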

Multidimensional Lord-Wingersky recursion was implemented as follows. The function LWrecursion() returns likelihood values \(\mathrm{L}(x|\theta)\) at each grid point \(\theta\), for each possible score level \(x = 0, 1, ...\). Correlation between the two dimensions is not used at this stage.

# module/module_LWrecursion.r
# function for performing Lord-Wingersky recursion
# this obtains likelihoods of each score level over quadrature points

LWrecursion <- function(itempool, use_items, theta_grid) {

  L_init <- TRUE

  for (item_idx in use_items) {

    new_max_value_of_item <- itempool@Data$K[item_idx] - 1
    new_possible_values   <- 0:new_max_value_of_item

    P <- list()
    for (v in new_possible_values) {
      P[[as.character(v)]] <-
        computeResponseProbability(itempool, theta_grid, item_idx, v)
    }

    if (L_init) {

      L <- P
      old_possible_values <- new_possible_values
      L_init <- FALSE

    } else {

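      # pair every previous summed score with every category of the new item
      map_values <- expand.grid(old_possible_values, new_possible_values)

      map_L <- do.call(rbind, L[as.character(map_values[, 1])])
      map_P <- do.call(rbind, P[as.character(map_values[, 2])])

      # likelihood of each (previous score, new category) pair at each grid point
      map_lls <- map_L * map_P

      # pairs that lead to the same new summed score are added together
      tmp <- aggregate(map_lls, by = list(apply(map_values, 1, sum)), sum)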

      tmp_lls   <- tmp[, -1]
      tmp_value <- tmp[, 1]

      L <- list()
      for (i in 1:nrow(tmp)) {
        L[[as.character(tmp_value[i])]] <-
          tmp_lls[i, ]
      }

      old_possible_values <- tmp[, 1]

    }

  }

  return(L)

}
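
The recursion itself is easier to follow at a single \(\theta\) point. The following is a minimal base R sketch with made-up category probabilities; `lw_single()` is a hypothetical helper written for this illustration and is not part of the modules above.

```r
# Lord-Wingersky recursion at one theta point.
# probs: list with one vector of category probabilities per item
#        (categories scored 0, 1, ..., K - 1)
lw_single <- function(probs) {
  L <- probs[[1]]                      # summed-score likelihoods after item 1
  for (p in probs[-1]) {
    new_L <- numeric(length(L) + length(p) - 1)
    for (k in seq_along(p)) {
      idx <- seq_along(L) + (k - 1)    # category k - 1 shifts every score by k - 1
      new_L[idx] <- new_L[idx] + L * p[k]
    }
    L <- new_L
  }
  names(L) <- seq_along(L) - 1
  L
}

# Three hypothetical items with 3, 2, and 2 categories (summed score 0..4)
probs <- list(c(0.2, 0.5, 0.3), c(0.6, 0.4), c(0.1, 0.9))
L_toy <- lw_single(probs)
```

The likelihoods in `L_toy` sum to one across score levels, and the extreme scores reduce to products of the lowest and highest category probabilities. Running the same update at every grid point gives the likelihood surface that the grid-based function returns.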

Then, the recursion was performed to obtain the likelihood values at each \(\theta\) grid point, for 11 items of PedsQL scale (items 18 - 28). These items had non-zero \(a\)-parameters on dimension 2 (PedsQL dimension), and zeroes on dimension 1 (PAIS dimension). The score levels ranged from 0 to 44.

# demo/CP_demo_LW.r
# likelihood values for 11 items in PedsQL instrument
# the test score ranges from 0-44

pedsql_items <- 18:28
L <- LWrecursion(itempool, pedsql_items, theta_grid)

Then, EAP estimates and covariances were computed from the likelihood values. The equations were adapted from Bryant et al. (2005). Given a \(k\)-dimensional vector \(\theta\), the \(k\)-dimensional EAP estimate given a score level \(x\) is

\[\mathrm{E}(\theta|x) = \frac{\int_k{\theta \mathrm{L}(x|\theta) f(\theta,\Sigma) d\theta}}{\int_k{\mathrm{L}(x|\theta) f(\theta,\Sigma) d\theta}}\]

approximated by

\[\mathrm{E}(\theta|x) \approx \frac{\sum{\theta \mathrm{L}(x|\theta) f(\theta,\Sigma)}}{\sum{\mathrm{L}(x|\theta) f(\theta,\Sigma)}}\]

where \(\mathrm{L}(x|\theta)\) is the previously computed likelihood of score level \(x\) given \(\theta\), and \(f(\theta, \Sigma)\) is the multivariate normal density value given the correlation matrix \(\Sigma\). Here, the previously obtained factor correlation is used to define \(\Sigma\); the reported correlation is \(.96\). Each summation is taken over all \(\theta\) grid points.

The \(k\)-dimensional EAP covariance is

\[\mathrm{C}(\theta|x) = \frac{\int_k{\mathrm{C}(\theta) \mathrm{L}(x|\theta) f(\theta,\Sigma) d\theta}}{\int_k{\mathrm{L}(x|\theta) f(\theta,\Sigma) d\theta}}\]

approximated by

\[\mathrm{C}(\theta|x) \approx \frac{\sum{\mathrm{C}(\theta) \mathrm{L}(x|\theta) f(\theta,\Sigma)}}{\sum{\mathrm{L}(x|\theta) f(\theta,\Sigma)}}\]

where \(\mathrm{C}(\theta)\) is the outer product \((\theta - \theta_\mathrm{EAP})(\theta - \theta_\mathrm{EAP})'\) of the deviation from the EAP estimate \(\theta_\mathrm{EAP} = \mathrm{E}(\theta|x)\). Each summation is taken over all \(\theta\) grid points.
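
Both quadrature sums can be tried out on a toy likelihood without any packages. The sketch below is a vectorized illustration under stated assumptions: the likelihood is an arbitrary logistic curve in dimension 2 only, the density is written out by hand, and all object names are introduced here.

```r
# Quadrature grid, as in the text
theta <- seq(-4.5, 4.5, 0.2)
grid  <- as.matrix(expand.grid(theta, theta))

rho   <- 0.96
sigma <- matrix(c(1, rho, rho, 1), 2, 2)

# f(theta, Sigma): bivariate normal density with mean zero.
# The normalizing constant is omitted because it cancels in the ratios.
f <- exp(-0.5 * rowSums((grid %*% solve(sigma)) * grid))

# Toy likelihood L(x | theta): depends on dimension 2 only (an assumption)
lik <- plogis(1.5 * grid[, 2])

# Posterior weight of each grid point
w <- lik * f / sum(lik * f)

# EAP: weighted mean of the grid points
eap <- colSums(grid * w)

# Covariance: weighted mean of the outer products of deviations from the EAP
dev     <- sweep(grid, 2, eap)
eap_cov <- t(dev) %*% (dev * w)
```

Since this toy likelihood carries information about dimension 2 only, the estimate on dimension 1 comes entirely through the correlation in `sigma`; it ends up at roughly `rho` times the estimate on dimension 2.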

# module/module_LtoEAP.r
# converts likelihoods obtained from Lord-Wingersky recursion
# into two-dimensional EAP estimates

LtoEAP <- function(L, theta_grid, sigma) {

  nd     <- dim(theta_grid)[2]  # number of dimensions
  n_grid <- dim(theta_grid)[1]  # number of quadrature points
  tmp    <- list()

  for (i in seq_along(L)) {

    num <- matrix(0, 1, nd)
    den <- 0

    for (j in 1:n_grid) {
      term_T <- theta_grid[j, , drop = FALSE]
      term_L <- as.numeric(L[[i]][j])
      term_W <- dmvn(term_T, rep(0, nd), sigma)
      num <- num + (term_T * term_L * term_W)
      den <- den + (term_L * term_W)
    }

    th <- num / den

    num <- matrix(0, nd, nd)
    den <- 0

    for (j in 1:n_grid) {
      term_T <- theta_grid[j, , drop = FALSE]
      term_C <- (term_T - th)
      term_V <- t(term_C) %*% term_C
      term_L <- as.numeric(L[[i]][j])
      term_W <- dmvn(term_T, rep(0, nd), sigma)
      num <- num + (term_V * term_L * term_W)
      den <- den + (term_L * term_W)
    }

    COV <- num / den

    tmp[[names(L)[i]]]$EAP <- th
    tmp[[names(L)[i]]]$COV <- COV

  }

  return(tmp)

}

# demo/CP_demo_EAP.r
# converts likelihood values of PedsQL instrument
# into two-dimensional theta estimates

est_cor     <- .96
sigma       <- diag(nd)
sigma[2, 1] <- est_cor
sigma[1, 2] <- est_cor

EAP <- LtoEAP(L, theta_grid, sigma)

This contains two-dimensional EAP estimates for each possible score level in the range 0 - 44. An example is given below:

EAP["10"]

## $`10`
## $`10`$EAP
##            Var1       Var2
## [1,] -0.1365144 -0.1422025
## 
## $`10`$COV
##            Var1       Var2
## Var1 0.14423697 0.06858018
## Var2 0.06858018 0.07143768

This means that a score of 10 in the PedsQL scale (11 items, score range 0 - 44) corresponds to an EAP theta estimate of -0.1365144 in the first dimension (PAIS dimension), and -0.1422025 in the second dimension (PedsQL dimension).
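
The two coordinates of this estimate are tied together by the factor correlation. Because the PedsQL items have zero slopes on dimension 1, the likelihood depends on \(\theta_2\) only. Under a bivariate normal prior with unit variances and correlation \(\rho\), \(\theta_1\) given \(\theta_2\) is normal with mean \(\rho\theta_2\) and variance \(1 - \rho^2\). Assuming negligible quadrature error, the laws of total expectation and total variance then give

\[\mathrm{E}(\theta_1|x) = \rho \, \mathrm{E}(\theta_2|x), \qquad \mathrm{Var}(\theta_1|x) = \rho^2 \, \mathrm{Var}(\theta_2|x) + (1 - \rho^2), \qquad \mathrm{Cov}(\theta_1, \theta_2|x) = \rho \, \mathrm{Var}(\theta_2|x)\]

The output above is consistent with this for \(\rho = .96\): \(.96 \times (-0.1422025) \approx -0.1365144\), \(.96^2 \times 0.07143768 + (1 - .96^2) \approx 0.1442370\), and \(.96 \times 0.07143768 \approx 0.0685802\).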

Finally, the estimates for the first dimension (reflecting PAIS construct) are aggregated into a table. A graphical representation of the reproduced table is displayed in Figure 1.1.

# module/module_EAPtoTABLE.r
# function for converting EAP estimates into a table

EAPtoTABLE <- function(EAP, dimension, tscore) {

  eap  <- lapply(EAP, function(x) x$EAP[dimension])
  se   <- lapply(EAP, function(x) sqrt(x$COV[dimension, dimension]))
  eap  <- do.call(c, eap)
  se   <- do.call(c, se)
  if (tscore) {
    eap <- (eap*10) + 50
    se  <- se*10
  }
  x <- as.numeric(names(eap))
  o <- cbind(x, eap, se)
  o <- as.data.frame(o)

  return(o)

}

# demo/CP_demo_TABLE.r
# converts two-dimensional theta estimates into a table

o <- EAPtoTABLE(EAP, dimension = 1, tscore = TRUE)
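
As a small worked example of the T-score conversion inside EAPtoTABLE(), the values shown earlier for a raw score of 10 can be converted by hand. The numbers are copied from that output; the object names are introduced here.

```r
# Dimension 1 (PAIS) values copied from the EAP["10"] output shown earlier
theta_eap <- -0.1365144
theta_var <-  0.14423697

t_score <- 50 + 10 * theta_eap   # T-score metric: mean 50, SD 10
t_se    <- 10 * sqrt(theta_var)  # the standard error is rescaled by 10 only
```

The shift by 50 applies to the estimate but not to its standard error, which is why the function treats the two differently.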

# draw a figure comparing the original data and the reproduced data

d  <- read.csv(file.path(root, "data/table4.csv"))
x  <- d[, 1]
y  <- d[, 2]
yu <- d[, 2] + d[, 3]
yl <- d[, 2] - d[, 3]

plot(
  x, y, type = "n",
  ylim = c(20, 90),
  xlab = "PedsQL (raw)", ylab = "PAIS (T-score)"
)
lines(x, y , col = "red")
lines(x, yu, col = "red", lty = 2)
lines(x, yl, col = "red", lty = 2)

lines(o$x, o$eap , col = "blue")
lines(o$x, o$eap + o$se, col = "blue", lty = 2)
lines(o$x, o$eap - o$se, col = "blue", lty = 2)

legend("topleft",
  c("Thissen, et al. (2011)", "Reproduced"),
  col = c("red", "blue"),
  lty = 1)

Figure 1.1: Graphical representation of the PedsQL - PAIS crosswalk table. Dashed lines mark one standard error above and below each estimate.

1.4 Motivation

One strength of calibrated projection over equipercentile equating is that it does not require the two scales to measure a single common dimension; it takes the correlation between the constructs measured by the two scales explicitly into account. If the latent correlation is perfect, the two methods should therefore perform similarly in terms of how well the produced crosswalk tables recover true \(\theta\)s. As the latent correlation decreases, calibrated projection should perform better than equipercentile equating.

It would be useful to see how much advantage calibrated projection method provides compared to equipercentile equating method, as a function of latent correlation between two scales.

1.5 Study objective

Equipercentile equating and calibrated projection were compared on how well their crosswalk tables, applied to the raw score levels of one scale, recover true \(\theta\) values on another scale. A Monte Carlo simulation was conducted. The correlation between the constructs underlying each scale was manipulated.

2 Method

2.1 Design

The factor correlation was varied over 10 levels, from \(0.95\) down to \(0.50\) in steps of \(0.05\). Twenty simulation trials were run at each level, which was sufficient to obtain a stable pattern.
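
The design can be laid out as a small table of conditions. This is only a sketch of the bookkeeping; the object and column names are assumptions made for the example.

```r
# 10 correlation levels: 0.95, 0.90, ..., 0.50
cor_levels <- seq(0.95, 0.50, by = -0.05)
n_trials   <- 20

# One row per (trial, correlation level) combination
design <- expand.grid(trial = seq_len(n_trials), theta_corr = cor_levels)
```

Each row of `design` corresponds to one generated dataset, giving 200 datasets in total.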

2.2 Item parameters

The response dataset from the PROMIS Depression - CES-D scales in the PROsetta package was used as the basis for obtaining item parameters. The dataset includes 731 response rows (after listwise removal of 16 rows with missing data) and 48 items. The 48 items include 20 items on scale \(a\) (CES-D scale), and 28 items on scale \(b\) (PROMIS Depression scale).

First, a one-dimensional IRT model was fitted to the dataset to obtain item parameters. The graded response model was used for all items. Then, the obtained item parameters were converted to two-dimensional item parameters to be used in response data generation. CES-D items were loaded onto dimension 2, and PROMIS items were loaded onto dimension 1.

# simulation/make_item_parameters.r
# obtain item parameters to use in simulation

library(PROsetta)

root <- rprojroot::find_rstudio_root_file()

par_exists <-
  file.exists(file.path(root, "data/ipar_a.csv")) &
  file.exists(file.path(root, "data/ipar_d.csv"))

if (par_exists) {

  ipar_a <- read.csv(file.path(root, "data/ipar_a.csv"), row.names = 1)
  ipar_d <- read.csv(file.path(root, "data/ipar_d.csv"), row.names = 1)

} else {

  d <- getCompleteData(data_dep)

  set.seed(1)
  calib <- runCalibration(d, technical = list(NCYCLES = 1000))
  ipar  <- mirt::coef(calib, IRTpars = FALSE, simplify = TRUE)

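  ipar   <- ipar$items
  # copy the single slope into two columns, one per dimension
  ipar_a <- ipar[, c(1, 1)]
  ipar_d <- ipar[, 2:5]
  colnames(ipar_a) <- paste0("a", 1:2)

  cesd_items <- 29:48
  prom_items <-  1:28

  # simple structure: PROMIS items load on dimension 1 only,
  # CES-D items load on dimension 2 only
  ipar_a[prom_items, 2] <- 0
  ipar_a[cesd_items, 1] <- 0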

  write.csv(ipar_a, file.path(root, "data/ipar_a.csv"))
  write.csv(ipar_d, file.path(root, "data/ipar_d.csv"))

}

ipar_a <- as.matrix(ipar_a)
ipar_d <- as.matrix(ipar_d)

2.3 Simulee & response data

In each trial, 1000 two-dimensional \(\theta\) values were sampled from a bivariate normal distribution with zero means, unit variances, and the specified correlation. The response dataset \(\mathbf{X}\) was generated from the item parameters and the \(\theta\) values.

# sim_generate_data.r
# generate true thetas and response data to use in simulation

get_data <- function(theta_corr, ipar_a, ipar_d) {

  nd    <- 2
  sigma <- matrix(theta_corr, nd, nd)
  diag(sigma) <- 1

  true_theta <- rmvn(1000, rep(0, 2), sigma)
  true_theta <- as.matrix(true_theta)

  response <- mirt::simdata(ipar_a, ipar_d, itemtype = "graded", Theta = true_theta)

  X <- list(
    data  = response,
    theta = true_theta
  )
  return(X)

}
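
The sampling step can also be reproduced in base R, which makes the role of the correlation explicit. The sketch below uses a Cholesky factor in place of rmvn(); the correlation value and object names are chosen for illustration only.

```r
set.seed(1)

theta_corr <- 0.8
sigma <- matrix(theta_corr, 2, 2)
diag(sigma) <- 1

# If z has independent standard normal columns,
# z %*% chol(sigma) has covariance matrix sigma
z          <- matrix(rnorm(1000 * 2), ncol = 2)
true_theta <- z %*% chol(sigma)

# Sample correlation, close to theta_corr up to sampling error
cor(true_theta)[1, 2]
```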

2.4 Score equating

Score equating methods were performed as follows.

2.4.1 Equipercentile equating

Equipercentile method was performed on the response dataset \(\mathbf{X}\). Smoothing was not applied. Because obtaining \(\theta\) values from equipercentile method requires item parameters, item parameters were estimated by performing free calibration on \(\mathbf{X}\). A one-dimensional model was used for this purpose.

# simulation/perform_EQP.r
# perform equipercentile equating

perform_EQP <- function(X) {

  dX          <- data_dep
  dX@response <- as.data.frame(X$data)
  person_id   <- seq(100001, 101000, 1)
  dX@response <- cbind(person_id, dX@response)
  colnames(dX@response) <- colnames(data_dep@response)

  # functions are from PROsetta package
  set.seed(1)
  eq_calib <- runCalibration(dX, technical = list(NCYCLES = 1000))
  eq_rsss  <- runRSSS(dX, eq_calib, min_score = 0)
  eq_conc  <- runEquateObserved(dX, smooth = "none", type_to = "theta", rsss = eq_rsss)
  eq_conc  <- eq_conc$concordance

  out <- list(
    eq_conc  = eq_conc,
    eq_calib = eq_calib
  )

  return(out)

}

2.4.2 Calibrated projection

Calibrated projection method was performed on the response dataset \(\mathbf{X}\). A two-factor model was used, where CES-D items (items 29 - 48) were loaded onto dimension 2, and PROMIS items (items 1 - 28) were loaded onto dimension 1. Correlation between the two factors was freely estimated, subject to upper bound of .999. The upper bound was imposed to avoid the latent structure being singular.

# simulation/perform_CP.r
# perform calibrated projection

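perform_CP <- function(X, theta_grid) {

  # simple-structure two-factor model with a freely estimated factor correlation
  model <- mirt.model("
    F1  =  1-28
    F2  = 29-48
    COV = F1*F2
    UBOUND = (GROUP, COV_21, .999)
  ")
  cp_calib       <- mirt(X$data, model, itemtype = "graded")

  # summed-score likelihoods for the CES-D items (items 29 - 48)
  cesd_items     <- 29:48
  cp_likelihoods <- LWrecursion(cp_calib, cesd_items, theta_grid)

  est_cor <- as.data.frame(coef(cp_calib)$GroupPars)
  est_cor <- est_cor[["COV_21"]]

  nd             <- ncol(theta_grid)
  sigma          <- diag(nd)
  sigma[2, 1]    <- est_cor
  sigma[1, 2]    <- est_cor

  # project onto dimension 1 (PROMIS) and keep the theta metric
  cp_eap  <- LtoEAP(cp_likelihoods, theta_grid, sigma)
  cp_out  <- EAPtoTABLE(cp_eap, dimension = 1, tscore = FALSE)

  return(cp_out)

}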

2.4.3 One-dimensional pattern scoring

One-dimensional pattern scoring was performed on the PROMIS portion of response dataset \(\mathbf{X}\). This was done to serve as a best-case reference. From the item parameters for all 48 items obtained for equipercentile equating, item parameters for PROMIS items were used to obtain EAP estimates of \(\theta\) values.

2.5 Performance criteria

For each trial, CES-D raw scores were computed from the generated \(\mathbf{X}\). Using the two crosswalk tables from equipercentile equating and calibrated projection, the raw scores were converted to their respective \(\theta\) estimates for PROMIS construct. The difference between the converted \(\theta\) and the true PROMIS \(\theta\) was used to compute RMSE values.
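
The criterion amounts to a table lookup followed by an RMSE computation. A minimal sketch with made-up numbers follows; the crosswalk values, scores, and object names are hypothetical.

```r
# Hypothetical crosswalk: raw score level -> theta estimate
crosswalk <- data.frame(
  raw   = 0:4,
  theta = c(-1.5, -0.7, 0.0, 0.7, 1.5)
)

# Hypothetical simulees: observed raw scores and true thetas
raw_score  <- c(0, 2, 2, 4, 1)
true_theta <- c(-1.2, 0.1, -0.3, 1.9, -0.7)

# Convert each raw score through the crosswalk, then compare with the truth
theta_hat <- crosswalk$theta[match(raw_score, crosswalk$raw)]
rmse      <- sqrt(mean((theta_hat - true_theta)^2))
```

Using the same simulees and true \(\theta\) values for both crosswalk tables keeps the two RMSE values directly comparable within a trial.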

3 Result & Discussion

Figure 3.1: Performance of equipercentile equating and calibrated projection across correlation levels.

In Figure 3.1, the performance of calibrated projection was similar to that of equipercentile equating at higher correlation levels. Calibrated projection performed better than equipercentile equating as the correlation between the factors decreased.

The results suggest that calibrated projection provides a better crosswalk table than equipercentile equating when the latent constructs measured by the two scales are not identical and are less strongly correlated.

A potential topic for future research is extending the current result to three-dimensional structures. Because calibrated projection uses multidimensional modeling, it can equate scores across more than two scales simultaneously, without chaining multiple steps of equipercentile equating or of other score equating methods that only work with two scales. A technical issue that needs to be resolved for such a study is that the cost of multidimensional integration over a grid grows exponentially with the number of dimensions. Monte Carlo integration is one way to reduce this computational burden.
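
To give a rough idea of what Monte Carlo integration could look like here, the sketch below replaces the quadrature grid with random draws from the prior and uses the likelihood as an importance weight. It is a two-dimensional toy with an arbitrary likelihood, not something evaluated in this study; all names and values are assumptions.

```r
set.seed(1)

rho   <- 0.6
sigma <- matrix(c(1, rho, rho, 1), 2, 2)

# Draw theta from the prior instead of laying out a grid
n_draws <- 20000
draws   <- matrix(rnorm(n_draws * 2), ncol = 2) %*% chol(sigma)

# Toy likelihood of one score level, depending on dimension 2 only
lik <- plogis(1.5 * draws[, 2])

# Self-normalized weights; the prior density is already reflected in the draws
w      <- lik / sum(lik)
eap_mc <- colSums(draws * w)
```

The number of draws does not have to grow with a grid size raised to the number of dimensions, which is what makes this approach attractive for three or more scales. The price is Monte Carlo error in the estimates.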

References

Bryant, D. U., Smith, A. K., Alexander, S. G., Vaughn, K., & Canali, K. G. (2005). Expected A Posteriori Estimation of Multiple Latent Traits: (518612013-445). American Psychological Association. https://doi.org/10.1037/e518612013-445

Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22(3), 144–149.

Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT True-Score and Equipercentile Observed-Score "Equatings". Applied Psychological Measurement, 8(4), 453–461. https://doi.org/10.1177/014662168400800409

Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210.

Thissen, D., Varni, J. W., Stucky, B. D., Liu, Y., Irwin, D. E., & DeWalt, D. A. (2011). Using the PedsQL™ 3.0 asthma module to obtain scores comparable with those of the PROMIS pediatric asthma impact scale (PAIS). Quality of Life Research, 20(9), 1497–1505. https://doi.org/10.1007/s11136-011-9874-y

Comparison of calibrated projection score equating and fixed-parameter calibration based equipercentile score equating

Sangdon Lim