
This document describes weighted Marginal Maximum Likelihood (MML) estimation for student test data in the Dire package. The model treats ability as a latent variable and estimates a linear model in the latent space. In this framework, failing to account for the measurement variance will bias the regression estimates. The Dire package can directly estimate these models in the latent space. Additionally, it can generate plausible values (PVs), which allow for unbiased estimation using the same formulas as multiple imputation (Mislevy et al.).

For simplicity, focusing first on a univariate model where there is only one latent ability measured, the student test data are assumed to have been generated by an Item Response Theory (IRT) model, where student $i$ has ability $\theta_i$. The probability of a student getting an item correct—for a dichotomous item—increases in ability and decreases in item difficulty. Put together, the likelihood of student $i$'s responses to items ($\bm{R}_i$) is a function of the student's latent ability in the construct ($\theta_i$) and the item parameters of each item ($\bm{P}$): $$\begin{align} \mathcal{L}(\bm{R}_{i}| \theta_{i} , \bm{P} )= \prod_{h=1}^{H} {\rm Pr}(R_{ih}| \theta_{i}, \bm{P}_h) \end{align}$$ where the probability function in the product is shown in the appendix and depends on the IRT model used for the item. It is possible that a student was not shown an item, in which case $R_{ih}$ is missing. A missing response does not change the probability; this can be thought of as ${\rm Pr}(R_{ih}| \theta_{i}, \bm{P}_h)$ being 1 for all values of $\theta_i$ and $\bm{P}_h$.
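For concreteness, here is a minimal R sketch of this response likelihood for dichotomous items, assuming a 3PL item response function (see the appendix); the function names and item parameter values are hypothetical and for illustration only.

```r
# Probability of a correct response under a 3PL model (see appendix).
pr_3pl <- function(theta, a, d, g, D = 1.7) {
  g + (1 - g) / (1 + exp(-D * a * (theta - d)))
}

# Likelihood of one student's responses given ability theta.
# resp: vector of 0/1 scores, NA for items not shown (they contribute 1).
resp_likelihood <- function(theta, resp, params) {
  p  <- pr_3pl(theta, params$a, params$d, params$g)
  li <- ifelse(is.na(resp), 1, ifelse(resp == 1, p, 1 - p))
  prod(li)
}

# Hypothetical item parameters and responses for three items
params <- data.frame(a = c(1.1, 0.8, 1.4), d = c(-0.5, 0.2, 1.0), g = c(0.2, 0.25, 0.2))
resp_likelihood(theta = 0.3, resp = c(1, 0, NA), params = params)
```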

This latent trait is then modeled as a function of a row vector of explanatory variables $\bm{X}_i$ so that $$\begin{align} \theta_i = \bm{X}_i \bm{\beta} + \epsilon_i \end{align}$$ where $$\begin{align} \epsilon_i \sim N(0,\sigma^2) \end{align}$$

This creates two sets of likelihoods for person $i$, one associated with the covariates ($\bm{X}_i$) and another associated with the observed item responses ($\bm{R}_i$). The likelihood for the student is the product of these two: $$\begin{align} \mathcal{L}(\bm{R}_{i}| \bm{\beta}, \sigma, \bm{X}_i , \bm{P} ) = \int f(\theta_i) \cdot \prod_{h=1}^{H} {\rm Pr}(R_{ih}| \theta_i, \bm{P}_h) d\theta_i \end{align}$$ where $f(\theta_i)$ is the density of student $i$'s latent ability $\theta_i$. Substituting the formula for $\theta_i$ from above, $$\begin{align} \mathcal{L}(\bm{R}_{i}| \bm{\beta}, \sigma, \bm{X}_i , \bm{P} ) = \int \frac{1}{\sigma \sqrt{2\pi}}\exp\left(\frac{-\epsilon_i^2}{2\sigma^2}\right) \prod_{h=1}^{H} {\rm Pr}(R_{ih}| \bm{X}_i \bm{\beta} + \epsilon_i, \bm{P}_h) d\epsilon_i \end{align}$$ The term in the integral is the convolution of two functions and has no closed form solution. Dire follows the AM statistical software (Cohen & Jiang, 1999) and uses fixed quadrature points set by the user.
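As an illustration only, the marginal likelihood for a single student can be approximated directly with stats::integrate, reusing the hypothetical resp_likelihood sketch above; Dire itself uses fixed quadrature points, as described in the next sections.

```r
# Marginal likelihood of one student's responses: the latent residual is
# integrated out against its normal density (illustrative sketch).
marginal_likelihood <- function(xb, sigma, resp, params) {
  integrand <- function(eps) {
    sapply(eps, function(e) {
      dnorm(e, mean = 0, sd = sigma) * resp_likelihood(xb + e, resp, params)
    })
  }
  integrate(integrand, lower = -6 * sigma, upper = 6 * sigma)$value
}

marginal_likelihood(xb = 0.2, sigma = 1, resp = c(1, 0, NA), params = params)
```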

In addition, Dire allows for the construct being scored to not be a simple univariate construct but to incorporate several correlated subscales or constructs. If these are labeled from one to $J$, then each student has a vector $\bm{\theta}_i$ with components $\theta_{ij}$. This leads to $J$ latent regression equations and requires a $J$-dimensional integral: $$\begin{align} \mathcal{L} (\bm{R}_{i}| \bm{\beta}_1, ..., \bm{\beta}_J, \bm{\Sigma}, \bm{X}_i , \bm{P} ) = \int ... \int \frac{1}{(2\pi)^{\frac{J}{2}}\sqrt{{\rm det}(\bm{\Sigma})}} \exp\left(-\frac{1}{2}\bm{\epsilon}_i^T \bm{\Sigma}^{-1} \bm{\epsilon}_i\right) \prod_{j=1}^J \prod_{h_j=1}^{H_j} {\rm Pr}(R_{ijh}| \theta_{ij}, \bm{P}_{hj}) d\theta_{i1} ... d\theta_{iJ} \label{eq:introeq} \end{align}$$ where $R_{ijh}$, $\bm{P}_{hj}$, and $\bm{\beta}_j$ all have a subscript $j$ added because there is now a set of them for each subscale $j$; $\bm{\epsilon}_i$ is now a vector with elements $\epsilon_{ij}$; and $\bm{\Sigma}$ is the covariance matrix for the $\bm{\epsilon}$ vector, which allows for covariance between constructs at the student level: when $\Sigma_{jj^\prime}$ is positive, a student with a higher score on construct $j$, conditional on $\bm{X}_i$, will also be expected to have a higher score on construct $j^\prime$.

This problem suffers from the curse of dimensionality. However, as Cohen and Jiang (1999) point out, this is identical to seemingly unrelated regression (SUR) and can be solved by fitting $\bm{\beta}_j$ and $\sigma_j$ once for each subscale and then identifying each of the ${J \choose 2}$ covariance terms in a pairwise fashion.

This comes from the fact that a multivariate normal can be decomposed into two multivariate normals, one conditional on the other. First, rewriting the equation above with the normal density written as $f(\bm{\epsilon}_i)$, $$\begin{align} \mathcal{L} (\bm{R}_{i}| \bm{\beta}_1, ..., \bm{\beta}_J, \bm{\Sigma}, \bm{X}_i , \bm{P} ) = \int ... \int f(\bm{\epsilon}_i) \prod_{j=1}^J \prod_{h_j=1}^{H_j} {\rm Pr}(R_{ijh}| \theta_{ij}, \bm{P}_{hj}) d\theta_{i1} ... d\theta_{iJ} \end{align}$$ the multivariate normal can be decomposed into the product of two distributions, an unconditional (univariate) normal and a conditional (potentially multivariate when $J>2$) normal. Using the first integration dimension, without loss of generality, this becomes $$\begin{align} \mathcal{L} (\bm{R}_{i}| \bm{\beta}_1, ..., \bm{\beta}_J, \bm{\Sigma}, \bm{X}_i , \bm{P} ) = \int ... \int f_1(\epsilon_{i1}) f_{-1}(\bm{\epsilon}_{i,-1}|\epsilon_{i1}) \prod_{j=1}^J \prod_{h_j=1}^{H_j} {\rm Pr}(R_{ijh}| \theta_{ij}, \bm{P}_{hj}) d\theta_{i1} ... d\theta_{iJ} \end{align}$$ where $f_1(\epsilon_{i1})$ is the normal density function for a single variable and $f_{-1}(\bm{\epsilon}_{i,-1}|\epsilon_{i1})$ is the density function of the other variables, conditional on $\epsilon_{i1}$. This can then be rearranged to $$\begin{align} \mathcal{L} (\bm{R}_{i}| \bm{\beta}_1, ..., \bm{\beta}_J, \bm{\Sigma}, \bm{X}_i , \bm{P} ) = & \int f_1(\epsilon_{i1}) \left[ \prod_{h_1=1}^{H_1} {\rm Pr}(R_{i1h}| \theta_{i1}, \bm{P}_{h1}) \right] d\theta_{i1} \nonumber \\ & \int ... \int f_{-1}(\bm{\epsilon}_{i,-1}|\epsilon_{i1}) \prod_{j=2}^J \prod_{h_j=1}^{H_j} {\rm Pr}(R_{ijh}| \theta_{ij}, \bm{P}_{hj}) d\theta_{i2} ... d\theta_{iJ} \end{align}$$ Changing to the numerical quadrature representation, $$\begin{align} \mathcal{L} (\bm{R}_{i}| \bm{\beta}_1, ..., \bm{\beta}_J, \bm{\Sigma}, \bm{X}_i , \bm{P} ) = & \sum_{q_1=1}^Q \delta f_1(q_1 - \bm{X}_i \bm{\beta}_1) \left[ \prod_{h_1=1}^{H_1} {\rm Pr}(R_{i1h}| q_1, \bm{P}_{h1}) \right] \nonumber \\ & \sum_{q_2=1}^Q ... \sum_{q_J=1}^Q \delta^{J-1} f_{-1}(\bm{\epsilon}_{i,-1}(q_{-1})|q_{1}) \prod_{j=2}^J \prod_{h_j=1}^{H_j} {\rm Pr}(R_{ijh}| q_{j}, \bm{P}_{hj}) \end{align}$$

This is additively separable—the maximum with respect to $\bm{\beta}_1$ can be found using only data for the first construct. The other constructs can be relabeled, and each can be maximized in this way. Similarly, the variances can be independently found in this way.

The Dire package helps analysts estimate coefficients in two potentially useful ways. First, the parameters in the regression can be directly estimated, where the $\bm{\beta}$ values are used from the above model fit. Second, Dire can generate plausible values from the fitted likelihood surface.

The direct estimation method has the advantage of using the latent space to identify coefficients. However, it can be time consuming to fit a latent parameter model.

The plausible value approach has two advantages. First, it can be used to fit a saturated model (with all possible coefficients of interest) and then other models that use subsets or linear combinations of those coefficients can be fit to the plausible values. Second, doing this saves time relative to fitting a direct estimation model per regression specification.

This document first describes how parameters are estimated in the latent space. The third section describes the variance estimator. The subsequent section describes estimation of degrees of freedom. Finally, it describes plausible value generation. Each section starts by describing the univariate case and then describes the case with $J$ potentially correlated constructs or subscales. Additionally, while the introduction has not mentioned weights, they are incorporated in the subsequent sections.

Parameter Estimation

Starting with the single construct MML model for test data for $N$ individuals, conditional on a set of parameters for a set of $H$ test items, the likelihood of a regression equation is $$\begin{align} \mathcal{L} (\bm{\beta}, \sigma|\bm{w}, \bm{R}, \bm{X}, \bm{P}) = \prod_{i=1}^N \left[ \int \frac{1}{\sigma \sqrt{2\pi}} \exp \frac{-(\theta_i - \bm{X}_i \bm{\beta})^2}{2\sigma^2} \, \prod_{h=1}^H {\rm Pr}(\bm{R}_{ih}|\theta_i,\bm{P}_h) d\theta_i \right]^{\bm{w}_i} \end{align}$$ where $\mathcal{L}$ is the likelihood of the regression parameters $\bm{\beta}$ with full sample weights $\bm{w}_i$, conditional on item score matrix $\bm{R}$, student covariate matrix $\bm{X}$, and item parameter data $\bm{P}$; $\sigma^2$ is the variance of the regression residual; $\theta_i$ is the $i$th student's latent ability measure that is being integrated out; and ${\rm Pr}(\bm{R}_{ih}|\theta_i, \bm{P}_h)$ is the probability of individual $i$'s score on test item $h$, conditional on the student's ability and item parameters $\bm{P}_h$—see the appendix for example forms of ${\rm Pr}(\bm{R}_{ih}|\theta_i, \bm{P}_h)$.

The integral is evaluated using the trapezoid rule at quadrature points $t_q$ and quadrature weight $\delta$ so that $$\begin{align} \mathcal{L} (\bm{\beta}, \sigma|\bm{w}, \bm{R}, \bm{X}, \bm{P}) &= \prod_{i=1}^N \left[ \sum_{q=1}^Q \delta \frac{1}{\sigma \sqrt{2\pi}} \exp \frac{-(t_q - \bm{X}_i \bm{\beta})^2}{2\sigma^2} \prod_{h=1}^H {\rm Pr}(\bm{R}_{ih}|t_q, \bm{P}_h)\right]^{\bm{w}_i} \end{align}$$ where $\delta$ is the distance between any two adjacent, uniformly spaced quadrature points, so that $\delta = t_{q+1} - t_{q}$ for any $q$ that is at least one and less than $Q$. The range of the quadrature points and the value of $Q$ parameterize the quadrature and its accuracy, and they should be varied to ensure convergence. The advantage of the trapezoid rule is that the fixed quadrature points allow the values of the probability (the portion inside the product) to be calculated once per student.

The log-likelihood is given by $$\begin{align} \ell (\bm{\beta}, \sigma|\bm{w}, \bm{R}, \bm{X}, \bm{P}) &= \sum_{i=1}^N \bm{w}_i \, {\rm log} \left[\delta \sum_{q=1}^Q \frac{1}{\sigma \sqrt{2\pi}} \exp \frac{-(t_q - \bm{X}_i \bm{\beta})^2}{2\sigma^2} \prod_{h=1}^H {\rm Pr}(\bm{R}_{ih}|t_q, \bm{P}_h) \right] \end{align}$$ Note that $\delta$ can be removed for optimization; its presence simply adds ${\rm log}(\delta) \sum_i \bm{w}_i$ to the log-likelihood.
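A minimal sketch of this weighted log-likelihood under the trapezoid rule, assuming uniformly spaced quadrature points and reusing the hypothetical resp_likelihood helper from above (the function and argument names are illustrative, not Dire's API):

```r
# Weighted MML log-likelihood by fixed quadrature (illustrative sketch).
# X: n x p design matrix; beta: coefficients; w: full sample weights;
# resp_list: list of response vectors; params: item parameter data frame.
mml_loglik <- function(beta, sigma, X, w, resp_list, params,
                       Q = 34, range = c(-4, 4)) {
  tq    <- seq(range[1], range[2], length.out = Q)   # quadrature points
  delta <- tq[2] - tq[1]                              # spacing between points
  ll <- 0
  for (i in seq_len(nrow(X))) {
    xb   <- sum(X[i, ] * beta)
    dens <- dnorm(tq, mean = xb, sd = sigma)          # normal residual density
    prRi <- sapply(tq, resp_likelihood, resp = resp_list[[i]], params = params)
    ll   <- ll + w[i] * log(delta * sum(dens * prRi))
  }
  ll
}
```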

Composite Score Estimation

When the outcome of interest is a composite score, the parameters are estimated by separately estimating the coefficients for each subscale ($\bm{\beta}_j$ for subscale $j$) and then calculating the composite coefficients ($\bm{\beta}_c$) using subscale weights ($\omega_j$): $$\begin{align} \bm{\beta}_{c} &= \sum_{j=1}^J \omega_j \bm{\beta}_j \label{eq:composite} \end{align}$$
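For example, with estimated subscale coefficient vectors and subscale weights in hand, the composite coefficients are just this weighted sum (a sketch; the weights and estimates shown are hypothetical):

```r
# Composite coefficients as a weighted sum of subscale coefficients.
omega  <- c(0.4, 0.6)                       # subscale weights
beta_j <- list(c(a = 0.10, b = 0.50),       # subscale 1 estimates
               c(a = 0.30, b = 0.45))       # subscale 2 estimates
beta_c <- Reduce(`+`, Map(`*`, omega, beta_j))
beta_c
```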

The full covariance matrix for the residuals (the $\bm{\epsilon}$ vector) is then $$\begin{align} \bm{\Sigma} = \left[ \begin{array}{cccc} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1J} \\ \sigma_{12} & \sigma_{2}^2 & \cdots & \sigma_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1J} & \sigma_{2J} & \cdots & \sigma_{J}^2 \end{array} \right] \end{align}$$

The log-likelihood for the covariance term is then $$\begin{align} \ell \left( \sigma_{jj^\prime} \left| \bm{\beta}_j, \bm{\beta}_{j^\prime} , \sigma_j, \sigma_{j^\prime}; \bm{w}, \bm{R}, \bm{X}, \bm{P}\right. \right) &= \sum_{i=1}^N \bm{w}_i \, {\rm log} \left\{ \int \int \frac{1}{2\pi \sqrt{\left |{\bm{\Sigma}}_{(jj^\prime)} \right |}}\exp \left( -\frac{1}{2} \hat{\bm{\epsilon}}^T_{jj^\prime} {\bm{\Sigma}}_{(jj^\prime)}^{-1} \hat{\bm{\epsilon}}_{jj^\prime} \right) \right. \nonumber \\ &\mathrel{\phantom{=}} \left. \times \left[ \prod_{h=1}^{H_j} {\rm Pr}(\bm{R}_{ijh}|\theta_j, \bm{P}_{hj}) \right] \left[ \prod_{h^\prime=1}^{H_{j^\prime}} {\rm Pr}(\bm{R}_{ij^\prime h^\prime}|\theta_{j^\prime}, \bm{P}_{h^\prime j^\prime}) \right] d\theta_j d\theta_{j^\prime} \right\} \label{eq:compcovlnl} \end{align}$$ where $|\bm{\Sigma}_{(jj^\prime)}|$ is the determinant of $\bm{\Sigma}_{(jj^\prime)}$, $$\begin{align} {\bm{\Sigma}}_{(jj^\prime)} \equiv \left[ \begin{array}{cc} \sigma_j^2 & \sigma_{jj^\prime} \\ \sigma_{jj^\prime} & \sigma_{j^\prime}^2 \end{array} \right] \end{align}$$ and the residual term is defined as $$\begin{align} \hat{\bm{\epsilon}}_{jj^\prime} \equiv & \left( \begin{array}{c} \theta_j - \bm{X}_i \bm{\beta}_j \\ \theta_{j^\prime} - \bm{X}_i \bm{\beta}_{j^\prime} \end{array} \right) \end{align}$$ Notice that the parameters $\bm{\beta}_j$, $\bm{\beta}_{j^\prime}$, $\sigma_j^2$, and $\sigma^2_{j^\prime}$ are taken from the subscale estimation, so the only free parameter is the covariance term $\sigma_{jj^\prime}$.

Variance Estimation

Starting with the univariate case, the variance of the parameters $\bm{\beta}$ can be estimated in one of several ways.

The inverse Hessian matrix is a consistent estimator when the estimator of $\bm{\beta}$ is consistent (Green, 2003, p. 520): $$\begin{align} {\rm Var}(\bm{\beta}) = -\bm{H}(\bm{\beta})^{-1} = - \left[\frac{\partial^2 \ell(\bm{\beta}, \sigma|\bm{w}, \bm{R}, \bm{X})}{\partial \bm{\beta}^2} \right]^{-1} \label{eq:vbeta} \end{align}$$ This variance is returned when the variance method is set to consistent or left as the default.
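As a sketch, the consistent variance is the negative inverse of the Hessian of the log-likelihood at the estimates. Here it is computed numerically with the numDeriv package, applied to the hypothetical mml_loglik sketch above; this is illustrative only, not how Dire computes it internally, and beta_hat, sigma_hat, and the data objects are assumed to exist.

```r
# Consistent variance estimate from the inverse Hessian (sketch).
# numDeriv::hessian numerically differentiates the log-likelihood in beta.
H <- numDeriv::hessian(
  func = function(b) mml_loglik(b, sigma_hat, X, w, resp_list, params),
  x    = beta_hat
)
vcov_consistent <- solve(-H)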

A class of variance estimators, typically called “sandwich” or “robust” variance estimators, allows for variation in the residual and has the form $$\begin{align} {\rm Var}(\bm{\beta}) = \bm{H}(\bm{\beta})^{-1} \bm{V} \bm{H}(\bm{\beta})^{-1} \label{eq:sandwich} \end{align}$$ where $\bm{V}$ is an estimate of the variance of the summed score function (Binder, 1983).

For a convenience sample, we provide two robust estimators. First, the so-called robust (Huber or Huber-White) variance estimator uses $$\begin{align} \bm{V} &= \sum_{i=1}^N \left[ \frac{\partial \ell(\bm{\beta}, \sigma|\bm{w}_i, \bm{R}_i, \bm{X}_i)}{\partial \bm{\beta}} \right] \left[ \frac{\partial \ell(\bm{\beta}, \sigma|\bm{w}_i, \bm{R}_i, \bm{X}_i)}{\partial \bm{\beta}} \right]^{'} \label{eq:scorebased} \end{align}$$

Second, for the cluster robust case, the partial derivatives are summed within the cluster so that $$\begin{align} \bm{V} &= \sum_{c=1}^{n^\prime} \left[ \frac{\partial \ell(\bm{\beta}, \sigma|\bm{w}_c, \bm{R}_c, \bm{X}_c)}{\partial \bm{\beta}} \right] \left[ \frac{\partial \ell(\bm{\beta}, \sigma|\bm{w}_c, \bm{R}_c, \bm{X}_c)}{\partial \bm{\beta}} \right]^{'} \end{align}$$ where there are $n^\prime$ clusters, indexed by $c$, and the partial derivatives are summed within each cluster, of which there are $n_c$ members: $$\begin{align} \frac{\partial \ell(\bm{\beta}, \sigma|\bm{w}_c, \bm{R}_c, \bm{X}_c)}{\partial \bm{\beta}} &= \sum_{i=1}^{n_c} \frac{\partial \ell(\bm{\beta}, \sigma|\bm{w}_i, \bm{R}_i, \bm{X}_i)}{\partial \bm{\beta}} \end{align}$$
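A sketch of the sandwich computation from per-unit score vectors; scores is assumed to be an n-by-p matrix whose rows are the unit-level partial derivatives, and H the Hessian, both hypothetical inputs here.

```r
# Sandwich (robust) variance from unit-level score vectors (sketch).
# scores: n x p matrix of per-unit partial derivatives; H: p x p Hessian.
sandwich_var <- function(H, scores, cluster = NULL) {
  if (!is.null(cluster)) {
    # cluster-robust: sum the score vectors within each cluster first
    scores <- rowsum(scores, group = cluster)
  }
  V    <- crossprod(scores)        # sum over units of s_i s_i'
  Hinv <- solve(H)
  Hinv %*% V %*% Hinv
}
```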

Finally, Dire implements the survey sampling method called the Taylor series method, which uses the same formula as Eq. , but $\bm{V}$ is the estimate of the variance of the score vector (Binder, 1983). Our implementation assumes a two-stage design with $n_a$ primary sampling units (PSUs) in stratum $a$, summed across the $A$ strata according to $$\begin{align} \bm{V} &= \sum_{a=1}^A \bm{V}_a \end{align}$$ where $\bm{V}_a$ is a variance estimate for stratum $a$ and is defined by $$\begin{align} \bm{V}_a &= \frac{n_a}{n_a -1} \sum_{p=1}^{n_a} \left( \bm{s}_p - \bar{\bm{s}}_a \right)\left( \bm{s}_p - \bar{\bm{s}}_a \right)' \label{eq:Va} \end{align}$$ where $\bm{s}_p$ is the sum of the weighted (or pseudo-) score vector over all units in PSU $p$ in stratum $a$, and $\bar{\bm{s}}_a$ is the (unweighted) mean of the $\bm{s}_p$ terms in stratum $a$, so that $$\begin{align} \bm{s}_p &=\sum_{i \in {\rm PSU} \ p}\frac{\partial \ell(\bm{\beta}, \sigma|\bm{w}_i, \bm{R}_i, \bm{X}_i)}{\partial \bm{\beta}} & \bar{\bm{s}}_a&= \frac{1}{n_a} \sum_{p \in {\rm stratum} \ a} \bm{s}_p \end{align}$$
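A sketch of this stratum-by-stratum accumulation of V under the two-stage design, assuming a matrix of unit-level score vectors and stratum/PSU identifiers (the function and argument names are illustrative):

```r
# Taylor series variance of the score vector under a stratified,
# two-stage design (illustrative sketch).
taylor_V <- function(scores, stratum, psu) {
  p <- ncol(scores)
  V <- matrix(0, p, p)
  for (a in unique(stratum)) {
    rows  <- stratum == a
    s_p   <- rowsum(scores[rows, , drop = FALSE], group = psu[rows])  # PSU sums
    n_a   <- nrow(s_p)
    if (n_a < 2) next                # singleton strata need special handling
    s_bar <- colMeans(s_p)           # unweighted mean of PSU sums in stratum a
    dev   <- sweep(s_p, 2, s_bar)
    V <- V + (n_a / (n_a - 1)) * crossprod(dev)
  }
  V
}
```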

When a stratum has only one PSU, 𝐕a\bm{V}_a is undefined. The best approach is for the analyst to adjust the strata and PSU identifiers, in a manner consistent with the sampling approach, to avoid singleton strata. Two simpler, automated, but less defensible options are available in Dire. First, the strata with single PSUs can be dropped from the variance estimation, yielding an underestimate of the variance.

The second option is for singleton strata to use the overall mean of $\bm{s}_p$ in place of $\bar{\bm{s}}_a$. So $$\begin{align} \bar{\bm{s}} &= \frac{1}{n^\prime} \sum \bm{s}_p \end{align}$$ where the sum is across all PSUs and $n^\prime$ is the number of PSUs across all strata. Then, for each singleton stratum, Eq.  becomes $$\begin{align} \bm{V}_a &= 2 \left( \bm{s}_p - \bar{\bm{s}} \right)\left( \bm{s}_p - \bar{\bm{s}} \right)' \label{eq:Va1} \end{align}$$ where the value 2 is used in place of $\frac{n_a}{n_a-1}$, which is undefined when $n_a=1$. This option can underestimate the variance but is thought more likely to overestimate it.

While not advisable, it is possible to assume information equality, where the Hessian (Eq. ) and the score-vector-based covariance (Eq. ) are equal. When a user sets gradientHessian=TRUE, the Dire package uses this shorthand in calculating any of the above results. Using this option is necessary to get agreement with the AM software on variance terms.

Univariate degrees of freedom

For the Taylor series estimator, the degrees of freedom estimate uses the Welch-Satterthwaite (WS) degrees of freedom ($dof_{WS}$). The WS formula requires an estimate of the variance contribution per independent group; for a clustered sample, that is available at the stratum level.

Following Binder (1983) and Cohen (2002), the contribution $c_s$ to the degrees of freedom from stratum $s$ is defined, using the $\bm{z}_{uj}$ from the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Not Present, Using the Taylor Series Method,” as $$\begin{align} c_s = w_s \frac{n_s}{n_s-1} \sum_{u=1}^{n_s} \bm{z}_{uj} \bm{z}_{uj}^T \end{align}$$ where $u$ indexes the PSUs in the stratum, of which there are $n_s$, and $w_s$ is the stratum weight, or the sum, in that stratum, of all the units' full sample weights. Using the $c_s$ values, the degrees of freedom is $$\begin{align} {\rm dof}_{\rm WS} = \frac{\left(\sum c_s \right)^2}{\sum c_s^2} \end{align}$$ which uses the formula of Satterthwaite (1946), assuming one degree of freedom per stratum.
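A tiny sketch of the final step, given per-stratum contributions for one coefficient (the $c_s$ values shown are hypothetical):

```r
# Welch-Satterthwaite degrees of freedom from per-stratum contributions.
c_s    <- c(0.8, 1.3, 0.6, 1.1)      # hypothetical stratum contributions
dof_ws <- sum(c_s)^2 / sum(c_s^2)
dof_ws
```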

Composite Score Variances

In the case of composite scores, the variance is more complex, and only one method is supported: the Taylor series method.

Because the log-likelihood of composite scores is additively separable, the covariances (including the variances) can be calculated in two steps using Eq. . First, the covariance matrix of $\bm{\xi}$ is formed, and then the composite covariance terms are estimated as the variance of a linear combination of the elements of $\bm{\xi}$.

In the first step, any of the methods in the section “Variance Estimation” are applied to Eq. , treating $\bm{\xi}$ in the same fashion Eq.  treats $\bm{\beta}$. This step results in a block diagonal inverse Hessian matrix, with a block for each subscale, and a potentially dense matrix for $\bm{V}$. Each matrix is square and has $J \cdot (\zeta+1)$ rows and columns, where $\zeta$ is the number of elements in the regression formula (for each subscale), to which one is added for the $\sigma$ terms.

This step results in the following matrix: $$\begin{align} {\rm Var}(\bm{\xi})=\bm{H}(\bm{\xi})^{-1} \bm{V} \bm{H}(\bm{\xi})^{-1} \end{align}$$

For the second step, the composite coefficient then has an $i$th variance term of $$\begin{align} {\rm Var}(\bm{\xi}_{ci}) &= \bm{e}_i^T \bm{H}(\bm{\xi})^{-1} \bm{V} \bm{H}(\bm{\xi})^{-1} \bm{e}_i \end{align}$$ where $\bm{\xi}_{ci}$ is the composite value of the $i$th coefficient and $\bm{e}_{i}$ is the vector of weights arranged such that $$\begin{align} \bm{\xi}_{ci} = \bm{e}_{i}^T \bm{\xi} \end{align}$$ The covariance between two terms, $i$ and $j$, is a simple extension: $$\begin{align} {\rm Cov}(\bm{\xi}_{ci}, \bm{\xi}_{cj}) &= \bm{e}_i^T \bm{H}(\bm{\xi})^{-1} \bm{V} \bm{H}(\bm{\xi})^{-1} \bm{e}_j \end{align}$$ which uses the definition $$\begin{align} \bm{\xi}_{cj} &= \bm{e}_{j}^T \bm{\xi} \end{align}$$

A simple example may help clarify. Imagine a composite score composed of two subscales, 1 and 2, with weights $\omega_1 = 0.4$ and $\omega_2=0.6$. Suppose a user is interested in a regression of the form $$\begin{align} \theta = a + x_1 \cdot b + \epsilon \label{eq:compex} \\ \epsilon \sim N(0,\sigma^2) \end{align}$$ Then the regression in Eq.  would be fit once for subscale 1 and once for subscale 2; the first fit would yield estimated values $\left\{ a_1, b_1, \sigma_1 \right\}$, and the second fit would yield $\left\{ a_2, b_2, \sigma_2 \right\}$. The composite estimate of, for example, $a_c$ would be $a_c = 0.4 \cdot a_1 + 0.6 \cdot a_2$. Stacking the estimates together as $$\begin{align} \bm{\xi} &= \begin{bmatrix} a_1 \\ b_1 \\ \sigma_1\\ a_2 \\ b_2\\ \sigma_2 \end{bmatrix} \end{align}$$ the covariance matrix can then be estimated and will result in a matrix $\bm{\Omega} \equiv {\rm Var}(\bm{\xi})$ from Eq.  that has six rows and six columns. Using the vector $$\begin{align} \bm{e}_1 &= \begin{bmatrix} 0.4 \\ 0 \\ 0\\ 0.6 \\ 0\\ 0 \end{bmatrix} \end{align}$$ it can easily be confirmed that $a_c = \bm{e}_1^T \bm{\xi}$, so ${\rm Var}(a_c)= \bm{e}_1^T \bm{\Omega} \bm{e}_1$.
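The same example in a short R sketch; the stacked estimates and the covariance matrix Omega are hypothetical stand-ins for the quantities described above.

```r
# Composite coefficient and its variance from stacked subscale estimates.
xi    <- c(a1 = 0.12, b1 = 0.50, sigma1 = 0.95,
           a2 = 0.30, b2 = 0.45, sigma2 = 1.05)   # hypothetical estimates
Omega <- diag(0.01, length(xi))                    # stands in for Var(xi)
e1    <- c(0.4, 0, 0, 0.6, 0, 0)                   # picks out and weights a1, a2

a_c     <- sum(e1 * xi)                            # composite intercept
var_a_c <- drop(t(e1) %*% Omega %*% e1)            # its variance
```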

Composite degrees of freedom

References

Binder, D. A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Review, 51(3), 279–292.

Black, P. E. (2019). Big-O notation. In P. E. Black (Ed.), Dictionary of algorithms and data structures. Washington, DC: National Institute of Standards and Technology. Retrieved from https://www.nist.gov/dads/HTML/bigOnotation.html

Cohen, J. D., & Jiang, T. (1999). Comparison of partially measured latent traits across nominal subgroups. Journal of the American Statistical Association, 94(448), 1035–1044.

Green, W. H. (2003). Econometric analysis. Upper Saddle River, NJ: Prentice Hall.

Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium of Mathematical Statistics and Probability, Vol. I: Statistics (pp. 221–233). Berkeley, CA: University of California Press.

Johnson, S. G. (2010). Notes on the convergence of trapezoidal-rule quadrature. Retrieved from https://math.mit.edu/~stevenj/trapezoidal.pdf

McCullagh, P. & Nelder, J. A. (1989). Generalized linear models. (2nd ed.). London, UK: Chapman & Hall/CRC.

NAEP. (2008). The generalized partial credit model [NAEP Technical Documentation Website]. Retrieved from https://nces.ed.gov/nationsreportcard/tdw/analysis/scaling_models_gen.aspx.

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817–838.

Appendix. Test Probability Density Functions

For all cases scored as either correct or incorrect, we use the three parameter logit (3PL) model: $$\begin{align} \Pr(\bm{R}_{ij}|\theta_i, \bm{P}_j) &= g_j + \frac{1-g_j}{1+\exp\left[ -D \, a_j \, (\theta_i - d_j)\right]} \label{eq:3PL} \end{align}$$ where $g_j$ is the guessing parameter, $a_j$ is the discrimination factor, $d_j$ is the item difficulty, and $D$ is a constant, usually set to 1.7, to map the $\theta_i$ and $d_j$ terms to a probit-like space; this term is applied by tradition.

When a two parameter logit (2PL) model is used, Eq.  is modified to omit $g_j$ (effectively setting it to zero): $$\begin{align} \Pr(\bm{R}_{ij}|\theta_i, \bm{P}_j) &= \frac{1}{1+\exp\left[ -D \, a_j \, (\theta_i - d_j)\right]} \label{eq:2PL} \end{align}$$

When a Rasch model is used, Eq.  is further modified to set all $a_j$ to a single $a$, and $D$ is set to one: $$\begin{align} \Pr(\bm{R}_{ij}|\theta_i, \bm{P}_j) &= \frac{1}{1+\exp\left[ - a \, (\theta_i - d_j)\right]} \label{eq:Rasch} \end{align}$$

The Graded Response Model (GRM) has a probability density that generalizes an ordered logit (McCullagh & Nelder, 1989): $$\begin{align} \Pr(\bm{R}_{ij}|\theta_i, \bm{P}_j) &= \frac{1}{1+\exp\left[-D\, a_j \, (\theta_i - d_{R_{ij},j})\right]} - \frac{1}{1+\exp\left[-D\, a_j \, (\theta_i - d_{1+R_{ij},j})\right]} \label{eq:grm} \end{align}$$ Here the parameters $\bm{P}_j$ are the cut points $d_{cj}$, where $d_{0j}=-\infty$ and $d_{C+1,j}=\infty$. In the first term on the right side of Eq. , the subscript $R_{ij}$ on $d_{R_{ij},j}$ indicates it is the cut point associated with the response level to item $j$ for person $i$, whereas the last subscript ($j$) indicates that it is the $d$ term for item $j$. In the second term, the next cut point above that one is used.

The Generalized Partial Credit Model (GPCM) has a probability density that generalizes a multinomial logit (McCullagh & Nelder, 1989): $$\begin{align} \Pr(\bm{R}_{ij}|\theta_i, \bm{P}_j) &= \frac{\exp \left[ \sum_{c=0}^{R_{ij}} D a_j (\theta_i - d_{cj}) \right]}{\sum_{r=0}^{C} \exp \left[ \sum_{c=0}^{r} D a_j (\theta_i - d_{cj}) \right]} \end{align}$$ where $c$ indexes cut points, of which there are $C$, and $j$ indexes the item.
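A minimal R sketch of this GPCM probability for one item; the parameter values are hypothetical, and the function name is illustrative rather than part of the Dire API.

```r
# GPCM probability of response category r for one item (illustrative sketch).
# d: cut points d_0, ..., d_C with d_0 = 0; r: observed category in 0..C.
pr_gpcm <- function(theta, r, a, d, D = 1.7) {
  cum <- cumsum(D * a * (theta - d))   # cum[k+1] = sum_{c=0}^{k} D a (theta - d_c)
  exp(cum[r + 1]) / sum(exp(cum))
}

# hypothetical item: a = 0.9, cut points 0, -0.4, 0.7
pr_gpcm(theta = 0.5, r = 2, a = 0.9, d = c(0, -0.4, 0.7))
```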

The GPCM equation has an indeterminacy because the $d_{cj}$ terms can be shifted together in ways that leave the probabilities unchanged. The indeterminacy can be resolved in several ways.

NAEP (2008) uses a mean difficulty ($b_j$), and the $d_{cj}$ values are then given by $$\begin{align} d_{0j} &= 0 & d_{cj} &= b_j - \delta_{jc} \, ; \quad 1 \leq c \leq C \end{align}$$ where the $\delta_{jc}$ values are estimated so that $0=\sum_{c=1}^{C} \delta_{jc}$. In this package, when the polyParamTab has an itemLocation, it serves as $b_j$. When there is no itemLocation, the package uses the $\delta$ values directly: $$\begin{align} d_{0j} &= 0 & d_{cj} &= \delta_{jc} \, ; \quad 1 \leq c \leq C \end{align}$$

When a Partial Credit Model (PCM) is used, the value of $D$ is set to one and $a_j$ is again shared across all items, so $$\begin{align} \Pr(\bm{R}_{ij}|\theta_i, \bm{P}_j) &= \frac{\exp \left[ \sum_{c=0}^{R_{ij}} a (\theta_i - d_{cj}) \right]}{\sum_{r=0}^{C} \exp \left[ \sum_{c=0}^{r} a (\theta_i - d_{cj}) \right]} \end{align}$$