Covariance Estimation — varEstToCov • EdSurvey

When the variance of a derived statistic (e.g., a difference) is required, the covariance between the two statistics must be calculated. This function uses results generated by various functions (e.g., a lm.sdf) to find the covariance between two statistics.

Usage

varEstToCov(
  varEstA,
  varEstB = varEstA,
  varA,
  varB = varA,
  jkSumMultiplier,
  returnComponents = FALSE
)

Arguments

varEstA: a list of two data.frames returned by a function after the returnVarEstInputs argument was turned on. The statistic named in the varA argument must be present in each data.frame.
varEstB: a list of two data.frames returned by a function after the returnVarEstInputs argument was turned on. The statistic named in the varA argument must be present in each data.frame. When the same as varEstA, the covariance is within one result.
varA: a character that names the statistic in the varEstA argument for which a covariance is required
varB: a character that names the statistic in the varEstB argument for which a covariance is required
jkSumMultiplier: when the jackknife variance estimation method—or balanced repeated replication (BRR) method—multiplies the final jackknife variance estimate by a value, set jkSumMultiplier to that value. For an edsurvey.data.frame or a light.edsurvey.data.frame, the recommended value can be recovered with EdSurvey::getAttributes(myData, "jkSumMultiplier").
returnComponents: set to TRUE to return the imputation variance seperate from the sampling variance

Value

a numeric value; the jackknife covariance estimate. If returnComponents is TRUE, returns a vector of length three, V is the variance estimate, Vsamp is the sampling component of the variance, and Vimp is the imputation component of the variance

Details

These functions are not vectorized, so varA and varB must contain exactly one variable name.

The method used to compute the covariance is in the vignette titled Statistical Methods Used in EdSurvey

The method used to compute the degrees of freedom is in the vignette titled Statistical Methods Used in EdSurvey in the section “Estimation of Degrees of Freedom.”

Author

Paul Bailey

Examples

if (FALSE) { # \dontrun{
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# estimate a regression
lm1 <- lm.sdf(formula=composite ~ dsex + b017451, data=sdf, returnVarEstInputs=TRUE)
summary(lm1)
# estimate the covariance between two regression coefficients
# note that the variable names are parallel to what they are called in lm1 output
jkSumMultiplier <- EdSurvey:::getAttributes(data=sdf, attribute="jkSumMultiplier")
covFEveryDay <- varEstToCov(varEstA=lm1$varEstInputs,
                            varA="dsexFemale",
                            varB="b017451Every day",
                            jkSumMultiplier=jkSumMultiplier)
# the estimated difference between the two coefficients
# note: unname prevents output from being named after the first coefficient
unname(coef(lm1)["dsexFemale"] - coef(lm1)["b017451Every day"])
# the standard error of the difference
# uses the formula SE(A-B) = sqrt(var(A) + var(B) - 2*cov(A,B))
sqrt(lm1$coefmat["dsexFemale", "se"]^2
     + lm1$coefmat["b017451Every day", "se"]^2
     - 2 * covFEveryDay)
} # }