Fits a logit or probit model that uses weights and variance estimates appropriate for the edsurvey.data.frame, the light.edsurvey.data.frame, or the edsurvey.data.frame.list.
Usage
glm.sdf(formula, family = binomial(link = "logit"), data,
weightVar = NULL, relevels = list(),
varMethod=c("jackknife", "Taylor"), jrrIMax = 1,
dropOmittedLevels = TRUE, defaultConditions = TRUE, recode = NULL,
returnNumberOfPSU=FALSE, returnVarEstInputs = FALSE,
omittedLevels = deprecated())
logit.sdf(
formula,
data,
weightVar = NULL,
relevels = list(),
varMethod = c("jackknife", "Taylor"),
jrrIMax = 1,
dropOmittedLevels = TRUE,
defaultConditions = TRUE,
recode = NULL,
returnNumberOfPSU = FALSE,
returnVarEstInputs = FALSE,
omittedLevels = deprecated()
)
probit.sdf(
formula,
data,
weightVar = NULL,
relevels = list(),
varMethod = c("jackknife", "Taylor"),
jrrIMax = 1,
dropOmittedLevels = TRUE,
defaultConditions = TRUE,
recode = NULL,
returnNumberOfPSU = FALSE,
returnVarEstInputs = FALSE,
omittedLevels = deprecated()
)
Arguments
- formula
a formula for the model. See glm. For logit and probit, we recommend using the I() function to define the level used for success. (See Examples.)
- family
the glm.sdf function currently fits only binomial outcome models, such as logit and probit, although other link functions are available for binomial models. See the link argument in the help for family.
- data
an edsurvey.data.frame, a light.edsurvey.data.frame, or an edsurvey.data.frame.list
- weightVar
a character string indicating the weight variable to use (see Details). The weightVar must be one of the weights for the edsurvey.data.frame. If NULL, uses the default for the edsurvey.data.frame.
- relevels
a list; used to change the contrasts from the default treatment contrasts to the treatment contrasts with a chosen omitted group. The name of each element should be the variable name, and the value should be the group to be omitted.
- varMethod
a character string set to “jackknife” or “Taylor” that indicates the variance estimation method to be used. See Details.
- jrrIMax
the Vjrr sampling variance term (see Statistical Methods Used in EdSurvey) can be estimated with any positive number of plausible values and is estimated on the lower of the number of available plausible values and jrrIMax. When jrrIMax is set to Inf, all plausible values will be used. Higher values of jrrIMax lead to longer computing times and more accurate variance estimates.
- dropOmittedLevels
a logical value. When set to the default value of TRUE, drops the levels of all factor variables that are specified as omitted levels in the edsurvey.data.frame. Use print on an edsurvey.data.frame to see the omitted levels.
- defaultConditions
a logical value. When set to the default value of TRUE, uses the default conditions stored in an edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.
- recode
a list of lists to recode variables. Defaults to NULL. Can be set as recode = list(var1 = list(from = c("a", "b", "c"), to = "d")).
- returnNumberOfPSU
a logical value set to TRUE to return the number of primary sampling units (PSUs).
- returnVarEstInputs
a logical value set to TRUE to return the inputs to the jackknife and imputation variance estimates, which allow for the computation of covariances between estimates.
- omittedLevels
this argument is deprecated. Use dropOmittedLevels.
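To make the shape of the recode and relevels arguments concrete, here is a minimal base R sketch applied to an ordinary data.frame. The helpers apply_recode() and apply_relevels() are hypothetical and are not part of EdSurvey; the sketch only assumes that recode maps every value listed in from to the single value in to, and that relevels names the level to treat as the omitted (reference) group, which stats::relevel() does for ordinary factors.

```r
# Hypothetical helper (not part of EdSurvey): applies a specification shaped
# like the `recode` argument, list(var = list(from = ..., to = ...)), so that
# every value listed in `from` becomes the single value `to`.
apply_recode <- function(df, recode) {
  for (var in names(recode)) {
    spec <- recode[[var]]
    x <- as.character(df[[var]])
    x[x %in% spec$from] <- spec$to
    df[[var]] <- factor(x)
  }
  df
}

# Hypothetical helper: mimics `relevels` by making the named level the
# omitted (reference) group of each listed factor.
apply_relevels <- function(df, relevels) {
  for (var in names(relevels)) {
    df[[var]] <- relevel(factor(df[[var]]), ref = relevels[[var]])
  }
  df
}

df <- data.frame(var1 = c("a", "b", "c", "e"),
                 sex  = c("Male", "Female", "Female", "Male"))
df <- apply_recode(df, list(var1 = list(from = c("a", "b", "c"), to = "d")))
df <- apply_relevels(df, list(sex = "Male"))
levels(df$var1)  # "d" "e"
levels(df$sex)   # "Male" "Female"
```

With "Male" as the first level, treatment contrasts in a model formula would omit that group and report a coefficient for "Female".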
Value
An edsurveyGlm object with the following elements:
- call
the function call
- formula
the formula used to fit the model
- coef
the estimates of the coefficients
- se
the standard error estimates of the coefficients
- Vimp
the estimated variance caused by uncertainty in the scores (plausible value variables)
- Vjrr
the estimated variance from sampling
- M
the number of plausible values
- nPSU
the number of PSUs used in the calculation
- varm
the variance estimates under the various plausible values
- coefm
the values of the coefficients under the various plausible values
- coefmat
the coefficient matrix (typically produced by the summary of a model)
- weight
the name of the weight variable
- npv
the number of plausible values
- njk
the number of jackknife replicates used
- varMethod
the variance estimation method used, either jackknife or Taylor
- varEstInputs
when returnVarEstInputs is TRUE, this element is returned. These are used for calculating covariances with varEstToCov.
Details
This function implements an estimator that correctly handles left-hand side
variables that are logical, allows for survey sampling weights, and estimates
variances using the jackknife replication or Taylor series.
The vignette titled Statistical Methods Used in EdSurvey describes estimation of the reported statistics and how it depends on varMethod.
The coefficients are estimated using the sample weights according to the section “Estimation of Weighted Means When Plausible Values Are Not Present” or the section “Estimation of Weighted Means When Plausible Values Are Present,” depending on whether the model includes assessment variables or other variables with plausible values.
How the standard errors of the coefficients are estimated depends on the presence of plausible values (assessment variables), but once a standard error is obtained, the t statistic is given by $$t=\frac{\hat{\beta}}{\sqrt{\mathrm{var}(\hat{\beta})}}$$ where \( \hat{\beta} \) is the estimated coefficient and \(\mathrm{var}(\hat{\beta})\) is the estimated variance of that coefficient.
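As a quick arithmetic check of the t statistic, the sketch below divides each coefficient by the square root of its variance. The coefficient names and values are made up for illustration and are not output from glm.sdf.

```r
# Made-up coefficient estimates and their variances (not glm.sdf output)
beta_hat <- c("(Intercept)" = 0.40, x = -0.25)
var_beta <- c("(Intercept)" = 0.0016, x = 0.0025)

# t = beta_hat / sqrt(var(beta_hat)), computed element by element
t_stat <- beta_hat / sqrt(var_beta)
t_stat  # (Intercept) = 10, x = -5
```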
logit.sdf and probit.sdf are included for convenience only; they give the same results as a call to glm.sdf with the binomial family and the link function named in the function call (logit or probit). By default, glm.sdf fits a logistic regression when family is not set, so glm.sdf and logit.sdf are expected to give the same results in that case. Other types of generalized linear models are not supported.
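The family and link behavior can be illustrated with base R's stats::glm on simulated data. This is only an analogy: it uses no survey weights or replicate variance, so the point estimates and links are comparable but the standard errors are not. Note that base glm() defaults to the gaussian family, so binomial must be passed explicitly, whereas glm.sdf defaults to binomial(link = "logit"). The I() call turns a continuous score into a logical outcome whose TRUE value is the success level, in the spirit of the recommendation for the formula argument; the variable names and the 300 cut point are invented.

```r
set.seed(1)
n <- 500
dat <- data.frame(x = rnorm(n))
dat$score <- 250 + 30 * dat$x + rnorm(n, sd = 20)

# binomial() uses the logit link unless another link is requested
binomial()$link  # "logit"

# I() turns the score into a logical outcome; TRUE is the "success" level
fit_default <- glm(I(score >= 300) ~ x, family = binomial(), data = dat)
fit_logit   <- glm(I(score >= 300) ~ x, family = binomial(link = "logit"), data = dat)
fit_probit  <- glm(I(score >= 300) ~ x, family = binomial(link = "probit"), data = dat)

# Same family and link, so identical estimates
all.equal(coef(fit_default), coef(fit_logit))  # TRUE

# A different link puts the coefficients on a different scale
rbind(logit = coef(fit_logit), probit = coef(fit_probit))
```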
Variance estimation of coefficients
All variance estimation methods are shown in the vignette titled Statistical Methods Used in EdSurvey.
When the predicted value does not have plausible values and varMethod is set to jackknife, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Not Present, Using the Jackknife Method.”
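A minimal sketch of a replicate-weight jackknife variance, for a statistic computed without plausible values, follows. It assumes the variance takes the form gamma times the sum over replicates of (estimate with replicate weight j minus full-sample estimate) squared, with gamma = 1 as a placeholder; the multiplier and the replicate scheme depend on the survey, so the vignette section named above is the authority on the exact constant. The helper jackknife_var() and the toy design (zone, psu, and the rule of doubling one PSU and zeroing its pair) are hypothetical, chosen only to resemble a paired jackknife.

```r
# Hypothetical helper: V = gamma * sum_j (theta_j - theta_0)^2, where theta_0
# uses the full-sample weight and theta_j uses replicate weight j.
# gamma = 1 is a placeholder; the multiplier depends on the survey.
jackknife_var <- function(stat_fun, full_wt, rep_wts, gamma = 1) {
  theta0 <- stat_fun(full_wt)
  # one row per estimated quantity, one column per replicate
  theta_j <- matrix(sapply(seq_len(ncol(rep_wts)), function(j) stat_fun(rep_wts[, j])),
                    nrow = length(theta0))
  setNames(gamma * rowSums((theta_j - theta0)^2), names(theta0))
}

# Toy check with a weighted mean: 4 cases, 2 replicate weights
y <- c(1, 0, 1, 1)
wmean <- function(w) sum(w * y) / sum(w)
rep_wts <- cbind(c(2, 0, 1, 1), c(1, 1, 2, 0))
jackknife_var(wmean, full_wt = rep(1, 4), rep_wts = rep_wts)  # 0.0625

# Coefficients of a weighted logit: refit the model once per replicate weight
set.seed(2)
zone <- rep(1:10, each = 20)                  # hypothetical jackknife zones
psu  <- rep(rep(1:2, each = 10), times = 10)  # two hypothetical PSUs per zone
dat <- data.frame(x = rnorm(200))
dat$y <- rbinom(200, 1, plogis(0.5 * dat$x))
full_wt <- rep(1, 200)
# replicate j doubles PSU 1 and zeroes PSU 2 within zone j
rep_wts2 <- sapply(1:10, function(j) {
  w <- full_wt
  w[zone == j & psu == 1] <- 2 * w[zone == j & psu == 1]
  w[zone == j & psu == 2] <- 0
  w
})
coef_fun <- function(w) coef(glm(y ~ x, family = quasibinomial(), data = dat, weights = w))
jackknife_var(coef_fun, full_wt, rep_wts2)  # variances for (Intercept) and x
```

quasibinomial() is used here only because it accepts non-integer weights without warnings; the point estimates match binomial().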
When plausible values are present and varMethod is set to jackknife, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Present, Using the Jackknife Method.”
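When plausible values are present, the per-plausible-value results must be combined. The sketch below assumes the usual Rubin-style combination: the coefficient is the mean across the M plausible values, Vimp is (M + 1) / (M(M - 1)) times the sum of squared deviations from that mean, Vjrr is averaged over the first min(jrrIMax, M) plausible values, and the standard error is sqrt(Vimp + Vjrr). combine_pvs() is a hypothetical helper, and the input numbers are invented.

```r
# Hypothetical helper: combine results across M plausible values.
# beta_m: coefficient estimates, one row per coefficient, one column per PV
# vjrr_m: sampling (jackknife) variances, same layout as beta_m
combine_pvs <- function(beta_m, vjrr_m, jrrIMax = 1) {
  M <- ncol(beta_m)
  m_star <- min(jrrIMax, M)                       # number of PVs used for Vjrr
  est <- rowMeans(beta_m)                         # average over PVs
  Vimp <- (M + 1) / (M * (M - 1)) * rowSums((beta_m - est)^2)
  Vjrr <- rowMeans(vjrr_m[, seq_len(m_star), drop = FALSE])
  list(est = est, se = sqrt(Vimp + Vjrr), Vimp = Vimp, Vjrr = Vjrr)
}

# Invented results for one coefficient across M = 5 plausible values
beta_m <- matrix(c(1.0, 1.2, 0.8, 1.1, 0.9), nrow = 1)
vjrr_m <- matrix(c(0.01, 0.02, 0.03, 0.04, 0.05), nrow = 1)

res1   <- combine_pvs(beta_m, vjrr_m, jrrIMax = 1)    # Vimp = 0.03, Vjrr = 0.01, se = 0.2
resAll <- combine_pvs(beta_m, vjrr_m, jrrIMax = Inf)  # Vjrr = 0.03, se = sqrt(0.06)
```

The contrast between res1 and resAll shows why jrrIMax trades computing time for a more stable Vjrr: with jrrIMax = 1 only the first plausible value's sampling variance is used.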
When the predicted value does not have plausible values and varMethod is set to Taylor, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Not Present, Using the Taylor Series Method.”
When plausible values are present and varMethod is set to Taylor, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Present, Using the Taylor Series Method.”
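For the Taylor series method, a minimal linearization sketch for a weighted mean under a stratified design with at least two PSUs per stratum follows. It assumes PSUs are treated as sampled with replacement within strata; for regression coefficients the same idea is typically applied to each case's contribution to the estimating equations rather than to a simple mean. taylor_var_wmean() and the toy design are hypothetical.

```r
# Hypothetical helper: Taylor series (linearization) variance of a weighted
# mean, treating PSUs as sampled with replacement within strata.
taylor_var_wmean <- function(y, w, stratum, psu) {
  ybar <- sum(w * y) / sum(w)
  z <- w * (y - ybar) / sum(w)                     # linearized contribution of each case
  key <- interaction(stratum, psu, drop = TRUE)    # identifies a PSU within its stratum
  z_psu <- as.vector(tapply(z, key, sum))          # PSU totals of the contributions
  h_psu <- as.vector(tapply(stratum, key, function(s) s[1]))  # stratum of each PSU
  v <- 0
  for (h in unique(h_psu)) {
    zh <- z_psu[h_psu == h]
    nh <- length(zh)                               # needs at least 2 PSUs per stratum
    v <- v + nh / (nh - 1) * sum((zh - mean(zh))^2)
  }
  v
}

y <- c(1, 0, 1, 1, 0, 1, 0, 0)
w <- rep(1, 8)
stratum <- c(1, 1, 1, 1, 2, 2, 2, 2)
psu     <- c(1, 1, 2, 2, 1, 1, 2, 2)
taylor_var_wmean(y, w, stratum, psu)  # 0.03125
```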
Testing
Of the common hypothesis tests for joint parameter testing, only the Wald test is widely used with plausible values and sample weights. As such, it replaces, if imperfectly, the Akaike information criterion (AIC), the likelihood-ratio test, chi-squared tests, and analysis of variance (ANOVA, including F-tests).
See waldTest or the vignette titled Methods and Overview of Using EdSurvey for Running Wald Tests.
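The Wald statistic for the joint hypothesis that q selected coefficients are all zero is \( W = \hat{\beta}_q^\top V_q^{-1} \hat{\beta}_q \), compared here with a chi-squared distribution on q degrees of freedom. wald_chisq() is a hypothetical helper written only to show the arithmetic; it reports only the chi-squared form, and waldTest should be consulted for the exact test forms and degrees of freedom it uses. The coefficient vector and covariance matrix are invented.

```r
# Hypothetical helper: chi-squared form of the Wald test that the selected
# coefficients are jointly zero, W = b' V^{-1} b with q degrees of freedom.
wald_chisq <- function(beta, V, which) {
  b  <- beta[which]
  Vb <- V[which, which, drop = FALSE]
  W  <- drop(t(b) %*% solve(Vb) %*% b)
  q  <- length(b)
  c(W = W, df = q, p = pchisq(W, df = q, lower.tail = FALSE))
}

# Invented estimates and covariance matrix
beta <- c("(Intercept)" = 0.2, b1 = 0.3, b2 = -0.4)
V <- diag(c(0.01, 0.01, 0.04))
dimnames(V) <- list(names(beta), names(beta))

res <- wald_chisq(beta, V, c("b1", "b2"))
res  # W = 13, df = 2, p = exp(-6.5), about 0.0015
```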
Examples
if (FALSE) { # \dontrun{
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))
# look at the distribution of the number-of-books variable
table(sdf$b013801)
# create a binary variable for 26 or more books
sdf$b013801_26more <- ifelse(sdf$b013801 %in% c("26-100", ">100"), yes = 1, no = 0)
# cross-tabulate the original categorical variable against the binary variable to check the recode
table(sdf$b013801, sdf$b013801_26more)
# by default, variance is estimated with the jackknife method using replicate weights
logit1 <- logit.sdf(formula=b013801_26more ~ dsex + b017451, data=sdf)
# use summary to get detailed results
summary(logit1)
# Taylor series variance estimation
logit1t <- logit.sdf(formula=b013801_26more ~ dsex + b017451, data=sdf,
varMethod="Taylor")
summary(logit1t)
# to dichotomize a plausible-value variable, put ifelse() inside the formula,
# because composite refers to several plausible-value variables
logit2 <- logit.sdf(formula=ifelse(composite >= 300, yes = 1, no = 0) ~ dsex + b013801, data=sdf)
summary(logit2)
# note this recoding of composite must be done in the formula
logit3 <- glm.sdf(formula=I(composite >= 300) ~ dsex + b013801, data=sdf,
family=quasibinomial(link="logit"))
# Wald test for joint hypothesis that all coefficients in b013801 are zero
waldTest(model=logit3, coefficients="b013801")
summary(logit3)
# use plausible values as predictors in a generalized linear regression model
# ifelse() sets the selected categories to 1 and all others, including
# the Multiple and Omitted levels, to 0
sdf$AlgebraClass <- ifelse(sdf$m815701 %in% c('Algebra I (1-yr crs)',
'1st yr 2-yr Algeb I',
'2nd yr 2-yr Algeb I',
'Algebra II'), 1, 0)
table(sdf$m815701, sdf$AlgebraClass)
logit4 <- logit.sdf(formula = AlgebraClass ~ algebra,
weightVar = 'origwt', data = sdf)
summary(logit4)
# alternatively, the same analysis can be run using the I() function in the
# formula with dropOmittedLevels = FALSE
logit5 <- logit.sdf(I(m815701 %in% c('Algebra I (1-yr crs)',
'1st yr 2-yr Algeb I', '2nd yr 2-yr Algeb I',
'Algebra II')) ~ algebra,
weightVar = 'origwt', data = sdf,
dropOmittedLevels = FALSE)
summary(logit5)
} # }