Survey Weighted Mixed-Effects Models

Fits a survey-weighted mixed-effects model using the provided formula.

mix(
  formula,
  data,
  weights,
  cWeights = FALSE,
  center_group = NULL,
  center_grand = NULL,
  max_iteration = 10,
  nQuad = 13L,
  run = TRUE,
  verbose = FALSE,
  acc0 = 120,
  keepAdapting = FALSE,
  start = NULL,
  fast = FALSE,
  family = NULL
)

Arguments

formula: a formula object in the style of lme4 that creates the model.
data: a data frame containing the raw data for the model.
weights: a character vector of the names of the weight variables in the data frame, ordered from the unit level (level 1) up through progressively larger groups.
cWeights: logical, set to TRUE to use conditional weights. Otherwise, mix expects unconditional weights.
center_group: a list in which the name of each element is the name of an aggregation level and the element is a formula of the variables to be group mean centered. For example, to group mean center gender and age within the group student, use list("student" = ~gender+age). The default, NULL, performs no group mean centering.
center_grand: a formula of the variables to be grand mean centered. For example, to center the variable education by the overall mean of education, use ~education. The default, NULL, performs no centering.
max_iteration: an optional integer that limits the number of iterations allowed before quitting, for non-linear models fit by adaptive quadrature. Defaults to 10. The limit exists because, when the likelihood surface is flat, models may run for a very long time without converging.
nQuad: an optional integer number of quadrature points to evaluate models solved by adaptive quadrature. Only non-linear models are evaluated with adaptive quadrature. See notes for additional guidelines.
run: logical; TRUE runs the model while FALSE provides partial output for debugging or testing. Only applies to non-linear models evaluated by adaptive quadrature.
verbose: logical, default FALSE; set to TRUE to print results of intermediate steps of adaptive quadrature. Only applies to non-linear models.
acc0: deprecated; ignored.
keepAdapting: logical, set to TRUE when the adaptive quadrature should adapt after every Newton step. Defaults to FALSE. FALSE should be used for faster (but less accurate) results. Only applies to non-linear models.
start: optional numeric vector representing the point at which the model should start optimization; takes the shape of c(coef, vars) from results (see help).
fast: logical; deprecated
family: the family; optionally used to specify generalized linear mixed models. Currently only binomial() and poisson() are supported.
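
To illustrate what the two centering arguments describe, here is a minimal base R sketch on a small made-up data frame. The data frame `toy` and its columns are hypothetical, and this is not WeMix's internal code (which may, for instance, use the weights when computing means); it only shows the unweighted idea.

```r
# Hypothetical toy data: three students nested in each of two schools
toy <- data.frame(
  school = c("a", "a", "a", "b", "b", "b"),
  age    = c(10, 12, 14, 11, 13, 18)
)

# Group mean centering: subtract each school's mean age from its members
toy$age_group_c <- toy$age - ave(toy$age, toy$school, FUN = mean)

# Grand mean centering: subtract the overall mean age from every row
toy$age_grand_c <- toy$age - mean(toy$age)

print(toy)
```

After group mean centering, the centered variable averages to zero within each school; after grand mean centering, it averages to zero over the whole data set.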

Value

object of class WeMixResults. This is a list with elements:

lnlf: function, the likelihood function.
lnl: numeric, the log-likelihood of the model.
coef: numeric vector, the estimated coefficients of the model.
ranefs: the group-level random effects.
SE: the cluster-robust (CR-0) standard errors of the fixed effects.
vars: numeric vector, the random effect variances.
theta: the theta vector.
call: the original call used.
levels: integer, the number of levels in the model.
ICC: numeric, the intraclass correlation coefficient.
CMODE: the conditional mean of the random effects.
invHessian: inverse of the second derivative of the likelihood function.
is_adaptive: logical, indicates if adaptive quadrature was used for estimation.
sigma: the sigma value.
ngroups: the number of observations in each group.
varDF: the variance data frame in the format of the variance data frame returned by lme4.
varVC: the variance-covariance matrix of the random effects.
cov_mat: the variance-covariance matrix of the fixed effects.
var_theta: the variance covariance matrix of the theta terms.
wgtStats: statistics regarding weights, by level.
ranefMat: list of matrices; each list element is a matrix of random effects by level with IDs in the rows and random effects in the columns.
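
As a reminder of what an intraclass correlation measures, the sketch below applies the standard definition for a two-level random-intercept model to made-up variance components. The numbers are hypothetical, and whether the ICC element is computed in exactly this way for every model is an assumption, not something this page states.

```r
# Hypothetical variance components for a two-level random-intercept model
var_between <- 30  # variance of the group-level random intercept
var_within  <- 70  # residual (level-1) variance

# Intraclass correlation: share of the total variance that lies between groups
icc <- var_between / (var_between + var_within)
print(icc)
```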

Details

Linear models are solved using a modification of the analytic solution developed by Bates and Pinheiro (1998). Non-linear models are solved using adaptive quadrature following the methods in Stata's GLLAMM (Rabe-Hesketh & Skrondal, 2006) and Pinheiro and Chao (2006). The posterior modes used in adaptive quadrature are determined following the method in lme4pureR (Walker & Bates, 2015). For additional details, see the vignettes Weighted Mixed Models: Adaptive Quadrature and Weighted Mixed Models: Analytical Solution, which provide extensive examples as well as a description of the mathematical basis of the estimation procedure and comparisons to model specifications in other common software.

Notes:

Standard errors of random effect variances are robust; see vignette for details.
To see the function that is maximized in the estimation of this model, see the section on "Model Fitting" in the Introduction to Mixed Effect Models With WeMix vignette.
When all weights above the individual level are 1, the model is similar to one fit by lmer, and you should use lme4 because it is much faster.
If starting coefficients are not provided they are estimated using lme4.
For non-linear models, when the variance of a random effect is very low (< 0.1), WeMix does not estimate it, because very low variances create problems with numerical evaluation. In these cases, consider estimating the model without that random effect.
The model is estimated by maximum likelihood estimation.
Non-linear models may have up to 3 nested levels.
To choose the number of quadrature points for non-linear model evaluation, a balance is needed between accuracy and speed; estimation time increases quadratically with the number of points chosen. In addition, an odd number of points is traditionally used. We recommend starting at 13 and increasing or decreasing as needed.
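
The accuracy side of the quadrature trade-off in the last note can be explored with a small base R sketch. It uses plain (non-adaptive) Gauss-Hermite quadrature, not the adaptive quadrature that WeMix implements, and the helper names `gauss_hermite` and `marginal_prob`, as well as the parameter values, are hypothetical. It approximates the marginal probability of a logistic random-intercept model for several numbers of points and compares each with numerical integration.

```r
# Gauss-Hermite nodes and weights via the Golub-Welsch eigenvalue method
# (n >= 2). Approximates the integral of exp(-x^2) * f(x) by sum(w * f(x)).
gauss_hermite <- function(n) {
  k <- seq_len(n - 1)
  J <- matrix(0, n, n)
  # Symmetric tridiagonal Jacobi matrix for the Hermite polynomials
  J[cbind(k, k + 1)] <- sqrt(k / 2)
  J[cbind(k + 1, k)] <- sqrt(k / 2)
  e <- eigen(J, symmetric = TRUE)
  ord <- order(e$values)
  list(nodes = e$values[ord],
       weights = sqrt(pi) * e$vectors[1, ord]^2)
}

# Marginal probability E[plogis(b0 + u)] with u ~ N(0, sigma^2),
# using the change of variable u = sqrt(2) * sigma * x
marginal_prob <- function(n_quad, b0 = 0.5, sigma = 1.5) {
  gh <- gauss_hermite(n_quad)
  sum(gh$weights * plogis(b0 + sqrt(2) * sigma * gh$nodes)) / sqrt(pi)
}

# Reference value from numerical integration of the same expectation
reference <- integrate(function(u) plogis(0.5 + u) * dnorm(u, sd = 1.5),
                       -Inf, Inf)$value

# Compare a few odd numbers of points against the reference
for (n in c(3, 7, 13, 25)) {
  cat(n, "points: abs. error =", abs(marginal_prob(n) - reference), "\n")
}
```

A similar comparison on a real model, refitting with a few values of nQuad and checking whether the estimates have stabilized, is one way to decide whether 13 points are enough.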

Author

Paul Bailey, Blue Webb, Claire Kelley, and Trang Nguyen

Examples

if (FALSE) { # \dontrun{
library(lme4)

data(sleepstudy)
ss1 <- sleepstudy

# Create weights
ss1$W1 <- ifelse(ss1$Subject %in% c(308, 309, 310), 2, 1)
ss1$W2 <- 1

# Run random-intercept 2-level model 
two_level <- mix(Reaction ~ Days + (1|Subject), data=ss1, weights=c("W1", "W2"))

# Run random-intercept 2-level model with group-mean centering
grp_centered <- mix(Reaction ~ Days + (1|Subject), data=ss1,
                    weights = c("W1", "W2"),
                    center_group = list("Subject" = ~Days))

# Run three-level model with random slope and intercept.
# Add group variables for the 3-level model
ss1$Group <- 3
ss1$Group <- ifelse(as.numeric(ss1$Subject) %% 10 < 7, 2, ss1$Group)
ss1$Group <- ifelse(as.numeric(ss1$Subject) %% 10 < 4, 1, ss1$Group)
# level-3 weights
ss1$W3 <- ifelse(ss1$Group == 2, 2, 1)

three_level <- mix(Reaction ~ Days + (1|Subject) + (1+Days|Group), data=ss1, 
                   weights=c("W1", "W2", "W3"))

# Conditional Weights
# use vignette example
library(EdSurvey)

#read in data 
downloadPISA("~/", year=2012)
cntl <- readPISA("~/PISA/2012", countries="USA")
data <- getData(cntl,c("schoolid","pv1math","st29q03","sc14q02","st04q01",
                       "escs","w_fschwt","w_fstuwt"), 
                omittedLevels=FALSE, addAttributes=FALSE)

# Remove NA and omitted Levels
om <- c("Invalid", "N/A", "Missing", "Miss", NA, "(Missing)")
for (i in seq_len(ncol(data))) {
  data <- data[!data[,i] %in% om,]
}

#relevel factors for model 
data$st29q03 <- relevel(data$st29q03, ref="Strongly agree")
data$sc14q02 <- relevel(data$sc14q02, ref="Not at all")

# run with unconditional weights
m1u <- mix(pv1math ~ st29q03 + sc14q02 +st04q01+escs+ (1|schoolid), data=data, 
           weights=c("w_fstuwt", "w_fschwt"))
summary(m1u)

# conditional weights
data$pwt2 <- data$w_fschwt
data$pwt1 <- data$w_fstuwt / data$w_fschwt

# run with conditional weights
m1c <- mix(pv1math ~ st29q03 + sc14q02 +st04q01+escs+ (1|schoolid), data=data, 
            weights=c("pwt1", "pwt2"), cWeights=TRUE)
summary(m1c)
# the results are, up to rounding, the same in m1u and m1c, only the calls are different

} # }
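
The conditional-weight step in the PISA example above can be tried without downloading any data. The sketch below uses a made-up data frame `toy` that borrows the PISA column names; the values are hypothetical. It derives conditional weights from unconditional ones and checks that multiplying them back together recovers the unconditional student weight.

```r
# Hypothetical toy data: two schools with final (unconditional) weights
toy <- data.frame(
  schoolid = c(1, 1, 2, 2),
  w_fschwt = c(10, 10, 20, 20),   # school weight
  w_fstuwt = c(50, 30, 40, 100)   # student weight, unconditional
)

# Conditional student weight: the student weight given the school's selection
toy$pwt1 <- toy$w_fstuwt / toy$w_fschwt
# The top-level weight is the same under both schemes
toy$pwt2 <- toy$w_fschwt

# Multiplying the conditional weights together recovers the
# unconditional student weight
stopifnot(isTRUE(all.equal(toy$pwt1 * toy$pwt2, toy$w_fstuwt)))
print(toy)
```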