Calculate the standard deviation of a numeric variable in an edsurvey.data.frame.

Usage

SD(
  data,
  variable,
  weightVar = NULL,
  jrrIMax = 1,
  varMethod = "jackknife",
  dropOmittedLevels = TRUE,
  defaultConditions = TRUE,
  recode = NULL,
  targetLevel = NULL,
  jkSumMultiplier = getAttributes(data, "jkSumMultiplier"),
  returnVarEstInputs = FALSE,
  omittedLevels = deprecated()
)

Arguments

data

an edsurvey.data.frame, an edsurvey.data.frame.list, or a light.edsurvey.data.frame

variable

character vector of variable names

weightVar

character weight variable name. Default is the default weight of data if it exists. If the given survey data do not have a default weight, the function will produce unweighted statistics instead. Can be set to NULL to return unweighted statistics.

jrrIMax

a numeric value; when using the jackknife variance estimation method, the default estimation option, jrrIMax=1, uses the sampling variance from the first plausible value as the component for sampling variance estimation. The Vjrr term (see Statistical Methods Used in EdSurvey) can be estimated with any number of plausible values, and values larger than the number of plausible values on the survey (including Inf) will result in all plausible values being used. Higher values of jrrIMax lead to longer computing times and more accurate variance estimates.

varMethod

deprecated parameter; SD always uses the jackknife variance estimation

dropOmittedLevels

a logical value. When set to TRUE, drops the omitted levels of the specified variable. Use print on an edsurvey.data.frame to see the omitted levels. Defaults to TRUE.

defaultConditions

a logical value. When set to the default value of TRUE, uses the default conditions stored in an edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.

recode

a list of lists to recode variables. Defaults to NULL. Can be set as recode = list(var1 = list(from = c("a","b","c"), to = "d")).

targetLevel

a character string. When specified, calculates the gap in the percentage of students at targetLevel in the variable argument, which is useful for comparing the gap in the percentage of students at a survey response level.

jkSumMultiplier

when the jackknife variance estimation method—or balanced repeated replication (BRR) method—multiplies the final jackknife variance estimate by a value, set jkSumMultiplier to that value. For an edsurvey.data.frame, or a light.edsurvey.data.frame, the recommended value can be recovered with EdSurvey::getAttributes(myData, "jkSumMultiplier").

returnVarEstInputs

a logical value set to TRUE to return the inputs to the jackknife and imputation variance estimates, which allows for the computation of covariances between estimates.

omittedLevels

this argument is deprecated. Use dropOmittedLevels instead.
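
The difference between weighted and unweighted statistics mentioned under weightVar can be illustrated with a small base R sketch. The helper weightedSD below is hypothetical and is not part of EdSurvey; it assumes the simple population form of the weighted standard deviation (dividing by the sum of the weights), which may differ from what SD computes internally.

```r
# Hypothetical helper, not part of EdSurvey: a population-style weighted
# standard deviation. With w = NULL every case gets weight 1, which gives
# the unweighted statistic.
weightedSD <- function(x, w = NULL) {
  if (is.null(w)) {
    w <- rep(1, length(x))
  }
  mu <- sum(w * x) / sum(w)           # weighted mean
  sqrt(sum(w * (x - mu)^2) / sum(w))  # root of the weighted mean squared deviation
}

scores <- c(200, 250, 300)            # made-up scores, for illustration only
sdUnweighted <- weightedSD(scores)
sdWeighted <- weightedSD(scores, w = c(1, 1, 2))  # third case counts twice
print(c(unweighted = sdUnweighted, weighted = sdWeighted))
```

Under this assumed formula, a weight of 2 has the same effect as listing the case twice in an unweighted calculation.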

Value

a list object with elements:

mean

the mean assessment score for variable, calculated according to the vignette titled Statistical Methods Used in EdSurvey

std

the standard deviation of variable

stdSE

the standard error of the std

df

the degrees of freedom of the std

varEstInputs

the variance estimate inputs used for calculating covariances with varEstToCov. Only returned when returnVarEstInputs is TRUE
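
As a rough sketch of how the returned list might be used, the snippet below calls SD with jrrIMax = Inf and returnVarEstInputs = TRUE and then reads the elements listed above. It assumes that the EdSurvey and NAEPprimer packages are installed and is skipped otherwise; the object name res is arbitrary.

```r
# Runs only when both packages are available; otherwise nothing happens.
if (requireNamespace("EdSurvey", quietly = TRUE) &&
    requireNamespace("NAEPprimer", quietly = TRUE)) {
  library(EdSurvey)

  # example data shipped with NAEPprimer (generated, not real student data)
  sdf <- readNAEP(path = system.file("extdata/data", "M36NT2PM.dat",
                                     package = "NAEPprimer"))

  # use all plausible values for the sampling variance and keep the
  # variance estimate inputs
  res <- SD(data = sdf, variable = "composite",
            jrrIMax = Inf, returnVarEstInputs = TRUE)

  print(res$mean)    # mean of the variable
  print(res$std)     # standard deviation
  print(res$stdSE)   # standard error of the standard deviation
  print(res$df)      # degrees of freedom
  print(names(res$varEstInputs))  # present because returnVarEstInputs = TRUE
}
```

Setting jrrIMax = Inf trades longer computing time for a variance estimate that draws on every plausible value, as described under the jrrIMax argument.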

Author

Paul Bailey and Huade Huo

Examples

if (FALSE) { # \dontrun{
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package="NAEPprimer"))

# get the standard deviation of the composite score for male students
SD(data = subset(sdf, dsex == "Male"), variable = "composite")

# get several standard deviations

# build an edsurvey.data.frame.list
sdfA <- subset(sdf, scrpsu %in% c(5,45,56))
sdfB <- subset(sdf, scrpsu %in% c(75,76,78))
sdfC <- subset(sdf, scrpsu %in% 100:200)
sdfD <- subset(sdf, scrpsu %in% 201:300)

sdfl <- edsurvey.data.frame.list(datalist=list(sdfA, sdfB, sdfC, sdfD),
                                 labels=c("A locations",
                                          "B locations",
                                          "C locations",
                                          "D locations"))

# this shows how these datasets will be described:
sdfl$covs

# SD results for each survey
SD(data = sdfl, variable = "composite")
# SD results more compactly and with comparisons
gap(variable="composite", data=sdfl, stDev=TRUE, returnSimpleDoF=TRUE)
} # }