EdSurvey Class Constructors and Helpers

This section describes two EdSurvey classes: edsurvey.data.frame and light.edsurvey.data.frame. The edsurvey.data.frame class stores metadata about survey data, while the data themselves stay on disk (via the LaF package), so gigabytes of data can be used easily on a machine otherwise inappropriate for manipulating large datasets. The light.edsurvey.data.frame is typically generated by the getData function and stores the data in a data.frame. Both classes use attributes to manage metadata and to ensure that the correct statistics are used in calculating results; the getAttributes function acts as an accessor for these attributes, whereas setAttributes acts as a mutator. As a convenience, edsurvey.data.frame implements the $ function to extract a variable.

Usage

edsurvey.data.frame(
  userConditions,
  defaultConditions,
  dataList = list(),
  weights,
  pvvars,
  subject,
  year,
  assessmentCode,
  dataType,
  gradeLevel,
  achievementLevels,
  omittedLevels,
  survey,
  country,
  psuVar,
  stratumVar,
  jkSumMultiplier,
  recodes = NULL,
  validateFactorLabels = FALSE,
  forceLower = TRUE,
  reqDecimalConversion = TRUE,
  fr2Path = NULL,
  dim0 = NULL,
  cacheDataLevelName = NULL
)

# S3 method for class 'edsurvey.data.frame'
x$i

# S3 method for class 'edsurvey.data.frame'
x$name <- value

# S4 method for class 'edsurvey.data.frame,ANY'
x %in% table

# S4 method for class 'edsurvey.data.frame.list,ANY'
x %in% table

getAttributes(data, attribute = NULL, errorCheck = TRUE)

setAttributes(data, attribute, value)

getPSUVar(
  data,
  weightVar = attributes(getAttributes(data, "weights"))[["default"]]
)

getStratumVar(
  data,
  weightVar = attributes(getAttributes(data, "weights"))[["default"]]
)

Arguments

userConditions: a list of user conditions that includes subsetting or recoding conditions
defaultConditions: a list of default conditions that often are set for each survey
dataList: a list of dataListItem objects to model the data structure of the survey
weights: a list that stores information regarding weight variables. See Details.
pvvars: a list that stores information regarding plausible values. See Details.
subject: a character that indicates the subject domain of the given data
year: a character or numeric that indicates the year of the given data
assessmentCode: a character that indicates the code of the assessment. Can be National or International.
dataType: a character that indicates the unit level of the main data. Examples include Student, teacher, school, Adult Data.
gradeLevel: a character that indicates the grade level of the given data
achievementLevels: a list of achievement-level categories and cutpoints
omittedLevels: a list of default omitted levels for the given data
survey: a character that indicates the name of the survey
country: a character that indicates the country of the given data
psuVar: a character that indicates the primary sampling unit (PSU) variable. Ignored when weights have psuVar defined.
stratumVar: a character that indicates the stratum variable. Ignored when weights have stratumVar defined.
jkSumMultiplier: a numeric value of the jackknife coefficient (used in calculating the jackknife replication estimation)
recodes: a list of variable recodes of the given data
validateFactorLabels: a Boolean that indicates whether the getData function needs to validate factor variables
forceLower: a Boolean; when set to TRUE, will automatically lowercase variable names
reqDecimalConversion: a Boolean; when set to TRUE, a getData call will multiply the raw file value by a decimal multiplier
fr2Path: a character file location for NAEP assessments to identify the location of the codebook file in fr2 format
dim0: a numeric vector of length two. To speed construction, the dimensions of the data can be provided.
cacheDataLevelName: a character value set to match the named element in the dataList to utilize the data caching scheme. See Details.
x: an edsurvey.data.frame
i: a character, the column name to extract
name: a character vector of the column to edit
value: in the assignment context (x$name <- value), the new value of the column; otherwise (in setAttributes), the new value of the given attribute
table: an edsurvey.data.frame or edsurvey.data.frame.list in which x is searched for
data: an edsurvey.data.frame
attribute: a character, name of an attribute to get or set
errorCheck: logical; see Details
weightVar: a character indicating the full sample weights. Required in getPSUVar and getStratumVar when there is no default weight.

Value

An object of class edsurvey.data.frame with the following elements:

Elements that store data connections and data codebooks

dataList: a list object containing the survey's dataListItem objects

Elements that store sample design and default subsetting information of the given survey data

userConditions: a list containing all user conditions, set using the subset.edsurvey.data.frame method
defaultConditions: the default subsample conditions
weights: a list containing the weights. See Details.
stratumVar: a character that indicates the default strata identification variable name in the data. Often used in Taylor series estimation.
psuVar: a character that indicates the default primary sampling unit (PSU) identification variable name in the data. Often used in Taylor series estimation.
pvvars: a list containing the plausible values. See Details.
achievementLevels: default achievement cutoff scores and names. See Details.
omittedLevels: the levels of the factor variables that will be omitted from the edsurvey.data.frame

Elements that store descriptive information of the survey

survey: the type of survey data
subject: the subject of the data
year: the year of assessment
assessmentCode: the assessment code
dataType: the type of data (e.g., student or school)
gradeLevel: the grade of the dataset contained in the edsurvey.data.frame

Elements used in mml.sdf

dichotParamTab: IRT item parameters for dichotomous items in a data frame
polyParamTab: IRT item parameters for polytomous items in a data frame
adjustedData: IRT item parameter adjustment information in a data frame
testData: IRT transformation constants in a data frame
scoreCard: item scoring information in a data frame
scoreDict: generic scoring information in a data frame
scoreFunction: a function that turns the variables with items in them into numeric scores

Details

The weight list has an element named after each weight variable name that is a list with elements jkbase and jksuffixes. The jkbase variable is a single character indicating the jackknife replicate weight base name, whereas jksuffixes is a vector with one element for each jackknife replicate weight. When the two are pasted together, they should form the complete set of the jackknife replicate weights. The weights argument also can have an attribute that is the default weight. If the primary sampling unit and stratum variables change by weight, they also can be defined on the weight list as psuVar and stratumVar. When this option is used, it overrides the psuVar and stratumVar on the edsurvey.data.frame, which can be left blank. A weight must define only one of psuVar and stratumVar.
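The structure described above can be sketched in base R. This is only an illustration of the list layout, not output from EdSurvey: the weight name origwt, the base name srwt, and the 62 two-digit suffixes are assumed values chosen for the example.

```r
# Illustrative weights list; "origwt", "srwt", and the 62 suffixes are
# assumed example values, not read from any real data file.
weights <- list(
  origwt = list(
    jkbase = "srwt",                       # replicate weight base name
    jksuffixes = sprintf("%02d", 1:62)     # one suffix per replicate weight
  )
)

# The default weight is stored as an attribute on the list itself.
attr(weights, "default") <- "origwt"

# Pasting jkbase and jksuffixes together gives the replicate weight names.
repNames <- paste0(weights$origwt$jkbase, weights$origwt$jksuffixes)
head(repNames, 3)

# The default weight is retrieved the same way the getPSUVar and
# getStratumVar defaults do it in the Usage section.
defaultWeight <- attributes(weights)[["default"]]
defaultWeight
```

If the PSU and stratum variables differed by weight, psuVar and stratumVar entries would sit alongside jkbase and jksuffixes inside the origwt element.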

The pvvars list has an element for each subject or subscale score that has plausible values. Each element is a list with a varnames element that indicates the column names of the plausible values and an achievementLevel argument that is a named vector of the achievement-level cutpoints.
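A minimal base R sketch of one pvvars element follows. The scale name composite, the column names, the cutpoint values, and the "Below Basic" label are all assumptions made up for illustration; only the element names varnames and achievementLevel come from the description above.

```r
# Illustrative pvvars list; names and cutpoints are assumed example values.
pvvars <- list(
  composite = list(
    # column names of the plausible values for this scale
    varnames = paste0("mrpcm", 1:5),
    # named vector of achievement-level cutpoints
    achievementLevel = c(Basic = 262, Proficient = 299, Advanced = 333)
  )
)

pvvars$composite$varnames

# One possible use of the cutpoints: label a score by the highest cutpoint
# it meets or exceeds ("Below Basic" is a label assumed for this sketch).
cuts <- pvvars$composite$achievementLevel
levelLabels <- c("Below Basic", names(cuts))
scoreLevel <- levelLabels[findInterval(300, cuts) + 1]
scoreLevel
```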

An edsurvey.data.frame implements a unique data caching mechanism that allows users to create and merge data columns for flexibility. This cache object is a single data.frame that is an element in the edsurvey.data.frame. To accommodate studies with complex data models, the cache can support only one data level at this time. The cacheDataLevelName parameter indicates which named element in the dataList the cache is associated with. The default value cacheDataLevelName = NULL will set the first item in the dataList as the cache level for an edsurvey.data.frame.
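The default rule for the cache level can be sketched as follows. The helper resolveCacheLevel is hypothetical and is not part of EdSurvey; the dataList here holds placeholder strings rather than real dataListItem objects, and the check for an unknown name is a choice of this sketch, not documented package behavior.

```r
# Hypothetical helper (not an EdSurvey function) showing the default rule:
# NULL selects the first named element of dataList as the cache level.
resolveCacheLevel <- function(dataList, cacheDataLevelName = NULL) {
  if (is.null(cacheDataLevelName)) {
    return(names(dataList)[1])
  }
  # Assumption of this sketch: reject names that are not in dataList.
  if (!cacheDataLevelName %in% names(dataList)) {
    stop("cacheDataLevelName must match a named element of dataList")
  }
  cacheDataLevelName
}

# Placeholder data levels standing in for dataListItem objects.
dataList <- list(Student = "student-level placeholder",
                 School = "school-level placeholder")

defaultLevel <- resolveCacheLevel(dataList)
schoolLevel <- resolveCacheLevel(dataList, cacheDataLevelName = "School")
defaultLevel
schoolLevel
```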

EdSurvey Classes

edsurvey.data.frame is an object that stores a connection to data on the disk along with important survey sample design information.

edsurvey.data.frame.list is a list of edsurvey.data.frame objects. It often is used in trend or cross-regional analysis in the gap function. See edsurvey.data.frame.list for more information on how to create an edsurvey.data.frame.list. Users also can refer to the vignette titled Using EdSurvey for Trend Analysis for examples.

Besides the edsurvey.data.frame class, the EdSurvey package also implements the light.edsurvey.data.frame class, which can be used by both EdSurvey and non-EdSurvey functions. More particularly, light.edsurvey.data.frame is a data.frame that has basic survey and sample design information (i.e., plausible values and weights), which will be used for variance estimation in analytical functions. Because it also is a base R data.frame, users can apply base R functions for data manipulation. See the vignette titled Using the getData Function in EdSurvey for more examples.

Many functions will remove attributes from a data frame, such as a light.edsurvey.data.frame, and the rebindAttributes function can add them back.

Users can get a light.edsurvey.data.frame object by using the getData method with addAttributes=TRUE.

Basic Methods for EdSurvey Classes

Extracting a column from an edsurvey.data.frame

Users can extract a column from an edsurvey.data.frame object using $ or [] like a normal data frame.

Extracting and updating attributes of an object of class edsurvey.data.frame or light.edsurvey.data.frame

Users can use the getAttributes method to extract any attribute of an edsurvey.data.frame or a light.edsurvey.data.frame. The errorCheck parameter has a default value of TRUE, which throws an error if an attribute is not found. Setting errorCheck = FALSE will suppress error checking and return NULL if an attribute cannot be found.
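The two errorCheck settings can be compared with a short sketch. It assumes the EdSurvey and NAEPprimer packages are installed and skips itself otherwise; the attribute name notAnAttribute is made up so that the lookup fails.

```r
# Runs only when both packages are available (an assumption of this sketch).
if (requireNamespace("EdSurvey", quietly = TRUE) &&
    requireNamespace("NAEPprimer", quietly = TRUE)) {
  library(EdSurvey)
  sdf <- readNAEP(path = system.file("extdata/data", "M36NT2PM.dat",
                                     package = "NAEPprimer"))

  # Default errorCheck = TRUE: a missing attribute raises an error,
  # which is caught here and replaced with a marker string.
  checked <- tryCatch(
    getAttributes(data = sdf, attribute = "notAnAttribute"),
    error = function(e) "error raised"
  )

  # errorCheck = FALSE: the same lookup returns NULL instead.
  unchecked <- getAttributes(data = sdf, attribute = "notAnAttribute",
                             errorCheck = FALSE)
  is.null(unchecked)
}
```

Using errorCheck = FALSE is convenient when code needs to probe for an optional attribute and fall back to a default.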

A light.edsurvey.data.frame will not have attributes related to data connection because data have already been read in memory.

If users want to update an attribute (e.g., omittedLevels), they can use the setAttributes method.

Author

Tom Fink, Trang Nguyen, and Paul Bailey

Examples

if (FALSE) { # \dontrun{
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package="NAEPprimer"))

# run a base R function on a column of edsurvey.data.frame
table(sdf$dsex)
# inspect b013801, then assign a new two-level column derived from it
table(sdf$b013801)
sdf$books <- ifelse(sdf$b013801 %in% c("0-10", "11-25"), "0-25 books", "26+ books")
table(sdf$books, sdf$b013801)

# extract default omitted levels of NAEP primer data
getAttributes(data=sdf, attribute="omittedLevels")
#[1] "Multiple" NA         "Omitted"

# update default omitted levels of NAEP primer data
sdf <- setAttributes(data=sdf,
                   attribute="omittedLevels",
                   value=c("Multiple", "Omitted", NA, "(Missing)"))
getAttributes(data=sdf, attribute="omittedLevels")
#[1] "Multiple"  "Omitted"   NA          "(Missing)"
} # }