EdSurvey Class Constructors and Helpers
Source: R/edsurvey.data.frame.R, R/getAttributes.R, R/setAttributes.R, and 1 more
edsurvey-class.Rd
Two new classes in EdSurvey are described in this section: the edsurvey.data.frame and light.edsurvey.data.frame. The edsurvey.data.frame class stores metadata about survey data, and data are stored on the disk (via the LaF package), allowing gigabytes of data to be used easily on a machine otherwise inappropriate for manipulating large datasets.
The light.edsurvey.data.frame is typically generated by the getData function and stores the data in a data.frame.
Both classes use attributes to manage metadata and allow for correct statistics to be used in calculating results; the getAttributes function acts as an accessor for these attributes, whereas setAttributes acts as a mutator for the attributes.
As a convenience, edsurvey.data.frame implements the $ function to extract a variable.
Usage
edsurvey.data.frame(
userConditions,
defaultConditions,
dataList = list(),
weights,
pvvars,
subject,
year,
assessmentCode,
dataType,
gradeLevel,
achievementLevels,
omittedLevels,
survey,
country,
psuVar,
stratumVar,
jkSumMultiplier,
recodes = NULL,
validateFactorLabels = FALSE,
forceLower = TRUE,
reqDecimalConversion = TRUE,
fr2Path = NULL,
dim0 = NULL,
cacheDataLevelName = NULL
)
# S3 method for class 'edsurvey.data.frame'
x$i
# S3 method for class 'edsurvey.data.frame'
x$name <- value
# S4 method for class 'edsurvey.data.frame,ANY'
x %in% table
# S4 method for class 'edsurvey.data.frame.list,ANY'
x %in% table
getAttributes(data, attribute = NULL, errorCheck = TRUE)
setAttributes(data, attribute, value)
getPSUVar(
data,
weightVar = attributes(getAttributes(data, "weights"))[["default"]]
)
getStratumVar(
data,
weightVar = attributes(getAttributes(data, "weights"))[["default"]]
)
Arguments
- userConditions
a list of user conditions that includes subsetting or recoding conditions
- defaultConditions
a list of default conditions that often are set for each survey
- dataList
a list of dataListItem objects to model the data structure of the survey
- weights
a list that stores information regarding weight variables. See Details.
- pvvars
a list that stores information regarding plausible values. See Details.
- subject
a character that indicates the subject domain of the given data
- year
a character or numeric that indicates the year of the given data
- assessmentCode
a character that indicates the code of the assessment. Can be National or International.
- dataType
a character that indicates the unit level of the main data. Examples include Student, teacher, school, Adult Data.
- gradeLevel
a character that indicates the grade level of the given data
- achievementLevels
a list of achievement-level categories and cutpoints
- omittedLevels
a list of default omitted levels for the given data
- survey
a character that indicates the name of the survey
- country
a character that indicates the country of the given data
- psuVar
a character that indicates the primary sampling unit (PSU) variable. Ignored when weights have psuVar defined.
- stratumVar
a character that indicates the stratum variable. Ignored when weights have stratumVar defined.
- jkSumMultiplier
a numeric value of the jackknife coefficient (used in calculating the jackknife replication estimation)
- recodes
a list of variable recodes of the given data
- validateFactorLabels
a Boolean that indicates whether the getData function needs to validate factor variables
- forceLower
a Boolean; when set to TRUE, will automatically lowercase variable names
- reqDecimalConversion
a Boolean; when set to TRUE, a getData call will multiply the raw file value by a decimal multiplier
- fr2Path
a character file location for NAEP assessments to identify the location of the codebook file in fr2 format
- dim0
numeric vector of length two. To speed construction, the dimensions of the data can be provided
- cacheDataLevelName
a character value set to match the named element in the dataList to utilize the data caching scheme. See Details.
- x
an
edsurvey.data.frame
- i
a character, the column name to extract
- name
a character vector of the column to edit
- value
outside of the assignment context, new value of the given
attribute
- table
an edsurvey.data.frame or edsurvey.data.frame.list where x is searched for
- data
an
edsurvey.data.frame
- attribute
a character, name of an attribute to get or set
- errorCheck
logical; see Details
- weightVar
a character indicating the full sample weights. Required in getPSUVar and getStratumVar when there is no default weight.
Value
An object of class edsurvey.data.frame
with the following elements:
Elements that store data connections and data codebooks
dataList
a list object containing the survey's dataListItem objects
Elements that store sample design and default subsetting information of the given survey data
userConditions
a list containing all user conditions, set using the subset.edsurvey.data.frame method
defaultConditions
the default subsample conditions
weights
a list containing the weights. See Details.
stratumVar
a character that indicates the default strata identification variable name in the data. Often used in Taylor series estimation.
psuVar
a character that indicates the default PSU (sampling unit) identification variable name in the data. Often used in Taylor series estimation.
pvvars
a list containing the plausible values. See Details.
achievementLevels
default achievement cutoff scores and names. See Details.
omittedLevels
the levels of the factor variables that will be omitted from the
edsurvey.data.frame
Elements that store descriptive information of the survey
survey
the type of survey data
subject
the subject of the data
year
the year of assessment
assessmentCode
the assessment code
dataType
the type of data (e.g., student or school)
gradeLevel
the grade of the dataset contained in the edsurvey.data.frame
Elements used in mml.sdf
dichotParamTab
IRT item parameters for dichotomous items in a data frame
polyParamTab
IRT item parameters for polytomous items in a data frame
adjustedData
IRT item parameter adjustment information in a data frame
testData
IRT transformation constants in a data frame
scoreCard
item scoring information in a data frame
scoreDict
generic scoring information in a data frame
scoreFunction
a function that turns the variables with items in them into numeric scores
Details
The weights list has an element named after each weight variable name that is a list with elements jkbase and jksuffixes. The jkbase variable is a single character indicating the jackknife replicate weight base name, whereas jksuffixes is a vector with one element for each jackknife replicate weight. When the two are pasted together, they should form the complete set of the jackknife replicate weights. The weights argument also can have an attribute that is the default weight. If the primary sampling unit and stratum variables change by weight, they also can be defined on the weight list as psuVar and stratumVar. When this option is used, it overrides the psuVar and stratumVar on the edsurvey.data.frame, which can be left blank. A weight must define only one of psuVar and stratumVar.
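As a minimal sketch of the structure described above, the following base R code builds a weights list by hand. The weight name origwt, the base name srwt, and the 62 two-digit suffixes are illustrative assumptions rather than values from any particular survey; the attribute name default follows the default argument of getPSUVar shown in Usage.

```r
# Illustrative names only: "origwt", "srwt", and the suffixes are assumptions.
weights <- list(
  origwt = list(
    jkbase = "srwt",                     # base name shared by all replicate weights
    jksuffixes = sprintf("%02d", 1:62)   # one suffix per jackknife replicate
  )
)

# Mark the default weight with an attribute and read it back as in Usage.
attr(weights, "default") <- "origwt"
defaultWeight <- attributes(weights)[["default"]]

# Pasting jkbase and jksuffixes gives the full set of replicate weight names.
repWeights <- paste0(weights[[defaultWeight]]$jkbase,
                     weights[[defaultWeight]]$jksuffixes)
head(repWeights, 3)
length(repWeights)
```

Keeping the base name and suffixes separate lets the replicate weight columns be derived on demand instead of stored as a long vector of names.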
The pvvars list has an element for each subject or subscale score that has plausible values. Each element is a list with a varnames element that indicates the column names of the plausible values and an achievementLevel element that is a named vector of the achievement-level cutpoints.
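The shape of a pvvars list can be sketched in base R as follows. The scale name composite, the column names pv1 to pv5, and the cutpoint values are placeholders chosen for illustration only; they are assumptions, not values from a real assessment.

```r
# Illustrative names and cutpoints only; these are assumptions, not survey values.
pvvars <- list(
  composite = list(
    varnames = paste0("pv", 1:5),   # columns holding the plausible values
    achievementLevel = c(Basic = 200, Proficient = 250, Advanced = 300)
  )
)

# Column names of the plausible values for one scale.
pvvars$composite$varnames

# Cutpoints are looked up by achievement-level name.
pvvars$composite$achievementLevel[["Proficient"]]
```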
An edsurvey.data.frame implements a unique data caching mechanism that allows users to create and merge data columns for flexibility. This cache object is a single data.frame that is an element in the edsurvey.data.frame. To accommodate studies with complex data models, the cache can only support one data level at this time. The cacheDataLevelName parameter indicates which named element in the dataList the cache applies to. The default value cacheDataLevelName = NULL will set the first item in the dataList as the cache level for an edsurvey.data.frame.
EdSurvey Classes
edsurvey.data.frame is an object that stores a connection to data on the disk along with important survey sample design information.
edsurvey.data.frame.list is a list of edsurvey.data.frame objects. It often is used in trend or cross-regional analysis in the gap function. See edsurvey.data.frame.list for more information on how to create an edsurvey.data.frame.list. Users also can refer to the vignette titled Using EdSurvey for Trend Analysis for examples.
Besides the edsurvey.data.frame class, the EdSurvey package also implements the light.edsurvey.data.frame class, which can be used by both EdSurvey and non-EdSurvey functions. More particularly, light.edsurvey.data.frame is a data.frame that has basic survey and sample design information (i.e., plausible values and weights), which will be used for variance estimation in analytical functions. Because it also is a base R data.frame, users can apply base R functions for data manipulation. See the vignette titled Using the getData Function in EdSurvey for more examples.
Many functions will remove attributes from a data frame, such as a light.edsurvey.data.frame, and the rebindAttributes function can add them back.
Users can get a light.edsurvey.data.frame object by using the getData method with addAttributes=TRUE.
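A hedged sketch of creating a light.edsurvey.data.frame is shown below. It assumes the EdSurvey and NAEPprimer packages are installed (the code is skipped otherwise) and that origwt is a weight variable in the example data; only the data, varnames, and addAttributes arguments of getData are used.

```r
# Assumes EdSurvey and NAEPprimer are installed; the block is skipped otherwise.
if (requireNamespace("EdSurvey", quietly = TRUE) &&
    requireNamespace("NAEPprimer", quietly = TRUE)) {
  library(EdSurvey)
  sdf <- readNAEP(path = system.file("extdata/data", "M36NT2PM.dat",
                                     package = "NAEPprimer"))

  # "origwt" is assumed to be a weight variable in the example data.
  lsdf <- getData(data = sdf,
                  varnames = c("dsex", "b013801", "origwt"),
                  addAttributes = TRUE)

  # The result should carry the light class while remaining a data.frame.
  isLight <- inherits(lsdf, "light.edsurvey.data.frame")
  isDataFrame <- is.data.frame(lsdf)

  # Base R functions work on it directly.
  table(lsdf$dsex)
}
```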
Basic Methods for EdSurvey Classes
Extracting a column from an edsurvey.data.frame
Users can extract a column from an edsurvey.data.frame object using $ or [] like a normal data frame.
Extracting and updating attributes of an object of class edsurvey.data.frame or light.edsurvey.data.frame
Users can use the getAttributes method to extract any attribute of an edsurvey.data.frame or a light.edsurvey.data.frame. The errorCheck parameter has a default value of TRUE, which throws an error if an attribute is not found. Setting errorCheck = FALSE will suppress error checking and return NULL if an attribute can't be found.
A light.edsurvey.data.frame will not have attributes related to data connection because data have already been read in memory.
If users want to update an attribute (e.g., omittedLevels), they can use the setAttributes method.
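The effect of errorCheck described above can be sketched as follows. The code assumes the EdSurvey and NAEPprimer packages are installed (it is skipped otherwise), and notAnAttribute is a made-up attribute name used only to trigger the not-found case.

```r
# Assumes EdSurvey and NAEPprimer are installed; the block is skipped otherwise.
if (requireNamespace("EdSurvey", quietly = TRUE) &&
    requireNamespace("NAEPprimer", quietly = TRUE)) {
  library(EdSurvey)
  sdf <- readNAEP(path = system.file("extdata/data", "M36NT2PM.dat",
                                     package = "NAEPprimer"))

  # "notAnAttribute" is a hypothetical name that should not exist on the object.
  # With errorCheck = FALSE, a missing attribute is returned as NULL.
  missingAttr <- getAttributes(data = sdf, attribute = "notAnAttribute",
                               errorCheck = FALSE)

  # With the default errorCheck = TRUE, the same lookup signals an error,
  # which is caught here so the message can be inspected.
  errMsg <- tryCatch(
    getAttributes(data = sdf, attribute = "notAnAttribute"),
    error = function(e) conditionMessage(e)
  )
}
```

Using errorCheck = FALSE is convenient when code needs to test whether an optional attribute is present without wrapping every call in tryCatch.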
Examples
if (FALSE) { # \dontrun{
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package="NAEPprimer"))
# run a base R function on a column of edsurvey.data.frame
table(sdf$dsex)
# assignment
table(sdf$b013801)
sdf$books <- ifelse(sdf$b013801 %in% c("0-10", "11-25"), "0-25 books", "26+ books")
table(sdf$books, sdf$b013801)
# extract default omitted levels of NAEP primer data
getAttributes(data=sdf, attribute="omittedLevels")
#[1] "Multiple" NA "Omitted"
# update default omitted levels of NAEP primer data
sdf <- setAttributes(data=sdf,
                     attribute="omittedLevels",
                     value=c("Multiple", "Omitted", NA, "(Missing)"))
getAttributes(data=sdf, attribute="omittedLevels")
#[1] "Multiple" "Omitted" NA "(Missing)"
} # }