Compares the average levels of a variable between two groups that potentially share members.
Usage
gap(
variable,
data,
groupA = "default",
groupB = "default",
percentiles = NULL,
achievementLevel = NULL,
achievementDiscrete = FALSE,
stDev = FALSE,
targetLevel = NULL,
weightVar = NULL,
jrrIMax = 1,
varMethod = c("jackknife"),
dropOmittedLevels = TRUE,
defaultConditions = TRUE,
recode = NULL,
referenceDataIndex = 1,
returnVarEstInputs = FALSE,
returnSimpleDoF = FALSE,
returnSimpleN = FALSE,
returnNumberOfPSU = FALSE,
noCov = FALSE,
pctMethod = c("unbiased", "symmetric", "simple"),
includeLinkingError = FALSE,
omittedLevels = deprecated()
)
Arguments
- variable
a character indicating the variable to be compared, potentially with a subject scale or subscale
- data
an edsurvey.data.frame, a light.edsurvey.data.frame, or an edsurvey.data.frame.list
- groupA
an expression or character expression that defines a condition for the subset. This subset will be compared to groupB. If not specified, it defaults to the whole sample in data.
- groupB
an expression or character expression that defines a condition for the subset. This subset will be compared to groupA. If not specified, it defaults to the whole sample in data. If set to NULL, estimates for the second group will be dropped.
- percentiles
a numeric vector. The gap function calculates the mean when this argument is omitted or set to NULL. Otherwise, the gap at each given percentile is calculated.
- achievementLevel
the achievement level(s) at which percentages should be calculated
- achievementDiscrete
a logical indicating if the achievement level specified in the achievementLevel argument should be interpreted as discrete so that just the percentage in that particular achievement level will be included. Defaults to FALSE so that the percentage at or above that achievement level will be included in the percentage.
- stDev
a logical, set to TRUE to calculate the gap in standard deviations.
- targetLevel
a character string. When specified, calculates the gap in the percentage of students at targetLevel in the variable argument. This is useful for comparing the gap in the percentage of students at a survey response level.
- weightVar
a character indicating the weight variable to use. See Details.
- jrrIMax
a numeric value; when using the jackknife variance estimation method, the default estimation option, jrrIMax=1, uses the sampling variance from the first plausible value as the component for sampling variance estimation. The Vjrr term, or sampling variance term, can be estimated with any number of plausible values, and values larger than the number of plausible values on the survey (including Inf) will result in all plausible values being used. Higher values of jrrIMax lead to longer computing times and more accurate variance estimates.
- varMethod
deprecated parameter; gap always uses the jackknife variance estimation
- dropOmittedLevels
a logical value. When set to the default value of TRUE, drops the omitted levels of all factor variables. Use print on an edsurvey.data.frame to see the omitted levels.
- defaultConditions
a logical value. When set to the default value of TRUE, uses the default conditions stored in an edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.
- recode
a list of lists to recode variables. Defaults to NULL. Can be set as recode=list(var1=list(from=c("a","b","c"),to="d")).
- referenceDataIndex
a numeric used only when the data argument is an edsurvey.data.frame.list, indicating which dataset is the reference dataset that other datasets are compared with. Defaults to 1.
- returnVarEstInputs
a logical value; set to TRUE to return the inputs to the jackknife and imputation variance estimates, which allows for the computation of covariances between estimates.
- returnSimpleDoF
a logical value set to TRUE to return the degrees of freedom for some statistics (see Value section) that do not have a t-test; useful primarily for further computation
- returnSimpleN
a logical value set to TRUE to add the count (n-size) of observations included in groups A and B in the percentage object
- returnNumberOfPSU
a logical value set to TRUE to return the number of PSUs used in the calculation
- noCov
a logical value set to TRUE to set the covariances to zero in the result
- pctMethod
a character that is one of unbiased, symmetric, or simple. See the help for percentile for more information.
- includeLinkingError
a logical value set to TRUE to include the linking error in variance estimation. Standard errors (e.g., diffAAse, diffBBse, and diffABABse) and p-values (e.g., diffAApValue, diffBBpValue, and diffABABpValue) are then adjusted for comparisons between digitally based assessments (DBA) and paper-based assessments (PBA) data. This option is supported only for NAEP data.
- omittedLevels
this argument is deprecated. Use dropOmittedLevels.
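The list-of-lists shape expected by the recode argument can be illustrated in base R. The sketch below only mimics the intended effect on a plain vector; applyRecodeRule is a hypothetical helper, not an EdSurvey function, and the package's own implementation may differ.

```r
# Hypothetical helper (not an EdSurvey function): apply one recode rule of the
# form list(from = <old levels>, to = <new level>) to a factor or character vector.
applyRecodeRule <- function(x, rule) {
  x <- as.character(x)
  x[x %in% rule$from] <- rule$to
  factor(x)
}

# Toy responses, for illustration only
resp <- c("a", "b", "c", "e", "a")

# Same shape as the recode argument: one named entry per variable
recodeSpec <- list(var1 = list(from = c("a", "b", "c"), to = "d"))

recoded <- applyRecodeRule(resp, recodeSpec$var1)
table(recoded)
```

Levels listed in from collapse into the single level given in to; levels not listed are left unchanged.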
Value
The return type depends on if the class of the data argument is an
edsurvey.data.frame or an edsurvey.data.frame.list. Both
include the call (called call), a list called labels,
an object named percentage
that shows the percentage in groupA and groupB, and an object
that shows the gap called results.
The labels include the following elements:
- definition
the definitions of the groups
- nFullData
the n-size for the full dataset (before applying the definition)
- nUsed
the n-size for the data after the group is subsetted and other restrictions (such as omitted values) are applied
- nPSU
the number of PSUs used in the calculation; returned only when returnNumberOfPSU=TRUE
The percentages are computed according to the vignette titled Statistical Methods Used in EdSurvey in the section “Estimation of Weighted Percentages When Plausible Values Are Not Present.” The standard errors are calculated according to “Estimation of the Standard Error of Weighted Percentages When Plausible Values Are Not Present, Using the Jackknife Method.”
Standard errors of differences are calculated as the square root of the typical variance formula
$$Var(A-B) = Var(A) + Var(B) - 2 Cov(A,B)$$
where the covariance term is calculated as described in the vignette titled Statistical Methods Used in EdSurvey in the section “Estimation of Covariances.”
The degrees of freedom are available only with the jackknife variance estimation. The degrees of freedom used for hypothesis testing are always set to the number of jackknife replicates in the data.
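The variance formula for a difference can be sketched in base R. The helper name diffSE, the input numbers, and the degrees of freedom below are illustrative assumptions, not EdSurvey internals or real estimates.

```r
# Hypothetical helper (not part of EdSurvey): standard error of A - B
# from the two variances and their covariance.
diffSE <- function(varA, varB, covAB = 0) {
  sqrt(varA + varB - 2 * covAB)
}

# Made-up inputs, for illustration only
estA <- 280
estB <- 277
seA <- 1.2
seB <- 1.1
covAB <- 0.5
dof <- 62  # assumed number of jackknife replicates

se <- diffSE(seA^2, seB^2, covAB)

# Two-sided t-test of the hypothesis that the difference is zero
tStat <- (estA - estB) / se
pValue <- 2 * pt(-abs(tStat), df = dof)
```

Calling diffSE with covAB = 0 gives the standard error with the covariance ignored, which is the situation the noCov argument describes.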
the data argument is an edsurvey.data.frame
When the data argument is an edsurvey.data.frame,
gap returns an S3 object of class gap.
The percentage object is a numeric vector with the following elements:
- pctA
the percentage of respondents in groupA compared with the whole sample in data
- pctAse
the standard error on the percentage of respondents in groupA
- dofA
degrees of freedom appropriate for a t-test involving pctA. This value is returned only if returnSimpleDoF=TRUE.
- pctB
the percentage of respondents in groupB
- pctBse
the standard error on the percentage of respondents in groupB
- dofB
degrees of freedom appropriate for a t-test involving pctB. This value is returned only if returnSimpleDoF=TRUE.
- diffAB
the value of pctA minus pctB
- covAB
the covariance of pctA and pctB; used in calculating diffABse
- diffABse
the standard error of pctA minus pctB
- diffABpValue
the p-value associated with the t-test used for the hypothesis test that diffAB is zero
- dofAB
degrees of freedom used in calculating diffABpValue
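As a rough base R sketch of the point estimates in the percentage object, each group's percentage is its weighted share of the whole sample, and the two groups may overlap. The variable names (sex, lunch) and all values here are hypothetical toy data, and the sketch covers only the point estimates, not the jackknife standard errors.

```r
# Toy data, for illustration only: weights and two grouping variables
w <- c(1.5, 2.0, 1.0, 0.5, 3.0)
sex <- c("Male", "Female", "Male", "Female", "Male")
lunch <- c("Eligible", "Eligible", "Not eligible", "Eligible", "Not eligible")

inA <- sex == "Male"        # plays the role of groupA
inB <- lunch == "Eligible"  # plays the role of groupB; shares members with groupA

# Weighted percentage of the whole sample that falls in each group
pctA <- 100 * sum(w[inA]) / sum(w)
pctB <- 100 * sum(w[inB]) / sum(w)
diffAB <- pctA - pctB
```

Because the first respondent belongs to both groups, pctA and pctB need not sum to 100.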
The results object is a numeric data frame with the following elements:
- estimateA
the mean estimate of groupA (or the percentage estimate if achievementLevel or targetLevel is specified)
- estimateAse
the standard error of estimateA
- dofA
degrees of freedom appropriate for a t-test involving estimateA. This value is returned only if returnSimpleDoF=TRUE.
- estimateB
the mean estimate of groupB (or the percentage estimate if achievementLevel or targetLevel is specified)
- estimateBse
the standard error of estimateB
- dofB
degrees of freedom appropriate for a t-test involving estimateB. This value is returned only if returnSimpleDoF=TRUE.
- diffAB
the value of estimateA minus estimateB
- covAB
the covariance of estimateA and estimateB. Used in calculating diffABse.
- diffABse
the standard error of diffAB
- diffABpValue
the p-value associated with the t-test used for the hypothesis test that diffAB is zero
- dofAB
degrees of freedom used for the t-test on diffAB
If the gap was in achievement levels or percentiles and more
than one percentile or achievement level is requested,
then an additional column
labeled percentiles or achievementLevel is included
in the results object.
When results has a single row and when returnVarEstInputs
is TRUE, the additional elements varEstInputs and
pctVarEstInputs also are returned. These can be used for calculating
covariances with varEstToCov.
the data argument is an edsurvey.data.frame.list
When the data argument is an edsurvey.data.frame.list,
gap returns an S3 object of class gapList.
The results object in the gapList is
a data.frame. Each row corresponds to a particular dataset from the
edsurvey.data.frame.list, and the reference dataset is set by
the referenceDataIndex argument.
The percentage object is a data.frame with the following elements:
- covs
a data frame with a column for each column in the covs. See previous section for more details.
- ...
all elements in the percentage object in the previous section
- diffAA
the difference in pctA between the reference data and this dataset. Set to NA for the reference dataset.
- covAA
the covariance of pctA in the reference data and pctA on this row. Used in calculating diffAAse.
- diffAAse
the standard error for diffAA
- diffAApValue
the p-value associated with the t-test used for the hypothesis test that diffAA is zero
- diffBB
the difference in pctB between the reference data and this dataset. Set to NA for the reference dataset.
- covBB
the covariance of pctB in the reference data and pctB on this row. Used in calculating diffBBse.
- diffBBse
the standard error for diffBB
- diffBBpValue
the p-value associated with the t-test used for the hypothesis test that diffBB is zero
- diffABAB
the value of diffAB in the reference dataset minus the value of diffAB in this dataset. Set to NA for the reference dataset.
- covABAB
the covariance of diffAB in the reference data and diffAB on this row. Used in calculating diffABABse.
- diffABABse
the standard error for diffABAB
- diffABABpValue
the p-value associated with the t-test used for the hypothesis test that diffABAB is zero
The results object is a data.frame with the following elements:
- ...
all elements in the results object in the previous section
- diffAA
the value of estimateA in the reference dataset minus the value of estimateA in this dataset. Set to NA for the reference dataset.
- covAA
the covariance of estimateA in the reference data and estimateA on this row. Used in calculating diffAAse.
- diffAAse
the standard error for diffAA
- diffAApValue
the p-value associated with the t-test used for the hypothesis test that diffAA is zero
- diffBB
the value of estimateB in the reference dataset minus the value of estimateB in this dataset. Set to NA for the reference dataset.
- covBB
the covariance of estimateB in the reference data and estimateB on this row. Used in calculating diffBBse.
- diffBBse
the standard error for diffBB
- diffBBpValue
the p-value associated with the t-test used for the hypothesis test that diffBB is zero
- diffABAB
the value of diffAB in the reference dataset minus the value of diffAB in this dataset. Set to NA for the reference dataset.
- covABAB
the covariance of diffAB in the reference data and diffAB on this row. Used in calculating diffABABse.
- diffABABse
the standard error for diffABAB
- diffABABpValue
the p-value associated with the t-test used for the hypothesis test that diffABAB is zero
- sameSurvey
a logical value indicating if this line uses the same survey as the reference line. Set to NA for the reference line.
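The cross-dataset differences listed above can be reproduced from the per-dataset estimates. This is a minimal base R sketch with made-up numbers; the column names follow the results object described above, and ref is an assumed stand-in for referenceDataIndex.

```r
# Illustrative values only (not real survey estimates), one row per dataset
res <- data.frame(
  estimateA = c(282, 279, 285),
  estimateB = c(280, 278, 281)
)
res$diffAB <- res$estimateA - res$estimateB

ref <- 1  # plays the role of referenceDataIndex

# Reference value minus the value in each dataset
res$diffAA <- res$estimateA[ref] - res$estimateA
res$diffBB <- res$estimateB[ref] - res$estimateB
res$diffABAB <- res$diffAB[ref] - res$diffAB

# The reference row would be compared with itself, so it is set to NA
res[ref, c("diffAA", "diffBB", "diffABAB")] <- NA
res
```

By construction, diffABAB equals diffAA minus diffBB on every non-reference row, so it reads as the change in the gap relative to the reference dataset.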
Details
This function calculates the gap between groupA and groupB (which
may be omitted to indicate the full sample). The gap is
calculated for one of four statistics:
- the gap in means
The gap in the mean of the score variable identified in the variable argument. This is the default. The means and their standard errors are calculated using the methods described in the lm.sdf function documentation.
- the gap in percentiles
The gap between respondents at the percentiles specified in the percentiles argument. This is returned when the percentiles argument is defined. The percentiles and their standard errors are computed as described in the percentile function documentation.
- the gap in achievement levels
The gap in the percentage of students at (when achievementDiscrete is TRUE) or at or above (when achievementDiscrete is FALSE) a particular achievement level. This is used when the achievementLevel argument is defined. The percentages and their standard errors are calculated as described in the achievementLevels function documentation.
- the gap in a survey response
The gap in the percentage of respondents responding at targetLevel to variable. This is used when targetLevel is defined. The percentages and their standard errors are calculated as described in the edsurveyTable function documentation.
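The distinction between a discrete and an at-or-above achievement level gap can be sketched in base R. Everything below is a toy illustration under stated assumptions: the data are made up, pctAtLevel and gapAtLevel are hypothetical helpers, and plausible values and variance estimation are ignored.

```r
# Toy data, assumed for illustration: one row per student
students <- data.frame(
  group  = c("A", "B", "A", "A", "B", "B"),
  level  = c("Below Basic", "Basic", "Basic", "Proficient", "Advanced", "Proficient"),
  weight = c(1, 2, 1, 2, 1, 1)
)
levelOrder <- c("Below Basic", "Basic", "Proficient", "Advanced")

# Hypothetical helper: weighted percentage at (discrete = TRUE) or
# at or above (discrete = FALSE) the target level
pctAtLevel <- function(level, weight, target, discrete = FALSE) {
  pos <- match(level, levelOrder)
  targetPos <- match(target, levelOrder)
  hit <- if (discrete) pos == targetPos else pos >= targetPos
  100 * sum(weight[hit]) / sum(weight)
}

# Hypothetical helper: group A percentage minus group B percentage
gapAtLevel <- function(discrete) {
  a <- students[students$group == "A", ]
  b <- students[students$group == "B", ]
  pctAtLevel(a$level, a$weight, "Proficient", discrete) -
    pctAtLevel(b$level, b$weight, "Proficient", discrete)
}

gapCumulative <- gapAtLevel(discrete = FALSE)
gapDiscrete <- gapAtLevel(discrete = TRUE)
```

With these toy numbers the at-or-above gap is 0 while the discrete gap is 25 percentage points, because the Advanced student in group B counts toward "at or above Proficient" but not toward "at Proficient".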
Examples
if (FALSE) { # \dontrun{
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))
# find the mean score gap in the primer data between males and females
gap(variable="composite", data=sdf, groupA=dsex=="Male", groupB=dsex=="Female")
# find the score gap at the median in the primer data between males and females
gap(variable="composite", data=sdf,
    groupA=dsex=="Male", groupB=dsex=="Female", percentiles=50)
# find the score gap at the quartiles in the primer data between males and females
gap(variable="composite", data=sdf,
    groupA=dsex=="Male", groupB=dsex=="Female", percentiles=c(25, 50, 75))
# find the gap in the percentage at or above each achievement level
# in the primer data between males and females
gap(variable="composite", data=sdf, groupA=dsex=="Male", groupB=dsex=="Female",
achievementLevel=c("Basic", "Proficient", "Advanced"))
# find the discrete achievement level gap--this is harder to interpret
gap(variable="composite", data=sdf, groupA=dsex=="Male", groupB=dsex=="Female",
achievementLevel="Proficient", achievementDiscrete=TRUE)
# find the gap between males and females in the percentage who talk about
# studies at home (b017451) never or hardly ever
gap(variable="b017451", data=sdf, groupA=dsex=="Male", groupB=dsex=="Female",
targetLevel="Never or hardly ever")
# example showing how to combine several response levels with recode
# and find the gap at the combined level
gap(variable="b017451",
data=sdf,
groupA=dsex=="Male",
groupB=dsex=="Female",
targetLevel="Infrequently",
recode=list(b017451=list(from=c("Never or hardly ever",
"Once every few weeks",
"About once a week"),
to=c("Infrequently"))))
# make subsets of sdf by scrpsu, "Scrambled PSU and school code"
sdfA <- subset(sdf, scrpsu %in% c(5,45,56))
sdfB <- subset(sdf, scrpsu %in% c(75,76,78))
sdfC <- subset(sdf, scrpsu %in% 100:200)
sdfD <- subset(sdf, scrpsu %in% 201:300)
sdfl <- edsurvey.data.frame.list(datalist=list(sdfA, sdfB, sdfC, sdfD),
labels=c("A locations", "B locations",
"C locations", "D locations"))
gap(variable="composite", data=sdfl, groupA=dsex=="Male", groupB=dsex=="Female", percentiles=50)
} # }