Connect to PISA Data

Opens a connection to a PISA data file and returns an edsurvey.data.frame with information about the file and data.

Usage

readPISA(
  path,
  database = c("INT", "CBA", "FIN"),
  countries,
  cognitive = c("score", "response", "none"),
  forceReread = FALSE,
  verbose = TRUE
)

Arguments

path: a character vector of the full directory path(s) containing the extracted PISA fixed-width data files and SPSS control files (.txt).
database: a character value indicating the database to read. Must be one of INT (the general database that most people use), CBA (the computer-based database, PISA 2012 only), or FIN (the financial literacy database in PISA 2012, 2018, and 2022). Note that `INT` needs to be used for PISA 2015 financial literacy data because those data can be merged with the general database. Defaults to INT.
countries: a character vector of the country/countries to include, given as three-letter ISO country codes (for example, sgp). A list of country codes can be found in the PISA codebook or at https://en.wikipedia.org/wiki/ISO_3166-1#Current_codes. If files are downloaded using downloadPISA, a country dictionary text file can be found in the download path.
cognitive: one of none, score, or response. Default is score. The PISA database often has three student files: student questionnaire, cognitive item response, and scored cognitive item response. The first file is used as the main student file with student background information. Users can choose to merge the score data or the response data into the main file, or to merge neither (none).
forceReread: a logical value to force rereading of all processed data. Defaults to FALSE. Setting forceReread to TRUE causes the PISA data to be reread and increases processing time.
verbose: a logical value indicating whether progress messages are printed while the function is running. Defaults to TRUE.

Value

an edsurvey.data.frame for a single specified country or an edsurvey.data.frame.list if multiple countries are specified

Details

Reads in the unzipped files downloaded from the PISA database using the OECD Repository (https://www.oecd.org/pisa.html). Users can use downloadPISA to download all required files. Student questionnaire files (with weights and plausible values) are used as main files, which are then merged with cognitive, school, and parent files (if available).

The average first-time processing time for one year and one database for all countries is 10–15 minutes. If forceReread is FALSE, subsequent calls to this function reuse the processed data and take only 5–10 seconds.
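As a rough illustration of the caching behavior described above, the sketch below times a call that may reuse processed data against a call with forceReread = TRUE. The directory ~/PISA/2012 and the helper timeRead are hypothetical, actual timings depend on the machine and the data, and the block does nothing if the directory or the EdSurvey package is unavailable.

```r
# Hypothetical helper: elapsed seconds needed to evaluate an expression.
timeRead <- function(expr) {
  system.time(expr)[["elapsed"]]
}

# Assumed location of the unzipped PISA 2012 files; adjust to your setup.
pisaDir <- "~/PISA/2012"

if (dir.exists(pisaDir) && requireNamespace("EdSurvey", quietly = TRUE)) {
  library(EdSurvey)

  # Reuses previously processed data if present; otherwise processes the raw files.
  cached <- timeRead(
    readPISA(path = pisaDir, database = "INT", countries = "sgp", verbose = FALSE)
  )

  # Ignores previously processed data and rereads the raw files.
  reread <- timeRead(
    readPISA(path = pisaDir, database = "INT", countries = "sgp",
             forceReread = TRUE, verbose = FALSE)
  )

  message("cached: ", cached, " s; forced reread: ", reread, " s")
}
```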

For the PISA 2000 study, please note that the student weights are subject specific. Each weight has different adjustment factors for reading, mathematics, and science based on its original subject source file. For example, the w_fstuwt_read weight is associated with the reading subject data file. Take special care to select the weight that matches the subject of your analysis. See the OECD documentation for further details. Use the showWeights function to see all three student-level subject weights:

w_fstuwt_read = Reading (default)
w_fstuwt_scie = Science
w_fstuwt_math = Mathematics
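The weight choice described above could be sketched as follows. The helper pisa2000Weight, the directory ~/PISA/2000, the country usa, and the use of math as the mathematics score variable are assumptions made for illustration; passing the weight through the weightVar argument of lm.sdf with an intercept-only formula is one possible way to obtain a weighted mean, not necessarily the recommended one.

```r
# Hypothetical helper: map a PISA 2000 subject to its student weight name.
pisa2000Weight <- function(subject = c("read", "math", "scie")) {
  subject <- match.arg(subject)
  paste0("w_fstuwt_", subject)
}

# Assumed location of the unzipped PISA 2000 files; adjust to your setup.
pisaDir <- "~/PISA/2000"

if (dir.exists(pisaDir) && requireNamespace("EdSurvey", quietly = TRUE)) {
  library(EdSurvey)
  usa2000 <- readPISA(path = pisaDir, database = "INT", countries = "usa")

  # List the available student weights and see which one is the default.
  showWeights(usa2000)

  # A mathematics analysis should use the mathematics weight rather than
  # the default reading weight ("math" is assumed to be the score variable).
  lm.sdf(formula = math ~ 1, data = usa2000,
         weightVar = pisa2000Weight("math"))
}
```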

References

Organisation for Economic Co-operation and Development. (2017). PISA 2015 technical report. Paris, France: OECD Publishing. Retrieved from https://www.oecd.org/pisa/data/2015-technical-report.html

Author

Tom Fink, Trang Nguyen, Paul Bailey, and Yuqi Liao

Examples

if (FALSE) { # \dontrun{
# the following call returns an edsurvey.data.frame for the
# PISA 2012 International Database for Singapore
sgp2012 <- readPISA(path = "~/PISA/2012", database = "INT", countries = "sgp")

# extract a data.frame with a few variables
gg <- getData(sgp2012, c("cnt", "read", "w_fstuwt"))
head(gg)

# conduct an analysis on the edsurvey.data.frame
edsurveyTable(formula = read ~ st04q01 + st20q01, data = sgp2012)
} # }