Parse SPSS Syntax Script for Fixed-Width Data Files
Source: R/parseScript_SPSS.R
Parses an SPSS Syntax Script (.sps) file to return information relating to fixed-width data files.
Arguments
- spsFilePath
a character value of the file path to the SPSS script to parse.
- verbose
a logical value indicating whether to print parsing activity to the console. Default value is FALSE.
- outputFormat
a named argument to indicate which output format the resulting object should be. See Details for information on each format. Currently, only the data.frame format is supported.
- encoding
a character value to indicate the encoding specification that is used by the readLines base function for the spsFilePath parameter. Only adjust this parameter if the original encoding of the file is known, string values are not being read correctly, or other errors occur. See the ?readLines help for details about its use for file encoding.
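To illustrate what the encoding argument controls, the following is a minimal sketch using only base R. It assumes the value is handed to readLines through its encoding argument; how parseScript_SPSS forwards it internally is not stated here. The file contents are invented for illustration.

```r
# Write one line of (made-up) SPSS syntax as raw Latin-1 bytes.
# 0xE9 is the Latin-1 byte for an accented "e".
tmp <- tempfile(fileext = ".sps")
con <- file(tmp, open = "wb")
writeBin(c(charToRaw("VARIABLE LABELS Q1 'Caf"), as.raw(0xE9),
           charToRaw("'."), as.raw(0x0A)), con)
close(con)

# readLines does not re-encode the input; the encoding argument only
# marks the strings as being in the declared encoding.
txt <- readLines(tmp, encoding = "latin1")
Encoding(txt)

# Once correctly marked, the string can be converted to UTF-8.
utf8 <- enc2utf8(txt)
print(utf8)
```

If the declared encoding does not match the file, label text containing non-ASCII characters is the most likely place for garbled values to show up.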
Details
NOT CURRENTLY EXPORTED! In the future this could potentially be split out into a separate R package. This parseScript_SPSS function should be used 100%. Old/previous SPSS script parsers should be slowly transitioned to use this function where possible to maximize code reuse.
The SPSS syntax script parser is focused on gathering details for use with fixed-width data files. This function scans for the following SPSS commands:
FILE HANDLE
DATA LIST
VARIABLE LABEL
VALUE LABEL
MISSING VALUE
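As an illustration of the kind of input involved, the sketch below writes a small, made-up .sps script to a temporary file and finds the line on which each of the commands above begins. The script contents, the commands vector, and the regular-expression approach are all assumptions for illustration; this is not the parser's actual implementation.

```r
# Hypothetical SPSS syntax script; contents are invented for illustration.
spsLines <- c(
  "FILE HANDLE DATA1 / NAME='survey.dat' LRECL=20.",
  "DATA LIST FILE=DATA1 /",
  "  ID 1-4 SEX 5-5 SCORE 6-10 (2).",
  "VARIABLE LABELS",
  "  ID 'Respondent ID'",
  "  SEX 'Sex of respondent'.",
  "VALUE LABELS",
  "  SEX 1 'Male' 2 'Female' 9 'Omitted'.",
  "MISSING VALUES SEX (9)."
)
spsFile <- tempfile(fileext = ".sps")
writeLines(spsLines, spsFile)

# Commands the parser is documented to scan for.
commands <- c("FILE HANDLE", "DATA LIST", "VARIABLE LABEL",
              "VALUE LABEL", "MISSING VALUE")

# Find the first line where each command begins (case-insensitive,
# leading whitespace allowed). A prefix match also catches plural
# forms such as "VARIABLE LABELS".
scriptLines <- readLines(spsFile)
commandStart <- vapply(commands, function(cmd) {
  pattern <- paste0("^\\s*", gsub(" ", "\\\\s+", cmd))
  hit <- grep(pattern, scriptLines, ignore.case = TRUE)
  if (length(hit) > 0) hit[1] else NA_integer_
}, integer(1))

print(commandStart)
```

A command that is absent from the script yields NA in this sketch, which mirrors the fact that not every script contains every command (for example, a MISSING VALUE clause is optional).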
The outputFormat specified will determine the result object returned. This function currently supports the following formats:
data.frame
variableName - The variable name as defined in the script
Start - The start number index of the variable defined for the fixed-width format layout
End - The end number index of the variable defined for the fixed-width format layout
Width - The number of columns the variable occupies in the fixed-width format layout
Attributes - Any SPSS attributes that are defined in the DATA LIST command. This is typically only for field formatting.
RecordNumber - Some fixed-width data files are considered "multi-line", where one record of data can span multiple rows in the file. The RecordNumber indicates which line of the record the variable is assigned to.
Labels - The descriptive label associated with the variable name to give more detail or context.
labelValues - For categorical variables, each stored value is typically assigned a longer label or definition. This string encodes those mappings. The '^' symbol delimits the individual value/label pairs. Within each pair, the text to the left of the '=' symbol is the stored value and the remaining text to the right of it is the label for that value.
dataType - A best guess of the data type (either 'numeric' or 'character'), made without actually examining the data file.
missingValues - If a MISSING VALUE clause is included in the script, this lists the values that are considered 'Missing'. If multiple values are specified, they are delimited by a ';' (semicolon) symbol.
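The delimiter conventions described above can be unpacked with base R alone. The sketch below is a hedged illustration: the layout data.frame is hand-built to resemble the documented columns (it is not real parser output), and decodeLabelValues and decodeMissingValues are hypothetical helper names, not functions provided by the package. Whether empty entries appear as "" or NA in real output is not stated, so the helpers accept both.

```r
# Hypothetical parser output; the rows below are invented for illustration.
layout <- data.frame(
  variableName  = c("ID", "SEX", "SCORE"),
  Start         = c(1, 5, 6),
  End           = c(4, 5, 10),
  Width         = c(4, 1, 5),
  labelValues   = c("", "1=Male^2=Female^9=Omitted", ""),
  dataType      = c("numeric", "numeric", "numeric"),
  missingValues = c("", "9", ""),
  stringsAsFactors = FALSE
)

# Split a labelValues string into a named character vector:
# '^' separates the pairs; the first '=' separates value from label.
decodeLabelValues <- function(x) {
  if (is.na(x) || !nzchar(x)) return(character(0))
  entries <- strsplit(x, "^", fixed = TRUE)[[1]]
  eqPos   <- regexpr("=", entries, fixed = TRUE)
  values  <- substr(entries, 1, eqPos - 1)
  labels  <- substr(entries, eqPos + 1, nchar(entries))
  stats::setNames(labels, values)
}

# Split a missingValues string on ';'.
decodeMissingValues <- function(x) {
  if (is.na(x) || !nzchar(x)) return(character(0))
  trimws(strsplit(x, ";", fixed = TRUE)[[1]])
}

isSex      <- layout$variableName == "SEX"
sexLabels  <- decodeLabelValues(layout$labelValues[isSex])
sexMissing <- decodeMissingValues(layout$missingValues[isSex])

# Use Start and End to slice one line of a fixed-width data file.
record <- "0001225.50"
fields <- substr(rep(record, nrow(layout)), layout$Start, layout$End)
names(fields) <- layout$variableName

print(sexLabels)
print(sexMissing)
print(fields)
```

Splitting on the first '=' only, rather than on every '=', keeps labels intact when the label text itself contains an '=' character.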