Calculates bivariate Pearson, Spearman, polychoric, and polyserial correlation coefficients in weighted or unweighted form, on discrete or continuous variables. Also calculates tetrachoric and biserial correlation coefficients as described below.

weightedCorr(
  x,
  y,
  method = c("Pearson", "Spearman", "Polyserial", "Polychoric"),
  weights = rep(1, length(x)),
  ML = FALSE,
  fast = TRUE
)

Arguments

x

a numeric (or numeric factor in case of polychoric) vector or an object that can be coerced to a numeric or factor vector.

y

a numeric vector (or factor in case of polychoric and polyserial) or an object that can be coerced to a numeric or factor vector.

method

a character string indicating which correlation coefficient is to be computed. These include "Pearson" (default), "Spearman", "Polychoric", or "Polyserial". For tetrachoric use "Polychoric" and for biserial use "Polyserial".

weights

a numeric vector of weights. By default, the unweighted correlation coefficient is calculated by setting the weights to a vector of all 1s.

ML

a Boolean value indicating whether full Maximum Likelihood (ML) estimation is to be used. This applies to polyserial and polychoric only and has no effect on Pearson or Spearman results. Setting it to TRUE substantially increases the compute time. See the 'wCorr Arguments' vignette for a description of the effect of this argument.

fast

a Boolean value indicating whether the Rcpp methods should be used. Setting this value to FALSE uses the pure R implementation, which is included primarily so that the two implementations can be compared. See the 'wCorr Arguments' vignette for a description of the effect of this argument.

Value

A scalar that is the estimated correlation.

Details

For polyserial, x must be the observed ordinal variable and y the observed continuous variable. For polychoric, both must be categorical.

For Spearman, the data are first ranked and a Pearson-type correlation coefficient is then calculated on the ranks. Tied values receive the average of their ranks.
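The rank-then-correlate idea can be sketched in base R for the unweighted case. This is only an illustration, not the package's code: it assumes unit weights, and the variable names (rx, ry, rho_manual, rho_builtin) are made up for the example. How ranks are formed when weights are supplied is covered in the 'wCorr Formulas' vignette and is not reproduced here.

```r
# Unweighted illustration only: rank both variables (ties get the
# average rank), then compute an ordinary Pearson correlation on the ranks.
x <- mtcars$mpg
y <- mtcars$wt

rx <- rank(x, ties.method = "average")
ry <- rank(y, ties.method = "average")

rho_manual  <- cor(rx, ry)                      # Pearson on the ranks
rho_builtin <- cor(x, y, method = "spearman")   # base R Spearman

# The two values should agree up to floating-point error.
isTRUE(all.equal(rho_manual, rho_builtin))
```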

The details of computation are given in the 'wCorr Formulas' vignette.
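As a rough illustration of what a weighted Pearson coefficient looks like, the sketch below uses the common weighted-moments form (weighted means, then weighted cross-products). The helper name weighted_pearson is hypothetical and is not part of wCorr. Whether it matches the package's computation term by term is an assumption; the vignette remains the authoritative description.

```r
# Hypothetical helper, not part of wCorr: weighted Pearson correlation
# built from weighted means and weighted sums of cross-products.
weighted_pearson <- function(x, y, w = rep(1, length(x))) {
  stopifnot(length(x) == length(y), length(x) == length(w), all(w >= 0))
  mx  <- sum(w * x) / sum(w)            # weighted mean of x
  my  <- sum(w * y) / sum(w)            # weighted mean of y
  sxy <- sum(w * (x - mx) * (y - my))   # weighted cross-product
  sxx <- sum(w * (x - mx)^2)            # weighted sum of squares, x
  syy <- sum(w * (y - my)^2)            # weighted sum of squares, y
  sxy / sqrt(sxx * syy)
}

# With unit weights this reduces to the ordinary Pearson correlation.
r_unit <- weighted_pearson(mtcars$mpg, mtcars$wt)

# With non-uniform weights (hp is used purely as an example weight).
r_wtd <- weighted_pearson(mtcars$mpg, mtcars$wt, w = mtcars$hp)
```

Because the weights appear in both the numerator and the denominator, rescaling them by a constant leaves the result unchanged.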

References

Polyserial computation based on the likelihood function in Cox, N. R. (1974), "Estimation of the Correlation between a Continuous and a Discrete Variable." Biometrics, 30 (1), pp 171-178.

Polychoric computation based on the likelihood function in Olsson, U. (1979), "Maximum Likelihood Estimation of the Polychoric Correlation Coefficient." Psychometrika, 44 (4), pp 443-460.

The weighted Pearson formula appears in many places, including the "correlate" function in Stata Corp, Stata Statistical Software: Release 8. College Station, TX: Stata Corp LP, 2003.

Examples

# load the package and make the mtcars columns available by name
library(wCorr)
attach(mtcars)
# run a polyserial correlation
weightedCorr(gear, x=cyl, method="polyserial")
#> [1] -0.5420413
# weight by MPG
weightedCorr(y=gear, x=cyl, method="polyserial", weights=mpg)
#> [1] -0.5549905
# unweighted
weightedCorr(y=gear, x=cyl, method="polyserial")
#> [1] -0.5420413

# run a polychoric correlation
weightedCorr(gear, x=cyl, method="polychoric")
#> [1] -0.6188992
# weight by MPG
weightedCorr(y=gear, x=cyl, method="polychoric", weights=mpg)
#> [1] -0.6103001
# unweighted
weightedCorr(y=gear, x=cyl, method="polychoric")
#> [1] -0.6188992
detach(mtcars)
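
The method argument notes that a tetrachoric correlation is requested with "Polychoric". A minimal sketch of such a call follows. It assumes the wCorr package is installed and uses the two 0/1 columns am and vs from mtcars purely as example dichotomous variables. No output is shown because the calls have not been run here.

```r
library(wCorr)

# am (transmission) and vs (engine shape) are both coded 0/1 in mtcars,
# so a polychoric correlation on them is a tetrachoric correlation.
r_tet <- weightedCorr(x = mtcars$am, y = mtcars$vs, method = "Polychoric")

# Same pair of variables, weighted by mpg.
r_tet_w <- weightedCorr(x = mtcars$am, y = mtcars$vs,
                        method = "Polychoric", weights = mtcars$mpg)
```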