Performs a cross-validated ANOVA (CV-ANOVA) test for OPLS models. The function compares residuals from a null model and a model using cross-validated predictive scores to assess the significance of the OPLS model.

Usage

cvanova(smod)

Arguments

smod

An object of class OPLS_metabom8, generated by functions in the metabom8 package.

Value

A data.frame containing:

  • SS - Sum of Squares

  • DF - Degrees of Freedom

  • MS - Mean Squares

  • F_value - F statistic

  • p_value - P-value from the F-test

Details

CV-ANOVA formally compares the fit of two linear models using the size of their residuals. The null model regresses the response on an intercept only:

$$Y_i = \beta_0 + \epsilon_i$$

The full model includes the cross-validated predictive scores:

$$Y_i = \beta_0 + \beta_1 t_{\text{pred},i} + \epsilon_i$$

Here, \(Y_i\) is the response for observation \(i\), \(t_{\text{pred},i}\) is its cross-validated predictive score, and \(\epsilon_i\) is the residual. The residual sums of squares (SS) and degrees of freedom (DF) from both models are used to calculate an F-statistic and its associated p-value, which assess whether the OPLS model significantly improves the fit.
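The comparison described above can be sketched in base R. This is an illustration under stated assumptions, not the code used inside cvanova(): the data are simulated, the names t_pred_cv and Y are stand-ins, and the degrees of freedom are those of an ordinary one-predictor linear model, which may differ from the DF accounting that cvanova() reports.

```r
# Hypothetical simulated data; not taken from the metabom8 package.
set.seed(1)
n <- 20
t_pred_cv <- rnorm(n)               # stand-in for cross-validated predictive scores
Y <- 0.8 * t_pred_cv + rnorm(n)     # stand-in response

null_mod <- lm(Y ~ 1)               # null model: intercept only
full_mod <- lm(Y ~ 1 + t_pred_cv)   # full model: intercept + predictive score

# Residual sums of squares and residual degrees of freedom of both models
rss0 <- sum(residuals(null_mod)^2); df0 <- df.residual(null_mod)
rss1 <- sum(residuals(full_mod)^2); df1 <- df.residual(full_mod)

# F statistic for the reduction in residual SS, and its upper-tail p-value
F_stat <- ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
p_val  <- pf(F_stat, df0 - df1, df1, lower.tail = FALSE)

# The same nested-model comparison using base R's anova()
cmp <- anova(null_mod, full_mod)
```

The manually computed F_stat and p_val should agree with the second row of cmp, which is a quick way to check the arithmetic.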

Note that statistical power increases with sample size. With few samples, even a strongly predictive model may not reach statistical significance.
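The effect of sample size can be illustrated with a small calculation. The sketch below assumes a plain one-predictor linear model (1 regression DF, n - 2 residual DF) and a hypothetical fixed proportion of explained variance, r2; it does not reproduce the DF accounting of cvanova().

```r
# Hypothetical setting: the same explained variance at different sample sizes
r2 <- 0.5                       # assumed proportion of variance explained
n  <- c(6, 10, 20, 30)          # assumed sample sizes
df_resid <- n - 2               # residual DF of a one-predictor linear model

# F statistic and upper-tail p-value for each sample size
F_stat <- (r2 / 1) / ((1 - r2) / df_resid)
p_val  <- pf(F_stat, 1, df_resid, lower.tail = FALSE)

data.frame(n, F_stat, p_val)
```

With the explained variance held constant, the p-value falls as n grows, and at the smallest n in this sketch it stays above 0.05.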

References

Eriksson, L., et al. (2008). CV-ANOVA for significance testing of PLS and OPLS models. Journal of Chemometrics, 22(11-12), 594–600.

Examples

data("covid")
X <- covid$X
an <- covid$an
mod <- opls(X, an$type)
#> Performing discriminant analysis.
#> Reducing k to 5 due to small group size (min n = 5).
#> An O-PLS-DA model with 1 predictive and 1 orthogonal components was fitted.

cvanova(mod)
#>                           SS DF           MS  F_value      p_value
#> Total corrected 9.000000e+00  9 1.000000e+00       NA         <NA>
#> Regression      3.609325e+00  4 9.023312e-01       NA         <NA>
#> Residual        5.390675e+00  5 1.078135e+00       NA         <NA>
#> RESULT                  <NA> NA         <NA> 0.836937 5.560243e-01

# Internally, two linear models are fitted:
# Null model: lm(Y ~ 1)
# Full model: lm(Y ~ 1 + t_pred_cv)
#
# where Y is the response variable and t_pred_cv the cross-validated predictive scores.
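As a consistency check on the table printed above, the RESULT row can be recomputed from the Regression and Residual rows. This sketch assumes that F_value is the ratio of the two mean squares and that p_value is the upper tail of the F distribution with the listed degrees of freedom; the printed numbers are consistent with that reading, but it is not confirmed as the function's internal code.

```r
# Values copied from the example output above
ms_regression <- 9.023312e-01
ms_residual   <- 1.078135e+00
df_regression <- 4
df_residual   <- 5

# Assumed relationship: F = MS(regression) / MS(residual)
F_value <- ms_regression / ms_residual

# Upper-tail probability of the F distribution with (4, 5) degrees of freedom
p_value <- pf(F_value, df_regression, df_residual, lower.tail = FALSE)

c(F_value = F_value, p_value = p_value)
```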