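Performs a cross-validated ANOVA (CV-ANOVA) significance test for an OPLS model. The function compares the residuals of an intercept-only (null) model with those of a model that regresses the response on the cross-validated predictive scores, and tests whether the reduction in the residual sum of squares is significant.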

Usage

cv_anova(smod)

Arguments

smod

An object of class m8_model, as returned by opls() in the metabom8 package.

Value

A data.frame with the following columns:

  • SS - Sum of Squares

  • DF - Degrees of Freedom

  • MS - Mean Squares

  • F_value - F statistic

  • p_value - P-value from the F-test

Details

Interpretation of the CV-ANOVA table

The CV-ANOVA compares two nested linear models:

  • Null model: \(Y = \beta_0 + \epsilon\)

  • Full model: \(Y = \beta_0 + \beta_1 t_{\text{pred,cv}} + \epsilon\)

where \(t_{\text{pred,cv}}\) represents the cross-validated predictive component score(s) obtained from the OPLS model.
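The comparison of these two models is the standard F-test for nested linear models, which can be sketched in base R. This is an illustration on simulated data and not the internals of cv_anova(); the vector `t_pred_cv` is a hypothetical stand-in for the cross-validated predictive scores that an OPLS model would supply.

```r
# Illustrative sketch on simulated data; not the metabom8 implementation.
set.seed(1)
n <- 20
t_pred_cv <- rnorm(n)                      # hypothetical stand-in for CV predictive scores
y <- 0.8 * t_pred_cv + rnorm(n, sd = 0.5)  # simulated continuous response

null_fit <- lm(y ~ 1)          # null model: intercept only
full_fit <- lm(y ~ t_pred_cv)  # full model: intercept + CV predictive score

# F-test comparing the nested models
cmp <- anova(null_fit, full_fit)
print(cmp)

# Residual sums of squares of the null and full model
rss0 <- cmp$RSS[1]
rss1 <- cmp$RSS[2]
```

The `RSS` column of the printed comparison holds the residual sum of squares of each model, and the `F` and `Pr(>F)` columns hold the test statistic and its p-value.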

The ANOVA table contains:

  • SS – Sum of Squares

  • DF – Degrees of Freedom

  • MS – Mean Squares (SS / DF)

  • F_value – F statistic comparing model vs. null

  • p_value – P-value from the F-test

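The Total corrected row holds the sum of squares of the null model, that is, the variation of \(Y\) around its mean. The Regression row quantifies the reduction in the residual sum of squares achieved by including the cross-validated predictive score(s). The Residual row represents the unexplained variation of the full model. The F statistic and its p-value are reported in the RESULT row.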

The F-statistic is computed as:

$$ F = \frac{(RSS_0 - RSS_1) / df_{reg}}{RSS_1 / df_{res}} $$

where \(RSS_0\) and \(RSS_1\) are the residual sums of squares of the null and full models, respectively, and \(df_{reg}\) and \(df_{res}\) are the degrees of freedom of the Regression and Residual rows.
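As a worked check, the values printed in the Examples section of this page can be substituted into this formula, taking the Total corrected sum of squares as \(RSS_0\) and the Residual sum of squares as \(RSS_1\):

$$ F = \frac{(2.500000 - 1.351628) / 1}{1.351628 / 8} = \frac{1.148372}{0.1689535} \approx 6.797 $$

This agrees with the F_value shown in the RESULT row of that output.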

Predictive interpretation

A small p-value (typically < 0.05) indicates that the cross-validated predictive score(s) significantly reduce residual variance compared to an intercept-only model. This suggests that the OPLS model captures statistically meaningful predictive structure in \(Y\).

A large p-value means there is no evidence that the predictive component explains \(Y\) better than the intercept-only model, which points to weak or unstable predictive performance.

Importantly, CV-ANOVA evaluates the linear explanatory power of the cross-validated predictive scores, not the descriptive separation of the latent space. A model may show visual class separation or moderate \(R^2\) yet fail CV-ANOVA if predictive performance is weak.

Sample size strongly affects statistical power. With small \(n\), even models with moderate predictive strength may not reach statistical significance.
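The effect of sample size can be illustrated with a short base R sketch. It assumes a single predictive score, so that \(df_{reg} = 1\) and \(df_{res} = n - 2\) (consistent with the Examples output, where the Total corrected row has 9 degrees of freedom), and it holds the explained fraction of the total sum of squares fixed while \(n\) varies. The helper `p_from_r2` is hypothetical and not part of metabom8.

```r
# Hypothetical helper (not part of metabom8): p-value of the F-test when a
# fraction r2 of the total sum of squares is explained by one score.
p_from_r2 <- function(r2, n) {
  df_reg <- 1
  df_res <- n - 2
  f_value <- (r2 / df_reg) / ((1 - r2) / df_res)
  pf(f_value, df_reg, df_res, lower.tail = FALSE)
}

# Explained fraction taken from the Examples output: SS Regression / SS Total
r2 <- 1.148372 / 2.5

p_n10 <- p_from_r2(r2, n = 10)  # sample size implied by the Examples output
p_n6  <- p_from_r2(r2, n = 6)   # same explained fraction, fewer samples

print(c(n10 = p_n10, n6 = p_n6))
```

With the explained fraction held fixed, the smaller sample yields the larger p-value, because both the F statistic and the residual degrees of freedom shrink with \(n\).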

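CV-ANOVA is intended for continuous response (regression) models. In the example below, a class label is recoded as a numeric response and fitted as a regression (O-PLS-R) model.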

References

Eriksson, L., et al. (2008). CV-ANOVA for significance testing of PLS and OPLS models. Journal of Chemometrics, 22(11-12), 594–600.

See also

Other model_validation: dmodx(), opls_perm()

Examples

data("covid")

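X <- covid$X
# Recode the class labels as a numeric response starting at 0
Y <- as.numeric(factor(covid$an$type)) - 1

# Scaling and cross-validation settings passed to opls()
scaling <- uv_scaling(center = TRUE)
cv <- balanced_mc(10, split = 2/3, type = 'R', probs = c(0, 0.5, 1))
mod <- opls(X, Y, scaling, cv)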
#> An O-PLS-R model with 1 predictive and 1 orthogonal components was fitted.

cv_anova(mod)
#>                       SS DF        MS F_value    p_value
#> Total corrected 2.500000  9 0.2777778      NA         NA
#> Regression      1.148372  1 1.1483719      NA         NA
#> Residual        1.351628  8 0.1689535      NA         NA
#> RESULT                NA NA        NA 6.79697 0.03127089