Cross-validated ANOVA for O-PLS models

Performs a cross-validated ANOVA (CV-ANOVA) significance test for an OPLS model. The function compares the residuals of an intercept-only null model with those of a model that regresses Y on the cross-validated predictive scores, and reports an F-test on the difference.

Usage

cv_anova(smod)

Arguments

smod: An object of class m8_model, generated by function opls in the metabom8 package.

Value

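A data.frame with the rows Total corrected, Regression, Residual and RESULT (the F statistic and p-value are reported in the RESULT row) and the following columns: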

SS - Sum of Squares
DF - Degrees of Freedom
MS - Mean Squares
F_value - F statistic
p_value - P-value from the F-test

Details

Interpretation of the CV-ANOVA table

The CV-ANOVA compares two nested linear models:

Null model: $Y = \beta_0 + \epsilon$
Full model: $Y = \beta_0 + \beta_1 t_{\text{pred,cv}} + \epsilon$

where $t_{\text{pred,cv}}$ represents the cross-validated predictive component score(s) obtained from the OPLS model.

The ANOVA table contains:

SS – Sum of Squares
DF – Degrees of Freedom
MS – Mean Squares (SS / DF)
F_value – F statistic comparing model vs. null
p_value – P-value from the F-test

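The Total corrected row holds the sum of squares of $Y$ about its mean, which equals the residual sum of squares of the null model ($RSS_0$). The Regression row quantifies the reduction in residual sum of squares achieved by including the cross-validated predictive score(s), $RSS_0 - RSS_1$. The Residual row holds the variation left unexplained by the full model ($RSS_1$).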

The F-statistic is computed as:

$$ F = \frac{(RSS_0 - RSS_1) / df_{reg}}{RSS_1 / df_{res}} $$

where $RSS_0$ and $RSS_1$ are the residual sums of squares of the null and full models, respectively, and $df_{reg}$ and $df_{res}$ are the degrees of freedom of the Regression and Residual rows.
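The nested-model comparison can be sketched in base R. This is a minimal illustration and not the internal code of cv_anova: the vector t_pred_cv is simulated here as a hypothetical stand-in for the cross-validated predictive scores, and the slope of 0.8 is an arbitrary assumption.

```r
set.seed(1)
n <- 30

# Hypothetical stand-in for the cross-validated predictive scores
t_pred_cv <- rnorm(n)
# Simulated response with an assumed linear dependence on the scores
y <- 0.8 * t_pred_cv + rnorm(n)

fit0 <- lm(y ~ 1)          # null model: intercept only
fit1 <- lm(y ~ t_pred_cv)  # full model: intercept + predictive score

rss0 <- sum(residuals(fit0)^2)
rss1 <- sum(residuals(fit1)^2)

df_reg <- df.residual(fit0) - df.residual(fit1)  # parameters added by the full model
df_res <- df.residual(fit1)                      # residual degrees of freedom

f_value <- ((rss0 - rss1) / df_reg) / (rss1 / df_res)
p_value <- pf(f_value, df_reg, df_res, lower.tail = FALSE)

# Cross-check against base R's own nested-model F-test
ref <- anova(fit0, fit1)
all.equal(f_value, ref$F[2])
all.equal(p_value, ref$`Pr(>F)`[2])
```

With a single predictive score, df_reg is 1 and df_res is n - 2, matching the structure of the table described above.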

Predictive interpretation

A small p-value (typically < 0.05) indicates that the cross-validated predictive score(s) significantly reduce residual variance compared to an intercept-only model. This suggests that the OPLS model captures statistically meaningful predictive structure in $Y$.

A large p-value indicates that the cross-validated predictive score(s) do not explain $Y$ significantly better than the intercept-only model, implying weak or unstable predictive performance.

Importantly, CV-ANOVA evaluates the linear explanatory power of the cross-validated predictive scores, not the descriptive separation of the latent space. A model may show visual class separation or moderate $R^2$ yet fail CV-ANOVA if predictive performance is weak.

Sample size strongly affects statistical power. With small $n$, even models with moderate predictive strength may not reach statistical significance.
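The effect of sample size can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not a property of any particular OPLS model: the scores are simulated, the helper power_at is hypothetical, and the slope (0.5), noise level, number of replicates and significance level are arbitrary choices.

```r
set.seed(42)

# Hypothetical helper: fraction of simulated data sets in which the
# nested-model F-test rejects at level alpha, for a given sample size n
power_at <- function(n, slope = 0.5, n_sim = 200, alpha = 0.05) {
  p <- replicate(n_sim, {
    t_pred_cv <- rnorm(n)              # simulated stand-in for CV scores
    y <- slope * t_pred_cv + rnorm(n)  # same effect size for every n
    anova(lm(y ~ 1), lm(y ~ t_pred_cv))$`Pr(>F)`[2]
  })
  mean(p < alpha)
}

power_small <- power_at(10)   # small study
power_large <- power_at(100)  # larger study, identical effect size

c(n10 = power_small, n100 = power_large)
```

Because the effect size is held fixed, any difference between the two rejection rates comes from the sample size alone.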

CV-ANOVA is intended for continuous response (regression) models.

References

Eriksson, L., et al. (2008). CV-ANOVA for significance testing of PLS and OPLS models. Journal of Chemometrics, 22(11-12), 594–600.

Examples

library(metabom8)
data("covid")

X <- covid$X
# Encode the class label as 0/1 so it can be used as a numeric response
Y <- as.numeric(factor(covid$an$type)) - 1

scaling <- uv_scaling(center=TRUE)
cv <- balanced_mc(10, split=2/3, type='R', probs = c(0, 0.5, 1))
mod <- opls(X, Y, scaling, cv)
#> An O-PLS-R model with 1 predictive and 1 orthogonal components was fitted.

cv_anova(mod)
#>                       SS DF        MS F_value    p_value
#> Total corrected 2.500000  9 0.2777778      NA         NA
#> Regression      1.148372  1 1.1483719      NA         NA
#> Residual        1.351628  8 0.1689535      NA         NA
#> RESULT                NA NA        NA 6.79697 0.03127089
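As a consistency check, the RESULT row can be recomputed by hand from the Regression and Residual rows of the output above. The sketch below only uses the printed (rounded) table values and base R's pf, so agreement is expected up to rounding.

```r
# Values copied from the printed CV-ANOVA table
ss_total <- 2.500000
ss_reg   <- 1.148372
ss_res   <- 1.351628
df_reg   <- 1
df_res   <- 8

# Regression and Residual sums of squares add up to the Total corrected row
ss_reg + ss_res

ms_reg <- ss_reg / df_reg  # mean square, Regression row
ms_res <- ss_res / df_res  # mean square, Residual row

f_value <- ms_reg / ms_res
p_value <- pf(f_value, df_reg, df_res, lower.tail = FALSE)

c(F_value = f_value, p_value = p_value)
```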