Performs a cross-validated ANOVA (CV-ANOVA) test for OPLS models. The function compares the residuals of an intercept-only (null) model with those of a model that regresses the response on the cross-validated predictive scores, and uses an F-test to assess the significance of the OPLS model.
Value
A data.frame containing:
SS - Sum of Squares
DF - Degrees of Freedom
MS - Mean Squares
F_value - F statistic
p_value - P-value from the F-test
Details
Interpretation of the CV-ANOVA table
The CV-ANOVA compares two nested linear models:
Null model: \(Y = \beta_0 + \epsilon\)
Full model: \(Y = \beta_0 + \beta_1 t_{\text{pred,cv}} + \epsilon\)
where \(t_{\text{pred,cv}}\) represents the cross-validated predictive component score(s) obtained from the OPLS model.
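The nested comparison above can be sketched with base R on simulated data. This is only an illustration under stated assumptions: `t_pred_cv` is a made-up vector standing in for the cross-validated predictive score, it is not extracted from an `opls()` fit, and the internals of `cv_anova()` may differ.

```r
# Simulated stand-ins: `t_pred_cv` is hypothetical and only plays the role
# of the cross-validated predictive score; it does not come from opls().
set.seed(1)
n <- 20
t_pred_cv <- rnorm(n)
y <- 0.8 * t_pred_cv + rnorm(n)

null_fit <- lm(y ~ 1)          # null model: Y = b0 + e
full_fit <- lm(y ~ t_pred_cv)  # full model: Y = b0 + b1 * t_pred_cv + e

# F-test for the nested comparison (base R, stats::anova)
cmp <- anova(null_fit, full_fit)
print(cmp)
```

The first row of `cmp` holds the residual sum of squares of the null model and the second row that of the full model, together with the F statistic and its p-value.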
The ANOVA table contains:
SS – Sum of Squares
DF – Degrees of Freedom
MS – Mean Squares (SS / DF)
F_value – F statistic comparing model vs. null
p_value – P-value from the F-test
The Regression row quantifies the reduction in residual variance achieved by including the cross-validated predictive score(s). The Residual row represents the unexplained variance of the full model.
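The F-statistic is computed as: $$ F = \frac{(RSS_0 - RSS_1) / df_{reg}}{RSS_1 / df_{res}} $$ where \(RSS_0\) and \(RSS_1\) are the residual sums of squares of the null and full models, respectively, and \(df_{reg}\) and \(df_{res}\) are the degrees of freedom of the Regression and Residual rows. Equivalently, \(F\) is the ratio of the Regression MS to the Residual MS.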
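As a worked check, the formula can be applied by hand to the numbers printed in the Examples section of this page. The sketch assumes that the Total corrected row equals \(RSS_0\), which is consistent with the table because the Regression and Residual sums of squares add up to it.

```r
# Values copied from the Examples output on this page
rss0   <- 2.500000   # Total corrected SS, taken here as RSS of the null model
rss1   <- 1.351628   # Residual SS of the full model
df_reg <- 1          # DF of the Regression row
df_res <- 8          # DF of the Residual row

ss_reg <- rss0 - rss1                            # Regression SS
f_val  <- (ss_reg / df_reg) / (rss1 / df_res)    # F statistic
p_val  <- pf(f_val, df_reg, df_res, lower.tail = FALSE)  # upper-tail p-value

round(c(SS_reg = ss_reg, F = f_val, p = p_val), 4)
```

The result should agree, up to rounding, with the F_value and p_value reported in the RESULT row of the example table.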
Predictive interpretation
A small p-value (typically < 0.05) indicates that the cross-validated predictive score(s) significantly reduce residual variance compared to an intercept-only model. This suggests that the OPLS model captures statistically meaningful predictive structure in \(Y\).
A large p-value indicates that the cross-validated predictive score(s) do not explain \(Y\) significantly better than the intercept-only model, implying weak or unstable predictive performance.
Importantly, CV-ANOVA evaluates the linear explanatory power of the cross-validated predictive scores, not the descriptive separation of the latent space. A model may show visual class separation or moderate \(R^2\) yet fail CV-ANOVA if predictive performance is weak.
Sample size strongly affects statistical power. With small \(n\), even models with moderate predictive strength may not reach statistical significance.
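The effect of sample size can be illustrated by holding the explained share of variance fixed and varying \(n\). The helper `cv_anova_p()` below is hypothetical and not part of the package; it assumes a single predictive score, \(df_{res} = n - df_{reg} - 1\), and rewrites the F-statistic in terms of the fraction \(r2 = (RSS_0 - RSS_1) / RSS_0\).

```r
# Hypothetical helper (not part of the package): p-value of the F-test for a
# given explained fraction r2, sample size n, and regression DF df_reg.
cv_anova_p <- function(r2, n, df_reg = 1) {
  df_res <- n - df_reg - 1
  f_val  <- (r2 / df_reg) / ((1 - r2) / df_res)
  pf(f_val, df_reg, df_res, lower.tail = FALSE)
}

# Same explained fraction (30 %), increasing sample size
n_grid <- c(10, 15, 20, 30, 50)
p_grid <- sapply(n_grid, cv_anova_p, r2 = 0.3)
data.frame(n = n_grid, p_value = signif(p_grid, 3))
```

With the explained fraction fixed at 0.3, the p-value is above 0.05 for \(n = 10\) and falls steadily as \(n\) grows, which is why a non-significant result on a small data set should be read with care.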
CV-ANOVA is intended for continuous response (regression) models.
References
Eriksson, L., et al. (2008). CV-ANOVA for significance testing of PLS and OPLS models. Journal of Chemometrics, 22(11-12), 594–600.
See also
Other model_validation:
dmodx(),
opls_perm()
Examples
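# Load the example data and code the sample type as a numeric response
data("covid")
X <- covid$X
Y <- as.numeric(factor(covid$an$type)) - 1
# Fit an O-PLS regression model with centred unit-variance scaling and
# balanced Monte Carlo cross-validation
scaling <- uv_scaling(center = TRUE)
cv <- balanced_mc(10, split = 2/3, type = "R", probs = c(0, 0.5, 1))
mod <- opls(X, Y, scaling, cv)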
#> An O-PLS-R model with 1 predictive and 1 orthogonal components was fitted.
cv_anova(mod)
#>                       SS DF        MS F_value    p_value
#> Total corrected 2.500000  9 0.2777778      NA         NA
#> Regression      1.148372  1 1.1483719      NA         NA
#> Residual        1.351628  8 0.1689535      NA         NA
#> RESULT                NA NA        NA 6.79697 0.03127089