Balanced Monte-Carlo resampling strategy
Usage
balanced_mc(k, split, type = c("DA", "R"), probs = NULL)
Value
A named list with elements:
- train
List of integer vectors containing training set indices for each resampling iteration.
- strategy
Character string indicating the resampling strategy.
- n
Integer. Number of samples in the dataset.
- seed
Integer. Random seed used to generate the resampling splits, ensuring reproducibility.
Details
Generates k Monte-Carlo resampling splits by randomly partitioning
the data into training and test sets without replacement.
Balancing ensures equal representation of strata in the training data:
- type = "DA"
Class labels define the strata, and sampling is balanced across classes.
- type = "R"
The response is discretised into bins using quantiles defined by
probs, and each bin contributes equally to the training set.
This strategy can improve the robustness of model evaluation when the sample size is limited and the outcome variable is imbalanced or unevenly distributed.
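The balancing idea described above can be sketched in base R. This is a minimal illustration, not the metabom8 implementation: the helper sketch_balanced_train() and its argument train_frac (standing in for split) are hypothetical, and the rule that each stratum contributes floor(smallest stratum size * train_frac) samples is an assumption made for this sketch.

```r
# Hypothetical helper (not part of metabom8): draw one balanced training set.
# `strata`     : vector or factor of stratum labels, one per sample
# `train_frac` : fraction of the smallest stratum used for training (assumed rule)
sketch_balanced_train <- function(strata, train_frac) {
  idx_by_stratum <- split(seq_along(strata), strata)
  # Every stratum contributes the same number of samples,
  # limited by the smallest stratum.
  n_per <- floor(min(lengths(idx_by_stratum)) * train_frac)
  # Sample positions without replacement; sample.int() avoids the
  # sample(x, n) pitfall when a stratum holds a single index.
  picked <- lapply(idx_by_stratum, function(i) i[sample.int(length(i), n_per)])
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
# Unbalanced continuous outcome: 80 low values, 20 high values
Y <- c(rnorm(80, -1.5, 0.3), rnorm(20, 4.5, 0.3))

# type = "R" analogue: bin the response at the quantiles given by probs
probs <- c(0, 0.8, 1)
bins <- cut(Y, breaks = quantile(Y, probs), include.lowest = TRUE)

# One balanced training set, then k = 10 Monte-Carlo repetitions
train <- sketch_balanced_train(bins, train_frac = 2/3)
splits <- replicate(10, sketch_balanced_train(bins, train_frac = 2/3),
                    simplify = FALSE)

table(bins)         # bin sizes in the full data
table(bins[train])  # bin sizes in the training set
```

For the type = "DA" case, the same helper would receive the class labels directly as strata, with no binning step. Samples not drawn into a training set form the corresponding test set.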
See also
Other resampling strategies:
balanced_boot(),
kfold(),
mc(),
stratified_kfold()
Examples
n <- 100
# bimodal outcome
thr <- 1.5
Y <- c(rnorm(80, thr-3, 0.3), rnorm(20, thr+3, 0.3)) # unbalanced low/high outcome
mean(Y>thr)
#> [1] 0.2
cv_k <- kfold(k = 10)
cv_mc <- balanced_mc(k = 10, split = 2/3, type = "R", probs = c(0, 0.8, 1))
k_inst <- metabom8:::.arg_check_cv(cv_pars=cv_k, model_type='R', n=n, Y_prepped=cbind(Y))
mc_inst <- metabom8:::.arg_check_cv(cv_pars=cv_mc, model_type='R', n=n, Y_prepped=cbind(Y))
# proportion of training samples above the 80th percentile (q80):
# kfold keeps the original ~0.2, balanced splits bring it to ~0.5
q80 <- quantile(Y, 0.8)
round(sapply(k_inst$train, function(i) mean(Y[i] > q80)), 2) # resembles original Y distr.
#> [1] 0.19 0.21 0.21 0.20 0.20 0.19 0.18 0.22 0.19 0.21
round(sapply(mc_inst$train, function(i) mean(Y[i] > q80)), 2) # balanced strata (low/high)
#> [1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5