Applies probabilistic quotient normalisation (PQN) to spectra. PQN estimates a sample-specific dilution factor from the median of quotients relative to a reference spectrum and scales each spectrum accordingly.
Arguments
- X
Numeric matrix or data.frame. Each row is a sample spectrum and each column is a variable (e.g. chemical shift point or bin).
- ref_index
Integer vector of row indices used to compute the reference spectrum. If NULL, all rows are used.
- total_area
Logical. If TRUE, total area normalisation is applied to the working copy before estimating the PQN dilution factors. See Notes.
- bin
Optional named list controlling binning for reference estimation, e.g. list(ppm = ppm, width = 0.05) or list(ppm = ppm, npoints = 400). If NULL, no binning is applied.
- iref
Deprecated. Use ref_index.
- TArea
Deprecated. Use total_area.
Details
Mechanics. Let \(x_i\) be spectrum \(i\) and \(r\) a reference spectrum. PQN computes quotients \(q_{ij} = x_{ij} / r_j\) and defines the dilution factor as \(d_i = 1 / \mathrm{median}_j(q_{ij})\). The PQN-normalised spectrum is \(x_i^{(PQN)} = d_i \, x_i\).
The reference spectrum \(r\) is typically the median spectrum across all samples
or across QC samples (ref_index).
If bin is provided, \(r\) and dilution factors are computed on binned spectra,
but applied to the original spectra.
Dilution factors are stored in attr(X, "m8_pqn")$dilution_factor.
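The mechanics above can be sketched in a few lines of base R. This is an illustrative sketch only, not the package implementation: the function name pqn_sketch and its return structure are hypothetical, and skipping variables with a non-positive reference value is an assumption made here to avoid division by zero.

```r
# Minimal base-R sketch of the PQN steps (hypothetical helper, not the package code)
pqn_sketch <- function(X, ref_index = NULL) {
  X <- as.matrix(X)
  rows <- if (is.null(ref_index)) seq_len(nrow(X)) else ref_index
  # Reference spectrum r: column-wise median over the selected rows
  r <- apply(X[rows, , drop = FALSE], 2, median)
  # Assumption: variables with a non-positive reference value are ignored
  keep <- r > 0
  # Quotients q_ij = x_ij / r_j, one row per sample
  Q <- sweep(X[, keep, drop = FALSE], 2, r[keep], "/")
  med_q <- apply(Q, 1, median)
  d <- 1 / med_q  # d_i = 1 / median_j(q_ij)
  # X * d recycles d down the columns, so row i is multiplied by d[i]
  list(X = X * d, d = d, median_quotient = med_q)
}

# Noise-free toy data: each row is an exact multiple of the first spectrum
ref <- c(1, 4, 2, 8, 3)
X <- rbind(ref, 0.5 * ref, 2 * ref)
res <- pqn_sketch(X, ref_index = 1)
unname(res$median_quotient)  # 1.0 0.5 2.0
unname(res$d)                # 1.0 2.0 0.5
```

After scaling, every row of res$X equals the reference spectrum, which is the expected outcome when the samples differ only by dilution.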
Notes on total area normalisation
Total area normalisation prior to PQN is usually not recommended. Total area scaling removes global intensity differences by enforcing equal total signal per sample. PQN is itself a global scaling method intended to estimate dilution. Applying both can substantially change results, because PQN then no longer estimates dilution alone but also compensates for compositional distortions introduced by total area scaling.
Situations where total_area = TRUE can be defensible include:
- when spectra have large, non-dilution-related amplitude differences caused by acquisition artefacts (receiver gain / baseline offset) and you explicitly want to stabilise the reference estimation step;
- when the measured total signal is expected to be constant by design (e.g. strictly controlled sample mass/volume and stable overall metabolite pool), and the main goal is to reduce technical scaling variation before PQN.
In most metabolomics settings, prefer PQN without total area scaling.
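The compositional distortion mentioned above can be illustrated with a toy example in base R. This is a sketch under stated assumptions: total_area_sketch is a hypothetical helper, and the total area is approximated by the plain sum of intensities, which assumes evenly spaced variables. The two samples have the same dilution, but sample b carries one strongly elevated signal.

```r
# Hypothetical helper: scale each spectrum (row) to unit total signal
total_area_sketch <- function(X) {
  X <- as.matrix(X)
  X / rowSums(X)  # row i is divided by its own sum
}

# Same dilution in both samples; only the last variable differs in b
X <- rbind(a = c(1, 1, 1, 1, 1),
           b = c(1, 1, 1, 1, 11))

# Median quotient of b relative to a on the raw data: no dilution detected
q_raw <- median(X["b", ] / X["a", ])      # 1

# After total area scaling, the single large signal shrinks all other variables in b
Xta <- total_area_sketch(X)
q_ta <- median(Xta["b", ] / Xta["a", ])   # 1/3
```

On the raw data the median quotient is 1, as expected for equally diluted samples. After total area scaling it drops to 1/3, so a subsequent PQN step would rescale sample b by a factor of 3 purely to undo the distortion.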
On spectral alignment and binning
PQN assumes that corresponding variables represent the same chemical signal across spectra. If spectra are not well aligned, small peak shifts can inflate the variability of pointwise quotients \(x_{ij} / r_j\), leading to unstable dilution factor estimates.
In such cases, light binning (e.g. narrow fixed-width bins) before reference estimation is recommended. Binning reduces sensitivity to minor misalignments by aggregating neighbouring variables. However, excessive binning may obscure narrow signals and should be avoided.
Where an alignment method is available, aligning the spectra beforehand is preferable to binning.
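The idea of estimating the scaling on binned spectra while applying it to the original spectra can be sketched in base R as follows. The helper bin_sketch is hypothetical and uses simple fixed-width bins summed along the ppm axis; the binning used internally by the package may differ.

```r
# Hypothetical helper: sum intensities in fixed-width bins along the ppm axis
bin_sketch <- function(X, ppm, width = 0.05) {
  X <- as.matrix(X)
  # Assign every variable to a bin of the given width
  bin_id <- floor((ppm - min(ppm)) / width)
  # rowsum() sums rows by group, so transpose to put variables in rows, then back
  t(rowsum(t(X), group = bin_id))
}

ppm <- seq(0, 10, length.out = 1000)
ref <- dnorm(ppm, 3, 0.15) + dnorm(ppm, 6, 0.20) + dnorm(ppm, 7.5, 0.18)
X <- rbind(ref, 0.5 * ref)  # second spectrum is a two-fold dilution of the first

# Estimate the median quotients on the binned spectra, first row as reference
Xb <- bin_sketch(X, ppm, width = 0.05)
r_b <- Xb[1, ]
keep <- r_b > 0  # assumption: skip bins where the reference is not positive
med_q <- apply(Xb[, keep, drop = FALSE], 1, function(x) median(x / r_b[keep]))

# Apply the scaling to the original, unbinned spectra (row i divided by med_q[i])
Xn <- X / med_q
```

Because the scaling is applied to X rather than Xb, the normalised spectra keep their original resolution; binning only affects how the quotients are estimated.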
References
Dieterle F, Ross A, Schlotterbeck G, Senn H (2006). Probabilistic Quotient Normalization as Robust Method to Account for Dilution of Complex Biological Mixtures. Analytical Chemistry, 78(13), 4281–4290.
See also
Other preprocessing:
align_segment(),
align_spectra(),
binning(),
calibrate(),
correct_baseline(),
correct_lw(),
print_preprocessing()
Examples
set.seed(1)
ppm <- seq(0, 10, length.out = 1000)
ref <- dnorm(ppm, 3, 0.15) + dnorm(ppm, 6, 0.20) + dnorm(ppm, 7.5, 0.18)
dil <- c(1, 0.8, 0.6, 0.4, 0.2) # true dilution factors
X <- t(sapply(dil, function(d) d * ref + rnorm(length(ref), 0, 0.005)))
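plot_spec(X, ppm)
# Use the first (undiluted) spectrum as the reference
Xn <- pqn(X, ref_index = 1)
# Compare the estimated dilution factors with the simulated ones
dil_est <- attr(Xn, "m8_pqn")$dilution_factor
cbind(true = dil, estimated = dil_est)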
#> true estimated
#> [1,] 1.0 1.0000000
#> [2,] 0.8 0.7321857
#> [3,] 0.6 0.5939519
#> [4,] 0.4 0.3949580
#> [5,] 0.2 0.1986954