Applies probabilistic quotient normalisation (PQN) to spectra. PQN estimates a sample-specific dilution factor from the median of quotients relative to a reference spectrum and scales each spectrum accordingly.

Usage

pqn(
  X,
  ref_index = NULL,
  total_area = FALSE,
  bin = NULL,
  iref = NULL,
  TArea = NULL
)

Arguments

X

Numeric matrix or data.frame. Each row is a sample spectrum and each column is a variable (e.g. chemical shift point or bin).

ref_index

Integer vector of row indices used to compute the reference spectrum. If NULL, all rows are used.

total_area

Logical. If TRUE, total area normalisation is applied to the working copy before estimating the PQN dilution factors. See Notes.

bin

Optional named list controlling binning for reference estimation, e.g. list(ppm = ppm, width = 0.05) or list(ppm = ppm, npoints = 400). If NULL, no binning is applied.

iref

Deprecated. Use ref_index.

TArea

Deprecated. Use total_area.

Value

Numeric matrix of PQN-normalised spectra.

Details

Mechanics. Let \(x_i\) be spectrum \(i\) and \(r\) a reference spectrum. PQN computes quotients \(q_{ij} = x_{ij} / r_j\) and defines the dilution factor as \(d_i = 1 / \mathrm{median}_j(q_{ij})\). The PQN-normalised spectrum is \(x_i^{(PQN)} = d_i \, x_i\).

The reference spectrum \(r\) is typically the median spectrum across all samples or across QC samples (ref_index).
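The formulas above can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: pqn_sketch is a hypothetical helper, and skipping variables where the reference is zero or non-finite is an assumption made here to avoid division by zero.

```r
# Hypothetical helper (not the package's pqn()): direct transcription of the
# formulas above, using a median reference spectrum.
pqn_sketch <- function(X, ref_index = NULL) {
  X <- as.matrix(X)
  if (is.null(ref_index)) ref_index <- seq_len(nrow(X))

  # Reference spectrum r: column-wise median over the reference rows
  r <- apply(X[ref_index, , drop = FALSE], 2, median)

  # Quotients q_ij = x_ij / r_j; variables with r_j == 0 or non-finite r_j
  # are skipped (assumption made for this sketch)
  ok <- is.finite(r) & r != 0
  Q <- sweep(X[, ok, drop = FALSE], 2, r[ok], "/")

  med_q <- apply(Q, 1, median, na.rm = TRUE)  # median_j(q_ij) per sample
  d <- 1 / med_q                              # d_i as defined above

  # A length-nrow(X) vector recycles down the columns, so row i is scaled by d[i]
  list(X = X * d, median_quotient = med_q, d = d)
}

# Noise-free toy input: rows are the reference, a 2x dilution, a 2x concentration
ref <- c(1, 2, 4, 8)
X <- rbind(ref, 0.5 * ref, 2 * ref)
out <- pqn_sketch(X, ref_index = 1)
out$d  # 1, 2 and 0.5 for this input
```

After scaling, all three rows of out$X coincide with the reference, which is the expected behaviour when samples differ by dilution only.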

If bin is provided, \(r\) and the dilution factors are computed on the binned spectra; the resulting factors are then applied to the original (unbinned) spectra, so the output keeps the resolution of the input.
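A minimal sketch of this two-step idea, assuming fixed-width bins formed by summing intensities. bin_sketch is a hypothetical helper and the package may bin differently; only the separation between "estimate on binned data" and "apply to original data" is the point here.

```r
# Hypothetical helper: sum intensities within fixed-width ppm bins
bin_sketch <- function(X, ppm, width) {
  grp <- floor((ppm - min(ppm)) / width)  # bin id for every variable
  # rowsum() groups rows, so transpose in and out
  t(rowsum(t(X), group = grp))
}

ppm <- seq(0, 10, length.out = 1000)
ref <- dnorm(ppm, 3, 0.15) + dnorm(ppm, 6, 0.20) + dnorm(ppm, 7.5, 0.18)
X <- rbind(ref, 0.5 * ref)               # second sample is a 2x dilution

B <- bin_sketch(X, ppm, width = 0.05)    # binned working copy
r_b <- B[1, ]                            # reference: binned row 1
ok <- r_b > 0
Q <- sweep(B[, ok, drop = FALSE], 2, r_b[ok], "/")
d <- 1 / apply(Q, 1, median)             # factors estimated on binned data

Xn <- X * d                              # applied to the unbinned spectra
dim(Xn)                                  # same shape as X
```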

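Dilution factors are stored as an attribute of the returned matrix and can be retrieved with attr(Xn, "m8_pqn")$dilution_factor, where Xn is the value returned by pqn(). Note that in the Examples the stored values track the simulated dilution, i.e. they are on the scale of the median quotient \(\mathrm{median}_j(q_{ij})\) rather than its reciprocal \(d_i\).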

Notes on total area normalisation

Total area normalisation prior to PQN is usually not recommended. Total area scaling removes global intensity differences by enforcing equal total signal per sample. PQN is itself a global scaling method intended to estimate dilution. Applying both can substantially change results because PQN no longer estimates dilution alone, but also compensates compositional distortions introduced by total area scaling.

Situations where total_area = TRUE can be defensible include:

  • when spectra have large, non-dilution-related amplitude differences caused by acquisition artefacts (e.g. receiver gain or baseline offset) and you explicitly want to stabilise the reference estimation step;

  • when the measured total signal is expected to be constant by design (e.g. strictly controlled sample mass/volume and stable overall metabolite pool), and the main goal is to reduce technical scaling variation before PQN.

In most metabolomics settings, prefer PQN without total area scaling.
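The interaction described above can be illustrated with a toy example. This is a sketch under stated assumptions: total_area_sketch is a hypothetical helper that divides each row by its plain sum of non-negative intensities, which may differ from how the package computes total area.

```r
# Hypothetical helper: scale every spectrum to unit total signal
total_area_sketch <- function(X) {
  X <- as.matrix(X)
  X / rowSums(X)  # vector recycles down columns, so row i is divided by its sum
}

ref <- c(10, 20, 30, 40)
s3 <- ref
s3[4] <- 140                      # same dilution as ref, one signal increased
X <- rbind(ref, 0.5 * ref, s3)

TA <- total_area_sketch(X)

# Median quotient against row 1, before and after total area scaling
mq_raw <- median(X[3, ] / X[1, ])    # 1: PQN sees no dilution difference
mq_ta  <- median(TA[3, ] / TA[1, ])  # 0.5: the three unchanged signals were halved
```

On the raw data the median quotient for the third sample is 1, so PQN leaves it untouched. After total area scaling the same sample has a median quotient of 0.5, so a subsequent PQN step would rescale it by a factor of 2 purely to undo the distortion that total area scaling introduced.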

On spectral alignment and binning

PQN assumes that corresponding variables represent the same chemical signal across spectra. If spectra are not well aligned, small peak shifts can inflate the variability of pointwise quotients \(x_{ij} / r_j\), leading to unstable dilution factor estimates.

In such cases, light binning (e.g. narrow fixed-width bins) prior to reference estimation is recommended. Binning reduces sensitivity to minor misalignments by aggregating neighbouring variables. However, excessive binning may obscure narrow signals and should be avoided.

Where an alignment method is available, aligning the spectra before PQN is preferable to binning.
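The effect of a small peak shift on the quotients can be made concrete with a single simulated peak. This is an illustrative sketch only: the peak shape, the 0.01 ppm shift and the 0.1 ppm bin width are arbitrary choices made here, not package defaults.

```r
ppm <- seq(4.8, 5.2, length.out = 401)
r <- dnorm(ppm, mean = 5.00, sd = 0.05)  # reference peak
x <- dnorm(ppm, mean = 5.01, sd = 0.05)  # same peak, shifted by 0.01 ppm

# Pointwise quotients vary strongly across the peak despite identical area
q_point <- x / r

# Four bins of width 0.1 ppm; pmin() keeps the last point in the final bin
grp <- pmin(floor((ppm - min(ppm)) / 0.1), 3)
q_bin <- tapply(x, grp, sum) / tapply(r, grp, sum)

spread_point <- diff(range(q_point))
spread_bin <- diff(range(q_bin))
spread_bin < spread_point  # binned quotients are less dispersed
```

Each binned quotient is a weighted average of the pointwise quotients inside that bin, which is why the binned spread is smaller. It does not vanish, however, so binning mitigates misalignment rather than correcting it.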

References

Dieterle F, Ross A, Schlotterbeck G, Senn H (2006). Probabilistic Quotient Normalization as Robust Method to Account for Dilution of Complex Biological Mixtures. Analytical Chemistry, 78(13), 4281–4290.

Examples

set.seed(1)
ppm <- seq(0, 10, length.out = 1000)
ref <- dnorm(ppm, 3, 0.15) + dnorm(ppm, 6, 0.20) + dnorm(ppm, 7.5, 0.18)
dil <- c(1, 0.8, 0.6, 0.4, 0.2)            # true dilution factors
X <- t(sapply(dil, function(d) d * ref + rnorm(length(ref), 0, 0.005)))
plot_spec(X, ppm)
Xn <- pqn(X, ref_index = 1)
dil_est <- attr(Xn, "m8_pqn")$dilution_factor
cbind(true = dil, estimated = dil_est)
#>      true estimated
#> [1,]  1.0 1.0000000
#> [2,]  0.8 0.7321857
#> [3,]  0.6 0.5939519
#> [4,]  0.4 0.3949580
#> [5,]  0.2 0.1986954