ppm, peakwidth, snthresh, prefilter, mzCenterFun, integrate, mzdiff, fitgauss, scanrange, noise.

 

All centWave parameters require fine-tuning for each assay type and mass analyser. The following provides a short description of each parameter value and its relevance in the centWave algorithm.

 

Parameter Description Pre-adjusted value
ppm Allowed signal deviation in m/z dimension 25
peakwidth Range of peak elution times (in seconds) 20-50 s
snthresh Threshold signal to noise ratio 10
prefilter Peak definition: Number of data points (n) exceeding a certain intensity threshold (I) k=3, I=100
mzdiff Accepted closeness of two signals in m/z dimension -0.001
noise Intensity cut-off, values below are considered as instrument noise 0
mzCenterFun Function to calculate the m/z center of a chromatographic peak weighted Mean (wMean)
integrate Integration method for peak quantification: 1 - Mexican hat filtered data, 2 - real data 1
fitgauss Peak parameterisation using Gaussian distribution FALSE
scanrange Perform peak picking in a scan range interval numeric(0)

 

 

 

ppm: Allowed signal deviation in m/z dimension

This centWave parameter ppm specifies the tolerance in m/z values for defining a signal in m/z dimension. This parameter is closely related to the mass accuracy of the mass spectrometer, which is traditionally expressed in parts per million (ppm).

Higher ppm values allow more m/z value variablity and lead to detection of a higher number of features (potential peaks), increasing false positives. Lower values can lead to missing out of signals, since the measured mass values may deviate from the true mass more than expected (this is common).

The illustration below shows a peak picking example using the same LC-MS data, where ppm parameter value was varied while all other centWave parameters were held constant.

 

 

 

mzdiff: Accepted closeness of two signals in m/z dimension

The mzdiff parameter specifies the allowed minimum distance of two co-eluting peaks in m/z dimension. An mzdiff value of 1 indicates that the m/z value of two signals with overlapping scan time (=retention time) be at least 1 m/z, in order for both signals to be included in the result peak list.

The centWave mzdiff parameter can also take negative values, indicating that the same data point can be allocated to two different peaks. Assigning negative mzdiff values has implications for further downstream processing steps, e.g., establishing correspondence of overlapping peaks across different samples. Higher mzdiff parameter values lead to detecting more features (potential signals), lower values reduce the number of features detected.

Below is an example using the same data processed with mzdiff values of 1 and 0.01.

 

 

 

noise: Intensity cut-off, values below are not considered

The ion detection in most mass spectrometers is accomplished with electron multipliers. These instruments are highly sensitive and produce electronic noise, which is visible as (usually) random data points below a certain intensity cut-off.

The noise structure of LC-MS spectra generated with mass specs of different types and from different vendors (incl. software updates) can be inherently different, with mz-value dependent noise intensities. Therefore, this parameter requires careful adjustment for each mass spectrometer setup. Lower values increase centWave computation time, higher values lead to missing out true ion signals.

 

 

 

snthresh: Threshold of signal to noise ratio

Closely related to the noise structure is the snthresh parameter, allowing to set a minimal signal to noise ratio (S/N) for peaks to be detected. snthresh builds on an S/N estimate that defines noise intensities locally in the scan time dimension:

This S/N definition can be problematic when the noise structure is not evenly distributed across the m/z dimension. The example below shows that higher intensity signals are discarded with higher snthresh values, most likely due to noisy and high intensity data points in proximity to the signal’s m/z trace. The final peak list using a high snthresh parameter value contains signals of low intensities, which is somewhat unexpected. UPDATE: THIS BEHAVIOUR IS ONLY OBSERVED WHEN CENTWAVE FUNCTION IS PARAMETERISED WITH THE SCANRANGE ARGUMENT (such as in MSbrowser).

 

 

 

peakwidth: Range of peak elution times

The peakwidth parameter specifies the minimum and maximum peak elution time in seconds. The optimal peakwidth parameter values are related to the number of performed scans on MS 1 level (this is an instrument setting set by the mass spectrometrist). Literature suggests a minimum number of six data points per peak in order to obtain reliable peak quantifications. Example: If a low intensity compound elutes over 1s, then the number of instrument scans should be at least 6 per second.

Setting the minimum (maximum) peakwidth argument to higher (lower) values reduces the overall number of detected peak as low (high) intensity ions with short (long) elution time are not detected. To find optimal peakwidth parameter values, it is useful to visually inspect elution times for high and low intensity ions in different spectral regions.

The example below shows the results of centWave peak picking performed with different peakwidth parameter values and where all other parameters were held constant.

An unexpected algorithm behaviour was observed when setting the minimal elution time to a value of 1, which resulted in peak splitting of a coherent signal which was not split with higher peakwidth values.

 

 

 

prefilter: Number of data points (k) exceeding a certain intensity threshold (I)

The prefilter parameter is similar to the noise parameter, but instead of specifying an intensity cut-off, it specifies a minimal number of data points (k) that exceed a certain intensity value (I). A signal is discarded if it is represented by less than k consecutive data points of intensity I.

Lower prefilter parameter values perform less filtering of signals with low intensity / scan time, and therefore, reduce computation time. The values of this parameter strongly depend on the characteristics of well-behaved LC-MS signals. This parameter should not be set to stringent as it can lead to discarding true positive signals.

 

 

 

mzCenterFun: m/z summary statistic of a peak

In the final LC-MS feature table, each feature is characterised by a scan time and m/z value. The latter represents a summary statistic of all data points defining a feature, where each one has a slightly different m/z value (see also ppm parameter). The mzCenterFun parameter specifies the m/z summary statistic of a feature. In practice, the different mzCenterFun options have very little impact.

 

 

 

integrate: Integration method for peak quantification

Signal quantification is performed by integrating the area under a peak. The integrate function specifies if the non-transformed data should be used for peak integration or if a wavelet-transform filtered data should be used (upper and lower panel, respectively, in the plot below). The former is sensitive to outliers and noise, since centWave defines left and right peak boundaries through change in slope (see Figure below). The latter is less sensitive to noise, however, the wavelet-filtered data is less exact and can lead to signal mis-representations.

 

 

 

fitgauss: Peak parameterisation using Gaussian distribution

A Gaussian function resembles a characteristic symmetric “bell curve” shape, which is often described as a normal distribution. If this centWave parameter option is set to TRUE, a Gaussian is fitted to each feature. According to the xcms documentation this parametric approach of peak quantification “affects mostly the retention time position of the peak.”

 

 

 

scanrange: Perform peak picking in a specific scan range

The scanrange parameter allows to define a reduced scan/retention time interval for which peak picking will be performed over the entire m/z range. Spectral data outside this scan/retention time interval are not considered. The scanrange parameter takes scan ID values (it is not specified in seconds). MSbrowser uses the scanrange argument by default to speed up computation time.