
Normalize intensities across samples using a Probabilistic Quotient Normalization (PQN)
Source: R/normalize.R
This method was originally developed for H-NMR spectra of complex biofluids but has been adapted for other 'omics data. It aims to eliminate dilution effects by calculating the most probable dilution factor for each sample, relative to one or more reference samples. See references for more details.
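The steps described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: `pqn_sketch()` is a hypothetical helper that works on a plain numeric matrix (rows are features, columns are samples) instead of a tidy tibble, and it assumes there are no missing or zero values in the reference spectrum.

```r
# Hypothetical helper for illustration; not part of the package.
# x: numeric matrix, rows = features, columns = samples.
pqn_sketch <- function(x, reference_samples = NULL, fn = median,
                       normalize_sum = TRUE) {
  if (normalize_sum) {
    # Sum (total area) normalization: scale each sample so it sums to 1
    x <- sweep(x, 2, colSums(x), "/")
  }

  # Use all samples as reference when none are specified
  ref_cols <- if (is.null(reference_samples)) seq_len(ncol(x)) else reference_samples

  # Reference spectrum: one value per feature, summarized over the reference samples
  reference <- apply(x[, ref_cols, drop = FALSE], 1, fn)

  # Quotient of every intensity and the reference value of its feature
  quotients <- x / reference

  # Most probable dilution factor: the median quotient of each sample
  dilution <- apply(quotients, 2, median)

  # Divide each sample by its dilution factor
  sweep(x, 2, dilution, "/")
}

# Toy data: S2 is S1 diluted by a factor of 0.5 (i.e. twice as concentrated)
x <- matrix(
  c(1, 2, 3, 4,
    2, 4, 6, 8),
  nrow = 4,
  dimnames = list(paste0("f", 1:4), c("S1", "S2"))
)
pqn_sketch(x, reference_samples = "S1", normalize_sum = FALSE)
```

In this toy case every quotient of `S2` against the reference is 2, so the estimated dilution factor is 2 and both columns end up identical after normalization.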
Usage
normalize_pqn(
  data,
  fn = "median",
  normalize_sum = TRUE,
  reference_samples = NULL,
  ref_as_group = FALSE,
  group_column = NULL
)
Arguments
- data
A tidy tibble created by read_featuretable.
- fn
Which function should be used to calculate the reference spectrum from the reference samples? Can be either "mean" or "median".
- normalize_sum
A logical indicating whether a sum normalization (aka total area normalization) should be performed prior to PQN. It is recommended to do so and other packages (e.g., KODAMA) also perform a sum normalization prior to PQN.
- reference_samples
Either NULL or a character or character vector containing the sample(s) to calculate the reference spectrum from. In the original publication, it is advised to calculate the median of control samples. If NULL, all samples will be used to calculate the reference spectrum.
- ref_as_group
A logical indicating if reference_samples are the names of samples or group(s).
- group_column
Only relevant if ref_as_group = TRUE. Which column should be used for grouping reference and non-reference samples? Usually group_column = Group. Uses args_data_masking.
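How reference_samples, ref_as_group and group_column interact can be illustrated with a small base R sketch. It is an assumption about the selection logic, not the package's code: `resolve_reference_samples()` is a hypothetical helper, the `Sample` and `Group` column names follow the examples on this page, and `group_column` is passed as a string here, whereas normalize_pqn takes it unquoted via data masking.

```r
# Hypothetical helper for illustration; not part of the package.
# Returns the names of the samples that would form the reference set.
resolve_reference_samples <- function(data, reference_samples = NULL,
                                      ref_as_group = FALSE,
                                      group_column = "Group") {
  all_samples <- unique(data$Sample)

  # NULL: every sample contributes to the reference spectrum
  if (is.null(reference_samples)) {
    return(all_samples)
  }

  # Entries are group labels: look up the samples belonging to those groups
  if (ref_as_group) {
    in_group <- data[[group_column]] %in% reference_samples
    return(unique(data$Sample[in_group]))
  }

  # Otherwise the entries are taken to be sample names
  intersect(all_samples, reference_samples)
}

# Toy long-format table with made-up values
toy <- data.frame(
  Sample    = c("QC1", "QC1", "QC2", "QC2", "Sample1", "Sample1"),
  Group     = c("QC", "QC", "QC", "QC", "control", "control"),
  Intensity = c(1, 2, 1.1, 2.1, 3, 5),
  stringsAsFactors = FALSE
)

resolve_reference_samples(toy, "QC", ref_as_group = TRUE)
resolve_reference_samples(toy, c("QC1", "QC2"))
```

Both calls select the same two QC samples, once by group label and once by sample name.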
References
F. Dieterle, A. Ross, G. Schlotterbeck, H. Senn, Anal. Chem. 2006, 78, 4281–4290, DOI 10.1021/ac051632c.
Examples
# specify the reference samples with their sample names
toy_metaboscape %>%
  impute_lod() %>%
  normalize_pqn(reference_samples = c("QC1", "QC2", "QC3"))
#> # A tibble: 110 × 8
#> UID Feature Sample Intensity RT `m/z` Name Formula
#> <int> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 1 161.10519 Da 26.98 s Sample1 0.115 0.45 162. NA C7H15N…
#> 2 2 276.13647 Da 27.28 s Sample1 0.0862 0.45 277. Octyl hyd… C16H22…
#> 3 3 304.24023 Da 32.86 s Sample1 0.00575 0.55 305. Arachidon… C20H32…
#> 4 4 417.23236 Da 60.08 s Sample1 0.144 1 418. NA NA
#> 5 5 104.10753 Da 170.31 s Sample1 0.144 2.84 105. NA C5H14NO
#> 6 6 105.04259 Da 199.80 s Sample1 0.144 3.33 106. NA C3H8NO3
#> 7 7 237.09204 Da 313.24 s Sample1 0.0402 5.22 238. Ketamine C13H16…
#> 8 8 745.09111 Da 382.23 s Sample1 0.0862 6.37 746. NADPH C21H30…
#> 9 9 427.02942 Da 424.84 s Sample1 0.115 7.08 428. ADP C10H15…
#> 10 10 1284.34904 Da 498.94 s Sample1 0.0115 8.32 1285. NA NA
#> # ℹ 100 more rows
# specify the reference samples with their group names
toy_metaboscape %>%
  join_metadata(toy_metaboscape_metadata) %>%
  impute_lod() %>%
  normalize_pqn(reference_samples = c("QC"), ref_as_group = TRUE, group_column = Group)
#> # A tibble: 110 × 12
#> UID Feature Sample Intensity RT `m/z` Name Formula Group Replicate
#> <int> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <int>
#> 1 1 161.10519 D… Sampl… 0.115 0.45 162. NA C7H15N… cont… 1
#> 2 2 276.13647 D… Sampl… 0.0862 0.45 277. Octy… C16H22… cont… 1
#> 3 3 304.24023 D… Sampl… 0.00575 0.55 305. Arac… C20H32… cont… 1
#> 4 4 417.23236 D… Sampl… 0.144 1 418. NA NA cont… 1
#> 5 5 104.10753 D… Sampl… 0.144 2.84 105. NA C5H14NO cont… 1
#> 6 6 105.04259 D… Sampl… 0.144 3.33 106. NA C3H8NO3 cont… 1
#> 7 7 237.09204 D… Sampl… 0.0402 5.22 238. Keta… C13H16… cont… 1
#> 8 8 745.09111 D… Sampl… 0.0862 6.37 746. NADPH C21H30… cont… 1
#> 9 9 427.02942 D… Sampl… 0.115 7.08 428. ADP C10H15… cont… 1
#> 10 10 1284.34904 … Sampl… 0.0115 8.32 1285. NA NA cont… 1
#> # ℹ 100 more rows
#> # ℹ 2 more variables: Batch <int>, Factor <dbl>