Filters Features based on their coefficient of variation (CV). The CV of Feature \(i\) is defined as \(CV_i = \frac{s_i}{\overline{x_i}}\), where \(s_i\) is the standard deviation and \(\overline{x_i}\) is the mean of the intensities of Feature \(i\) across the reference samples.
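As a quick illustration of the definition above, the CV of a single feature can be computed by hand in base R. The intensities are made-up values for three hypothetical QC samples, and the sketch only mirrors the formula; it is not the package's internal code.

```r
# Hypothetical intensities of one feature in three QC samples
qc_intensity <- c(3, 4, 3)

# CV = standard deviation / mean
cv <- stats::sd(qc_intensity) / mean(qc_intensity)
cv

# Compare against the default threshold (max_cv = 0.2).
# Assumption: how filter_cv treats a CV exactly equal to max_cv is not
# stated in this page, so a strict comparison is used here.
cv < 0.2
```

With these values the CV is roughly 0.17, which is below the default max_cv of 0.2.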
Usage
filter_cv(
  data,
  reference_samples,
  max_cv = 0.2,
  ref_as_group = FALSE,
  group_column = NULL,
  na_as_zero = TRUE
)
Arguments
- data
A tidy tibble created by read_featuretable.
- reference_samples
The names of the samples or group which will be used to calculate the CV of a feature. Usually Quality Control samples.
- max_cv
The maximum allowed CV. 0.2 is a reasonable start.
- ref_as_group
A logical indicating if reference_samples are the names of samples or group(s).
- group_column
Only relevant if ref_as_group = TRUE. Which column should be used for grouping reference and non-reference samples? Usually group_column = Group. Uses args_data_masking.
- na_as_zero
Should NA be replaced with 0 prior to calculation? Under the hood, filter_cv calculates the CV by stats::sd(..., na.rm = TRUE) / mean(..., na.rm = TRUE). If there are 3 samples to calculate the CV from and 2 of them are NA for a specific feature, then the CV for that feature will be NA if na_as_zero = FALSE, because the standard deviation of a single remaining value is NA. This might lead to problems, so na_as_zero = TRUE is the safer pick. Zeros will be replaced with NA after calculation, regardless of whether na_as_zero is TRUE or FALSE.
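The NA behaviour described for na_as_zero can be reproduced with plain vectors. This is a minimal sketch in base R using hypothetical intensities; it imitates the two settings by hand and makes no claim about how filter_cv is implemented internally.

```r
# Hypothetical intensities of one feature in three reference samples,
# two of which are missing
x <- c(5, NA, NA)

# Analogue of na_as_zero = FALSE: only one value is left after NA removal,
# and the standard deviation of a single value is NA
cv_na <- stats::sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
is.na(cv_na)

# Analogue of na_as_zero = TRUE: replace NA with 0 before the calculation
x_zero <- replace(x, is.na(x), 0)
cv_zero <- stats::sd(x_zero, na.rm = TRUE) / mean(x_zero, na.rm = TRUE)
cv_zero
```

In the first case the CV is NA, so it cannot be compared against max_cv. In the second case the CV is a large but well-defined number (about 1.73 here).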
Examples
# Example 1: Define reference samples by sample names
toy_metaboscape %>%
  filter_cv(max_cv = 0.2, reference_samples = c("QC1", "QC2", "QC3"))
#> # A tibble: 33 × 8
#> UID Feature Sample Intensity RT `m/z` Name Formula
#> <int> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 3 304.24023 Da 32.86 s Sample1 NA 0.55 305. Arachidoni… C20H32…
#> 2 8 745.09111 Da 382.23 s Sample1 3 6.37 746. NADPH C21H30…
#> 3 9 427.02942 Da 424.84 s Sample1 4 7.08 428. ADP C10H15…
#> 4 3 304.24023 Da 32.86 s Sample2 2 0.55 305. Arachidoni… C20H32…
#> 5 8 745.09111 Da 382.23 s Sample2 4 6.37 746. NADPH C21H30…
#> 6 9 427.02942 Da 424.84 s Sample2 3 7.08 428. ADP C10H15…
#> 7 3 304.24023 Da 32.86 s Sample3 1 0.55 305. Arachidoni… C20H32…
#> 8 8 745.09111 Da 382.23 s Sample3 2 6.37 746. NADPH C21H30…
#> 9 9 427.02942 Da 424.84 s Sample3 4 7.08 428. ADP C10H15…
#> 10 3 304.24023 Da 32.86 s Sample4 8 0.55 305. Arachidoni… C20H32…
#> # ℹ 23 more rows
# Example 2: Define reference samples by group name
toy_metaboscape %>%
  join_metadata(toy_metaboscape_metadata) %>%
  filter_cv(max_cv = 0.2, reference_samples = "QC", ref_as_group = TRUE, group_column = Group)
#> # A tibble: 33 × 12
#> UID Feature Sample Intensity RT `m/z` Name Formula Group Replicate
#> <int> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <int>
#> 1 3 304.24023 D… Sampl… NA 0.55 305. Arac… C20H32… cont… 1
#> 2 8 745.09111 D… Sampl… 3 6.37 746. NADPH C21H30… cont… 1
#> 3 9 427.02942 D… Sampl… 4 7.08 428. ADP C10H15… cont… 1
#> 4 3 304.24023 D… Sampl… 2 0.55 305. Arac… C20H32… cont… 2
#> 5 8 745.09111 D… Sampl… 4 6.37 746. NADPH C21H30… cont… 2
#> 6 9 427.02942 D… Sampl… 3 7.08 428. ADP C10H15… cont… 2
#> 7 3 304.24023 D… Sampl… 1 0.55 305. Arac… C20H32… cont… 3
#> 8 8 745.09111 D… Sampl… 2 6.37 746. NADPH C21H30… cont… 3
#> 9 9 427.02942 D… Sampl… 4 7.08 428. ADP C10H15… cont… 3
#> 10 3 304.24023 D… Sampl… 8 0.55 305. Arac… C20H32… trea… 1
#> # ℹ 23 more rows
#> # ℹ 2 more variables: Batch <int>, Factor <dbl>
