Imputes missing values using probabilistic PCA (PPCA), one of several PCA-based imputation methods. It is essentially a wrapper around pcaMethods::pca(method = "ppca").
For a detailed discussion, see vignette("pcaMethods") and vignette("missingValues", "pcaMethods") as well as the References section.
In the underlying function (pcaMethods::pca(method = "ppca")), the order of the columns influences the outcome. Calling pcaMethods::pca(method = "ppca")
on a matrix and calling metamorphr::impute_ppca() on a tidy tibble might therefore give different results, even though both contain the same data. Under the hood,
the tibble is transformed to a matrix before pcaMethods::pca(method = "ppca") is called, and you have limited influence on the column order of the
resulting matrix.
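The column-order effect described above can be checked directly with pcaMethods. The following is a minimal sketch on made-up data (the matrix, the positions of the missing values and the permutation are all assumptions chosen for illustration, not taken from metamorphr). It fits PPCA on a matrix and on a column-permuted copy with the same seed, then compares the imputed values. How large the difference is depends on the data, so no expected value is shown.

```r
# Sketch only: toy data invented for illustration.
library(pcaMethods)

set.seed(1)
# 8 samples (rows) x 5 features (columns) of made-up intensities
mat <- matrix(rnorm(40, mean = 10), nrow = 8)
mat[cbind(c(2, 5, 7), c(1, 3, 4))] <- NA  # set three values to missing

# Fit PPCA on the original matrix and on a column-permuted copy, same seed
perm <- c(3, 1, 5, 2, 4)
fit_original <- pca(mat, method = "ppca", nPcs = 2, seed = 1)
fit_shuffled <- pca(mat[, perm], method = "ppca", nPcs = 2, seed = 1)

# Extract the completed matrices; undo the permutation for the second one
filled_original <- completeObs(fit_original)
filled_shuffled <- completeObs(fit_shuffled)[, order(perm)]

# Largest absolute difference between the two imputed matrices
max(abs(filled_original - filled_shuffled))
```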
Important Note
impute_ppca() depends on the pcaMethods package from Bioconductor. If metamorphr was installed via install.packages(), dependencies from Bioconductor were not
installed automatically. When impute_ppca() is called without pcaMethods installed, you should be asked whether you want to install pak and pcaMethods.
Both are required to use impute_ppca(). If the automatic installation fails, please install pcaMethods manually. See
pcaMethods – a Bioconductor package providing PCA methods for incomplete data for instructions on manual installation.
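As a sketch of the manual route, the usual way to install a Bioconductor package is via BiocManager. This is the generic Bioconductor procedure rather than something specific to metamorphr, and it assumes a working internet connection and a Bioconductor release compatible with your R version.

```r
# Install BiocManager from CRAN if it is not available yet
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install pcaMethods from Bioconductor
BiocManager::install("pcaMethods")
```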
Usage
impute_ppca(
  data,
  n_pcs = 2,
  center = TRUE,
  scale = "none",
  direction = 2,
  random_seed = 1L
)
Arguments
- data
A tidy tibble created by read_featuretable.
- n_pcs
The number of PCs to calculate.
- center
Should data be mean centered? See prep for details.
- scale
Should data be scaled? See prep for details.
- direction
Either 1 or 2. 1 runs a PCA on a matrix with samples in columns and features in rows and 2 runs a PCA on a matrix with features in columns and samples in rows. Both are valid according to this discussion on GitHub but give different results.
- random_seed
An integer used as seed for the random number generator.
References
H. R. Wolfram Stacklies, 2017, DOI 10.18129/B9.BIOC.PCAMETHODS.
W. Stacklies, H. Redestig, M. Scholz, D. Walther, J. Selbig, Bioinformatics 2007, 23, 1164–1167, DOI 10.1093/bioinformatics/btm069.
Examples
toy_metaboscape %>%
  impute_ppca()
#> # A tibble: 110 × 8
#> UID Feature Sample Intensity RT `m/z` Name Formula
#> <int> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 1 161.10519 Da 26.98 s Sample1 4 0.45 162. NA C7H15N…
#> 2 2 276.13647 Da 27.28 s Sample1 3 0.45 277. Octyl hyd… C16H22…
#> 3 3 304.24023 Da 32.86 s Sample1 4.95 0.55 305. Arachidon… C20H32…
#> 4 4 417.23236 Da 60.08 s Sample1 5 1 418. NA NA
#> 5 5 104.10753 Da 170.31 s Sample1 5 2.84 105. NA C5H14NO
#> 6 6 105.04259 Da 199.80 s Sample1 5 3.33 106. NA C3H8NO3
#> 7 7 237.09204 Da 313.24 s Sample1 8.22 5.22 238. Ketamine C13H16…
#> 8 8 745.09111 Da 382.23 s Sample1 3 6.37 746. NADPH C21H30…
#> 9 9 427.02942 Da 424.84 s Sample1 4 7.08 428. ADP C10H15…
#> 10 10 1284.34904 Da 498.94 s Sample1 3.39 8.32 1285. NA NA
#> # ℹ 100 more rows
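The direction and random_seed arguments described under Arguments can be set explicitly. The sketch below uses only the documented arguments and the toy_metaboscape data from the example above. It assumes that metamorphr and pcaMethods are installed, and the seed value is arbitrary. Output is omitted because the imputed values depend on these settings.

```r
library(metamorphr)

# PCA on a matrix with features in columns and samples in rows (the default)
by_features <- impute_ppca(toy_metaboscape, direction = 2, random_seed = 1L)

# PCA on a matrix with samples in columns and features in rows
by_samples <- impute_ppca(toy_metaboscape, direction = 1, random_seed = 1L)

# Both calls return a tidy tibble with the same layout as the input,
# but the imputed Intensity values may differ between the two directions.
head(by_features$Intensity)
head(by_samples$Intensity)
```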
