One of several PCA-based imputation methods. Essentially a wrapper around pcaMethods::pca(method = "ppca").
For a detailed discussion, see vignette("pcaMethods") and vignette("missingValues", "pcaMethods") as well as the References section.
In the underlying function (pcaMethods::pca(method = "ppca")), the order of columns has an influence on the outcome. Therefore, calling pcaMethods::pca(method = "ppca") on a matrix and calling metamorphr::impute() on a tidy tibble might give different results, even though they contain the same data. That is because, under the hood, the tibble is transformed to a matrix prior to calling pcaMethods::pca(method = "ppca"), and you have limited influence on the column order of the resulting matrix.
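The column order issue can be illustrated with a minimal base R sketch. This is not how metamorphr converts the tibble internally; the data frame long and the tapply() conversion are assumptions chosen only to show how a long-to-wide step can change the column order.

```r
# Hypothetical tidy (long) data: one row per sample/feature pair.
long <- data.frame(
  Sample    = rep(c("S1", "S2", "S3"), each = 3),
  Feature   = rep(c("f_b", "f_a", "f_c"), times = 3),
  Intensity = c(1, 2, 3, 4, NA, 6, 7, 8, 9),
  stringsAsFactors = FALSE
)

# One possible long-to-wide conversion: samples in rows, features in columns.
# tapply() orders the columns by sorted name, not by order of appearance.
wide <- tapply(long$Intensity, list(long$Sample, long$Feature), function(x) x[1])

colnames(wide)        # column order after the conversion
unique(long$Feature)  # order of first appearance in the tidy data

# Same values, columns restored to their order of appearance.
wide_reordered <- wide[, unique(long$Feature)]
```

Both matrices hold identical values, but a method that is sensitive to column order may return different imputed values for each of them.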
Important Note
impute_ppca() depends on the pcaMethods package from Bioconductor. If metamorphr was installed via install.packages(), dependencies from Bioconductor were not automatically installed. When impute_ppca() is called without the pcaMethods package installed, you should be asked if you want to install pak and pcaMethods. If you want to use impute_ppca(), you have to install those. In case you run into trouble with the automatic installation, please install pcaMethods manually. See pcaMethods – a Bioconductor package providing PCA methods for incomplete data for instructions on manual installation.
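As a rough sketch of a manual installation, the following uses BiocManager, the usual installer for Bioconductor packages. It is an assumption that this route suits your setup; the linked Bioconductor page remains the authoritative source.

```r
# Install pcaMethods from Bioconductor only if it is not already available.
if (!requireNamespace("pcaMethods", quietly = TRUE)) {
  # BiocManager itself is installed from CRAN.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install("pcaMethods")
}
```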
Usage
impute_ppca(
  data,
  n_pcs = 2,
  center = TRUE,
  scale = "none",
  direction = 2,
  random_seed = 1L
)
Arguments
- data
  A tidy tibble created by read_featuretable.
- n_pcs
  The number of PCs to calculate.
- center
  Should data be mean centered? See prep for details.
- scale
  Should data be scaled? See prep for details.
- direction
  Either 1 or 2. 1 runs a PCA on a matrix with samples in columns and features in rows, and 2 runs a PCA on a matrix with features in columns and samples in rows. Both are valid according to this discussion on GitHub but give different results.
- random_seed
  An integer used as seed for the random number generator.
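The two layouts selected by direction can be sketched in base R. The matrix below is hypothetical, and the sketch assumes that mean centering acts on columns, as base R's scale() does. Under that assumption, each layout removes a different mean, which is one plausible reason the two directions give different results.

```r
# Hypothetical data: 3 samples x 2 features (the layout of direction = 2).
features_in_cols <- matrix(
  c(1, 2, 3,      # feature f1 across samples S1..S3
    10, 20, 60),  # feature f2 across samples S1..S3
  nrow = 3,
  dimnames = list(c("S1", "S2", "S3"), c("f1", "f2"))
)

# The layout of direction = 1: samples in columns, features in rows.
samples_in_cols <- t(features_in_cols)

# Column-wise mean centering in each layout.
centered_2 <- scale(features_in_cols, center = TRUE, scale = FALSE)
# Transposed back so both results have the same shape for comparison.
centered_1 <- t(scale(samples_in_cols, center = TRUE, scale = FALSE))

centered_2  # per-feature means removed
centered_1  # per-sample means removed
```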
References
W. Stacklies, H. Redestig, 2017, DOI 10.18129/B9.BIOC.PCAMETHODS.
W. Stacklies, H. Redestig, M. Scholz, D. Walther, J. Selbig, Bioinformatics 2007, 23, 1164–1167, DOI 10.1093/bioinformatics/btm069.
Examples
toy_metaboscape %>%
  impute_ppca()
#> # A tibble: 110 × 8
#>      UID Feature                Sample  Intensity    RT `m/z` Name       Formula
#>    <int> <chr>                  <chr>       <dbl> <dbl> <dbl> <chr>      <chr>
#>  1     1 161.10519 Da 26.98 s   Sample1         4  0.45  162. NA         C7H15N…
#>  2     2 276.13647 Da 27.28 s   Sample1         3  0.45  277. Octyl hyd… C16H22…
#>  3     3 304.24023 Da 32.86 s   Sample1      4.95  0.55  305. Arachidon… C20H32…
#>  4     4 417.23236 Da 60.08 s   Sample1         5     1  418. NA         NA
#>  5     5 104.10753 Da 170.31 s  Sample1         5  2.84  105. NA         C5H14NO
#>  6     6 105.04259 Da 199.80 s  Sample1         5  3.33  106. NA         C3H8NO3
#>  7     7 237.09204 Da 313.24 s  Sample1      8.22  5.22  238. Ketamine   C13H16…
#>  8     8 745.09111 Da 382.23 s  Sample1         3  6.37  746. NADPH      C21H30…
#>  9     9 427.02942 Da 424.84 s  Sample1         4  7.08  428. ADP        C10H15…
#> 10    10 1284.34904 Da 498.94 s Sample1      3.39  8.32 1285. NA         NA
#> # ℹ 100 more rows