Impute missing values using Probabilistic PCA — impute_ppca

One of several PCA-based imputation methods. This function is a wrapper around pcaMethods::pca(method = "ppca"). For a detailed discussion, see vignette("pcaMethods") and vignette("missingValues", "pcaMethods") as well as the References section.
In the underlying function, pcaMethods::pca(method = "ppca"), the order of columns influences the outcome. Calling pcaMethods::pca(method = "ppca") on a matrix and calling metamorphr::impute_ppca() on a tidy tibble may therefore give different results, even though both contain the same data. Under the hood, the tibble is transformed to a matrix before pcaMethods::pca(method = "ppca") is called, and you have limited influence on the column order of the resulting matrix.

Important Note

impute_ppca() depends on the pcaMethods package from Bioconductor. If metamorphr was installed via install.packages(), dependencies from Bioconductor were not installed automatically. When impute_ppca() is called without pcaMethods installed, you should be asked whether you want to install pak and pcaMethods. To use impute_ppca(), you have to install both. If you run into trouble with the automatic installation, please install pcaMethods manually. See "pcaMethods – a Bioconductor package providing PCA methods for incomplete data" for instructions on manual installation.
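A manual installation could look like the following sketch. It assumes the standard Bioconductor route via the BiocManager package, which is not what the automatic prompt uses (that one relies on pak); consult the Bioconductor page for the authoritative instructions.

```r
# Install BiocManager from CRAN if it is not available yet.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install pcaMethods from Bioconductor.
BiocManager::install("pcaMethods")
```

Afterwards, requireNamespace("pcaMethods") should return TRUE in a fresh R session.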

Usage

impute_ppca(
  data,
  n_pcs = 2,
  center = TRUE,
  scale = "none",
  direction = 2,
  random_seed = 1L
)

Arguments

data: A tidy tibble created by read_featuretable().
n_pcs: The number of principal components (PCs) to calculate.
center: Should the data be mean-centered? See pcaMethods::prep() for details.
scale: Should the data be scaled? See pcaMethods::prep() for details.
direction: Either 1 or 2. With 1, the PCA is run on a matrix with samples in columns and features in rows; with 2, on a matrix with features in columns and samples in rows. Both are valid according to this discussion on GitHub, but they give different results.
random_seed: An integer used as the seed for the random number generator.

Value

A tibble with imputed missing values.

References

H. R. Wolfram Stacklies, 2017, DOI 10.18129/B9.BIOC.PCAMETHODS.
W. Stacklies, H. Redestig, M. Scholz, D. Walther, J. Selbig, Bioinformatics 2007, 23, 1164–1167, DOI 10.1093/bioinformatics/btm069.

Examples

toy_metaboscape %>%
  impute_ppca()
#> # A tibble: 110 × 8
#>      UID Feature                Sample  Intensity    RT `m/z` Name       Formula
#>    <int> <chr>                  <chr>       <dbl> <dbl> <dbl> <chr>      <chr>  
#>  1     1 161.10519 Da 26.98 s   Sample1      4     0.45  162. NA         C7H15N…
#>  2     2 276.13647 Da 27.28 s   Sample1      3     0.45  277. Octyl hyd… C16H22…
#>  3     3 304.24023 Da 32.86 s   Sample1      4.95  0.55  305. Arachidon… C20H32…
#>  4     4 417.23236 Da 60.08 s   Sample1      5     1     418. NA         NA     
#>  5     5 104.10753 Da 170.31 s  Sample1      5     2.84  105. NA         C5H14NO
#>  6     6 105.04259 Da 199.80 s  Sample1      5     3.33  106. NA         C3H8NO3
#>  7     7 237.09204 Da 313.24 s  Sample1      8.22  5.22  238. Ketamine   C13H16…
#>  8     8 745.09111 Da 382.23 s  Sample1      3     6.37  746. NADPH      C21H30…
#>  9     9 427.02942 Da 424.84 s  Sample1      4     7.08  428. ADP        C10H15…
#> 10    10 1284.34904 Da 498.94 s Sample1      3.39  8.32 1285. NA         NA     
#> # ℹ 100 more rows
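
The following sketch varies n_pcs and direction to illustrate the arguments described above. It assumes that metamorphr and pcaMethods are installed and that toy_metaboscape can be imputed in both orientations; the object names are chosen for illustration only, and no output is shown because the imputed values depend on the chosen arguments.

```r
library(metamorphr)

# Default orientation (direction = 2): features in columns, samples in rows.
imputed_by_sample <- impute_ppca(toy_metaboscape, n_pcs = 3)

# Transposed orientation (direction = 1): samples in columns, features in rows.
# Per the argument description, this is expected to give different values.
imputed_by_feature <- impute_ppca(toy_metaboscape, n_pcs = 3, direction = 1)

# Count missing intensities before and after imputation.
sum(is.na(toy_metaboscape$Intensity))
sum(is.na(imputed_by_sample$Intensity))
sum(is.na(imputed_by_feature$Intensity))
```

Keeping random_seed fixed, as in the default, makes repeated calls with the same arguments reproducible.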