mvsIn biomedical science, a set of objects or persons can often be
described by multiple distinct sets of features (also called
independent variables or predictors) obtained from
different data sources or modalities (Li et al.
2018). These feature sets are said to provide different
views of the objects or persons under consideration. Data sets
consisting of multiple views are called multi-view data (Zhao et al. 2017; Sun et al. 2019; Smilde et al.
2022)1. Classical machine learning methods ignore
the multi-view structure of such data, limiting model interpretability
and performance. The R package mvs provides methods that
were designed specifically for dealing with multi-view data, based on
the multi-view stacking (MVS) framework (Li et al. 2011; Garcia-Ceja et al. 2018; Van Loon et
al. 2020). MVS is a form of supervised2 (machine) learning
used to train multi-view classification or prediction models. MVS works
by training a learning algorithm (the base-learner) on each
view separately, estimating the predictive power of each view-specific
model through cross-validation, and then using another learning
algorithm (the meta-learner) to assign weights to the
view-specific models based on their estimated predictions. MVS is a form
of ensemble learning, dividing the large multi-view learning
problem into smaller sub-problems. Most of these sub-problems can be
solved in parallel, making it computationally attractive. Additionally,
the number of features of the sub-problems is greatly reduced compared
with the full multi-view learning problem. This makes MVS especially
useful when the total number of features is larger than the number of
observations (i.e., high-dimensional data). MVS can still be
applied even if the sub-problems are themselves high-dimensional by
adding suitable penalty terms to the learning algorithms (Van Loon et al. 2020). Furthermore, MVS can be
used to automatically select the views which are most important for
prediction (Van Loon et al. 2020). The R
package mvs makes fitting MVS models, including such
penalty terms, easily and openly accessible. mvs allows for
the fitting of stacked models with any number of levels, with different
penalty terms, different outcome distributions, and provides several
options for missing data handling.
Consider the following hypothetical example: We want to build a model that can predict whether or not a person has Alzheimer’s disease based on different sources of data. Additionally, we want to find out which sources are most predictive of the outcome. That way, if certain sources turn out not to be predictive, they can be omitted from future data collection. We have collected data from 200 research subjects, half of which have been diagnosed with Alzheimer, and half are healthy controls. For each subject, we have the following sources of data available:
Now we could just collect all the features (independent variables) from these three views together into a single data frame or matrix and fit a ‘traditional’ feature-selecting model on the complete data. Combining the views together like this is called feature concatenation. However, this approach typically ignores the multi-view structure of the data. To take the multi-view structure into account, we can instead fit a multi-view stacking (MVS) model. Multi-view stacking has several advantages over feature concatenation:
A multi-view stacking model fitted on this data could look something like Figure @ref(fig:mvs-diag).
A simple graphic representation of a multi-view stacking model including 3 views: structural MRI, functional MRI, and genetic information. A sub-model is fitted on each view separately, and the predictions of these sub-models are combined by the meta-learner into a single prediction. Note that the n persons are the same persons for each view.
Multi-view stacking is a very broad framework: Any suitable learning algorithm can be chosen as the base and meta-learner. In this case, the outcome variable is binary (yes/no), so we will want to use some sort of classifier. Additionally, suitable penalty terms could be added to these classifiers to automatically select the views that are most important for prediction. Stacked penalized logistic regression (StaPLR) is a form of multi-view stacking where we use penalized logistic regression as both the base and meta-learner. If we use a penalty term that induces selection (such as the lasso (Tibshirani 1996)) in the meta-learner, it will automatically select the most important views. We also typically put additional nonnegativity constraints on the meta-learner, for both technical reasons and to improve interpretability (Van Loon et al. 2020). Since the views most likely contain many features (data obtained from fMRI scans can contain millions of features (Van Loon et al. 2022)), we may apply a penalty term in the base-learner that induces shrinkage (such as ridge regression). If we apply StaPLR to the data, we may find that one of the views is not predictive of the outcome, like in Figure @ref(fig:staplr-diag).
A simple graphic representation of how StaPLR can perform automatic view selection. In this (hypothetical) example, functional MRI was discarded from the model because it was not sufficiently predictive of the outcome in the presence of the other two views.
Of course, Figures @ref(fig:mvs-diag) and @ref(fig:staplr-diag) are simplified and represent only the trained model. The training process itself is slightly more complicated, due to the inclusion of a cross-validation step. A more technically accurate description of MVS is given in Algorithm 1.
We denote the views by \(X^{(v)}, v = 1
\dots V\). The outcome variable is denoted by \(y\). Learning algorithms are denoted by the
letter \(A\), and classifiers by the
letter \(f\). For each of the views, a
trained classifier \(\hat{f}_v\) is
obtained by applying the base-learning algorithm \(A_{\text{b}}\) to the \(X^{(v)}\) and the outcome \(y\). We then apply \(k\)-fold cross-validation for each of these
base-classifiers to obtain a vector of estimated out-of-sample
predictions which we denote by \(z^{(v)}\). We assume (and recommend) that
these predictions take the form of predicted probabilities instead of
hard class labels. The vectors \(z^{(v)}, v =
1 \dots V\), are concatenated column-wise (i.e.,
cbind) into the matrix \(Z\). The matrix \(Z\) is then used together with outcome
\(y\) to train the meta-learning
algorithm \(A_{\text{m}}\) and obtain
the meta-classifier \(\hat{f}_{\text{meta}}\). The final
classifier is then given by using the output of the base-level
classifiers as the input for the meta-classifier. This whole process can
also be represented as a flowchart, which is shown in Figure
@ref(fig:staplr-flow).
The MVS algorithm represented as a flow diagram. StaPLR denotes the special case where all learners are penalized logistic regression learners. Figure adapted from (Van Loon, Fokkema, De Vos, et al. 2024)
Further technical details about MVS/StaPLR can be found in (Van Loon et al. 2020). For other possible choices of the meta-learner see (Van Loon, Fokkema, Szabo, et al. 2024). Although the example discussed above is hypothetical, a real application of multi-view stacking for Alzheimer’s disease classification on the basis of MRI data is described in (Van Loon et al. 2022); this paper also generalized multi-view stacking to more than two levels. For multi-view stacking with missing data see (Van Loon, Fokkema, De Vos, et al. 2024).
mvs?The multi-view stacking methodology was introduced by Li et al. (2011), and recently popularized by Garcia-Ceja et al. (2018), but no software
packages implementing the methodology were made publicly available. We
further investigated and extended the methodology, and found that it has
many favorable properties (Van Loon et al. 2020,
2024, 2022; Van Loon, Fokkema, De Vos, et al. 2024). A small
number of packages for multi-view learning were available on the
Comprehensive R Archive Network (CRAN) at the time of the first release
of mvs: Spectrum (John
and Watson 2020), LUCIDus (Zhao 2022) and multiview (Ding et al. 2023), but none of these packages
included multi-view stacking. Note that earlier development versions of
mvs were also called multiview, but the name
has been changed since it was claimed by a different package (Ding et al. 2023) which has no relation to
multi-view stacking. Development versions of mvs have been
used, among other applications, to analyze multi-view gene expression
(Van Loon et al. 2020, 2024) and
multi-view magnetic resonance imaging (MRI) data (Van Loon et al. 2022; Van Loon, Fokkema, De Vos, et
al. 2024).
mvs is available on CRAN and GitLab. Below, we briefly
discuss the core functionality of the package. For examples of usage see
Using mvs
step-by-step. A detailed overview of all the function arguments can
be found in the package
reference manual.
The primary application of mvs is to fit multi-view
stacking (MVS) models. The implementation of MVS is based on an
extension of the Stacked Penalized Logistic Regression (StaPLR)
algorithm (Van Loon et al. 2020).
mvs features two main functions for fitting MVS models:
StaPLR is used to fit penalized and stacked
penalized regression models with up to two levels. The minimum required
input is the total feature matrix (x), the outcome variable
(y), and a vector denoting to which view each feature
corresponds (view_index). The StaPLR function
has a few special options unique to models with only two
levels.
MVS is used to fit multi-view stacking models with
two or more levels. The minimum required input is the total feature
matrix (x), the outcome variable (y), and
either a vector (if there are only two levels) or a matrix of dimensions
[\(\textit{number of features} \times
(\textit{number of levels} - 1)\)] denoting to which view each
feature corresponds at each level (views). MVS models with
more than are appropriate when the data have a hierarchical multi-view
structure, that is, the features are nested in views, which are
themselves nested in larger views, and so on (Van
Loon et al. 2022).
For more technical arguments see the documentation included with
mvs. The individual sub-problems are optimized using
coordinate descent via the R package glmnet (Friedman et al. 2010). Users of
glmnet will feel right at home since mvs uses
a very similar syntax.
One of the main advantages of MVS is that, at each level of the
hierarchy, all sub-problems are independent. This means that
these sub-problems can be calculated in parallel. mvs
supports parallel computation through foreach (Microsoft and Weston 2022b), assuming a
parallel back-end is registered (for more
information about registering a parallel back-end in R see Weston and
Calaway 2015). Parallel computation can be enabled using the
function argument parallel.
mvs can be used to model outcome variables with different
distributions. Binomial, Gaussian and Poisson distributions are
currently supported through the function argument
family.mvs supports the use of model
relaxation (as used in, e.g., the relaxed lasso (Hastie et al. 2017)). Model relaxation can be
enabled for the entire hierarchy, or only for specific levels, through
the function argument relax. Use of model relaxation is
generally only sensible if alpha > 0.mvs supports the use of adaptive
weights (as used in, e.g., the adaptive lasso (Zou 2006)). Adaptive weights can be enabled for
the entire hierarchy, or only for specific levels, through the function
argument adaptive. Adaptive weights are initialized using
ridge regression as described in Van Loon,
Fokkema, Szabo, et al. (2024). Use of adaptive weights is
generally only sensible if alpha > 0.mvs supports the use of random
forests (Breiman 2001; Liaw and Wiener
2002b) as base or meta-learner(s).In a two-level StaPLR model, the meta-level regression coefficient of
each view can be used as a measure of that view’s importance, since
these regression coefficients are effectively on the same scale (Van Loon et al. 2020). In hierarchical
StaPLR/MVS models with more than two levels this does not necessarily
apply, since these coefficients may correspond to different sub-models
at different levels of the hierarchy. The minority report
measure (MRM) was developed to quantify the importance of a view at
any level of the hierarchy (Van Loon et al.
2022). The MRM quantifies how much the prediction of the complete
stacked model changes as the view-specific prediction of view i
changes from a (default value 0) to b (default value
1), while the other predictions are kept constant (the recommended value
being the mean of the outcome variable) (Van Loon
et al. 2022). As of version 2.0.0, the MRM can be calculated
using MRM.
In practice, it is likely that not all views are measured for all observations. When a view is missing for some observations, typical approaches are:
The first approach is not recommended since it is very wasteful;
often more values are removed through list-wise deletion than were
initially missing. The second approach is preferable, but often
unrealistic. For example, if the missing view is an MRI scan, this means
having to impute millions of values, often from high-dimensional
observed data, which is computationally infeasible for all but the
simplest imputation algorithms (Van Loon,
Fokkema, De Vos, et al. 2024). As of version 2.0.0,
mvs therefore supports a third option: meta-level
imputation. Instead of imputing the raw data, meta-level imputation
uses the complete observations for each view to generate cross-validated
predictions, and performs imputation in the reduced space (see Figure
@ref(fig:missing-data)).
A simple graphic representation of meta-level imputation. Assume, for example, that the three views consist of, respectively, 100, 1000 and 10,000 features. Now, say that there are 10 observations which have missing values on view \(X^{(2)}\). Then in traditional imputation we would have to impute 10 × 1000 = 10,000 values whereas in list-wise deletion 10 × (100 + 10,000) = 101,000 values would be deleted even though they were observed. However, in meta-level imputation only 10 values have to be imputed, and no observed values are deleted. Figure adapted from (Van Loon, Fokkema, De Vos, et al. 2024).
This is much faster than traditional imputation and leads to comparable performance (Van Loon, Fokkema, De Vos, et al. 2024). It allows the use of state-of-the-art imputation algorithms which would otherwise be too computationally intensive (Van Loon, Fokkema, De Vos, et al. 2024). The following imputation methods are currently supported:
mean performs meta-level (unconditional) mean
imputation.mice performs meta-level predictive mean matching. It
requires the R package mice (Van
Buuren and Groothuis-Oudshoorn 2011).missForest performs meta-level missForest imputation.
It requires the R package missForest (Stekhoven and Bühlmann 2012).Additionally, mvs includes the option to ‘pass’ the
missing values through to the meta-level without imputing them, allowing
the user to use a different imputation scheme of their choice. Options
for missing data handling can be specified directly through model
fitting using the arguments na.action and
na.arguments. For more details about meta-level imputation
see Van Loon, Fokkema, De Vos, et al.
(2024).
mvs step-by-stepIn this section, we cover how to use the basic functions of
mvs in practice, and show some example usage on small
simulated data sets.
The current stable release can be installed directly from CRAN:
The current development version can be installed from GitLab using
package devtools:
The package can then be loaded using:
We first generate a very simple simulated data set:
set.seed(123)
n <- 100
X <- matrix(rnorm(8500), nrow=n, ncol=85)
b <- c(rep(10, 65), rep(0, 20)) * ((rbinom(85, 1, 0.5)*2)-1)
eta <- X %*% b
p <- 1 /(1 + exp(-eta))
y <- rbinom(n, 1, p)
views <- c(rep(1,45), rep(2,20), rep(3,20))This data set consists of 100 observations of 3 views. View 1 contains 45 features whereas View 2 and View 3 contain 20 features each. The first two views are truly related to the outcome, whereas View 3 is just noise. Now we will apply a simple 2-level MVS model to the data, like so:
Argument x is the full data matrix, and y
is the outcome variable. Argument views is a vector which
denotes to which view each feature belongs. Argument alphas
defines the penalty parameter for each level. Without going into much
technical detail, a value of 0 means shrinkage is applied (akin to ridge
regression (Hoerl and Kennard 1970; Le Cessie and
Van Houwelingen 1992)), while a value of 1 means selection is
applied (akin to the lasso (Tibshirani
1996)), while a value in between corresponds to the so-called
‘elastic net’ (Zou and Hastie 2005). So,
alphas=c(0,1) means we will select or discard complete
views. Finally, we use family="binomial" since the outcome
variable is binary. For other types of outcome variables we might use
the gaussian or poisson family. Note that for
the last two arguments we are using the default values, so they could
have been omitted from the call like so:
Instead of MVS() we could have also used
mvs() since they refer to the same function. Since this
model has only two levels, we could have also used the
StaPLR() function to fit the model. However, we typically
recommend using the MVS() function, since it is more
general. Fitting the model will show a progress bar in the R console.
Once the model has been fitted, we can extract model coefficients using
coef(). In this case we are primarily interested in the
coefficients at the second (i.e., meta) level:
coef(fit)$'Level 2'
#> [[1]]
#> 4 x 1 sparse Matrix of class "dgCMatrix"
#> lambda.min
#> (Intercept) -4.019245
#> V1 3.739578
#> V2 4.038358
#> V3 .These coefficients show that View 1 and View 2 were selected, while
View 3 was discarded. Note that since all inputs to the meta-learner are
on the same scale, these coefficients can be interpreted as-is without
any further need for standardization. Their interpretation is the same
as in any other logistic regression model (i.e., as predicted changes in
the log-odds). The base-level coefficients can be viewed using
coef(fit)$'Level 1', but since there are 85 of them, we
will not print them here. The fitted model can also be used to predict
the outcome for new observations. For example, if we have the following
two new observations:
We can obtain their predicted outcomes using
Note that since we are using a probabilistic classifier, the predicted outcomes are probabilities rather than class labels. If we want class labels instead, we can use
By default, mvs uses the extended StaPLR algorithm to
fit the learners, which means all the sub-models are generalized linear
models (GLMs). Depending on the outcome variable, we can set the
family argument to either gaussian,
binomial or poisson. In addition to GLMs,
mvs also supports the use of random forests (Breiman 2001; Liaw and Wiener 2002b). To use
random forests instead of GLMs, simply set the type
argument to RF:
You can also mix and match random forests with GLMs. For example, to use random forests as the base-learners and nonnegative logistic lasso as the meta-learner, we can use
Note that if we extract the model coefficients using
coef() we get only the coefficients of the
meta-learner:
coef(fit)
#> $`Level 1`
#> $`Level 1`[[1]]
#> [1] NA
#>
#> $`Level 1`[[2]]
#> [1] NA
#>
#> $`Level 1`[[3]]
#> [1] NA
#>
#>
#> $`Level 2`
#> $`Level 2`[[1]]
#> 4 x 1 sparse Matrix of class "dgCMatrix"
#> lambda.min
#> (Intercept) -4.554313
#> V1 5.122468
#> V2 3.673454
#> V3 .
#>
#>
#> attr(,"type")
#> [1] "RF" "StaPLR"
#> attr(,"class")
#> [1] "MVScoef"This is because random forests do not have regression coefficients in
the traditional sense. However, we can calculate feature importance
measures using importance():
importance(fit)
#> $`Level 1`
#> $`Level 1`[[1]]
#> MeanDecreaseGini
#> 1 0.7764228
#> 2 0.9236304
#> 3 0.6986672
#> 4 0.7228284
#> 5 0.9392745
#> [ reached 'max' / getOption("max.print") -- omitted 40 rows ]
#>
#> $`Level 1`[[2]]
#> MeanDecreaseGini
#> 1 4.638057
#> 2 2.026891
#> 3 2.411922
#> 4 1.643252
#> 5 1.684561
#> [ reached 'max' / getOption("max.print") -- omitted 15 rows ]
#>
#> $`Level 1`[[3]]
#> MeanDecreaseGini
#> 1 2.071363
#> 2 2.067622
#> 3 2.085501
#> 4 2.094666
#> 5 1.980755
#> [ reached 'max' / getOption("max.print") -- omitted 15 rows ]
#>
#>
#> $`Level 2`
#> $`Level 2`[[1]]
#> [1] NA
#>
#>
#> attr(,"type")
#> [1] "RF" "StaPLR"
#> attr(,"class")
#> [1] "MVSimportance"For an overview of the different feature importance measures
available, see the randomForest
package manual (Liaw and Wiener
2002a). For more information about random forests in general see,
for example, Breiman (2001).
Multi-view stacking is computationally attractive because at any
level of the model all sub-problems are independent, which means they
can be computed in parallel. mvs supports parallel
computing using foreach (Microsoft
and Weston 2022b) and doParallel (Microsoft and Weston 2022a). Enabling parallel
computing consists of two steps:
mvs option parallel = TRUE.Registering a parallel back-end on a local machine is typically as simple as:
However, the specifics may vary from system to system. We therefore recommend checking the doParallel vignette. Once the parallel back-end has been registered, fitting a model using parallel computation is as simple as:
In practice, it is possible that the multi-view structure consists of
more than two levels. For example, one might have features (base level)
which are grouped by brain area (middle level) and further grouped by
the type of MRI scan they were obtained from (top level). Such a
hierarchical analysis is described in detail in (Van Loon et al. 2022). Fitting such a model
using mvs is very simple. Consider a modified version of
the example used above:
set.seed(123)
n <- 100
X <- matrix(rnorm(8500), nrow=n, ncol=85)
b <- c(rep(0, 15), rep(10, 40), rep(0, 30)) * ((rbinom(85, 1, 0.5)*2)-1)
eta <- X %*% b
p <- 1 /(1 + exp(-eta))
y <- rbinom(n, 1, p)
sub_views <- c(rep(1:3, each=15), rep(4:5, each=10), rep(6:9, each=5))
top_views <- c(rep(1,45), rep(2,20), rep(3,20))Here, we again have 3 views, but they are now further divided into
sub-views. View 1 is divided into 3 sub-views of 15 features each, of
which only the second and third sub-view are truly related to the
outcome. View 2 is divided into 2 sub-views of 10 features each, of
which only the first sub-view is related to the outcome. View 3 is
subdivided into 4 sub-views of 4 features each, none of which are
related to the outcome. The main difference when applying
MVS to this data compared with the 2-level model is that
views should now be a matrix where each column is a vector
denoting to which view each feature corresponds at that level.
The structure is “bottom-up” from left to right, meaning the first
column corresponds to the lowest level in the hierarchy, the second
column to the level above that, and so on. For the example data above,
it looks like this:
Note that although views has two columns, there are
three levels in total: (1) the features, (2) the sub-views, and (3) the
top level views. The number of levels can be determined using the
levels argument. For each level, we need to specify the
desired penalty parameter, which is indicated using the same
alphas argument described in the previous section, except
it is now a vector of length 3 instead of length 2. We will assume here
that the goal is to select top level views and sub-views, but not
individual features within sub-views. Finally, we also need to indicate
for each level if we want to include nonnegativity constraints using
argument nnc, which is a vector which takes value 1 if
nonnegativity constraints should be applied, and 0 otherwise. We
generally recommend to apply nonnegativity constraints at all levels
above the feature level. The call to fit a three-level MVS model is
then:
The top level view coefficients are:
coef(fit)$'Level 3'
#> [[1]]
#> 4 x 1 sparse Matrix of class "dgCMatrix"
#> lambda.min
#> (Intercept) -3.083045
#> V1 3.624991
#> V2 2.465696
#> V3 .Again, Views 1 and 2 are selected, while View 3 is discarded. The sub-view coefficients for the first two can be observed by:
coef(fit)$'Level 2'
#> [[1]]
#> 4 x 1 sparse Matrix of class "dgCMatrix"
#> lambda.min
#> (Intercept) -4.137114
#> V1 .
#> V2 3.715694
#> V3 4.203360
#>
#> [[2]]
#> 3 x 1 sparse Matrix of class "dgCMatrix"
#> lambda.min
#> (Intercept) -1.493922
#> V1 3.105825
#> V2 .
#>
#> [[3]]
#> 5 x 1 sparse Matrix of class "dgCMatrix"
#> lambda.min
#> (Intercept) 0.1201443
#> V1 0.0000000
#> V2 .
#> V3 .
#> V4 .We can observe that for View 1, the second and third sub-view were
selected, while for View 2, the first sub-view was selected. Note that
the coefficients corresponding to the sub-views of View 3 can be
ignored, since View 2 was discarded in its entirety (although in this
case, the sub-view coefficients are also all zero). Note that
coefficients of sub-views that are not part of the same top level view
cannot be directly compared, because they are part of different
sub-models. To compare the effects of sub-views that are part of
different top level views, we can employ the minority report
measure (MRM) (Van Loon et al. 2022).
The MRM calculates how much the final prediction of the complete stacked
model changes as the prediction obtained from a sub-view changes from
a (default value 0) to b (default
value 1), while the predictions of the other views are kept
constant at constant (the recommended value for which is
mean(y)). The MRM for the 9 sub-views can be calculated
by:
MRM(fit, constant = mean(y), level=2)
#> [1] 0.0000000 0.5792082 0.6096366 0.3805919 0.0000000 0.0000000 0.0000000
#> [8] 0.0000000 0.0000000The value of the MRM ranges from zero to one, with larger values indicating an increased effect size. Note that if a view was excluded from the model, the value of the MRM is zero, since it has no effect on the outcome. More details about the MRM can be found in (Van Loon et al. 2022).
Consider the same simulated data set we used in Fitting a basic model:
set.seed(123)
n <- 100
X <- matrix(rnorm(8500), nrow=n, ncol=85)
b <- c(rep(10, 65), rep(0, 20)) * ((rbinom(85, 1, 0.5)*2)-1)
eta <- X %*% b
p <- 1 /(1 + exp(-eta))
y <- rbinom(n, 1, p)
views <- c(rep(1,45), rep(2,20), rep(3,20))But now, assume that half of the observations have missing values on the first view:
If we try to fit the same model as before, we get an error:
fit <- MVS(x=X, y=y, views=views)
#> Error in `StaPLR()`:
#> ! Missing values detected in x. Either remove or impute missing values, or choose a different na.actionThis is because the default value of the function parameter
na.action is fail, which causes
MVS to stop and warn the user about the presence of missing
values. The error message tells us there are three possible ways to
continue, namely (1) to remove all observations with missing data, (2)
to impute the missing values before running mvs or (3) to
choose a different value for na.action. As discussed in Handling missing data, option (1) is
very wasteful, in this case deleting half of our observations. Option
(2) is preferable, but quickly becomes computationally infeasible as the
number of missing values and/or features increases (Van Loon, Fokkema, De Vos, et al. 2024).
However, mvs allows for three different types of meta-level
imputation using the na.action argument, which is much
faster but obtains similar results (see Handling missing data for more
details). Here we will use meta-level predictive mean matching using
mice (Van Buuren and
Groothuis-Oudshoorn 2011). To perform the meta-level imputation
using mice, simply use:
Running this will print progress on both the MVS model fitting and the imputation to the console. The meta-level coefficients can again then be obtained using
coef(fit)$'Level 2'
#> [[1]]
#> 4 x 1 sparse Matrix of class "dgCMatrix"
#> lambda.min
#> (Intercept) -4.945277
#> V1 5.916366
#> V2 4.139852
#> V3 .Information about the performed imputation are stored in the
mvs object together with the matrix of cross-validated
predictions:
attributes(fit$'Level 1'$CVs)
#> $dim
#> [1] 100 3
#>
#> $imputation_method
#> result.1 result.2 result.3 y
#> "pmm" "" "" ""
#>
#> $number_of_imputations
#> [1] 5
#>
#> $additional_arguments_passed_to_mice
#> list()This shows us that the first view was imputed using predictive mean
matching (“pmm”), whereas the other views were not imputed (since they
had no missing values). Note that the outcome variable y
was also used in the imputation process, as is generally recommended
(Van Buuren 2018). The attributes also
show us that the given matrix of cross-validated predictions is an
average of 5 different imputations. The number of imputations, or any
other mice arguments can be changed by providing a list of
arguments and their values using the na.arguments option.
For example, to change the number of imputations to 10 use:
attributes(fit$'Level 1'$CVs)
#> $dim
#> [1] 100 3
#>
#> $imputation_method
#> result.1 result.2 result.3 y
#> "pmm" "" "" ""
#>
#> $number_of_imputations
#> [1] 10
#>
#> $additional_arguments_passed_to_mice
#> $additional_arguments_passed_to_mice$m
#> [1] 10In addition to imputation using mice, meta-level
imputation using the mean and meta-level imputation with
missForest (Stekhoven and Bühlmann
2012) are also supported. Note that there is another possible
value for na.action, namely pass. Using this
value does not perform any imputation, but instead “passes” the
missingness onto the meta-level:
fit$`Level 1`$CVs
#> [,1] [,2] [,3]
#> [1,] NA 0.84404597 0.5555556
#> [2,] NA 0.55530209 0.5555556
#> [3,] NA 0.56042449 0.5101729
#> [4,] NA 0.72310130 0.5787974
#> [5,] NA 0.76159335 0.5017087
#> [6,] NA 0.47206796 0.5117947
#> [7,] NA 0.73552713 0.5000000
#> [8,] NA 0.39004445 0.5504475
#> [9,] NA 0.82898606 0.5475928
#> [10,] NA 0.26547150 0.5502753
#> [ reached 'max' / getOption("max.print") -- omitted 90 rows ]This option is primarily useful for implementing custom imputation
schemes other than those supported by mice or
missForest.
I acknowledge the contribution of Marjolein Fokkema, who helped
prepare mvs for submission to CRAN and worked on the
implementation of model relaxation and random forests. I acknowledge the
support of Mark de Rooij and Botond Szabo during the earlier stages of
this project.
Depending on the research area, multi-view data is sometimes called multi-block, multi-set, multi-group, or multi-table data (Smilde et al. 2022).↩︎
Like classical machine learning, multi-view learning can
be divided into supervised, unsupervised, and semi-supervised learning.
Supervised learning means there is an outcome variable that is used to
guide the learning process (Friedman et al.
2009). Unsupervised learning means that only the features are
observed and there is no known outcome variable (Friedman et al. 2009). Semi-supervised learning
refers to a setting where there is an outcome variable of interest, but
it has only been measured for a subset of the observations (Goodfellow et al. 2016). The methods provided
by the R package mvs are supervised learning methods.
However, they could also be used for semi-supervised learning by
treating the unobserved values of the outcome variable as missing data
and using the provided imputation methods (see Handling missing data). For an
overview of multi-view learning methods specific to semi-supervised or
unsupervised settings see, for example, Sun et al. (2019) and Smilde et al. (2022).↩︎