predict(
.object = NULL,
.benchmark = c("lm", "unit", "PLSPM", "GSCA", "PCA", "MAXVAR"),
.cv_folds = 10,
.handle_inadmissibles = c("stop", "ignore", "set_NA"),
.r = 10,
.test_data = NULL
)
.object  An R object of class cSEMResults resulting from a call to csem().

.benchmark  Character string. The procedure used to obtain benchmark predictions. One of "lm", "unit", "PLSPM", "GSCA", "PCA", or "MAXVAR". Defaults to "lm".
.cv_folds  Integer. The number of cross-validation folds to use. Setting .cv_folds to NULL produces leave-one-out cross-validation samples. Defaults to 10.

.handle_inadmissibles  Character string. How should inadmissible results
be treated? One of "stop", "ignore", or "set_NA". If "stop", predict() stops with an error; if "ignore", inadmissible results are ignored; if "set_NA", predictions for the run that failed are set to NA. Defaults to "stop".
.r  Integer. The number of repetitions to use. Defaults to 10.
.test_data  A matrix of test data with the same column names as the training data. 
An object of class cSEMPredict with print and plot methods.
Technically, cSEMPredict is a named list containing the following list elements:
$Actual
A matrix of the actual values/indicator scores of the endogenous constructs.
$Predictions_target
A matrix of the predicted indicator scores of the endogenous constructs
based on the target model. Target refers to the procedure used to estimate
the parameters in .object.
$Residuals_target
A matrix of the residual indicator scores of the endogenous constructs based on the target model.
$Residuals_benchmark
A matrix of the residual indicator scores of the endogenous constructs
based on a model estimated by the procedure given to .benchmark.
$Prediction_metrics
A data frame containing the prediction metrics MAE, RMSE, and Q2_predict.
$Information
A list with elements Target, Benchmark, Number_of_observations_training,
Number_of_observations_test, Number_of_folds, Number_of_repetitions,
and Handle_inadmissibles.
Predict the indicator scores of endogenous constructs.
predict() uses the procedure introduced by Shmueli et al. (2016) in the
context of PLS (commonly called "PLSpredict"; Shmueli et al. 2019).
It applies k-fold cross-validation: the data are randomly split into
training and test data, and the relevant values in the test data are
subsequently predicted based on the model parameter estimates obtained
using the training data. The number of cross-validation folds is 10 by
default but may be changed using the .cv_folds argument.
By default, the procedure is repeated .r = 10 times to avoid irregularities
due to a particular split. See Shmueli et al. (2019) for details.
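The repeated k-fold splitting described above can be sketched in base R (an illustration only, not cSEM's internal code; `n`, `k`, and `r` mirror the number of observations, .cv_folds, and .r):

```r
# Illustration of repeated k-fold splitting (not cSEM internals)
set.seed(123)
n <- 183  # assumed number of observations
k <- 10   # .cv_folds
r <- 10   # .r repetitions

# For each repetition, randomly assign every observation to one of k folds
folds_per_rep <- replicate(r, sample(rep(seq_len(k), length.out = n)),
                           simplify = FALSE)

# Within repetition 1, fold 1 serves as test data; the remaining folds
# form the training data
test_idx  <- which(folds_per_rep[[1]] == 1)
train_idx <- which(folds_per_rep[[1]] != 1)
```

Each repetition produces a fresh random partition, so averaging over repetitions reduces the influence of any single split.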
Alternatively, users may supply a matrix or a data frame of .test_data with
the same column names as those in the data used to obtain .object
(the training data). In this case, arguments .cv_folds and .r are
ignored and predict() uses the estimated coefficients from .object to
predict the values in the columns of .test_data.
In Shmueli et al. (2016), PLS-based predictions for indicator i
are compared to predictions based on a multiple regression of indicator i
on all available exogenous indicators (.benchmark = "lm") and to
a simple mean-based prediction, summarized in the Q2_predict metric.
predict() is more general in that it allows users to compare the predictions
based on a so-called target model/specification to predictions based on an
alternative benchmark. Available benchmarks include predictions
based on a linear model, PLS-PM weights, unit weights (i.e., sum scores),
GSCA weights, PCA weights, and MAXVAR weights.
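The metrics reported in $Prediction_metrics can be illustrated with toy vectors (a sketch under the usual definitions, not cSEM's internal code; here the sample mean of `actual` stands in for the mean-based benchmark):

```r
# Toy illustration of MAE, RMSE, and Q2_predict (not cSEM internals)
actual    <- c(3.1, 5.9, 1.2, 4.4)
predicted <- c(2.7, 6.0, 1.5, 4.2)

mae  <- mean(abs(actual - predicted))       # mean absolute error
rmse <- sqrt(mean((actual - predicted)^2))  # root mean squared error

# Q2_predict compares the model's squared errors to those of a naive
# mean-based prediction; values above 0 indicate the model outperforms it
q2_predict <- 1 - sum((actual - predicted)^2) /
                  sum((actual - mean(actual))^2)
```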
Each estimation run is checked for admissibility using verify(). If the
estimation yields inadmissible results, predict() stops with an error ("stop").
Users may choose to "ignore" inadmissible results or to simply set predictions
to NA ("set_NA") for the particular run that failed.
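The "set_NA" option can be pictured as follows (a hypothetical sketch, not cSEM's internal code; `runs` and `admissible` stand in for per-run prediction matrices and the corresponding admissibility flags from verify()):

```r
# Hypothetical sketch of the "set_NA" behaviour (not cSEM internals)
runs       <- list(matrix(1:4, 2), matrix(5:8, 2))  # per-run predictions
admissible <- c(TRUE, FALSE)                        # e.g. flags from verify()

# Keep admissible runs; set all predictions of inadmissible runs to NA
runs <- Map(function(run, ok) if (ok) run else run * NA, runs, admissible)
```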
Shmueli G, Ray S, Estrada JMV, Chatla SB (2016).
“The Elephant in the Room: Predictive Performance of PLS Models.”
Journal of Business Research, 69(10), 4552-4564.
doi: 10.1016/j.jbusres.2016.03.049, https://doi.org/10.1016/j.jbusres.2016.03.049.
Shmueli G, Sarstedt M, Hair JF, Cheah J, Ting H, Vaithilingam S, Ringle CM (2019).
“Predictive Model Assessment in PLS-SEM: Guidelines for Using PLSpredict.”
European Journal of Marketing, 53(11), 2322-2347.
doi: 10.1108/EJM-02-2019-0189, https://doi.org/10.1108/EJM-02-2019-0189.
### Anime example taken from https://github.com/ISS-Analytics/pls-predict
# Load data
data(Anime) # data is similar to the Anime.csv found on
# https://github.com/ISS-Analytics/pls-predict but with irrelevant
# columns removed
# Split into training and test data the same way as it is done on
# https://github.com/ISS-Analytics/pls-predict
set.seed(123)
index <- sample.int(dim(Anime)[1], 100, replace = FALSE)
dat_train <- Anime[index, ]
dat_test  <- Anime[-index, ]
# Specify model
model <- "
# Structural model
ApproachAvoidance ~ PerceivedVisualComplexity + Arousal
# Measurement/composite model
ApproachAvoidance =~ AA0 + AA1 + AA2 + AA3
PerceivedVisualComplexity <~ VX0 + VX1 + VX2 + VX3 + VX4
Arousal <~ Aro1 + Aro2 + Aro3 + Aro4
"
# Estimate (replicating the results of the `simplePLS()` function)
res <- csem(dat_train,
            model,
            .disattenuate = FALSE, # original PLS
            .iter_max = 300,
            .tolerance = 1e-07,
            .PLS_weight_scheme_inner = "factorial"
)
# Predict using a user-supplied test data set
pp <- predict(res, .test_data = dat_test)
pp$Predictions_target[1:6, ]
#> AA0 AA1 AA2 AA3
#> 1 2.6721142 3.043641 3.188336 3.404801
#> 2 5.8792289 6.048707 6.174991 6.170046
#> 3 0.9136649 1.395974 1.550764 1.888627
#> 4 4.1853797 4.461571 4.597579 4.709572
#> 5 6.1292377 6.282965 6.407814 6.385609
#> 6 3.8898464 4.184656 4.322361 4.454756
pp
#> ________________________________________________________________________________
#>  Overview 
#>
#> Number of obs. training = 100
#> Number of obs. test = 83
#> Number of cv folds = NA
#> Number of repetitions = 1
#> Handle inadmissibles = stop
#> Target = 'PLSPM'
#> Benchmark = 'lm'
#>
#>  Prediction metrics 
#>
#>
#> Name MAE target MAE benchmark RMSE target RMSE benchmark Q2_predict
#> AA0 1.2027 1.1671 1.5472 1.5010 0.4625
#> AA1 1.5028 1.5532 1.8751 1.9659 0.2794
#> AA2 0.9892 0.9821 1.3950 1.4021 0.4396
#> AA3 1.0416 1.0489 1.4282 1.4743 0.3656
#> ________________________________________________________________________________
### Compute prediction metrics 
res2 <- csem(Anime, # whole data set
             model,
             .disattenuate = FALSE, # original PLS
             .iter_max = 300,
             .tolerance = 1e-07,
             .PLS_weight_scheme_inner = "factorial"
)
# Predict using 10-fold cross-validation with 5 repetitions
if (FALSE) {
pp2 <- predict(res2, .benchmark = "lm", .r = 5)
pp2
## There is a plot method available
plot(pp2)
}