Structural equation modeling (SEM) has been used and developed for decades across various research fields such as (among others) psychology, sociology, and business research.
As an almost inevitable consequence, a different terminology system and, to some extent, mathematical notation has evolved within each field over the years. This “terminology mess” is one of the major obstacles in interdisciplinary research and (scientific) debates, since it hampers a broader understanding of methodological issues, and, even worse, promotes systematic misuse (e.g., the use of Cronbach’s alpha as an estimator for congeneric reliability). This is particularly true for users new to SEM or practitioners who are overwhelmed by terminology of their own field only to find that the term they just thought to have finally completely grasped is defined differently in another field, making the confusion complete. A prime example is the term “formative” (measurement) which has been used to describe a causal-formative as well as a composite model. See e.g., Henseler (2017) for a clarification.
Ultimately, this is a matter of (mis)communication which, in our opinion, can only be satisfactorily solved by establishing a clear, unambiguous definition for each term and symbol that is used within the package. We emphasize that we do not seek to impose “our” conventions, nor do we claim they are the “correct” conventions, but merely seek to make communication between us (the authors of the package) and you (the package user) as unambiguous and error-free as possible.
Hence, we provide a Terminology file and a Notation file that contain the key terms and mathematical notation/symbols we consider crucial, alongside our definitions. Users are encouraged to read these files carefully to avoid potential misunderstandings.
The package is designed based on a set of principles and a terminology that is partly in contrast to commonly used open-source or commercial software packages offering similar content (e.g., smartPLS). Together with the Terminology and Notation file this introduction seeks to explain these principles. Lastly, to use cSEM effectively, it is helpful to understand its design. Hence, the package architecture and design as well as what we consider the “cSEM workflow” are discussed.
Structural equation modeling (SEM) is about analyzing, i.e., modeling, estimating, assessing, and testing, the (causal) relationships between concepts (entities defined by a conceptual definition) and other concepts and/or observable quantities generally referred to as indicators, manifest variables or items. Broadly speaking, two approaches to modeling the concepts and their relationships exist. We refer to the first as the latent variable or common factor model and to the second as the composite model. Each approach entails a set of methods, tests, and evaluation criteria as well as a specific terminology that may or may not be adequate within the realm of the other approach.
Assuming a researcher identifies \(J\) concepts and \(K\) indicators, the fundamental feature of the latent variable model is the assumption that a set of \(J\) latent variables (common factors) exists, each serving as a representation of one of the \(J\) concepts to be studied in the sense that each latent variable is causally responsible for the manifestations of a set of \(K_j\) indicators which are supposed to measure the concept in question. The entirety of these measurement relations is captured by the measurement model, which relates indicators to latent variables according to the researcher’s theory of how observables are related to the concepts in question. The entirety of the relationships between concepts (or, more precisely, between their representations in a statistical model, the constructs) is captured by the structural model, whose parameters are usually at the center of the researcher’s interest. Caution is warranted, though, as the common factor and its respective concept are not the same thing. Within the “classical” covariance-based or factor-based literature, the terms concept, construct, latent variable and its representation, the common factor, have often been used interchangeably (Rigdon 2012, 2016; Rigdon, Becker, and Sarstedt 2019). This will not be the case in cSEM. Readers are explicitly made aware that a concept is an abstract entity which may be modeled by a common factor; however, no assertion is made as to the correctness of this approach in terms of the “closeness of the common factor and its related concept”.
Parameters in latent variable models are usually retrieved by maximum likelihood (ML). The basic idea of ML is to find parameters such that the difference between the model-implied and the empirical indicator covariance matrix is minimized. Such estimation methods are therefore often referred to as covariance-based methods.
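For reference, the discrepancy that ML minimizes is usually written as below. This is the textbook form rather than anything specific to cSEM; \(S\) (the empirical indicator covariance matrix) and \(\Sigma(\theta)\) (the model-implied covariance matrix as a function of the parameters \(\theta\)) are symbols chosen for this sketch, while \(K\) is the number of indicators as above.

```latex
F_{\mathrm{ML}}(\theta)
  = \ln\lvert\Sigma(\theta)\rvert
  + \operatorname{tr}\!\bigl(S\,\Sigma(\theta)^{-1}\bigr)
  - \ln\lvert S\rvert
  - K
```

The function equals zero when \(\Sigma(\theta) = S\) and is positive otherwise, which is the sense in which the "difference" between the two matrices is minimized.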
The second approach is known as the composite model. As opposed to the latent variable or common factor model, composites do not presuppose the existence of a latent variable. Hence, designed entities (artifacts) such as the “OECD Better Life Index” that arguably have no latent counterpart may be adequately described by a composite, i.e., the linear combination of observables defining the composite. Composites may also be formed to represent latent variables/common factors (or more precisely concepts modeled as common factors), in which case the composite serves as a proxy or stand-in for the latent variable. However, in cSEM, the term “composite model” is only used to refer to a model in the former sense, i.e., a model in which the composite is a direct representation of a concept/construct!
Parameters in composite models are retrieved by a composite-based approach such as partial least squares path modeling (PLS-PM), generalized structured component analysis (GSCA) or dimension reduction techniques such as principal component analysis (PCA). The basic idea of any composite-based approach is to build scores/composites for each concept and subsequently retrieve structural model parameters by a series of (linear) regressions. Such estimation methods are therefore often referred to as variance-based methods, as regression maximizes the explained variance of the dependent variable.
Composite-based SEM is the entirety of methods, approaches, procedures and algorithms that in some way or another involve linear compounds (composites/proxies/scores), i.e., linear combinations of observables, when retrieving (estimating) quantities of interest such as the coefficients of the structural model. It is crucial to clearly distinguish between the composite model and composite-based SEM. They are not the same. While the former is “only” a statistical model relating concepts to observables, the latter simply states that composites - linear compounds, i.e., weighted linear combinations of observables - are used to retrieve quantities of interest! Composite-based SEM, as a way of obtaining/estimating parameters of interest, may therefore be used for the latent variable or common factor model as well as for the composite model. However, the interpretation of the parameter estimates is fundamentally different since the underlying models differ!
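The two-step idea (build composites, then regress) can be sketched in base R. This is a minimal illustration on simulated data, not cSEM's implementation: the data, the helper `make_composite()` and the choice of unit weights are assumptions made for the example; PLS-PM or GSCA would replace the unit weights with estimated ones.

```r
set.seed(1)
n <- 500

# Hypothetical data: two blocks of indicators built around two related scores
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)
block_x <- cbind(x1 = x + rnorm(n, sd = 0.5), x2 = x + rnorm(n, sd = 0.5))
block_y <- cbind(y1 = y + rnorm(n, sd = 0.5), y2 = y + rnorm(n, sd = 0.5))

# Step 1: build a composite as a weighted linear combination of its indicators
# (unit weights by default; hypothetical helper, not a cSEM function)
make_composite <- function(block, weights = rep(1, ncol(block))) {
  score <- scale(block) %*% weights
  as.numeric(scale(score))  # standardize the composite
}
comp_x <- make_composite(block_x)
comp_y <- make_composite(block_y)

# Step 2: retrieve the structural (path) coefficient by OLS regression
path <- unname(coef(lm(comp_y ~ comp_x))["comp_x"])
path
```

Because both composites are standardized, the estimated path coefficient coincides with their correlation.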
As sketched above, common factor and composite models fundamentally differ in how the relation between observables and concepts is modeled. Naturally, results and, most notably, their (correct/meaningful) interpretation critically hinge on what type of model the user specifies. Across the package we therefore strictly distinguish between concepts modeled as common factors and concepts modeled as composites.
These phrases will repeatedly appear in help files and other complementary files. It is therefore crucial to remember what they are supposed to convey.
The idea of cSEM is twofold:
The first point is a permanently ongoing task since approaches are constantly evolving, with new developments appearing at a pace that we, the package authors, will not be able to keep up with. The second point, however, was particularly important to us as we have been frustrated ourselves by how technical and unfriendly packages in R can be. Hence, from the very start we envisioned a workflow that essentially comprises only three steps:
Get the essentials: no estimator or approach works without data and a description of which parameters are to be estimated and how the data relate to these parameters, i.e., a model. Hence, we always need a data set and a model. Since model specification in lavaan model syntax is probably unbeatable in its ease and well known to R users with an interest in SEM, lavaan model syntax is, to us, the obvious tool for users to specify their model. Experience tells us that for R beginners the biggest obstacle has been getting the data into R. However, largely thanks to the tidyverse and RStudio, data import and data transformation are nowadays relatively easy to handle. See the Preparing the data and the Specifying a model sections below.
Estimate: no matter the model and type of data, estimation is always done using one central function with the data as its first and the model as its second argument:
csem(.data = my_data, .model = my_model)
Naturally, the csem() function has a number of additional arguments to fine-tune the estimation. However, since csem() automatically recognizes, for instance, whether a concept was modeled as a common factor or a composite and automatically applies the appropriate correction for attenuation, the default arguments are often sufficient. See the Estimate using csem() section below.
Postestimate: Inspired by the grammar of data manipulation underlying the dplyr package, cSEM provides 5 postestimation verbs that concisely cover all common postestimation tasks, as well as 4 additional test commands and 3 general do commands:
assess()
infer()
predict()
summarize()
verify()
testOMF()
testMICOM()
testHausman()
testMGD()
doIPMA()
doNonlinearEffectsAnalysis()
doRedundancyAnalysis()
All verbs accept the result of a call to csem() as input, which makes working with these functions extremely simple. You only need to remember the verb, not any specific syntax or arguments. Of course, all functions have a number of additional arguments to fine-tune the postestimation. See the Apply postestimation functions section below. For details on the arguments consult the individual help files.
The price we pay for the increased flexibility is primarily a, mostly minor, loss in computational speed, in particular when intense resampling is involved (e.g., 5000 bootstrap runs for a complex model with, say, 1000 observations). Users looking for the most efficient implementation of common resampling routines may find faster implementations elsewhere. That said, we believe the time saved when using a standardized estimate-postestimate workflow, no matter the model or data used, well outweighs the potential loss in computational efficiency.
The following sections describe the workflow in more detail.
As described in the previous section, working with cSEM consists of 3-4 steps:
Preparing the data (optional)
Specifying a model
Estimating the model using the csem() function
Applying postestimation functions
Technically, preparing the data does not require cSEM and is therefore better considered a preparation task, i.e., a “pre-cSEM” task. This step is nevertheless considered an explicit part of the cSEM workflow because, in our experience, applied/casual users tend to shy away from software like R: “just getting the data in” and understanding how to show, manipulate and work with data can be frustrating if one is not aware of R’s rich and easy-to-learn data import and data processing capabilities. While these topics may have been overwhelming for newcomers several years ago, data import and data transformation have become extremely simple and user-friendly if the right tools/packages are used. The best place to start is the RStudio Cheat sheet webpage, especially the Data Import and the Data Transformation cheat sheets.
cSEM is relatively flexible as to the type of data accepted. Currently the following data types/structures are accepted:
A data.frame or tibble with column names matching the indicator names used in the lavaan model description of the measurement or composite model. Possible column types or classes of the data provided are: "logical" (TRUE/FALSE), "numeric" ("double" or "integer"), "factor" ("ordered" and/or "unordered") or a mix of several types. Additionally, the data may include one character column whose column name must be given to .id. Values of this column are interpreted as group identifiers and csem() will split the data by levels of that column and run the estimation for each level separately.
Example:
Assuming the following simple model is to be estimated:
model <- "
# Structural model
EXPE ~ IMAG
# Reflective measurement model
EXPE =~ expe1 + expe2
IMAG =~ imag1 + imag2
"
To estimate the model, a data frame with \(N\) rows (the observations) and \(K = 4\) columns with column names expe1, imag1, expe2, imag2 is required. The order of the columns in the dataset is irrelevant. In cSEM the order is defined by the order in which the names appear in the measurement or composite model equations in the model description. In this case any resulting matrix or vector whose (row/column) names contain the indicator names would have the order expe1, expe2, imag1, imag2. More on model specification below.
A matrix with column names matching the indicator names used in the lavaan model description of the measurement or composite model.
A list of data frames or matrices. In this case estimation is repeated for each data frame or matrix separately.
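The column-matching and grouping rules described for data frames above can be checked before estimation with a few lines of base R. This is a sketch only: `get_indicator_names()` is a hypothetical helper written for this example (it is not part of cSEM), the data are random placeholders, and the column name `group` is an assumption.

```r
model <- "
# Structural model
EXPE ~ IMAG
# Reflective measurement model
EXPE =~ expe1 + expe2
IMAG =~ imag1 + imag2
"

# Hypothetical helper: extract indicator names in the order they appear
# in the measurement/composite model equations
get_indicator_names <- function(model) {
  lines <- trimws(strsplit(model, "\n")[[1]])
  lines <- sub("#.*$", "", lines)            # drop comments
  lines <- lines[grepl("=~|<~", lines)]      # keep measurement/composite equations
  rhs   <- sub("^.*(=~|<~)", "", lines)      # right-hand sides only
  unique(trimws(unlist(strsplit(rhs, "+", fixed = TRUE))))
}

# Placeholder data: columns in arbitrary order plus a character group column
set.seed(42)
dat <- data.frame(
  expe1 = rnorm(10), imag1 = rnorm(10), expe2 = rnorm(10), imag2 = rnorm(10),
  group = rep(c("a", "b"), each = 5),
  stringsAsFactors = FALSE
)

indicators <- get_indicator_names(model)
stopifnot(all(indicators %in% names(dat)))   # every indicator has a column

# Splitting by the group column mirrors what is described for .id
by_group <- split(dat[, indicators], dat$group)
indicators
```

The resulting order follows the model description (expe1, expe2, imag1, imag2), not the column order of the data.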
The current version 0.1.0 available on CRAN does not provide any tools to handle missing values. Future versions are likely to include at least the basic approaches for handling missing values. Regularly check https://github.com/M-E-Rademaker/cSEM to get the latest updates.
Models are defined using lavaan model syntax. Currently, only the “standard” lavaan model syntax is supported. This comprises:
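The definition of a common factor (reflective measurement model) by the “=~” operator.
The definition of a composite by the “<~” operator.
The specification of a structural (regression) equation by the “~” operator.
The specification of a measurement error correlation by the “~~” operator.
cSEM handles linear, nonlinear and hierarchical models. Syntax for each model is illustrated below using variables of the built-in satisfaction dataset. For more information see the lavaan syntax tutorial.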
A typical linear model would look like this:
model <- "
# Structural model
EXPE ~ IMAG
QUAL ~ EXPE
VAL ~ EXPE + QUAL
SAT ~ IMAG + EXPE + QUAL + VAL
LOY ~ IMAG + SAT
# Composite model
IMAG <~ imag1 + imag2 + imag3 # composite
EXPE <~ expe1 + expe2 + expe3 # composite
QUAL <~ qual1 + qual2 + qual3 + qual4 + qual5 # composite
VAL <~ val1 + val2 + val3 # composite
# Reflective measurement model
SAT =~ sat1 + sat2 + sat3 + sat4 # common factor
LOY =~ loy1 + loy2 + loy3 + loy4 # common factor
# Measurement error correlation
sat1 ~~ sat2
"
Note that the operator <~ tells cSEM that the concept to its left is modeled as a composite; the operator =~ tells cSEM that the concept to its left is modeled as a common factor. The operator ~~ tells cSEM that the measurement errors of sat1 and sat2 are assumed to correlate.
Nonlinear terms are specified as interactions using the dot operator ".". Nonlinear terms include interactions and power (polynomial) terms. The latter are described in model syntax as an “interaction with itself”, e.g., \(x_1^3\) is written as x1.x1.x1. Currently the following terms are allowed:
eta1
eta1.eta1
eta1.eta1.eta1
eta1.eta2
eta1.eta2.eta3
eta1.eta1.eta3
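Each of the terms listed above is simply a product of constructs, with the dot separating the factors. The following base-R sketch makes that reading explicit; `describe_term()` is a hypothetical helper written for illustration and is not how cSEM parses models internally.

```r
# Hypothetical helper: decompose a dot-separated term into its constructs,
# the power with which each construct enters, and the overall order
describe_term <- function(term) {
  parts  <- strsplit(term, ".", fixed = TRUE)[[1]]
  powers <- table(parts)   # how often each construct appears in the product
  list(
    constructs = names(powers),
    powers     = as.integer(powers),
    order      = length(parts)
  )
}

describe_term("eta1")            # a single construct, order 1
describe_term("eta1.eta1")       # eta1 squared, order 2
describe_term("eta1.eta2.eta3")  # product of three different constructs
describe_term("eta1.eta1.eta3")  # eta1 squared times eta3
```

Read this way, eta1.eta1.eta3 is the product of the square of eta1 with eta3, i.e., a term of order three.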
A simple example would look like this:
model <- "
# Structural model
EXPE ~ IMAG + IMAG.IMAG
# Composite model
EXPE <~ expe1 + expe2
IMAG <~ imag1 + imag2
"
Currently only second-order models are supported. Specification of the second-order construct takes place in the measurement/composite model.
model <- "
# Structural model
SAT ~ QUAL
VAL ~ SAT + QUAL
# Reflective measurement model
SAT =~ sat1 + sat2
VAL =~ val1 + val2
# Composite model
IMAG <~ imag1 + imag2
EXPE <~ expe1 + expe2
# Second-order term
QUAL =~ IMAG + EXPE
"
In this case QUAL is modeled as a second-order common factor measuring IMAG and EXPE, where both IMAG and EXPE are modeled as composites.
csem() is the central function of the package. Although it is possible to estimate a model using the individual functions called by csem() (such as parseModel(), processData(), calculateWeightsPLS(), estimatePath() etc.) via R’s ::: mechanism for non-exported functions, it is virtually always easier, safer and quicker to use csem() instead (this is why these functions are not exported).
csem() accepts all models and data types described above. The result of a call to csem() is always an object of class cSEMResults. Technically, the resulting object has an additional class attribute, namely cSEMResults_default, cSEMResults_multi or cSEMResults_2ndorder, that depends on the type of model and/or data provided. Users usually do not need to worry about this, since postestimation functions automatically work on all classes.
The simplest possible call to csem() involves a data set and a model:
library(cSEM)
model <- "
# Path model / Regressions
eta2 ~ eta1
eta3 ~ eta1 + eta2
# Reflective measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"
a <- csem(.data = threecommonfactors, .model = model)
a
## ________________________________________________________________________________
## ----------------------------------- Overview -----------------------------------
##
## Estimation was successful.
##
## The result is a list of class cSEMResults with list elements:
##
## - Estimates
## - Information
##
## To get an overview or help type:
##
## - ?cSEMResults
## - str(<object-name>)
## - listviewer::jsondedit(<object-name>, mode = 'view')
##
## If you wish to access the list elements directly type e.g.
##
## - <object-name>$Estimates
##
## Available postestimation commands:
##
## - assess(<object-name>)
## - infer(<object-name)
## - predict(<object-name>)
## - summarize(<object-name>)
## - verify(<object-name>)
## ________________________________________________________________________________
This is equivalent to:
csem(
.data = threecommonfactors,
.model = model,
.approach_cor_robust = "none",
.approach_nl = "sequential",
.approach_paths = "OLS",
.approach_weights = "PLS-PM",
.conv_criterion = "diff_absolute",
.disattenuate = TRUE,
.dominant_indicators = NULL,
.estimate_structural = TRUE,
.id = NULL,
.iter_max = 100,
.normality = FALSE,
.PLS_approach_cf = "dist_squared_euclid",
.PLS_ignore_structural_model = FALSE,
.PLS_modes = NULL,
.PLS_weight_scheme_inner = "path",
.reliabilities = NULL,
.starting_values = NULL,
.tolerance = 1e-05,
.resample_method = "none",
.resample_method2 = "none",
.R = 499,
.R2 = 199,
.handle_inadmissibles = "drop",
.user_funs = NULL,
.eval_plan = "sequential",
.seed = NULL,
.sign_change_option = "no"
)
See the csem()
documentation for details on the arguments.
By default, no inferential quantities are calculated since composite-based approaches generally do not have closed-form solutions for standard errors. cSEM relies on the bootstrap or jackknife to estimate standard errors, test statistics, critical quantiles, and confidence intervals.
cSEM offers two ways to compute resamples:
Setting .resample_method to "jackknife" or "bootstrap" to perform resampling and subsequently using infer() (or, more conveniently, summarize(), which internally calls infer()) to compute the actual inferential quantities of interest.
Passing a cSEMResults object to resamplecSEMResults() and subsequently using summarize() or infer().
b1 <- csem(.data = threecommonfactors, .model = model, .resample_method = "bootstrap")
b2 <- resamplecSEMResults(a)
Several confidence intervals are implemented, see ?infer():
summarize(b1)
## ________________________________________________________________________________
## ----------------------------------- Overview -----------------------------------
##
## General information:
## ------------------------
## Estimation status = Ok
## Number of observations = 500
## Weight estimator = PLS-PM
## Inner weighting scheme = "path"
## Type of indicator correlation = Pearson
## Path model estimator = OLS
## Second-order approach = NA
## Type of path model = Linear
## Disattenuated = Yes (PLSc)
##
## Resample information:
## ---------------------
## Resample method = "bootstrap"
## Number of resamples = 499
## Number of admissible results = 499
## Approach to handle inadmissibles = "drop"
## Sign change option = "none"
## Random seed = 708067309
##
## Construct details:
## ------------------
## Name Modeled as Order Mode
##
## eta1 Common factor First order "modeA"
## eta2 Common factor First order "modeA"
## eta3 Common factor First order "modeA"
##
## ----------------------------------- Estimates ----------------------------------
##
## Estimated path coefficients:
## ============================
## CI_percentile
## Path Estimate Std. error t-stat. p-value 95%
## eta2 ~ eta1 0.6713 0.0449 14.9420 0.0000 [ 0.5869; 0.7569 ]
## eta3 ~ eta1 0.4585 0.0814 5.6344 0.0000 [ 0.3023; 0.6162 ]
## eta3 ~ eta2 0.3052 0.0878 3.4753 0.0005 [ 0.1327; 0.4786 ]
##
## Estimated loadings:
## ===================
## CI_percentile
## Loading Estimate Std. error t-stat. p-value 95%
## eta1 =~ y11 0.6631 0.0402 16.4879 0.0000 [ 0.5799; 0.7360 ]
## eta1 =~ y12 0.6493 0.0412 15.7782 0.0000 [ 0.5716; 0.7266 ]
## eta1 =~ y13 0.7613 0.0316 24.1289 0.0000 [ 0.7028; 0.8249 ]
## eta2 =~ y21 0.5165 0.0515 10.0318 0.0000 [ 0.4035; 0.6093 ]
## eta2 =~ y22 0.7554 0.0352 21.4419 0.0000 [ 0.6809; 0.8199 ]
## eta2 =~ y23 0.7997 0.0377 21.1919 0.0000 [ 0.7202; 0.8659 ]
## eta3 =~ y31 0.8223 0.0317 25.8998 0.0000 [ 0.7575; 0.8777 ]
## eta3 =~ y32 0.6581 0.0413 15.9331 0.0000 [ 0.5707; 0.7307 ]
## eta3 =~ y33 0.7474 0.0405 18.4601 0.0000 [ 0.6677; 0.8228 ]
##
## Estimated weights:
## ==================
## CI_percentile
## Weight Estimate Std. error t-stat. p-value 95%
## eta1 <~ y11 0.3956 0.0209 18.9554 0.0000 [ 0.3535; 0.4336 ]
## eta1 <~ y12 0.3873 0.0198 19.5881 0.0000 [ 0.3469; 0.4233 ]
## eta1 <~ y13 0.4542 0.0204 22.2748 0.0000 [ 0.4154; 0.4952 ]
## eta2 <~ y21 0.3058 0.0283 10.8136 0.0000 [ 0.2446; 0.3568 ]
## eta2 <~ y22 0.4473 0.0201 22.2824 0.0000 [ 0.4112; 0.4872 ]
## eta2 <~ y23 0.4735 0.0206 22.9887 0.0000 [ 0.4364; 0.5148 ]
## eta3 <~ y31 0.4400 0.0190 23.1844 0.0000 [ 0.4059; 0.4801 ]
## eta3 <~ y32 0.3521 0.0193 18.2191 0.0000 [ 0.3106; 0.3867 ]
## eta3 <~ y33 0.3999 0.0195 20.4913 0.0000 [ 0.3639; 0.4390 ]
##
## ------------------------------------ Effects -----------------------------------
##
## Estimated total effects:
## ========================
## CI_percentile
## Total effect Estimate Std. error t-stat. p-value 95%
## eta2 ~ eta1 0.6713 0.0449 14.9420 0.0000 [ 0.5869; 0.7569 ]
## eta3 ~ eta1 0.6634 0.0371 17.8699 0.0000 [ 0.5910; 0.7381 ]
## eta3 ~ eta2 0.3052 0.0878 3.4753 0.0005 [ 0.1327; 0.4786 ]
##
## Estimated indirect effects:
## ===========================
## CI_percentile
## Indirect effect Estimate Std. error t-stat. p-value 95%
## eta3 ~ eta1 0.2049 0.0601 3.4070 0.0007 [ 0.0892; 0.3288 ]
## ________________________________________________________________________________
Or directly via infer():
ii <- infer(b1, .quantity = c("CI_standard_z", "CI_percentile"), .alpha = c(0.01, 0.05))
ii$Path_estimates
## $CI_standard_z
## eta2 ~ eta1 eta3 ~ eta1 eta3 ~ eta2
## 99%L 0.5516346 0.2451642 0.08339944
## 99%U 0.7830943 0.6643908 0.53574578
## 95%L 0.5793049 0.2952815 0.13747609
## 95%U 0.7554240 0.6142735 0.48166913
##
## $CI_percentile
## eta2 ~ eta1 eta3 ~ eta1 eta3 ~ eta2
## 99%L 0.5536602 0.2517157 0.0769402
## 99%U 0.7787106 0.6601661 0.5180477
## 95%L 0.5869110 0.3022924 0.1326575
## 95%U 0.7568794 0.6162060 0.4786209
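To make explicit what kind of computation lies behind interval types such as the two shown above, here is a base-R sketch of a standard-z and a percentile bootstrap interval for a single statistic. It uses the textbook definitions on simulated data; the data, the function `path_coef()` and the variable names are assumptions for this example, and whether cSEM applies further adjustments is not covered here (see ?infer()).

```r
set.seed(2024)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Statistic of interest: a correlation, used here as a simple stand-in
# for a standardized path coefficient
path_coef <- function(d) cor(d$x, d$y)
estimate  <- path_coef(dat)

# Draw R bootstrap resamples (rows sampled with replacement) and re-estimate
R <- 499
boot_est <- replicate(R, path_coef(dat[sample.int(n, replace = TRUE), ]))

alpha   <- 0.05
se_boot <- sd(boot_est)  # bootstrap standard error

# Standard-z interval: estimate +/- normal quantile * bootstrap standard error
ci_standard_z <- estimate + c(-1, 1) * qnorm(1 - alpha / 2) * se_boot

# Percentile interval: empirical quantiles of the bootstrap distribution
ci_percentile <- unname(quantile(boot_est, probs = c(alpha / 2, 1 - alpha / 2)))

rbind(ci_standard_z, ci_percentile)
```

The two intervals are typically close when the bootstrap distribution is roughly symmetric and differ more as it becomes skewed.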
Both bootstrap and jackknife resampling support platform-independent multiprocessing as well as random seeds via the future framework. For multiprocessing simply set .eval_plan = "multiprocess", in which case the maximum number of available cores is used if not on Windows. On Windows, as many separate R instances are opened in the background as there are cores available. Note that this naturally has some overhead. Consequently, for a small number of resamples multiprocessing will generally not be faster than sequential (single core) processing (the default). Seeds are set via the .seed argument. A typical call would look like this:
b <- csem(
.data = satisfaction,
.model = model,
.resample_method = "bootstrap",
.R = 999,
.seed = 98234,
.eval_plan = "multiprocess")
# Output omitted
There are 5 major postestimation functions and 4 test-family functions:
assess()
Assess the quality of the estimated model without conducting a statistical test. Quality in this case is taken to be a catch-all term for all common aspects of model assessment. This mainly comprises fit indices, reliability estimates, common validity assessment criteria and other related quality measures/indices that do not rely on a formal test procedure. In cSEM a generic (fit) index or quality/assessment measure is referred to as a quality criterion.
infer()
Calculate common inferential quantities. For users interested in the estimated standard errors and/or confidence intervals, summarize() will usually be more helpful as it has a much more user-friendly print method.
predict()
Predict indicator scores of endogenous constructs based on the procedure introduced by Shmueli et al. (2016).
summarize()
Summarize a model. The function is mainly called for its side effect, the printing of a structured summary of the estimates. It also provides most estimates in user-friendly data frames. The data frame format is usually much more convenient if users intend to present the results in e.g., a paper or a presentation.
verify()
Verify admissibility of the estimated quantities for a given model. Results based on an estimated model exhibiting one of the following defects are deemed inadmissible: non-convergence, loadings and/or (congeneric) reliabilities larger than 1, a construct VCV and/or a model-implied VCV matrix that is not positive (semi-)definite.
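The defects listed for verify() can be pictured with a small base-R sketch. `check_admissibility()` is a hypothetical helper written for this example and is not the cSEM implementation; in particular, checking loadings in absolute value and the numerical tolerance are assumptions.

```r
# Hypothetical helper: flag the kinds of defects described above
check_admissibility <- function(loadings, reliabilities, vcv,
                                converged = TRUE, tol = 1e-8) {
  eig <- eigen(vcv, symmetric = TRUE, only.values = TRUE)$values
  c(
    not_converged         = !converged,
    loading_above_one     = any(abs(loadings) > 1),   # assumption: absolute value
    reliability_above_one = any(reliabilities > 1),
    vcv_not_psd           = any(eig < -tol)           # negative eigenvalue
  )
}

ok_vcv  <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
bad_vcv <- matrix(c(1, 1.2, 1.2, 1), nrow = 2)  # off-diagonal above 1: not PSD

res_ok  <- check_admissibility(c(0.7, 0.8), c(0.85, 0.90), ok_vcv)
res_bad <- check_admissibility(c(0.7, 1.1), c(0.85, 0.90), bad_vcv)
res_ok
res_bad
```

A result would be deemed inadmissible as soon as any of the flags is TRUE.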
test_* family of postestimation functions
testHausman()
The regression-based Hausman test for SEM.
testOMF()
Test for overall model fit based on Beran and Srivastava (1985). See also Dijkstra and Henseler (2015).
testMGD()
Test for group differences using several different approaches such as e.g., the one described in Klesel et al. (2019).
testMICOM()
Test of measurement invariance of composites proposed by Henseler, Ringle, and Sarstedt (2016).
do_* family of postestimation functions
doIPMA()
Performs an importance-performance matrix analysis (IPMA).
doNonlinearEffectsAnalysis()
Performs nonlinear effects analysis such as floodlight and surface analysis as described in e.g., Spiller et al. (2013).
doRedundancyAnalysis()
Performs a redundancy analysis (RA) as proposed by Hair et al. (2016) with reference to Chin (1998).
Technically, postestimation functions are generic functions with methods for objects of class cSEMResults_default, cSEMResults_multi and cSEMResults_2ndorder. In cSEM every cSEMResults_* object must also have class cSEMResults for internal reasons. When using one of the major postestimation functions, method dispatch is therefore technically done on one of the cSEMResults_* class attributes, ignoring the cSEMResults class attribute. As long as a postestimation function is used directly, method dispatch is of no practical concern to the end-user. The difference, however, becomes important if a user seeks to directly invoke an internal function which is called by one of the postestimation functions (e.g., calculateAVE() or calculateHTMT() as called by assess()). In this case, only objects of class cSEMResults_default are accepted, as this ensures a specific structure. It is therefore important to remember that internal functions are generally not generic.
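The dispatch pattern described above can be reproduced with plain S3 classes. The sketch below uses made-up class names (demoResults, demoResults_default, demoResults_multi) and a made-up generic `describe()`; none of this is cSEM code, and the order of the class attributes is an assumption.

```r
# Generic with methods only for the specific classes
describe <- function(x, ...) UseMethod("describe")
describe.demoResults_default <- function(x, ...) "single result"
describe.demoResults_multi   <- function(x, ...) {
  # apply the default method to every element of the list
  vapply(unclass(x), describe, character(1))
}

# Every object carries the general class plus one specific class.
# No method exists for "demoResults", so dispatch falls through to the
# specific class.
single <- structure(list(Estimates = NULL),
                    class = c("demoResults", "demoResults_default"))
multi  <- structure(list(g1 = single, g2 = single),
                    class = c("demoResults", "demoResults_multi"))

describe(single)
describe(multi)

# A non-generic "internal" helper only works on the default structure
internal_helper <- function(x) {
  stopifnot(inherits(x, "demoResults_default"))
  "ok"
}
internal_helper(single)
helper_on_multi <- try(internal_helper(multi), silent = TRUE)  # fails by design
```

The last line shows why calling a non-generic helper on anything but the default structure is an error rather than a silent fallback.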
cSEM is based on a number of principles that have shaped its design, terminology and scope. These principles are discussed below.
The way different concepts and their relationship are modeled is strictly distinct from how they are estimated. Hence we strictly distinguish between concepts modeled as common factors (or composites) and the actual estimation for a given model. In our opinion, these differences are fundamental to understanding the scope and limits of a certain approach. The most notable consequence is that approaches such as partial least squares and everything related to it (e.g., the modes) or generalized structured component analysis are “only” considered as estimators/estimation approaches for a given model.
By design, cSEM uses composite-based estimators/approaches only. Depending on the postulated model, linear compounds may therefore either serve as a composite as part of the composite model or as a proxy/stand-in for a common factor. If a concept is modeled as a common factor, proxy correlations, proxy-indicator correlations and path coefficients are inconsistent estimates for their supposed construct level counterparts (construct correlations, loadings and path coefficients) unless the proxy is a perfect representation of its construct level counterpart. This is commonly referred to as attenuation bias. Several approaches have been suggested to correct for these biases. In cSEM, estimates are correctly disattenuated by default if any of the concepts involved is modeled as a common factor! Disattenuation is controlled by the .disattenuate argument of csem().
Example
model <- "
## Structural model
eta2 ~ eta1
## Measurement model
eta1 <~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
"
# Identical
csem(threecommonfactors, model)
csem(threecommonfactors, model, .disattenuate = TRUE)
# To suppress automatic disattenuation
csem(threecommonfactors, model, .disattenuate = FALSE)
Note that since .approach_weights = "PLS-PM" and .disattenuate = TRUE by default (see The role of the weighting scheme and partial least squares (PLS) below) and one of the concepts in the model above is modeled as a common factor, composite (proxy) correlations, loadings and path coefficients are adequately disattenuated using the correction approach commonly known as consistent partial least squares (PLSc). If .disattenuate = FALSE or all concepts are modeled as composites, “proper” PLS values are returned.
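Attenuation and its correction can be illustrated with a small simulation in base R. The sketch uses the classical correction for attenuation (dividing the proxy correlation by the square root of the product of the proxy reliabilities), not the PLSc procedure itself; the simulated data, the sum-score proxies and the use of the true construct scores to obtain reliabilities are all assumptions that are only possible because the data are simulated.

```r
set.seed(123)
n <- 5000

# Two common factors with a true correlation of 0.5
eta1 <- rnorm(n)
eta2 <- 0.5 * eta1 + sqrt(1 - 0.5^2) * rnorm(n)

# Three indicators per factor, each with loading 0.7 and unit variance
make_items <- function(eta, loading = 0.7, k = 3) {
  sapply(seq_len(k), function(j) {
    loading * eta + sqrt(1 - loading^2) * rnorm(length(eta))
  })
}

# Proxies: unit-weighted sum scores
proxy1 <- rowSums(make_items(eta1))
proxy2 <- rowSums(make_items(eta2))

# The proxy correlation is attenuated (closer to zero than 0.5)
r_proxy <- cor(proxy1, proxy2)

# Reliability of each proxy: squared correlation with its construct
rel1 <- cor(proxy1, eta1)^2
rel2 <- cor(proxy2, eta2)^2

# Classical correction for attenuation
r_corrected <- r_proxy / sqrt(rel1 * rel2)

c(proxy = r_proxy, corrected = r_corrected)
```

In real applications the construct scores are unknown, so the reliabilities have to be estimated from the indicators; that estimation step is what approaches such as PLSc provide.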
In principle, any weighted combination of appropriately chosen observables can be used to estimate structural relationships between these compounds. Hence, any conceptual or methodological issue discussed based on a composite built by a given (weighting) approach may equally well be discussed for any other potential weighting scheme. The appropriateness or potential superiority of a specific weighting approach such as “partial least squares path modeling” (PLS-PM) over another such as “unit weights” (sum scores) or generalized structured component analysis (GSCA) is therefore to some extent a question of relative appropriateness and relative superiority.
As a notable consequence, we believe that well-known approaches such as partial least squares path modeling (PLS-PM) and generalized structured component analysis (GSCA) are - contrary to common belief - best understood exclusively as prescriptions for forming linear compounds based on observables, i.e., as weighting approaches. No more, no less.^{1} In cSEM this is reflected by the fact that "PLS-PM" and "GSCA" are choices of the .approach_weights argument.
model <- "
## Structural model
eta2 ~ eta1
## Composite model
eta1 <~ y11 + y12 + y13
eta2 <~ y21 + y22 + y23
"
### Currently the following weight approaches are implemented
# Partial least squares path modeling (PLS)
csem(threecommonfactors, model, .approach_weights = "PLS-PM") # default
# Generalized canonical correlation analysis (Kettenring approaches)
csem(threecommonfactors, model, .approach_weights = "SUMCORR")
csem(threecommonfactors, model, .approach_weights = "MAXVAR")
csem(threecommonfactors, model, .approach_weights = "SSQCORR")
csem(threecommonfactors, model, .approach_weights = "MINVAR")
csem(threecommonfactors, model, .approach_weights = "GENVAR")
# Generalized structured component analysis (GSCA)
csem(threecommonfactors, model, .approach_weights = "GSCA")
# Principal component analysis (PCA)
csem(threecommonfactors, model, .approach_weights = "PCA")
# Factor score regression (FSR) using "unit", "bartlett" or "regression" weights
csem(threecommonfactors, model, .approach_weights = "unit")
csem(threecommonfactors, model, .approach_weights = "bartlett")
csem(threecommonfactors, model, .approach_weights = "regression")
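That different weighting approaches are merely different prescriptions for the same kind of object, a weighted sum of indicators, can be seen in a few lines of base R. The sketch compares unit weights with first-principal-component weights on simulated data; the data and all names are assumptions for this example, and it does not call cSEM.

```r
set.seed(7)
n <- 500

# One block of three indicators driven by a single underlying score
eta   <- rnorm(n)
block <- sapply(c(0.6, 0.7, 0.8), function(l) l * eta + sqrt(1 - l^2) * rnorm(n))
colnames(block) <- c("item1", "item2", "item3")
block <- scale(block)

# Two weighting schemes for the same block
w_unit <- rep(1, ncol(block))              # unit weights (sum score)
w_pca  <- prcomp(block)$rotation[, 1]      # first principal component weights

# Either way, the composite is a standardized weighted sum of the indicators
score <- function(w) as.numeric(scale(block %*% w))
scores_unit <- score(w_unit)
scores_pca  <- score(w_pca)

# abs() because the sign of a principal component is arbitrary
agreement <- abs(cor(scores_unit, scores_pca))
agreement
```

For a block of similarly loading indicators the two sets of scores are nearly identical, so the choice of weights matters most when indicators differ markedly in quality.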
Beran, Rudolf, and Muni S. Srivastava. 1985. “Bootstrap Tests and Confidence Regions for Functions of a Covariance Matrix.” The Annals of Statistics 13 (1): 95–115. https://doi.org/10.1214/aos/1176346579.
Bollen, Kenneth A. 1989. Structural Equations with Latent Variables. Wiley-Interscience.
Chin, W. W. 1998. “The Partial Least Squares Approach to Structural Equation Modeling.” In Modern Methods for Business Research, edited by G. A. Marcoulides, 295–358. Mahwah, NJ: Lawrence Erlbaum.
Dijkstra, Theo K., and Jörg Henseler. 2015. “Consistent and Asymptotically Normal PLS Estimators for Linear Structural Equations.” Computational Statistics & Data Analysis 81: 10–23.
Hair, Joseph F, G Tomas M Hult, Christian Ringle, and Marko Sarstedt. 2016. A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM). Sage publications.
Henseler, Jörg. 2017. “Bridging Design and Behavioral Research with Variance-Based Structural Equation Modeling.” Journal of Advertising 46 (1): 178–92. https://doi.org/10.1080/00913367.2017.1281780.
Henseler, Jörg, Christian M. Ringle, and Marko Sarstedt. 2016. “Testing Measurement Invariance of Composites Using Partial Least Squares.” International Marketing Review 33 (3): 405–31. https://doi.org/10.1108/imr-09-2014-0304.
Klesel, Michael, Florian Schuberth, Jörg Henseler, and Bjoern Niehaves. 2019. “A Test for Multigroup Comparison Using Partial Least Squares Path Modeling.” Internet Research 29 (3): 464–77. https://doi.org/10.1108/intr-11-2017-0418.
Rigdon, Edward E. 2012. “Rethinking Partial Least Squares Path Modeling: In Praise of Simple Methods.” Long Range Planning 45 (5-6): 341–58. https://doi.org/10.1016/j.lrp.2012.09.010.
———. 2016. “Choosing PLS Path Modeling as Analytical Method in European Management Research: A Realist Perspective.” European Management Journal 34 (6). https://doi.org/10.1016/j.emj.2016.05.006.
Rigdon, Edward E., Jan-Michael Becker, and Marko Sarstedt. 2019. “Factor Indeterminacy as Metrological Uncertainty: Implications for Advancing Psychological Measurement.” Multivariate Behavioral Research, 1–15. https://doi.org/10.1080/00273171.2018.1535420.
Shmueli, Galit, Soumya Ray, Juan Manuel Velasquez Estrada, and Suneel Babu Chatla. 2016. “The Elephant in the Room: Predictive Performance of PLS Models.” Journal of Business Research 69 (10): 4552–64. https://doi.org/10.1016/j.jbusres.2016.03.049.
Spiller, Stephen A., Gavan J. Fitzsimons, John G. Lynch, and Gary H. Mcclelland. 2013. “Spotlights, Floodlights, and the Magic Number Zero: Simple Effects Tests in Moderated Regression.” Journal of Marketing Research 50 (2): 277–88. https://doi.org/10.1509/jmr.12.0420.
1. In fact, labels such as PLS-PM and, even more so, PLS-SEM are misleading as they create the impression that PLS(-PM) is somehow capable of more than other composite-based approaches. While, among composite-based approaches, methodological research surrounding composites formed using weights obtained by the PLS(-PM) algorithm is most advanced, the PLS algorithm remains a weighting scheme at its core.