Generate data based on the parameters of a structural equation model in lavaan model syntax.

generateData(
 .model                    = NULL,
 .empirical                = FALSE,
 .handle_negative_definite = c("stop", "drop", "set_NA"),
 .return_type              = c("data.frame", "matrix", "cor"),
 .N                        = 200,
 .skewness                 = NULL,
 .kurtosis                 = NULL,
 ...
 )

Arguments

.model

A model in lavaan model syntax.

.empirical

Logical. If TRUE, mu and Sigma of the normal distribution specify the empirical, not the population, mean and covariance matrix. Ignored if .return_type = "cor". Defaults to FALSE.

.handle_negative_definite

Character string. How should negative definite indicator correlation matrices be handled? One of "stop", "drop", or "set_NA"; with "set_NA", an NA is produced. Defaults to "stop".

.return_type

Character string. One of "data.frame", "matrix", or "cor"; with "cor", the indicator correlation matrix is returned. Defaults to "data.frame".

.N

Integer. The number of observations to generate. Ignored if .return_type = "cor". Defaults to 200.

.skewness

List. List of predefined values for the skewness of the indicators.

.kurtosis

List. List of predefined values for the kurtosis of the indicators.

...

"name" = vector_of_values pairs. "name" is a character string giving the label used for the parameter of interest. vector_of_values is a numeric vector of values to use for the parameter given by "name".

Value

The generated data, either as a data.frame (.return_type = "data.frame"), a numeric matrix (.return_type = "matrix"), or a correlation matrix (.return_type = "cor"). If variable parameters have been set, a nested tibble is returned.

Details

Generate data for structural equation models with up to 8 constructs if a structural model is given, or with an unlimited number of constructs if only the correlations between constructs are needed. To be precise, if users specify a structural model, a maximum of 5 exogenous constructs is supported. Depending on the number of exogenous constructs, the following number of endogenous constructs is allowed:

  1. If there is 1 exogenous construct: a maximum of 7 endogenous constructs is allowed

  2. If there are 2 exogenous constructs: a maximum of 6 endogenous constructs is allowed

  3. If there are 3 exogenous constructs: a maximum of 5 endogenous constructs is allowed

  4. If there are 4 exogenous constructs: a maximum of 4 endogenous constructs is allowed

  5. If there are 5 exogenous constructs: a maximum of 4 endogenous constructs is allowed

The reason for the limitation is that data is generated such that the model-implied variances of the constructs are always unity. The model-implied construct covariance matrix is a complex function of the structural residual variances, which are in turn a complex function of the path coefficients, so the equation for each construct variance grows massively with each additional construct. Because the number of possible model specifications also grows rapidly with the number of constructs, we solved the variance equations symbolically as a function of the path coefficients in Mathematica. With more than 8 constructs, these symbolic representations become too large to be computationally feasible.
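To make the unit-variance constraint concrete, here is a minimal base R sketch for the three-construct path model used in the first example (eta2 ~ 0.4*eta1, eta3 ~ 0.4*eta1 + 0.35*eta2). It solves the residual variances by hand and cross-checks them with the usual matrix expression for the construct covariance matrix. The variable names (b21, psi2, Phi, ...) are illustrative only, and this is not the package's internal code, which relies on the precomputed symbolic solutions described above.

```r
# Path coefficients taken from the first example in the Examples section
b21 <- 0.4   # eta2 ~ eta1
b31 <- 0.4   # eta3 ~ eta1
b32 <- 0.35  # eta3 ~ eta2

# Var(eta1) = 1 by construction, so Cor(eta1, eta2) equals b21
r12 <- b21

# Var(eta2) = b21^2 + psi2 = 1
psi2 <- 1 - b21^2

# Var(eta3) = b31^2 + b32^2 + 2*b31*b32*r12 + psi3 = 1
psi3 <- 1 - (b31^2 + b32^2 + 2 * b31 * b32 * r12)

# Cross-check: Phi = (I - B)^-1 Psi (I - B)^-T should have a unit diagonal
B <- matrix(0, 3, 3)
B[2, 1] <- b21
B[3, 1] <- b31
B[3, 2] <- b32
Psi <- diag(c(1, psi2, psi3))
A   <- solve(diag(3) - B)
Phi <- A %*% Psi %*% t(A)

c(psi2 = psi2, psi3 = psi3)
round(Phi, 4)
```

Even with three constructs, the expression for psi3 already involves every upstream path coefficient; each additional construct adds more cross terms, which illustrates why the closed-form solutions grow so quickly.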

Generation is based on parameter values given in lavaan model syntax. Currently, linear models and models containing second order constructs are supported. Supplying a model containing nonlinear terms causes an error.

For structural model equations (~), values are interpreted as path coefficients. For measurement model equations, values are taken to be loadings if the concept is modeled as a common factor (=~). If the concept is modeled as a composite (<~), values are interpreted as (unscaled) weights. In the latter case, indicators are allowed to be arbitrarily correlated, so the correlations between indicators need to be set as well. Indicator correlations, measurement error correlations, and correlations between exogenous constructs are set using the ~~ operator. Note that when writing, for instance, x1 ~~ 0.2*x2 (where x1 and x2 are indicators of some construct eta1), the interpretation depends on whether eta1 is modeled as a composite or a common factor. In the former case, x1 ~~ 0.2*x2 is a correlation between indicators; in the latter case, it is interpreted as a measurement error correlation.
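The role of unscaled weights can be illustrated with a short base R sketch. It takes the composite eta1 <~ 0.8*y41 + 0.6*y42 + 0.6*y43 from the second-order example, with all within-block correlations set to 0.5, and rescales the weights so that the composite has unit variance. The scaling rule used here (dividing by the composite's standard deviation) is an assumption about what "unscaled" implies, not a statement of the package's internal code; the names w, S, and w_scaled are illustrative.

```r
# Unscaled weights as written in the model syntax
w <- c(y41 = 0.8, y42 = 0.6, y43 = 0.6)

# Within-block indicator correlations set via ~~ (all 0.5 for this block)
S <- matrix(0.5, nrow = 3, ncol = 3, dimnames = list(names(w), names(w)))
diag(S) <- 1

# Assumed scaling: divide by the standard deviation of the weighted sum
w_scaled <- w / sqrt(drop(t(w) %*% S %*% w))

# Loadings implied by the scaled weights: Cor(indicator, composite) = S %*% w
loadings <- drop(S %*% w_scaled)

round(w_scaled, 4)
round(loadings, 4)
```

Under this assumption, the rounded results agree with the weights (0.4887, 0.3665, 0.3665) and loadings (0.8552, 0.7941, 0.7941) reported for eta1 in the cSEM summary shown in the Examples.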

In addition to supplying numeric values, variable values for parameters are allowed. To achieve this, the package makes use of lavaan's labeling capabilities. Users may replace a given parameter in, e.g., the structural model by a symbolic name and assign a vector of values to that name by passing a "name" = vector_of_values argument to generateData(). Data is then generated for every combination of the supplied values, with all remaining parameters held fixed.
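The "all possible combinations" behavior can be pictured with base R's expand.grid(). The sketch below uses the gamma and epsilon vectors from the Examples section; whether generateData() builds its grid this way internally is an assumption, and the object name grid is illustrative.

```r
# Values assigned to the labels "gamma" and "epsilon" in the Examples
grid <- expand.grid(
  gamma   = c(-0.4, -0.2, 0, 0.2, 0.4),
  epsilon = c(0.1, 0.2, 0.3)
)

# One row per parameter combination: 5 * 3 = 15
nrow(grid)
head(grid)
```

The row order (first label varying fastest) corresponds to the gamma and epsilon columns of the nested tibble shown in the Examples, where each row holds one generated data set.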

If .return_type is "data.frame" or "matrix", normally distributed data is generated with zero mean and a variance-covariance matrix equal to the population indicator correlation matrix, i.e., the matrix that would be returned if .return_type = "cor".
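The following sketch shows one way such data could be drawn, assuming a sampler like MASS::mvrnorm(); the source does not state which sampler generateData() uses, so treat this as an illustration only. It rebuilds the indicator correlation matrix of the first example from its loadings and the construct correlations implied by the path coefficients (0.4, 0.54, and 0.51), then draws 200 observations with empirical = TRUE so that the sample correlation matrix reproduces the population matrix.

```r
# Loadings of the first example (common factor model)
inds <- c("y11", "y12", "y13", "y21", "y22", "y23", "y31", "y32", "y33")
Lambda <- matrix(0, nrow = 9, ncol = 3,
                 dimnames = list(inds, c("eta1", "eta2", "eta3")))
Lambda[1:3, "eta1"] <- c(0.8, 0.9, 0.8)
Lambda[4:6, "eta2"] <- c(0.7, 0.7, 0.9)
Lambda[7:9, "eta3"] <- c(0.9, 0.8, 0.7)

# Construct correlations implied by the path coefficients (unit variances)
Phi <- matrix(c(1,    0.4,  0.54,
                0.4,  1,    0.51,
                0.54, 0.51, 1), nrow = 3, byrow = TRUE)

# Population indicator correlation matrix; indicators are standardized
Sigma <- Lambda %*% Phi %*% t(Lambda)
diag(Sigma) <- 1

# Draw zero-mean normal data; empirical = TRUE makes the sample moments exact
set.seed(1)
X <- MASS::mvrnorm(n = 200, mu = rep(0, 9), Sigma = Sigma, empirical = TRUE)

round(Sigma[c("y11", "y21", "y31"), c("y11", "y21", "y31")], 4)
max(abs(cor(X) - Sigma))  # numerically zero because empirical = TRUE
```

The entries of Sigma computed this way agree with the matrix printed for the first example (for instance 0.2240 for y11 and y21, and 0.3888 for y11 and y31). With empirical = FALSE, cor(X) would only approximate Sigma.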

Examples

# ==============================================================================
# Without variable parameters
# ==============================================================================
## DGP with constructs modeled as common factors
dgp <- "
# Structural model
eta2 ~ 0.4*eta1
eta3 ~ 0.4*eta1 + 0.35*eta2

# Measurement model
eta1 =~ 0.8*y11 + 0.9*y12 + 0.8*y13
eta2 =~ 0.7*y21 + 0.7*y22 + 0.9*y23
eta3 =~ 0.9*y31 + 0.8*y32 + 0.7*y33
"

dat <- generateData(dgp, .return_type = "cor")
dat
#>        y11    y12    y13    y21    y22    y23    y31    y32    y33
#> y11 1.0000 0.7200 0.6400 0.2240 0.2240 0.2880 0.3888 0.3456 0.3024
#> y12 0.7200 1.0000 0.7200 0.2520 0.2520 0.3240 0.4374 0.3888 0.3402
#> y13 0.6400 0.7200 1.0000 0.2240 0.2240 0.2880 0.3888 0.3456 0.3024
#> y21 0.2240 0.2520 0.2240 1.0000 0.4900 0.6300 0.3213 0.2856 0.2499
#> y22 0.2240 0.2520 0.2240 0.4900 1.0000 0.6300 0.3213 0.2856 0.2499
#> y23 0.2880 0.3240 0.2880 0.6300 0.6300 1.0000 0.4131 0.3672 0.3213
#> y31 0.3888 0.4374 0.3888 0.3213 0.3213 0.4131 1.0000 0.7200 0.6300
#> y32 0.3456 0.3888 0.3456 0.2856 0.2856 0.3672 0.7200 1.0000 0.5600
#> y33 0.3024 0.3402 0.3024 0.2499 0.2499 0.3213 0.6300 0.5600 1.0000
## DGP with a construct modeled as a composite
# If the model contains composites, within-block indicator correlation
# needs to be set as well.
dgp <- "
# Structural model
eta2 ~ 0.2*eta1
eta3 ~ 0.4*eta1 + 0.35*eta2

# Measurement model
eta1 <~ 0.7*y11 + 0.9*y12 + 0.8*y13
eta2 =~ 0.7*y21 + 0.7*y22 + 0.9*y23
eta3 =~ 0.9*y31 + 0.8*y32 + 0.7*y33

# Within block indicator correlation of eta1
y11 ~~ 0.2*y12
y11 ~~ 0.3*y13
y12 ~~ 0.5*y13
"

dat <- generateData(dgp, .return_type = "matrix")
dat[1:4, ]
#>            y11        y12         y13        y21        y22         y23
#> [1,] 0.2833699 -0.1695697  0.19485360 -0.2459982 1.30419075 -0.01983093
#> [2,] 0.2636589 -0.5632186 -0.30880616  1.1322802 0.71194015  0.53392818
#> [3,] 1.6549501  1.7640614  1.39770736  1.9406051 1.41488167  1.23776506
#> [4,] 1.1624829 -1.7732494 -0.06297272 -0.7619076 0.09713996 -0.26068743
#>             y31        y32         y33
#> [1,]  1.6381927  2.4202904 1.409257809
#> [2,] -0.6779495 -2.4443972 0.658444822
#> [3,]  2.1322319  1.6886823 0.628915071
#> [4,]  1.0220640  0.2920795 0.002221929
# ==============================================================================
# With variable parameters
# ==============================================================================
### Linear DGP -----------------------------------------------------------------
# Add a label and assign values to each name
dgp <- "
# Structural model
eta2 ~ 0.2*eta1
eta3 ~ gamma*eta1 + 0.35*eta2

# Measurement model
eta1 <~ 0.7*y11 + 0.9*y12 + 0.8*y13
eta2 =~ 0.7*y21 + 0.7*y22 + 0.9*y23
eta3 =~ 0.9*y31 + 0.8*y32 + 0.7*y33

# Within block indicator correlation
y11 ~~ 0.2*y12
y11 ~~ 0.3*y13
y12 ~~ epsilon*y13
"

dat <- generateData(dgp,
                    "gamma"      = c(-0.4, -0.2, 0, 0.2, 0.4),
                    "epsilon"    = c(0.1, 0.2, 0.3),
                    .return_type = "data.frame")
dat
#> # A tibble: 15 x 4
#>       Id gamma epsilon dgp
#>    <int> <dbl>   <dbl> <list>
#>  1     1  -0.4     0.1 <df[,9] [200 × 9]>
#>  2     2  -0.2     0.1 <df[,9] [200 × 9]>
#>  3     3   0       0.1 <df[,9] [200 × 9]>
#>  4     4   0.2     0.1 <df[,9] [200 × 9]>
#>  5     5   0.4     0.1 <df[,9] [200 × 9]>
#>  6     6  -0.4     0.2 <df[,9] [200 × 9]>
#>  7     7  -0.2     0.2 <df[,9] [200 × 9]>
#>  8     8   0       0.2 <df[,9] [200 × 9]>
#>  9     9   0.2     0.2 <df[,9] [200 × 9]>
#> 10    10   0.4     0.2 <df[,9] [200 × 9]>
#> 11    11  -0.4     0.3 <df[,9] [200 × 9]>
#> 12    12  -0.2     0.3 <df[,9] [200 × 9]>
#> 13    13   0       0.3 <df[,9] [200 × 9]>
#> 14    14   0.2     0.3 <df[,9] [200 × 9]>
#> 15    15   0.4     0.3 <df[,9] [200 × 9]>
### DGP containing a second order construct ------------------------------------
# Second order constructs are supported as well.
dgp_2ndorder <- "
## Path model / Regressions
eta2 ~ 0.5*eta1
eta3 ~ 0.35*eta1 + 0.4*eta2

## Composite model
eta1 <~ 0.8*y41 + 0.6*y42 + 0.6*y43
eta2 <~ 2*y51 + 3*y52 + 5*y53
c1 <~ 0.8*y11 + 0.4*y12
c2 <~ 0.5*y21 + 0.3*y22 + 0.2*y23 + 0.4*y24

## Higher order composite
eta3 <~ 0.4*c1 + 0.4*c2

## Composite indicator correlations
# eta1
y41 ~~ 0.5*y42
y41 ~~ 0.5*y43
y42 ~~ 0.5*y43

# eta2
y51 ~~ 0.2*y52
y51 ~~ 0.3*y53
y52 ~~ 0.4*y53

# eta3 (the 2nd order construct)
c1 ~~ 0.49*c2

# c1-c2
y11 ~~ 0.3125*y12
y21 ~~ 0.4*y22
y21 ~~ 0.3*y23
y21 ~~ 0.31*y24
y22 ~~ 0.28*y23
y22 ~~ 0.31*y24
y23 ~~ 0.3*y24
"

dat <- generateData(dgp_2ndorder, .return_type = "data.frame", .empirical = TRUE)
dat[1:5, ]
#>          y41        y42         y43        y11        y12        y21
#> 1  1.6501001  1.0591335  1.04291831  2.0596873  0.1926962  0.7920012
#> 2 -0.2852002  0.4285055  1.48444017  0.1242391  0.7418071  0.6104972
#> 3  0.4569223 -0.2404468  0.74690311  1.1072126  0.4188807  0.8521478
#> 4  0.1503388  0.5890999  0.04658271  1.0498477 -0.7476978  0.6602936
#> 5  1.2367091  0.7017904 -0.59327101 -0.3491474 -0.4072340 -0.6550728
#>           y22       y23        y24        y51        y52        y53
#> 1  0.07741937 0.9016692 0.05564948 -0.3327969 -0.6952362 -0.3423606
#> 2 -0.29478640 1.4400117 1.31828329  1.2679078  1.7665792  0.3374870
#> 3  0.53409022 1.3029529 1.17307243 -0.9371740 -0.4217491  0.5689144
#> 4  0.39212851 1.5824987 0.83750277  1.5860984  1.0826290  0.6578463
#> 5 -0.70035840 1.1851110 1.89050705  1.2829048 -1.6464734  0.4250665
## Estimate using cSEM
require(cSEM)
#> Loading required package: cSEM
#>
#> Attaching package: ‘cSEM’
#> The following object is masked from ‘package:stats’:
#>
#>     predict
aa <- cSEM::csem(dat, dgp_2ndorder)
cSEM::summarize(aa) ## parameter estimates are identical to the DGP
#> ________________________________________________________________________________
#> ----------------------------------- Overview -----------------------------------
#>
#>   General information:
#>   ------------------------
#>   Estimation status                  = Ok
#>   Number of observations             = 200
#>   Weight estimator                   = PLS-PM
#>   Inner weighting scheme             = path
#>   Type of indicator correlation      = Pearson
#>   Path model estimator               = OLS
#>   Second order approach              = 2stage
#>   Type of path model                 = Linear
#>   Disattenuated                      = No
#>
#>   Construct details:
#>   ------------------
#>   Name  Modeled as     Order         Mode
#>
#>   eta1  Composite      First order   modeB
#>   c1    Composite      First order   modeB
#>   c2    Composite      First order   modeB
#>   eta2  Composite      First order   modeB
#>   eta3  Composite      Second order  modeB
#>
#> ----------------------------------- Estimates ----------------------------------
#>
#> Estimated path coefficients:
#> ============================
#>   Path           Estimate  Std. error   t-stat.   p-value
#>   eta2 ~ eta1      0.5000          NA        NA        NA
#>   eta3 ~ eta1      0.3500          NA        NA        NA
#>   eta3 ~ eta2      0.4000          NA        NA        NA
#>
#> Estimated loadings:
#> ===================
#>   Loading        Estimate  Std. error   t-stat.   p-value
#>   eta1 =~ y41      0.8552          NA        NA        NA
#>   eta1 =~ y42      0.7941          NA        NA        NA
#>   eta1 =~ y43      0.7941          NA        NA        NA
#>   c1 =~ y11        0.9250          NA        NA        NA
#>   c1 =~ y12        0.6500          NA        NA        NA
#>   c2 =~ y21        0.8040          NA        NA        NA
#>   c2 =~ y22        0.6800          NA        NA        NA
#>   c2 =~ y23        0.5540          NA        NA        NA
#>   c2 =~ y24        0.7080          NA        NA        NA
#>   eta2 =~ y51      0.5365          NA        NA        NA
#>   eta2 =~ y52      0.7066          NA        NA        NA
#>   eta2 =~ y53      0.8898          NA        NA        NA
#>   eta3 =~ c1       0.8631          NA        NA        NA
#>   eta3 =~ c2       0.8631          NA        NA        NA
#>
#> Estimated weights:
#> ==================
#>   Weights        Estimate  Std. error   t-stat.   p-value
#>   eta1 <~ y41      0.4887          NA        NA        NA
#>   eta1 <~ y42      0.3665          NA        NA        NA
#>   eta1 <~ y43      0.3665          NA        NA        NA
#>   c1 <~ y11        0.8000          NA        NA        NA
#>   c1 <~ y12        0.4000          NA        NA        NA
#>   c2 <~ y21        0.5000          NA        NA        NA
#>   c2 <~ y22        0.3000          NA        NA        NA
#>   c2 <~ y23        0.2000          NA        NA        NA
#>   c2 <~ y24        0.4000          NA        NA        NA
#>   eta2 <~ y51      0.2617          NA        NA        NA
#>   eta2 <~ y52      0.3926          NA        NA        NA
#>   eta2 <~ y53      0.6543          NA        NA        NA
#>   eta3 <~ c1       0.5793          NA        NA        NA
#>   eta3 <~ c2       0.5793          NA        NA        NA
#>
#> Estimated measurement error correlations:
#> =========================================
#>   Correlation    Estimate  Std. error   t-stat.   p-value
#>   y41 ~~ y42      -0.1791          NA        NA        NA
#>   y41 ~~ y43      -0.1791          NA        NA        NA
#>   y42 ~~ y43      -0.1306          NA        NA        NA
#>   y11 ~~ y12      -0.2887          NA        NA        NA
#>   y21 ~~ y22      -0.1467          NA        NA        NA
#>   y21 ~~ y23      -0.1454          NA        NA        NA
#>   y21 ~~ y24      -0.2592          NA        NA        NA
#>   y22 ~~ y23      -0.0967          NA        NA        NA
#>   y22 ~~ y24      -0.1714          NA        NA        NA
#>   y23 ~~ y24      -0.0922          NA        NA        NA
#>   y51 ~~ y52      -0.1791          NA        NA        NA
#>   y51 ~~ y53      -0.1774          NA        NA        NA
#>   y52 ~~ y53      -0.2288          NA        NA        NA
#>
#> Estimated indicator correlations:
#> =================================
#>   Correlation    Estimate  Std. error   t-stat.   p-value
#>   y41 ~~ y42       0.5000          NA        NA        NA
#>   y41 ~~ y43       0.5000          NA        NA        NA
#>   y42 ~~ y43       0.5000          NA        NA        NA
#>   y11 ~~ y12       0.3125          NA        NA        NA
#>   y21 ~~ y22       0.4000          NA        NA        NA
#>   y21 ~~ y23       0.3000          NA        NA        NA
#>   y21 ~~ y24       0.3100          NA        NA        NA
#>   y22 ~~ y23       0.2800          NA        NA        NA
#>   y22 ~~ y24       0.3100          NA        NA        NA
#>   y23 ~~ y24       0.3000          NA        NA        NA
#>   y51 ~~ y52       0.2000          NA        NA        NA
#>   y51 ~~ y53       0.3000          NA        NA        NA
#>   y52 ~~ y53       0.4000          NA        NA        NA
#>
#> ------------------------------------ Effects -----------------------------------
#>
#> Estimated total effects:
#> ========================
#>   Total effect   Estimate  Std. error   t-stat.   p-value
#>   eta2 ~ eta1      0.5000          NA        NA        NA
#>   eta3 ~ eta1      0.5500          NA        NA        NA
#>   eta3 ~ eta2      0.4000          NA        NA        NA
#>
#> Estimated indirect effects:
#> ===========================
#>   Indirect effect  Estimate  Std. error   t-stat.   p-value
#>   eta3 ~ eta1        0.2000          NA        NA        NA
#> ________________________________________________________________________________