
Decoding Prior Sensitivity in Bayesian Structural Equation Modeling for Sparse Factor Loading Structures
Published: December 5, 2020
The standard confirmatory factor analysis (CFA) machinery requires the analyst to commit, in advance, to which cross-loadings are exactly zero. This independent-clusters assumption is convenient for identification but rarely correct: most psychological measures have small but nonzero cross-loadings, and forcing them to zero produces systematic misfit that propagates into biased structural parameters. Bayesian structural equation modeling with small-variance normal priors (BSEM-N), introduced by Muthén and Asparouhov (2012), replaces the hard zero constraint with a soft one—a prior centered at zero with a small variance—that lets the data adjust cross-loadings toward whatever values they actually take. The price is a new modeling decision: how small the prior variance should be. Liang’s (2020) systematic prior-sensitivity study in Educational and Psychological Measurement answers that question for sparse loading structures.

Why the prior variance matters

In BSEM-N, every potential cross-loading receives a prior of the form N(0, ψ²). When ψ² is very small (say, 0.001), the model is nearly identical to a strict CFA: cross-loadings are pulled hard toward zero regardless of what the data say. When ψ² is large (say, 0.5), the model is essentially exploratory factor analysis: cross-loadings are unconstrained and the rotational indeterminacy of EFA reasserts itself. The interesting region is the middle, where the prior is informative enough to identify the model but flexible enough to recover meaningful cross-loadings.
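
The pull toward zero can be made concrete in the conjugate normal-normal case, where the posterior mean of a loading is the raw estimate scaled by ψ²/(ψ² + se²). The sketch below is a toy stand-in for the full BSEM posterior, with illustrative numbers, not values from the study:

```python
def shrunk_estimate(raw_loading, se, psi_sq):
    """Posterior mean of a loading under a N(0, psi_sq) prior, in the
    conjugate normal-normal toy case with known sampling variance se**2
    (an illustration of the shrinkage, not the full BSEM posterior)."""
    shrinkage = psi_sq / (psi_sq + se**2)  # weight given to the data
    return shrinkage * raw_loading

# A raw cross-loading estimate of 0.30 with standard error 0.05:
tight = shrunk_estimate(0.30, 0.05, 0.001)  # near-CFA prior
loose = shrunk_estimate(0.30, 0.05, 0.5)    # near-EFA prior
print(f"psi^2 = 0.001 -> {tight:.3f}")  # pulled hard toward zero (~0.086)
print(f"psi^2 = 0.5   -> {loose:.3f}")  # essentially unshrunk (~0.299)
```

The same raw estimate lands almost at zero under the tight prior and almost at its unconstrained value under the loose one, which is exactly the CFA-to-EFA spectrum described above.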

The interaction between the prior variance and the true (population) cross-loading magnitude is what drives the sensitivity. A prior whose 95% credible interval is, say, (-0.07, 0.07) treats anything beyond ±0.07 as implausible a priori. If the population cross-loadings are 0.05, this prior is well-matched and recovers them accurately. If the population cross-loadings are 0.20, the same prior systematically underestimates them, and—because the variance has to come from somewhere—biases the primary loadings and factor correlations to compensate.
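
The correspondence between ψ² and the prior's 95% interval is simply ±z·√ψ² with z ≈ 1.96, which makes the "well-matched prior" idea easy to check numerically:

```python
from statistics import NormalDist

def prior_half_width(psi_sq, level=0.95):
    """Half-width of the central `level` credible interval
    of a N(0, psi_sq) prior."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for level = 0.95
    return z * psi_sq ** 0.5

# Half-widths for the prior variances discussed in this article:
for psi_sq in (0.001, 0.005, 0.03, 0.05):
    print(f"psi^2 = {psi_sq}: 95% interval = +/- {prior_half_width(psi_sq):.2f}")
```

Running this reproduces the interval widths quoted below (±0.14 for ψ² = .005, ±0.34 for .03, ±0.44 for .05).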

Liang’s simulation design

Liang (2020) crossed seven shrinkage prior variances (ψ² ∈ {.001, .005, .01, .02, .03, .05, .08}) with two population cross-loading magnitudes (.1 and .3), seven sample sizes (N = 50, 100, 200, 400, 600, 800, 1,000), and a sparse three-factor structure with a small number of nontrivial cross-loadings. The dependent measures were model fit, recovery of the population structure, true positive rate (correctly identifying nonzero cross-loadings), false positive rate (incorrectly flagging zero cross-loadings as nonzero), and parameter estimation accuracy.
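
The fully crossed design can be enumerated directly. This sketch only builds the grid of cells; fitting the BSEM model in each cell is the expensive part and is not shown:

```python
from itertools import product

# The crossed simulation factors described above (Liang, 2020):
prior_vars = (0.001, 0.005, 0.01, 0.02, 0.03, 0.05, 0.08)
cross_loadings = (0.1, 0.3)
sample_sizes = (50, 100, 200, 400, 600, 800, 1000)

cells = list(product(prior_vars, cross_loadings, sample_sizes))
print(len(cells))  # 7 x 2 x 7 = 98 design cells
```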

The headline finding is intuitive once you see it: the optimal prior is the one whose 95% credible interval barely covers the population cross-loading values. For population cross-loadings of .1, this is roughly ψ² = .005 (95% CI ≈ ±0.14); for cross-loadings of .3, ψ² ≈ .03 to .05 (95% CI ≈ ±0.34 to ±0.44). Priors much tighter than this miss large cross-loadings entirely; priors much wider than this admit too many false positives. The empirically grounded recommendation for analysts is to match the prior credible interval to the smallest substantively meaningful cross-loading they would want to detect—not to default to whatever ψ² appears in textbook examples (often .001 or .01, both of which under-shrink for moderate cross-loadings).
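
The matching rule can be inverted: given the smallest substantively meaningful cross-loading, solve half-width = z·√ψ² for ψ². A short sketch of that calibration step:

```python
from statistics import NormalDist

def psi_sq_for_target(target_loading, level=0.95):
    """Prior variance whose central `level` credible interval just reaches
    +/- target_loading, inverting half_width = z * sqrt(psi_sq)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (target_loading / z) ** 2

print(round(psi_sq_for_target(0.14), 4))  # ~0.005, matching loadings near .1
print(round(psi_sq_for_target(0.34), 3))  # ~0.03, matching loadings near .3
```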

What goes wrong when the prior is misspecified

The most consequential misspecification is using BSEM-N with zero-mean priors when the true cross-loadings are large (≥.3). Because the shrinkage prior pulls cross-loadings toward zero, their unmodeled variance is absorbed into nearby parameters. Liang (2020) documents two systematic biases in this regime: primary loadings are inflated (because the model attributes the indicator’s true cross-loading variance to its primary factor) and factor correlations are inflated (because shared cross-loading variance across factors looks like factor overlap when it is misattributed). These biases are not subtle—they reach 0.05 to 0.10 in standardized metric for cross-loadings of .3 and shrinkage priors of .001—and they invalidate substantive interpretation of the structural part of the model.

The empirical example in Liang’s Study 2 used the Arkansas Rehabilitation Services Comprehensive Needs Assessment (N = 623, 15 items, three intended factors: readiness/placement, barriers to service, service efficacy). Across the seven prior variances, primary loading estimates differed by up to 0.15 standardized units, factor correlations ranged from near-zero to substantively meaningful, and the model fit indices (PPP-value, BRMSEA) shifted enough to change qualitative conclusions about model adequacy. Same data, same model class, different priors, different conclusions.

Position in the broader BSEM literature

Liang’s (2020) results sit between two related contributions. Muthén and Asparouhov (2012) proposed BSEM-N as a solution to the over-rejection problem in standard CFA: real-world data rarely satisfy the independent-clusters assumption, but freeing all cross-loadings leaves the model unidentified. The BSEM-N compromise of small-variance priors preserves identification while accommodating realistic cross-loading patterns. Their original paper used ψ² = .01 as a default, a choice that subsequent simulation work has shown to be appropriate only when population cross-loadings are small.

Van Erp, Mulder, and Oberski (2018) preceded Liang with a broader sensitivity analysis across multiple BSEM contexts. They showed that prior choice affects model-fit indices (in particular, posterior predictive p-values and BRMSEA) at least as strongly as it affects parameter estimates, and they argued for routine sensitivity analysis—running the same model across a grid of prior variances and reporting the range—rather than a single point estimate. Their recommendation has not been universally adopted, in part because the modal applied paper still reports BSEM results as if the prior were a fixed feature of the model rather than a tunable hyperparameter.

More recently, Edeh, Liang, and Cao (2025) extended the prior sensitivity question to model size, finding that fit-index sensitivity to prior variance grows with the number of indicators per factor and with the number of factors. The practical consequence is that BSEM-N analyses of large measurement instruments (40+ items) are more vulnerable to prior misspecification than analyses of small instruments, and benefit more from sensitivity reporting.

Beyond normal shrinkage: alternative prior families

Liang (2020) restricted attention to BSEM-N—small-variance normal priors—because that is the formulation introduced by Muthén and Asparouhov (2012) and the one most commonly implemented in applied software (Mplus, blavaan). The broader Bayesian variable-selection literature offers two alternatives whose properties are relevant when the BSEM-N tradeoff is unsatisfactory.

Spike-and-slab priors place a discrete mixture over each cross-loading: with probability π, the cross-loading is exactly zero (the spike); with probability 1-π, it is drawn from a diffuse distribution (the slab). This formulation explicitly models the binary “is it zero?” question that BSEM-N converts into a continuous shrinkage. Spike-and-slab is theoretically attractive for sparse loading structures because the posterior probability of nonzero loading is directly interpretable, but it is computationally demanding and identifiability of π is fragile in small samples.
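
Drawing from a spike-and-slab prior makes the discrete mixture explicit. In this sketch, the mixing weight π = 0.8 and slab standard deviation 0.3 are illustrative values, not ones tied to any particular study:

```python
import random

def spike_and_slab_draw(pi=0.8, slab_sd=0.3, rng=random):
    """One prior draw of a cross-loading under a spike-and-slab prior:
    exactly zero with probability pi, otherwise Normal(0, slab_sd)."""
    if rng.random() < pi:
        return 0.0                  # the spike: an exact zero
    return rng.gauss(0.0, slab_sd)  # the slab: a diffuse alternative

random.seed(1)
draws = [spike_and_slab_draw() for _ in range(10_000)]
prop_zero = sum(d == 0.0 for d in draws) / len(draws)
print(f"proportion exactly zero: {prop_zero:.2f}")  # close to pi = 0.8
```

Note the contrast with BSEM-N: a nontrivial fraction of the prior mass sits at exactly zero, so the posterior inclusion probability of each cross-loading is directly interpretable.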

Horseshoe and regularized horseshoe priors place a continuous prior with a sharp peak at zero and heavy tails. Cross-loadings near zero are pulled hard toward zero (mimicking BSEM-N with small ψ²); cross-loadings that are clearly large are barely shrunk (avoiding the BSEM-N bias against substantial cross-loadings). The horseshoe family adapts its shrinkage strength to the magnitude of each individual cross-loading rather than imposing a uniform shrinkage across all of them. Van Erp, Oberski, and Mulder’s broader work on shrinkage priors for Bayesian penalized regression suggests these formulations dominate normal shrinkage when cross-loading magnitudes are heterogeneous—exactly the realistic case Liang’s results identify as problematic for BSEM-N.
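
The adaptive behavior comes from a local scale with a heavy-tailed prior. This sketch draws from a plain (unregularized) horseshoe, with an illustrative global scale τ = 0.05:

```python
import math
import random

def horseshoe_draw(tau=0.05, rng=random):
    """One prior draw under the horseshoe: the local scale lambda is
    half-Cauchy(0, 1), and the loading is Normal(0, tau * lambda).
    Most lambda draws are small (hard shrinkage toward zero); the heavy
    Cauchy tail occasionally leaves a loading nearly unshrunk."""
    lam = abs(math.tan(math.pi * (rng.random() - 0.5)))  # half-Cauchy via inverse CDF
    return rng.gauss(0.0, tau * lam)

random.seed(7)
draws = [horseshoe_draw() for _ in range(10_000)]
near_zero = sum(abs(d) < 0.01 for d in draws) / len(draws)
far_out = sum(abs(d) > 0.3 for d in draws) / len(draws)
print(f"|draw| < 0.01: {near_zero:.2f}, |draw| > 0.3: {far_out:.2f}")
```

A single normal prior with comparable mass near zero would have essentially no mass beyond ±0.3; the horseshoe keeps both, which is the "shrink small, spare large" property described above.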

The applied literature has been slow to adopt spike-and-slab and horseshoe formulations, in part because mainstream SEM software does not implement them with the same convenience as BSEM-N, and in part because the additional flexibility comes at the cost of additional hyperparameter choices. For analysts working in Stan or general probabilistic-programming environments, however, these alternatives are now accessible and avoid the specific failure mode (large cross-loadings, zero-mean shrinkage) that Liang documents.

Practical recommendations from the 2020 study

Three concrete guidelines emerge from Liang (2020):

  • Calibrate the prior credible interval to substantive cross-loading expectations. If theory or pilot data suggest cross-loadings could be as large as 0.3, use ψ² ≈ .03; if cross-loadings are expected to be small (.1 or below), ψ² ≈ .005 is appropriate. Defaulting to .01 without thought will be approximately right for one regime and approximately wrong for the other.
  • Run a sensitivity grid. Report the model across at least three to five prior variances spanning the plausible range. If primary loadings, factor correlations, and fit indices are stable across the grid, the inference is robust; if they shift substantially, that instability is itself a finding to report.
  • Avoid BSEM-N with zero-mean priors when cross-loadings are likely to be large. When measurement theory or empirical evidence suggests substantial nonzero cross-loadings, a model that estimates them freely (with weak rather than informative priors) is less biased than a shrinkage formulation that fights against the data.
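
The sensitivity-grid recommendation can be sketched as a loop over prior variances. Here a conjugate normal-normal toy stands in for the fitted cross-loading, with made-up raw estimate and standard error; a real grid would refit the model at each ψ² (e.g., in Mplus or blavaan) and also track factor correlations and fit indices:

```python
def toy_posterior_mean(raw, se, psi_sq):
    """Conjugate normal-normal toy stand-in for a fitted BSEM
    cross-loading (illustration only; a real sensitivity grid
    refits the full model at each prior variance)."""
    return raw * psi_sq / (psi_sq + se**2)

# The same 'data' (raw = 0.25, se = 0.06) under five prior variances:
raw, se = 0.25, 0.06
grid = (0.001, 0.005, 0.01, 0.03, 0.08)
estimates = {p: toy_posterior_mean(raw, se, p) for p in grid}
spread = max(estimates.values()) - min(estimates.values())
for p, est in estimates.items():
    print(f"psi^2 = {p:<5}: estimate = {est:.3f}")
print(f"range across grid: {spread:.3f}")  # a large range signals prior-driven inference
```

If the reported range is small, the inference is robust to the prior; if it is large, as in this toy case, that instability is itself the finding to report.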

Implications for measurement modeling practice

The deeper methodological point underlying Liang (2020) is that BSEM is not a tuning-free alternative to CFA—it is CFA with one additional decision shifted from a binary constraint (cross-loading is or is not exactly zero) to a continuous one (how strongly to shrink). The continuous version is more honest because measurement instruments rarely produce exactly zero cross-loadings, but it is also more demanding: the analyst now owes the reader an explicit justification for the shrinkage strength, with sensitivity analysis to demonstrate that the substantive conclusions are not artifacts of that choice.

The 2020 paper, together with van Erp et al. (2018) and the more recent Edeh et al. (2025) extension, builds the empirical foundation for that justification. The remaining gap—addressed only partially in current literature—is automated tools that select prior variances data-adaptively rather than requiring the analyst to specify them. Until such tools are mature, sensitivity analysis remains the only credible defense against prior-driven inference in BSEM-N.

Frequently asked questions

What is BSEM-N?

Bayesian structural equation modeling with small-variance normal priors (BSEM-N), introduced by Muthén and Asparouhov (2012), replaces the strict CFA constraint that cross-loadings are exactly zero with a soft prior centered at zero with a small variance. The data are then allowed to adjust cross-loadings toward whatever values they actually take, while the prior provides enough regularization to keep the model identified.

Why does the prior variance matter so much?

The prior variance ψ² controls how strongly cross-loadings are pulled toward zero. When ψ² is very small, the model is nearly identical to a strict CFA. When ψ² is large, it approaches exploratory factor analysis with rotational indeterminacy. The optimal middle depends on the population cross-loading magnitudes: a prior that is too tight misses real cross-loadings, and a prior that is too wide admits too many false positives.

What did Liang (2020) recommend for choosing the prior?

Match the prior 95% credible interval to the smallest substantively meaningful cross-loading you would want to detect. For population cross-loadings around 0.1, ψ² ≈ .005 is appropriate; for cross-loadings around 0.3, ψ² ≈ .03 to .05. Defaulting to the textbook ψ² = .01 without thought will be approximately right for small cross-loadings and substantially wrong when cross-loadings are large.

What goes wrong when the prior is misspecified?

When BSEM-N uses a small zero-mean prior but the true cross-loadings are large, the unmodeled cross-loading variance is absorbed into nearby parameters: primary loadings are inflated and factor correlations are inflated. In Liang’s simulations, these biases reach 0.05 to 0.10 in standardized metric and invalidate substantive interpretation of the structural part of the model.

Should I run a sensitivity analysis?

Yes. Van Erp, Mulder, and Oberski (2018) showed that prior choice affects model-fit indices at least as strongly as it affects parameter estimates, and Liang’s (2020) empirical example shows the same data can produce qualitatively different conclusions across different priors. Reporting the model across at least three to five prior variances spanning the plausible range is the credible defense against prior-driven inference.

Are there alternatives to BSEM-N?

Yes. Spike-and-slab priors place a discrete mixture over each cross-loading, modeling the binary “is it zero?” question directly. Horseshoe and regularized horseshoe priors adapt their shrinkage strength to the magnitude of each individual cross-loading—pulling near-zero loadings hard toward zero while barely shrinking clearly large ones. These formulations dominate normal shrinkage when cross-loading magnitudes are heterogeneous, but they require probabilistic-programming environments rather than mainstream SEM software.

References

  • Edeh, E., Liang, X., & Cao, C. (2025). Probing beyond: The impact of model size and prior informativeness on Bayesian SEM fit indices. Behavior Research Methods, 57(4). https://doi.org/10.3758/s13428-025-02609-2
  • Liang, X. (2020). Prior sensitivity in Bayesian structural equation modeling for sparse factor loading structures. Educational and Psychological Measurement, 80(6), 1025-1058. https://doi.org/10.1177/0013164420906449
  • Muthén, B., & Asparouhov, T. (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17(3), 313-335. https://doi.org/10.1037/a0026802
  • van Erp, S., Mulder, J., & Oberski, D. L. (2018). Prior sensitivity analysis in default Bayesian structural equation modeling. Psychological Methods, 23(2), 363-388. https://doi.org/10.1037/met0000162


📋 Cite This Article

Jouve, X. (2020, December 5). Bayesian SEM Prior Sensitivity. PsychoLogic. https://www.psychologic.online/bayesian-sem-prior-sensitivity/
