With a fixed effect model, the effect size is identical across groups and we assume that any effect size variation between groups is due to sampling error. If we are testing attention span across 3 daycare facilities in the area and are only interested in these particular 3 facilities, the effect of the categorical variable - facility - is fixed.
In random effects models, however, the effect size varies across groups. If we are conducting the same tests across 3 randomly chosen facilities out of a larger pool of facilities, then the categorical variable - facility - will be different every time (across a large number of samples).
In experimentation there are times when we have both fixed and random effects, hence the term “mixed” model. In this case we have a variable that contains all possible responses and another that only represents a sample of a larger pool of values.
This explanation is not meant to be exhaustive but rather summarize the basics. The scope of this manual is to show how to implement Bayesian methods in R.
Bayesian methods are an alternative approach to frequentist mixed effects methods. Bayesian methods are especially helpful with smaller sample sizes, but work for large samples as well. We will be using the brms package, which uses syntax similar to the lme4 package.
In Bayesian statistics the modeling approach originates from Bayes’ theorem, where we use information in the observed data to update knowledge about the model parameters. The background information is known as the prior distribution and is combined with the likelihood of the observed data to determine the posterior distribution.
\[P(\theta|y) \propto P(\theta)P(y|\theta)\]

where the posterior distribution is represented by \(P(\theta|y)\), the prior distribution by \(P(\theta)\), and the sampling distribution (the likelihood) by \(P(y|\theta)\). Given this information, we can use our data (through the likelihood) along with a prior to calculate the posterior.
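To make this concrete, here is a toy sketch (not part of the analysis below) that approximates a posterior on a grid for a normal mean with known standard deviation. All numbers here are made up for illustration.

# Toy grid approximation of Bayes' theorem for a normal mean (known sd = 1)
theta <- seq(-3, 3, length.out = 601)     # candidate parameter values
prior <- dnorm(theta, mean = 0, sd = 1)   # P(theta)
y     <- c(0.8, 1.1, 0.4)                 # made-up observed data
lik   <- sapply(theta, function(t) prod(dnorm(y, mean = t, sd = 1)))  # P(y | theta)
post  <- prior * lik                      # unnormalized posterior
post  <- post / sum(post)                 # normalize so it sums to 1
theta[which.max(post)]                    # posterior mode sits between prior mean and data mean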
We will be using a dataset that comes with the lme4 package called “sleepstudy”. This dataset comes from a study conducted by Belenky et al. (2003).
In this dataset, we have 18 subjects with varying degrees of sleep deprivation.
str(lme4::sleepstudy)
## 'data.frame': 180 obs. of 3 variables:
## $ Reaction: num 250 259 251 321 357 ...
## $ Days : num 0 1 2 3 4 5 6 7 8 9 ...
## $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
The response variable, `Reaction`, represents the average reaction time in milliseconds observed for subjects in a sleep deprivation study. `Days` represents the number of days the given `Subject` was deprived of sleep.
head(lme4::sleepstudy)
## Reaction Days Subject
## 1 249.5600 0 308
## 2 258.7047 1 308
## 3 250.8006 2 308
## 4 321.4398 3 308
## 5 356.8519 4 308
## 6 414.6901 5 308
Below we have a simple plot of the dataset.
library(dplyr)   # provides the %>% pipe
library(ggplot2)

lme4::sleepstudy %>%
  ggplot(aes(x = Days, y = Reaction)) +
  geom_point() +
  theme_classic()
Using the parameter `color = Subject` within the `aes()` call, we can color each point by Subject. For example, the top of the legend shows that Subject 309 is depicted by deep orange dots:
lme4::sleepstudy %>%
ggplot(aes(x = Days, y = Reaction, color = Subject)) + geom_point() + theme_classic()
Our first model assumes that the slope (reaction time over number of days) is the same for each subject. In this case, the goal is to estimate the intercept, which represents the baseline reaction time without sleep deprivation, for each subject.
\[y_{ij} = \beta_0 + \beta_{1}t_j + C_{i} + e_{ij}\] where \(y_{ij}\) is the reaction time of the \(i^{th}\) subject at j days, \(\beta_0\) is the population intercept, \(\beta_1\) is the population slope, \(t_j\) is the number of days at time \(t\), \(C_i\) is the random intercept for subject \(i\), and \(e_{ij}\) is the error term. Since the time markers are in days that are equally spread apart, \({t_j}\) is simply \({j}\), number of days.
Assumptions for this model:

1. \(e_{ij}\)’s are independent with mean equal to 0 and finite variance, following \(Normal(0,\sigma^2_e)\)
2. \(C_1, C_2, ..., C_n\) are all independent and follow \(Normal(0,\sigma^2_c)\)
3. \(e_{ij}\) and \(C_i\) are independent
4. \(e_{ij}\), \(C_i\), and \(t_j\) are all independent

where \(i \in \{1,...,18\}\) and \(j \in \{0,...,9\}\).
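To make the model concrete, here is a small simulation sketch that generates data according to the equation above. The parameter values are illustrative, not estimates.

# Simulate data from the random-intercept model: 18 subjects, days 0-9
set.seed(1)
b0 <- 250; b1 <- 10            # population intercept and slope (illustrative)
sigma_c <- 35; sigma_e <- 30   # sd of random intercepts and of residuals (illustrative)
n_subj <- 18; days <- 0:9
C <- rnorm(n_subj, 0, sigma_c)               # random intercept C_i, one per subject
sim <- expand.grid(Subject = 1:n_subj, Days = days)
sim$Reaction <- b0 + b1 * sim$Days + C[sim$Subject] + rnorm(nrow(sim), 0, sigma_e)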
We need to choose prior distributions for parameters that might be helpful in forming a posterior distribution. I will use fitted parameters from a mixed linear model on the same data built using the `lme4` package. See the `lme4` section of this manual for more info.
First, load the lme4, rstan, and brms packages, then build our base model.

require(lme4)
require(rstan)
require(brms)

# Fit a base model using lmer
lmm <- lmer(Reaction ~ Days + (1|Subject), sleepstudy)

# Pull base model estimates from the lmer model
b0 <- summary(lmm)$coefficients[1, 1]              # intercept estimate
b1 <- summary(lmm)$coefficients[2, 1]              # slope estimate
se <- summary(lmm)$sigma                           # residual standard deviation
te <- attr(summary(lmm)$varcor$Subject, "stddev")  # random-intercept standard deviation
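Printing the extracted values shows roughly where the prior means and scales below come from (exact numbers may vary slightly by `lme4` version):

# Inspect the extracted values; they motivate the prior choices below
c(b0 = b0, b1 = b1, sigma_e = se, tau_0 = unname(te))
## roughly: b0 ~ 251, b1 ~ 10.5, sigma_e ~ 31, tau_0 ~ 37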
We use the base model slope and intercept estimates as prior means for \(\beta_1\) and \(\beta_0\), the residual standard deviation for \(\sigma_e\), and the standard deviation of the subject-specific random intercept for \(\tau_0\).
The priors we are using: \[\beta_0 \sim N(251, 10)\] \[\beta_1 \sim N(10, 10)\] \[\sigma_e \sim N(0, 31)\] \[\tau_0 \sim N(0, 37)\]

***
As mentioned before, this package does not fit models itself but uses Stan on the back end. Stan is a probabilistic programming language with a C++ backend. By default, this backend uses the Hamiltonian Monte Carlo (HMC) sampler, specifically its extension the No-U-Turn Sampler (NUTS) by Hoffman and Gelman (2014). Alternatives to these algorithms, not available in the `brms` package, include random-walk Metropolis and its variations, Gibbs sampling, and slice sampling. The documentation for this package notes that its choice of samplers converges much more quickly, especially for high-dimensional models, regardless of whether the priors are conjugate or not. Read the `brms` documentation overview for more information.
The `brm` function in the `brms` package fits generalized linear/non-linear multivariate multilevel models. To set priors, we use the `prior()` function, which accepts a distribution with parameters as well as a `class` argument designating which model parameter the prior is associated with. The `brm` function does not handle parameters set as variables inside of `prior()` well, so you will need to manually enter the prior values into the `prior()` function, as in the sketch below.
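One workaround (a sketch) is to paste the numeric values into a string and pass it to `set_prior()`, which accepts the distribution as text:

# Build the prior string from a variable, since prior() will not
# evaluate variables like b0 directly
int_prior <- set_prior(paste0("normal(", round(b0), ", 10)"), class = "Intercept")
int_prior  # prints the resulting prior: normal(251, 10) on the intercept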
Chains: The number of Markov chains to use. The advantage of using more chains is that we can be more confident of our understanding of the posterior distribution as each chain has a different starting point. If running in parallel using the “cluster” option, this package limits the number of chains to the number of cores the system has. Defaults to 4.
Iter: The total number of iterations per chain. Some datasets may require more iterations to converge, i.e., to reach a point where further iterations no longer meaningfully change the estimates. A higher number will make the model take longer to fit; see the comparison below. Defaults to 2,000.
Warmup: Also known as burn-in. This refers to the number of iterations at the beginning of an MCMC run to be thrown away. The warm-up period is a hyper-parameter meant to give the Markov chain time to reach its equilibrium distribution; the idea is that a “bad” randomized starting point may over-sample regions that aren’t representative of the equilibrium distribution. Defaults to `floor(iter/2)`.
Thin: Thinning consists of keeping only every k-th value of the Markov chain and discarding the rest. Defaults to 1, meaning all values are kept. Choose a value > 1 to save memory and post-processing time if `iter` is large.
Cores: Number of CPU cores to utilize for building chains. As chains are created independently of one another, parallelization can be used to speed up the process. You can see how many cores are available on your machine by using the `detectCores()` function from the `parallel` package, as shown after this list.
Family: Defaults to Gaussian. Since we are building a linear mixed model, we will use the default. More info on specifying the distribution family can be found in the generalized mixed model section of this manual.
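Before the `brm` call, it can be helpful to check the core count and the resulting number of draws; a quick sketch:

# How many cores are available for parallel chains?
parallel::detectCores()

# Post-warmup draws kept = chains * (iter - warmup) / thin;
# with the settings below: 4 * (10000 - 5000) / 1 = 20000 draws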
priors <- c(prior(normal(251, 10), class = Intercept), # intercept prior
prior(normal(10, 10), class = b), # slope prior
prior(normal(0, 31), class = sigma), # population sd
prior(normal(0, 37), class = sd) # tau0, group sd
)
model1 <- brm(Reaction ~ Days + (1 | Subject),
data = sleepstudy,
prior = priors,
chains = 4,
iter = 10000,
# warmup = ,
thin = 1,
cores = 4,
family = gaussian()
)
Notes:

* Within `prior()` calls, `sigma` refers to the population-level standard deviation and `sd` refers to the group-level standard deviation.
* The `brms` package adjusts prior distributions to be greater than 0 for `sd` and `sigma` priors.
* To compare these results with the output from a model built with the `lme4` package, please see the frequentist section of this manual above.
* More iterations take longer to fit. If a model fails to converge, you will see a warning and may need to adjust iterations or other fitting options in order to be successful. Experiment with iterations until you reach an acceptable output.
* If you want to compare fitting times for different iterations, refer to the “iteration_comparison” section of the code accompanying this markdown file.
With our model built, we can now get estimates of the parameters. R’s built-in `summary()` function lays the output out well; below we walk through it one item at a time.
summary(model1)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Reaction ~ Days + (1 | Subject)
## Data: sleepstudy (Number of observations: 180)
## Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
## total post-warmup draws = 20000
##
## Group-Level Effects:
## ~Subject (Number of levels: 18)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 45.58 8.94 31.17 66.13 1.00 3473 6345
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 225.75 9.30 206.90 243.33 1.00 3895 7696
## Days 10.46 0.80 8.90 12.02 1.00 19151 15566
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 31.15 1.75 27.95 34.79 1.00 17794 14570
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The first 5 lines of the summary output describe our model and the model inputs defined in the `brm` call.
If you want to access a specific estimate from the model object, it can be accessed by using the `$` operator on the summary object of the model. For example, `summary(model1)$random$Subject` returns the Subject-specific (or group-level) random effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 45.58409 8.944504 31.16719 66.12533 1.000897 3472.927 6345.14
If you want to access individual parts of this output, the `$` notation can be appended to the previous command to access each element.
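For instance, the following sketch (based on the structure of the brms summary object) pulls out the main blocks:

# Store the summary once, then index into its components
s <- summary(model1)
s$fixed               # population-level effects table
s$random$Subject      # group-level (Subject) effects table
s$spec_pars           # family-specific parameters such as sigma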
Our estimate for the subject-specific random effect is 45.58, which differs from the base model estimate \(\tau_0\) = 37, although the base model estimate is still within the Bayesian credible interval (31.17, 66.13).
Rhat (\(\hat R\)) for a given parameter estimate is a measure of whether the samples obtained are likely representative of the true distribution. Put simply, it is the square root of the estimated total variance over the within-chain variance. Its formula:

\[\hat R=\sqrt{\frac{\widehat{var}^+ (\theta|y)} {W}}\]

where \(W\) is the mean within-chain variance and \(\widehat{var}^+(\theta|y)\) is the estimated total variance, a weighted average of \(W\) and the between-chain variance \(B\).

The Stan backend code uses a split \(\hat R\) that takes \(M\) chains, splits each in half, then performs the \(\hat R\) calculation on the resulting \(2M\) chains. So even if there is only one chain, \(\hat R\) can still be calculated, since splitting yields two half-chains to compare.
If this value is larger than 1.1, it is an indication that the model has not converged adequately. It can be accessed for each parameter estimate within the model summary. For example, \(\hat R\) for the subject-specific random intercept is accessed with `summary(model1)$random$Subject$Rhat` = 1.0008974.
Bulk_ESS, or bulk effective sample size, is a diagnostic for the sampling efficiency in the bulk of the posterior distribution. It measures the information content of a sample chain, and can be thought of as the size of an independent sample that would estimate the posterior with the same efficiency as the correlated MCMC sample. A higher number here is better; ESS > 400 is recommended (Vehtari et al., 2021).

\[ESS = \frac{MN}{\hat\tau}\]

where \(N\) is the length of each chain, \(M\) is the number of chains, and \(\hat\tau = 1 + 2\sum_{t}\hat\rho_t\) is the estimated autocorrelation time, built from the estimated correlations \(\hat\rho_t\) between samples \(t\) lags apart in the chain.

Tail_ESS is the minimum of the effective sample sizes computed for the 5% and 95% quantiles of the posterior, a diagnostic for sampling efficiency in the tails of the distribution.

\(\hat R\) close to 1 and \(ESS\) greater than 400 are needed to ensure an MCMC sample is useful in practice. More information on \(\hat R\), bulk ESS, and tail ESS calculations can be found in Vehtari et al. (2021).
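These diagnostics can also be pulled for all parameters at once; a quick sketch using helpers exported by `brms`:

# Convergence/efficiency diagnostics across all parameters
rhat(model1)        # values should all be close to 1
neff_ratio(model1)  # effective sample size / total draws; low ratios flag problems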
This section shows our intercept estimate, \(\hat\beta_0\) = 225.75, and our slope estimate, \(\hat\beta_1\) = 10.46, along with credible intervals, \(\hat R\), and bulk/tail ESS for each.
This shows our estimate for \(\hat\sigma_e\) = 31.15, compared to our estimate from the `lme4` model, 30.99.
While the `summary()` function has a tidy appearance, the `posterior_summary()` function produces much more information about the estimates.
posterior_summary(model1)
## Estimate Est.Error Q2.5 Q97.5
## b_Intercept 225.750435 9.3027460 206.895825 243.32649
## b_Days 10.460900 0.8023936 8.895002 12.02149
## sd_Subject__Intercept 45.584095 8.9445037 31.167189 66.12533
## sigma 31.151958 1.7502682 27.954902 34.78601
## r_Subject[308,Intercept] 65.997152 13.0123027 40.806541 91.63120
## r_Subject[309,Intercept] -54.722034 12.3077081 -78.446039 -30.02530
## r_Subject[310,Intercept] -39.663749 12.3483840 -63.341846 -15.02465
## r_Subject[330,Intercept] 29.005029 12.8804220 4.196334 54.85045
## r_Subject[331,Intercept] 34.895612 12.7218829 10.439384 60.28100
## r_Subject[332,Intercept] 32.790871 12.7337185 7.934024 58.43186
## r_Subject[333,Intercept] 41.343382 12.8779965 16.657921 66.90051
## r_Subject[334,Intercept] 21.421917 12.6389381 -2.718623 46.62350
## r_Subject[335,Intercept] -21.454776 12.5544246 -45.750937 3.38582
## r_Subject[337,Intercept] 97.942819 13.2118430 72.473681 124.29456
## r_Subject[349,Intercept] 2.987682 12.6953216 -21.315485 28.43434
## r_Subject[350,Intercept] 38.854315 12.8335160 14.656843 64.85983
## r_Subject[351,Intercept] 16.550803 12.8477957 -8.065190 42.07075
## r_Subject[352,Intercept] 61.601552 13.0832915 36.354825 87.37896
## r_Subject[369,Intercept] 31.657925 12.8573637 7.300884 57.55121
## r_Subject[370,Intercept] 17.978869 12.6684198 -6.403454 43.27305
## r_Subject[371,Intercept] 21.093182 12.8087785 -3.666542 46.83127
## r_Subject[372,Intercept] 42.963274 12.8757294 18.264684 68.93640
## lprior -17.988315 1.7642717 -22.236738 -15.56261
## lp__ -910.611060 4.6416543 -920.700731 -902.61389
The first 4 lines are repeated from the summary, along with estimate quantiles. The next section of the output displays \(C_{i}\), our intercept estimates for each subject. The means are interpreted as where each subject’s intercept sits relative to the population intercept. For example, subject 308 has a random intercept estimate of 66.00, so this could be interpreted as the subject having a baseline reaction time of (66.00 + 225.75) = 291.75 ms.
`lprior` and `lp__` are used for automatic prior/likelihood sensitivity analysis via an additional package, `priorsense`, and are not covered in the `brms` documentation.
If you want to show trace and density plots for all model parameters, use `plot(model)`:

plot(model1)

The left side of the plot shows the overall density of our parameter estimates. The right side shows a trace plot with parameter estimates across all post-warmup iterations. You can also plot individual parameters by adding the `pars` argument to the `plot()` call; `plot(model1, pars = "b_Intercept")`, for example, outputs only the top two plots from the output above.
As an alternative to using the `summary(model)$` notation, it’s also possible to pull model coefficients and error estimates using the `coef()` function. We can access intercept estimates for each Subject, \(\hat C_i\), to plot them along with the dataset. If you would like to see example code for how to do this, view the hidden code in this section (one possible version is sketched below). For more model plotting options in the `brms` package, see the `mcmc_plot.brmsfit` section of the package documentation.
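Since that code is hidden in the rendered document, here is one possible version (a sketch; the actual hidden code may differ):

# Overlay each subject's fitted line (subject intercept + common slope)
ce <- coef(model1)$Subject             # array: subjects x statistics x parameters
line_df <- data.frame(
  Subject   = factor(rownames(ce)),
  intercept = ce[, "Estimate", "Intercept"],     # subject-specific intercepts
  slope     = fixef(model1)["Days", "Estimate"]  # common slope in model 1
)
lme4::sleepstudy %>%
  ggplot(aes(x = Days, y = Reaction, color = Subject)) +
  geom_point() +
  geom_abline(data = line_df,
              aes(intercept = intercept, slope = slope, color = Subject)) +
  theme_classic()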
You can also access the individual estimates for each iteration/chain by using the `as_draws()` function on the brms model object. By default this function returns all variables from all chains, excluding estimates from warmup/burn-in draws. You can pull a single variable or a list of variables with the `variable` argument, select a specific chain with `[[`, and add `inc_warmup = TRUE` to the function call to include warmup draws.

Ex:

as_draws(model1, variable = c('b_Intercept', 'sigma'), inc_warmup = TRUE)[[3]]

will return `b_Intercept` and `sigma` estimates for all 10,000 draws of chain 3.
Our second model has a random intercept term and a random slope, with correlated random effects. \[y_{ij} = \beta_0 + \beta_1 t_j + C_{i,0} + C_{i,1}t_j + e_{ij}\] Reaction time \(y_{ij}\) is equal to the population intercept \(\beta_0\), plus the population slope \(\beta_1\) times the number of days \(t_j\), plus the subject-specific random intercept \(C_{i,0}\), plus the subject-specific random slope \(C_{i,1}\) times the number of days \(t_j\), plus an error term \(e_{ij}\). \(C_{i,0}\) and \(C_{i,1}\) are both independent of \(e_{ij}\).
Additional Assumptions:
\[e_{ij}\sim N(0,\sigma^2_e)\]

\[\left(\begin{array}{c} C_{i,0}\\ C_{i,1} \end{array}\right) \sim N_2(0,\Sigma)\]

where \[\Sigma=\left[\begin{array}{cc} \sigma^2_{C_{i,0}} & \sigma_{(C_{i,0},C_{i,1})}\\ \sigma_{(C_{i,0},C_{i,1})} & \sigma^2_{C_{i,1}} \end{array}\right], \qquad \sigma_{(C_{i,0},C_{i,1})}=Cov(C_{i,0}, C_{i,1})\]
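The fitting code for the two versions of this model compared below is analogous to model1; a minimal sketch (priors omitted for brevity, so brms defaults would apply unless you supply your own):

# model2c: random intercept and slope allowed to correlate -> (Days | Subject)
model2c <- brm(Reaction ~ Days + (Days | Subject),
               data = sleepstudy, chains = 4, iter = 10000,
               cores = 4, family = gaussian())

# model2u: same terms with the correlation fixed to zero -> (Days || Subject)
model2u <- brm(Reaction ~ Days + (Days || Subject),
               data = sleepstudy, chains = 4, iter = 10000,
               cores = 4, family = gaussian())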
There is an additional function, `add_criterion()`, that allows model fit criteria to be added to a model object. No criteria are added by the model-building function by default; instead, a separate call is needed after the model is built.
Criteria choices include:
loo: Leave one out cross-validation splits the data into training and testing sets, with the testing set consisting of only one observation. During each iteration, the model is trained and a prediction is made on the left out observation. (Bürkner, 2017)
kfold: K-fold cross validation refits the model K times, leaving one/Kth of the original data out. If K is set to the total number of observations in the data, then K-fold is identical to loo. (Bürkner, 2017)
waic: Widely Applicable Information Criterion is a generalization of the Akaike Information Criterion (AIC). It differs in that AIC is calculated using a point estimate, usually the maximum likelihood estimate, whereas WAIC is averaged over the posterior distribution. (Vehtari et al, 2013)
loo_subsample: an approximation of `loo` using subsampling for efficiency, which is especially useful for large datasets. (Magnusson et al., 2019)
bayes_r2: Bayesian \(R^2\) solves the problem of the classical \(R^2\) potentially having a numerator larger than the denominator. Classical \(R^2 = \frac{V_{n=1}^N \hat y_n } {V_{n=1}^N y_n}\), or fitted variance over total variance, where \(V\) denotes the sample variance over \(n = 1,...,N\). With strong prior information and weak observed data it is possible for the fitted variance to be larger than the total variance. Bayesian \(R_s^2=\frac {V_{n=1}^N y_n^{pred,s}}{V_{n=1}^N y_n^{pred,s} + var_{res}^s}\), or explained variance over explained variance plus residual variance, computed for each posterior draw \(s\). For a linear model, \(y_n^{pred,s}\) is simply the linear predictor (\(X_n\beta\)). For a generalized linear model, it is the transformed linear predictor, with the transformation dependent on which family is used. (Vehtari et al., 2018)
marglik: this criterion uses the log marginal likelihood for model comparison. The marginal likelihood is also referred to as the model evidence, and models are compared using the Bayes factor. This is computationally expensive due to the additional posterior draws needed to compute the likelihood. To use this criterion, you must set `save_pars = save_pars(all = TRUE)` in the original `brm` function call to save all parameters. The brms package implements bridge sampling to calculate the marginal likelihood. (Bürkner, 2017)
Here, we will create a function that accepts a model object as an argument and will return the model with several criteria added.
# Function to add comparison criteria to a model
update_model <- function(model){
model_crit <- add_criterion(model, c("waic", "loo", "bayes_R2"))
model_crit$comparison <- data.frame(
model_name = deparse(substitute(model)),
waic = model_crit$criteria$waic$waic,
loo = model_crit$criteria$loo$loo,
bayes_R2 = mean(model_crit$criteria$bayes_R2)
)
return(model_crit)
}
# Add criteria to each model
model1 <- update_model(model1)
model2c <- update_model(model2c)
model2u <- update_model(model2u)
# Build data frame with model criteria
df <- rbind(model1$comparison,
model2c$comparison,
model2u$comparison)
library(knitr)
knitr::kable(df)
| model_name | waic | loo | bayes_R2 |
|---|---|---|---|
| model1 | 1769.434 | 1770.096 | 0.7010747 |
| model2c | 1720.075 | 1723.630 | 0.7924409 |
| model2u | 1719.045 | 1722.365 | 0.7947852 |
For both waic and loo we are looking for the model with the lowest number. In each case waic is slightly lower than loo, and model2u (the uncorrelated model 2) performs best. For Bayes \(R^2\) we are looking for the highest number, which represents the most variance explained by the model. This criterion also chooses model2u as the best performing model. Ultimately, you should check and compare several criteria to make an informed decision on which model to proceed with in any given problem.
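brms also provides `loo_compare()` to rank models on a criterion that has already been added; a quick sketch:

# Rank models by LOO using the criteria added above; the best model is
# listed first with elpd_diff = 0
loo_compare(model1, model2c, model2u, criterion = "loo")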
We will utilize the `brms` package to approach a nonlinear problem with a Bayesian mixed model.

The `lme4` package offers another dataset, “Contagious Bovine Pleuropneumonia” (CBPP), accessible using `data(cbpp)`. This dataset describes the serological incidence of CBPP in zebu cattle during a survey implemented in 15 commercial herds in Ethiopia. The data contains 4 columns:

1. herd - a factor identifying the herd
2. incidence - the number of new serological cases for a given herd and time period
3. size - the size of the herd at the beginning of a given time period
4. period - a factor with levels 1 to 4
head(cbpp)
## herd incidence size period
## 1 1 2 14 1
## 2 1 3 12 2
## 3 1 4 9 3
## 4 1 0 5 4
## 5 2 3 22 1
## 6 2 1 18 2
First we’ll plot the data points, colored by herd:
It is a little difficult to see which points belong to which herd. We can add a line to make it a little more clear:
cbpp %>%
ggplot(aes(x = period, y = (size-incidence)/size, group=herd)) +
geom_point(aes(col=herd)) +
geom_line(aes(col=herd)) +
ylab("Cattle Without Disease (%)") +
xlab("Period")
We will fit a generalized linear mixed model with family “binomial”. This model will have a random herd-specific intercept and a random period-specific intercept. As before, we will first build a model using the `lme4` package (this time with `glmer`) to obtain priors.
\[ln\left(\frac{Pr(healthy_i=1)}{Pr(healthy_i=0)}\right)=\beta_0 + \beta_h + \beta_p\]

where

* \(\beta_0\) is the population intercept
* \(\beta_h\) is the random herd-level intercept for herd \(h\)
* \(\beta_p\) is the random period-level intercept for period \(p\)
The herd level random effects are assumed to come from an iid normal distribution with mean 0 and some shared, herd-level variance, \(\beta_h \sim N(0,\sigma^2_h)\) , \(h \in {1,...,15}\). Period-level random intercepts are assumed iid normal, mean 0 and shared period-level variance, \(\beta_p \sim N(0,\sigma^2_p)\), \(p\in {1,2,3,4}\).
There is a useful function in the `brms` package that gives information on all parameters that may have priors assigned, including the default priors that the package will use if unspecified. Calling the `get_prior()` function with the formula, family, and data specified shows this information:
# Get priors for our model
get_prior(cbind(incidence, size-incidence) ~ (1 | period) + (1 | herd),
family = binomial(link="logit"),
data = cbpp)
##                  prior     class      coef  group resp dpar nlpar lb ub       source
##   student_t(3, 0, 2.5) Intercept                                              default
##   student_t(3, 0, 2.5)        sd                                    0         default
##   student_t(3, 0, 2.5)        sd             herd                   0    (vectorized)
##   student_t(3, 0, 2.5)        sd Intercept   herd                   0    (vectorized)
##   student_t(3, 0, 2.5)        sd           period                   0    (vectorized)
##   student_t(3, 0, 2.5)        sd Intercept period                   0    (vectorized)
As before, we build the model first using the `glmer` function from the `lme4` package to obtain priors for the `brm` model.
nlin_base_model <- glmer(cbind(incidence, size-incidence) ~ (1 | period) + (1 | herd),
family = binomial(link="logit"),
data = cbpp)
s <- summary(nlin_base_model)
coef(nlin_base_model)
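A sketch of how the prior values used below can be read off the glmer fit, using lme4’s `fixef()` and `VarCorr()` accessors:

# Fixed intercept (logit scale) motivates the intercept prior below
lme4::fixef(nlin_base_model)
# Group-level standard deviations motivate the herd- and period-level sd priors
as.data.frame(lme4::VarCorr(nlin_base_model))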
There is a slight change in notation in the `brm` call here since we are fitting a binomial model. The left side of the formula reads `size - incidence | trials(size)`, which puts the number of successes (herd size minus incidence) on the left and the number of trials for each herd inside `trials()` on the right.
priors <- c(prior(normal(-2.3, 10), class = Intercept), # population intercept
            prior(normal(0, 1), class = sd), # default group-level sd
            prior(normal(0, 0.71), class = sd, group = herd, coef = Intercept), # herd-level intercept sd
            prior(normal(0, 0.54), class = sd, group = period, coef = Intercept)) # period-level intercept sd
nonlin_bayes <- brm(size-incidence | trials(size) ~ (1 | period) + (1 | herd),
family = binomial(link="logit"),
data = cbpp,
prior = priors,
chains = 4,
iter = 10000,
thin = 1,
cores = 4
)
## Compiling Stan program...
## Start sampling
Model summary:
## Family: binomial
## Links: mu = logit
## Formula: size - incidence | trials(size) ~ (1 | period) + (1 | herd)
## Data: cbpp (Number of observations: 56)
## Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
## total post-warmup draws = 20000
##
## Group-Level Effects:
## ~herd (Number of levels: 15)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.71 0.19 0.39 1.14 1.00 7697 11858
##
## ~period (Number of levels: 4)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.64 0.23 0.28 1.18 1.00 10440 12095
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 2.27 0.41 1.49 3.12 1.00 6427 8390
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior summary:
## Estimate Est.Error Q2.5 Q97.5
## b_Intercept 2.274503e+00 0.4143059 1.48759506 3.11625704
## sd_herd__Intercept 7.115428e-01 0.1928406 0.38870621 1.14432076
## sd_period__Intercept 6.350311e-01 0.2330193 0.28251741 1.18012151
## r_herd[1,Intercept] -5.812935e-01 0.4070634 -1.40153974 0.20197158
## r_herd[2,Intercept] 3.256277e-01 0.4111733 -0.44852618 1.17891784
## r_herd[3,Intercept] -3.785111e-01 0.3630653 -1.09225452 0.33506609
## r_herd[4,Intercept] 1.504352e-03 0.4549975 -0.86122768 0.92925592
## r_herd[5,Intercept] 2.251802e-01 0.3987275 -0.53222248 1.03662025
## r_herd[6,Intercept] 4.671788e-01 0.4316371 -0.32882567 1.38321888
## r_herd[7,Intercept] -8.918146e-01 0.4024964 -1.70155695 -0.12710418
## r_herd[8,Intercept] -6.555817e-01 0.3971951 -1.46153787 0.09210059
## r_herd[9,Intercept] 3.049577e-01 0.5062790 -0.62123157 1.36411760
## r_herd[10,Intercept] 6.055707e-01 0.4254937 -0.16062991 1.49808996
## r_herd[11,Intercept] 1.209941e-01 0.3614664 -0.57863749 0.85064713
## r_herd[12,Intercept] 1.018909e-01 0.4816100 -0.81083066 1.09201060
## r_herd[13,Intercept] 7.746042e-01 0.4544491 -0.03209427 1.73143867
## r_herd[14,Intercept] -1.016214e+00 0.4391320 -1.89588075 -0.17576095
## r_herd[15,Intercept] 6.018014e-01 0.4543928 -0.21424951 1.56930878
## r_period[1,Intercept] -7.926089e-01 0.3899514 -1.61442195 -0.06409133
## r_period[2,Intercept] 1.086725e-01 0.3926570 -0.67920041 0.90561120
## r_period[3,Intercept] 2.173730e-01 0.4014973 -0.57152776 1.03369687
## r_period[4,Intercept] 5.145987e-01 0.4287157 -0.26529105 1.42327462
## lprior -4.143545e+00 0.6685041 -5.83443897 -3.30195107
## lp__ -1.175746e+02 4.2232312 -126.71224081 -110.39124690
Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Vehtari A., Gelman A., Goodrich B., Gabry J. (2019) Bayesian R2 and LOO-R2 (https://avehtari.github.io/bayes_R2/bayes_R2.html)
Vehtari A., Gelman A., Goodrich B., Gabry J. (2018) R-squared for Bayesian regression models (http://www.stat.columbia.edu/~gelman/research/unpublished/bayes_R2.pdf)
Vehtari et al (2013). Understanding predictive information criteria for Bayesian models. http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf
Vehtari A., Gelman A., Simpson D., Carpenter B., and Bürkner P. (2021). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC (with discussion). Bayesian Data Analysis. (https://projecteuclid.org/journals/bayesian-analysis/advance-publication/Rank-Normalization-Folding-and-Localization--An-Improved-R%CB%86-for/10.1214/20-BA1221.full)
Magnusson M., Vehtari A., Jonasson J., Anderson M. - Bayesian leave-one-out cross-validation for large data (2019) (http://proceedings.mlr.press/v97/magnusson19a/magnusson19a.pdf)
Belenky, G., Wesensten, N. J., Thorne, D. R., Thomas, M. L., Sing, H. C., Redmond, D. P., Russo, M. B., & Balkin, T. J. (2003). Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: A sleep dose-response study. Journal of Sleep Research, 12, 1–12. (Source of the sleepstudy data.)