Fixed vs Random Effects

With a fixed effect model, the effect size is identical across groups and we assume that any effect size variation between groups is due to sampling error. If we are testing attention span across 3 daycare facilities in the area and are only interested in these particular 3 facilities, the effect of the categorical variable - facility - is fixed.

In random effect models however the effect size varies. If we are conducting the same tests across 3 randomly chosen facilities out of a larger pool of facilities, then the categorical variable - facility - will be different every time (across a large number of samples).

Mixed Models

In experimentation there are times when we have both fixed and random effects, hence the term “mixed” model. In this case we have a variable that contains all possible responses and another that only represents a sample of a larger pool of values.

This explanation is not meant to be exhaustive but rather summarize the basics. The scope of this manual is to show how to implement Bayesian methods in R.

Bayesian Methods

Bayesian methods are an alternative approach to frequentist mixed effects methods. Bayesian methods are especially helpful with smaller sample sizes, but work for large samples as well. We will be using the brms package, which uses syntax similar to the lme4 package.

Bayes Theorem

In Bayesian statistics the modeling approach originates from Bayes’ theorem, where we use information in the observed data to update knowledge about the model parameters. The background information is known as the prior distribution and combined with likelihood functions to determine the posterior distribution.

\[P(\theta, y)=P(\theta)P(y|\theta)\] Where the posterior distribution is represented by: \[P(\theta, y)\]

The prior distribution: \[P(\theta)\] and the sampling distribution represented by: \[P(y|\theta)\] Given this information, we can use our data (sampling distribution) along with a prior to calculate the posterior.

Linear Mixed Models

Introduction to the dataset

We will be using a dataset that comes with the lme4 pacakge called “sleepstudy”. This dataset comes from a study conducted by Belenkyn et al (2003).

In this dataset, we have 18 subjects with varying degrees of sleep deprivation.

str(lme4::sleepstudy)

## 'data.frame':    180 obs. of  3 variables:
##  $ Reaction: num  250 259 251 321 357 ...
##  $ Days    : num  0 1 2 3 4 5 6 7 8 9 ...
##  $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...

The response variable, Reaction, represents the average reaction time in milliseconds observed for subjects in a sleep deprivation study. Days represents the number of days the given Subject was deprived of sleep.

head(lme4::sleepstudy)

##   Reaction Days Subject
## 1 249.5600    0     308
## 2 258.7047    1     308
## 3 250.8006    2     308
## 4 321.4398    3     308
## 5 356.8519    4     308
## 6 414.6901    5     308

Below we have a simple plot of the dataset.

lme4::sleepstudy %>% 
    ggplot(aes(x = Days, y = Reaction)) + geom_point() + theme_classic()

Using the parameter color = Subject within the aes() call, we can color each point by Subject. For example, the top of the legend shows that Subject 309 is depicted by deep orange dots:

lme4::sleepstudy %>% 
    ggplot(aes(x = Days, y = Reaction, color = Subject)) + geom_point() + theme_classic()

Model 1

Our first model assumes that the slope (reaction time over number of days) is the same for each subject. In this case, the goal is to estimate the intercept, which represents the baseline reaction time without sleep deprivation, for each subject.

\[y_{ij} = \beta_0 + \beta_{1}t_j + C_{i} + e_{ij}\] where $y_{ij}$ is the reaction time of the $i^{th}$ subject at j days, $\beta_0$ is the population intercept, $\beta_1$ is the population slope, $t_j$ is the number of days at time $t$, $C_i$ is the random intercept for subject $i$, and $e_{ij}$ is the error term. Since the time markers are in days that are equally spread apart, ${t_j}$ is simply ${j}$, number of days.

Assumptions for this model:
1. $e_{ij}$’s are independent with mean equal to 0, finite variance, and follow $Normal(0,\sigma^2_e )$
2. $C_1, C_2, ..., C_n$ are all independent and follow $Normal(0,\sigma^2_c)$
3. $e_{ij}$ and $C_i$ are independent
4. $e_{ij}$, $C_i$, and $t_j$ are all independent

where,
\[i \in \{1,...,18\} , and\]
\[j \in \{0,...,9\}\]

Priors

We need to choose prior distributions for parameters that might be helpful in forming a posterior distribution. I will use fitted parameters from a mixed linear model on the same data built using the lme4 package. See the lme4 section of this manual for more info.

First, load in the rstan and brms packages, then build our base model.

require(rstan)
require(brms)

# Fit a base model using lmer
lmm <- lmer(Reaction ~ Days + (1|Subject), sleepstudy)

# Pull base model estimates from lmer model
b0 <- summary(lmm)$coefficients[1,1]
b1 <- summary(lmm)$coefficients[1,2]
se <- summary(lmm)$sigma
te <- attr(summary(lmm)$varcor$Subject, "stddev")

We are using the base model slope and intercept estimates as prior distributions of $\beta_1$ and $\beta_0$, the standard deviation of residuals for $\sigma_e$, and the standard deviation of subject specific random intercept $\tau_o$.

The priors we are using: \[\beta_0 \sim N(251, 10)\] \[\beta_1 \sim N(10, 10)\] \[\sigma_e \sim N(0, 31)\] \[\tau_0 \sim N(0, 37)\] ***

BRMS package

Sampler

As mentioned before, this package does not fit models itself but uses Stan on the back end. Stan is a probabilistic programming language with a C++ backend. By default this backend uses the static Hamiltonian Monte-Carlo (HMC) Sampler and its extension the No-U-Turn Sampler (NUTS) by Hoffman and Gelman(2014).

Alternatives to these algorithms, not available in the brms package include random-walk Metropolis and variations, Gibbs-sampling, and slice-sampling. The documentation for this package concludes that its choice of samplers converge much more quickly especially for high-dimensional models regardless of whether the priors are conjugate or not. Read the brms documentation overview for more information.

The brm() function

This function in the brms package fits generalized linear/non-linear multivariate multilevel models (MMM). To set priors, we use the prior() function, which accepts a distribution with parameters as well as a class argument which designates which parameter the prior is associated with. The brm function does not handle parameters set as variables inside of prior() well, so you will need to manually enter the prior values into the prior() function.

Default arguments for brm()

Chains: The number of Markov chains to use. The advantage of using more chains is that we can be more confident of our understanding of the posterior distribution as each chain has a different starting point. If running in parallel using the “cluster” option, this package limits the number of chains to the number of cores the system has. Defaults to 4.

Iter: The total number of iterations per chain. Some datasets may require more iterations to converge - meaning it takes longer to reach a point where further iterations do not improve the model. A higher number will cause the model to take longer to fit, see a comparison below. Defaults to 2,000.

Warmup: Also known as burn-in. This refers to the number of iterations at the beginning of an MCMC run to be thrown away. The warm-up period is a hyper-parameter that is meant to give the Markov Chain time to reach its equilibrium distribution. The idea here is that a “bad” randomized starting point may over-sample regions that aren’t representative of the equilibrium distribution. Defaults to floor(iter/2).

Thin: Thinning consists of picking values at each k-th step and removing them from the Markov chain. This helps speed up model convergence if number of iterations, iter, is large. Defaults to 1, meaning all values are kept. Choose a value > 1 to save memory and computation time if iter is large.

Cores: Number of CPU cores to utilize for building chans. As chains can be created independent of one another, parallelization can be utilized to speed up the process. You can see how many cores are available on your machine by using the detectCores() function from the package parallel.

Family: Defaults to Gaussian. Since we are building a linear mixed model, we will use leave use the default. More info on specifying distribution family can be found in the generalized mixed model section of this manual.

priors <- c(prior(normal(251, 10), class = Intercept), # intercept prior
            prior(normal(10, 10), class = b), # slope prior
            prior(normal(0, 31), class = sigma), # population sd
            prior(normal(0, 37), class = sd) # tau0, group sd
            )

model1 <- brm(Reaction ~ Days + (1 | Subject),
             data = sleepstudy,
             prior = priors,
             chains = 4,
             iter = 10000,
             # warmup = ,
             thin = 1, 
             cores = 4,
             family = gaussian()
             )

summary(model1)

Notes: * Within prior() calls, sigma refers to the population-level and sd refers to the group-level * The brms package adjustes prior distributions to be greater than 0 in sd and sigma priors. * To compare these results with the output from a model built with the lme4 package, please see the frequentist section of this manual above. * More iterations take longer to fit. If a model fails to converge, you will see a warning output and may need to adjust iterations or other fitting options in order to be successful. Experiment with iterations until you reach an acceptable output. * If you want to compare fitting times for different iterations, refer to the “iteration_comparison” section of accompanying code this markdown file.

Model Summary

With our model built, we can now get estimates of the parameters. R’s built-in summary() function does a good job explaining each portion of the output one item at a time.

summary(model1)

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Reaction ~ Days + (1 | Subject) 
##    Data: sleepstudy (Number of observations: 180) 
##   Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup draws = 20000
## 
## Group-Level Effects: 
## ~Subject (Number of levels: 18) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)    45.58      8.94    31.17    66.13 1.00     3473     6345
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept   225.75      9.30   206.90   243.33 1.00     3895     7696
## Days         10.46      0.80     8.90    12.02 1.00    19151    15566
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    31.15      1.75    27.95    34.79 1.00    17794    14570
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The first 5 lines of the summary output describe our model and model inputs defined in the brm call.

Group-Level Effects

If you want to access a specific estimate from the model object, they can be accessed by using the $ operator on the summary object of the model. For example: summary(model1)$random$Subject returns Subject-specific (or group level) random effects:

##               Estimate Est.Error l-95% CI u-95% CI     Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 45.58409  8.944504 31.16719 66.12533 1.000897 3472.927  6345.14

If you want to access individual parts of this output, the $ notation can be used in addition to the last command to access each part.

Our estimate for the subject-specific random effect is 45.58, which differs from the base model estimate $\tau_0$ = 37 although the base model estimate is still within the Bayesian credible interval (31.1671894, 66.1253317).

Rhat ($\hat R$) for a given parameter estimate is a measure of whether the samples obtained are likely representative of the true distribution. Put simply, it is the square root of the estimated total variance over within-chain variance. It’s formula:

\[\hat R=\sqrt{\frac{\hat {var}^+ (\theta|y)} {W}}\]

where,

Total posterior variance of a given paramater, ${\hat {var}}^+(\theta |y)$ = $\frac{N-1}{N}W + \frac{1}{N}B$
The within-chain variance estimate $W=\frac{1}{M} \sum_{m=1}^M s_m^2$, where $s^2_m=\frac{1}{N-1}\sum_{n=1}^N (\theta^{(n,m)}-\bar \theta^{(\cdot, m)})^2$
Between-chain variance estimate $B=\frac{N}{M-1}\sum_{m=1}^M(\bar\theta^{(\cdot,m)}-\bar\theta^{(\cdot,\cdot)})^2$, where $\bar\theta^{(\cdot,m)}=\frac{1}{N}\sum_{n=1}^N\theta^{(n,m)}$, $\bar\theta^{(..)}=\frac{1}{M}\sum_{m=1}^M\bar\theta^{(.m)}$
$N$ is the total number of iterations (iters) per chain, $M$ is total number of chains, $S=MN$ is the total number of draws from all chains, $\theta^{(nm)}$ is the $n$th draw of the $m$th chain, $\theta^{(.m)}$ is the average of draws from the $m$th chain, and $\theta^{(..)}$ is the parameter of interest.

The STAN backend code uses split $\hat R$ that takes $M$ chains, splits them in half, then performs the $\hat R$ calculation on the resulting $2M$ chains. So if there is only one chain, $\hat R$ can still be calculated ($M=2$).

If this value is larger than 1.1, it is an indication that the model has not converged adequately. This can be accessed for each parameter estimate within the model summary. For example, $\hat R$ for subject specific random intercept, $\hat R_{\hat\sigma_\tau}$ is accessed with summary(model1)$random$Subject$Rhat = 1.0008974.

Bulk_ESS or bulk effective sample size is a diagnostic for the sampling efficiency in the bulk of the posterior distribution. It measures the information content of a sample chain. It can be thought of as the minimum sample size from the posterior which has the same efficiency in estimating the posterior as a chain of MCMC samples. A higher number here is better - ESS > 400 is recommended (Vehtari, et al).

\[ESS = NM /\hat\pi\] where $N$ is the length of the chain, $M$ is the total number of chains, and $\hat\pi$ represents estimated correlation between lagged samples in the chain.

Tail_ESS represents the minimum sample size needed to create 5% and 95% quantiles of the posterior.

$\hat R$ of 1 and $ESS$ greater than 400 are required to ensure an MCMC sample is useful in practice. More information on $\hat R$, $ESS$, and $Tail ESS$ calculations - Vehtari, et al (2021).

Population-Level Effects

This section shows our intercept estimate, $\hat\beta_0$ = 225.75, our slope estimate, $\hat\beta$ = 10.46 along with credible intervals, $\hat R$, and bulk/tail ESS for each.

Family Specific Parameters:

This shows our estimate for $\hat\sigma_e$ = 31.15, compared to our estimate from the lme4 model 30.99.

Posterior Summary

While the summary() function has a tidy appearance, the posterior_summary() function produces much more information about the estimates.

posterior_summary(model1)

##                             Estimate  Est.Error        Q2.5      Q97.5
## b_Intercept               225.750435  9.3027460  206.895825  243.32649
## b_Days                     10.460900  0.8023936    8.895002   12.02149
## sd_Subject__Intercept      45.584095  8.9445037   31.167189   66.12533
## sigma                      31.151958  1.7502682   27.954902   34.78601
## r_Subject[308,Intercept]   65.997152 13.0123027   40.806541   91.63120
## r_Subject[309,Intercept]  -54.722034 12.3077081  -78.446039  -30.02530
## r_Subject[310,Intercept]  -39.663749 12.3483840  -63.341846  -15.02465
## r_Subject[330,Intercept]   29.005029 12.8804220    4.196334   54.85045
## r_Subject[331,Intercept]   34.895612 12.7218829   10.439384   60.28100
## r_Subject[332,Intercept]   32.790871 12.7337185    7.934024   58.43186
## r_Subject[333,Intercept]   41.343382 12.8779965   16.657921   66.90051
## r_Subject[334,Intercept]   21.421917 12.6389381   -2.718623   46.62350
## r_Subject[335,Intercept]  -21.454776 12.5544246  -45.750937    3.38582
## r_Subject[337,Intercept]   97.942819 13.2118430   72.473681  124.29456
## r_Subject[349,Intercept]    2.987682 12.6953216  -21.315485   28.43434
## r_Subject[350,Intercept]   38.854315 12.8335160   14.656843   64.85983
## r_Subject[351,Intercept]   16.550803 12.8477957   -8.065190   42.07075
## r_Subject[352,Intercept]   61.601552 13.0832915   36.354825   87.37896
## r_Subject[369,Intercept]   31.657925 12.8573637    7.300884   57.55121
## r_Subject[370,Intercept]   17.978869 12.6684198   -6.403454   43.27305
## r_Subject[371,Intercept]   21.093182 12.8087785   -3.666542   46.83127
## r_Subject[372,Intercept]   42.963274 12.8757294   18.264684   68.93640
## lprior                    -17.988315  1.7642717  -22.236738  -15.56261
## lp__                     -910.611060  4.6416543 -920.700731 -902.61389

The first 4 lines are repeated from the summary, along with estimate quantiles. The next section of our model output displays $C_{i}$, our intercept estimates for each subject. The means are interpreted as where the intercept is on the Y-axis in relation to the population mean. For example - subject 308 has an intercept mean of 88.46, so this could be interpreted as the subject having a reaction time of (88.46 + 204.98) = 293.44ms.

lprior and lp__ are used for automatic prior/likelihood sensitivity analysis using an additional package, priorsens and are not included in the brms documentation.

Plots

If you want to show trace and density plots for all model parameters, use plot(model):

plot(model1)

The left side of the plot shows the overall density of our parameter estimates. The right side of the plot output shows a trace plot with parameter estimates across all iterations. You can also plot individual parameters by adding the pars= argument to the plot function call. plot(model1, pars=b_Intercept) for example only outputs the top two plots from the output above.

Plotting Model Estimates

As an alternative the using the summary(model)$ notation, it’s also possible to pull model coefficients and error estimates using the coef() function. We can access intercept estimates for each Subject, $\hat C_i$, to plot them along with the dataset.

If you would like to see example code on how to do this, view hidden code in this section. For more model plotting options in the brms package, see the mcmc_plot.brmsfit section of the package documentation.

You can also access the individual estimates for each iteration/chain by using the as_draws() function on the brms model object. By default this function will return all variables from all chains, excluding estimates from warmup/burn-in draws. You can pull a single variable or list of variables by adding the variables=c(), pull values for a specific chain using the $ operator, and include inc_warmup = TRUE to the function call to include warmup draws.

Ex: as_draws(model1, variables=c('b_Intercept', 'sigma'), inc_warmup=TRUE)$3 will return b_Intercept and sigma estimates for all 10,000 draws on chain #3.

Model 2

Our second model has a random intercept term and a random slope, with correlated random effect. \[y_{ij} = \beta_0 + \beta_1 t_j + C_{i,0} + C_{i,1}t_j + e_{ij}\] Reaction time $y_{ij}$ is equal to population intercept $\beta_0$ plus population slope $\beta_1 t$ times the number of days $t_j$, plus subject specific random intercept $C_{i,0}$ plus subject specific random slope $C_{i,1}$ times the number of days $t_j$, plus an error term $e_{ij}$. $C_{i,0}$ and $C_{i,1}$ are both independent of $e_{ij}$.

Additional Assumptions:
\[e_{ij}\sim N(0,\sigma_e)\] \[\left(\begin{array}{c} C_{i,0}\\ C_{i,1}\\ \end{array}\right) \sim N_2(0,\Sigma)\]

where \[\Sigma=\left[\begin{array}{c} \sigma^2_{C_{i,0}}& \sigma_{(C_{i,0},C_{i,1})}\\ \sigma_{(C_{i,0},C_{i,1})} & \sigma^2_{C_{i,1}}\\ \end{array}\right],\] \[K=Cov(C_{i,0}, C_{i,1})\]

Model 2 - Correlated

We will first build this model allowing correlation between subject specific random effect and number of days. We will fit a base model using lmer with correlation denoted by (Days | Subject). We will use the estimates from the base model as priors for a brm model.

You can access model estimates, estimated error, 2.5% annd 97.5% credible intervals by using the posterior_summary() function on the model object.

posterior_summary(model2c)

##                                   Estimate  Est.Error         Q2.5        Q97.5
## b_Intercept                   239.11281691  7.6176033  223.3015820  253.4205455
## b_Days                          7.47672642  1.7015021    3.9178537   10.6486694
## sd_Subject__Intercept          27.95829665  6.8871234   16.4132145   43.3910576
## sd_Subject__Days                6.85096256  1.5419022    4.3553367   10.3884821
## cor_Subject__Intercept__Days    0.23565757  0.2817023   -0.3253521    0.7545215
## sigma                          25.89281605  1.5480762   23.0697866   29.1072707
## r_Subject[308,Intercept]       14.90035470 14.6260858  -14.0130362   43.6943427
## r_Subject[309,Intercept]      -28.93318332 13.0956958  -55.0931592   -3.6378730
## r_Subject[310,Intercept]      -27.11220711 13.1054875  -53.1910681   -1.6514660
## r_Subject[330,Intercept]       33.75470545 14.6961006    6.5074684   63.8823257
## r_Subject[331,Intercept]       32.66063878 14.6200051    5.8032838   63.1260264
## r_Subject[332,Intercept]       20.20959305 13.5998450   -5.4910845   47.9922043
## r_Subject[333,Intercept]       27.72334795 13.9055340    1.5815760   56.1419250
## r_Subject[334,Intercept]        4.64017121 13.1933789  -21.4109831   30.6578299
## r_Subject[335,Intercept]        9.56489008 13.8839944  -16.3226259   38.3120676
## r_Subject[337,Intercept]       46.60115419 15.3352789   17.2851656   77.5570956
## r_Subject[349,Intercept]      -12.78250086 13.3820396  -39.5900171   12.8785783
## r_Subject[350,Intercept]       -0.11296768 14.0982968  -28.5151036   26.8421871
## r_Subject[351,Intercept]       15.34109307 13.5844308  -10.2295996   43.0199286
## r_Subject[352,Intercept]       32.37207972 14.3546134    4.8151077   61.4116590
## r_Subject[369,Intercept]       14.51141493 13.3003108  -10.7294086   41.3092781
## r_Subject[370,Intercept]      -12.71361974 14.2079634  -40.8523926   14.3709082
## r_Subject[371,Intercept]       12.12914157 13.4852652  -13.2667857   39.4712190
## r_Subject[372,Intercept]       23.66837993 13.8077445   -2.5903626   51.7349879
## r_Subject[308,Days]            12.05147466  2.9916331    6.3584944   18.1650253
## r_Subject[309,Days]            -5.75575412  2.6154859  -10.8210659   -0.5300509
## r_Subject[310,Days]            -2.64169534  2.6581066   -7.7218663    2.6796609
## r_Subject[330,Days]            -1.56713783  2.8601123   -7.3002713    3.9475746
## r_Subject[331,Days]             0.12293815  2.8372121   -5.5746878    5.6148105
## r_Subject[332,Days]             2.79602705  2.7396627   -2.5812868    8.2773687
## r_Subject[333,Days]             2.88326172  2.7770153   -2.5355969    8.3755288
## r_Subject[334,Days]             3.96490928  2.7055988   -1.1900657    9.4940235
## r_Subject[335,Days]            -7.55503368  2.7571197  -13.0992044   -2.2516458
## r_Subject[337,Days]            11.70182484  3.0296152    5.8274419   17.7135153
## r_Subject[349,Days]             3.95491020  2.7344139   -1.2046296    9.5169154
## r_Subject[350,Days]             9.35475386  2.8641573    3.9317936   15.2068611
## r_Subject[351,Days]             0.07586618  2.6909199   -5.2613742    5.3680401
## r_Subject[352,Days]             6.59639912  2.8622235    1.0297151   12.3160119
## r_Subject[369,Days]             3.89883766  2.6851889   -1.3632147    9.3034935
## r_Subject[370,Days]             7.56666216  2.8859773    2.1235336   13.4076425
## r_Subject[371,Days]             2.02486274  2.6906858   -3.2066005    7.3851143
## r_Subject[372,Days]             4.31315446  2.7774968   -1.0398744    9.8398022
## lprior                        -20.68604104  1.7540785  -24.8090560  -18.2052419
## lp__                         -904.59449951  6.6989875 -918.7736901 -892.7398096

Now we can see all estimates for slopes and intercepts in each group and population-wide. The first two lines, b_Intercept and b_Days represent values for the estimates $\hat\beta_0$ and $\hat\beta_1$, respectively. The next two lines represent the standard deviation of the subject-level intercept, $\hat\sigma_{C,0}$ and slope, $\hat\sigma_{C,1}$

The next line, cor_Subject__Intercept__Days is the estimated correlation between subject specific random effect and days - which is 0.24 or a very weak to weak positive association. This suggests that subject specific random intercept is weakly positively correlated with the independent variable days. Sigma represents $\hat\sigma_e$.

The next 36 lines show our model estimates for intercept and days for each individual subject, along with posterior standard deviation (Est.Error), 2.5%, and 97.5% credible intervals.

You can access the random effects of the model using the $ notation: summary(model2c)$random

Data and model estimates can be accessed and plotted by first pulling the subject specific random slopes and intercepts using coef() and adding them to a plot with abline().

# population level slop estimate
pop_slope <- coef(model2c)[["Subject"]][,,'Days'][,1]
m2c_coefs <- coef(model2c)[["Subject"]][,,'Intercept'][,1]

estimate_df <- data.frame(subject = factor(names(m2c_coefs)),
           slope = pop_slope,
           intercept = m2c_coefs)


# Plotting first 4 subjects
# Function to emulate first 4 default colors from ggplot package
gg_color_hue <- function(n) {
  hues = seq(15, 375, length = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

colors = gg_color_hue(4)

sleepstudy %>% 
    filter(Subject %in% c(308,309,310,330)) %>% # first 4 subjects
    ggplot(aes(x = Days, y = Reaction, color = Subject)) +
    geom_point() +
    # subject 308
    geom_abline(slope = estimate_df[1,]$slope, 
                intercept = estimate_df[1,]$intercept,
                color = colors[1]) +
    # subject 309
    geom_abline(slope = estimate_df[2,]$slope,
                intercept = estimate_df[2,]$intercept,
                color = colors[2]) +
    # subject 310
    geom_abline(slope = estimate_df[3,]$slope,
                intercept = estimate_df[3,]$intercept,
                color = colors[3]) +
    # subject 330
    geom_abline(slope = estimate_df[4,]$slope,
                intercept = estimate_df[4,]$intercept,
                color = colors[4]) +
  ggtitle("Sleep Study Model 2 - Correlated")

Model 2 - Uncorrelated

If you suspect the random effects are uncorrelated, it is possible to account for htis in the model call. We will build the model with random effects Days and Subject uncorrelated. This assumes that the subject specific random effect is uncorrelated with days. This is denoted by (Days || Subject) in the lmer and brm calls. As before, we will build a base model using lmer for prior estimation.

base_model3 <- lmer(Reaction ~ Days + (Days || Subject),data = sleepstudy)

b0 <- summary(base_model3)$coefficients[1,1]
b1 <- summary(base_model3)$coefficients[1,2]
se <- summary(base_model3)$sigma
te <- attr(summary(base_model3)$varcor$Subject, "stddev")[1] # subject specific intercept 
be <- attr(summary(base_model3)$varcor$Subject, "stddev")[2]


model2u <- brm(Reaction ~ Days + (Days || Subject),
             data = sleepstudy,
             prior = priors,
             chains = 4,
             iter = 10000,
             thin = 1, 
             cores = 4,
             family = gaussian()
             )

# posterior_summary(model2u)

summary(model2u)

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Reaction ~ Days + (Days || Subject) 
##    Data: sleepstudy (Number of observations: 180) 
##   Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup draws = 20000
## 
## Group-Level Effects: 
## ~Subject (Number of levels: 18) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)    28.27      6.76    17.20    43.67 1.00    10215    13200
## sd(Days)          6.86      1.50     4.43    10.31 1.00     8304    11827
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept   239.91      7.84   223.24   254.17 1.00     9387    13147
## Days          7.71      1.76     4.02    10.91 1.00     7023    10883
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    25.73      1.53    22.95    28.92 1.00    24662    15760
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The output of the summary function is similar to that of the correlated model, with the exception of showing the correlation between the random intercept and days as we are not allowing for correlation with this model. You can use the posterior_summary() function to see estimates and credible intervals for each parameter.

Model Comparison

There is an additional function add_criterion() that allows a model fit criteria to be added to a model object. There is no criteria added to the model build function by default. Instead, a separate call is needed after the model is built.

Criteria choices include:

loo: Leave one out cross-validation splits the data into training and testing sets, with the testing set consisting of only one observation. During each iteration, the model is trained and a prediction is made on the left out observation. (Bürkner, 2017)

kfold: K-fold cross validation refits the model K times, leaving one/Kth of the original data out. If K is set to the total number of observations in the data, then K-fold is identical to loo. (Bürkner, 2017)

waic: Widely Applicable Information Criterion is a generalization of the Akaike Information Criterion (AIC). It differs in that AIC is calculated using a point estimate, usually the maximum likelihood estimate, whereas WAIC is averaged over the posterior distribution. (Vehtari et al, 2013)

loo_subsample: An approximation of loo using subsampling for efficiency - especially useful for large datasets. (Magnusson, et al. 2019)

bayes_r2: Bayesian $r^2$ solves the problem with typical $r^2$ potentially having a numerator that is larger than the denominator. Classical $R^2 = \frac{V_{n=1}^N \hat y_n } {V_{n=1}^Ny_n}$ - or fitted variance over total variance. With strong prior information and weak observed data it is possible for the fitted variance to be larger than total variance. Bayesian $R_s^2=\frac {V_{n=1}^N y_n^{pred s}}{V_{n=1}^N y_n^{pred s} + var_{res}^s}$, or explained variance over explained variance plus residual variance. For a linear model, $y_n^{pred s}$ is simply the linear predictor ($X_n\beta$). For a generalized linear model, $y_n^{pred s}$ is the transformed linear predictor, with transformation dependent on which family is used. (Vehtari, et al. 2018)

marglik: This criterion uses log marginal likelihood for model comparison. This is also referred to as the model evidence and uses the Bayes factor to determine which model is best. This is computationally expensive due to the additional posterior draws needed to compute the likelihood. To use this criterion, you must set save_pars = save_pars(all=TRUE) in the original brm function call to save all parameters. The brms package implements bridge sampling to calculate the marginal likelihood. (Bürkner, 2017)

Here, we will create a function that accepts a model object as an argument and will return the model with several criteria added.

# Function to add comparison criteria to a model
update_model <- function(model){
  model_crit <- add_criterion(model, c("waic", "loo", "bayes_R2"))
  model_crit$comparison <- data.frame(
      model_name = deparse(substitute(model)),
      waic = model_crit$criteria$waic$waic,
      loo = model_crit$criteria$loo$loo,
      bayes_R2 = mean(model_crit$criteria$bayes_R2)
    )
  return(model_crit)
}

# Add critera to each model
model1 <- update_model(model1)
model2c <- update_model(model2c)
model2u <- update_model(model2u)

# Build data frame with model criteria 
df <- rbind(model1$comparison, 
        model2c$comparison,
        model2u$comparison)

library(knitr)
knitr::kable(df)

model_name	waic	loo	bayes_R2
model1	1769.434	1770.096	0.7010747
model2c	1720.075	1723.630	0.7924409
model2u	1719.045	1722.365	0.7947852

For both waic and loo we are looking for the model with the lowest number. In each case, waic is lower than loo and model 2 - uncorrelated performs the best.

For bayes r2 we are looking for the highest number which would represent the most variance explained by the model. This criterion also chooses model 2 - uncorrelated as the best performing model.

Ultimately, you should check and compare several to make an informed decision on which model to proceed with in any given problem.

Nonlinear Mixed Models

We will utilize the brms package to approach a nonlinear problem with a Bayesian mixed model.

Introduction to the dataset

The lme4 package offers another dataset, “Contagious Bovine Pleuropneumonia” (CBPP) that is accessible using data(cbpp). This dataset describes the serological incidence of CBPP in zebu cattle during a survey implemented in 15 commercial herds in Ethiopia. The data contains 4 colums: 1. herd - a factor identifying the herd 2. incidence - the number of new serological cases for a given herd and time period 3. size - the number describing the size of the herd at the beginning of a given time period 4. period - a factors with levels 1 to 4.

head(cbpp)

##   herd incidence size period
## 1    1         2   14      1
## 2    1         3   12      2
## 3    1         4    9      3
## 4    1         0    5      4
## 5    2         3   22      1
## 6    2         1   18      2

First we’ll plot the data points, colored by herd:

It is a little difficult to see which points belong to which herd. We can add a line to make it a little more clear:

cbpp %>% 
  ggplot(aes(x = period, y = (size-incidence)/size, group=herd)) + 
  geom_point(aes(col=herd)) +
  geom_line(aes(col=herd)) +
  ylab("Cattle Without Disease (%)") + 
  xlab("Period")

Model

We will fit a generalized linear mixed model with family “binomial”. This model will have a random herd specific intercept and random period specific intercept. As before, we will build a model using the lmer package first to obtain priors.

\[ln(\frac{Pr(healthy_i=1)}{Pr(healthy_i=0)})=\beta_0 + \beta_h + \beta_p\] where
* Term $\beta_0$ is the population intercept
* Term $\beta_h$ is is the random herd-level intercept for herd $h$
* Term $\beta_p$ is the random period-level intercept for period $p$

The herd level random effects are assumed to come from an iid normal distribution with mean 0 and some shared, herd-level variance, $\beta_h \sim N(0,\sigma^2_h)$ , $h \in {1,...,15}$. Period-level random intercepts are assumed iid normal, mean 0 and shared period-level variance, $\beta_p \sim N(0,\sigma^2_p)$, $p\in {1,2,3,4}$.

Priors

There is a useful function in the brms package that gives information on all parameters that may have priors assigned, including the default priors that the package will use if unspecified.

Calling the get_prior() function with the formula, family, and data specified shows this information:

# Get priors for our model
get_prior(cbind(incidence, size-incidence) ~ (1 | period) + (1 | herd), 
      family = binomial(link="logit"),
      data = cbpp)

##                 prior     class      coef  group resp dpar nlpar lb ub
##  student_t(3, 0, 2.5) Intercept                                       
##  student_t(3, 0, 2.5)        sd                                   0   
##  student_t(3, 0, 2.5)        sd             herd                  0   
##  student_t(3, 0, 2.5)        sd Intercept   herd                  0   
##  student_t(3, 0, 2.5)        sd           period                  0   
##  student_t(3, 0, 2.5)        sd Intercept period                  0   
##        source
##       default
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)

Fitting a Model

As before, we will build a model first using the lmer package to obtain priors for a brm model.

nlin_base_model <- glmer(cbind(incidence, size-incidence) ~ (1 | period) + (1 | herd), 
      family = binomial(link="logit"),
      data = cbpp)

s <- summary(nlin_base_model)

coef(nlin_base_model)

There is a slight change in notation in the brm call here since we are using a generalized binomial model. The left side of the model reads size-incidence | trials(size), which notes the number of successes on the left (herd size - incidence) and the size of each herd inside of trials() on the right.

priors <- c(prior(normal(-2.3, 10), class = Intercept), # pop slope
            prior(normal(0, 1), class = sd), # residual error 
            prior(normal(0, 0.71), class = sd, group = herd, coef = Intercept), # herd-level intercept sd
            prior(normal(0, 0.54), class = sd, group = period, coef = Intercept)) # period-level intercept sd

nonlin_bayes <- brm(size-incidence | trials(size) ~ (1 | period) + (1 | herd), 
      family = binomial(link="logit"),
      data = cbpp,
      prior = priors,
      chains = 4,
      iter = 10000,
      thin = 1,
      cores = 4
    )

## Compiling Stan program...

## Start sampling

Model summary:

##  Family: binomial 
##   Links: mu = logit 
## Formula: size - incidence | trials(size) ~ (1 | period) + (1 | herd) 
##    Data: cbpp (Number of observations: 56) 
##   Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup draws = 20000
## 
## Group-Level Effects: 
## ~herd (Number of levels: 15) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.71      0.19     0.39     1.14 1.00     7697    11858
## 
## ~period (Number of levels: 4) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.64      0.23     0.28     1.18 1.00    10440    12095
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     2.27      0.41     1.49     3.12 1.00     6427     8390
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Posterior summary:

##                            Estimate Est.Error          Q2.5         Q97.5
## b_Intercept            2.274503e+00 0.4143059    1.48759506    3.11625704
## sd_herd__Intercept     7.115428e-01 0.1928406    0.38870621    1.14432076
## sd_period__Intercept   6.350311e-01 0.2330193    0.28251741    1.18012151
## r_herd[1,Intercept]   -5.812935e-01 0.4070634   -1.40153974    0.20197158
## r_herd[2,Intercept]    3.256277e-01 0.4111733   -0.44852618    1.17891784
## r_herd[3,Intercept]   -3.785111e-01 0.3630653   -1.09225452    0.33506609
## r_herd[4,Intercept]    1.504352e-03 0.4549975   -0.86122768    0.92925592
## r_herd[5,Intercept]    2.251802e-01 0.3987275   -0.53222248    1.03662025
## r_herd[6,Intercept]    4.671788e-01 0.4316371   -0.32882567    1.38321888
## r_herd[7,Intercept]   -8.918146e-01 0.4024964   -1.70155695   -0.12710418
## r_herd[8,Intercept]   -6.555817e-01 0.3971951   -1.46153787    0.09210059
## r_herd[9,Intercept]    3.049577e-01 0.5062790   -0.62123157    1.36411760
## r_herd[10,Intercept]   6.055707e-01 0.4254937   -0.16062991    1.49808996
## r_herd[11,Intercept]   1.209941e-01 0.3614664   -0.57863749    0.85064713
## r_herd[12,Intercept]   1.018909e-01 0.4816100   -0.81083066    1.09201060
## r_herd[13,Intercept]   7.746042e-01 0.4544491   -0.03209427    1.73143867
## r_herd[14,Intercept]  -1.016214e+00 0.4391320   -1.89588075   -0.17576095
## r_herd[15,Intercept]   6.018014e-01 0.4543928   -0.21424951    1.56930878
## r_period[1,Intercept] -7.926089e-01 0.3899514   -1.61442195   -0.06409133
## r_period[2,Intercept]  1.086725e-01 0.3926570   -0.67920041    0.90561120
## r_period[3,Intercept]  2.173730e-01 0.4014973   -0.57152776    1.03369687
## r_period[4,Intercept]  5.145987e-01 0.4287157   -0.26529105    1.42327462
## lprior                -4.143545e+00 0.6685041   -5.83443897   -3.30195107
## lp__                  -1.175746e+02 4.2232312 -126.71224081 -110.39124690

References

Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01

Vehtari A., Gelman A., Goodrich B., Gabry J. (2019) Bayesian R2 and LOO-R2 (https://avehtari.github.io/bayes_R2/bayes_R2.html)

Vehtari A., Gelman A., Goodrich B., Gabry J. (2018) R-squared for Bayesian regression models (http://www.stat.columbia.edu/~gelman/research/unpublished/bayes_R2.pdf)

Vehtari et al (2013). Understanding predictive information criteria for Bayesian models. http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf

Vehtari A., Gelman A., Simpson D., Carpenter B., and Bürkner P. (2021). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC (with discussion). Bayesian Data Analysis. (https://projecteuclid.org/journals/bayesian-analysis/advance-publication/Rank-Normalization-Folding-and-Localization--An-Improved-R%CB%86-for/10.1214/20-BA1221.full)

Magnusson M., Vehtari A., Jonasson J., Anderson M. - Bayesian leave-one-out cross-validation for large data (2019) (http://proceedings.mlr.press/v97/magnusson19a/magnusson19a.pdf)

Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003). [[Sleepstudy data]]

Bayesian Mixed Models

Casey Moroney

6/5/2022

Fixed vs Random Effects

Mixed Models

Bayesian Methods

Bayes Theorem

Linear Mixed Models

Introduction to the dataset

Model 1

Priors

BRMS package

Sampler

The brm() function

Default arguments for brm()

Model Summary

Group-Level Effects

Population-Level Effects

Family Specific Parameters:

Posterior Summary

Plots

Plotting Model Estimates

Model 2

Model 2 - Correlated

Model 2 - Uncorrelated

Model Comparison

Nonlinear Mixed Models

Introduction to the dataset

Model

Priors

Fitting a Model

References