13 Multilevel modelling

13.1 Multilevel regression analysis

Many research topics involve multilevel (nested) data, in which several micro units are grouped within each macro unit (e.g. individuals within countries, individuals within occupations, children within classes within schools, etc). At each level there are therefore both mean characteristics (fixed effects) and differences (random effects).

Single-level regression modelling is not appropriate here. For instance, the effect of an X within clusters can differ from its effect between clusters. Moreover, single-level regression ignores the nested data structure, which may violate the assumption of uncorrelated errors: in a “rich” country, for example, people are likely to have higher wages than people in “poor” countries, so the errors of individuals from the same country are correlated. Correlated errors are likely to cause the following problems:

estimators are inefficient
standard errors are underestimated
coefficients appear statistically significant too often
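
The second and third problems can be illustrated with a small simulation. This is only a sketch under invented values (30 countries, 50 people each, a country-level predictor with no true effect); the variable names are hypothetical and not part of the ESS example used later.

```r
# Simulated example (all values hypothetical): 30 countries, 50 people each
set.seed(1)
n_groups <- 30
n_per_group <- 50
group <- rep(seq_len(n_groups), each = n_per_group)

z_group <- rnorm(n_groups)            # country-level predictor, no true effect on y
u_group <- rnorm(n_groups, sd = 1)    # shared country effect (source of correlated errors)
z <- z_group[group]
y <- u_group[group] + rnorm(n_groups * n_per_group, sd = 1)

# Single-level OLS treats all 1,500 rows as independent observations
naive_fit <- lm(y ~ z)
naive_se <- summary(naive_fit)$coefficients["z", "Std. Error"]

# Regression on country means uses only the 30 independent units
y_mean <- as.numeric(tapply(y, group, mean))
means_fit <- lm(y_mean ~ z_group)
means_se <- summary(means_fit)$coefficients["z_group", "Std. Error"]

c(naive = naive_se, country_means = means_se)
```

The naive standard error is much smaller than the one based on country means, because the single-level model counts every individual as new information about the country-level predictor.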

It will also be unclear:

how much of the variation in Y lies at each level
how much the context affects individual-level outcomes after controlling for other relevant factors

The purpose of multilevel modelling (MLM) is to correct for biased estimates resulting from clustering and to provide correct standard errors, confidence intervals, as well as significance tests. It will thus decompose the total variance of Y into portions associated with each level (e.g. individual vs country).

We might want to know: How much variation is there in individual wages between and within countries? Which countries have particularly low or high wages?

13.1.1 Fixed versus random effects

In order to understand MLM we need to understand random effects, and to understand random effects we need to understand variance. We need to think more carefully about where the variance in our system shows up in our model. MLM allows us to decompose the variance of the dependent variable into:

within-context variance
between-context variance

So far, we are familiar with the residual variance from OLS, but might that residual variance be better attributed to differences within groups, or to differences between groups?

Recall that a factor is a categorical predictor that has two or more levels. Up to this point (e.g. in ANOVAs) we have only talked about fixed factors, which assume that the levels are separate, independent, and not similar. Fixed effects estimate each level separately, with no relationship assumed between the levels. Fixed effects also assume a common variance across levels (homoscedasticity), and post-hoc adjustments are needed for pairwise comparisons of the factor levels. With random effects, each level is thought of as a draw of a random variable from an underlying process or distribution. Estimating random effects provides inference about the specific levels (similar to a fixed effect), but also population-level information about the distribution from which the levels are drawn.

13.1.2 Example

Let’s imagine that we have 10 people (10 levels in our model) in a study about the time spent reading the news over a 5-day period. Each day, we ask people to report how much time they spent reading the news, so that each person has 5 observations (n = 10 × 5 = 50).

A fixed effect model enables us to estimate the means of the 10 individuals and assumes that the individuals share a common variance around their news reading time (the variance is the same or similar for everyone in the study).
A random effect model enables us to estimate the mean and variance across participants and to make a reasonable prediction about others who were not enrolled in the study (the time spent reading the news is likely to be more similar across days within a given individual than between different individuals).
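
The contrast between the two views can be sketched with simulated reading times. The numbers below (grand mean of 30 minutes, between-person SD of 10, day-to-day SD of 5) are invented for illustration. The random effects part uses the classical one-way ANOVA estimator of the variance components, which is an assumption of this sketch and not necessarily how a given package estimates them, although for balanced data such as these it generally coincides with REML.

```r
set.seed(42)
n_people <- 10
n_days <- 5
person <- factor(rep(seq_len(n_people), each = n_days))

# Hypothetical data-generating values: grand mean 30 minutes,
# between-person SD 10, within-person (day-to-day) SD 5
person_effect <- rnorm(n_people, mean = 0, sd = 10)
minutes <- 30 + person_effect[as.integer(person)] +
  rnorm(n_people * n_days, sd = 5)

# Fixed effects view: one separate mean per person
person_means <- tapply(minutes, person, mean)

# Random effects view: variance components from the one-way ANOVA table
aov_table <- anova(lm(minutes ~ person))
ms_between <- aov_table["person", "Mean Sq"]
ms_within <- aov_table["Residuals", "Mean Sq"]
var_within <- ms_within
var_between <- max((ms_between - ms_within) / n_days, 0)

# Shrinkage factor: weight given to a person's own mean versus the grand mean
shrinkage <- var_between / (var_between + var_within / n_days)
grand_mean <- mean(minutes)
partial_pooled <- grand_mean + shrinkage * (person_means - grand_mean)

round(cbind(person_means, partial_pooled), 1)
c(var_between = var_between, var_within = var_within, shrinkage = shrinkage)
```

For a person who was not enrolled, the random effects view predicts the grand mean, with an uncertainty that reflects both variance components. The fixed effects view has no estimate for that person at all.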

13.1.3 When not to use random effects

You might not want to use random effects when the number of factor levels is very low (there is no definitive recommendation here). It is commonly reported that you need five or more factor levels for a random effect in order to really benefit from what the random effect can do. Note that the estimate for a group with a large sample size and/or strong information (e.g. a strong relationship) will be influenced very little by the grand mean and will largely reflect the information contained within the group itself.

Another case in which you may not want to use a random effect is when you do not assume that your factor levels come from a common distribution.

We already know about the OLS model (labelled “fixed effects” in the figure below). The next figure displays different types of random effects:

Source: https://bookdown.org/steve_midway/DAR/random-effects.html

13.2 Fixed effect regression approach

One might account for the nested structure of the data by including a “grouping variable” for individuals (a dummy for each country or year) or by including contextual explanatory variables (e.g. gender equality index per country).

\[y_{ij}=\beta_0 + \beta_1*X_{ij} + \sum_{c=1}^{C-1}\gamma_c*CD_{cj} + \epsilon_{ij}\] where \(\beta_0\) is the overall intercept (i.e. the intercept of the reference country or year), \(CD_{cj}\) is the dummy for country \(c\), and \(\gamma_c\) is the difference between the intercept of country \(c\) and that of the reference country. The problem is that the grouping parameters are treated as fixed effects, ignoring the random variability associated with macro-level characteristics. This “dummy variable approach” thus treats group differences as fixed effects. What if the number of groups is very large? And how can group-level predictors be included when all degrees of freedom at the group level have been consumed by the country dummies?
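
The last question can be made concrete with a small simulation. In this sketch (invented data, hypothetical names `country`, `x` and `z`), a country-level predictor is added to a model that already contains a full set of country dummies.

```r
set.seed(7)
n_groups <- 8
n_per_group <- 20
country <- factor(rep(paste0("C", seq_len(n_groups)), each = n_per_group))

x <- rnorm(n_groups * n_per_group)     # individual-level predictor
z_country <- rnorm(n_groups)           # country-level predictor (hypothetical index)
z <- z_country[as.integer(country)]
y <- 1 + 0.5 * x + 0.8 * z + rnorm(n_groups * n_per_group)

# Country dummies absorb every between-country difference,
# so the country-level predictor z cannot be estimated separately
fit <- lm(y ~ x + country + z)
coef(fit)["z"]
```

`lm()` returns `NA` for the coefficient of `z`, because `z` is a linear combination of the country dummies and the intercept. The random intercept models below avoid this by estimating a variance across countries instead of one parameter per country.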

13.3 Random intercept model (with fixed slope)

The next model we will examine is the MLM with a random intercept and fixed slope.

13.3.1 Null model

In its simplest form, an MLM is a “null model” that contains no explanatory variables. It includes only a regression constant (intercept), which is allowed to vary across contexts: the intercept is a random variable.

So far, we are able to answer questions like: Are there country differences with respect to happiness? How much of the variation in happiness is due to these country differences?

13.3.2 Intraclass Correlation Coefficient (ICC)

An MLM with a random intercept is a point where we can pause and run a diagnostic called the Intraclass Correlation Coefficient (ICC).

\[ICC = \frac{\sigma^2_\alpha}{\sigma^2+\sigma^2_\alpha}\] where \(\sigma^2_\alpha\) is the between-context variance (the variance of the random intercept) and \(\sigma^2\) is the within-context (residual) variance. The ICC tells us how similar observations within the same context (i.e. country) are. Basically, it expresses the closeness of observations in the same context relative to the closeness of observations in different contexts.

For instance, it can tell us what percentage of the variance in Y (e.g. happiness) can be attributed to context (belonging to a different country).

The ICC ranges from 0 (no clustering/single-level data structure) to 1 (maximum clustering). In practice, values of exactly 0 or 1 rarely occur. A general recommendation is that if the ICC is small, a single-level model can be used.
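
The ICC formula translates directly into a one-line function. The helper name `icc` and the variance values below are hypothetical and only serve to show the two ends of the scale.

```r
# Intraclass correlation from two variance components
icc <- function(var_between, var_within) {
  var_between / (var_between + var_within)
}

# Hypothetical variance components
icc(var_between = 2, var_within = 8)   # 0.2: 20% of the variance lies between contexts
icc(var_between = 0, var_within = 8)   # 0: no clustering
```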

13.3.3 Random intercept model (with fixed slope) including one predictor

At this stage, it is unclear which factors account for the variation. Are there more reasons why individuals and countries might differ with respect to happiness?

The random intercept model combines a variance components model and a linear model (for a person \(i\) in country \(j\)). We thus have a fixed part (\(\beta_0 + \beta_x*X_{ij}\)), where the estimated parameters are the beta coefficients, which are the same for each observation in the sample. We also have a random part (\(u_{0j} + \epsilon_{ij}\)), where the estimated parameters are the variances of terms that are allowed to vary (e.g. across countries).

\[y_{ij}=\beta_0 + \beta_x*X_{ij} + u_{0j} + \epsilon_{ij}\] In this model, there are two random terms and therefore two types of residuals: \(u_{0j}\) at level 2 and \(\epsilon_{ij}\) at level 1. There is also an overall (average) line with intercept \(\beta_0\) and group (average) lines with intercepts \(\beta_0+u_{0j}\).

So far, we are able to answer questions such as: Do country differences in happiness remain after controlling for gender? How much of the variation in happiness is due to country differences after controlling for gender? What is the relationship between an individual’s happiness and their gender?
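
To see the fixed and random parts side by side, the model can be fitted to simulated data with `lme4::lmer()`. This is a sketch with invented true values (intercept 5, slope -0.5, country-level SD 1, individual-level SD 2) and hypothetical variable names; it assumes that the lme4 package is installed.

```r
library(lme4)

set.seed(123)
n_countries <- 40
n_per_country <- 50
country <- factor(rep(seq_len(n_countries), each = n_per_country))

# Hypothetical true values: beta_0 = 5, beta_x = -0.5,
# sd(u_0j) = 1 (country level), sd(e_ij) = 2 (individual level)
x <- rbinom(n_countries * n_per_country, size = 1, prob = 0.5)
u0 <- rnorm(n_countries, sd = 1)
y <- 5 - 0.5 * x + u0[as.integer(country)] +
  rnorm(n_countries * n_per_country, sd = 2)
sim <- data.frame(y, x, country)

fit <- lmer(y ~ x + (1 | country), data = sim)

fixef(fit)                     # fixed part: beta_0 and beta_x
as.data.frame(VarCorr(fit))    # random part: variances of u_0j and e_ij
head(coef(fit)$country)        # country lines: varying intercepts, common slope
```

In the last table the intercept differs by country while the coefficient of `x` is identical in every row, which is exactly the "random intercept, fixed slope" structure.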

13.3.4 Model assumptions

The model assumes that the individual-level and group-level residuals are normally distributed. Furthermore, residuals are assumed to be uncorrelated, both within the same level and across levels.

13.4 Random slope model (with random intercept)

The random intercept model assumes that all group lines have the same slope as the overall regression line (the effect of X on Y is the same in every country). Is this always valid? It can be argued that the effect of X is not fixed but random, so that its slope can vary across groups.

We can account for a random slope by adding a random term to the coefficient of \(x_{1ij}\) (e.g. working hours), so that it can be different for each group (e.g. country):

\[ y_{ij} = \beta_0 + (\beta_1 + u_{1j})*x_{1ij} + u_{0j} + \epsilon_{ij} \] Note that the one extra random term \(u_{1j}\) leads to two extra parameters: \(\sigma^2_{u1}\) (variance in slopes between groups) and \(\sigma_{u01}\) (covariance between intercepts and slopes).

With a random slope model, we can now answer questions such as: Does being employed have the same effect on happiness across countries?

Useful reference:

Sommet, N. & Morselli, D. (2021). Keep Calm and Learn Multilevel Linear Modeling: A Three-Step Procedure Using SPSS, Stata, R, and Mplus. International Review of Social Psychology, 34(1).

13.4.1 Note on fixed intercept with random slope model

It might be the case that a model requires a fixed intercept but a random slope specification. This could be the case when we hypothesize that the effect of our predictor differs among groups, but when we want to fix the intercept because we know that all groups start in a similar place.

13.4.2 Testing the random part

The likelihood ratio test (LR test) can be used to compare the random slope model to the random intercept model. The null hypothesis states that the slope variance \(\sigma^2_{u1}\) and the intercept-slope covariance \(\sigma_{u01}\) both equal 0. If it cannot be rejected, the random intercept model is more appropriate than the random slope model. More generally, if the variance of a random intercept or slope is not significantly different from zero, there is little random variability in that intercept or slope, and there is no need to specify the random parameter.
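
A minimal sketch of this comparison on simulated data, assuming the lme4 package is installed. The data-generating values (in particular a slope SD of 0.8) and all variable names are invented so that the slope clearly varies across countries.

```r
library(lme4)

set.seed(2024)
n_countries <- 40
n_per_country <- 50
country <- factor(rep(seq_len(n_countries), each = n_per_country))
idx <- as.integer(country)

# Hypothetical true values with a clearly varying slope: sd(u_1j) = 0.8
x <- rnorm(n_countries * n_per_country)
u0 <- rnorm(n_countries, sd = 1)
u1 <- rnorm(n_countries, sd = 0.8)
y <- 2 + (1 + u1[idx]) * x + u0[idx] +
  rnorm(n_countries * n_per_country, sd = 1)
sim <- data.frame(y, x, country)

# Fit both models by maximum likelihood so that the likelihoods are comparable
m_intercept <- lmer(y ~ x + (1 | country), data = sim, REML = FALSE)
m_slope <- lmer(y ~ x + (1 + x | country), data = sim, REML = FALSE)

# Likelihood ratio test: the slope model adds a variance and a covariance
lrt <- anova(m_intercept, m_slope)
lrt
```

Because a variance cannot be negative, the null value lies on the boundary of the parameter space, and the reported p-value tends to be conservative. A clearly significant result such as the one simulated here is not affected by this.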

13.5 Cross-level interactions

Adding cross-level interactions allows us to assess whether contextual factors (Z) influence the effect of level-1 X variables. For instance, we can answer questions such as: Is the effect of gender on happiness stronger/weaker in countries with a high gender equality index?

\[ y_{ij} = \beta_0 + \beta_1*x_{ij} + \beta_2*Z_j + \beta_3*x_{ij}*Z_j + u_{0j} + u_{1j}*x_{ij} + \epsilon_{ij} \] The cross-level interaction is included in the fixed part of the model. The main and interaction effects therefore have to be interpreted together (as in OLS): the \(\beta_3\) coefficient indicates by how much the slope of \(x_{ij}\) changes with each unit change in Z, and \(\beta_1\) is the slope when Z equals 0.
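
Collecting the terms that multiply \(x_{ij}\) makes the role of \(\beta_3\) explicit. The rearrangement below uses only the symbols of the equation above:

\[ y_{ij} = (\beta_0 + \beta_2*Z_j + u_{0j}) + (\beta_1 + \beta_3*Z_j + u_{1j})*x_{ij} + \epsilon_{ij} \]

The slope of \(x_{ij}\) in country \(j\) is therefore \(\beta_1 + \beta_3*Z_j + u_{1j}\): a part that is common to all countries, a part that is explained by the contextual factor Z, and a remaining country-specific deviation.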

Useful links to external data sources to link at the macro-level are:

Eurostat: http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/
OECD: http://www.oecd.org/home
Worldbank: http://data.worldbank.org/
ILO: http://www.ilo.org/global/statistics-and-databases/lang--en/index.htm
Databanks: http://www.gapminder.org/ and http://www.statsilk.com/

13.6 Variance decomposition approach

To assess how much variance is explained by a model, simple OLS regression provides the \(R^2\) statistic (proportion of explained variance). In MLM, there are several (co)variances, so there is no single equivalent.

Hox (2010, p. 70) proposes examining the residual error variances in a sequence of models:

intercept-only model (since there are no explanatory variables in the model, it is reasonable to interpret variances as the error variances)
model including the level-1 predictors
model including the level-2 predictors
model with random coefficient
model with cross-level interaction
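
One common way to summarise such a sequence is the proportional reduction of each variance component relative to the intercept-only model. The sketch below assumes that this is the comparison of interest; the helper name `variance_explained` and all variance values are hypothetical and are not taken from the ESS example.

```r
# Proportional reduction in a variance component relative to the intercept-only model
variance_explained <- function(var_null, var_model) {
  (var_null - var_model) / var_null
}

# Hypothetical variance components (not taken from the ESS example)
null_model   <- c(level1 = 60, level2 = 8)   # intercept-only model
level1_model <- c(level1 = 54, level2 = 7)   # after adding level-1 predictors

variance_explained(null_model["level1"], level1_model["level1"])  # 0.10 at level 1
variance_explained(null_model["level2"], level1_model["level2"])  # 0.125 at level 2
```

Level-1 predictors can reduce the level-2 variance as well, for instance when the composition of the sample differs between countries, which is why both components are tracked across the sequence.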

Other useful references:

Clarke, P., Crawford, C., Steele, F. & Vignoles, A. (2010). The choice between fixed and random effects models: some considerations for educational research. Institute of Education DoQSS, Working Paper No. 10.
Moehring, K. (2012). The fixed effect as an alternative to multilevel analysis for cross-national analyses. GK Soclife working paper.

13.7 Centering and standardizing

The regression intercept equals the expected value of Y if all X are 0. However, what if 0 has no useful meaning or is not possible? (e.g. age of 0). In this case, it is useful to transform the X variables, for instance by centering them. There are two options:

Grand mean centering: computing variables as deviations from the overall mean (the constant in the model then reflects the expected value of Y for a case with average X)
Group mean centering: computing variables as deviation from the group mean (decomposes within vs. between effects)
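
Both options take one line each in base R. The toy data frame below (six respondents in two countries) is invented for illustration.

```r
# Hypothetical data: age of six respondents in two countries
d <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  age = c(20, 30, 40, 50, 60, 70)
)

# Grand mean centering: deviation from the overall mean (45)
d$age_grand <- d$age - mean(d$age)

# Group mean centering: deviation from the country mean (30 for A, 60 for B)
d$age_country_mean <- ave(d$age, d$country)
d$age_group <- d$age - d$age_country_mean

d
```

After group mean centering, `age_group` only carries within-country differences, while `age_country_mean` carries the between-country differences and can be entered as a separate level-2 predictor.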

13.8 Concluding remarks

Ultimately the application of MLM with fixed or random effects is done to capture more realism in the phenomenon we are seeking to describe with our model. While all models are to some degree incorrect, inaccurate estimation or pooling dissimilar information may extend the degree to which the model is inaccurate or misleading.

13.9 How it works in R?

See the lecture slides on MLM:

You can also download the PDF of the slides here:

13.10 Time to practice on your own

In the following example, we would like to assess cross-national differences in the gender gap in working time.

The first thing you want to do is to download the data from the European Social Survey and select the variables: country information, working time, gender and additional explanatory variables (e.g. level of education, having children, etc). Then, you want to check how the variables are coded and apply the necessary changes. You can also filter outliers (e.g. very high working hours).

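library(foreign)
# read the ESS file (path relative to the working directory)
db <- read.spss(file = file.path(getwd(), "data", "ESS10.sav"),
                use.value.labels = TRUE,
                to.data.frame = TRUE)
# keep country, working time, gender and the additional explanatory variables,
# then drop incomplete cases
sel <- db |>
  dplyr::select(cntry, wkhct, gndr, eduyrs, chldhhe) |>
  stats::na.omit()
# drop unused country levels
sel$cntry <- droplevels(sel$cntry)
# recodings: gender as a 0/1 dummy (1 = female)
sel$gndr <- ifelse(sel$gndr == "Female", 1, 0)
# caution: if a variable was imported as a factor, as.numeric() returns the level codes;
# as.numeric(as.character(x)) would recover the recorded values instead
sel$wkhct <- as.numeric(sel$wkhct)
sel$eduyrs <- as.numeric(sel$eduyrs)
# filtering: keep respondents with fewer than 60 working hours
sel <- sel[sel$wkhct < 60, ]
hist(sel$wkhct)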

Second, you start by fitting a fixed effects model with country dummies.

dummy_model <- lm(wkhct ~
                    gndr +
                    cntry, 
                  data=sel)
summary(dummy_model)
## 
## Call:
## lm(formula = wkhct ~ gndr + cntry, data = sel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.958  -0.958   1.287   3.350  24.145 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          41.89245    0.22431 186.762  < 2e-16 ***
## gndr                 -2.31631    0.13146 -17.620  < 2e-16 ***
## cntrySwitzerland     -3.28260    0.35072  -9.360  < 2e-16 ***
## cntryCzechia         -0.56063    0.32625  -1.718 0.085746 .  
## cntryEstonia         -1.55739    0.34830  -4.471 7.83e-06 ***
## cntryFinland         -4.52794    0.32602 -13.889  < 2e-16 ***
## cntryFrance          -4.72122    0.33860 -13.943  < 2e-16 ***
## cntryGreece           1.15618    0.33480   3.453 0.000555 ***
## cntryCroatia          0.13675    0.37672   0.363 0.716614    
## cntryHungary          0.49649    0.33865   1.466 0.142649    
## cntryIceland         -5.13347    0.45526 -11.276  < 2e-16 ***
## cntryItaly           -2.60405    0.33088  -7.870 3.78e-15 ***
## cntryLithuania       -0.99921    0.34479  -2.898 0.003761 ** 
## cntryMontenegro       0.06515    0.90505   0.072 0.942617    
## cntryNorth Macedonia  0.25775    0.41787   0.617 0.537364    
## cntryNetherlands     -8.17255    0.36731 -22.250  < 2e-16 ***
## cntryNorway          -6.24267    0.37639 -16.586  < 2e-16 ***
## cntryPortugal        -1.20709    0.35048  -3.444 0.000575 ***
## cntrySlovenia        -1.14729    0.39832  -2.880 0.003978 ** 
## cntrySlovakia        -0.14967    0.37599  -0.398 0.690593    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.059 on 15183 degrees of freedom
## Multiple R-squared:  0.1014, Adjusted R-squared:  0.1002 
## F-statistic: 90.14 on 19 and 15183 DF,  p-value: < 2.2e-16

Interpretation

From the output, we see that gender has a negative effect on working hours (women work on average ~2.3 fewer hours than men). We also note some country differences. The R-squared is 0.10, which means that the model explains ~10% of the overall variance.

Third, you fit a “null” model with working hours as the outcome and no predictors. This allows you to assess how much variation in working hours can be attributed to individual differences and to country differences.

empty_model <- lme4::lmer(wkhct ~ 1 + (1 | cntry), 
                  data=sel)
stargazer::stargazer(empty_model, type="text", single.row = T)
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                wkhct           
## -----------------------------------------------
## Constant                 38.691*** (0.599)     
## -----------------------------------------------
## Observations                  15,203           
## Log Likelihood              -53,490.590        
## Akaike Inf. Crit.           106,987.200        
## Bayesian Inf. Crit.         107,010.100        
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
# ICC
performance::icc(empty_model)
## # Intraclass Correlation Coefficient
## 
##     Adjusted ICC: 0.092
##   Unadjusted ICC: 0.092

Interpretation

The ICC can be interpreted as the proportion of the variance explained by the grouping structure in the population. Here, the ICC of 0.092 tells us that ~9% of the variance can be attributed to country differences. The average expected working hours for a person are ~39 hrs across all countries.

Fourth, you can conduct a random intercept model with gender, and compare it with a random intercept with individual level variables (having children, years of education, etc.).

# gender only 
random <- lme4::lmer(wkhct ~ gndr + (1 | cntry), 
                  data=sel)
stargazer::stargazer(random, type="text", single.row = T)
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                wkhct           
## -----------------------------------------------
## gndr                     -2.315*** (0.131)     
## Constant                 39.874*** (0.606)     
## -----------------------------------------------
## Observations                  15,203           
## Log Likelihood              -53,338.130        
## Akaike Inf. Crit.           106,684.300        
## Bayesian Inf. Crit.         106,714.800        
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
# add more variables
random2 <- lme4::lmer(wkhct ~ gndr + eduyrs + chldhhe + (1 | cntry), 
                  data=sel)
stargazer::stargazer(random2, type="text", single.row = T)
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                wkhct           
## -----------------------------------------------
## gndr                     -2.382*** (0.132)     
## eduyrs                   -0.061*** (0.018)     
## chldhheNo                -1.013*** (0.137)     
## Constant                 41.221*** (0.643)     
## -----------------------------------------------
## Observations                  15,203           
## Log Likelihood              -53,302.970        
## Akaike Inf. Crit.           106,617.900        
## Bayesian Inf. Crit.         106,663.700        
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

You can also assess the variance between countries:

as.data.frame(lme4::VarCorr(random))
##        grp        var1 var2      vcov    sdcor
## 1    cntry (Intercept) <NA>  6.778696 2.603593
## 2 Residual        <NA> <NA> 64.944980 8.058845

Interpretation

The expected working hours for men (gndr = 0) are ~40 hrs across all countries. The between-country variance of the intercept is ~6.8, which corresponds to a standard deviation of ~2.6 hours.
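
Because a variance is expressed in squared hours, it is often easier to read on the standard deviation scale. The sketch below copies the intercept and the country variance from the output above and derives an approximate range for the country intercepts; the 95% range relies on the assumption that the country effects are normally distributed.

```r
# Values copied from the output above
var_country <- 6.778696    # between-country variance of the intercept
intercept <- 39.874        # fixed intercept of the gender-only model

sd_country <- sqrt(var_country)   # same scale as working hours
sd_country

# Approximate range covering 95% of country intercepts, assuming normal u_0j
intercept + c(-1.96, 1.96) * sd_country
```

The square root reproduces the `sdcor` column of the `VarCorr()` table, and the range suggests that expected working hours for men differ by several hours between the lowest and highest countries.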

You can also test whether it makes sense to rely on a random intercept instead of the fixed effects model.

anova(random, dummy_model)
## refitting model(s) with ML (instead of REML)
## Data: sel
## Models:
## random: wkhct ~ gndr + (1 | cntry)
## dummy_model: wkhct ~ gndr + cntry
##             npar    AIC    BIC logLik deviance  Chisq Df Pr(>Chisq)    
## random         4 106683 106713 -53337   106675                         
## dummy_model   21 106617 106777 -53287   106575 100.25 17      8e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation

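The comparison of both models using the anova() function shows that the likelihood ratio test is significant and that the AIC is lower for the country-dummy model (106,617 vs 106,683), so the dummy model fits the data better. The BIC, which penalises the 17 additional parameters more heavily, is lower for the random intercept model (106,713 vs 106,777). The random intercept model is thus the more parsimonious choice rather than the better-fitting one, and it has the advantage of leaving room for country-level predictors.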
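
The p-value in the anova() table can be checked by hand from the reported test statistic and the difference in the number of parameters. The values below are copied from the output above, so the result should match the table up to rounding.

```r
# Values copied from the anova() output above
chisq <- 100.25   # 2 * (logLik of dummy_model - logLik of random)
df <- 21 - 4      # difference in the number of parameters (npar)

# Upper tail of the chi-squared distribution with 17 degrees of freedom
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
p_value
```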

Fifth, we can conduct a random slope model with gender:

randomslope <- lme4::lmer(wkhct ~ 1 + gndr + 
                            (1 + gndr| cntry), 
                          data=sel)
stargazer::stargazer(randomslope, type="text", single.row = T)
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                wkhct           
## -----------------------------------------------
## gndr                     -2.317*** (0.467)     
## Constant                 39.849*** (0.443)     
## -----------------------------------------------
## Observations                  15,203           
## Log Likelihood              -53,256.100        
## Akaike Inf. Crit.           106,524.200        
## Bayesian Inf. Crit.         106,570.000        
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

You can also assess the variance between countries:

as.data.frame(lme4::VarCorr(randomslope))
##        grp        var1 var2      vcov     sdcor
## 1    cntry (Intercept) <NA>  3.532110 1.8793908
## 2    cntry        gndr <NA>  3.748614 1.9361337
## 3    cntry (Intercept) gndr  2.320980 0.6378506
## 4 Residual        <NA> <NA> 64.133868 8.0083624

Interpretation

The expected working hours for men are ~40 hrs across all countries (between-country variance of ~3.5, i.e. a standard deviation of ~1.9 hours). Being a woman decreases working hours by ~2.3 on average (the between-country variance of this effect is ~3.7, i.e. a standard deviation of ~1.9 hours). Furthermore, the covariance between the intercept and the slope is positive (~2.3), which corresponds to a correlation of ~0.64.
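
In the `VarCorr()` table, the row that names both `(Intercept)` and `gndr` reports the covariance in the `vcov` column and the correlation in the `sdcor` column. The link between the two can be verified with the values copied from the output above:

```r
# Values copied from the VarCorr() output above
sd_intercept <- 1.8793908
sd_slope <- 1.9361337
cov_intercept_slope <- 2.320980

# Correlation = covariance divided by the product of the standard deviations
cor_intercept_slope <- cov_intercept_slope / (sd_intercept * sd_slope)
cor_intercept_slope
```

A positive correlation means that countries where men work longer hours than average tend to have a less negative (or more positive) gender coefficient.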

You can also test whether it makes sense to rely on a random slope instead of a fixed slope model.

# likelihood ratio test of the random slope against the random intercept model
anova(randomslope, random)

Finally, you can add a cross-level interaction using the information provided by the Gender Equality Index (GEI).

sel$cntry <- droplevels(sel$cntry)
# Gender Equality Index per country, in the order of levels(sel$cntry);
# NA marks countries without a score
gei <- c(58.8, NA, 55.7, 59.8, 73.4, 74.6, 51.2, NA, 55.6, NA,
         63, 55.5, NA, NA, 72.1, NA, 59.9, 68.3, 54.1)
# stop early if the number of countries does not match the lookup vector
stopifnot(length(gei) == nlevels(sel$cntry))
sel$GEI <- gei[as.integer(sel$cntry)]
# cross level interaction term
# random slope with cross-level interaction
randomslope2 <- lme4::lmer(wkhct ~ 1 + gndr + GEI +
                             gndr*GEI +
                            (1 + gndr| cntry), 
                          data=sel)
stargazer::stargazer(randomslope2, type="text", single.row = T)
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                wkhct           
## -----------------------------------------------
## gndr                      6.325** (3.095)      
## GEI                      -0.208*** (0.035)     
## gndr:GEI                 -0.135*** (0.050)     
## Constant                 52.664*** (2.164)     
## -----------------------------------------------
## Observations                  12,020           
## Log Likelihood              -41,711.130        
## Akaike Inf. Crit.           83,438.260         
## Bayesian Inf. Crit.         83,497.420         
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Interpretation

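For men (gndr = 0), a one-point increase in the GEI is associated with ~0.21 fewer working hours. The cross-level interaction between GEI and gender is also negative (-0.135): the gender gap in working hours widens as the GEI increases. The positive coefficient of gndr refers to a hypothetical country with a GEI of 0 and should not be interpreted on its own. Note that the model is estimated on 12,020 observations, because respondents from countries without a GEI score are dropped.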
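
Since the coefficient of gender depends on the GEI, it helps to compute the expected gender gap at a few GEI scores within the observed range. The sketch below uses the fixed effects copied from the output above; the helper name `gender_gap` and the three GEI scores are chosen for illustration only.

```r
# Fixed effects copied from the output above
b_gndr <- 6.325
b_interaction <- -0.135

# Expected gender gap (women minus men) in working hours at a given GEI score
gender_gap <- function(gei) b_gndr + b_interaction * gei

gender_gap(c(52, 63, 74))
```

The gap is below one hour at a GEI of 52 and grows to more than three and a half hours at a GEI of 74, which is what the negative interaction term implies.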