Learn what longitudinal data analysis is and how to run the analysis in R
A panel dataset is characterized by multiple entries (rows) for the same entity (e.g. a respondent or a company).
The multiple entries are due to different time periods at which the entity was observed.
OLS can be used to pool observations of the same entity recorded at different time points. However, observations of the same entity are then treated as if they came from different, independent entities.
Important features such as serial correlation of observations within the same entity are ignored, which can lead to biased estimates and misleading standard errors.
Fixed effect models (including time dummy variables) are sometimes applied to remove omitted variable bias. By estimating changes within a specific group (over time) all time-invariant differences between entities are controlled for.
The model removes characteristics that do not change over time, so the estimated effects of the remaining independent variables on the dependent variable are not biased by time-invariant omitted variables.
Hence, if the unobserved characteristics do not change over time, any change in the dependent variable must be due to influences other than these fixed characteristics, which are controlled for.
Note that the influence of time-invariant independent variables on the dependent variable cannot be examined with a fixed effect model.
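Why time-invariant variables drop out can be sketched with the within transformation. The notation is an assumption introduced here for illustration and does not come from the text: \(y_{it}\) is the outcome of entity \(i\) at time \(t\), \(x_{it}\) a time-varying regressor, \(z_i\) a time-invariant regressor, and \(\alpha_i\) the entity-specific effect.

\[ y_{it} = \alpha_i + \beta x_{it} + \gamma z_i + \varepsilon_{it} \]

Averaging over time within each entity and subtracting the average from the equation above gives

\[ y_{it} - \bar{y}_i = \beta \, (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) \]

Both \(\alpha_i\) and \(\gamma z_i\) cancel because they equal their own time averages, so \(\gamma\) cannot be estimated from within-entity variation.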
Random effect models (with random intercept and/or slope) assume that any variation between entities is random and not correlated with the independent variables used in the estimation model.
This also means that time-invariant variables (like a person’s gender) can be taken into account as independent variables.
In other words, the entity-specific error term (unobserved heterogeneity) is assumed to be uncorrelated with the independent variables.
The plm package ships with Grunfeld's investment data, a panel of 10 observational units (firms) observed from 1935 to 1954. In the following examples, we regress investment (inv) on capital.
We can start by exploring the panel data.
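A minimal sketch of such a first look, assuming the plm package is installed. The object name `panel_dims` is introduced here for illustration only.

```r
library(plm)

# Grunfeld ships with plm: 10 firms observed yearly from 1935 to 1954
data("Grunfeld", package = "plm")

head(Grunfeld)   # first rows of the data
str(Grunfeld)    # column names and types

# pdim() reports the panel dimensions and whether the panel is balanced
panel_dims <- pdim(Grunfeld, index = c("firm", "year"))
print(panel_dims)

# Number of rows (years) contributed by each firm
table(Grunfeld$firm)
```

A balanced panel means every firm is observed in every year, which simplifies the estimators used in the rest of the walkthrough.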
We can check the heterogeneity across firms (with a 95% confidence interval around the mean):
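One way to draw such a plot is sketched below. It is an illustration rather than the original chunk: the interval is a t-based 95% confidence interval computed by hand, and `firm_stats` and its column names are hypothetical.

```r
library(ggplot2)
data("Grunfeld", package = "plm")

# Mean investment per firm with a t-based 95% confidence interval
firm_stats <- do.call(rbind, lapply(split(Grunfeld, Grunfeld$firm), function(d) {
  n    <- nrow(d)
  m    <- mean(d$inv)
  half <- qt(0.975, df = n - 1) * sd(d$inv) / sqrt(n)
  data.frame(firm = d$firm[1], mean_inv = m,
             lower = m - half, upper = m + half)
}))

# Points are firm means, error bars are the confidence intervals
ggplot(firm_stats, aes(x = factor(firm), y = mean_inv)) +
  geom_point() +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  labs(x = "Firm", y = "Mean investment (inv)")
```

Firms whose intervals sit far apart differ systematically in their investment level, which is the heterogeneity a pooled regression ignores.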
The regular linear regression does not consider heterogeneity across groups or time:
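A sketch of the pooled model, assuming the same formula as in the later examples. The names `pooled_model` and `ols_model` are chosen here for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

# Pooled OLS: one common intercept and slope for all firms and years
pooled_model <- plm(inv ~ capital, data = Grunfeld,
                    index = c("firm", "year"),
                    model = "pooling")
summary(pooled_model)

# A plain lm() fit gives the same point estimates
ols_model <- lm(inv ~ capital, data = Grunfeld)
all.equal(unname(coef(pooled_model)), unname(coef(ols_model)))
```

Fitting the pooled model through plm() rather than lm() keeps it in a form that the panel tests in plm accept directly.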
In the fixed effect model (including dummy variables for all firms), the coefficient of capital indicates how much investment changes over time, on average per firm, when capital increases by one unit:
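library(plm)
data("Grunfeld", package = "plm")

# Within (fixed effects) estimator with firm-specific intercepts
fe_model <- plm(inv ~ capital, data = Grunfeld,
                index = c("firm", "year"),
                effect = "individual",
                model = "within")
stargazer::stargazer(fe_model, type = "text",
                     single.row = TRUE)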
========================================
                 Dependent variable:    
             ---------------------------
                         inv            
----------------------------------------
capital           0.371*** (0.019)      
----------------------------------------
Observations             200            
R2                      0.660           
Adjusted R2             0.642           
F Statistic   366.446*** (df = 1; 189)  
========================================
Note:        *p<0.1; **p<0.05; ***p<0.01
Each firm has its own intercept:
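library(ggplot2)
data("Grunfeld", package = "plm")

# Least squares dummy variable (LSDV) fit: one intercept per firm
fe_model_lm <- lm(inv ~ capital + factor(firm),
                  data = Grunfeld)

# Fitted values form parallel lines, shifted by the firm-specific intercepts
ggplot(data = broom::augment(fe_model_lm),
       aes(x = capital, y = .fitted)) +
  geom_point(aes(color = `factor(firm)`)) +
  geom_line(aes(color = `factor(firm)`)) +
  labs(x = "Stock",
       y = "Fitted Values (inv ~ capital)",
       color = "Firm")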
The function pFtest() tests for fixed effects. Its null hypothesis is that the firm-specific effects are jointly zero, so that the pooled linear model is adequate. The results indicate that the null hypothesis is rejected in favor of the alternative that there are significant fixed effects:
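A sketch of the call, refitting both models so the block runs on its own. The names `pooled_model` and `f_test` are chosen here for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

# Fixed effects (within) model and pooled model on the same formula
fe_model <- plm(inv ~ capital, data = Grunfeld,
                index = c("firm", "year"),
                effect = "individual", model = "within")
pooled_model <- plm(inv ~ capital, data = Grunfeld,
                    index = c("firm", "year"),
                    model = "pooling")

# F test for individual effects: within model against pooled model
f_test <- pFtest(fe_model, pooled_model)
print(f_test)
```

A small p-value speaks against the pooled model and in favor of firm-specific intercepts.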
In the random effects model, the coefficient of capital indicates the average effect of capital on investment when capital changes by one unit across time and between firms:
# Random effects estimator with a random intercept per firm
re_model <- plm(inv ~ capital, data = Grunfeld,
                index = c("firm", "year"),
                effect = "individual", model = "random")
stargazer::stargazer(re_model, type = "text", single.row = TRUE)
========================================
                 Dependent variable:    
             ---------------------------
                         inv            
----------------------------------------
capital           0.372*** (0.019)      
Constant           43.247 (51.411)      
----------------------------------------
Observations             200            
R2                      0.652           
Adjusted R2             0.650           
F Statistic           371.149***        
========================================
Note:        *p<0.1; **p<0.05; ***p<0.01
A decision between a fixed and a random effects model can be made with the Hausman test. The null hypothesis states that the entity-specific effects are not correlated with the independent variables (thus, one should prefer the random effects model). The alternative hypothesis is that such a correlation exists (thus, one should go for the fixed effects model). The null hypothesis cannot be rejected here, hence we should use a random effects model:
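A sketch of the test with plm's phtest(), refitting both models so the block is self-contained. The name `hausman` is chosen here for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

fe_model <- plm(inv ~ capital, data = Grunfeld,
                index = c("firm", "year"),
                effect = "individual", model = "within")
re_model <- plm(inv ~ capital, data = Grunfeld,
                index = c("firm", "year"),
                effect = "individual", model = "random")

# Hausman test: compares the fixed and random effects coefficient estimates
hausman <- phtest(fe_model, re_model)
print(hausman)
```

The test compares the two sets of coefficient estimates: if they are close, as the nearly identical capital coefficients above suggest, there is little evidence against the random effects assumption.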
The Breusch-Pagan Lagrange multiplier test further helps to decide between a random effects model and a simple linear regression. The null hypothesis states that the variance across entities is zero, which means that there is no panel effect. The test shows that there are significant differences across firms:
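A sketch using plm's plmtest() on the pooled model. The names `pooled_model` and `lm_test` are chosen here for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

pooled_model <- plm(inv ~ capital, data = Grunfeld,
                    index = c("firm", "year"),
                    model = "pooling")

# Breusch-Pagan Lagrange multiplier test for individual (firm) effects
lm_test <- plmtest(pooled_model, effect = "individual", type = "bp")
print(lm_test)
```

The test is run on the pooled model because the null hypothesis describes the case in which pooling is sufficient.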
We should further test for the presence of heteroskedasticity. The results show strong evidence of heteroskedasticity, which suggests using robust standard errors:
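One common way to run such a test is the Breusch-Pagan test from the lmtest package, applied to the dummy-variable form of the model. The sketch below assumes that approach; the text does not say which function produced the result, and the name `het_test` is chosen here for illustration.

```r
library(lmtest)
data("Grunfeld", package = "plm")

# Breusch-Pagan test on the regression with firm dummies
# studentize = FALSE requests the original (non-studentized) version of the test
het_test <- bptest(inv ~ capital + factor(firm),
                   data = Grunfeld,
                   studentize = FALSE)
print(het_test)
```

The null hypothesis is homoskedasticity, so a small p-value points to error variances that differ across observations.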
For long time series, a test for serial correlation of the residuals should be performed because serial correlation can lead to underestimated standard errors and an overestimated \(R^2\). The null hypothesis states that there is no serial correlation. Results show evidence that the residuals are serially correlated:
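A sketch using the Breusch-Godfrey/Wooldridge test from plm. The choice of pbgtest() and of the fixed effects model as input are assumptions, since the text does not name the function used; `serial_test` is a name chosen for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

fe_model <- plm(inv ~ capital, data = Grunfeld,
                index = c("firm", "year"),
                effect = "individual", model = "within")

# Breusch-Godfrey/Wooldridge test for serial correlation in panel models
serial_test <- pbgtest(fe_model)
print(serial_test)
```

plm also offers other serial correlation tests, such as pwartest() for fixed effects models, which can serve as a cross-check.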
To address serial correlation, clustered standard errors can be used. Clustered standard errors estimate the variance of the coefficients while allowing the errors to be correlated within an entity.
t test of coefficients:
             Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 43.246697  37.815768  1.1436    0.2542    
capital      0.372120   0.065803  5.6551 5.389e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
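A table of this shape is what lmtest::coeftest() prints when given a cluster-robust covariance matrix. The sketch below shows one such call on the random effects model. The `method` and `type` arguments are assumptions, since the text does not state which variant was used, so the standard errors may differ from those shown above.

```r
library(plm)
library(lmtest)
data("Grunfeld", package = "plm")

re_model <- plm(inv ~ capital, data = Grunfeld,
                index = c("firm", "year"),
                effect = "individual", model = "random")

# Covariance matrix clustered by firm (method and type are assumed here)
clustered_vcov <- vcovHC(re_model, method = "arellano",
                         type = "HC1", cluster = "group")

# Coefficient table with clustered standard errors
clustered_tab <- coeftest(re_model, vcov. = clustered_vcov)
print(clustered_tab)
```

Clustering by `"group"` allows arbitrary correlation of the errors within a firm over time while treating different firms as independent.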