Logistic regression is a model with a binary dependent variable (e.g., 0 = not elected versus 1 = elected). In this case, we are interested in the probability that a phenomenon occurs (e.g., getting elected, losing a job, becoming sick):
\[P(Y=1)=a+bX+e\]
This model is also referred to as the Linear Probability Model (LPM). However, interpretation is complicated by the fact that the LPM may return values outside the [0,1] interval (probabilities are necessarily between 0 and 1!).
Example: an increase in political experience (e.g., the number of years a candidate has been a member of their party) by 1 year increases/decreases the probability of being elected as a national representative by \(b \times 100\) percentage points.
Simple regression: professional consultant example
Let’s predict the probability that a candidate employs professional consultants explained by the available campaigning budget in a linear regression framework.
We can first look at the distribution of political candidates employing a consultant:
We see that the fitted line is not ideal and does not describe the data well. A dichotomous dependent variable does not follow a normal distribution and cannot have a linear relationship with an explanatory variable. As a result, linear regression can produce predictions smaller than 0 or larger than 1, which is not possible for probabilities!
Example: if the model’s trend were accurate, we would conclude that by 2109 the word “sustainable” will be used only in English-language publications. So even though a trend is evident in this case, the linear model does a poor job of capturing it. Linear regression, which produces predictions on a scale from negative infinity to infinity, is no longer sensible when the categories are nominal. Instead, we need a tool that allows us to make categorical predictions.
6.2 The problem with LPM
LPM poses several issues:
non-linearity
non-normal distribution of the errors (only two possible outcomes)
heteroscedasticity
difficulty in interpreting the results
Therefore, we need a non-linear model that predicts the probability of an event while remaining within the [0,1] bounds. The question is how to constrain all possible outcomes between 0 and 1.
Non-linear probability models are essential in the social sciences, which often deal with categorical dependent variables (e.g., binary, ordinal, nominal). Several models exist depending on the measurement level of the dependent variable. Logistic regression handles the simplest case, a binary dependent variable, and is part of the Generalized Linear Model (GLM) framework:
Binomial: binary variable
Multinomial: categorical variable
Gaussian: interval variable
Poisson: count data (discrete)
Gamma: >0 (non-discrete)
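As an illustration, the sketch below shows how these measurement levels map onto the family argument of R's glm(); the names y, x, and dat are placeholders rather than variables from our data:

```r
# Hypothetical examples of the family argument in glm() for the measurement
# levels listed above (y, x and dat are placeholder names)
glm(y ~ x, data = dat, family = binomial)   # binary outcome (logistic regression)
glm(y ~ x, data = dat, family = gaussian)   # interval outcome (equivalent to lm)
glm(y ~ x, data = dat, family = poisson)    # count outcome
glm(y ~ x, data = dat, family = Gamma)      # positive, non-discrete outcome
# nominal outcomes with more than two categories need a multinomial model,
# e.g. nnet::multinom(y ~ x, data = dat)
```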
6.3 Differences between linear and logistic regressions
6.4 Odds ratio (OR)
OR is the relative chance of an event happening under two different conditions. In other words, it is the ratio of the odds of an event occurring in one group to the odds of it occurring in another group (relative odds).
Let’s take the following games:
Game 1: 20 wins for 100 losses, odds of 20/100=0.2
Game 2: 10 wins for 100 losses, odds of 10/100=0.1
Here, the OR is 0.2/0.1=2. Since the OR is >1, the odds in game 1 are higher than in game 2: the relative chance of winning rather than losing is twice as high in game 1.
OR: further examples
An OR of 1.3 means there is a 30% increase in the odds of an outcome.
An OR of 2 means there is a 100% increase in the odds of an outcome (same as saying that there is a doubling of the odds of the outcome).
An OR of 0.3 means there is a 70% decrease in the odds of an outcome.
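For illustration, the odds and odds-ratio arithmetic above can be checked with a few lines of R:

```r
# Odds and odds ratio for the two games described above
odds_game1 <- 20 / 100          # 0.2
odds_game2 <- 10 / 100          # 0.1
or <- odds_game1 / odds_game2
or                              # 2: the relative chance of winning is twice as high in game 1
100 * (or - 1)                  # expressed as a % change in the odds: +100%
```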
6.5 Constraining all possible outcomes
6.5.1 Step 1: p to odds
First, we need to transform \(Y\) into \(p/(1-p)\) so that the binary dependent variable can be expressed as a function of continuous positive values ranging from 0 to \(+\infty\).
The formal interpretation is that “a unit increase in X changes the odds that Y=1 instead of Y=0 by a factor of \(e^{b}\) (the antilog of the coefficient), all else equal”.
OR less than 1= negative effect (negative log-odds)
OR greater than 1= positive effect (positive log-odds)
However, ORs give no indication of the magnitude of the implied change in probabilities. Let’s look at the following example of two societies:
Absolute differences versus odds ratio
In the above scenario, different absolute differences in probabilities can be associated with the same odds ratio. The social mechanisms responsible for the gender effect on employment are the same in the two societies, but the intensity of the effect resulting from those mechanisms is much stronger in society A than in society B.
Note that the OR can be expressed as a % change in the odds: \(100 \times (OR-1)\).
6.5.2 Step 2: odds to log odds
Up to now, we have \(odds=p/(1-p)\), which we now transform into log odds: \(ln(p/(1-p))\).
If p=0.5, the odds=1 and the log odds=0, which suggests no effect
If p=0.8 of success (0.2 of failure), the odds=4 and the log odds≈1.39, which suggests a positive effect
If p=0.8 of failure (0.2 of success), the odds=1/4 and the log odds≈-1.39, which suggests a negative effect
So, for log odds, only the sign (direction of the coefficient) can be interpreted.
6.5.3 Step 3: back to probabilities
The problem with log odds (also called the logit) is that they are not easy to interpret: for instance, “the change in the logarithmic odds of being elected for an increase in budget of one franc”. Therefore, we need to transform log odds into probabilities:
\[P_i(y=1)=\frac{e^{a+b_1x_1+b_2x_2}}{1+e^{a+b_1x_1+b_2x_2}}\] where \(e^1=2.71828\) and \(e^{ln(x)}=x\).
Logistic transformations
Why do we need these three steps?
In logistic regression, the goal is to predict probabilities — values between 0 and 1. However, if we tried to model probability directly with a linear equation, the predicted values could easily fall below 0 or above 1, which makes no sense for probabilities. To fix that, we use a series of mathematical transformations that let us use a linear model while keeping probabilities valid.
From probabilities to odds: probabilities are bounded between 0 and 1, but odds \((p / (1 – p))\) can take any positive value from 0 to infinity. Example: if p = 0.8, odds = 0.8 / 0.2 = 4. This step stretches probabilities onto a scale that is unbounded above (0 → ∞), which makes them easier to model mathematically.
From odds to log odds (the logit): odds are still restricted to positive values, but linear regression handles both negative and positive numbers. So we take the natural log of the odds, which gives us the log odds (logit). Example: if odds = 4, then log(4) ≈ 1.39; if odds = 0.25, then log(0.25) ≈ –1.39. The logit can now range from –∞ to +∞, and the coefficients (\(β\)) can be estimated using ordinary regression logic.
From log odds back to probabilities: After fitting the model, we do not want to stay in log-odds form because it is not intuitive to interpret. So we transform back to probabilities. This guarantees that the predicted values will always fall between 0 and 1.
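For illustration, here is a minimal sketch of the three steps in base R, using an arbitrary probability of 0.8 (qlogis() and plogis() are R's built-in logit and inverse-logit functions):

```r
p <- 0.8
odds <- p / (1 - p)                   # step 1: probability to odds -> 4
log_odds <- log(odds)                 # step 2: odds to log odds -> ~1.39
qlogis(p)                             # same logit via the built-in quantile function

exp(log_odds) / (1 + exp(log_odds))   # step 3: back to a probability -> 0.8
plogis(log_odds)                      # same result via the logistic CDF
```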
6.6 Relationship between probability, logit and odds ratio
If p is a probability, then p/(1 − p) is the corresponding odds. Furthermore, the logit of the probability is the logarithm of the odds:
Relationship between probability, logit and odds ratio
6.7 Marginal predictions
The main benefit of marginal predictions is that they hold every variable at its mean except the variable you are interested in. Therefore, one variable changes while the others do not.
For a continuous covariate, the margins compute how P(Y=1) changes as X increases by one unit, controlling for the other variables in the model. For a dichotomous independent variable, the marginal effect equals the difference in the adjusted predictions for the two groups (e.g., for women and men). For a discrete covariate, the margins compute the effect of a discrete change in the covariate (discrete change effects).
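As an illustration, here is a minimal base-R sketch of a marginal effect for a continuous covariate, using hypothetical names (fit is a fitted logistic regression, dat its data frame, x a numeric predictor); with several predictors, the remaining columns of the newdata would also be fixed, e.g., at their means:

```r
# Change in P(Y = 1) when x increases by one unit from its mean
at_mean <- data.frame(x = mean(dat$x, na.rm = TRUE))
p0 <- predict(fit, newdata = at_mean, type = "response")
p1 <- predict(fit, newdata = transform(at_mean, x = x + 1), type = "response")
p1 - p0
```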
6.8 Model fit
\(R^2\) in OLS is a measure of model fit. It is based on the differences between the observed values and the regression line, and the estimation tries to make these errors as small as possible.
In logistic regression, there is no comparable measure since all values of Y are either 1 or 0. We are explaining probabilities (not explained variance). Model fit is based on Maximum Likelihood Estimation (MLE), which searches for the parameter values that have the highest likelihood of producing the observed sample patterns. The likelihood is the probability of observing our sample given the model, so the better the model fits the data, the higher the likelihood.
The smaller the deviance (−2 × log likelihood), the better the model fit: we can assess by how much the deviance decreases when adding (a) variable(s) and test the significance of this difference. Models can be compared statistically with a Chi-squared test.
Note 1: the pseudo \(R^2=\frac{(-2LL_0)-(-2LL_1)}{-2LL_0}\) has no clear interpretation, but provides a good way to compare models (rather than assessing fit).
Note 2: Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are alternative measures (the smaller AIC or BIC, the better the model fit).
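For illustration, a minimal sketch of these fit statistics, assuming fit is a hypothetical fitted logistic regression (glm with family = binomial):

```r
fit0 <- update(fit, . ~ 1)                    # null model (intercept only)

logLik(fit)                                    # log likelihood of the full model
anova(fit0, fit, test = "Chisq")               # likelihood-ratio (Chi-squared) test

# McFadden's pseudo R2: 1 - LL1/LL0, i.e. (-2LL0 - (-2LL1)) / (-2LL0)
1 - as.numeric(logLik(fit)) / as.numeric(logLik(fit0))

AIC(fit)                                       # smaller values = better fit
BIC(fit)
```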
6.9 In a nutshell
Logistic regression models the probability of a binary outcome, such as being elected (1) or not elected (0). It is used when the dependent variable is categorical, typically binary.
A simple linear regression model is not appropriate in this case because it can produce predicted values outside the range of 0 and 1. It violates key assumptions such as linearity, normality of errors, and homoscedasticity.
Logistic regression solves this by using a nonlinear transformation that constrains predicted probabilities to stay within the 0–1 interval. It does this through the logit function, which expresses the log of the odds of the event occurring.
In logistic regression, the odds represent the ratio of the probability of success to the probability of failure \((p / (1 – p))\). The odds ratio (OR) compares these odds across groups. An OR greater than 1 indicates a positive effect, while an OR less than 1 indicates a negative effect. For example, an OR of 2 means the odds of success are twice as high in one group compared to another.
The logistic model involves three transformations: converting probabilities to odds, odds to log odds (logit), and log odds back to probabilities. A one-unit increase in X changes the odds of Y = 1 by a factor of \(e^{\beta_1}\), holding all other variables constant.
Marginal effects are used to interpret results more intuitively. They show how the predicted probability of the event changes as one variable changes, keeping all other variables constant.
Model fit in logistic regression is assessed using Maximum Likelihood Estimation (MLE), which identifies the parameter values that make the observed data most likely. Fit statistics such as Log Likelihood, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC) are used for model comparison, with lower values indicating better fit. Pseudo \(R^2\) values can be used to compare models.
6.10 How does it work in R?
See the lecture slides on logistic regression:
You can also download the PDF of the slides here:
6.11 Quiz
For each of the following statements, indicate whether it is true or false:
Logistic regression is used to make predictions about a dichotomous dependent variable.
Odds can be defined as the number of times something occurs relative to the number of times it does not occur.
If the odds ratio of a dummy variable is greater than 1, then the group captured in the dummy variable is predicted to be more likely than the reference group to have something occur (a dummy variable is a binary variable coded as 0 or 1 to represent the absence or presence of a characteristic).
When there is exactly a 0.5 probability of something occurring, the log odds are 1.
6.12 Example from the literature
The following article relies on logistic regression as a method of analysis:
Vogler, D., & Schäfer, M. S. (2020). Growing influence of university PR on science news coverage? A longitudinal automated content analysis of university media releases and newspaper coverage in Switzerland, 2003‒2017. International Journal of Communication, 14, 22. Available here.
Please reflect on the following questions:
What is the research question of the study?
What are the research hypotheses?
Is logistic regression an appropriate method of analysis to answer the research question?
Solution
When modeling whether a news item was based on a university media release (yes/no) or whether tone is positive vs. not, logistic regression is appropriate because it estimates the log-odds of a binary event. If your dependent variable is a count (e.g., number of overlaps/borrowed passages) or share (proportion of coverage based on releases), then Poisson/negative binomial or fractional logit/beta regression would be more appropriate.
What are the main findings of the logistic regression analysis?
And some more questions from the tutors:
How do you approach reading long regression tables in articles (without relying on the surrounding text and interpretation)? What could be typical interpretative steps?
Solution
We can ask ourselves the following questions when interpreting a table:
What does the table show, i.e., what is in the title, footnotes, rows and columns?
What is the dependent variable, what are the independent and control variables, what are the reference categories?
What method was used?
What do the coefficients listed in the table represent and what do their signs and sizes show?
Where do you see significance levels?
Start interpreting line by line then connect to the hypotheses/research question(s)
How do you interpret Odds Ratio?
Solution
The odds ratio indicates how the odds of Y=1 change with a unit increase in X: a unit increase in X changes the odds that Y=1 instead of Y=0 by a factor of \(e^{b}\) (the antilog of the coefficient), holding all else equal. OR < 1 = negative effect, OR > 1 = positive effect, meaning the OR only shows the direction and strength of the effect, not the magnitude of the change in probabilities.
Take a look at the PR influence coefficient (3.19**) in Table 1. What does it mean for our analysis?
Solution
The odds ratio of the tone of a media article being positive when there is press-release influence on the article is 3.19. It means that a positive tone is predicted to be much more likely when the press release influenced the article. In other words, if there is PR influence, the tone will more likely be positive than not positive.
6.13 Time to practice on your own
You can download the PDF of the exercises here:
6.13.1 Exercise 1: probability of hiring a consultant according to campaign personalization
For instance, we are interested in measuring the likelihood of hiring a consultant (Y) explained by personalized style of campaigning (X). To do so, we will rely on the data covering the Swiss part of the Comparative Candidate Survey. We will be using the Selects 2019 Candidate Survey.
We can look at the likelihood of hiring a consultant (B11) by the level of campaign personalization (where B6 is recoded as 0=attention to the party and 10=attention to the candidate):
Now, we can calculate the odds of hiring a consultant for a very personalized campaign (personalization = 10):
Interpretation
\[\frac{0.17}{(1-0.17)} = 0.2 \] This suggests that for each candidate without a consultant, there are 0.2 candidates hiring a consultant. Alternatively:
\[\frac{(1-0.17)}{(1-(1-0.17))} = 4.9 \] This suggests that for each candidate hiring a consultant, there are 4.9 candidates without a consultant.
Now, calculate the odds of hiring a consultant for a very low personalized campaign (personalization = 0):
Interpretation
\[\frac{0.03}{(1-0.03)} = 0.03 \] Therefore, the odds ratio is 0.2/0.03 = 6.7, suggesting that the odds of hiring a consultant are 6.7 times higher for candidates with a very highly personalized campaign than for candidates with a very low level of personalization.
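As a cross-check, the same arithmetic can be done in R (a sketch using the rounded proportions reported above):

```r
# Observed shares of candidates hiring a consultant
p_high <- 0.17   # personalization = 10
p_low  <- 0.03   # personalization = 0

odds_high <- p_high / (1 - p_high)   # ~0.20
odds_low  <- p_low  / (1 - p_low)    # ~0.03

odds_high / odds_low                 # odds ratio: ~6.6 (6.7 with the rounded values above)
```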
The logit of the dependent variable (Y) is estimated by the following equation:
The logit does not indicate the probability that an event occurs. Apply the necessary transformation to know this probability (prob(Y=1)):
Answer
\[ proba = \frac{e^{logit}}{1+e^{logit}} \]
Let’s go back to our example and run the logistic regression:
Show the code
```r
model2 <- glm(consultant ~ personalization, data = sel, family = "binomial")
summary(model2)
## 
## Call:
## glm(formula = consultant ~ personalization, family = "binomial", 
##     data = sel)
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -3.54005    0.19321  -18.32  < 2e-16 ***
## personalization  0.22052    0.03132    7.04 1.92e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1007.64  on 1854  degrees of freedom
## Residual deviance:  957.86  on 1853  degrees of freedom
## AIC: 961.86
## 
## Number of Fisher Scoring iterations: 5
```
Coefficients in the above output are log odds: 0.22 means that increasing personalization by one point changes the log odds by 0.22.
Now, assess the odds of hiring a consultant for a very personalized campaign (personalization=10):
Interpretation
The odds ratio for the personalization variable is exp(0.22)=1.24. This suggests that, for each unit increase on the personalization scale, the odds increase by a factor of 1.24, which is equivalent to an increase of 24%.
Beware that the odds ratio does not provide information about the probability of hiring a consultant. We can calculate the probability as follows:
\[ Logit = -3.54 + 0.22*10 = -1.34 \]
\[ Probability = \frac{exp(logit)}{1+exp(logit)} = \frac{e^{-1.34}}{(1+e^{-1.34})} = 0.21 \]
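As a cross-check, a minimal sketch using predict() on model2 (assuming personalization is stored as a numeric variable) gives the same probability directly:

```r
# predict() applies the inverse-logit transformation when type = "response"
predict(model2, newdata = data.frame(personalization = 10), type = "response")

# And by hand, from the coefficients reported above:
logit <- -3.54 + 0.22 * 10   # -1.34
plogis(logit)                # ~0.21, same as exp(logit) / (1 + exp(logit))
```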
6.13.2 Exercise 2: predict the reliance of social media as campaigning tool
Using the same dataset, let’s investigate the following question: how do the level of campaign personalization, affiliation with a governmental party, and incumbency affect reliance on social media as a campaigning tool?
In this scenario, the binary outcome is whether politicians rely on social media (a combination of B4m and B4p), and the predictors are personalization (B6), affiliation with a governmental party (based on T9), and incumbency (T11c).
Let’s prepare the data, including the selection and recoding of the relevant variables:
Now, we can conduct the logistic regression and interpret the findings. Recall that, for log odds, we interpret only the sign of the coefficients (positive/negative). Coefficients smaller than 0 suggest a negative effect (negative log odds) and coefficients larger than 0 suggest a positive effect (positive log odds). You can also transform to percentages using the formula \(100 \times (OR-1)\):
Show the code
```r
mod <- glm(SMuse ~ personalization + in_gov + incumbentNC, data = sel, family = "binomial")
summary(mod)
## 
## Call:
## glm(formula = SMuse ~ personalization + in_gov + incumbentNC, 
##     family = "binomial", data = sel)
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      0.26890    0.08100   3.320 0.000901 ***
## personalization  0.16152    0.02046   7.896 2.89e-15 ***
## in_gov1          0.08618    0.09972   0.864 0.387475    
## incumbentNC1     0.96787    0.30413   3.182 0.001461 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2585.2  on 2094  degrees of freedom
## Residual deviance: 2487.0  on 2091  degrees of freedom
## AIC: 2495
## 
## Number of Fisher Scoring iterations: 4

# transformation
exp(coef(mod))
##     (Intercept) personalization         in_gov1    incumbentNC1 
##        1.308526        1.175299        1.090003        2.632324
```
Interpretation
The logistic regression model examined how personalization, governmental party affiliation, and incumbency affect politicians’ reliance on social media as a campaigning tool.
The results show that both personalization and incumbency have positive and statistically significant effects, while being affiliated with a governmental party does not.
Specifically, for each one-unit increase in personalization, the odds of relying on social media increase by about 18 percent (100 × (1.175 − 1)), controlling for the other factors. The odds of relying on social media are about 2.6 times higher for incumbents than for non-incumbents, corresponding to roughly a 163 percent increase in the odds.
The marginal effects indicate the change in predicted probability as X increases by 1. For categorical predictors, you take the predicted probability of group A minus the predicted probability of group B.
There are different ways of calculating predicted probabilities. In the social sciences, the most commonly used are Adjusted Predictions at the Means (APMs). For instance, we can assess the predicted probability of using social media for political incumbents, with the personalization level at its mean and for incumbents not affiliated with a party in government.
Nota bene: Marginal Effects at the Means (MEMs) are calculated by taking the difference between two APMs. Let’s also calculate the predicted probability of using social media for non-incumbents, with the personalization level at its mean and for politicians not affiliated with a party in government. Then, calculate the difference between the two predicted probabilities:
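One way to obtain these quantities is the following base-R sketch with predict() on the fitted mod; it assumes that in_gov and incumbentNC are stored as factors with levels 0/1, as the coefficient names in_gov1 and incumbentNC1 above suggest:

```r
# Adjusted predictions at the means (APMs) for incumbents vs. non-incumbents,
# with personalization at its mean and in_gov fixed at 0 (not in government)
newdat <- data.frame(
  personalization = mean(sel$personalization, na.rm = TRUE),
  in_gov          = factor("0", levels = c("0", "1")),
  incumbentNC     = factor(c("1", "0"), levels = c("0", "1"))
)
apm <- predict(mod, newdata = newdat, type = "response")
apm               # predicted probabilities: incumbent (first row), non-incumbent (second row)
apm[1] - apm[2]   # MEM: difference between the two adjusted predictions
```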
Predicted probabilities indicate that, at the mean level of personalization and outside government, incumbents have an estimated probability of 0.89 of using social media, compared to 0.75 for non-incumbents, a difference of about 14 percentage points.
In logistic regression, there is no R-squared as in general linear models. Instead, we can calculate a metric known as McFadden’s R-squared, which ranges from 0 to just under 1, with higher values indicating a better model fit. We use the following formula to calculate McFadden’s R-squared:
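A minimal sketch based on the fitted mod: for a binary outcome the deviance equals −2 × log likelihood, so McFadden’s R-squared can be computed directly from the deviances reported in the model output:

```r
# McFadden's R2 = 1 - LL_model / LL_null
1 - mod$deviance / mod$null.deviance   # ~0.038
```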
The McFadden’s R-squared value of 0.038 suggests that the model has modest explanatory power, which is common for behavioral data of this type.
Overall, the findings imply that greater campaign personalization and incumbency significantly increase the likelihood of using social media as a campaigning tool, while party affiliation with the government does not exert a measurable influence.