The purpose of this notebook is to work through the set of exercises Kenneth Train provides for his Discrete Choice Models class, following Yves Croissant's treatment in "Kenneth Train's Exercises using the mlogit package". This will teach me how to apply several discrete choice models, including multinomial logit and nested logit.

In [206]:
library(mlogit); library(data.table); library(tidyverse); library(scales)
options(repr.plot.width=6, repr.plot.height=3)
data(Heating)
head(Heating)
  idcase depvar  ic.gc  ic.gr  ic.ec  ic.er   ic.hp  oc.gc  oc.gr  oc.ec  oc.er  oc.hp income agehed rooms region
1      1     gc 866.00 962.64 859.90 995.76 1135.50 199.69 151.72 553.34 505.60 237.88      7     25     6 ncostl
2      2     gc 727.93 758.89 796.82 894.69  968.90 168.66 168.66 520.24 486.49 199.19      5     60     5 scostl
3      3     gc 599.48 783.05 719.86 900.11 1048.30 165.58 137.80 439.06 404.74 171.47      4     65     2 ncostl
4      4     er 835.17 793.06 761.25 831.04 1048.70 180.88 147.14 483.00 425.22 222.95      2     50     4 scostl
5      5     er 755.59 846.29 858.86 985.64  883.05 174.91 138.90 404.41 389.52 178.49      2     25     6 valley
6      6     gc 666.11 841.71 693.74 862.56  859.18 135.67 140.97 398.22 371.04 209.27      6     65     7 scostl

Data

There are 5 types of heating systems possible:

  • gas central (gc)
  • gas room (gr)
  • electric central (ec)
  • electric room (er)
  • heat pump (hp)

Other variables:

  • ic refers to the installation cost (specific to each household) for each type of heating system.
  • oc refers to the annual operating cost (specific to each household) for each type of heating system.
  • income refers to the annual household income
  • agehed refers to the age of the head of household
  • rooms is the number of rooms in the house
  • region is the region of California where the household is located

Q1 Run a model with the installation costs and operating costs, without intercepts.

  • Do the estimated coefficients have the expected signs?
  • Are both coefficients significantly different from zero?
  • How closely do the average probabilities match the shares of customers choosing each alternative?

We first arrange the data in a format suitable for running the logistic regression.

In [207]:
ic_vars <- paste0("ic.", c("gc", "gr", "ec", "er", "hp"))
oc_vars <- paste0("oc.", c("gc", "gr", "ec", "er", "hp"))
H <- mlogit.data(Heating, shape="wide", choice="depvar", varying=c(ic_vars, oc_vars))
head(H)
     idcase depvar income agehed rooms region alt      ic     oc chid
1.ec      1  FALSE      7     25     6 ncostl  ec  859.90 553.34    1
1.er      1  FALSE      7     25     6 ncostl  er  995.76 505.60    1
1.gc      1   TRUE      7     25     6 ncostl  gc  866.00 199.69    1
1.gr      1  FALSE      7     25     6 ncostl  gr  962.64 151.72    1
1.hp      1  FALSE      7     25     6 ncostl  hp 1135.50 237.88    1
2.ec      2  FALSE      5     60     5 scostl  ec  796.82 520.24    2
In [208]:
m <- mlogit(depvar ~ ic + oc | 0, H) # Constrain intercept to be 0
summary(m)
Call:
mlogit(formula = depvar ~ ic + oc | 0, data = H, method = "nr", 
    print.level = 0)

Frequencies of alternatives:
      ec       er       gc       gr       hp 
0.071111 0.093333 0.636667 0.143333 0.055556 

nr method
4 iterations, 0h:0m:0s 
g'(-H)^-1g = 1.56E-07 
gradient close to zero 

Coefficients :
      Estimate  Std. Error t-value  Pr(>|t|)    
ic -0.00623187  0.00035277 -17.665 < 2.2e-16 ***
oc -0.00458008  0.00032216 -14.217 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-Likelihood: -1095.2

We see that both ic and oc have negative and significant coefficients, which means that as the cost of a heating system rises, the probability of it being chosen falls. This makes sense.
We can now calculate the predicted probabilities by the model for each alternative heating system.

In [209]:
plot_actual_vs_predicted_shares <- function(model) {
    # Average predicted probability of each alternative across all households
    predicted_shares = apply(model$probabilities, MARGIN=2, FUN=mean)
    # Observed share of households actually choosing each alternative
    actual_shares = as.vector(model$freq) / sum(model$freq)
    df = data.frame(predicted = predicted_shares, actual = actual_shares, system = names(predicted_shares))
    df = melt(df, id.var="system")
    ggplot(df, aes(x=system, y=value, col=variable, group=variable)) + 
        geom_point(alpha=0.6) + geom_line(alpha=0.6) + theme_minimal() + 
        scale_y_continuous(labels=percent_format(), limits=c(0,1)) + ylab("Share of Market")    
}
plot_actual_vs_predicted_shares(m)

We can see that the average predicted probabilities are close to, but do not exactly match, the observed shares. Without alternative-specific constants, the model is not guaranteed to reproduce the actual shares.

Q2: Willingness to pay

The ratio of coefficients usually provides economically meaningful information. The willingness to pay (wtp) through higher installation cost for a one-dollar reduction in operating costs is the ratio of the operating cost coefficient to the installation cost coefficient. We can look at the ratio of coefficients between oc and ic to see if the model is reasonable.

In [210]:
beta_oc = as.numeric(coef(m)["oc"]); print(beta_oc)
beta_ic = as.numeric(coef(m)["ic"]); print(beta_ic)
ratio = beta_oc / beta_ic; message("beta_oc / beta_ic is: ", ratio)
[1] -0.004580083
[1] -0.006231869
beta_oc / beta_ic is: 0.734945281306935

The ratio is 0.73, which means a household is indifferent between a \$1 reduction in annual operating cost (oc) and a \$0.73 *increase* in the one-time installation cost (ic). This cannot be right: oc is a recurring cost, so a buyer should be willing to pay more than \$1 upfront in ic to get a \$1 reduction in annual oc. The ratio should be greater than 1.0.

To account for the present value of future operating costs, we can write the PV of annual operating costs as: $$ oc_{pv} = \sum_{t=1}^{T} \frac{oc}{(1+r)^t}$$ where $t$ indexes years and $r$ is the discount rate. As $T$ grows large, the PV approaches: $$oc_{pv} \approx \frac{oc}{r}$$ This means that for every \$1 reduction in annual oc, the buyer should be willing to pay up to $\$ \frac{1}{r}$ in ic upfront. We can back out the discount rate $r$ implied by our regression and check whether it is reasonable.
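As a quick numerical check of this approximation (a sketch, assuming a 12% discount rate and a \$1/year cost):

# PV of a $1/yr operating cost over horizons of 5, 10, 20 and 50 years, at r = 0.12
r <- 0.12
pv <- function(T, r) sum(1 / (1 + r)^(1:T))
sapply(c(5, 10, 20, 50), pv, r = r)  # 3.60  5.65  7.47  8.30
1 / r                                # 8.33, the infinite-horizon limit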

Write utility as $$ U = a \cdot LC $$ where $LC$ is the lifetime cost and $a$ is a constant. Substituting in the components of $LC$: $$ U = a\left(ic + \frac{oc}{r}\right) = a \cdot ic + \frac{a}{r} \cdot oc $$ The regression coefficients imply $a = -0.00623$ and $\frac{a}{r} = -0.00458$, so $$ r = \frac{a}{a/r} = \frac{-0.00623}{-0.00458} \approx 1.36 = 136\% $$ A 136% annual discount rate is far too high, so something is clearly wrong with our specification.
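The same implied rate can be read straight off the fitted coefficients:

r_implied <- as.numeric(coef(m)["ic"] / coef(m)["oc"])
r_implied  # ~1.36, an implausible 136% annual discount rate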

Q3 : Hypothesis Testing

Estimate a model that imposes the constraint that r = 0.12. Test the hypothesis that r = 0.12.
We first estimate the restricted model, then conduct a likelihood ratio test. The test statistic is: $$ 2(LL_{U} - LL_{R})$$ which is distributed as chi-squared with degrees of freedom equal to the number of restrictions.

In [211]:
calc_test_stat <- function(r_value=0.12) {
    # Lifetime cost under the restriction that the discount rate equals r_value
    H$lcc = H$ic + (H$oc) / r_value
    m_restricted = mlogit(depvar ~ lcc|0, H)
    likelihood_test = lrtest(m, m_restricted)
    # 2 * (LL_unrestricted - LL_restricted)
    test_stat = 2*(likelihood_test$LogLik[1] - likelihood_test$LogLik[2])
    return(test_stat)
}

test_stat = calc_test_stat(0.12) # We can set the r value
critical_value = qchisq(0.05, df = 1, lower.tail = FALSE)
message("The test stat is ", test_stat, " and the critical value is ", critical_value, ".")
The test stat is 306.92953103169 and the critical value is 3.84145882069413.

Since the test statistic exceeds the critical value, we reject the null hypothesis that the discount rate r is 0.12: the restriction significantly worsens the fit relative to the unrestricted model.
We can also test a range of r values.

In [212]:
df = data.frame(r_value = seq(0.01, 2, 0.01), test_stat=NA)
for (i in 1:nrow(df)) df$test_stat[i] <- calc_test_stat(df$r_value[i])
ggplot(df, aes(x=r_value, y=test_stat)) + geom_line() + 
    geom_hline(yintercept=3.84, col="red") + theme_minimal() + coord_cartesian(ylim=c(0,20)) +
    labs(y="Test Statistic", x="r value", title="Test statistic vs different r values")

We can see that the test statistic is 0 at r = 1.36, the implied rate at which the restricted model is equivalent to the unrestricted one. For r values sufficiently far from 1.36, the likelihood ratio test rejects the null hypothesis.
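We can confirm this directly: at the implied rate the restriction binds with essentially no loss of fit.

calc_test_stat(1.36)  # essentially 0 (up to rounding of the implied 1.3606)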

Question 4: Add alternative-specific constants

Add alternative-specific constants to the model. With J alternatives, at most J−1 alternative-specific constants can be estimated. The coefficients of J−1 constants are interpreted as relative to the Jth alternative. Normalize the constant for the alternative hp to 0.

In [213]:
model2 <- mlogit(depvar~ic+oc, H, reflevel = 'hp')
summary(model2)
Call:
mlogit(formula = depvar ~ ic + oc, data = H, reflevel = "hp", 
    method = "nr", print.level = 0)

Frequencies of alternatives:
      hp       ec       er       gc       gr 
0.055556 0.071111 0.093333 0.636667 0.143333 

nr method
6 iterations, 0h:0m:0s 
g'(-H)^-1g = 9.58E-06 
successive function values within tolerance limits 

Coefficients :
                  Estimate  Std. Error t-value  Pr(>|t|)    
ec:(intercept)  1.65884594  0.44841936  3.6993 0.0002162 ***
er:(intercept)  1.85343697  0.36195509  5.1206 3.045e-07 ***
gc:(intercept)  1.71097930  0.22674214  7.5459 4.485e-14 ***
gr:(intercept)  0.30826328  0.20659222  1.4921 0.1356640    
ic             -0.00153315  0.00062086 -2.4694 0.0135333 *  
oc             -0.00699637  0.00155408 -4.5019 6.734e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-Likelihood: -1008.2
McFadden R^2:  0.013691 
Likelihood ratio test : chisq = 27.99 (p.value = 8.3572e-07)

Same as before, we can look at the predicted shares of each alternative relative to the actual shares. This time they are exactly equal: including a full set of alternative-specific constants guarantees that the average predicted probabilities reproduce the observed shares.
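A quick numerical check of this property (a sketch reusing the probability matrix and frequency table stored on the model object, as in the plotting function above):

pred_shares <- colMeans(model2$probabilities)
act_shares <- model2$freq / sum(model2$freq)
round(pred_shares - as.vector(act_shares[names(pred_shares)]), 6)  # ~0 for every alternative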

In [214]:
plot_actual_vs_predicted_shares(model2) + facet_wrap(~variable)

As before, we can look at the implied discount rate to see if it is reasonable. We see that the implied discount rate is 22%, which seems high but is much more reasonable than before.

In [215]:
beta_oc = as.numeric(coef(model2)["oc"]); print(beta_oc)
beta_ic = as.numeric(coef(model2)["ic"]); print(beta_ic)
ratio = beta_oc / beta_ic; message("beta_oc / beta_ic is: ", ratio)
r = beta_ic / beta_oc; message("Implied discount rate r is: ", r)
[1] -0.006996368
[1] -0.001533153
beta_oc / beta_ic is: 4.5633850066011
Implied discount rate r is: 0.219135575576784

Now we re-estimate the same model, but change the reference level to gr. With this shift, the coefficient for hp is now -0.308, the negative of the previous gr coefficient; the other intercepts move by the same amount, while the ic and oc coefficients are unchanged.

In [216]:
update(model2, reflevel="gr")
Call:
mlogit(formula = depvar ~ ic + oc, data = H, reflevel = "gr",     method = "nr", print.level = 0)

Coefficients:
ec:(intercept)  er:(intercept)  gc:(intercept)  hp:(intercept)              ic  
     1.3505827       1.5451737       1.4027160      -0.3082633      -0.0015332  
            oc  
    -0.0069964  

Now we try a fuller model incorporating individual-specific effects of income. This model is not much better: the improvement in log-likelihood from -1008.2 to -1005.9 is not statistically significant under a likelihood ratio test (p = 0.32), and none of the income terms is significant at the 5% level, suggesting that socio-demographic factors are not very important here. This makes sense because it is the builder of the house who decides what heating system to put in, not the buyer.

In [217]:
model3 <- mlogit(depvar ~ oc + ic|income, H, reflevel="gc")
summary(model3)
lrtest(model2, model3)
Call:
mlogit(formula = depvar ~ oc + ic | income, data = H, reflevel = "gc", 
    method = "nr", print.level = 0)

Frequencies of alternatives:
      gc       ec       er       gr       hp 
0.636667 0.071111 0.093333 0.143333 0.055556 

nr method
6 iterations, 0h:0m:0s 
g'(-H)^-1g = 6.27E-06 
successive function values within tolerance limits 

Coefficients :
                  Estimate  Std. Error t-value  Pr(>|t|)    
ec:(intercept) -0.10071221  0.59717905 -0.1686   0.86607    
er:(intercept)  0.25043834  0.52284834  0.4790   0.63195    
gr:(intercept) -0.91358879  0.28965174 -3.1541   0.00161 ** 
hp:(intercept) -2.05517018  0.48639682 -4.2253 2.386e-05 ***
oc             -0.00696000  0.00155383 -4.4792 7.491e-06 ***
ic             -0.00153534  0.00062251 -2.4664   0.01365 *  
ec:income       0.00815999  0.07887085  0.1035   0.91760    
er:income      -0.02506870  0.07028677 -0.3567   0.72134    
gr:income      -0.10802242  0.05814127 -1.8579   0.06318 .  
hp:income       0.07178917  0.08878777  0.8085   0.41878    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-Likelihood: -1005.9
McFadden R^2:  0.01598 
Likelihood ratio test : chisq = 32.67 (p.value = 1.2134e-05)
  #Df    LogLik Df  Chisq Pr(>Chisq)
1   6 -1008.229                     
2  10 -1005.889  4 4.6803     0.3217

Alternative-Specific Marginal Effect

We recall that the probability of choosing alternative $i$ is given by: $$ P_{in} = \frac{e^{V_{in}}}{\sum_{j} e^{V_{jn}}} $$ Suppose $$ V_{in} = \beta Z_{in} $$ where $Z_{in}$ is the alternative-specific variable we are interested in changing. Taking the derivative of $P_{in}$ with respect to $Z_{in}$ gives the marginal effect: $$ \frac{dP_{in}}{dZ_{in}} = \beta P_{in}(1-P_{in})$$
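As a sanity check, here is a minimal sketch with made-up utilities for three alternatives, comparing the analytic own-effect with a finite difference:

# Assumed toy values: beta and deterministic utilities V = beta * Z
beta <- -0.5
V <- c(1.0, 0.5, 0.2)
P <- exp(V) / sum(exp(V))               # logit choice probabilities
analytic <- beta * P[1] * (1 - P[1])    # beta * P_1 * (1 - P_1)
eps <- 1e-6                             # a small change in Z_1 shifts V_1 by beta * eps
V_new <- V; V_new[1] <- V_new[1] + beta * eps
P_new <- exp(V_new) / sum(exp(V_new))
finite_diff <- (P_new[1] - P[1]) / eps
c(analytic = analytic, finite_diff = finite_diff)  # the two agree closely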

Individual-Specific Marginal Effect

The individual-specific marginal effect is less straightforward: this is the marginal effect on probabilities of socio-demographic variables that vary across individuals but not across alternatives. I do not know a closed-form expression for it, but we can obtain the marginal effects numerically: calculate the probabilities, adjust the variable, and calculate the probabilities again.

Suppose we are interested in how increasing income by 1 affects the probabilities on choosing different heating systems.

  1. First, we calculate the mean oc, ic and income, which represent a typical person.
  2. We then calculate the probabilities of this person choosing each type of heating system.
  3. We increase income by 1, then recalculate the probabilities.
  4. The differences between the two sets of probabilities are the marginal effects.
In [218]:
df_heating <- data.table(H)
df_heating[, alt := factor(alt, levels=c("gc", "ec", "er", "gr", "hp"))]
df_mean <- df_heating[, .(ic = mean(ic), oc = mean(oc), income = mean(income)), by=.(alt)][order(alt)]
df_mean
   alt        ic       oc   income
1:  gc  776.8266 172.1158 4.641111
2:  ec  824.5435 476.8030 4.641111
3:  er  983.9280 429.7299 4.641111
4:  gr  921.7702 154.4714 4.641111
5:  hp 1046.4813 219.2993 4.641111

df_mean gives the mean values of ic, oc and income for each alternative, representing the typical person. We then predict probabilities for this "person" before and after adjusting income, and take the differences as the marginal effects.

In [219]:
test_data <- df_mean
before_prob <- predict(model3, newdata=test_data)
test_data$income <- test_data$income + 1 
after_prob <- predict(model3, newdata=test_data)
df = data.frame(before = before_prob, after = after_prob)
df$marginal = df$after - df$before
df
       before      after     marginal
gc 0.64593790 0.65385888  0.007920977
ec 0.06762509 0.06901523  0.001390142
er 0.08946353 0.08831858 -0.001144948
gr 0.14202993 0.12905047 -0.012979461
hp 0.05494355 0.05975684  0.004813289

We can compare the marginal column above to what the effects function of mlogit provides - they are very similar, with slight discrepancies. The discrepancy presumably arises because effects() evaluates the analytic derivative at a point, whereas our calculation uses a discrete one-unit change in income (a finite difference).

In [220]:
effects(model3, covariate="income")
          gc           ec           er           gr           hp 
 0.008454663  0.001436963 -0.001071749 -0.013483385  0.004663508 
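To check this explanation, we can shrink the step size: if the finite-difference interpretation is right, a small step h should give values close to the effects() output (a sketch reusing df_mean and model3 from above):

h <- 1e-4
test_data <- copy(df_mean)
p0 <- predict(model3, newdata = test_data)
test_data[, income := income + h]
p1 <- predict(model3, newdata = test_data)
(p1 - p0) / h  # should be close to the effects() values above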

Another way to view marginal effects is to plot the predicted probabilities for the typical household over a range of income values. We can see that richer households favour central heating over room heating.

  • The effect is clearest for gas central (share increases with income) and gas room (share decreases with income).
  • The same pattern holds for electric central vs electric room, but it is much milder.
In [221]:
plot_income_effects <- function(model, df_mean, income_values) {
    temp = copy(df_mean)
    df = data.frame()
    for (current_income in income_values) {
        # Set income to the current value for every alternative, then predict
        temp[, income := current_income]
        probability = predict(model, newdata=temp)
        temp_df = as.data.frame(probability)
        temp_df$income = current_income
        temp_df$alt = row.names(temp_df)
        df = bind_rows(df, temp_df)
    }
    ggplot(df, aes(x=income, y=probability, col=alt)) + geom_line() + theme_minimal() +
        scale_y_continuous(limits=c(0,1), labels=percent_format()) + ggtitle("Predicted probabilities by income") +
        scale_color_brewer(palette=2, type="qual")
}
plot_income_effects(model3, df_mean, seq(2, 7, 0.5))