Download the notebook here
Ordered Logit Example¶
Let’s suppose, completely hypothetically, that we are not a big fan of Stata or simply want to learn the mechanics behind an ordered logit model by coding it up ourselves.
In this notebook we show you how estimagic can help you to implement such a model very easily. Implementing a logit model consists of four basic steps:
Processing the user input into inputs for the likelihood function
Writing the likelihood function of an ordered logit model
Maximizing the likelihood function
Calculating standard errors
The first two have to be done by the user, the last two are done by estimagic.
To be very clear: Estimagic is not a package to estimate logit models or other models that are implemented in Stata, statsmodels or anywhere else. Its purpose is to estimate parameters with custom likelihood or method of simulated moments functions. We just use an orederd logit model as an example of a very simple likelihood function.
The example we will use to test our model is taken from the Stata Documentation.
[1]:
import numpy as np
import pandas as pd
from patsy import dmatrices
from scipy import stats
from estimagic import maximize
from estimagic.inference.likelihood_inference import do_likelihood_inference
Process the user input¶
First we have to take the formula and dataset, extract all relevant information about the model and construct the inputs for the likelihood function.
We will need four inputs:
A DataFrame with start parameters for the optimization.
An array with the dependent variable.
A 2d array with explanatory variables.
Constraints for the optimization that keep the cutoffs increasing.
[2]:
def ordered_logit_processing(formula, data):
"""Process user input for an ordered logit model."""
# extract data arrays
y, x = dmatrices(formula + " - 1", data, return_type="dataframe")
y = y[y.columns[0]]
# extract dimensions
num_choices = len(y.unique())
beta_names = list(x.columns)
num_betas = len(beta_names)
num_cutoffs = num_choices - 1
# set-up index for params_df
names = beta_names + list(range(num_cutoffs))
categories = ["beta"] * num_betas + ["cutoff"] * num_cutoffs
index = pd.MultiIndex.from_tuples(zip(categories, names), names=["type", "name"])
# make params_df
np.random.seed(5471)
start_params = pd.DataFrame(index=index)
start_params["value"] = np.hstack(
[
np.random.uniform(low=-0.5, high=0.5, size=len(x.columns)),
np.arange(num_cutoffs) * 2,
]
)
start_params["group"] = start_params.index.get_level_values("type")
# make constraints
constr = [{"loc": "cutoff", "type": "increasing"}]
# turn pandas objects into numpy arrays
y_arr = y.to_numpy().astype(int)
x_arr = x.to_numpy()
return start_params, y_arr, x_arr, constr
Calculate the Likelihood¶
Next, we want to evaluate the likelihood function, given parameters and data. There are more efficient ways of calculating the likelihood for an ordered logit but this one was chosen for brevity and readability.
[3]:
def ordered_logit_loglike(params, y, x):
"""Likelihood function of an orderd logit model."""
# parse the parameter vector into its quantities
beta = params.loc["beta", "value"].to_numpy()
cutoffs = params.loc["cutoff", "value"].to_numpy()
# calculate deterministic part of utilities
xb = x.dot(beta)
# evaluate likelihood
upper_cutoffs = np.hstack([cutoffs, np.inf])[y]
lower_cutoffs = np.hstack([-np.inf, cutoffs])[y]
upper_cdf = stats.logistic.cdf(upper_cutoffs - xb)
lower_cdf = stats.logistic.cdf(lower_cutoffs - xb)
contributions = np.log(upper_cdf - lower_cdf)
res = {"contributions": contributions, "value": contributions.sum()}
return res
If you have never programmed an estimator before, you migt be surprised how much code is spent on processing compared to calculating the actual likelihood function. This will almost always be the case - at least if you try to make your estimator flexible and user friendly. Estimagic is there to shorten this type of code as much as possible.
Another peculiarity you might notice is that the likelihood function does not just return a scalar value, but also the likelihood contributions of each individual. This is because some optimizers (e.g. bhhh) can actually use the information on the contributions. Morover, you will need the contributions to calculate standard errors by the outer product of gradients.
All estimagic functions (whether for numerical differentiation, standard error calculation or optimization) will simply pick from the dictionary what they need!
Maximizing the likelihood¶
[4]:
data = pd.read_pickle("ologit.pickle")
formula = "apply ~ pared + public + gpa"
start_params, y, x, constraints = ordered_logit_processing(formula, data)
res = maximize(
criterion=ordered_logit_loglike,
params=start_params,
algorithm="scipy_lbfgsb",
constraints=constraints,
criterion_kwargs={"y": y, "x": x},
logging="ordered_logit.db",
)
[5]:
params = res["solution_params"]
params
[5]:
group | lower_bound | upper_bound | value | ||
---|---|---|---|---|---|
type | name | ||||
beta | pared | beta | -inf | inf | 1.047661 |
public | beta | -inf | inf | -0.058685 | |
gpa | beta | -inf | inf | 0.615744 | |
cutoff | 0 | cutoff | -inf | inf | 2.203318 |
1 | cutoff | -inf | inf | 4.298762 |
Calculate standard errors¶
[6]:
from estimagic.decorators import numpy_interface
numpy_interface(ordered_logit_loglike, params=params, constraints=constraints)
[6]:
<function __main__.ordered_logit_loglike(params, y, x)>
[7]:
inference, free_cov = do_likelihood_inference(
loglike=ordered_logit_loglike,
params=params,
loglike_kwargs={"x": x, "y": y},
n_samples=10_000,
constraints=constraints,
)
inference.round(3)
[7]:
value | standard_error | p_value | ci_lower | ci_upper | stars | ||
---|---|---|---|---|---|---|---|
type | name | ||||||
beta | pared | 1.048 | 0.276 | 0.000 | 0.507 | 1.589 | *** |
public | -0.059 | 0.269 | 0.811 | -0.587 | 0.469 | ||
gpa | 0.616 | 0.275 | 0.025 | 0.077 | 1.155 | ** | |
cutoff | 0 | 2.203 | 0.822 | 0.007 | 0.592 | 3.815 | *** |
1 | 4.299 | 0.846 | 0.000 | 2.641 | 5.956 | *** |
Compare to STATA’s results¶
[8]:
stata_results = pd.read_csv("stata_ologit_results.csv")
stata_results.round(3)
[8]:
name | stata_value | stata_standard_error | stata_p_value | stata_ci_lower | stata_ci_upper | |
---|---|---|---|---|---|---|
0 | pared | 1.048 | 0.266 | 0.000 | 0.527 | 1.569 |
1 | public | -0.059 | 0.298 | 0.844 | -0.642 | 0.525 |
2 | gpa | 0.616 | 0.261 | 0.018 | 0.105 | 1.127 |
3 | cut1 | 2.203 | 0.780 | NaN | 0.675 | 3.731 |
4 | cut2 | 4.299 | 0.804 | NaN | 2.722 | 5.875 |
This looks pretty good! The parameter estimates line up perfectly. I actually had to try three optimizers to get at least one differenet digit which makes the result more credible. Other optimizers hit it on all digits.
Note that standard error calculation, especially in combination with constraints is still considered experimental in estimagic.
Use the dashboard for monitoring the optimization¶
Often you may want to monitor an optimization to see how far the algorithm has moved away from the start values or see how the algorithm arrived at its solution after it has finished.
Both can be done using the estimagic dashboard.
To use the dashboard, we need to activate logging which we had deactivated up until now. To activate logging, simply supply a database path to ordered_logit
.
To start the dashboard, make sure you have the estimagic environment installed and activated.
Then all you need to do is navigate to the path’s directory in your command line, start the cell below and enter the following into your command line after the optimization has started:
estimagic dashboard {db_path}
This should open a page in your browser where you can press “Start Updating from Database” to start watching the optimization.
[9]:
db_path = "./logging.db"
res = maximize(
criterion=ordered_logit_loglike,
params=start_params,
algorithm="scipy_lbfgsb",
constraints=constraints,
criterion_kwargs={"y": y, "x": x},
logging=False,
)