
First maximum likelihood estimation with estimagic#

This notebook shows how to do a simple maximum likelihood (ml) estimation with estimagic. As an illustrative example, we implement a simple linear regression model. This is the same example model used in the method of moments notebook.

  1. Create a data generating process

  2. Set up a likelihood function

  3. Maximize the likelihood function

  4. Calculate standard errors, p-values, and confidence intervals

The user only needs to do steps 1 and 2. The rest is done by estimate_ml.

To be very clear: Estimagic is not a package to estimate linear models or other models that are implemented in Stata, statsmodels or anywhere else. Its purpose is to estimate parameters with custom likelihood or method of simulated moments functions. We just use a linear regression model as an example of a very simple likelihood function.

Model#

\[y = \beta_0 + \beta_1 x + \epsilon, \text{ where } \epsilon \sim N(0, \sigma^2)\]

We aim to estimate \(\beta_0, \beta_1, \sigma^2\).
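
The per-observation log likelihood implied by this model, which the loglike function below evaluates for each row of the data, is

\[\log f(y_i \mid x_i) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\]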

[1]:
import numpy as np
import pandas as pd
from patsy import dmatrices
from scipy import stats
from scipy.stats import norm

from estimagic import estimate_ml

np.random.seed(0)

Create a data generating process#

[2]:
def simulate_data(params, n_draws):
    x = np.random.normal(0, 1, size=n_draws)
    e = np.random.normal(0, params.loc["sd", "value"], size=n_draws)
    y = params.loc["intercept", "value"] + params.loc["slope", "value"] * x + e
    return pd.DataFrame({"y": y, "x": x})
[3]:
true_params = pd.DataFrame(
    data=[[2, -np.inf], [-1, -np.inf], [1, 1e-10]],
    columns=["value", "lower_bound"],
    index=["intercept", "slope", "sd"],
)
true_params
[3]:
value lower_bound
intercept 2 -inf
slope -1 -inf
sd 1 1.000000e-10
[4]:
data = simulate_data(true_params, n_draws=100)

Defining the loglike function#

[5]:
def normal_loglike(params, data):
    norm_rv = norm(
        loc=params.loc["intercept", "value"] + params.loc["slope", "value"] * data["x"],
        scale=params.loc["sd", "value"],
    )
    contributions = norm_rv.logpdf(data["y"])
    return {"contributions": contributions, "value": contributions.sum()}

A few remarks are in order:

  1. There are numerically better ways to calculate the likelihood; we chose this implementation for brevity and readability (see the sketch after this list).

  2. The loglike function takes params and other arguments. You are completely flexible in the number and names of the other arguments as long as the first argument is params.

  3. The loglike function returns a dictionary with the entries “contributions” and “value”. The “contributions” are the log likelihood evaluations of each individual in the dataset. The “value” is their sum. The “value” entry could be omitted; the “contributions” entry is mandatory.
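
As mentioned in remark 1, the frozen scipy distribution is convenient but not the most direct way to evaluate the log likelihood. A minimal sketch of an equivalent implementation that plugs the data into the normal log density formula using only numpy could look as follows; normal_loglike_direct is a hypothetical name used for illustration, not part of estimagic:

def normal_loglike_direct(params, data):
    # residuals of each observation under the candidate parameters
    residuals = (
        data["y"]
        - params.loc["intercept", "value"]
        - params.loc["slope", "value"] * data["x"]
    ).to_numpy()
    sd = params.loc["sd", "value"]
    # normal log density evaluated observation by observation
    contributions = -0.5 * np.log(2 * np.pi * sd**2) - residuals**2 / (2 * sd**2)
    return {"contributions": contributions, "value": contributions.sum()}

It has the same signature and return format as normal_loglike and could be passed to estimate_ml in exactly the same way.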

Estimating the model#

[6]:
start_params = true_params.assign(value=[100, 100, 100])

res = estimate_ml(
    loglike=normal_loglike,
    params=start_params,
    optimize_options={"algorithm": "scipy_lbfgsb"},
    loglike_kwargs={"data": data},
)
[7]:
res["summary_jacobian"].round(3)
[7]:
value standard_error p_value ci_lower ci_upper stars
intercept 2.075 0.105 0.0 1.868 2.282 ***
slope -0.885 0.111 0.0 -1.103 -0.667 ***
sd 1.028 0.095 0.0 0.843 1.214 ***

What’s in the results?#

The result of estimate_ml is a dictionary with the following entries:

[8]:
res.keys()
[8]:
dict_keys(['summary_jacobian', 'cov_jacobian', 'jacobian', 'hessian', 'optimize_res', 'jacobian_numdiff_info'])

Importantly, we might have several summaries and several cov entries. This is because we always calculate all possible types of standard errors, so you can compare them easily (for example using our estimation table functions).
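
In this example, the covariance matrix behind the summary table above is stored under the matching cov key and can be inspected directly:

# jacobian-based covariance matrix of the parameter estimates
res["cov_jacobian"]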

In the current example we only have jacobian-based standard errors because we did not provide a closed-form hessian function and numerical hessians are not yet implemented. With a closed-form hessian we would also get hessian-based and robust standard errors. Even cluster- and strata-robust standard errors are possible if you provide the relevant information.

If numerical optimizations or derivative calculations were performed, the full output of those steps is also part of the results dictionary.
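
For this example, those outputs sit behind two of the keys shown above:

# output of the numerical optimization and of the numerical jacobian calculation
res["optimize_res"]
res["jacobian_numdiff_info"]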