How to generate publication quality tables

Estimagic can create publication-quality tables of parameter estimates in LaTeX or HTML. It works with the results from estimate_ml and estimate_msm and also supports statsmodels results out of the box.

You get almost limitless flexibility if you split table generation into two steps. The first step generates a DataFrame that you can customize to your liking; the second renders that DataFrame as LaTeX or HTML. If you are interested in this feature, search for “render_inputs” below.

# Make necessary imports
import estimagic as em
import pandas as pd
import statsmodels.formula.api as sm
from estimagic.config import EXAMPLE_DIR
from IPython.display import HTML

Create tables from statsmodels results

df = pd.read_csv(EXAMPLE_DIR / "diabetes.csv", index_col=0)
mod1 = sm.ols("target ~ Age + Sex", data=df).fit()
mod2 = sm.ols("target ~ Age + Sex + BMI + ABP", data=df).fit()
models = [mod1, mod2]
HTML(em.estimation_table(models, return_type="html"))
                     target
                     (1)               (2)
Intercept            152.00$^{*** }$   152.00$^{*** }$
                     (3.61)            (2.85)
Age                  301.00$^{*** }$   37.20$^{ }$
                     (77.10)           (64.10)
Sex                  17.40$^{ }$       -107.00$^{* }$
                     (77.10)           (62.10)
BMI                                    787.00$^{*** }$
                                       (65.40)
ABP                                    417.00$^{*** }$
                                       (69.50)
Observations         442               442
R$^2$                0.04              0.40
Adj. R$^2$           0.03              0.40
Residual Std. Error  75.90             60
F Statistic          8.06$^{***}$      72.90$^{***}$
Note:***p<0.01; **p<0.05; *p<0.1

Adding estimagic results

estimate_ml and estimate_msm can both generate summaries of estimation results. Those summaries are either DataFrames with the columns "value", "standard_error", "p_value" and "stars" or pytrees containing such DataFrames.

For examples, check out our tutorials on estimate_ml and estimate_msm.
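The tables in this guide mark coefficients with stars according to the thresholds printed in the table note (***p<0.01; **p<0.05; *p<0.1). As a rough sketch of how a "stars" column relates to "p_value", here is a hypothetical helper `add_stars` that assumes exactly these thresholds. It is an illustration, not estimagic's own implementation, and the p-values are made up:

```python
import pandas as pd


def add_stars(summary: pd.DataFrame, thresholds=(0.01, 0.05, 0.1)) -> pd.DataFrame:
    """Return a copy of `summary` with a "stars" column derived from "p_value".

    Hypothetical helper: one star for every threshold the p-value falls below,
    mirroring the table note ***p<0.01; **p<0.05; *p<0.1.
    """
    out = summary.copy()
    out["stars"] = ["*" * sum(p < t for t in thresholds) for p in out["p_value"]]
    return out


# Illustrative values only.
summary = pd.DataFrame(
    {"value": [1.5, -0.3, 0.8], "p_value": [0.001, 0.03, 0.2]},
    index=["a", "b", "c"],
)
print(add_stars(summary)["stars"].tolist())  # ['***', '**', '']
```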

Assume we got the following DataFrame from an estimation summary:

params = pd.DataFrame(
    {
        "value": [142.123, 51.456, -33.789],
        "standard_error": [3.1415, 2.71828, 1.6180],
        "p_value": [1e-8] * 3,
    },
    index=["Intercept", "Age", "Sex"],
)
params
             value  standard_error       p_value
Intercept  142.123         3.14150  1.000000e-08
Age         51.456         2.71828  1.000000e-08
Sex        -33.789         1.61800  1.000000e-08

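In estimation_table, you can use either the params DataFrame on its own or a dictionary that contains it under the key “params” together with additional information, such as a model "name" and an "info" dictionary with the number of observations ("n_obs").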

mod3 = {"params": params, "name": "target", "info": {"n_obs": 445}}
models = [mod1, mod2, mod3]
HTML(em.estimation_table(models, return_type="html"))
                     target
                     (1)               (2)               (3)
Intercept            152.00$^{*** }$   152.00$^{*** }$   142.00$^{*** }$
                     (3.61)            (2.85)            (3.14)
Age                  301.00$^{*** }$   37.20$^{ }$       51.50$^{*** }$
                     (77.10)           (64.10)           (2.72)
Sex                  17.40$^{ }$       -107.00$^{* }$    -33.80$^{*** }$
                     (77.10)           (62.10)           (1.62)
BMI                                    787.00$^{*** }$
                                       (65.40)
ABP                                    417.00$^{*** }$
                                       (69.50)
Observations         442               442               445
R$^2$                0.04              0.40
Adj. R$^2$           0.03              0.40
Residual Std. Error  75.90             60
F Statistic          8.06$^{***}$      72.90$^{***}$
Note:***p<0.01; **p<0.05; *p<0.1

Selecting the right return_type

The following return types are supported:

  • "latex": Returns a string that you can save and import into a LaTeX document.

  • "html": Returns a string that you can save and import into an HTML document.

  • "render_inputs": Returns a dictionary with the following entries:

    • "body": A DataFrame containing the main table

    • "footer": A DataFrame containing the statistics

    • other entries that you can ignore

  • "dataframe": Returns a DataFrame you can look at in a notebook
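Because the "latex" and "html" return types are plain strings, saving them is ordinary file I/O. A minimal sketch with pathlib, where `table_string` is a placeholder standing in for the output of estimation_table(..., return_type="latex") and the file name is arbitrary:

```python
import tempfile
from pathlib import Path

# Placeholder for the string returned by
# em.estimation_table(models, return_type="latex").
table_string = "\\begin{tabular}{lSSS}\n% ... table rows ...\n\\end{tabular}\n"

# Write the table to its own .tex file (a temporary directory keeps this demo tidy).
out_dir = Path(tempfile.mkdtemp())
out_file = out_dir / "regression_table.tex"
out_file.write_text(table_string, encoding="utf-8")

# In the LaTeX document, the saved file can then be included with
# \input{regression_table.tex}
print(out_file.read_text(encoding="utf-8") == table_string)  # True
```

The same approach works for the "html" return type with an .html file.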

Use render_inputs for maximum flexibility

As an example, let’s assume we want to remove a few rows from the footer.

Let’s first look at the footer we get from estimation_table:

render_inputs = em.estimation_table(models, return_type="render_inputs")
footer = render_inputs["footer"]
footer
                     target
                     (1)               (2)               (3)
Observations         442               442               445
R$^2$                0.04              0.40
Adj. R$^2$           0.03              0.40
Residual Std. Error  75.90             60
F Statistic          8.06$^{***}$      72.90$^{***}$

Now we can keep only the rows we need and render the table to HTML. The order of the labels in the list also determines the order of the footer rows.

render_inputs["footer"] = footer.loc[["R$^2$", "Observations"]]
HTML(em.render_html(**render_inputs))
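The row selection above is standard pandas behavior: a list of labels passed to .loc keeps only those rows, in the order given, which is why R$^2$ now comes before Observations. A sketch on a hand-built stand-in for the footer; its plain row index and string cells are simplifying assumptions, since the real object comes from estimation_table:

```python
import pandas as pd

# Hand-built stand-in for render_inputs["footer"] (an assumption: the real
# object is produced by estimation_table and may differ in index type and dtypes).
columns = pd.MultiIndex.from_tuples(
    [("target", "(1)"), ("target", "(2)"), ("target", "(3)")]
)
footer = pd.DataFrame(
    [
        ["442", "442", "445"],
        ["0.04", "0.40", ""],
        ["0.03", "0.40", ""],
        ["75.90", "60", ""],
        ["8.06$^{***}$", "72.90$^{***}$", ""],
    ],
    index=["Observations", "R$^2$", "Adj. R$^2$", "Residual Std. Error", "F Statistic"],
    columns=columns,
)

# A list passed to .loc both filters and reorders the rows.
trimmed = footer.loc[["R$^2$", "Observations"]]
print(trimmed.index.tolist())  # ['R$^2$', 'Observations']
```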
                     target
                     (1)               (2)               (3)
Intercept            152.00$^{*** }$   152.00$^{*** }$   142.00$^{*** }$
                     (3.61)            (2.85)            (3.14)
Age                  301.00$^{*** }$   37.20$^{ }$       51.50$^{*** }$
                     (77.10)           (64.10)           (2.72)
Sex                  17.40$^{ }$       -107.00$^{* }$    -33.80$^{*** }$
                     (77.10)           (62.10)           (1.62)
BMI                                    787.00$^{*** }$
                                       (65.40)
ABP                                    417.00$^{*** }$
                                       (69.50)
R$^2$                0.04              0.40
Observations         442               442               445
Note:***p<0.01; **p<0.05; *p<0.1

Using this two-step procedure, we can also easily add rows to the footer.

Note that we add the row using .loc[("Statsmodels",)] since the index of render_inputs["footer"] is a MultiIndex.

render_inputs["footer"].loc[("Statsmodels",)] = ["Yes"] * 2 + ["No"]
HTML(em.render_html(**render_inputs))
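Adding a footer row relies on pandas "setting with enlargement": assigning to a label that does not exist yet through .loc appends a new row, and the assigned list needs one entry per model column. A sketch on a hand-built stand-in footer; because it uses a plain row index, the key is a plain string rather than the tuple needed for the MultiIndex of the real footer:

```python
import pandas as pd

# Hand-built stand-in for the trimmed footer (plain row index is an assumption).
columns = pd.MultiIndex.from_tuples(
    [("target", "(1)"), ("target", "(2)"), ("target", "(3)")]
)
footer = pd.DataFrame(
    [["0.04", "0.40", ""], ["442", "442", "445"]],
    index=["R$^2$", "Observations"],
    columns=columns,
)

# Assigning to a missing label appends a row; one value per model column.
footer.loc["Statsmodels"] = ["Yes"] * 2 + ["No"]
print(footer.index.tolist())  # ['R$^2$', 'Observations', 'Statsmodels']
```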
                     target
                     (1)               (2)               (3)
Intercept            152.00$^{*** }$   152.00$^{*** }$   142.00$^{*** }$
                     (3.61)            (2.85)            (3.14)
Age                  301.00$^{*** }$   37.20$^{ }$       51.50$^{*** }$
                     (77.10)           (64.10)           (2.72)
Sex                  17.40$^{ }$       -107.00$^{* }$    -33.80$^{*** }$
                     (77.10)           (62.10)           (1.62)
BMI                                    787.00$^{*** }$
                                       (65.40)
ABP                                    417.00$^{*** }$
                                       (69.50)
R$^2$                0.04              0.40
Observations         442               442               445
Statsmodels          Yes               Yes               No
Note:***p<0.01; **p<0.05; *p<0.1

Advanced options

Below is an example that demonstrates how to use advanced options to customize your table.

stats_dict = {
    "n_obs": "Observations",
    "rsquared": "R$^2$",
    "rsquared_adj": "Adj. R$^2$",
    "resid_std_err": "Residual Std. Error",
    "fvalue": "F Statistic",
    "show_dof": True,
}
HTML(
    em.estimation_table(
        models=models,
        return_type="html",
        custom_param_names={"Intercept": "Constant", "Sex": "Gender"},
        custom_col_names=["Model 1", "Model 2", "Model 3"],
        custom_col_groups={"target": "Dependent variable: target"},
        render_options={"caption": "Table Title"},
        stats_options=stats_dict,
        number_format="{0:.3f}",
    )
)
Table Title
                     Dependent variable: target
                     Model 1                  Model 2                  Model 3
Constant             152.133$^{*** }$         152.133$^{*** }$         142.123$^{*** }$
                     (3.610)                  (2.853)                  (3.142)
Age                  301.161$^{*** }$         37.241$^{ }$             51.456$^{*** }$
                     (77.060)                 (64.117)                 (2.718)
Gender               17.392$^{ }$             -106.578$^{* }$          -33.789$^{*** }$
                     (77.060)                 (62.125)                 (1.618)
BMI                                           787.179$^{*** }$
                                              (65.424)
ABP                                           416.674$^{*** }$
                                              (69.495)
Observations         442                      442                      445
R$^2$                0.035                    0.400
Adj. R$^2$           0.031                    0.395
Residual Std. Error  75.888(df=439)           59.976(df=437)
F Statistic          8.059$^{***}$(df=2;439)  72.913$^{***}$(df=4;437)
Note:***p<0.01; **p<0.05; *p<0.1
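number_format takes a standard Python format string, so "{0:.3f}" always prints three decimals. One way to preview a candidate format before building the table is to apply it to a few coefficients from the output above. The "{0:.3g}" variant (three significant digits) is shown only for comparison; this sketch does not claim it is estimagic's default:

```python
# Coefficients of model 2, taken from the table above.
coefficients = [152.133, 37.241, -106.578, 787.179]

# Fixed number of decimals, as in number_format="{0:.3f}".
fixed = ["{0:.3f}".format(x) for x in coefficients]
# Three significant digits, for comparison.
significant = ["{0:.3g}".format(x) for x in coefficients]

print(fixed)        # ['152.133', '37.241', '-106.578', '787.179']
print(significant)  # ['152', '37.2', '-107', '787']
```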

Note 1: You can also pass a dictionary to custom_col_names to rename only specific columns, e.g. custom_col_names={"(1)": "Model 1"}, leaving the other column names at their defaults.

Note 2: In addition to renaming the default column groups by passing a dictionary for custom_col_groups, you can also pass a list to create custom column groups, e.g. custom_col_groups=["target", "target", "not target"] will group the first two columns under the name "target", and the last column under the name "not target".
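A list passed to custom_col_groups assigns one group name per column, and adjacent columns that share a name end up under one spanning header. The resulting two-row header has the same shape as a two-level pandas MultiIndex; the sketch below builds that structure by hand purely for illustration and does not call estimagic:

```python
import pandas as pd

col_groups = ["target", "target", "not target"]
col_names = ["(1)", "(2)", "(3)"]

# Two header rows: the group name on top, the model name below.
header = pd.MultiIndex.from_arrays([col_groups, col_names])

# Count how many adjacent columns each group header spans.
spans = []
for group in col_groups:
    if spans and spans[-1][0] == group:
        spans[-1][1] += 1
    else:
        spans.append([group, 1])

print(header.tolist())  # [('target', '(1)'), ('target', '(2)'), ('not target', '(3)')]
print(spans)            # [['target', 2], ['not target', 1]]
```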

LaTeX peculiarities

By default, render_latex structures tables to comply with the siunitx package: the default rendering options defined internally set the numeric columns to the S column type. To get nicely formatted tables, add the following to your LaTeX preamble:

\usepackage{siunitx}
\sisetup{
    input-symbols         = (),
    table-align-text-post = false,
    group-digits          = false,
}

The first line in \sisetup is necessary if your table cells contain parentheses (e.g. when displaying standard errors or confidence intervals); otherwise LaTeX will raise an error.

The second line is necessary so that there is no spacing between the significance stars and the numerical values.

The third line prevents siunitx from splitting the digits of long numbers into groups of three, which it does by default. This line is optional, but recommended.
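To check this setup quickly, it can help to compile a saved table inside a minimal standalone document. The sketch below assembles such a document as a string; the helper name `standalone_document` is hypothetical, and booktabs is included because the generated tables use \toprule, \midrule and \cmidrule:

```python
SIUNITX_PREAMBLE = r"""\usepackage{booktabs}
\usepackage{siunitx}
\sisetup{
    input-symbols         = (),
    table-align-text-post = false,
    group-digits          = false,
}"""


def standalone_document(tabular: str) -> str:
    """Wrap a tabular environment in a compilable article document."""
    return "\n".join(
        [
            r"\documentclass{article}",
            SIUNITX_PREAMBLE,
            r"\begin{document}",
            tabular.strip(),
            r"\end{document}",
            "",
        ]
    )


# A tiny placeholder table; in practice, pass the string from render_latex.
doc = standalone_document(
    "\\begin{tabular}{lS}\n\\toprule\nA & 1.5 \\\\\n\\bottomrule\n\\end{tabular}"
)
print(doc.splitlines()[0])  # \documentclass{article}
```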

By default, render_latex raises a warning as a reminder of these preamble settings. To silence it, set siunitx_warning=False in the relevant function call (estimation_table with return_type="latex", or render_latex).

If you don’t want to generate siunitx-style tables, pass render_options={"column_format": <desired formats>} to your function calls.

More generally, you can influence the format of the output table with keyword arguments passed via render_options. For the list of supported keyword arguments, see the documentation of pandas.io.formats.style.Styler.to_latex.

By default, siunitx centers S columns around the decimal point. If a column contains a number with comparatively many characters after the decimal point (e.g. a number in scientific notation), there will be extra spacing between that column and the preceding one, because as much space is reserved before the decimal point as after it.

You can adjust the spacing between columns by using the format S[table-format=x.y] for the numeric columns, where x and y control the space reserved before and after the decimal point, respectively. For numbers in scientific notation, use S[table-format=x.yez], where x and y play the same role for the mantissa and z reserves space for the digits of the exponent. Below we show a case with the described problem and its solution.
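Choosing x, y and z amounts to counting digits in the widest entries of a column. A rough sketch, following the reading of siunitx's table-format used above (x integer digits, y decimal digits, z exponent digits) and assuming the cells have already been stripped of significance stars, parentheses and signs; the helper `table_format_for` is hypothetical:

```python
def table_format_for(cells):
    """Suggest a siunitx table-format string ("x.y" or "x.yez") for a column.

    Assumes plain numeric strings without signs, stars or parentheses.
    """
    int_digits = dec_digits = exp_digits = 0
    for cell in cells:
        # Split "1.43e08" into mantissa "1.43" and exponent "08".
        mantissa, _, exponent = cell.lower().partition("e")
        whole, _, fraction = mantissa.partition(".")
        int_digits = max(int_digits, len(whole))
        dec_digits = max(dec_digits, len(fraction))
        exp_digits = max(exp_digits, len(exponent.lstrip("+-")))
    spec = f"{int_digits}.{dec_digits}"
    if exp_digits:
        spec += f"e{exp_digits}"
    return spec


print(table_format_for(["152.00", "37.20", "17.40"]))   # 3.2
print(table_format_for(["1.43e08", "51.50", "33.80"]))  # 2.2e2
```

The suggestion is a lower bound; reserving a little more space, as in the 3.2e4 example below, is harmless.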

Compiling the following LaTeX table will result in extra spacing between columns (2) and (3):

\begin{tabular}{lSSS}
  \toprule
  & \multicolumn{3}{c}{target} \\
  \cmidrule(lr){2-4}

               & (1)                     & (2)                     & (3)                     \\
  \midrule
  Intercept    & 152.00$^{*** }$         & 152.00$^{*** }$         & 1.43e08$^{*** }$        \\
               & (3.61)                  & (2.85)                  & (3.14)                  \\
  Age          & 301.00$^{*** }$         & 37.20$^{ }$             & 51.50$^{*** }$          \\
               & (77.10)                 & (64.10)                 & (2.72)                  \\
  Sex          & 17.40$^{ }$             & -107.00$^{* }$          & -33.80$^{*** }$         \\
               & (77.10)                 & (62.10)                 & (1.62)                  \\
  BMI          &                         & 787.00$^{*** }$         &                         \\
               &                         & (65.40)                 &                         \\
  ABP          &                         & 417.00$^{*** }$         &                         \\
               &                         & (69.50)                 &                         \\
  \midrule
  R$^2$        & 0.04                    & 0.40                    &                         \\
  Observations & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{445} \\
  \midrule
  \textit{Note:} & \multicolumn{3}{r}{$^{***}$p$<$0.01;$^{**}$p$<$0.05;$^{*}$p$<$0.1} \\
  \bottomrule
\end{tabular}

We can get a nicer output by setting the format of the last column to, for example, S[table-format=3.2e4], which we do by passing render_options={'column_format':'lSSS[table-format = 3.2e4]'}. The table produced by render_latex then looks like this:

\begin{tabular}{lSSS[table-format = 3.2e4]}
  \toprule
  & \multicolumn{3}{c}{target} \\
  \cmidrule(lr){2-4}

               & (1)                     & (2)                     & (3)                     \\
  \midrule
  Intercept    & 152.00$^{*** }$         & 152.00$^{*** }$         & 1.43e08$^{*** }$        \\
               & (3.61)                  & (2.85)                  & (3.14)                  \\
  Age          & 301.00$^{*** }$         & 37.20$^{ }$             & 51.50$^{*** }$          \\
               & (77.10)                 & (64.10)                 & (2.72)                  \\
  Sex          & 17.40$^{ }$             & -107.00$^{* }$          & -33.80$^{*** }$         \\
               & (77.10)                 & (62.10)                 & (1.62)                  \\
  BMI          &                         & 787.00$^{*** }$         &                         \\
               &                         & (65.40)                 &                         \\
  ABP          &                         & 417.00$^{*** }$         &                         \\
               &                         & (69.50)                 &                         \\
  \midrule
  R$^2$        & 0.04                    & 0.40                    &                         \\
  Observations & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{445} \\
  \midrule
  \textit{Note:} & \multicolumn{3}{r}{$^{***}$p$<$0.01;$^{**}$p$<$0.05;$^{*}$p$<$0.1} \\
  \bottomrule
\end{tabular}
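The column_format string itself is one specifier per column: l for the row labels, then one entry per model. A small sketch that builds render_options for both the siunitx and the plain case; the helper `make_render_options` is hypothetical, while "column_format" is the render_options key used above:

```python
def make_render_options(n_models, siunitx=True, last_table_format=None):
    """Build a render_options dict with a column_format for the table."""
    numeric = ["S" if siunitx else "c"] * n_models
    if siunitx and last_table_format is not None:
        # Give only the last model column an explicit siunitx table-format.
        numeric[-1] = f"S[table-format = {last_table_format}]"
    return {"column_format": "l" + "".join(numeric)}


print(make_render_options(3))                 # {'column_format': 'lSSS'}
print(make_render_options(3, siunitx=False))  # {'column_format': 'lccc'}
print(make_render_options(3, last_table_format="3.2e4"))
# {'column_format': 'lSSS[table-format = 3.2e4]'}
```

The resulting dictionary can be passed as render_options to estimation_table or render_latex.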