How to generate publication quality tables#

Estimagic can create publication quality tables of parameter estimates in LaTeX or HTML. It works with the results from estimate_ml and estimate_msm but also supports statsmodels results out of the box.

You can get almost limitless flexibility if you split the table generation into two steps. The first generates a DataFrame that you can customize to your liking; the second renders that DataFrame in LaTeX or HTML. If you are interested in this feature, search for “render_inputs” below.

# Make necessary imports
import estimagic as em
import pandas as pd
import statsmodels.formula.api as sm
from estimagic.config import EXAMPLE_DIR
from IPython.core.display import HTML

Create tables from statsmodels results#

df = pd.read_csv(EXAMPLE_DIR / "diabetes.csv", index_col=0)
mod1 = sm.ols("target ~ Age + Sex", data=df).fit()
mod2 = sm.ols("target ~ Age + Sex + BMI + ABP", data=df).fit()
models = [mod1, mod2]

HTML(em.estimation_table(models, return_type="html"))

	target
	(1)	(2)
Intercept	152.00$^{*** }$	152.00$^{*** }$
	(3.61)	(2.85)
Age	301.00$^{*** }$	37.20$^{ }$
	(77.10)	(64.10)
Sex	17.40$^{ }$	-107.00$^{* }$
	(77.10)	(62.10)
BMI		787.00$^{*** }$
		(65.40)
ABP		417.00$^{*** }$
		(69.50)

Observations	442	442
R$^2$	0.04	0.40
Adj. R$^2$	0.03	0.40
Residual Std. Error	75.90	60
F Statistic	8.06$^{***}$	72.90$^{***}$

Note:	***p<0.01; **p<0.05; *p<0.1

Adding estimagic results#

estimate_ml and estimate_msm can both generate summaries of estimation results. Those summaries are either DataFrames with the columns "value", "standard_error", "p_value" and "stars" or pytrees containing such DataFrames.

For examples, check out our tutorials on estimate_ml and estimate_msm.

Assume we got the following DataFrame from an estimation summary:

params = pd.DataFrame(
    {
        "value": [142.123, 51.456, -33.789],
        "standard_error": [3.1415, 2.71828, 1.6180],
        "p_value": [1e-8] * 3,
    },
    index=["Intercept", "Age", "Sex"],
)
params

	value	standard_error	p_value
Intercept	142.123	3.14150	1.000000e-08
Age	51.456	2.71828	1.000000e-08
Sex	-33.789	1.61800	1.000000e-08
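The example DataFrame above has no "stars" column. As a rough illustration of what such a column contains, the sketch below derives stars from p-values using the thresholds stated in the note under the tables (p<0.01, p<0.05, p<0.1). The helper name `significance_stars` is hypothetical and not part of estimagic, whose summaries already include this column.

```python
import pandas as pd

# Thresholds taken from the note below the tables: ***p<0.01; **p<0.05; *p<0.1
LEVELS = (0.1, 0.05, 0.01)


def significance_stars(p_values: pd.Series) -> pd.Series:
    """Hypothetical helper: one star per significance level that p falls below."""
    n_stars = sum((p_values < level).astype(int) for level in LEVELS)
    return n_stars.map(lambda n: "*" * int(n))


p_values = pd.Series([1e-8, 0.03, 0.07, 0.5], index=["a", "b", "c", "d"])
stars = significance_stars(p_values)
print(stars.to_dict())
```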

In estimation_table, you can pass either just the params DataFrame or a dictionary containing “params” and additional information.

mod3 = {"params": params, "name": "target", "info": {"n_obs": 445}}
models = [mod1, mod2, mod3]

HTML(em.estimation_table(models, return_type="html"))

	target
	(1)	(2)	(3)
Intercept	152.00$^{*** }$	152.00$^{*** }$	142.00$^{*** }$
	(3.61)	(2.85)	(3.14)
Age	301.00$^{*** }$	37.20$^{ }$	51.50$^{*** }$
	(77.10)	(64.10)	(2.72)
Sex	17.40$^{ }$	-107.00$^{* }$	-33.80$^{*** }$
	(77.10)	(62.10)	(1.62)
BMI		787.00$^{*** }$
		(65.40)
ABP		417.00$^{*** }$
		(69.50)

Observations	442	442	445
R$^2$	0.04	0.40
Adj. R$^2$	0.03	0.40
Residual Std. Error	75.90	60
F Statistic	8.06$^{***}$	72.90$^{***}$

Note:	***p<0.01; **p<0.05; *p<0.1

Selecting the right return_type#

The following return types are supported:

"latex": Returns a string that you can save and import into a LaTeX document.
"html": Returns a string that you can save and import into an HTML document.
"render_inputs": Returns a dictionary with the following entries:
- "body": A DataFrame containing the main table
- "footer": A DataFrame containing the statistics
- further entries that the render functions need and that you can usually leave unchanged
"dataframe": Returns a DataFrame you can look at in a notebook.

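Since the "latex" and "html" return types are plain strings, saving them needs nothing beyond the standard library. The following is a minimal sketch; the helper `save_table` and the file path are hypothetical, and a short placeholder string stands in for the real output of estimation_table so that the snippet runs on its own.

```python
from pathlib import Path


def save_table(table: str, path) -> Path:
    """Write a rendered table string to disk and return the resulting path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(table, encoding="utf-8")
    return path


# In practice the string would come from estimagic, for example
#   table = em.estimation_table(models, return_type="latex")
# A short placeholder is used here instead.
table = "\\begin{tabular}{lS}\n\\end{tabular}\n"

# Example call (hypothetical path):
#   save_table(table, "tables/regression.tex")
```

A file saved this way can then be included in a LaTeX document with `\input{tables/regression.tex}`.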
Use `render_inputs` for maximum flexibility#

As an example, let’s assume we want to remove a few rows from the footer.

Let’s first look at the footer we get from estimation_table:

render_inputs = em.estimation_table(models, return_type="render_inputs")
footer = render_inputs["footer"]
footer

	target
	(1)	(2)	(3)
Observations	442	442	445
R$^2$	0.04	0.40
Adj. R$^2$	0.03	0.40
Residual Std. Error	75.90	60
F Statistic	8.06$^{***}$	72.90$^{***}$

Now we can remove the rows we don’t need and render the table as HTML.

render_inputs["footer"] = footer.loc[["R$^2$", "Observations"]]
HTML(em.render_html(**render_inputs))

	target
	(1)	(2)	(3)
Intercept	152.00$^{*** }$	152.00$^{*** }$	142.00$^{*** }$
	(3.61)	(2.85)	(3.14)
Age	301.00$^{*** }$	37.20$^{ }$	51.50$^{*** }$
	(77.10)	(64.10)	(2.72)
Sex	17.40$^{ }$	-107.00$^{* }$	-33.80$^{*** }$
	(77.10)	(62.10)	(1.62)
BMI		787.00$^{*** }$
		(65.40)
ABP		417.00$^{*** }$
		(69.50)

R$^2$	0.04	0.40
Observations	442	442	445

Note:	***p<0.01; **p<0.05; *p<0.1
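Selecting with a list of labels, as above, also reorders the rows to match the list. If you would rather keep the original row order, one option is a boolean mask on the first index level. The sketch below uses a hand-built stand-in for the footer (an assumption made so the snippet runs without estimagic); the real footer has more columns levels and rows.

```python
import pandas as pd

# Hypothetical stand-in for render_inputs["footer"], with a one-level MultiIndex.
footer = pd.DataFrame(
    [["442", "442", "445"], ["0.04", "0.40", ""], ["0.03", "0.40", ""]],
    index=pd.MultiIndex.from_arrays([["Observations", "R$^2$", "Adj. R$^2$"]]),
    columns=["(1)", "(2)", "(3)"],
)

keep = ["R$^2$", "Observations"]

# Compare the first index level against the labels to keep.
# The mask preserves the footer's own row order rather than the order of `keep`.
mask = footer.index.get_level_values(0).isin(keep)
trimmed = footer.loc[mask]

print(list(trimmed.index.get_level_values(0)))
```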

Using this two-step procedure, we can also easily add rows to the footer.

Note that we add the row using .loc[("Statsmodels", )] since the index of render_inputs["footer"] is a MultiIndex.

render_inputs["footer"].loc[("Statsmodels",)] = ["Yes"] * 2 + ["No"]
HTML(em.render_html(**render_inputs))

	target
	(1)	(2)	(3)
Intercept	152.00$^{*** }$	152.00$^{*** }$	142.00$^{*** }$
	(3.61)	(2.85)	(3.14)
Age	301.00$^{*** }$	37.20$^{ }$	51.50$^{*** }$
	(77.10)	(64.10)	(2.72)
Sex	17.40$^{ }$	-107.00$^{* }$	-33.80$^{*** }$
	(77.10)	(62.10)	(1.62)
BMI		787.00$^{*** }$
		(65.40)
ABP		417.00$^{*** }$
		(69.50)

R$^2$	0.04	0.40
Observations	442	442	445
Statsmodels	Yes	Yes	No

Note:	***p<0.01; **p<0.05; *p<0.1

Advanced options#

Below is an example that demonstrates how to use advanced options to customize your table.

stats_dict = {
    "n_obs": "Observations",
    "rsquared": "R$^2$",
    "rsquared_adj": "Adj. R$^2$",
    "resid_std_err": "Residual Std. Error",
    "fvalue": "F Statistic",
    "show_dof": True,
}

HTML(
    em.estimation_table(
        models=models,
        return_type="html",
        custom_param_names={"Intercept": "Constant", "Sex": "Gender"},
        custom_col_names=["Model 1", "Model 2", "Model 3"],
        custom_col_groups={"target": "Dependent variable: target"},
        render_options={"caption": "Table Latex(render_latex(**render_inputs))Title"},
        stats_options=stats_dict,
        number_format="{0:.3f}",
    )
)

Table Latex(render_latex(**render_inputs))Title
	Dependent variable: target
	Model 1	Model 2	Model 3
Constant	152.133$^{*** }$	152.133$^{*** }$	142.123$^{*** }$
	(3.610)	(2.853)	(3.142)
Age	301.161$^{*** }$	37.241$^{ }$	51.456$^{*** }$
	(77.060)	(64.117)	(2.718)
Gender	17.392$^{ }$	-106.578$^{* }$	-33.789$^{*** }$
	(77.060)	(62.125)	(1.618)
BMI		787.179$^{*** }$
		(65.424)
ABP		416.674$^{*** }$
		(69.495)

Observations	442	442	445
R$^2$	0.035	0.400
Adj. R$^2$	0.031	0.395
Residual Std. Error	75.888(df=439)	59.976(df=437)
F Statistic	8.059$^{***}$(df=2;439)	72.913$^{***}$(df=4;437)

Note:	***p<0.01; **p<0.05; *p<0.1
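The number_format value passed above is an ordinary Python format string. Assuming estimagic applies it to each float the way `str.format` does (an assumption about the internals, though the output above is consistent with it), you can preview a format before rendering a whole table:

```python
# Preview how a format string renders a few sample values.
number_format = "{0:.3f}"
values = [152.13348, -106.578, 0.4]

formatted = [number_format.format(value) for value in values]
print(formatted)
```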

Note 1: You can pass a dictionary for custom_col_names to rename specific columns, e.g. custom_col_names={"(1)": "Model 1"}, leaving names of the other columns at default values.

Note 2: In addition to renaming the default column groups by passing a dictionary for custom_col_groups, you can also pass a list to create custom column groups, e.g. custom_col_groups=["target", "target", "not target"] will group the first two columns under the name "target", and the last column under the name "not target".
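To make the two notes concrete, the sketch below mimics the described behaviour with plain Python and pandas. It is only an illustration of the semantics, not estimagic's implementation; the variable names are chosen for this example.

```python
import pandas as pd

default_names = ["(1)", "(2)", "(3)"]

# Note 1: a dictionary renames only the listed columns.
custom_col_names = {"(1)": "Model 1"}
names = [custom_col_names.get(name, name) for name in default_names]

# Note 2: a list assigns one group label per column, and
# adjacent columns with equal labels end up under a shared heading.
custom_col_groups = ["target", "target", "not target"]
header = pd.MultiIndex.from_arrays([custom_col_groups, names])

print(header.tolist())
```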

LaTeX peculiarities#

By default, tables produced by render_latex are structured to comply with the siunitx package. This is done by setting the column formats to S in the default rendering options defined internally. To get nicely formatted tables, you need to add the following to your LaTeX preamble:

\usepackage{siunitx}
\sisetup{
    input-symbols         = (),
    table-align-text-post = false,
    group-digits          = false,
}

The first line in \sisetup is necessary if you have parentheses in your table cells (e.g. when displaying standard errors or confidence intervals); otherwise LaTeX will raise an error.

The second line is necessary so that there is no spacing between the significance stars and the numerical values.

The third line prevents digits from being grouped in threes, which is the default behaviour. This line is optional but recommended.

By default, a warning about this is raised whenever render_latex is called. To silence the warning, set siunitx_warning=False in the relevant function calls (when calling estimation_table with return_type="latex" or when calling render_latex).

If you don’t want to generate siunitx style tables, you can pass render_options={"column_format":<desired formats>} to your function calls.

You can influence the format of the output table with keyword arguments passed via render_options. For the list of supported keyword arguments, see the documentation of pandas.io.formats.style.Styler.to_latex.

By default, siunitx centers table columns around the decimal point. As a result, if a column contains a number with comparatively many characters after the decimal point (e.g. a number in scientific notation), there will be extra space between that column and the preceding one, because siunitx reserves as much space before the decimal point as after it.

You can adjust the spacing between columns by using the format S[table-format=x.y] for the numeric columns, where x and y control the space reserved before and after the decimal point, respectively. For numbers in scientific notation, use S[table-format=x.yez], where y reserves space for the digits after the decimal point and z reserves space for the exponent. An example of the problem and its solution follows.

Compiling the following LaTeX table will result in extra spacing between columns (2) and (3):

\begin{tabular}{lSSS}
  \toprule
  & \multicolumn{3}{c}{target} \\
  \cmidrule(lr){2-4}

               & (1)                     & (2)                     & (3)                     \\
  \midrule
  Intercept    & 152.00$^{*** }$         & 152.00$^{*** }$         & 1.43e08$^{*** }$        \\
               & (3.61)                  & (2.85)                  & (3.14)                  \\
  Age          & 301.00$^{*** }$         & 37.20$^{ }$             & 51.50$^{*** }$          \\
               & (77.10)                 & (64.10)                 & (2.72)                  \\
  Sex          & 17.40$^{ }$             & -107.00$^{* }$          & -33.80$^{*** }$         \\
               & (77.10)                 & (62.10)                 & (1.62)                  \\
  BMI          &                         & 787.00$^{*** }$         &                         \\
               &                         & (65.40)                 &                         \\
  ABP          &                         & 417.00$^{*** }$         &                         \\
               &                         & (69.50)                 &                         \\
  \midrule
  R$^2$        & 0.04                    & 0.40                    &                         \\
  Observations & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{445} \\
  \midrule
  \textit{Note:} & \multicolumn{3}{r}{$^{***}$p$<$0.01;$^{**}$p$<$0.05;$^{*}$p$<$0.1} \\
  \bottomrule
\end{tabular}

We can get a nicer output by setting the format of the last column to, for example, S[table-format=3.2e4], by passing render_options={'column_format':'lSSS[table-format = 3.2e4]'}. The table produced by render_latex will then look like the following:

\begin{tabular}{lSSS[table-format = 3.2e4]}
  \toprule
  & \multicolumn{3}{c}{target} \\
  \cmidrule(lr){2-4}

               & (1)                     & (2)                     & (3)                     \\
  \midrule
  Intercept    & 152.00$^{*** }$         & 152.00$^{*** }$         & 1.43e08$^{*** }$        \\
               & (3.61)                  & (2.85)                  & (3.14)                  \\
  Age          & 301.00$^{*** }$         & 37.20$^{ }$             & 51.50$^{*** }$          \\
               & (77.10)                 & (64.10)                 & (2.72)                  \\
  Sex          & 17.40$^{ }$             & -107.00$^{* }$          & -33.80$^{*** }$         \\
               & (77.10)                 & (62.10)                 & (1.62)                  \\
  BMI          &                         & 787.00$^{*** }$         &                         \\
               &                         & (65.40)                 &                         \\
  ABP          &                         & 417.00$^{*** }$         &                         \\
               &                         & (69.50)                 &                         \\
  \midrule
  R$^2$        & 0.04                    & 0.40                    &                         \\
  Observations & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{445} \\
  \midrule
  \textit{Note:} & \multicolumn{3}{r}{$^{***}$p$<$0.01;$^{**}$p$<$0.05;$^{*}$p$<$0.1} \\
  \bottomrule
\end{tabular}
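For tables with many columns, typing the column_format string by hand is error-prone. The sketch below assembles it programmatically; the helper `siunitx_column_format` is hypothetical and not part of estimagic. It assumes one left-aligned label column followed by numeric S columns, as in the tables above.

```python
def siunitx_column_format(n_cols, table_formats=None):
    """Build a column_format string: one 'l' column plus n_cols S columns.

    table_formats maps a zero-based numeric column position to a siunitx
    table-format value such as "3.2e4". Other columns use a plain S.
    """
    table_formats = table_formats or {}
    columns = []
    for position in range(n_cols):
        if position in table_formats:
            columns.append(f"S[table-format = {table_formats[position]}]")
        else:
            columns.append("S")
    return "l" + "".join(columns)


# Reserve extra space only in the third numeric column.
column_format = siunitx_column_format(3, {2: "3.2e4"})
print(column_format)
```

The result can be passed as render_options={"column_format": column_format}.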
