{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# How to generate publication quality tables\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Estimagic can create publication quality tables of parameter estimates in LaTeX or HTML. It works with the results from `estimate_ml` and `estimate_msm` but also supports statsmodels results out of the box. \n", "\n", "You can get almost limitless flexibility if you split the table generation into two steps. The fist generates a DataFrame which you can customize to your liking, the second renders that DataFrame in LaTeX or HTML. If you are interested in this feature, search for \"render_inputs\" below." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# Make necessary imports\n", "import estimagic as em\n", "import pandas as pd\n", "import statsmodels.formula.api as sm\n", "from estimagic.config import EXAMPLE_DIR\n", "from IPython.core.display import HTML" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create tables from statsmodels results" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(EXAMPLE_DIR / \"diabetes.csv\", index_col=0)\n", "mod1 = sm.ols(\"target ~ Age + Sex\", data=df).fit()\n", "mod2 = sm.ols(\"target ~ Age + Sex + BMI + ABP\", data=df).fit()\n", "models = [mod1, mod2]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 target
 (1)(2)
Intercept152.00$^{*** }$152.00$^{*** }$
(3.61)(2.85)
Age301.00$^{*** }$37.20$^{ }$
(77.10)(64.10)
Sex17.40$^{ }$-107.00$^{* }$
(77.10)(62.10)
BMI787.00$^{*** }$
(65.40)
ABP417.00$^{*** }$
(69.50)
\n", "
Observations442442
R$^2$0.040.40
Adj. R$^2$0.030.40
Residual Std. Error75.9060
F Statistic8.06$^{***}$72.90$^{***}$
\n", "
Note:***p<0.01; **p<0.05; *p<0.1
" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "HTML(em.estimation_table(models, return_type=\"html\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding estimagic results\n", "\n", "`estimate_ml` and `estimate_msm` can both generate summaries of estimation results. Those summaries are either DataFrames with the columns `\"value\"`, `\"standard_error\"`, `\"p_value\"` and `\"stars\"` or pytrees containing such DataFrames. \n", "\n", "For examples, check out our tutorials on [`estimate_ml`](../../getting_started/first_likelihood_estimation_with_estimagic.ipynb) and [`estimate_msm`](../../getting_started/first_msm_estimation_with_estimagic.ipynb).\n", "\n", "\n", "Assume we got the following DataFrame from an estimation summary:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuestandard_errorp_value
Intercept142.1233.141501.000000e-08
Age51.4562.718281.000000e-08
Sex-33.7891.618001.000000e-08
\n", "
" ], "text/plain": [ " value standard_error p_value\n", "Intercept 142.123 3.14150 1.000000e-08\n", "Age 51.456 2.71828 1.000000e-08\n", "Sex -33.789 1.61800 1.000000e-08" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = pd.DataFrame(\n", " {\n", " \"value\": [142.123, 51.456, -33.789],\n", " \"standard_error\": [3.1415, 2.71828, 1.6180],\n", " \"p_value\": [1e-8] * 3,\n", " },\n", " index=[\"Intercept\", \"Age\", \"Sex\"],\n", ")\n", "params" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can either use just the params DataFrame or a dictionary containing \"params\" and additional information in `estimation_table`." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "mod3 = {\"params\": params, \"name\": \"target\", \"info\": {\"n_obs\": 445}}\n", "models = [mod1, mod2, mod3]" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 target
 (1)(2)(3)
Intercept152.00$^{*** }$152.00$^{*** }$142.00$^{*** }$
(3.61)(2.85)(3.14)
Age301.00$^{*** }$37.20$^{ }$51.50$^{*** }$
(77.10)(64.10)(2.72)
Sex17.40$^{ }$-107.00$^{* }$-33.80$^{*** }$
(77.10)(62.10)(1.62)
BMI787.00$^{*** }$
(65.40)
ABP417.00$^{*** }$
(69.50)
\n", "
Observations442442445
R$^2$0.040.40
Adj. R$^2$0.030.40
Residual Std. Error75.9060
F Statistic8.06$^{***}$72.90$^{***}$
\n", "
Note:***p<0.01; **p<0.05; *p<0.1
" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "HTML(em.estimation_table(models, return_type=\"html\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting the right return_type\n", "\n", "The following return types are supported:\n", "- `\"latex\"`: Returns a string that you can save and import into a LaTeX document.\n", "- `\"html\"`: Returns a string that you can save and import into an HTML document.\n", "- `\"render_inputs\"`: Returns a dictionary with the following entries:\n", "    - `\"body\"`: A DataFrame containing the main table\n", "    - `\"footer\"`: A DataFrame containing the statistics\n", "    - further entries that are passed on to the render functions; you usually don't need to change them\n", "- `\"dataframe\"`: Returns a DataFrame that you can inspect in a notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use `render_inputs` for maximum flexibility\n", "\n", "As an example, let's assume we want to remove a few rows from the footer.\n", "\n", "Let's first look at the footer we get from `estimation_table`:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
target
(1)(2)(3)
Observations442442445
R$^2$0.040.40
Adj. R$^2$0.030.40
Residual Std. Error75.9060
F Statistic8.06$^{***}$72.90$^{***}$
\n", "
" ], "text/plain": [ " target \n", " (1) (2) (3)\n", "Observations 442 442 445\n", "R$^2$ 0.04 0.40 \n", "Adj. R$^2$ 0.03 0.40 \n", "Residual Std. Error 75.90 60 \n", "F Statistic 8.06$^{***}$ 72.90$^{***}$ " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "render_inputs = em.estimation_table(models, return_type=\"render_inputs\")\n", "footer = render_inputs[\"footer\"]\n", "footer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can remove the rows we don't need and render it to html. " ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 target
 (1)(2)(3)
Intercept152.00$^{*** }$152.00$^{*** }$142.00$^{*** }$
(3.61)(2.85)(3.14)
Age301.00$^{*** }$37.20$^{ }$51.50$^{*** }$
(77.10)(64.10)(2.72)
Sex17.40$^{ }$-107.00$^{* }$-33.80$^{*** }$
(77.10)(62.10)(1.62)
BMI787.00$^{*** }$
(65.40)
ABP417.00$^{*** }$
(69.50)
\n", "
R$^2$0.040.40
Observations442442445
\n", "
Note:***p<0.01; **p<0.05; *p<0.1
" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "render_inputs[\"footer\"] = footer.loc[[\"R$^2$\", \"Observations\"]]\n", "HTML(em.render_html(**render_inputs))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Using this 2-step-procedure, we can also easily add additional rows to the footer.\n", "\n", "Note that we add the row using `.loc[(\"Statsmodels\", )]` since the index of `render_inputs[\"footer\"]` is a MultiIndex.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 target
 (1)(2)(3)
Intercept152.00$^{*** }$152.00$^{*** }$142.00$^{*** }$
(3.61)(2.85)(3.14)
Age301.00$^{*** }$37.20$^{ }$51.50$^{*** }$
(77.10)(64.10)(2.72)
Sex17.40$^{ }$-107.00$^{* }$-33.80$^{*** }$
(77.10)(62.10)(1.62)
BMI787.00$^{*** }$
(65.40)
ABP417.00$^{*** }$
(69.50)
\n", "
R$^2$0.040.40
Observations442442445
StatsmodelsYesYesNo
\n", "
Note:***p<0.01; **p<0.05; *p<0.1
" ], "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "render_inputs[\"footer\"].loc[(\"Statsmodels\",)] = [\"Yes\"] * 2 + [\"No\"]\n", "HTML(em.render_html(**render_inputs))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced options \n", "\n", "Below is an exmample that demonstrates how to use advanced options to customize your table." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "stats_dict = {\n", " \"n_obs\": \"Observations\",\n", " \"rsquared\": \"R$^2$\",\n", " \"rsquared_adj\": \"Adj. R$^2$\",\n", " \"resid_std_err\": \"Residual Std. Error\",\n", " \"fvalue\": \"F Statistic\",\n", " \"show_dof\": True,\n", "}" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Table Latex(render_latex(**render_inputs))Title
 Dependent variable: target
 Model 1Model 2Model 3
Constant152.133$^{*** }$152.133$^{*** }$142.123$^{*** }$
(3.610)(2.853)(3.142)
Age301.161$^{*** }$37.241$^{ }$51.456$^{*** }$
(77.060)(64.117)(2.718)
Gender17.392$^{ }$-106.578$^{* }$-33.789$^{*** }$
(77.060)(62.125)(1.618)
BMI787.179$^{*** }$
(65.424)
ABP416.674$^{*** }$
(69.495)
\n", "
Observations442442445
R$^2$0.0350.400
Adj. R$^2$0.0310.395
Residual Std. Error75.888(df=439)59.976(df=437)
F Statistic8.059$^{***}$(df=2;439)72.913$^{***}$(df=4;437)
\n", "
Note:***p<0.01; **p<0.05; *p<0.1
" ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "HTML(\n", " em.estimation_table(\n", " models=models,\n", " return_type=\"html\",\n", " custom_param_names={\"Intercept\": \"Constant\", \"Sex\": \"Gender\"},\n", " custom_col_names=[\"Model 1\", \"Model 2\", \"Model 3\"],\n", " custom_col_groups={\"target\": \"Dependent variable: target\"},\n", " render_options={\"caption\": \"Table Title\"},\n", " stats_options=stats_dict,\n", " number_format=\"{0:.3f}\",\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***Note 1***: You can pass a dictionary for `custom_col_names` to rename specific columns, e.g. `custom_col_names={\"(1)\": \"Model 1\"}`, leaving the other column names at their default values.\n", "\n", "***Note 2***: In addition to renaming the default column groups by passing a dictionary for `custom_col_groups`, you can also pass a list to create custom column groups, e.g. `custom_col_groups=[\"target\", \"target\", \"not target\"]` will group the first two columns under the name `\"target\"` and the last column under the name `\"not target\"`.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LaTeX peculiarities\n", "\n", "By default, tables produced by `render_latex` are structured to be compatible with the `siunitx` package. This is done by setting the format of the numeric columns to `S` in the internally defined default rendering options.\n", "To get nicely formatted tables, you need to add the following to your LaTeX preamble:\n", "```latex\n", "\\usepackage{siunitx}\n", "\\sisetup{\n", " input-symbols = (),\n", " table-align-text-post = false,\n", " group-digits = false,\n", " }\n", "```\n", "The first line in `\\sisetup` is necessary if you have parentheses in your table cells (e.g. 
when displaying standard errors or confidence intervals); otherwise, LaTeX will raise an error.\n", "\n", "The second line is necessary so that there is no spacing between the significance stars and the numerical values.\n", "\n", "The third line prevents the digits of numbers from being grouped into groups of three, which is the default behaviour.\n", "This line is optional, but recommended.\n", "\n", "By default, calling `render_latex` raises a warning about these requirements. To silence the warning, set `siunitx_warning=False` in the relevant function calls (when calling `estimation_table` with `return_type=\"latex\"` or when calling `render_latex`).\n", "\n", "If you don't want to generate `siunitx` style tables, you can pass a column format of your choice via `render_options`, e.g. `render_options={\"column_format\": \"lccc\"}` for a table with three model columns.\n", "\n", "You can influence the format of the output table with keyword arguments passed via `render_options`. For the list of supported keyword arguments, see [the documentation of pandas.io.formats.style.Styler.to_latex](https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.to_latex.html).\n", "\n", "By default, `siunitx` centers table columns around the decimal point. This means that if a column contains a number with comparatively many symbols after the decimal point (e.g. a number in scientific notation), there will be extra spacing between that column and the preceding one, since as much space is reserved before the decimal point as after it.\n", "\n", "You can adjust the spacing between columns by using the format `S[table-format=x.y]` for the numeric columns, where `x` and `y` set the number of digits reserved before and after the decimal point, respectively. Below, we show a case with the described problem and its solution. 
For numbers in scientific notation, use `S[table-format=x.yez]`, where `x` and `y` have the same meaning as above and `z` reserves space for the digits of the exponent.\n", "\n", "Compiling the following LaTeX table results in extra spacing between columns `(2)` and `(3)`:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```latex\n", "\n", "\\begin{tabular}{lSSS}\n", " \\toprule\n", " & \\multicolumn{3}{c}{target} \\\\\n", " \\cmidrule(lr){2-4}\n", "\n", " & (1) & (2) & (3) \\\\\n", " \\midrule\n", " Intercept & 152.00$^{*** }$ & 152.00$^{*** }$ & 1.43e08$^{*** }$ \\\\\n", " & (3.61) & (2.85) & (3.14) \\\\\n", " Age & 301.00$^{*** }$ & 37.20$^{ }$ & 51.50$^{*** }$ \\\\\n", " & (77.10) & (64.10) & (2.72) \\\\\n", " Sex & 17.40$^{ }$ & -107.00$^{* }$ & -33.80$^{*** }$ \\\\\n", " & (77.10) & (62.10) & (1.62) \\\\\n", " BMI & & 787.00$^{*** }$ & \\\\\n", " & & (65.40) & \\\\\n", " ABP & & 417.00$^{*** }$ & \\\\\n", " & & (69.50) & \\\\\n", " \\midrule\n", " R$^2$ & 0.04 & 0.40 & \\\\\n", " Observations & \\multicolumn{1}{c}{442} & \\multicolumn{1}{c}{442} & \\multicolumn{1}{c}{445} \\\\\n", " \\midrule\n", " \\textit{Note:} & \\multicolumn{3}{r}{$^{***}$p$<$0.01;$^{**}$p$<$0.05;$^{*}$p$<$0.1} \\\\\n", " \\bottomrule\n", "\\end{tabular}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can get a nicer output by setting the format of the last column to, for example, `S[table-format=3.2e4]`, i.e. by passing `render_options={'column_format':'lSSS[table-format = 3.2e4]'}`. 
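
You may prefer to derive `x`, `y` and `z` from the numbers in a column rather than picking them by hand. A rough helper for that could look like the sketch below. It is a hypothetical illustration and not part of estimagic. The name `siunitx_table_format` is made up. The helper expects plain number strings, optionally wrapped in parentheses as for standard errors, with the significance stars already removed. It reserves space for a leading minus sign but ignores the sign of the exponent.

```python
def siunitx_table_format(cells):
    # Hypothetical helper, not part of estimagic: build an siunitx column
    # spec S[table-format=...] from the plain number strings of one column.
    int_digits = dec_digits = exp_digits = 0
    has_minus = False
    for cell in cells:
        # Standard errors are wrapped in parentheses; drop them first.
        number = cell.strip().strip('()').lower()
        # Split off an exponent such as e08 (its sign is ignored here).
        mantissa, _, exponent = number.partition('e')
        has_minus = has_minus or mantissa.startswith('-')
        integer, _, decimals = mantissa.lstrip('+-').partition('.')
        int_digits = max(int_digits, len(integer))
        dec_digits = max(dec_digits, len(decimals))
        exp_digits = max(exp_digits, len(exponent.lstrip('+-')))
    # x.y, prefixed by a minus sign if needed, plus ez for exponents.
    spec = ('-' if has_minus else '') + f'{int_digits}.{dec_digits}'
    if exp_digits:
        spec += f'e{exp_digits}'
    return f'S[table-format={spec}]'


# Column (3) of the LaTeX example above, with the stars removed.
column_3 = ['1.43e08', '(3.14)', '51.50', '(2.72)', '-33.80', '(1.62)']
print('lSS' + siunitx_table_format(column_3))
```

For column `(3)`, this prints `lSSS[table-format=-2.2e2]`, which you could pass as `column_format` via `render_options`. The helper only counts characters and does not model how `siunitx` handles the text after each number, so compile the table and check the spacing before relying on it.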
With the column format `lSSS[table-format = 3.2e4]`, `render_latex` returns the following table:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```latex\n", "\n", "\\begin{tabular}{lSSS[table-format = 3.2e4]}\n", " \\toprule\n", " & \\multicolumn{3}{c}{target} \\\\\n", " \\cmidrule(lr){2-4}\n", "\n", " & (1) & (2) & (3) \\\\\n", " \\midrule\n", " Intercept & 152.00$^{*** }$ & 152.00$^{*** }$ & 1.43e08$^{*** }$ \\\\\n", " & (3.61) & (2.85) & (3.14) \\\\\n", " Age & 301.00$^{*** }$ & 37.20$^{ }$ & 51.50$^{*** }$ \\\\\n", " & (77.10) & (64.10) & (2.72) \\\\\n", " Sex & 17.40$^{ }$ & -107.00$^{* }$ & -33.80$^{*** }$ \\\\\n", " & (77.10) & (62.10) & (1.62) \\\\\n", " BMI & & 787.00$^{*** }$ & \\\\\n", " & & (65.40) & \\\\\n", " ABP & & 417.00$^{*** }$ & \\\\\n", " & & (69.50) & \\\\\n", " \\midrule\n", " R$^2$ & 0.04 & 0.40 & \\\\\n", " Observations & \\multicolumn{1}{c}{442} & \\multicolumn{1}{c}{442} & \\multicolumn{1}{c}{445} \\\\\n", " \\midrule\n", " \\textit{Note:} & \\multicolumn{3}{r}{$^{***}$p$<$0.01;$^{**}$p$<$0.05;$^{*}$p$<$0.1} \\\\\n", " \\bottomrule\n", "\\end{tabular}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "interpreter": { "hash": "5cdb9867252288f10687117449de6ad870b49795ca695c868016dc0022895cce" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" } }, "nbformat": 4, "nbformat_minor": 4 }