{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "private-handle",
   "metadata": {},
   "source": [
    "# Method of Simulated Moments (MSM)\n",
    "\n",
    "This tutorial shows you how to do a Method of Simulated Moments estimation in estimagic. The Method of Simulated Moments (MSM) is a nonlinear estimation principle that is very useful for fitting complicated models to the data. The only ingredient required is a function that simulates the model outcomes you observe in some empirical dataset. \n",
    "\n",
    "In the tutorial here, we will use a simple linear regression model. This is the same model which we use in the tutorial on maximum likelihood estimation.\n",
    "\n",
    "Throughout the tutorial, we only talk about MSM estimation. However, the more general case of indirect inference estimation works exactly the same way. \n",
    "\n",
    "\n",
    "## Steps of MSM estimation\n",
    "\n",
    "1. Load (simulate) empirical data \n",
    "2. Define a function to calculate estimation moments on the data \n",
    "3. Calculate the covariance matrix of the empirical moments (with ``get_moments_cov``)\n",
    "4. Define a function to simulate moments from the model \n",
    "5. Estimate the model, calculate standard errors, do sensitivity analysis (with ``estimate_msm``)\n",
    "\n",
    "## Example: Estimate the parameters of a regression model\n",
    "\n",
    "The model we consider here is a simple regression model with only one explanatory variable (plus a constant). The goal is to estimate the slope coefficients and the error variance from a simulated data set.\n",
    "\n",
    "The estimation mechanics are exactly the same for more complicated models. A model is always defined by a function that can take parameters (here: the mean, variance and lower_cutoff and upper_cutoff) and returns a number of simulated moments (mean, variance, soft_min and soft_max of simulated exam points).\n",
    "\n",
    "### Model:\n",
    "\n",
    "$$ y = \\beta_0 + \\beta_1 x + \\epsilon, \\text{ where } \\epsilon \\sim N(0, \\sigma^2)$$\n",
    "\n",
    "We aim to estimate $\\beta_0, \\beta_1, \\sigma^2$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dirty-slovakia",
   "metadata": {},
   "outputs": [],
   "source": [
    "import estimagic as em\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "rng = np.random.default_rng(seed=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "annoying-guard",
   "metadata": {},
   "source": [
    "## 1. Simulate data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdaf1542",
   "metadata": {},
   "outputs": [],
   "source": [
    "def simulate_data(params, n_draws, rng):\n",
    "    x = rng.normal(0, 1, size=n_draws)\n",
    "    e = rng.normal(0, params.loc[\"sd\", \"value\"], size=n_draws)\n",
    "    y = params.loc[\"intercept\", \"value\"] + params.loc[\"slope\", \"value\"] * x + e\n",
    "    return pd.DataFrame({\"y\": y, \"x\": x})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f965ccdc",
   "metadata": {},
   "outputs": [],
   "source": [
    "true_params = pd.DataFrame(\n",
    "    data=[[2, -np.inf], [-1, -np.inf], [1, 1e-10]],\n",
    "    columns=[\"value\", \"lower_bound\"],\n",
    "    index=[\"intercept\", \"slope\", \"sd\"],\n",
    ")\n",
    "\n",
    "data = simulate_data(true_params, n_draws=100, rng=rng)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20a94f52",
   "metadata": {},
   "source": [
    "## 2. Calculate Moments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "diverse-validation",
   "metadata": {},
   "outputs": [],
   "source": [
    "def calculate_moments(sample):\n",
    "    moments = {\n",
    "        \"y_mean\": sample[\"y\"].mean(),\n",
    "        \"x_mean\": sample[\"x\"].mean(),\n",
    "        \"yx_mean\": (sample[\"y\"] * sample[\"x\"]).mean(),\n",
    "        \"y_sqrd_mean\": (sample[\"y\"] ** 2).mean(),\n",
    "        \"x_sqrd_mean\": (sample[\"x\"] ** 2).mean(),\n",
    "    }\n",
    "    return pd.Series(moments)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "short-flood",
   "metadata": {},
   "outputs": [],
   "source": [
    "empirical_moments = calculate_moments(data)\n",
    "empirical_moments"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "italic-baptist",
   "metadata": {},
   "source": [
    "## 3. Calculate the covariance matrix of empirical moments\n",
    "\n",
    "The covariance matrix of the empirical moments (``moments_cov``) is needed for three things:\n",
    "1. to calculate the weighting matrix\n",
    "2. to calculate standard errors\n",
    "3. to calculate sensitivity measures\n",
    "\n",
    "We will calculate ``moments_cov`` via a bootstrap. Depending on your problem, there can be other ways to calculate the covariance matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "rocky-willow",
   "metadata": {},
   "outputs": [],
   "source": [
    "moments_cov = em.get_moments_cov(\n",
    "    data, calculate_moments, bootstrap_kwargs={\"n_draws\": 5_000, \"seed\": 0}\n",
    ")\n",
    "\n",
    "moments_cov"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hearing-dairy",
   "metadata": {},
   "source": [
    "``get_moments_cov`` mainly just calls estimagic's bootstrap function. See our [bootstrap_tutorial](../../how_to_guides/inference/how_to_do_bootstrap_inference.ipynb) for background information. \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "worldwide-whole",
   "metadata": {},
   "source": [
    "## 4. Define a function to calculate simulated moments\n",
    "\n",
    "In a real world application, this is the step that would take most of the time. However, in our very simple example, all the work is already done by numpy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "creative-pittsburgh",
   "metadata": {},
   "outputs": [],
   "source": [
    "def simulate_moments(params, n_draws=10_000, seed=0):\n",
    "    rng = np.random.default_rng(seed)\n",
    "    sim_data = simulate_data(params, n_draws, rng)\n",
    "    sim_moments = calculate_moments(sim_data)\n",
    "    return sim_moments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "casual-stream",
   "metadata": {},
   "outputs": [],
   "source": [
    "simulate_moments(true_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sustainable-collectible",
   "metadata": {},
   "source": [
    "## 5. Estimate the model parameters\n",
    "\n",
    "Estimating a model consists of the following steps:\n",
    "\n",
    "- Building a criterion function that measures a distance between simulated and empirical moments\n",
    "- Minimizing this criterion function\n",
    "- Calculating the Jacobian of the model\n",
    "- Calculating standard errors, confidence intervals and p-values\n",
    "- Calculating sensitivity measures\n",
    "\n",
    "This can all be done in one go with the ``estimate_msm`` function. This function has sensible default values, so you only need a minimum number of inputs. However, you can configure almost any aspect of the workflow via optional arguments. If you need even more control, you can call the lower level functions, which the now famliliar``estimate_msm`` function is built on, directly. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "finite-david",
   "metadata": {},
   "outputs": [],
   "source": [
    "start_params = true_params.assign(value=[100, 100, 100])\n",
    "\n",
    "res = em.estimate_msm(\n",
    "    simulate_moments,\n",
    "    empirical_moments,\n",
    "    moments_cov,\n",
    "    start_params,\n",
    "    optimize_options=\"scipy_lbfgsb\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "outside-volleyball",
   "metadata": {},
   "outputs": [],
   "source": [
    "res.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "incident-government",
   "metadata": {},
   "source": [
    "## What's in the result?\n",
    "\n",
    "`MomentsResult` objects provide attributes and methods to calculate standard errors, confidence intervals and p-values. For all three, several methods are available. You can even calculate cluster robust standard errors.\n",
    "\n",
    "A few examples are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "caring-scale",
   "metadata": {},
   "outputs": [],
   "source": [
    "res.params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9fc88986",
   "metadata": {},
   "outputs": [],
   "source": [
    "res.cov(method=\"robust\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d7dbe79c",
   "metadata": {},
   "outputs": [],
   "source": [
    "res.se()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "blind-tractor",
   "metadata": {},
   "source": [
    "## How to visualize sensitivity measures?\n",
    "\n",
    "For more background on the sensitivity measures and their interpretation, check out the [how to guide](../../how_to_guides/miscellaneous/how_to_visualize_and_interpret_sensitivity_measures.ipynb) on sensitivity measures. \n",
    "\n",
    "Here, we just show you how to plot them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fleet-qatar",
   "metadata": {},
   "outputs": [],
   "source": [
    "from estimagic.visualization.lollipop_plot import lollipop_plot  # noqa: E402\n",
    "\n",
    "sensitivity_data = res.sensitivity(kind=\"bias\").abs().T\n",
    "\n",
    "fig = lollipop_plot(sensitivity_data)\n",
    "\n",
    "fig = fig.update_layout(height=500, width=900)\n",
    "fig.show(renderer=\"png\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "estimagic",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:27:35) [Clang 14.0.6 ]"
  },
  "vscode": {
   "interpreter": {
    "hash": "e8a16b1bdcc80285313db4674a5df2a5a80c75795379c5d9f174c7c712f05b3a"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}