{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to calculate first derivatives\n",
    "\n",
    "In this guide, we show you how to compute first derivatives with estimagic - while introducing some core concepts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import estimagic as em\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "As in the getting started section, let's lookt at the sphere function $$f(x) = x^\\top x.$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sphere_scalar(params):\n",
    "    return params**2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The derivative of $f$ is given by $f'(x) = 2 x$. With numerical derivatives, we have to specify the value of $x$ at which we want to compute the derivative. Let's first consider two **scalar** points $x = 0$ and $x=1$. We have $f'(0) = 0$ and $f'(1) = 2$.\n",
    "\n",
    "To compute the derivative using estimagicm we simply pass the function ``sphere`` and the ``params`` to the function ``first_derivative``:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(0.)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fd = em.first_derivative(func=sphere_scalar, params=0)\n",
    "fd[\"derivative\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(2.)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fd = em.first_derivative(func=sphere_scalar, params=1)\n",
    "fd[\"derivative\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the output of ``first_derivative`` is a dictionary containing the derivative under the key \"derivative\". We discuss the ouput in more detail below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gradient and Jacobian\n",
    "\n",
    "The scalar case from above extends directly to the multivariate case. Let's consider two cases: \n",
    "\n",
    "|         |                                     |\n",
    "|:--------|:------------------------------------|\n",
    "|Gradient | $f_1: \\mathbb{R}^N \\to \\mathbb{R}$  |\n",
    "|Jacobian | $f_2: \\mathbb{R}^N \\to \\mathbb{R}^M$|\n",
    "\n",
    "\n",
    "The first derivative of $f_1$ is usually referred to as the gradient, while the first derivative of $f_2$ is usually called the Jacobian."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Gradient\n",
    "\n",
    "Let's again use the sphere function, but this time with a vector input. The gradient is a 1-dimensional vector of shape (N,)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sphere(params):\n",
    "    return params @ params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0., 2., 4., 6.])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fd = em.first_derivative(sphere, params=np.arange(4))\n",
    "fd[\"derivative\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Jacobian\n",
    "\n",
    "As an example, let's now use the function\n",
    "$$f(x) = (x^\\top x) \\begin{pmatrix}1\\\\2\\\\3 \\end{pmatrix},$$\n",
    "with $f: \\mathbb{R}^N \\to \\mathbb{R}^3$. The Jacobian is a 2-dimensional object of shape (M, N), where M is the output dimension."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sphere_multivariate(params):\n",
    "    return (params @ params) * np.arange(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.,  0.,  0.,  0.],\n",
       "       [ 0.,  2.,  4.,  6.],\n",
       "       [ 0.,  4.,  8., 12.]])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fd = em.first_derivative(sphere_multivariate, params=np.arange(4))\n",
    "fd[\"derivative\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The output of ``first_derivative``\n",
    "\n",
    "As we have already seen in the introduction, the output of ``first_derivative`` is a dictionary. This dictionary **always** contains an entry \"derivative\" which is the numerical derivative. Besides this entry, several additional entries may be found, conditional on the state of certain arguments.\n",
    "\n",
    "**``return_func_value``**\n",
    "\n",
    "If the argument ``return_func_value`` is ``True``, the output dictionary will contain an additional entry under the key \"func_value\" denoting the function value evaluated at the params vector.\n",
    "\n",
    "**``return_info``**\n",
    "\n",
    "If the argument ``return_info`` is ``True``, the output dictionary will contain one to two additional entries. In this case it will always contain the entry \"func_evals\", which is a data frame containing all internally executed function evaluations. And if ``n_steps`` is larger than 1, it will also contain \"derivative_candidates\", which is a data frame containing derivative estimates used in the Richardson extrapolation.\n",
    "\n",
    "> For an explaination of the argument ``n_steps`` and the Richardson method, please see the API Reference and the Richardson Extrapolation explanation in the documentation.\n",
    "\n",
    "\n",
    "The objects returned when ``return_info`` is ``True`` are rarely of any use directly and can be safely ignored. However, they are necessary data when using the plotting function ``derivative_plot`` as explained below. For better understanding, we print each of these additional objects once:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "fd = em.first_derivative(\n",
    "    sphere_scalar, params=0, n_steps=2, return_func_value=True, return_info=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "assert fd[\"func_value\"] == sphere_scalar(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>step</th>\n",
       "      <th>eval</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sign</th>\n",
       "      <th>step_number</th>\n",
       "      <th>dim_x</th>\n",
       "      <th>dim_f</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>0</th>\n",
       "      <th>0</th>\n",
       "      <th>0</th>\n",
       "      <td>1.490116e-09</td>\n",
       "      <td>2.220446e-18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <th>0</th>\n",
       "      <th>0</th>\n",
       "      <td>2.980232e-09</td>\n",
       "      <td>8.881784e-18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">-1</th>\n",
       "      <th>0</th>\n",
       "      <th>0</th>\n",
       "      <th>0</th>\n",
       "      <td>1.490116e-09</td>\n",
       "      <td>2.220446e-18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <th>0</th>\n",
       "      <th>0</th>\n",
       "      <td>2.980232e-09</td>\n",
       "      <td>8.881784e-18</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                      step          eval\n",
       "sign step_number dim_x dim_f                            \n",
       " 1   0           0     0      1.490116e-09  2.220446e-18\n",
       "     1           0     0      2.980232e-09  8.881784e-18\n",
       "-1   0           0     0      1.490116e-09  2.220446e-18\n",
       "     1           0     0      2.980232e-09  8.881784e-18"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fd[\"func_evals\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>der</th>\n",
       "      <th>err</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>method</th>\n",
       "      <th>num_term</th>\n",
       "      <th>dim_x</th>\n",
       "      <th>dim_f</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>forward</th>\n",
       "      <th>1</th>\n",
       "      <th>0</th>\n",
       "      <th>0</th>\n",
       "      <td>4.470348e-09</td>\n",
       "      <td>8.467417e-08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>backward</th>\n",
       "      <th>1</th>\n",
       "      <th>0</th>\n",
       "      <th>0</th>\n",
       "      <td>-4.470348e-09</td>\n",
       "      <td>8.467417e-08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>central</th>\n",
       "      <th>1</th>\n",
       "      <th>0</th>\n",
       "      <th>0</th>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        der           err\n",
       "method   num_term dim_x dim_f                            \n",
       "forward  1        0     0      4.470348e-09  8.467417e-08\n",
       "backward 1        0     0     -4.470348e-09  8.467417e-08\n",
       "central  1        0     0      0.000000e+00  0.000000e+00"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fd[\"derivative_candidates\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The ``params`` argument\n",
    "\n",
    "Above we used a ``numpy.ndarray`` as the ``params`` argument. In estimagic, params can be arbitrary [pytrees](https://jax.readthedocs.io/en/latest/pytrees.html). Examples are (nested) dictionaries of numbers, arrays, and pandas objects. Let's look at a few cases."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>category</th>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">time_pref</th>\n",
       "      <th>delta</th>\n",
       "      <td>0.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>beta</th>\n",
       "      <td>0.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>price</th>\n",
       "      <th>price</th>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 value\n",
       "category  name        \n",
       "time_pref delta    0.9\n",
       "          beta     0.6\n",
       "price     price    2.0"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "params = pd.DataFrame(\n",
    "    [[\"time_pref\", \"delta\", 0.9], [\"time_pref\", \"beta\", 0.6], [\"price\", \"price\", 2]],\n",
    "    columns=[\"category\", \"name\", \"value\"],\n",
    ").set_index([\"category\", \"name\"])\n",
    "\n",
    "params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sphere_pandas(params):\n",
    "    return params[\"value\"] @ params[\"value\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "category   name \n",
       "time_pref  delta    1.8\n",
       "           beta     1.2\n",
       "price      price    4.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fd = em.first_derivative(sphere_pandas, params)\n",
    "fd[\"derivative\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### nested dicts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a': 0,\n",
       " 'b': 1,\n",
       " 'c': 0    2\n",
       " 1    3\n",
       " 2    4\n",
       " dtype: int64}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "params = {\"a\": 0, \"b\": 1, \"c\": pd.Series([2, 3, 4])}\n",
    "\n",
    "params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "def dict_sphere(params):\n",
    "    return params[\"a\"] ** 2 + params[\"b\"] ** 2 + (params[\"c\"] ** 2).sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a': array(0.),\n",
       " 'b': array(2.),\n",
       " 'c': 0    4.0\n",
       " 1    6.0\n",
       " 2    8.0\n",
       " dtype: float64}"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fd = em.first_derivative(\n",
    "    func=dict_sphere,\n",
    "    params=params,\n",
    ")\n",
    "\n",
    "fd[\"derivative\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Description of the output\n",
    "\n",
    "The output of `first_derivative` when using a general pytree is straight-forward. Nevertheless, this explanation requires terminolgy of pytrees. Please refer to the [JAX documentation of pytrees](https://jax.readthedocs.io/en/latest/pytrees.html).\n",
    "\n",
    "The output tree of `first_derivative` has the same structure as the params tree. Equivalent to the numpy case, where the gradient is a vector of shape `(len(params),)`. If, however, the params tree contains non-scalar entries, like `numpy.ndarray`'s, `pandas.Series`', or `pandas.DataFrame`'s, the output is not expanded but a block is created instead. In the above example, the entry `params[\"c\"]` is a `pandas.Series` with 3 entries. Thus, the first derivative output contains the corresponding 3x1-block of the gradient at the position `[\"c\"]`:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multiprocessing\n",
    "\n",
    "For slow-to-evaluate functions, one may increase computation speed by running the function evaluations in parallel. This can be easily done by setting the ``n_cores`` argument. For example, if we wish to evaluate the function on ``2`` cores, we simply write"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "fd = em.first_derivative(sphere_scalar, params=0, n_cores=2)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "estimagic",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:27:35) [Clang 14.0.6 ]"
  },
  "vscode": {
   "interpreter": {
    "hash": "e8a16b1bdcc80285313db4674a5df2a5a80c75795379c5d9f174c7c712f05b3a"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}