{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to calculate first derivatives\n",
"\n",
"In this guide, we show you how to compute first derivatives with estimagic - while introducing some core concepts."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import estimagic as em\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"As in the getting started section, let's lookt at the sphere function $$f(x) = x^\\top x.$$"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def sphere_scalar(params):\n",
" return params**2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The derivative of $f$ is given by $f'(x) = 2 x$. With numerical derivatives, we have to specify the value of $x$ at which we want to compute the derivative. Let's first consider two **scalar** points $x = 0$ and $x=1$. We have $f'(0) = 0$ and $f'(1) = 2$.\n",
"\n",
"To compute the derivative using estimagicm we simply pass the function ``sphere`` and the ``params`` to the function ``first_derivative``:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(0.)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd = em.first_derivative(func=sphere_scalar, params=0)\n",
"fd[\"derivative\"]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(2.)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd = em.first_derivative(func=sphere_scalar, params=1)\n",
"fd[\"derivative\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that the output of ``first_derivative`` is a dictionary containing the derivative under the key \"derivative\". We discuss the ouput in more detail below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Gradient and Jacobian\n",
"\n",
"The scalar case from above extends directly to the multivariate case. Let's consider two cases: \n",
"\n",
"| | |\n",
"|:--------|:------------------------------------|\n",
"|Gradient | $f_1: \\mathbb{R}^N \\to \\mathbb{R}$ |\n",
"|Jacobian | $f_2: \\mathbb{R}^N \\to \\mathbb{R}^M$|\n",
"\n",
"\n",
"The first derivative of $f_1$ is usually referred to as the gradient, while the first derivative of $f_2$ is usually called the Jacobian."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gradient\n",
"\n",
"Let's again use the sphere function, but this time with a vector input. The gradient is a 1-dimensional vector of shape (N,)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def sphere(params):\n",
" return params @ params"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0., 2., 4., 6.])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd = em.first_derivative(sphere, params=np.arange(4))\n",
"fd[\"derivative\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Jacobian\n",
"\n",
"As an example, let's now use the function\n",
"$$f(x) = (x^\\top x) \\begin{pmatrix}1\\\\2\\\\3 \\end{pmatrix},$$\n",
"with $f: \\mathbb{R}^N \\to \\mathbb{R}^3$. The Jacobian is a 2-dimensional object of shape (M, N), where M is the output dimension."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def sphere_multivariate(params):\n",
" return (params @ params) * np.arange(3)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0., 0., 0., 0.],\n",
" [ 0., 2., 4., 6.],\n",
" [ 0., 4., 8., 12.]])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd = em.first_derivative(sphere_multivariate, params=np.arange(4))\n",
"fd[\"derivative\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The output of ``first_derivative``\n",
"\n",
"As we have already seen in the introduction, the output of ``first_derivative`` is a dictionary. This dictionary **always** contains an entry \"derivative\" which is the numerical derivative. Besides this entry, several additional entries may be found, conditional on the state of certain arguments.\n",
"\n",
"**``return_func_value``**\n",
"\n",
"If the argument ``return_func_value`` is ``True``, the output dictionary will contain an additional entry under the key \"func_value\" denoting the function value evaluated at the params vector.\n",
"\n",
"**``return_info``**\n",
"\n",
"If the argument ``return_info`` is ``True``, the output dictionary will contain one to two additional entries. In this case it will always contain the entry \"func_evals\", which is a data frame containing all internally executed function evaluations. And if ``n_steps`` is larger than 1, it will also contain \"derivative_candidates\", which is a data frame containing derivative estimates used in the Richardson extrapolation.\n",
"\n",
"> For an explaination of the argument ``n_steps`` and the Richardson method, please see the API Reference and the Richardson Extrapolation explanation in the documentation.\n",
"\n",
"\n",
"The objects returned when ``return_info`` is ``True`` are rarely of any use directly and can be safely ignored. However, they are necessary data when using the plotting function ``derivative_plot`` as explained below. For better understanding, we print each of these additional objects once:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"fd = em.first_derivative(\n",
" sphere_scalar, params=0, n_steps=2, return_func_value=True, return_info=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"assert fd[\"func_value\"] == sphere_scalar(0)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" | \n",
" | \n",
" step | \n",
" eval | \n",
"
\n",
" \n",
" sign | \n",
" step_number | \n",
" dim_x | \n",
" dim_f | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1.490116e-09 | \n",
" 2.220446e-18 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 2.980232e-09 | \n",
" 8.881784e-18 | \n",
"
\n",
" \n",
" -1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1.490116e-09 | \n",
" 2.220446e-18 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 2.980232e-09 | \n",
" 8.881784e-18 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" step eval\n",
"sign step_number dim_x dim_f \n",
" 1 0 0 0 1.490116e-09 2.220446e-18\n",
" 1 0 0 2.980232e-09 8.881784e-18\n",
"-1 0 0 0 1.490116e-09 2.220446e-18\n",
" 1 0 0 2.980232e-09 8.881784e-18"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd[\"func_evals\"]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" | \n",
" | \n",
" der | \n",
" err | \n",
"
\n",
" \n",
" method | \n",
" num_term | \n",
" dim_x | \n",
" dim_f | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" forward | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 4.470348e-09 | \n",
" 8.467417e-08 | \n",
"
\n",
" \n",
" backward | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" -4.470348e-09 | \n",
" 8.467417e-08 | \n",
"
\n",
" \n",
" central | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0.000000e+00 | \n",
" 0.000000e+00 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" der err\n",
"method num_term dim_x dim_f \n",
"forward 1 0 0 4.470348e-09 8.467417e-08\n",
"backward 1 0 0 -4.470348e-09 8.467417e-08\n",
"central 1 0 0 0.000000e+00 0.000000e+00"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd[\"derivative_candidates\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The ``params`` argument\n",
"\n",
"Above we used a ``numpy.ndarray`` as the ``params`` argument. In estimagic, params can be arbitrary [pytrees](https://jax.readthedocs.io/en/latest/pytrees.html). Examples are (nested) dictionaries of numbers, arrays, and pandas objects. Let's look at a few cases."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### pandas"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" value | \n",
"
\n",
" \n",
" category | \n",
" name | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" time_pref | \n",
" delta | \n",
" 0.9 | \n",
"
\n",
" \n",
" beta | \n",
" 0.6 | \n",
"
\n",
" \n",
" price | \n",
" price | \n",
" 2.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" value\n",
"category name \n",
"time_pref delta 0.9\n",
" beta 0.6\n",
"price price 2.0"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"params = pd.DataFrame(\n",
" [[\"time_pref\", \"delta\", 0.9], [\"time_pref\", \"beta\", 0.6], [\"price\", \"price\", 2]],\n",
" columns=[\"category\", \"name\", \"value\"],\n",
").set_index([\"category\", \"name\"])\n",
"\n",
"params"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def sphere_pandas(params):\n",
" return params[\"value\"] @ params[\"value\"]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"category name \n",
"time_pref delta 1.8\n",
" beta 1.2\n",
"price price 4.0\n",
"dtype: float64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd = em.first_derivative(sphere_pandas, params)\n",
"fd[\"derivative\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### nested dicts"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'a': 0,\n",
" 'b': 1,\n",
" 'c': 0 2\n",
" 1 3\n",
" 2 4\n",
" dtype: int64}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"params = {\"a\": 0, \"b\": 1, \"c\": pd.Series([2, 3, 4])}\n",
"\n",
"params"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"def dict_sphere(params):\n",
" return params[\"a\"] ** 2 + params[\"b\"] ** 2 + (params[\"c\"] ** 2).sum()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'a': array(0.),\n",
" 'b': array(2.),\n",
" 'c': 0 4.0\n",
" 1 6.0\n",
" 2 8.0\n",
" dtype: float64}"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd = em.first_derivative(\n",
" func=dict_sphere,\n",
" params=params,\n",
")\n",
"\n",
"fd[\"derivative\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Description of the output\n",
"\n",
"The output of `first_derivative` when using a general pytree is straight-forward. Nevertheless, this explanation requires terminolgy of pytrees. Please refer to the [JAX documentation of pytrees](https://jax.readthedocs.io/en/latest/pytrees.html).\n",
"\n",
"The output tree of `first_derivative` has the same structure as the params tree. Equivalent to the numpy case, where the gradient is a vector of shape `(len(params),)`. If, however, the params tree contains non-scalar entries, like `numpy.ndarray`'s, `pandas.Series`', or `pandas.DataFrame`'s, the output is not expanded but a block is created instead. In the above example, the entry `params[\"c\"]` is a `pandas.Series` with 3 entries. Thus, the first derivative output contains the corresponding 3x1-block of the gradient at the position `[\"c\"]`:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multiprocessing\n",
"\n",
"For slow-to-evaluate functions, one may increase computation speed by running the function evaluations in parallel. This can be easily done by setting the ``n_cores`` argument. For example, if we wish to evaluate the function on ``2`` cores, we simply write"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"fd = em.first_derivative(sphere_scalar, params=0, n_cores=2)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "estimagic",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:27:35) [Clang 14.0.6 ]"
},
"vscode": {
"interpreter": {
"hash": "e8a16b1bdcc80285313db4674a5df2a5a80c75795379c5d9f174c7c712f05b3a"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}