{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# How to calculate first derivatives\n", "\n", "In this guide, we show you how to compute first derivatives with estimagic - while introducing some core concepts." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import estimagic as em\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "As in the getting started section, let's lookt at the sphere function $$f(x) = x^\\top x.$$" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def sphere_scalar(params):\n", " return params**2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The derivative of $f$ is given by $f'(x) = 2 x$. With numerical derivatives, we have to specify the value of $x$ at which we want to compute the derivative. Let's first consider two **scalar** points $x = 0$ and $x=1$. We have $f'(0) = 0$ and $f'(1) = 2$.\n", "\n", "To compute the derivative using estimagicm we simply pass the function ``sphere`` and the ``params`` to the function ``first_derivative``:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(0.)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd = em.first_derivative(func=sphere_scalar, params=0)\n", "fd[\"derivative\"]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(2.)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd = em.first_derivative(func=sphere_scalar, params=1)\n", "fd[\"derivative\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the output of ``first_derivative`` is a dictionary containing the derivative under the key \"derivative\". We discuss the ouput in more detail below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gradient and Jacobian\n", "\n", "The scalar case from above extends directly to the multivariate case. Let's consider two cases: \n", "\n", "| | |\n", "|:--------|:------------------------------------|\n", "|Gradient | $f_1: \\mathbb{R}^N \\to \\mathbb{R}$ |\n", "|Jacobian | $f_2: \\mathbb{R}^N \\to \\mathbb{R}^M$|\n", "\n", "\n", "The first derivative of $f_1$ is usually referred to as the gradient, while the first derivative of $f_2$ is usually called the Jacobian." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gradient\n", "\n", "Let's again use the sphere function, but this time with a vector input. The gradient is a 1-dimensional vector of shape (N,)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def sphere(params):\n", " return params @ params" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0., 2., 4., 6.])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd = em.first_derivative(sphere, params=np.arange(4))\n", "fd[\"derivative\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Jacobian\n", "\n", "As an example, let's now use the function\n", "$$f(x) = (x^\\top x) \\begin{pmatrix}1\\\\2\\\\3 \\end{pmatrix},$$\n", "with $f: \\mathbb{R}^N \\to \\mathbb{R}^3$. The Jacobian is a 2-dimensional object of shape (M, N), where M is the output dimension." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def sphere_multivariate(params):\n", " return (params @ params) * np.arange(3)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 0., 0., 0.],\n", " [ 0., 2., 4., 6.],\n", " [ 0., 4., 8., 12.]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd = em.first_derivative(sphere_multivariate, params=np.arange(4))\n", "fd[\"derivative\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The output of ``first_derivative``\n", "\n", "As we have already seen in the introduction, the output of ``first_derivative`` is a dictionary. This dictionary **always** contains an entry \"derivative\" which is the numerical derivative. Besides this entry, several additional entries may be found, conditional on the state of certain arguments.\n", "\n", "**``return_func_value``**\n", "\n", "If the argument ``return_func_value`` is ``True``, the output dictionary will contain an additional entry under the key \"func_value\" denoting the function value evaluated at the params vector.\n", "\n", "**``return_info``**\n", "\n", "If the argument ``return_info`` is ``True``, the output dictionary will contain one to two additional entries. In this case it will always contain the entry \"func_evals\", which is a data frame containing all internally executed function evaluations. And if ``n_steps`` is larger than 1, it will also contain \"derivative_candidates\", which is a data frame containing derivative estimates used in the Richardson extrapolation.\n", "\n", "> For an explaination of the argument ``n_steps`` and the Richardson method, please see the API Reference and the Richardson Extrapolation explanation in the documentation.\n", "\n", "\n", "The objects returned when ``return_info`` is ``True`` are rarely of any use directly and can be safely ignored. However, they are necessary data when using the plotting function ``derivative_plot`` as explained below. For better understanding, we print each of these additional objects once:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "fd = em.first_derivative(\n", " sphere_scalar, params=0, n_steps=2, return_func_value=True, return_info=True\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "assert fd[\"func_value\"] == sphere_scalar(0)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
stepeval
signstep_numberdim_xdim_f
10001.490116e-092.220446e-18
1002.980232e-098.881784e-18
-10001.490116e-092.220446e-18
1002.980232e-098.881784e-18
\n", "
" ], "text/plain": [ " step eval\n", "sign step_number dim_x dim_f \n", " 1 0 0 0 1.490116e-09 2.220446e-18\n", " 1 0 0 2.980232e-09 8.881784e-18\n", "-1 0 0 0 1.490116e-09 2.220446e-18\n", " 1 0 0 2.980232e-09 8.881784e-18" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd[\"func_evals\"]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dererr
methodnum_termdim_xdim_f
forward1004.470348e-098.467417e-08
backward100-4.470348e-098.467417e-08
central1000.000000e+000.000000e+00
\n", "
" ], "text/plain": [ " der err\n", "method num_term dim_x dim_f \n", "forward 1 0 0 4.470348e-09 8.467417e-08\n", "backward 1 0 0 -4.470348e-09 8.467417e-08\n", "central 1 0 0 0.000000e+00 0.000000e+00" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd[\"derivative_candidates\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The ``params`` argument\n", "\n", "Above we used a ``numpy.ndarray`` as the ``params`` argument. In estimagic, params can be arbitrary [pytrees](https://jax.readthedocs.io/en/latest/pytrees.html). Examples are (nested) dictionaries of numbers, arrays, and pandas objects. Let's look at a few cases." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### pandas" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
value
categoryname
time_prefdelta0.9
beta0.6
priceprice2.0
\n", "
" ], "text/plain": [ " value\n", "category name \n", "time_pref delta 0.9\n", " beta 0.6\n", "price price 2.0" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = pd.DataFrame(\n", " [[\"time_pref\", \"delta\", 0.9], [\"time_pref\", \"beta\", 0.6], [\"price\", \"price\", 2]],\n", " columns=[\"category\", \"name\", \"value\"],\n", ").set_index([\"category\", \"name\"])\n", "\n", "params" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def sphere_pandas(params):\n", " return params[\"value\"] @ params[\"value\"]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "category name \n", "time_pref delta 1.8\n", " beta 1.2\n", "price price 4.0\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd = em.first_derivative(sphere_pandas, params)\n", "fd[\"derivative\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### nested dicts" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'a': 0,\n", " 'b': 1,\n", " 'c': 0 2\n", " 1 3\n", " 2 4\n", " dtype: int64}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = {\"a\": 0, \"b\": 1, \"c\": pd.Series([2, 3, 4])}\n", "\n", "params" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def dict_sphere(params):\n", " return params[\"a\"] ** 2 + params[\"b\"] ** 2 + (params[\"c\"] ** 2).sum()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'a': array(0.),\n", " 'b': array(2.),\n", " 'c': 0 4.0\n", " 1 6.0\n", " 2 8.0\n", " dtype: float64}" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd = em.first_derivative(\n", " func=dict_sphere,\n", " params=params,\n", ")\n", "\n", "fd[\"derivative\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Description of the output\n", "\n", "The output of `first_derivative` when using a general pytree is straight-forward. Nevertheless, this explanation requires terminolgy of pytrees. Please refer to the [JAX documentation of pytrees](https://jax.readthedocs.io/en/latest/pytrees.html).\n", "\n", "The output tree of `first_derivative` has the same structure as the params tree. Equivalent to the numpy case, where the gradient is a vector of shape `(len(params),)`. If, however, the params tree contains non-scalar entries, like `numpy.ndarray`'s, `pandas.Series`', or `pandas.DataFrame`'s, the output is not expanded but a block is created instead. In the above example, the entry `params[\"c\"]` is a `pandas.Series` with 3 entries. Thus, the first derivative output contains the corresponding 3x1-block of the gradient at the position `[\"c\"]`:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multiprocessing\n", "\n", "For slow-to-evaluate functions, one may increase computation speed by running the function evaluations in parallel. This can be easily done by setting the ``n_cores`` argument. For example, if we wish to evaluate the function on ``2`` cores, we simply write" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "fd = em.first_derivative(sphere_scalar, params=0, n_cores=2)" ] } ], "metadata": { "kernelspec": { "display_name": "estimagic", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:27:35) [Clang 14.0.6 ]" }, "vscode": { "interpreter": { "hash": "e8a16b1bdcc80285313db4674a5df2a5a80c75795379c5d9f174c7c712f05b3a" } } }, "nbformat": 4, "nbformat_minor": 2 }