{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# How to calculate second derivatives\n", "\n", "In this guide, we show you how to compute second derivatives with estimagic, while introducing some core concepts." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import estimagic as em\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "Instead of the sphere function, let's now look at an ellipse $$f(x) = x^\\top W x,$$\n", "with a weighting matrix $W$.\n", "\n", "The second derivative of $f$ is given by $f''(x) = W + W^\\top$. With numerical derivatives, we have to specify the value of $x$ at which we want to compute the derivative. Note that in this case the second derivative should be independent of the value of $x$." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def ellipse_scalar(params):\n", " weight = 1\n", " return weight * params**2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's first consider two **scalar** points $x = 0$ and $x=1$. Since the second derivative here is constant, we have $f''(0) = f''(1) = 2$.\n", "\n", "To compute the derivative using estimagic, we simply pass the function ``ellipse_scalar`` and ``params`` to the function ``second_derivative``:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(2.)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sd = em.second_derivative(func=ellipse_scalar, params=0)\n", "sd[\"derivative\"]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(1.99999625)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sd = em.second_derivative(func=ellipse_scalar, params=1)\n", "sd[\"derivative\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the output of ``second_derivative`` is a dictionary containing the derivative under the key \"derivative\". We discuss the ouput in more detail below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hessian and Batch-Hessian\n", "\n", "The scalar case from above extends directly to the multivariate case. Let's consider two cases: \n", "\n", "| | |\n", "|:--------|:------------------------------------|\n", "|Hessian | $f_1: \\mathbb{R}^N \\to \\mathbb{R}$ |\n", "|Batch-Hessian | $f_2: \\mathbb{R}^N \\to \\mathbb{R}^M$|\n", "\n", "\n", "The second derivative of $f_1$ is usually referred to as the Hessian, while the second derivative of $f_2$ is usually called a Batch-Hessian." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hessian\n", "\n", "Let's again use the ellipse function, but this time with a vector input. The hessian is a 2-dimensional object of shape (N, N)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def ellipse(params):\n", " weight = np.arange(len(params) ** 2).reshape(len(params), len(params))\n", " return params @ weight @ params" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 5., 10., 15.],\n", " [ 5., 10., 15., 20.],\n", " [10., 15., 20., 25.],\n", " [15., 20., 25., 30.]])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sd = em.second_derivative(ellipse, params=np.arange(4))\n", "sd[\"derivative\"].round(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Batch-Hessian\n", "\n", "As an example let's now use the function\n", "$$f(x) = (x^\\top x) \\begin{pmatrix}1\\\\2\\\\3 \\end{pmatrix},$$\n", "with $f: \\mathbb{R}^N \\to \\mathbb{R}^3$. The Batch-Hessian is now a 3-dimensional object of shape (M, N, N), where M is the output dimension." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def ellipse_multivariate(params):\n", " weight = np.arange(len(params) ** 2).reshape(len(params), len(params))\n", " return (params @ weight @ params) * np.arange(3)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0.]],\n", "\n", " [[ 0., 5., 10., 15.],\n", " [ 5., 10., 15., 20.],\n", " [10., 15., 20., 25.],\n", " [15., 20., 25., 30.]],\n", "\n", " [[ 0., 10., 20., 30.],\n", " [10., 20., 30., 40.],\n", " [20., 30., 40., 50.],\n", " [30., 40., 50., 60.]]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sd = em.second_derivative(ellipse_multivariate, params=np.arange(4))\n", "sd[\"derivative\"].round(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The output of ``second_derivative``\n", "\n", "As we have already seen in the introduction, the output of ``first_derivative`` is a dictionary. This dictionary **always** contains an entry \"derivative\" which is the numerical derivative. Besides this entry, several additional entries may be found, conditional on the state of certain arguments.\n", "\n", "**``return_func_value``**\n", "\n", "If the argument ``return_func_value`` is ``True``, the output dictionary will contain an additional entry under the key \"func_value\" denoting the function value evaluated at the params vector." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "sd = em.second_derivative(\n", " ellipse_scalar, params=0, return_func_value=True, return_info=True\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "assert sd[\"func_value\"] == ellipse_scalar(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The ``params`` argument\n", "\n", "Above we used a ``numpy.ndarray`` as the ``params`` argument. In estimagic, params can be arbitrary [pytrees](https://jax.readthedocs.io/en/latest/pytrees.html). Examples are (nested) dictionaries of numbers, arrays and pandas objects. Lets look at a few cases.\n", "\n", "### pandas" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
value
categoryname
time_prefdelta0.9
beta0.6
priceprice2.0
\n", "
" ], "text/plain": [ " value\n", "category name \n", "time_pref delta 0.9\n", " beta 0.6\n", "price price 2.0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = pd.DataFrame(\n", " [[\"time_pref\", \"delta\", 0.9], [\"time_pref\", \"beta\", 0.6], [\"price\", \"price\", 2]],\n", " columns=[\"category\", \"name\", \"value\"],\n", ").set_index([\"category\", \"name\"])\n", "\n", "params" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def ellipse_pandas(params):\n", " weight = np.arange(len(params) ** 2).reshape(len(params), len(params))\n", " return params[\"value\"] @ weight @ params[\"value\"]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
categorytime_prefprice
namedeltabetaprice
categoryname
time_prefdelta0.0000004.0000097.999982
beta4.0000097.99965911.999973
priceprice7.99998211.99997315.999964
\n", "
" ], "text/plain": [ "category time_pref price\n", "name delta beta price\n", "category name \n", "time_pref delta 0.000000 4.000009 7.999982\n", " beta 4.000009 7.999659 11.999973\n", "price price 7.999982 11.999973 15.999964" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sd = em.second_derivative(ellipse_pandas, params)\n", "sd[\"derivative\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### nested dicts" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'a': 0,\n", " 'b': 1,\n", " 'c': 0 2\n", " 1 3\n", " 2 4\n", " dtype: int64}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = {\"a\": 0, \"b\": 1, \"c\": pd.Series([2, 3, 4])}\n", "\n", "params" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def dict_sphere(params):\n", " return params[\"a\"] ** 2 + params[\"b\"] ** 2 + (params[\"c\"] ** 2).sum()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'a': {'a': array(2.00072215),\n", " 'b': array(0.),\n", " 'c': 0 0.0\n", " 1 0.0\n", " 2 0.0\n", " dtype: float64},\n", " 'b': {'a': array(0.),\n", " 'b': array(1.9999955),\n", " 'c': 0 0.0\n", " 1 0.0\n", " 2 0.0\n", " dtype: float64},\n", " 'c': {'a': 0 0.0\n", " 1 0.0\n", " 2 0.0\n", " dtype: float64,\n", " 'b': 0 0.0\n", " 1 0.0\n", " 2 0.0\n", " dtype: float64,\n", " 'c': 0 1 2\n", " 0 1.999995 0.000004 -0.000003\n", " 1 0.000004 2.000001 0.000000\n", " 2 -0.000003 0.000000 1.999997}}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sd = em.second_derivative(\n", " func=dict_sphere,\n", " params=params,\n", ")\n", "\n", "sd[\"derivative\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Description of the output\n", "\n", "The output of `second_derivative` when using a general pytrees looks more complex but is easy once we remember that the second derivative is equivalent to applying the first derivative twice. This explanation requires terminolgy of pytrees. Please refer to the [JAX documentation of pytrees](https://jax.readthedocs.io/en/latest/pytrees.html).\n", "\n", "The output tree is a product of the params tree with itself. This is equivalent to the numpy case, where the hessian is a matrix of shape `(len(params), len(params))`. If, however, the params tree contains non-scalar entries like `numpy.ndarray`'s, `pandas.Series`', or `pandas.DataFrame`'s, the output is not expanded but a block is created instead. In the above example, the entry `params[\"c\"]` is a 3-dimensional `pandas.Series`. Thus, the second derivative output contains the corresponding 3x3-block of the hessian at the position `[\"c\"][\"c\"]`:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012
02.00.0-0.0
10.02.00.0
2-0.00.02.0
\n", "
" ], "text/plain": [ " 0 1 2\n", "0 2.0 0.0 -0.0\n", "1 0.0 2.0 0.0\n", "2 -0.0 0.0 2.0" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sd[\"derivative\"][\"c\"][\"c\"].round(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multiprocessing\n", "\n", "For slow-to-evaluate functions, one may increase computation speed by running the function evaluations in parallel. This can be easily done by setting the ``n_cores`` argument. For example, if we wish to evaluate the function on ``2`` cores we simply write" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "sd = em.second_derivative(ellipse_scalar, params=0, n_cores=2)" ] } ], "metadata": { "kernelspec": { "display_name": "estimagic", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:27:35) [Clang 14.0.6 ]" }, "vscode": { "interpreter": { "hash": "e8a16b1bdcc80285313db4674a5df2a5a80c75795379c5d9f174c7c712f05b3a" } } }, "nbformat": 4, "nbformat_minor": 2 }