
[1]:
import numpy as np
import pandas as pd

Numerical differentiation#

This tutorial shows how to compute numerical derivatives with estimagic, using simple examples. More details on the topics covered here can be found in the how-to guides.

Basic usage of first_derivative#

[2]:
from estimagic import first_derivative


def sphere(params):
    return params @ params
[3]:
fd = first_derivative(
    func=sphere,
    params=np.arange(5),
)

fd["derivative"]
[3]:
array([0., 2., 4., 6., 8.])
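
The gradient of the sphere function is 2 * params, so the numerical result can be checked against the analytic one. This is just a sanity check with numpy, not part of the estimagic API:

np.allclose(fd["derivative"], 2 * np.arange(5))  # should be True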

Basic usage of second_derivative#

[4]:
from estimagic import second_derivative

sd = second_derivative(
    func=sphere,
    params=np.arange(5),
)

sd["derivative"].round(3)
[4]:
array([[ 2.001,  0.   ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  2.   ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  2.   ,  0.   , -0.   ],
       [ 0.   ,  0.   ,  0.   ,  2.   ,  0.   ],
       [ 0.   ,  0.   , -0.   ,  0.   ,  2.   ]])
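
The analytic hessian of the sphere function is twice the identity matrix. A quick hedged check; the tolerance is an arbitrary choice to absorb the finite-difference noise visible above:

np.allclose(sd["derivative"], 2 * np.eye(5), atol=1e-2)  # should be True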

params do not have to be vectors#

In estimagic, params can be arbitrary pytrees. Examples are (nested) dictionaries of numbers, arrays and pandas objects.

[5]:
def dict_sphere(params):
    return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()


fd = first_derivative(
    func=dict_sphere,
    params={"a": 0, "b": 1, "c": pd.Series([2, 3, 4])},
)

fd["derivative"]
[5]:
{'a': array(0.),
 'b': array(2.),
 'c': 0    4.0
 1    6.0
 2    8.0
 dtype: float64}
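
The same kind of sanity check works leaf by leaf: the analytic gradient of dict_sphere is 2 * params for every entry of the pytree. The expected dictionary below is written out by hand purely for illustration:

expected = {"a": 0.0, "b": 2.0, "c": 2 * pd.Series([2, 3, 4])}

all(np.allclose(fd["derivative"][key], expected[key]) for key in expected)  # should be True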

Output description#

The output of first_derivative when using a general pytree is straightforward. Nevertheless, this explanation requires some pytree terminology. Please refer to the JAX documentation of pytrees.

The output tree of first_derivative has the same structure as the params tree, just as in the numpy case, where the gradient is a vector of shape (len(params),). If, however, the params tree contains non-scalar entries like numpy.ndarray's, pandas.Series', or pandas.DataFrame's, the output is not expanded but a block is created. In the above example the entry params["c"] is a pandas.Series with 3 entries. Thus the first derivative output contains the corresponding 3x1 block of the gradient at the position ["c"]:

[6]:
fd["derivative"]["c"]
[6]:
0    4.0
1    6.0
2    8.0
dtype: float64
[7]:
sd = second_derivative(
    func=dict_sphere,
    params={"a": 0, "b": 1, "c": pd.Series([2, 3, 4])},
)

sd["derivative"]
[7]:
{'a': {'a': array(2.00072215),
  'b': array(0.),
  'c': 0    0.0
  1    0.0
  2    0.0
  dtype: float64},
 'b': {'a': array(0.),
  'b': array(1.9999955),
  'c': 0    0.0
  1    0.0
  2    0.0
  dtype: float64},
 'c': {'a': 0    0.0
  1    0.0
  2    0.0
  dtype: float64,
  'b': 0    0.0
  1    0.0
  2    0.0
  dtype: float64,
  'c':           0         1         2
  0  1.999995  0.000004 -0.000003
  1  0.000004  2.000001  0.000000
  2 -0.000003  0.000000  1.999997}}

Output description#

The output of second_derivative when using a general pytree looks more complex, but is easy to understand once we remember that the second derivative is equivalent to applying the first derivative twice. This explanation requires some pytree terminology. Please refer to the JAX documentation of pytrees.

The output tree is a product of the params tree with itself, just as in the numpy case, where the hessian is a matrix of shape (len(params), len(params)). If, however, the params tree contains non-scalar entries like numpy.ndarray's, pandas.Series', or pandas.DataFrame's, the output is not expanded but a block is created. In the above example the entry params["c"] is a pandas.Series with 3 entries. Thus the second derivative output contains the corresponding 3x3 block of the hessian at the position ["c"]["c"]:

[8]:
sd["derivative"]["c"]["c"].round(3)
[8]:
     0    1    2
0  2.0  0.0 -0.0
1  0.0  2.0  0.0
2 -0.0  0.0  2.0
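
Since dict_sphere has no interaction terms between its entries, the off-diagonal blocks of the hessian should be numerically zero and each diagonal block close to twice an identity matrix. A short hedged check with an arbitrary tolerance:

block_cc = sd["derivative"]["c"]["c"]
block_ac = sd["derivative"]["a"]["c"]

np.allclose(block_cc, 2 * np.eye(3), atol=1e-2) and np.allclose(block_ac, 0, atol=1e-2)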

There are many options#

You can choose which finite difference method to use, whether parameter bounds should be respected, and whether to evaluate the function in parallel. Let's go through some basic examples.

You can choose the difference method#

[9]:
fd = first_derivative(
    func=sphere, params=np.arange(5), method="backward"  # default: 'central'
)

fd["derivative"]
[9]:
array([-0.        ,  2.        ,  4.        ,  6.        ,  7.99999994])
[10]:
sd = second_derivative(
    func=sphere, params=np.arange(5), method="forward"  # default: 'central_cross'
)

sd["derivative"].round(3)
[10]:
array([[2.006, 0.   , 0.   , 0.   , 0.   ],
       [0.   , 2.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 2.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 2.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 2.   ]])
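
One-sided differences are typically less accurate than central ones. As a rough illustration (not a rigorous benchmark), you can compare the maximum absolute error of both methods against the analytic gradient 2 * params:

params = np.arange(5)
exact = 2 * params

fd_central = first_derivative(func=sphere, params=params)
fd_backward = first_derivative(func=sphere, params=params, method="backward")

err_central = np.abs(fd_central["derivative"] - exact).max()
err_backward = np.abs(fd_backward["derivative"] - exact).max()

err_central, err_backward  # the central error should be the smaller one here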

You can add bounds#

[11]:
params = np.arange(5)

fd = first_derivative(
    func=sphere,
    params=params,
    lower_bounds=params,  # forces first_derivative to use forward differences
    upper_bounds=params + 1,
)

fd["derivative"]
[11]:
array([0.        , 2.        , 4.        , 6.        , 8.00000006])
[12]:
sd = second_derivative(
    func=sphere,
    params=params,
    lower_bounds=params,  # forces second_derivative to use forward differences
    upper_bounds=params + 1,
)

sd["derivative"].round(3)
[12]:
array([[2.006, 0.   , 0.   , 0.   , 0.   ],
       [0.   , 2.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 2.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 2.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 2.   ]])
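
To see that the bounds are respected, you can differentiate a version of the sphere function that fails loudly whenever it is evaluated outside the box. The guarded function and the bound values below are purely illustrative assumptions:

lower = np.arange(5)
upper = lower + 1


def guarded_sphere(x):
    # hypothetical guard: fail if a finite-difference step leaves [lower, upper]
    assert np.all(x >= lower) and np.all(x <= upper)
    return x @ x


first_derivative(
    func=guarded_sphere,
    params=lower,
    lower_bounds=lower,
    upper_bounds=upper,
)["derivative"]  # should run without triggering the assertion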

Or use parallelized numerical derivatives#

[13]:
fd = first_derivative(
    func=sphere,
    params=np.arange(5),
    n_cores=4,
)

fd["derivative"]
[13]:
array([0., 2., 4., 6., 8.])
[14]:
sd = second_derivative(
    func=sphere,
    params=params,
    n_cores=4,
)

sd["derivative"].round(3)
[14]:
array([[ 2.001,  0.   ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  2.   ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  2.   ,  0.   , -0.   ],
       [ 0.   ,  0.   ,  0.   ,  2.   ,  0.   ],
       [ 0.   ,  0.   , -0.   ,  0.   ,  2.   ]])
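
Parallelization pays off when a single function evaluation is expensive. Below is a minimal sketch with an artificially slow sphere function; the sleep duration is arbitrary and the measured speed-up depends on your machine:

import time


def slow_sphere(x):
    time.sleep(0.1)  # simulate an expensive criterion function
    return x @ x


start = time.perf_counter()
first_derivative(func=slow_sphere, params=np.arange(5), n_cores=1)
serial = time.perf_counter() - start

start = time.perf_counter()
first_derivative(func=slow_sphere, params=np.arange(5), n_cores=4)
parallel = time.perf_counter() - start

serial, parallel  # the parallel run should be noticeably faster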