Selecting Elements of DataFrames

Typically, a constraint applies only to a subset of parameters. Before explaining how to specify constraints in estimagic, we therefore briefly explain how to select subsets of rows of a DataFrame. Feel free to skip this section if you are already familiar with it.

Let's first look at a simple example DataFrame:

[8]:
import pandas as pd
import numpy as np

index = pd.MultiIndex.from_product(
    [["a", "b"], np.arange(3)], names=["category", "number"]
)

df = pd.DataFrame(
    data=[0.1, 0.45, 0.55, 0.75, 0.85, -1.0], index=index, columns=["value"]
)
df
[8]:
                 value
category number
a        0        0.10
         1        0.45
         2        0.55
b        0        0.75
         1        0.85
         2       -1.00

To select subsets of the rows we have two options: loc and query.

loc is best if the rows we want to select correspond to an entry in the index, reading from the left. For example, we can select all parameters of category “a” by:

[9]:
df.loc["a"]
[9]:
        value
number
0        0.10
1        0.45
2        0.55

To get only the second row of category “a”, we pass the full index tuple:

[10]:
df.loc[("a", 1)]
[10]:
value    0.45
Name: (a, 1), dtype: float64
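
Note that the output above is a Series rather than a DataFrame. The following minimal sketch rebuilds the example DataFrame and contrasts the two forms; the names as_series and as_frame are only used for illustration here.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from above.
index = pd.MultiIndex.from_product(
    [["a", "b"], np.arange(3)], names=["category", "number"]
)
df = pd.DataFrame(
    data=[0.1, 0.45, 0.55, 0.75, 0.85, -1.0], index=index, columns=["value"]
)

# A full index tuple identifies exactly one row, so loc returns a Series
# whose entries are the columns of that row.
as_series = df.loc[("a", 1)]

# A list containing the same tuple returns a one-row DataFrame that keeps
# both index levels.
as_frame = df.loc[[("a", 1)]]
```

If the selected rows should keep the same shape as the original DataFrame, the list form is probably the more convenient one.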

For these examples, query would be much more verbose:

[11]:
df.query("category == 'a'")
[11]:
                 value
category number
a        0        0.10
         1        0.45
         2        0.55

However, if we wanted to select all rows where number equals 1, loc would be more cumbersome:

[12]:
df.loc[[("a", 1), ("b", 1)]]
[12]:
                 value
category number
a        1        0.45
b        1        0.85

Imagine what that would look like if we had twenty categories! For such cases, query is a much better solution:

[13]:
df.query("number == 1")
[13]:
                 value
category number
a        1        0.45
b        1        0.85
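
To make the twenty-category case concrete, here is a small sketch. The DataFrame big_df and its category labels "c00" to "c19" are made up for this illustration and do not appear elsewhere in this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical larger parameter frame with twenty made-up categories.
categories = [f"c{i:02d}" for i in range(20)]
index = pd.MultiIndex.from_product(
    [categories, np.arange(3)], names=["category", "number"]
)
big_df = pd.DataFrame(
    data=np.arange(len(index), dtype=float), index=index, columns=["value"]
)

# With loc, one index tuple per category has to be built first.
keys = [(category, 1) for category in categories]
with_loc = big_df.loc[keys]

# With query, the expression is the same as for two categories.
with_query = big_df.query("number == 1")
```

Both selections contain the same twenty rows, but only the loc version needs to know the category labels in advance.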

To specify a constraint for a set of parameters, you provide either loc or query. The value is passed on as the argument to params_df.loc[] or params_df.query(), respectively.
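
The dispatch described above can be sketched as follows. This is only an illustration under stated assumptions: the helper select_params and the plain-dict layout with a "loc" or "query" key are hypothetical and are not taken from estimagic's actual implementation.

```python
import numpy as np
import pandas as pd


def select_params(params_df, constraint):
    """Return the rows of params_df that a constraint refers to.

    Hypothetical helper: ``constraint`` is assumed to be a dict holding
    exactly one of the keys "loc" or "query".
    """
    has_loc = "loc" in constraint
    has_query = "query" in constraint
    if has_loc == has_query:
        raise ValueError("Specify exactly one of 'loc' or 'query'.")
    if has_loc:
        # Passed on unchanged as the argument to params_df.loc[].
        return params_df.loc[constraint["loc"]]
    # Passed on unchanged as the argument to params_df.query().
    return params_df.query(constraint["query"])


# Rebuild the example DataFrame from above and use it as params_df.
index = pd.MultiIndex.from_product(
    [["a", "b"], np.arange(3)], names=["category", "number"]
)
params_df = pd.DataFrame(
    data=[0.1, 0.45, 0.55, 0.75, 0.85, -1.0], index=index, columns=["value"]
)

category_a = select_params(params_df, {"loc": "a"})
number_one = select_params(params_df, {"query": "number == 1"})
```

Here category_a holds the three rows of category “a” and number_one holds the rows where number equals 1, matching the selections shown earlier.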