Selecting Elements of DataFrames
Typically, a constraint applies only to a subset of parameters. Before explaining how to specify constraints in estimagic, we therefore briefly show how to select subsets of rows of a DataFrame. Feel free to skip this section if you already know how.
Let's first look at a simple example DataFrame:
[8]:
import pandas as pd
import numpy as np
index = pd.MultiIndex.from_product(
[["a", "b"], np.arange(3)], names=["category", "number"]
)
df = pd.DataFrame(
data=[0.1, 0.45, 0.55, 0.75, 0.85, -1.0], index=index, columns=["value"]
)
df
[8]:
| category | number | value |
|---|---|---|
| a | 0 | 0.10 |
| | 1 | 0.45 |
| | 2 | 0.55 |
| b | 0 | 0.75 |
| | 1 | 0.85 |
| | 2 | -1.00 |
To select a subset of rows, we have two options: `loc` and `query`.
`loc` is best if the rows we want to select correspond to an entry in the index, reading from the left (that is, starting with the outermost level, here "category"). For example, we can select all parameters of category “a” with:
[9]:
df.loc["a"]
[9]:
| number | value |
|---|---|
| 0 | 0.10 |
| 1 | 0.45 |
| 2 | 0.55 |
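Note that selecting with a single label drops the "category" level from the result. A minimal sketch of the difference, rebuilding the example DataFrame so it runs on its own: passing a list of labels to `loc` selects the same rows but keeps the full MultiIndex.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained.
index = pd.MultiIndex.from_product(
    [["a", "b"], np.arange(3)], names=["category", "number"]
)
df = pd.DataFrame(
    data=[0.1, 0.45, 0.55, 0.75, 0.85, -1.0], index=index, columns=["value"]
)

# A scalar label selects on the outermost level and drops that level.
dropped = df.loc["a"]
# A list of labels selects the same rows but keeps both index levels.
kept = df.loc[["a"]]

print(list(dropped.index.names))  # ['number']
print(list(kept.index.names))     # ['category', 'number']
```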
To get only the second row, we would use:
[10]:
df.loc[("a", 1)]
[10]:
value 0.45
Name: (a, 1), dtype: float64
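Because the key `("a", 1)` names one label per index level, the result above is a single row returned as a Series rather than a DataFrame. The sketch below contrasts it with two related forms: adding the column label returns the scalar itself, while wrapping the key in a list keeps a one-row DataFrame.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained.
index = pd.MultiIndex.from_product(
    [["a", "b"], np.arange(3)], names=["category", "number"]
)
df = pd.DataFrame(
    data=[0.1, 0.45, 0.55, 0.75, 0.85, -1.0], index=index, columns=["value"]
)

# A full key (one label per level) returns the row as a Series.
row = df.loc[("a", 1)]
# Adding the column label returns just the stored value.
scalar = df.loc[("a", 1), "value"]
# Wrapping the key in a list returns a one-row DataFrame instead.
one_row_df = df.loc[[("a", 1)]]

print(type(row).__name__)         # Series
print(type(one_row_df).__name__)  # DataFrame
```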
For these examples, `query` would be much more verbose:
[11]:
df.query("category == 'a'")
[11]:
| category | number | value |
|---|---|---|
| a | 0 | 0.10 |
| | 1 | 0.45 |
| | 2 | 0.55 |
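Unlike `df.loc["a"]`, the `query` result keeps both index levels, so it matches `loc` called with a list of labels. The sketch below checks this and also shows that a Python variable can be referenced inside the query string with `@`; the variable name `wanted_category` is purely illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained.
index = pd.MultiIndex.from_product(
    [["a", "b"], np.arange(3)], names=["category", "number"]
)
df = pd.DataFrame(
    data=[0.1, 0.45, 0.55, 0.75, 0.85, -1.0], index=index, columns=["value"]
)

# query keeps every index level, just like loc with a list of labels.
via_query = df.query("category == 'a'")
via_loc = df.loc[["a"]]
print(via_query.equals(via_loc))  # True

# "@" refers to a Python variable from the surrounding scope.
wanted_category = "a"
via_variable = df.query("category == @wanted_category")
```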
However, if we wanted to select all rows where `number` equals 1, `loc` would be more cumbersome:
[12]:
df.loc[[("a", 1), ("b", 1)]]
[12]:
| category | number | value |
|---|---|---|
| a | 1 | 0.45 |
| b | 1 | 0.85 |
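For completeness, pandas does offer ways to select on an inner index level without listing every key, though they are arguably less readable than `query`. A short sketch using `pd.IndexSlice` with `loc` and, alternatively, `DataFrame.xs` by level name; slicing a MultiIndex this way expects a sorted index, which `from_product` with sorted inputs provides here.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained.
index = pd.MultiIndex.from_product(
    [["a", "b"], np.arange(3)], names=["category", "number"]
)
df = pd.DataFrame(
    data=[0.1, 0.45, 0.55, 0.75, 0.85, -1.0], index=index, columns=["value"]
)

# ":" keeps every category, 1 picks number == 1; the trailing ":" keeps all columns.
via_slice = df.loc[pd.IndexSlice[:, 1], :]

# xs selects by level name; drop_level=False keeps both index levels.
via_xs = df.xs(1, level="number", drop_level=False)

print(via_slice)
```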
Imagine how that would look if we had twenty categories! For such cases, `query` is a much better solution:
[13]:
df.query("number == 1")
[13]:
| category | number | value |
|---|---|---|
| a | 1 | 0.45 |
| b | 1 | 0.85 |
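`query` also scales to more complex selections. In the sketch below, an index level name and a column name are combined in one boolean expression, and `in` selects rows whose level value is contained in a list.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained.
index = pd.MultiIndex.from_product(
    [["a", "b"], np.arange(3)], names=["category", "number"]
)
df = pd.DataFrame(
    data=[0.1, 0.45, 0.55, 0.75, 0.85, -1.0], index=index, columns=["value"]
)

# Mix an index level ("number") and a column ("value") in one condition.
subset = df.query("number >= 1 and value > 0.5")

# Select every row whose "number" is 0 or 2, across all categories.
ends = df.query("number in [0, 2]")

print(subset)
print(ends)
```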
To specify which parameters a constraint applies to, you provide either `loc` or `query`; the value is passed on as an argument to `params_df.loc[]` or `params_df.query()`, respectively.
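To make this dispatch concrete, here is a minimal sketch of how such a selection could work. The helper `select_params` and the dictionaries holding only a `"loc"` or `"query"` key are hypothetical illustrations of the sentence above, not estimagic's internal code; any further keys a real constraint carries are omitted.

```python
import numpy as np
import pandas as pd

# The example DataFrame from above, used here as a params DataFrame.
index = pd.MultiIndex.from_product(
    [["a", "b"], np.arange(3)], names=["category", "number"]
)
params_df = pd.DataFrame(
    data=[0.1, 0.45, 0.55, 0.75, 0.85, -1.0], index=index, columns=["value"]
)


def select_params(params_df, constraint):
    """Hypothetical helper: return the rows a constraint refers to.

    Assumes `constraint` is a dict with exactly one of the keys "loc" or "query".
    """
    has_loc = "loc" in constraint
    has_query = "query" in constraint
    if has_loc == has_query:
        raise ValueError("Specify exactly one of 'loc' or 'query'.")
    if has_loc:
        # The value is passed on to params_df.loc[...].
        return params_df.loc[constraint["loc"]]
    # The value is passed on to params_df.query(...).
    return params_df.query(constraint["query"])


print(select_params(params_df, {"loc": "a"}))
print(select_params(params_df, {"query": "number == 1"}))
```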