Download the notebook here

How to use logging#

Estimagic can keep a persistent log of the parameter and criterion values tried out by an optimizer. For this, we use an SQLite database, which makes it easy to read from and write to the log-file from several processes or threads. Moreover, it is possible to retrieve data from the log-file without ever loading it into memory, which can be relevant for very long-running optimizations.

The log-file is updated instantly when new information becomes available. Thus, no data is lost when an optimization has to be aborted or a server is shut down for maintenance.
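This incremental access pattern can be sketched with Python's built-in sqlite3 module. The table and column names below are made up for illustration and are not estimagic's actual log schema:

```python
import os
import sqlite3
import tempfile

# Hypothetical schema for illustration; estimagic's real log layout differs.
path = os.path.join(tempfile.mkdtemp(), "log.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE iterations (value REAL)")
writer.execute("INSERT INTO iterations (value) VALUES (3.5)")
writer.commit()  # the row is now on disk and visible to other connections

# A separate connection (e.g. a dashboard process) fetches a single row
# by rowid without loading the whole file into memory.
reader = sqlite3.connect(path)
row = reader.execute(
    "SELECT rowid, value FROM iterations WHERE rowid = ?", (1,)
).fetchone()
print(row)  # (1, 3.5)
```

Because each committed write is immediately visible to other connections, a reader never blocks the optimization and sees all iterations logged so far.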

The sqlite database is also used to exchange data between the optimization and the dashboard.

In addition to parameters and criterion values, we also save all arguments to maximize or minimize in the database, as well as other information that can help to reproduce an optimization result.

Turn logging on or off#

logging is an optional argument of maximize or minimize. It can be a string or pathlib.Path that specifies the path to a sqlite3 database. Typically, those files have the file extension .db. If the file does not exist, it will be created for you.

We encourage users to always use logging. However, it introduces a small overhead of about 5 milliseconds for each evaluation of the criterion function. If this is too much, you can turn logging off by specifying logging=False.

Reading the log#

The most convenient way of displaying the content of the log file is the dashboard. However, sometimes you need to access the content of a specific iteration. Examples are:

  • You want to restart the optimization from the last or some other parameter vector.

  • You want to find a parameter vector that caused an error in order to investigate that error.

  • You want to visualize the fit of your model at a specific parameter vector in a way that is not yet supported in the dashboard.

We provide a convenience function for this:

[2]:
from estimagic.config import EXAMPLE_DIR
from estimagic.logging.read_log import read_optimization_iteration
[3]:
path = EXAMPLE_DIR / "db1.db"
info = read_optimization_iteration(path, 1)
info.keys()
[3]:
dict_keys(['rowid', 'timestamp', 'exceptions', 'valid', 'hash', 'value', 'params'])

The entries should be rather self-explanatory. For more detailed information about the function, check out the docstring:

[21]:
print(read_optimization_iteration.__doc__)
Get information about an optimization iteration.

    Args:
        path (str or pathlib.Path): Path to the sqlite database file used for logging.
            Typically, those have the file extension ``.db``.
        iteration (int): The index of the iteration that should be retrieved. The row_id
            behaves as Python list indices, i.e. 0 identifies the first iteration,
            -1 the last one, etc.
        include_internals (bool): Whether internally used quantities like the
            internal parameter vector and the corresponding derivative etc. are included
            in the result. Default False. This should only be used by advanced users.

    Returns:
        dict: The logged information corresponding to the iteration. The keys correspond
            to database columns.

    Raises:
        KeyError if the iteration is out of bounds.


Customize the logging behavior with the log_options argument#

log_options is an optional argument to maximize or minimize. It is a dictionary with keyword arguments that influence the logging behavior. The following options are available:

  • “fast_logging”: A boolean that determines if “unsafe” settings are used to speed up write processes to the database. This should only be used for very short running criterion functions where the main purpose of the log is a real-time dashboard and it would not be catastrophic to get a corrupted database in case of a sudden system shutdown. If one evaluation of the criterion function (and gradient if applicable) takes more than 100 ms, the logging overhead is negligible.

  • “if_exists”: (str) One of “extend”, “replace”, “raise”. This determines what happens if a log file already exists at the specified path: “extend” appends new iterations to the existing file, “replace” overwrites it, and “raise” raises an error.
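Putting the options above together, the dictionary passed as log_options might look like this (the chosen values are just an example):

```python
# Passed to maximize or minimize via the log_options argument.
log_options = {
    "fast_logging": True,    # faster but "unsafe" writes; fine for short criteria
    "if_exists": "replace",  # what to do if the log file already exists
}
```

Any option you omit keeps its default value.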