How Constraints Are Implemented

Most of the optimizers wrapped in estimagic cannot deal natively with anything but box constraints. So the problem they can solve is:

\[\min_{x \in \mathbb{R}^k} f(x) \quad \text{s.t.} \hspace{0.5cm} l \leq x \leq u\]

However, in most econometric applications we also need other constraints. Examples are parameters that sum to a fixed value, that form a covariance matrix, or that are probabilities. More abstractly, the problem becomes:

\[\min_{x \in \mathbb{R}^k} f(x) \quad \text{s.t.} \hspace{0.5cm} l \leq x \leq u \text{ and } C(x) = 0\]

There are two basic ways of converting optimizers that can only deal with box constraints into constrained optimizers: reparametrizations and penalties. Below we explain what both approaches are, why we chose the reparametrization approach over penalties, and which reparametrization we use for each type of constraint.

Possible Approaches

Reparametrizations

In the reparametrization approach, we need to find an invertible mapping \(g\) as well as two \(k'\)-dimensional vectors \(l'\) and \(u'\) such that:

\[l' \leq \tilde{x} \leq u' \iff l \leq g(\tilde{x}) \leq u \text{ and } C(g(\tilde{x})) = 0\]

This means that:

\[\min_{\tilde{x} \in \mathbb{R}^{k'}} f(g(\tilde{x})) \quad \text{s.t.} \hspace{0.5cm} l' \leq \tilde{x} \leq u'\]

is equivalent to the original minimization problem.

This sounds more complicated than it is. Let's look at the simple example of a two-dimensional parameter vector where the constraint is that the two parameters have to sum to 5.

\[\begin{aligned}
x &= (x_1, x_2)\\
f(x) &= x_1^2 + 2 x_2^2\\
C(x) &= x_1 + x_2 - 5\\
\tilde{x} &= x_1\\
g(\tilde{x}) &= (\tilde{x}, 5 - \tilde{x})
\end{aligned}\]

Typically, users implement such reparametrizations manually and write functions to convert between the parameters of interest and their reparametrized version. Estimagic does this for you, for a large number of constraints that are typically used in econometric applications.
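Concretely, the manual workflow for the example above might look like the following sketch, which uses scipy.optimize directly instead of estimagic's interface:

```python
import numpy as np
from scipy.optimize import minimize

def criterion(x):
    """The original criterion f(x) = x_1**2 + 2 * x_2**2."""
    return x[0] ** 2 + 2 * x[1] ** 2

def g(x_tilde):
    """Map the free parameter back to the full, constrained parameter vector."""
    return np.array([x_tilde[0], 5 - x_tilde[0]])

def reparametrized_criterion(x_tilde):
    """The criterion expressed in terms of the free parameter only."""
    return criterion(g(x_tilde))

res = minimize(reparametrized_criterion, x0=np.array([2.5]))
x_optimal = g(res.x)  # approx. [3.33, 1.67]; satisfies x_1 + x_2 == 5
```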

For this approach to be efficient, it is crucial that the reparametrizations preserve desirable properties of the original problem. In particular, the mapping g should be differentiable and, if possible, linear. Moreover, the dimensionality of \(\tilde{x}\) should be chosen as small as possible. Estimagic only implements constraints that can be enforced with differentiable transformations and always achieves the full dimensionality reduction.

Penalties

The penalty approach is conceptually much simpler. Whenever \(C(x) \neq 0\), a penalty term is added to the criterion function. If the penalty term is large enough (e.g. as large as the criterion function at the start values), this ensures that no x that violates the constraints can be optimal.
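As an illustration, a naive quadratic penalty for the sum-to-5 example from above could look like this, where `penalty_weight` is a tuning parameter the user has to choose:

```python
def penalized_criterion(x, penalty_weight=1e4):
    """Criterion from the example above plus a penalty for violating C(x) = 0."""
    value = x[0] ** 2 + 2 * x[1] ** 2
    violation = x[0] + x[1] - 5  # C(x)
    return value + penalty_weight * violation ** 2
```

For any finite weight, the quadratic penalty enforces the constraint only approximately, while very large weights make the problem ill-conditioned.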

While the generality and conceptual simplicity of this approach are attractive, it also has drawbacks. Applying penalties naively can introduce kinks, discontinuities, and even spurious local optima into the penalized criterion.

What estimagic does

We chose to implement constraints via reparametrizations for the following reasons:

  • Reparametrizations ensure that the criterion function is only evaluated at parameters that satisfy all constraints. This is not only efficient, but essential if the criterion function is only defined for such parameters.

  • Reparametrizations can often achieve a substantial dimensionality reduction. In particular, fixes and equality constraints are implemented at zero cost, i.e. as efficiently as if you directly plug them into your problem. This is important because fixes and equality constraints often make user code much nicer and more flexible.

  • It is easier to preserve desirable properties such as convexity and differentiability with reparametrizations than with penalties.

The constraints that can be implemented with this approach are available for all optimizers. More general constraints are only available with optimizers that can deal natively with them. This includes all optimizers from the nlopt and ipopt libraries.

The Non-Trivial Reparametrizations

Fixed parameters, equality and pairwise equality constraints can be implemented trivially with reparametrizations by simply plugging them into the criterion function. Increasing and decreasing constraints are implemented internally as linear constraints. The following sections explain how the remaining constraints are implemented:

Covariance and sdcorr Constraints

The main difficulty with covariance and sdcorr constraints is to keep the (implied) covariance matrix valid, i.e. positive semi-definite. In both cases, \(\tilde{x}\) contains the non-zero elements of the lower-triangular Cholesky factor of the (implied) covariance matrix. For covariance constraints, g is then simply the product of the Cholesky factor with its transpose. For sdcorr constraints, the product is further converted into standard deviations and the unique elements of the implied correlation matrix.
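The core of this mapping can be sketched in a few lines of numpy. The function names are illustrative and not estimagic's internal API; `dim` is the dimension of the covariance matrix:

```python
import numpy as np

def cov_from_internal(x_tilde, dim):
    """Build a positive semi-definite covariance matrix from the
    non-zero elements of its lower-triangular Cholesky factor."""
    chol = np.zeros((dim, dim))
    chol[np.tril_indices(dim)] = x_tilde
    return chol @ chol.T

def internal_from_cov(cov):
    """Inverse mapping: extract the Cholesky factor's non-zero elements."""
    chol = np.linalg.cholesky(cov)
    return chol[np.tril_indices(len(cov))]
```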

Several papers show that the Cholesky reparametrization is a very efficient way to optimize over covariance matrices; see, for example, [PB96] and [Gro94].

A limitation of this approach is that there can be no additional fixes, box constraints or other constraints on any of the involved parameters.

Linear Constraints

Assume we have m linear constraints on an n-dimensional parameter vector. Then the set of all parameter vectors that satisfy the constraints can be written as:

\[\mathbf{X} \equiv \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{l} \leq \mathbf{Ax} \leq \mathbf{u}\}\]

We are looking for a set \(\mathbf{\tilde{X}}\) that is defined by box constraints alone, together with a reparametrization. The reparametrization will turn out to be a linear mapping and thus has a matrix representation, say \(\mathbf{M}\). We are done if the following holds:

\[\mathbf{x} \in \mathbf{X} \iff \exists \mathbf{\tilde{x}} \in \mathbf{\tilde{X}} \text{ s.t. } \mathbf{x} = \mathbf{M\tilde{x}}\]

A suitable choice of \(\mathbf{\tilde{X}}\) and \(\mathbf{M}\) is:

\[\begin{aligned}
\mathbf{\tilde{X}} &\equiv \{(\mathbf{\tilde{x}}_1, \mathbf{\tilde{x}}_2)^T \mid \mathbf{\tilde{x}}_1 \in \mathbb{R}^{k} \text{ and } \mathbf{l} \leq \mathbf{\tilde{x}}_2 \leq \mathbf{u}\}\\
\mathbf{M} &= \left[ \begin{array}{c} \mathbb{I}_n[k] \\ \mathbf{A} \end{array} \right]^{-1}
\end{aligned}\]

where \(k = n - m\) and \(\mathbb{I}_n[k]\) denotes \(k\) rows of the \(n \times n\) identity matrix, chosen such that all rows of \(\mathbf{M}^{-1}\) are linearly independent.

Proof:

“\(\Rightarrow\)”:

Let \(\mathbf{x}\in \mathbf{X}\), then we define \(\mathbf{\tilde{x}} = \mathbf{M}^{-1} \mathbf{x}\). Claim: \(\mathbf{\tilde{x}} \in \mathbf{\tilde{X}}\):

\[\mathbf{\tilde{x}} = \mathbf{M}^{-1} \mathbf{x} = \left[ \begin{array}{c} \mathbb{I}_n[k]\mathbf{x} \\ \mathbf{Ax} \end{array} \right] = (\mathbf{\tilde{x}}_1, \mathbf{\tilde{x}}_2)^T\]

where \(\tilde{x}_1 \in \mathbb{R}^k\) and \(\mathbf{l} \leq \mathbf{\tilde{x}}_2 \leq \mathbf{u}\) because \(\mathbf{l} \leq \mathbf{Ax} \leq \mathbf{u}\). Thus \(\mathbf{\tilde{x}} \in \mathbf{\tilde{X}}\).

“\(\Leftarrow\)” (proof by contraposition):

Let \(\mathbf{x} \not\in \mathbf{X}\) and define \(\mathbf{\tilde{x}} = \mathbf{M}^{-1} \mathbf{x}\). Claim: \(\mathbf{\tilde{x}} \not\in \mathbf{\tilde{X}}\).

By the same argument as above, \(\neg(\mathbf{l} \leq \mathbf{Ax} \leq \mathbf{u})\) implies \(\neg(\mathbf{l} \leq \mathbf{\tilde{x}}_2 \leq \mathbf{u})\) and therefore \(\mathbf{\tilde{x}} \not\in \mathbf{\tilde{X}}\).

The requirement that \(\mathbf{M}^{-1}\) has full rank makes clear that there can be at most as many linear constraints as involved parameters. This count includes any box constraints on the involved parameters.
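A sketch of this construction in numpy. For simplicity, it stacks the first \(k\) rows of the identity matrix (which is not always a valid choice; estimagic instead selects identity rows that make the stacked matrix invertible), and the example reuses the sum-to-5 constraint from before:

```python
import numpy as np

def build_m(a):
    """Return M for m linear constraints l <= a @ x <= u on n parameters.

    Stacks the first k = n - m rows of the identity matrix on top of a.
    """
    m, n = a.shape
    k = n - m
    m_inverse = np.vstack([np.eye(n)[:k], a])  # maps x to x_tilde
    return np.linalg.inv(m_inverse)            # maps x_tilde to x

# Example: the constraint x_1 + x_2 = 5 on two parameters (l = u = 5).
a = np.array([[1.0, 1.0]])
big_m = build_m(a)
x_tilde = np.array([3.0, 5.0])  # first entry free, second pinned to 5 by its bounds
x = big_m @ x_tilde             # array([3., 2.]): satisfies the constraint
```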

Probability Constraints

A probability constraint on k parameters means that all parameters lie in \([0, 1]\) and their sum equals one. While those are all linear constraints, they cannot be implemented in the way described above, because there are k + 1 constraints for k parameters.

Instead, we use the following reparametrization:

\[\begin{aligned}
\tilde{x} &= (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_{k - 1})\\
g(\tilde{x}) &= \left(\frac{\tilde{x}_1}{1 + \sum_{i=1}^{k-1}\tilde{x}_i}, \ldots, \frac{\tilde{x}_{k-1}}{1 + \sum_{i=1}^{k-1}\tilde{x}_i}, \frac{1}{1 + \sum_{i=1}^{k-1}\tilde{x}_i}\right)\\
l' &= (0, 0, \ldots, 0)
\end{aligned}\]
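For \(\tilde{x} \geq 0\), every component of \(g(\tilde{x})\) lies in \([0, 1]\) and the components sum to one by construction. A numpy sketch of the mapping and its inverse (the function names are illustrative, not estimagic's internals):

```python
import numpy as np

def probabilities_from_internal(x_tilde):
    """Map k - 1 non-negative internal parameters to k probabilities
    that lie in [0, 1] and sum to one."""
    return np.append(x_tilde, 1) / (1 + x_tilde.sum())

def internal_from_probabilities(p):
    """Inverse mapping: divide the first k - 1 probabilities by the last one."""
    return p[:-1] / p[-1]
```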

A limitation of this approach is that there can be no additional fixes, box constraints or other constraints on any of the involved parameters.

References

[Gro94] Eildert Groeneveld. A reparameterization to improve numerical optimization in multivariate REML (co)variance component estimation. Genetics Selection Evolution, 26:537–545, 1994.

[PB96] José C. Pinheiro and Douglas M. Bates. Unconstrained parameterizations for variance-covariance matrices. Statistics and Computing, 6:289–296, 1996.