Hi there, thanks for exploring this repository. It has been archived; please refer to here.
People optimize. This repo contains my notes and exercises for the book *Numerical Optimization* by J. Nocedal and S. J. Wright.
⭐ Elements of optimization
objective
: the quantity to be optimized; it must first be defined with a quantitative measure.
variable / unknown
: the characteristics of the system that can be adjusted to optimize the objective.
constraint
: the restrictions placed on the variables.
modeling
: the process of identifying the objective, variables, and constraints for a given problem.
algorithm
: there is no universal algorithm; one should find the algorithm tailored to the given problem.
optimality condition
: a mathematical expression for checking whether a candidate point is a good solution.
sensitivity analysis
: studying how sensitive the solution is to changes in the model, which indicates possible improvements.
⭐ (Mathematically speaking) Optimization is the minimization / maximization of a function subject to constraints on its variables.
- $x$ : the variable / unknown / parameters (a vector)
- $f$ : the objective function, a scalar function of $x$
- $c_i$ : the constraint functions, scalar functions of $x$ that must be satisfied
- $\Omega$ : the feasible region
Let's take an example:
$$
\min\;(x_1-2)^2+(x_2-1)^2 \quad \text{subject to} \quad
\begin{cases}
x_1^2-x_2\leq 0,\\
x_1+x_2\leq 2.
\end{cases}
$$
$x_1+x_2\leq 2$ is a linear constraint whose boundary is a straight line.
The above problem can be illustrated as follows:
Using the notation introduced above, it can be written as:
$$
f(x) =(x_1-2)^2+(x_2-1)^2, \quad x=\begin{bmatrix}x_1\\x_2\end{bmatrix},\\
c(x)=\begin{bmatrix}c_1(x)\\c_2(x)\end{bmatrix}=\begin{bmatrix}-x_1^2+x_2\\-x_1-x_2+2\end{bmatrix}, \quad I=\{1,2\}, \quad E=\emptyset
$$
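As a quick sanity check (my own sketch, assuming NumPy; the test points are chosen arbitrarily), we can evaluate $f$ and $c$ at a few points and test feasibility under the convention $c(x) \geq 0$:

```python
import numpy as np

# Evaluate the example objective and constraints at a few points, including the
# unconstrained minimizer (2, 1), which turns out to be infeasible.
def f(x):
    return (x[0] - 2) ** 2 + (x[1] - 1) ** 2

def c(x):
    # c_1(x) = -x_1^2 + x_2 >= 0,   c_2(x) = -x_1 - x_2 + 2 >= 0
    return np.array([-x[0] ** 2 + x[1], -x[0] - x[1] + 2])

for x in (np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.array([1.0, 1.0])):
    feasible = bool(np.all(c(x) >= 0))
    print(f"x = {x}, f(x) = {f(x):.3f}, c(x) = {c(x)}, feasible = {feasible}")
```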
→ To conclude, the optimization problem can be written in the general form:
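Using the book's convention that equality constraints are written $c_i(x)=0$ and inequality constraints $c_i(x)\geq 0$:

$$
\min_{x\in\mathbb{R}^n} f(x) \quad \text{subject to} \quad
\begin{cases}
c_i(x)=0, & i\in E,\\
c_i(x)\geq 0, & i\in I.
\end{cases}
$$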
⭐ In the book's transportation example, the shipping cost $\sum_{ij}c_{ij}x_{ij}$ is linear in the variables $x_{ij}$, so this is a linear problem.
But if there is a fee for storing the product, then the cost becomes $$ \sum_{ij}c_{ij}\sqrt{{\epsilon}+x_{ij}} $$ :star: This is a non-linear problem!
📌 Difficulty: continuous optimization < discrete optimization
- continuous optimization: e.g. the example in 1.1
- discrete optimization:
  - integer programming: constraints of the form $x_i \in \mathbb{Z}$, where $\mathbb{Z}$ is the set of integers (e.g. $x_i \in \{1,2,5\}$), or binary constraints ($x_i \in \{0,1\}$)
  - mixed-integer programming (MIP): problems with both integer (or binary) constraints and continuous variables
📌 Difficulty: unconstrained optimization < constrained optimization
- unconstrained optimization: e.g. $E = I = \emptyset$
- constrained optimization: e.g. $0\leq x_i\leq 100,\ \sum_i x_i\leq 1$
📌 Difficulty: local optimization < global optimization
- in linear programming / convex programming: a global solution = a local solution
- in non-linear programming: we tend to look for a local solution, since the global one is hard to find
Certainty:
→ deterministic optimization
Uncertainty:
→ stochastic optimization: a number of possible scenarios are modeled and the expected performance is optimized
→ chance-constrained optimization: ensure that $x$ satisfies the constraints with some probability
→ robust optimization: certain constraints must hold for all possible values of the uncertain data
⭐ Convexity is a fundamental concept in optimization. (In Chinese, convex is 凸, not 凹.)
Convex Set Definition: A set $S \subseteq \mathbb{R}^n$ is a convex set if the straight line segment connecting any two points in $S$ lies entirely inside $S$; formally, for any two points $x, y \in S$, we have $\alpha x + (1-\alpha)y \in S$ for all $\alpha \in [0,1]$.
Left: Non-convex, Right: Convex
Convex Function: its domain $S$ is a convex set, and for any two points $x, y \in S$ it satisfies $f(\alpha x + (1-\alpha)y) \leq \alpha f(x) + (1-\alpha)f(y)$ for all $\alpha \in [0,1]$.
→ You may wonder: is this convex? Why not concave? A function $f$ is said to be concave if $-f$ is convex.
Strictly convex: the inequality above holds strictly ($<$) whenever $x \neq y$ and $\alpha \in (0,1)$.
Linear function = convex: $$ f(x)=c^Tx+a $$
→ $c$ : a vector $\in\mathbb{R}^n$
→ $a$ : a scalar
Quadratic function = convex: $$ f(x) = x^T Hx $$
→ $H$ : a symmetric positive semidefinite matrix (see the numerical sanity check after this list)
Unit ball = convex: $$ \{\, y \in \mathbb{R}^n \mid \|y\|_2\leq 1 \,\} $$
Polyhedron = convex: $$ \{\, x \in \mathbb{R}^n \mid Ax = b, \ Cx \leq d \,\} $$
→ $A, C$ : matrices of appropriate dimension
→ $b, d$ : vectors
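Below is a minimal numerical sanity check for the quadratic case (my own sketch, assuming NumPy; the matrix $H$ is an arbitrary example): the eigenvalues of a symmetric positive semidefinite $H$ are nonnegative, and the convexity inequality holds at randomly sampled points.

```python
import numpy as np

# Verify that f(x) = x^T H x is convex when H is symmetric positive semidefinite:
# (1) inspect the eigenvalues of H, (2) spot-check the convexity inequality.
H = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # example symmetric matrix (assumption)
f = lambda x: x @ H @ x

print("eigenvalues:", np.linalg.eigvalsh(H))   # all >= 0  =>  positive semidefinite

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    a = rng.uniform()
    # convexity: f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y)
    assert f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-12
print("convexity inequality held at all sampled points")
```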
โญ If the objective function in the optimization problem (formula 1) and the feasible region are both convex, then any local solution of the problem is in fact a global solution.
Convex programming
It is a special case of the general constrained optimization problem, in which:
→ the objective function is convex,
→ the equality constraint functions $c_i(\cdot), i \in E$, are linear,
→ the inequality constraint functions $c_i(\cdot), i \in I$, are concave.
⭐ Optimization algorithms are all iterative processes.
Good algorithms must have:
→ Robustness, Efficiency, Accuracy
No algorithm is perfect; each has its pros and cons. The wisdom lies in managing the tradeoffs: between convergence rate and storage requirements, between robustness and speed, and so on. These tradeoffs are central issues in numerical optimization.
Unconstrained Optimization:
→ 1. variables: real numbers in $\mathbb{R}$ with no restrictions (infinitely many possible values)
→ 2. formulation: $\min_x f(x)$
→ 3. $x\in \mathbb{R}^n$ is a real vector, $n\geq 1$
→ 4. $f:\mathbb{R}^n \to \mathbb{R}$ is a smooth function
→ 5. characteristic: we lack a global perspective on the function (due to 1.); we only know $f$ locally, at sample points $x_1, x_2, \dots$
Example of unconstrained optimization: the least-squares data-fitting problem.
Taking the above problem, we can break the process down into:
- inspect the data
- deduce a plausible form of the signal
- detect exponential and oscillatory behavior
- write down a model, e.g.
$$\phi(t;x)=x_1+x_2e^{-(x_3-t)^2/x_4}+x_5\cos(x_6t)$$
- $t$ : the time (the input, on the $x$-axis)
- $y$ : the output (the observation at time $t$)
- $x_i,\ i=1,2,\dots,6$ : the parameters of the model; they can also be written as a vector $x=(x_1,x_2,\dots,x_6)^T$
- What to do? Minimize the discrepancy between $\phi(t_j;x)$ and $y_j$ via the residuals
$$r_j(x)=y_j-\phi(t_j;x),\quad j=1,2,\dots,m$$
where $m$ is the number of data points.
Written formally: $$ \min_{x\in\mathbb{R}^6} f(x)=r_1^2(x)+r_2^2(x)+\dots+r_m^2(x) $$ :star: This is a so-called nonlinear least-squares problem.
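Here is a minimal sketch of this objective (my own code, assuming NumPy; the "true" parameter values and the noisy data are made up purely for illustration):

```python
import numpy as np

# Least-squares objective for the model phi(t; x) above, on synthetic data.
def phi(t, x):
    x1, x2, x3, x4, x5, x6 = x
    return x1 + x2 * np.exp(-(x3 - t) ** 2 / x4) + x5 * np.cos(x6 * t)

def residuals(x, t, y):
    return y - phi(t, x)                      # r_j(x) = y_j - phi(t_j; x)

def f(x, t, y):
    return np.sum(residuals(x, t, y) ** 2)    # f(x) = sum_j r_j(x)^2

t = np.linspace(0.0, 10.0, 200)               # measurement times (synthetic)
x_true = np.array([1.0, 2.0, 5.0, 4.0, 0.5, 3.0])   # made-up parameters
y = phi(t, x_true) + 0.05 * np.random.default_rng(0).normal(size=t.size)

x0 = np.array([0.5, 1.5, 4.0, 3.0, 0.3, 2.5]) # a rough starting guess
print("f(x0)     =", f(x0, t, y))
print("f(x_true) =", f(x_true, t, y))         # near the noise level
```

In practice, one would typically hand `residuals` to a dedicated solver such as `scipy.optimize.least_squares`, which exploits the special structure of the problem.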
Quick question: why square the residuals $r_j$?
Global minimizer:
→ A point $x^*$ is a global minimizer if $f(x^*)\leq f(x)$ for all $x\in\mathbb{R}^n$.
Local minimizer:
→ A point $x^*$ is a local minimizer if there is a neighborhood $N$ of $x^*$ such that $f(x^*) \leq f(x)$ for all $x \in N$.
Weak local minimizer:
→ Same as the definition of a local minimizer (recall that a neighborhood $N$ is an open set that contains $x^*$), with the inequality $f(x^*)\leq f(x)$ not required to be strict.
→ e.g. $f(x)=2$ (a constant function), where every point is a weak local minimizer.
Strict/Strong local minimizer:
→ A point $x^*$ is a strict local minimizer if there is a neighborhood $N$ of $x^*$ such that $f(x^*) < f(x)$ for all $x \in N$ with $x\neq x^*$.
→ e.g. $f(x)=(x-2)^2$, where $x^*=2$ is a strict local minimizer.
Isolated local minimizer:
→ This is a little bit confusing when considered together with strict local minimizers.
→ :star: The solid conclusion is: isolated local minimizers $\subset$ strict local minimizers.
→ What the hell?! OK, recall the idea of taking a limit and look at the following function:
→ $f(x)=x^4\cos(1/x)+2x^4$, with $f(0)=0$.
Nothing looks special at first, but if we zoom in around $0$, we notice that the curve oscillates very rapidly! There are infinitely many strict local minimizers $x_j$ with $x_j\to 0$ as $j\to\infty$, so $x^*=0$ is a strict local minimizer but NOT an isolated one: every neighborhood of $0$ contains other local minimizers.
So in future practice, we should pay attention to such wild functions, where algorithms may be "trapped".
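A quick numerical way to see the oscillation (my own sketch, assuming NumPy; the sampling resolution is arbitrary):

```python
import numpy as np

# Sample f(x) = x^4 cos(1/x) + 2 x^4 (with f(0) = 0) on shrinking intervals
# around 0 and count sampled points that are local minima of the sampled values.
# The count keeps growing as we zoom in, hinting at infinitely many local
# minimizers accumulating at x = 0.
def f(x):
    x = np.asarray(x, dtype=float)
    out = 2.0 * x ** 4
    nz = x != 0.0
    out[nz] += x[nz] ** 4 * np.cos(1.0 / x[nz])
    return out

for radius in (1e-1, 1e-2, 1e-3):
    x = np.linspace(-radius, radius, 200_001)
    y = f(x)
    mins = (y[1:-1] < y[:-2]) & (y[1:-1] < y[2:])   # discrete local minima
    print(f"radius {radius:g}: {int(mins.sum())} sampled local minima")
```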
(Figure: a difficult case for global minimization.)
→ How can we recognize a local minimum without examining all the points around it?
→ :star: In particular, if $f$ is twice continuously differentiable, we may be able to tell that $x^*$ is a local minimizer (and possibly a strict local minimizer) by examining just the gradient $\nabla f(x^*)$ and the Hessian $\nabla^2 f(x^*)$.
In the following, a few theorems will be introduced; they will be used multiple times later.
→ Prerequisites: $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and $p \in \mathbb{R}^n$ (first-order continuously differentiable),
→ then we have (Taylor's theorem):
$$ f(x+p)=f(x)+\nabla f(x+tp)^T p $$
→ for some $t\in (0,1)$; if $f$ is twice continuously differentiable,
→ then we have:
$$ \nabla f(x+p)=\nabla f(x)+\int_0^1 \nabla^2 f(x+tp)\,p\,dt, $$
$$ f(x+p)=f(x)+\nabla f(x)^T p+\frac{1}{2}\,p^T \nabla^2 f(x+tp)\,p \quad \text{for some } t\in(0,1). $$
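As a quick numerical illustration (my own sketch, assuming NumPy; the function below is an arbitrary smooth example), the second-order Taylor model matches $f(x+p)$ with an error that shrinks roughly like $\|p\|^3$:

```python
import numpy as np

# Compare f(x+p) against the second-order model f(x) + g^T p + 0.5 p^T H p.
def f(x):
    return np.exp(x[0]) + x[0] * x[1] ** 2 + np.sin(x[1])

def grad(x):
    return np.array([np.exp(x[0]) + x[1] ** 2,
                     2 * x[0] * x[1] + np.cos(x[1])])

def hess(x):
    return np.array([[np.exp(x[0]), 2 * x[1]],
                     [2 * x[1],     2 * x[0] - np.sin(x[1])]])

x = np.array([0.3, -0.7])
p = np.array([1.0, 0.5])
for scale in (1.0, 0.1, 0.01):
    q = scale * p
    model = f(x) + grad(x) @ q + 0.5 * q @ hess(x) @ q
    print(f"||p|| = {np.linalg.norm(q):.3g}, |f(x+p) - model| = {abs(f(x + q) - model):.3e}")
```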
→ Prerequisites (first-order necessary conditions): if $x^*$ is a local minimizer and $f$ is continuously differentiable in an open neighborhood of $x^*$,
→ then we have: $\nabla f(x^*) = 0$.
→ To see it geometrically:
→ Fun fact: a point $x^*$ with $\nabla f(x^*)=0$ is also called a stationary point.
// TODO: explain and prove
→ Prerequisites (second-order necessary conditions): $x^*$ is a local minimizer of $f$ and $\nabla^2 f$ exists and is continuous in an open neighborhood of $x^*$,
→ then we have: $\nabla f(x^*)=0$ and $\nabla^2 f(x^*)$ is positive semidefinite.
// TODO: explain and prove, and add text on positive semidefiniteness
→ Prerequisites (second-order sufficient conditions): $\nabla^2 f$ is continuous in an open neighborhood of $x^*$, $\nabla f(x^*)= 0$, and $\nabla^2 f(x^*)$ is positive definite,
→ then we have: $x^*$ is a strict local minimizer of $f$.
// TODO: add example and proof
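As a small illustration (my own sketch, not the book's; it uses finite-difference derivatives and an arbitrary quadratic example), the conditions can be checked numerically at a candidate point:

```python
import numpy as np

# Check first- and second-order conditions at x* for f(x) = (x1-1)^2 + 10 (x2+2)^2,
# whose minimizer is (1, -2): the gradient should vanish and the Hessian should
# have strictly positive eigenvalues.
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def num_grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)   # central difference
    return g

def num_hess(f, x, h=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros_like(x); e[i] = h
        H[:, i] = (num_grad(f, x + e, h) - num_grad(f, x - e, h)) / (2 * h)
    return H

x_star = np.array([1.0, -2.0])
print("gradient  :", num_grad(f, x_star))                    # ~ 0 (necessary)
print("eigvals(H):", np.linalg.eigvalsh(num_hess(f, x_star)))  # all > 0 (sufficient)
```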
→ Prerequisites: when $f$ is convex,
→ then we have: any local minimizer $x^*$ is a global minimizer of $f$.
→ Prerequisites: if in addition $f$ is differentiable,
→ then we have: any stationary point $x^*$ is a global minimizer of $f$.
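To see why the differentiable case holds, recall the first-order characterization of convexity and plug in a stationary point $x^*$ (where $\nabla f(x^*)=0$):

$$
f(y) \;\geq\; f(x^*) + \nabla f(x^*)^T (y - x^*) \;=\; f(x^*) \quad \text{for all } y,
$$

so $x^*$ minimizes $f$ globally.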
These results, which are based on elementary calculus, provide the foundations for unconstrained optimization algorithms. In one way or another, all algorithms seek a point where $\nabla f = 0$.
→ What is it?
Geometrically, a nonsmooth function consists of a few smooth pieces, with discontinuities (in the function or its derivatives) between the pieces.
→ So what do we do?
It may be possible to find the minimizer by minimizing each smooth piece individually, i.e. by examining the subgradient or generalized gradient.
(A side note: this book will not cover nonsmooth problems.)
All algorithms for unconstrained minimization require the user to supply a starting point, which we usually denote by $x_0$.