From 346970a6fce2612fe9002df7084dd71a03bf37f3 Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Fri, 28 Jun 2024 12:24:46 +0000 Subject: [PATCH] build based on 9be046c --- dev/advanced/index.html | 2 +- dev/basics/index.html | 2 +- dev/contributing/index.html | 2 +- dev/examples/docs_00_fw_visualized/index.html | 404 +++++++-------- dev/examples/docs_01_mathopt_lmo/index.html | 330 ++++++------ .../docs_02_polynomial_regression/index.html | 468 +++++++++--------- .../docs_03_matrix_completion/index.html | 440 ++++++++-------- dev/examples/docs_04_rational_opt/index.html | 48 +- dev/examples/docs_05_blended_cg/index.html | 156 +++--- dev/examples/docs_06_spectrahedron/index.html | 166 +++---- .../docs_07_shifted_norm_polytopes/index.html | 162 +++--- .../docs_08_callback_and_tracking/index.html | 4 +- .../docs_09_extra_vertex_storage/index.html | 4 +- .../docs_10_alternating_methods/index.html | 222 ++++----- .../docs_11_block_coordinate_fw/index.html | 466 ++++++++--------- .../docs_12_quadratic_symmetric/index.html | 230 ++++----- dev/examples/plot_utils.jl | 16 - dev/index.html | 2 +- dev/reference/0_reference/index.html | 2 +- dev/reference/1_algorithms/index.html | 2 +- dev/reference/2_lmo/index.html | 2 +- dev/reference/3_backend/index.html | 6 +- dev/reference/4_linesearch/index.html | 2 +- dev/search/index.html | 2 +- 24 files changed, 1564 insertions(+), 1576 deletions(-) diff --git a/dev/advanced/index.html b/dev/advanced/index.html index 922909bbf..5f2ef1964 100644 --- a/dev/advanced/index.html +++ b/dev/advanced/index.html @@ -76,4 +76,4 @@ Base.:-(x1::IT, x2::IT) LinearAlgebra.dot(x1::IT, x2::IT) LinearAlgebra.norm(::IT)

For methods using a FrankWolfe.ActiveSet, the atoms or individual extreme points of the feasible region are not necessarily of the same type as the iterate. They are assumed to be immutable and must implement LinearAlgebra.dot with a gradient object. See for example FrankWolfe.RankOneMatrix or FrankWolfe.ScaledHotVector.

The iterate type IT must be a broadcastable mutable object or implement FrankWolfe.compute_active_set_iterate!:

FrankWolfe.compute_active_set_iterate!(active_set::FrankWolfe.ActiveSet{AT, R, IT}) where {AT, R, IT}

which recomputes the iterate from the current convex decomposition, as well as the following two update methods FrankWolfe.active_set_update_scale! and FrankWolfe.active_set_update_iterate_pairwise!:

FrankWolfe.active_set_update_scale!(x::IT, lambda, atom)
FrankWolfe.active_set_update_iterate_pairwise!(x::IT, lambda, fw_atom, away_atom)
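
For illustration, here is a minimal sketch of a custom value type implementing the arithmetic methods listed at the top of this section; MyIterate and its field are purely hypothetical, and a plain Vector already satisfies the whole interface:

using LinearAlgebra

struct MyIterate                      # hypothetical wrapper around a dense vector
    data::Vector{Float64}
end

Base.:*(scalar::Real, x::MyIterate) = MyIterate(scalar * x.data)
Base.:+(x1::MyIterate, x2::MyIterate) = MyIterate(x1.data + x2.data)
Base.:-(x1::MyIterate, x2::MyIterate) = MyIterate(x1.data - x2.data)
LinearAlgebra.dot(x1::MyIterate, x2::MyIterate) = dot(x1.data, x2.data)
LinearAlgebra.norm(x::MyIterate) = norm(x.data)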

Symmetry reduction

Example: examples/reynolds.jl

Suppose that there is a group $G$ acting on the underlying vector space and such that for all $x\in\mathcal{C}$ and $g\in G$

\[f(g\cdot x)=f(x)\quad\text{and}\quad g\cdot x\in\mathcal{C}.\]

Then, the computations can be performed in the subspace invariant under $G$. This subspace is the image of the Reynolds operator defined by

\[\mathcal{R}(x)=\frac{1}{|G|}\sum_{g\in G}g\cdot x.\]

In practice, the type SymmetricLMO allows the user to provide the Reynolds operator $\mathcal{R}$ as well as its adjoint $\mathcal{R}^\ast$. The gradient is symmetrised with $\mathcal{R}^\ast$, then passed to the non-symmetric LMO, and the resulting output is symmetrised with $\mathcal{R}$. In many cases, the gradient is already symmetric so that reynolds_adjoint(gradient, lmo) = gradient is a fast and valid choice.
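
As a hedged sketch (assuming the constructor order SymmetricLMO(lmo, reynolds, reynolds_adjoint) and operator signatures taking (x, lmo); the coordinate-reversal symmetry below is purely illustrative):

using FrankWolfe

# Group {identity, reverse}: the Reynolds operator averages x with reverse(x).
reynolds(x, lmo) = (x + reverse(x)) / 2
# Coordinate-wise averaging is self-adjoint, so the adjoint is the same map.
reynolds_adjoint(gradient, lmo) = (gradient + reverse(gradient)) / 2

sym_lmo = FrankWolfe.SymmetricLMO(FrankWolfe.LpNormLMO{2}(1.0), reynolds, reynolds_adjoint)
v = FrankWolfe.compute_extreme_point(sym_lmo, randn(10))   # symmetrised vertex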


diff --git a/dev/basics/index.html b/dev/basics/index.html index 6e2b7912b..d46719767 100644 --- a/dev/basics/index.html +++ b/dev/basics/index.html @@ -1,2 +1,2 @@ -How does it work? · FrankWolfe.jl

How does it work?

FrankWolfe.jl contains generic routines to solve optimization problems of the form

\[\min_{x \in \mathcal{C}} f(x)\]

where $\mathcal{C}$ is a compact convex set and $f$ is a differentiable function. These routines work by solving a sequence of linear subproblems:

\[\min_{x \in \mathcal{C}} \langle d_k, x \rangle \quad \text{where} \quad d_k = \nabla f(x_k)\]

Linear Minimization Oracles

The Linear Minimization Oracle (LMO) is a key component, which is called at each iteration of the FW algorithm. Given a direction $d$, it returns an optimal vertex of the feasible set:

\[v \in \arg \min_{x\in \mathcal{C}} \langle d,x \rangle.\]

Custom LMOs

To be used by the algorithms provided here, an LMO must be a subtype of FrankWolfe.LinearMinimizationOracle and implement the following method:

compute_extreme_point(lmo::LMO, direction; kwargs...) -> v

This method should minimize $v \mapsto \langle d, v \rangle$ over the set $\mathcal{C}$ defined by the LMO. Note that this means the set $\mathcal{C}$ doesn't have to be represented explicitly: all we need is to be able to minimize a linear function over it, even if the minimization procedure is a black box.
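
For instance, a minimal sketch of an LMO over the hypercube $[-1,1]^n$ (the type name is illustrative):

using FrankWolfe

struct HypercubeLMO <: FrankWolfe.LinearMinimizationOracle end

function FrankWolfe.compute_extreme_point(lmo::HypercubeLMO, direction; kwargs...)
    # minimizing <direction, v> over [-1,1]^n picks -sign(direction[i]) coordinate-wise
    return [d > 0 ? -1.0 : 1.0 for d in direction]
end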

Pre-defined LMOs

If you don't want to define your LMO manually, several common implementations are available out-of-the-box:

  • Simplices: unit simplex, probability simplex
  • Balls in various norms
  • Polytopes: K-sparse, Birkhoff

You can use an oracle defined via a Linear Programming solver (e.g. SCIP or HiGHS) with MathOptInterface: see FrankWolfe.MathOptLMO.

Finally, we provide wrappers to combine oracles easily, for example in a product.

See Combettes, Pokutta (2021) for references on most LMOs implemented in the package and their comparison with projection operators.

Optimization algorithms

The package features several variants of Frank-Wolfe that share the same basic API.

Most of the algorithms listed below also have a lazified version: see Braun, Pokutta, Zink (2016).

Standard Frank-Wolfe (FW)

It is implemented in the frank_wolfe function.

See Jaggi (2013) for an overview.

This algorithm works both for convex and non-convex functions (use step size rule FrankWolfe.Nonconvex() in the second case).
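
As a minimal end-to-end sketch of the shared API (the target point xref and the in-place grad! are illustrative), projecting a point onto the probability simplex:

using FrankWolfe
using LinearAlgebra

n = 100
xref = rand(n)
f(x) = 0.5 * norm(x - xref)^2
grad!(storage, x) = storage .= x .- xref

lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)
x0 = collect(FrankWolfe.compute_extreme_point(lmo, ones(n)))

x, v, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(
    f, grad!, lmo, x0; max_iteration=1000,
)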

Away-step Frank-Wolfe (AFW)

It is implemented in the away_frank_wolfe function.

See Lacoste-Julien, Jaggi (2015) for an overview.

Stochastic Frank-Wolfe (SFW)

It is implemented in the FrankWolfe.stochastic_frank_wolfe function.

Blended Conditional Gradients (BCG)

It is implemented in the blended_conditional_gradient function, with a built-in stability feature that temporarily increases accuracy.

See Braun, Pokutta, Tu, Wright (2018).

Pairwise Frank-Wolfe (PFW)

It is implemented in the pairwise_frank_wolfe function. See Lacoste-Julien, Jaggi (2015) for an overview.

Blended Pairwise Conditional Gradients (BPCG)

It is implemented in the FrankWolfe.blended_pairwise_conditional_gradient function, with a minor modification to improve sparsity.

See Tsuji, Tanaka, Pokutta (2021).

Comparison

The following table compares the characteristics of the algorithms presented in the package:

Algorithm   Progress/Iteration   Time/Iteration   Sparsity   Numerical Stability   Active Set   Lazifiable
FW          Low                  Low              Low        High                  No           Yes
AFW         Medium               Medium-High      Medium     Medium-High           Yes          Yes
B(P)CG      High                 Medium-High      High       Medium                Yes          By design
SFW         Low                  Low              Low        High                  No           No

While the standard Frank-Wolfe algorithm can only move towards extreme points of the compact convex set $\mathcal{C}$, Away-step Frank-Wolfe can move away from them. The following figure from our paper illustrates this behaviour:

FW vs AFW.

Both algorithms minimize a quadratic function (whose contour lines are depicted) over a simple polytope (the black square). When the minimizer lies on a face, the standard Frank-Wolfe algorithm zig-zags towards the solution, while its Away-step variant converges more quickly.

Block-Coordinate Frank-Wolfe (BCFW)

It is implemented in the FrankWolfe.block_coordinate_frank_wolfe function.

See Lacoste-Julien, Jaggi, Schmidt, Pletscher (2013) and Beck, Pauwels, Sabach (2015) for more details about different variants of Block-Coordinate Frank-Wolfe.

Alternating Linear Minimization (ALM)

It is implemented in the FrankWolfe.alternating_linear_minimization function.

diff --git a/dev/contributing/index.html b/dev/contributing/index.html index 09838d002..34ed440d4 100644 --- a/dev/contributing/index.html +++ b/dev/contributing/index.html @@ -4,4 +4,4 @@ """ function f(x) # ... -end

Provide a new example or test

If you fix a bug, one would typically expect to add a test that validates that the bug is gone. A test would be added in a file in the test/ folder, for which the entry point is runtests.jl.

The examples/ folder features several examples covering different problem settings and algorithms. The examples are expected to run with the same environment and dependencies as the tests using TestEnv. If the example is lightweight enough, it can be added to the docs/src/examples/ folder which generates pages for the documentation based on Literate.jl.

Provide a new feature

Contributions bringing new features are also welcome. If the feature is likely to impact performance, some benchmarks should be run with BenchmarkTools on several of the examples to assess the effect at different problem sizes. If the feature should only be active in some cases, a keyword should be added to the main algorithms to support it.

Some typical features to implement are:

  1. A new Linear Minimization Oracle (LMO)
  2. A new step size
  3. A new algorithm (less frequent) following the same API.

Code style

We try to follow the Julia documentation guidelines. We run JuliaFormatter.jl on the repo in the way set in the .JuliaFormatter.toml file, which enforces a number of conventions.

This contribution guide was inspired by ColPrac and the one in Manopt.jl.


diff --git a/dev/examples/docs_00_fw_visualized/index.html b/dev/examples/docs_00_fw_visualized/index.html index c9b1ccf4a..2b1b9d710 100644 --- a/dev/examples/docs_00_fw_visualized/index.html +++ b/dev/examples/docs_00_fw_visualized/index.html @@ -1,5 +1,5 @@ -Visualization of Frank-Wolfe running on a 2-dimensional polytope · FrankWolfe.jl
Visualization of Frank-Wolfe running on a 2-dimensional polytope

This example provides an intuitive view of the Frank-Wolfe algorithm by running it on a polyhedral set with a quadratic function. The Linear Minimization Oracle (LMO) corresponds to a call to a generic simplex solver from MathOptInterface.jl (MOI).

Import and setup

We first import the necessary packages, including Polyhedra to visualize the feasible set.

using LinearAlgebra
 using FrankWolfe
 
 import MathOptInterface
@@ -122,119 +122,119 @@
 )

plot chosen vertices

scatter!([vertices[1][1]], [vertices[1][2]], m=:diamond, markersize=6, color=colors[1], label="v_1")
 scatter!(
     [vertices[2][1]],
@@ -248,121 +248,121 @@
 )

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_01_mathopt_lmo/index.html b/dev/examples/docs_01_mathopt_lmo/index.html index 83d2135ff..b6641ed91 100644 --- a/dev/examples/docs_01_mathopt_lmo/index.html +++ b/dev/examples/docs_01_mathopt_lmo/index.html @@ -1,5 +1,5 @@ -Comparison with MathOptInterface on a Probability Simplex · FrankWolfe.jl
Comparison with MathOptInterface on a Probability Simplex

In this example, we project a random point onto a probability simplex with the Frank-Wolfe algorithm using either the specialized LMO defined in the package or a generic LP formulation using MathOptInterface.jl (MOI) and GLPK as underlying LP solver. It can be found as Example 4.4 in the paper.
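
Before the example's own code, here is a hedged sketch of how the two oracles can be built (standard MOI calls; GLPK assumed available):

using FrankWolfe
import MathOptInterface as MOI
import GLPK

n = 100
o = GLPK.Optimizer()
x = MOI.add_variables(o, n)
for xi in x
    MOI.add_constraint(o, xi, MOI.GreaterThan(0.0))          # x_i >= 0
end
MOI.add_constraint(
    o,
    MOI.ScalarAffineFunction(MOI.ScalarAffineTerm.(1.0, x), 0.0),
    MOI.EqualTo(1.0),                                        # sum(x) == 1
)
lmo_moi = FrankWolfe.MathOptLMO(o)

lmo_specialized = FrankWolfe.ProbabilitySimplexOracle(1.0)   # built-in alternative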

using FrankWolfe
 
 using LinearAlgebra
 using LaTeXStrings
@@ -130,191 +130,191 @@
 )

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_02_polynomial_regression/index.html b/dev/examples/docs_02_polynomial_regression/index.html index 851a749b8..d4cd1b883 100644 --- a/dev/examples/docs_02_polynomial_regression/index.html +++ b/dev/examples/docs_02_polynomial_regression/index.html @@ -1,5 +1,5 @@ -Polynomial Regression · FrankWolfe.jl
Polynomial Regression

The following example features the LMO for polynomial regression on the $\ell_1$ norm ball. Given input/output pairs $\{x_i,y_i\}_{i=1}^N$ and sparse coefficients $c_j$, where

\[y_i=\sum_{j=1}^m c_j f_j(x_i)\]

and $f_j: \mathbb{R}^n\to\mathbb{R}$, the task is to recover those $c_j$ that are non-zero alongside their corresponding values. Under certain assumptions, this problem can be convexified into

\[\min_{c\in\mathcal{C}}||y-Ac||^2\]

for a convex set $\mathcal{C}$. It can also be found as example 4.1 in the paper. In order to evaluate the polynomial, we generate a total of 1000 data points $\{x_i\}_{i=1}^N$ from the standard multivariate Gaussian, with which we will compute the output variables $\{y_i\}_{i=1}^N$. Before evaluating the polynomial, these points will be contaminated with noise drawn from a standard multivariate Gaussian. We run the away_frank_wolfe and blended_conditional_gradient algorithms, and compare them to Projected Gradient Descent using a smoothness estimate. We will evaluate the output solution on test points drawn in a similar manner as the training points.
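
The feasible region here is an $\ell_1$ norm ball; as a hedged aside, its LMO can be exercised on its own (radius and dimension are illustrative):

using FrankWolfe

lmo = FrankWolfe.LpNormLMO{1}(1.0)
direction = randn(15)
v = FrankWolfe.compute_extreme_point(lmo, direction)
# v has a single nonzero entry, -radius * sign(direction[i]) at i = argmax |direction|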

using FrankWolfe
 
 using LinearAlgebra
 import Random
@@ -246,260 +246,260 @@
 )

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_03_matrix_completion/index.html b/dev/examples/docs_03_matrix_completion/index.html index f7e5234b1..14c091426 100644 --- a/dev/examples/docs_03_matrix_completion/index.html +++ b/dev/examples/docs_03_matrix_completion/index.html @@ -1,5 +1,5 @@ -Matrix Completion · FrankWolfe.jl
Matrix Completion

We present another example about matrix completion. The idea is, given a partially observed matrix $Y\in\mathbb{R}^{m\times n}$, to find $X\in\mathbb{R}^{m\times n}$ minimizing the sum of squared errors on the observed entries while 'completing' the matrix $Y$, i.e. filling the unobserved entries to match $Y$ as well as possible. A detailed explanation can be found in section 4.2 of the paper. We will try to solve

\[\min_{||X||_*\le \tau} \sum_{(i,j)\in\mathcal{I}} (X_{i,j}-Y_{i,j})^2,\]

where $\tau>0$, $||X||_*$ is the nuclear norm, and $\mathcal{I}$ denotes the indices of the observed entries. We will use FrankWolfe.NuclearNormLMO and compare our Frank-Wolfe implementation with a Projected Gradient Descent (PGD) algorithm which, after each gradient descent step, projects the iterates back onto the nuclear norm ball. We use a movielens dataset for comparison.
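
As a hedged aside, the nuclear-norm oracle can be exercised on its own (dimensions and radius are illustrative):

using FrankWolfe
using LinearAlgebra

τ = 10.0
lmo = FrankWolfe.NuclearNormLMO(τ)
G = randn(50, 40)                      # stand-in for a gradient matrix
V = FrankWolfe.compute_extreme_point(lmo, G)
# V is the rank-one matrix -τ u v' built from the top singular pair of G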

using FrankWolfe
 using ZipFile, DataFrames, CSV
 
 using Random
@@ -265,244 +265,248 @@
 )

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_04_rational_opt/index.html b/dev/examples/docs_04_rational_opt/index.html index cc38fc5ff..7c9ec1072 100644 --- a/dev/examples/docs_04_rational_opt/index.html +++ b/dev/examples/docs_04_rational_opt/index.html @@ -1,5 +1,5 @@ -Exact Optimization with Rational Arithmetic · FrankWolfe.jl
Exact Optimization with Rational Arithmetic

This example can be found in section 4.3 in the paper. The package allows for exact optimization with rational arithmetic. For this, it suffices to set up the LMO to be rational and choose an appropriate step-size rule as detailed below. For the LMOs included in the package, this simply means initializing the radius with a rational-compatible element type, e.g., 1, rather than a floating-point number, e.g., 1.0. Given that numerators and denominators can become quite large in rational arithmetic, it is strongly advised to base the used rationals on extended-precision integer types such as BigInt, i.e., we use Rational{BigInt}.

The second requirement ensuring that the computation runs in rational arithmetic is a rational-compatible step-size rule. The most basic step-size rule compatible with rational optimization is the agnostic step-size rule with $\gamma_t = 2/(2 + t)$. With this step-size rule, the gradient does not even need to be rational as long as the atom computed by the LMO is of a rational type. Assuming these requirements are met, all iterates and the computed solution will then be rational.
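
A hedged sketch of this setup, mirroring the code below (the element type is passed as a type parameter, and the integer radius 1 keeps all vertices rational):

using FrankWolfe

n = 100
lmo = FrankWolfe.ProbabilitySimplexOracle{Rational{BigInt}}(1)
x0 = collect(FrankWolfe.compute_extreme_point(lmo, zeros(n)))
eltype(x0)   # Rational{BigInt}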

using FrankWolfe
 using LinearAlgebra
 
 n = 100
@@ -34,17 +34,17 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   1.000000e+00  -1.000000e+00   2.000000e+00   0.000000e+00            Inf
-    FW            10   1.407407e-01  -1.407407e-01   2.814815e-01   4.453731e-01   2.245308e+01
-    FW            20   6.842105e-02  -6.842105e-02   1.368421e-01   4.466507e-01   4.477772e+01
-    FW            30   4.521073e-02  -4.521073e-02   9.042146e-02   4.483771e-01   6.690797e+01
-    FW            40   3.376068e-02  -3.376068e-02   6.752137e-02   4.497198e-01   8.894427e+01
-    FW            50   2.693878e-02  -2.693878e-02   5.387755e-02   4.511299e-01   1.108328e+02
-    FW            60   2.241055e-02  -2.241055e-02   4.482109e-02   4.526809e-01   1.325437e+02
-    FW            70   1.918565e-02  -1.918565e-02   3.837129e-02   4.543722e-01   1.540587e+02
-    FW            80   1.677215e-02  -1.677215e-02   3.354430e-02   4.560899e-01   1.754040e+02
-    FW            90   1.489804e-02  -1.489804e-02   2.979609e-02   4.579065e-01   1.965467e+02
-    FW           100   1.340067e-02  -1.340067e-02   2.680135e-02   4.598284e-01   2.174725e+02
-  Last           101   1.314422e-02  -1.236767e-02   2.551189e-02   4.605268e-01   2.193141e+02
+    FW            10   1.407407e-01  -1.407407e-01   2.814815e-01   4.720291e-01   2.118513e+01
+    FW            20   6.842105e-02  -6.842105e-02   1.368421e-01   4.736166e-01   4.222825e+01
+    FW            30   4.521073e-02  -4.521073e-02   9.042146e-02   4.750587e-01   6.315009e+01
+    FW            40   3.376068e-02  -3.376068e-02   6.752137e-02   4.766075e-01   8.392650e+01
+    FW            50   2.693878e-02  -2.693878e-02   5.387755e-02   4.782406e-01   1.045499e+02
+    FW            60   2.241055e-02  -2.241055e-02   4.482109e-02   4.800899e-01   1.249766e+02
+    FW            70   1.918565e-02  -1.918565e-02   3.837129e-02   4.822360e-01   1.451571e+02
+    FW            80   1.677215e-02  -1.677215e-02   3.354430e-02   4.845011e-01   1.651183e+02
+    FW            90   1.489804e-02  -1.489804e-02   2.979609e-02   4.870107e-01   1.848009e+02
+    FW           100   1.340067e-02  -1.340067e-02   2.680135e-02   4.897315e-01   2.041935e+02
+  Last           101   1.314422e-02  -1.236767e-02   2.551189e-02   4.906565e-01   2.058467e+02
 -------------------------------------------------------------------------------------------------
 
 Output type of solution: BigFloat

Another possible step-size rule is rationalshortstep which computes the step size by minimizing the smoothness inequality as $\gamma_t=\frac{\langle \nabla f(x_t),x_t-v_t\rangle}{2L||x_t-v_t||^2}$. However, as this step size depends on an upper bound on the Lipschitz constant $L$ as well as the inner product with the gradient $\nabla f(x_t)$, both have to be of a rational type.

@time x, v, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(
@@ -67,16 +67,16 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   1.000000e+00  -1.000000e+00   2.000000e+00   0.000000e+00            Inf
-    FW            10   1.000000e-01  -1.000000e-01   2.000000e-01   3.931198e-01   2.543754e+01
-    FW            20   5.000000e-02  -5.000000e-02   1.000000e-01   4.484986e-01   4.459323e+01
-    FW            30   3.333333e-02  -3.333333e-02   6.666667e-02   4.497968e-01   6.669679e+01
-    FW            40   2.500000e-02  -2.500000e-02   5.000000e-02   4.509931e-01   8.869316e+01
-    FW            50   2.000000e-02  -2.000000e-02   4.000000e-02   4.523795e-01   1.105267e+02
-    FW            60   1.666667e-02  -1.666667e-02   3.333333e-02   4.539741e-01   1.321661e+02
-    FW            70   1.428571e-02  -1.428571e-02   2.857143e-02   4.557243e-01   1.536016e+02
-    FW            80   1.250000e-02  -1.250000e-02   2.500000e-02   4.576830e-01   1.747935e+02
-    FW            90   1.111111e-02  -1.111111e-02   2.222222e-02   4.598284e-01   1.957252e+02
-    FW           100   1.000000e-02   1.000000e-02   1.889162e-78   4.621865e-01   2.163629e+02
-  Last           100   1.000000e-02   1.000000e-02   2.159042e-78   4.627391e-01   2.161045e+02
+    FW            10   1.000000e-01  -1.000000e-01   2.000000e-01   3.964277e-01   2.522528e+01
+    FW            20   5.000000e-02  -5.000000e-02   1.000000e-01   3.978978e-01   5.026417e+01
+    FW            30   3.333333e-02  -3.333333e-02   6.666667e-02   3.993298e-01   7.512588e+01
+    FW            40   2.500000e-02  -2.500000e-02   5.000000e-02   4.010297e-01   9.974323e+01
+    FW            50   2.000000e-02  -2.000000e-02   4.000000e-02   4.030473e-01   1.240549e+02
+    FW            60   1.666667e-02  -1.666667e-02   3.333333e-02   4.053454e-01   1.480219e+02
+    FW            70   1.428571e-02  -1.428571e-02   2.857143e-02   4.078982e-01   1.716114e+02
+    FW            80   1.250000e-02  -1.250000e-02   2.500000e-02   4.107034e-01   1.947878e+02
+    FW            90   1.111111e-02  -1.111111e-02   2.222222e-02   4.138139e-01   2.174891e+02
+    FW           100   1.000000e-02   1.000000e-02   1.889162e-78   4.172943e-01   2.396390e+02
+  Last           100   1.000000e-02   1.000000e-02   2.159042e-78   4.180297e-01   2.392175e+02
 -------------------------------------------------------------------------------------------------
-  0.722508 seconds (1.65 M allocations: 92.761 MiB, 6.10% gc time, 1.46% compilation time)

+ 0.674124 seconds (1.65 M allocations: 92.974 MiB, 1.54% compilation time)

Note: at the last step, we exactly close the gap, finding the solution 1//n * ones(n)


This page was generated using Literate.jl.

diff --git a/dev/examples/docs_05_blended_cg/index.html b/dev/examples/docs_05_blended_cg/index.html index 3ee5f2199..c34875053 100644 --- a/dev/examples/docs_05_blended_cg/index.html +++ b/dev/examples/docs_05_blended_cg/index.html @@ -1,5 +1,5 @@ -Blended Conditional Gradients · FrankWolfe.jl
Blended Conditional Gradients

The FW and AFW algorithms and their lazy variants share one feature: they attempt to make primal progress over a reduced set of vertices. The AFW algorithm does this through away steps (which do not increase the cardinality of the active set), and the lazy variants do this through the use of previously exploited vertices. A third strategy is to explicitly blend Frank-Wolfe steps with gradient descent steps over the convex hull of the active set (note that this can be done without requiring a projection oracle over $C$, thus making the algorithm projection-free). This results in the Blended Conditional Gradient (BCG) algorithm, which attempts to make as much progress as possible through the convex hull of the current active set $S_t$ until it automatically detects that, in order to make further progress, it requires additional calls to the LMO.

See also Blended Conditional Gradients: the unconditioning of conditional gradients, Braun et al, 2019, https://arxiv.org/abs/1805.07311
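
A hedged sketch of the call, using the same positional arguments as the other algorithms (problem data are illustrative; the return value is left undestructured here):

using FrankWolfe
using LinearAlgebra

n = 1000
xref = rand(n)
f(x) = 0.5 * norm(x - xref)^2
grad!(storage, x) = storage .= x .- xref

lmo = FrankWolfe.KSparseLMO(10, 1.0)
x0 = collect(FrankWolfe.compute_extreme_point(lmo, randn(n)))

result = FrankWolfe.blended_conditional_gradient(f, grad!, lmo, x0; max_iteration=1000)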

using FrankWolfe
 using LinearAlgebra
 using Random
 using SparseArrays
@@ -154,104 +154,104 @@
 plot_trajectories(data, label, xscalelog=true)

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_06_spectrahedron/index.html b/dev/examples/docs_06_spectrahedron/index.html index 87f7e54c6..9473199fd 100644 --- a/dev/examples/docs_06_spectrahedron/index.html +++ b/dev/examples/docs_06_spectrahedron/index.html @@ -1,5 +1,5 @@ -Spectrahedron · FrankWolfe.jl
Spectrahedron

This example shows an optimization problem over the spectraplex:

\[S = \{X \in \mathbb{S}_+^n, Tr(X) = 1\}\]

with $\mathbb{S}_+^n$ the set of positive semidefinite matrices. Linear optimization with symmetric objective $D$ over the spectraplex consists in computing the leading eigenvector of $D$.

The package also exposes UnitSpectrahedronLMO which corresponds to the feasible set:

\[S_u = \{X \in \mathbb{S}_+^n, Tr(X) \leq 1\}\]
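
A hedged sketch of linear minimization over the spectraplex (assuming the constructor SpectraplexLMO(radius, side_dimension)):

using FrankWolfe
using LinearAlgebra

n = 100
lmo = FrankWolfe.SpectraplexLMO(1.0, n)
D = Matrix(Symmetric(randn(n, n)))
V = FrankWolfe.compute_extreme_point(lmo, D)
# V is a unit-trace, rank-one matrix built from an extreme eigenvector of D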

using FrankWolfe
 using LinearAlgebra
 using Random
 using SparseArrays

The objective function will be the symmetric squared distance to a set of known or observed entries $Y_{ij}$ of the matrix.

\[f(X) = \sum_{(i,j) \in L} 1/2 (X_{ij} - Y_{ij})^2\]

Setting up the input data, objective, and gradient

Dimension, number of iterations and number of known entries:

n = 1500
@@ -67,7 +67,7 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   1.018597e+00   1.014119e+00   4.477824e-03   0.000000e+00            Inf
-  Last            25   1.014314e+00   1.014314e+00   9.179324e-09   1.013652e+00   2.466330e+01
+  Last            25   1.014314e+00   1.014314e+00   9.179324e-09   1.108118e+00   2.256078e+01
 -------------------------------------------------------------------------------------------------
 
 Lazified Conditional Gradient (Frank-Wolfe + Lazification).
@@ -80,113 +80,113 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     Cache Size
 ----------------------------------------------------------------------------------------------------------------
      I             1   1.018597e+00   1.014119e+00   4.477824e-03   0.000000e+00            Inf              1
-    LD             2   1.014317e+00   1.014314e+00   3.630596e-06   6.061515e-01   3.299505e+00              2
-    LD             3   1.014315e+00   1.014314e+00   1.025225e-06   6.420375e-01   4.672624e+00              3
-    LD             4   1.014315e+00   1.014314e+00   5.032060e-07   6.794439e-01   5.887168e+00              4
-    LD             6   1.014314e+00   1.014314e+00   1.996252e-07   7.286946e-01   8.233902e+00              5
-    LD             9   1.014314e+00   1.014314e+00   8.299030e-08   7.928383e-01   1.135162e+01              6
-    LD            13   1.014314e+00   1.014314e+00   3.827847e-08   8.716304e-01   1.491458e+01              7
-    LD            19   1.014314e+00   1.014314e+00   1.745621e-08   9.773692e-01   1.943994e+01              8
-    LD            27   1.014314e+00   1.014314e+00   8.503621e-09   1.111994e+00   2.428071e+01              9
-  Last            27   1.014314e+00   1.014314e+00   7.896182e-09   1.187729e+00   2.273246e+01             10
+    LD             2   1.014317e+00   1.014314e+00   3.630596e-06   6.148183e-01   3.252993e+00              2
+    LD             3   1.014315e+00   1.014314e+00   1.025225e-06   6.542644e-01   4.585302e+00              3
+    LD             4   1.014315e+00   1.014314e+00   5.032060e-07   6.959614e-01   5.747445e+00              4
+    LD             6   1.014314e+00   1.014314e+00   1.996252e-07   7.842583e-01   7.650541e+00              5
+    LD             9   1.014314e+00   1.014314e+00   8.299030e-08   8.542761e-01   1.053524e+01              6
+    LD            13   1.014314e+00   1.014314e+00   3.827847e-08   9.387335e-01   1.384845e+01              7
+    LD            19   1.014314e+00   1.014314e+00   1.745621e-08   1.052248e+00   1.805657e+01              8
+    LD            27   1.014314e+00   1.014314e+00   8.503621e-09   1.196635e+00   2.256326e+01              9
+  Last            27   1.014314e+00   1.014314e+00   7.896182e-09   1.282051e+00   2.106001e+01             10
 ----------------------------------------------------------------------------------------------------------------

Plotting the resulting trajectories

data = [trajectory, trajectory_lazy]
 label = ["FW", "LCG"]
 plot_trajectories(data, label, xscalelog=true)

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_07_shifted_norm_polytopes/index.html b/dev/examples/docs_07_shifted_norm_polytopes/index.html index 845cebe91..1e24ba478 100644 --- a/dev/examples/docs_07_shifted_norm_polytopes/index.html +++ b/dev/examples/docs_07_shifted_norm_polytopes/index.html @@ -67,27 +67,27 @@ Type Iteration Primal Dual Dual Gap Time It/sec ------------------------------------------------------------------------------------------------- I 1 2.000000e+00 -6.000000e+00 8.000000e+00 0.000000e+00 Inf - FW 50 2.198243e-01 1.859119e-01 3.391239e-02 9.865068e-02 5.068389e+02 - FW 100 2.104540e-01 1.927834e-01 1.767061e-02 9.893093e-02 1.010806e+03 - FW 150 2.071345e-01 1.951277e-01 1.200679e-02 9.920419e-02 1.512033e+03 - FW 200 2.054240e-01 1.963167e-01 9.107240e-03 9.947129e-02 2.010630e+03 - FW 250 2.043783e-01 1.970372e-01 7.341168e-03 9.974048e-02 2.506505e+03 - FW 300 2.036722e-01 1.975209e-01 6.151268e-03 1.000086e-01 2.999743e+03 - FW 350 2.031630e-01 1.978684e-01 5.294582e-03 1.002762e-01 3.490360e+03 - FW 400 2.027782e-01 1.981301e-01 4.648079e-03 1.005385e-01 3.978574e+03 - FW 450 2.024772e-01 1.983344e-01 4.142727e-03 1.008055e-01 4.464041e+03 - FW 500 2.022352e-01 1.984984e-01 3.736776e-03 1.010708e-01 4.947025e+03 - FW 550 2.020364e-01 1.986329e-01 3.403479e-03 1.013376e-01 5.427403e+03 - FW 600 2.018701e-01 1.987452e-01 3.124906e-03 1.016105e-01 5.904901e+03 - FW 650 2.017290e-01 1.988404e-01 2.888583e-03 1.018779e-01 6.380189e+03 - FW 700 2.016078e-01 1.989222e-01 2.685564e-03 1.021422e-01 6.853194e+03 - FW 750 2.015024e-01 1.989932e-01 2.509264e-03 1.024600e-01 7.319929e+03 - FW 800 2.014101e-01 1.990554e-01 2.354727e-03 1.027363e-01 7.786926e+03 - FW 850 2.013284e-01 1.991103e-01 2.218154e-03 1.030261e-01 8.250340e+03 - FW 900 2.012558e-01 1.991592e-01 2.096580e-03 1.032962e-01 8.712806e+03 - FW 950 2.011906e-01 1.992030e-01 1.987662e-03 1.035614e-01 9.173300e+03 - FW 1000 2.011319e-01 1.992424e-01 1.889519e-03 1.038272e-01 9.631392e+03 - Last 1001 2.011297e-01 1.992439e-01 1.885794e-03 1.039916e-01 9.625777e+03 + FW 50 2.198243e-01 1.859119e-01 3.391239e-02 1.028485e-01 4.861519e+02 + FW 100 2.104540e-01 1.927834e-01 1.767061e-02 1.031254e-01 9.696932e+02 + FW 150 2.071345e-01 1.951277e-01 1.200679e-02 1.033977e-01 1.450710e+03 + FW 200 2.054240e-01 1.963167e-01 9.107240e-03 1.036635e-01 1.929319e+03 + FW 250 2.043783e-01 1.970372e-01 7.341168e-03 1.039676e-01 2.404596e+03 + FW 300 2.036722e-01 1.975209e-01 6.151268e-03 1.042331e-01 2.878165e+03 + FW 350 2.031630e-01 1.978684e-01 5.294582e-03 1.045052e-01 3.349116e+03 + FW 400 2.027782e-01 1.981301e-01 4.648079e-03 1.047674e-01 3.817981e+03 + FW 450 2.024772e-01 1.983344e-01 4.142727e-03 1.050300e-01 4.284489e+03 + FW 500 2.022352e-01 1.984984e-01 3.736776e-03 1.052915e-01 4.748719e+03 + FW 550 2.020364e-01 1.986329e-01 3.403479e-03 1.055575e-01 5.210429e+03 + FW 600 2.018701e-01 1.987452e-01 3.124906e-03 1.058207e-01 5.669968e+03 + FW 650 2.017290e-01 1.988404e-01 2.888583e-03 1.060800e-01 6.127453e+03 + FW 700 2.016078e-01 1.989222e-01 2.685564e-03 1.063455e-01 6.582317e+03 + FW 750 2.015024e-01 1.989932e-01 2.509264e-03 1.066088e-01 7.035068e+03 + FW 800 2.014101e-01 1.990554e-01 2.354727e-03 1.068682e-01 7.485855e+03 + FW 850 2.013284e-01 1.991103e-01 2.218154e-03 1.071306e-01 7.934241e+03 + FW 900 2.012558e-01 1.991592e-01 2.096580e-03 1.073968e-01 8.380135e+03 + FW 950 2.011906e-01 1.992030e-01 1.987662e-03 1.076581e-01 8.824233e+03 + FW 1000 2.011319e-01 1.992424e-01 1.889519e-03 1.079464e-01 9.263857e+03 + Last 
1001 2.011297e-01 1.992439e-01 1.885794e-03 1.081079e-01 9.259267e+03 ------------------------------------------------------------------------------------------------- Final solution: [1.799813188674937, 0.5986834801090863] @@ -102,27 +102,27 @@ Type Iteration Primal Dual Dual Gap Time It/sec ------------------------------------------------------------------------------------------------- I 1 1.300000e+01 -1.900000e+01 3.200000e+01 0.000000e+00 Inf - FW 50 1.084340e-02 -7.590380e-02 8.674720e-02 5.425781e-02 9.215264e+02 - FW 100 5.509857e-03 -3.856900e-02 4.407886e-02 5.454183e-02 1.833455e+03 - FW 150 3.695414e-03 -2.586790e-02 2.956331e-02 5.481783e-02 2.736336e+03 - FW 200 2.780453e-03 -1.946317e-02 2.224362e-02 5.508845e-02 3.630525e+03 - FW 250 2.228830e-03 -1.560181e-02 1.783064e-02 5.535878e-02 4.515995e+03 - FW 300 1.859926e-03 -1.301948e-02 1.487941e-02 5.562963e-02 5.392810e+03 - FW 350 1.595838e-03 -1.117087e-02 1.276670e-02 5.589723e-02 6.261491e+03 - FW 400 1.397443e-03 -9.782098e-03 1.117954e-02 5.616310e-02 7.122114e+03 - FW 450 1.242935e-03 -8.700548e-03 9.943483e-03 5.642736e-02 7.974854e+03 - FW 500 1.119201e-03 -7.834409e-03 8.953610e-03 5.669435e-02 8.819221e+03 - FW 550 1.017878e-03 -7.125146e-03 8.143024e-03 5.696270e-02 9.655441e+03 - FW 600 9.333816e-04 -6.533671e-03 7.467053e-03 5.722630e-02 1.048469e+04 - FW 650 8.618413e-04 -6.032889e-03 6.894730e-03 5.749204e-02 1.130591e+04 - FW 700 8.004890e-04 -5.603423e-03 6.403912e-03 5.778519e-02 1.211383e+04 - FW 750 7.472928e-04 -5.231050e-03 5.978342e-03 5.805527e-02 1.291872e+04 - FW 800 7.007275e-04 -4.905093e-03 5.605820e-03 5.832290e-02 1.371674e+04 - FW 850 6.596259e-04 -4.617381e-03 5.277007e-03 5.859207e-02 1.450708e+04 - FW 900 6.230796e-04 -4.361557e-03 4.984637e-03 5.885699e-02 1.529130e+04 - FW 950 5.903710e-04 -4.132597e-03 4.722968e-03 5.912149e-02 1.606861e+04 - FW 1000 5.609256e-04 -3.926479e-03 4.487405e-03 5.938899e-02 1.683814e+04 - Last 1001 5.598088e-04 -3.918661e-03 4.478470e-03 5.954127e-02 1.681187e+04 + FW 50 1.084340e-02 -7.590380e-02 8.674720e-02 5.695778e-02 8.778432e+02 + FW 100 5.509857e-03 -3.856900e-02 4.407886e-02 5.722516e-02 1.747483e+03 + FW 150 3.695414e-03 -2.586790e-02 2.956331e-02 5.748523e-02 2.609366e+03 + FW 200 2.780453e-03 -1.946317e-02 2.224362e-02 5.776724e-02 3.462170e+03 + FW 250 2.228830e-03 -1.560181e-02 1.783064e-02 5.803489e-02 4.307753e+03 + FW 300 1.859926e-03 -1.301948e-02 1.487941e-02 5.829398e-02 5.146329e+03 + FW 350 1.595838e-03 -1.117087e-02 1.276670e-02 5.855056e-02 5.977739e+03 + FW 400 1.397443e-03 -9.782098e-03 1.117954e-02 5.880511e-02 6.802129e+03 + FW 450 1.242935e-03 -8.700548e-03 9.943483e-03 5.906019e-02 7.619346e+03 + FW 500 1.119201e-03 -7.834409e-03 8.953610e-03 5.931522e-02 8.429539e+03 + FW 550 1.017878e-03 -7.125146e-03 8.143024e-03 5.957357e-02 9.232282e+03 + FW 600 9.333816e-04 -6.533671e-03 7.467053e-03 5.982625e-02 1.002904e+04 + FW 650 8.618413e-04 -6.032889e-03 6.894730e-03 6.008090e-02 1.081875e+04 + FW 700 8.004890e-04 -5.603423e-03 6.403912e-03 6.033251e-02 1.160237e+04 + FW 750 7.472928e-04 -5.231050e-03 5.978342e-03 6.058825e-02 1.237864e+04 + FW 800 7.007275e-04 -4.905093e-03 5.605820e-03 6.084195e-02 1.314882e+04 + FW 850 6.596259e-04 -4.617381e-03 5.277007e-03 6.109509e-02 1.391274e+04 + FW 900 6.230796e-04 -4.361557e-03 4.984637e-03 6.134930e-02 1.467009e+04 + FW 950 5.903710e-04 -4.132597e-03 4.722968e-03 6.160092e-02 1.542185e+04 + FW 1000 5.609256e-04 -3.926479e-03 4.487405e-03 6.187493e-02 1.616163e+04 + Last 1001 
5.598088e-04 -3.918661e-03 4.478470e-03 6.202507e-02 1.613864e+04 ------------------------------------------------------------------------------------------------- Final solution: [2.0005598087769556, 0.9763463450796975]

We plot the polytopes alongside the solutions from above:

xcoord1 = [1, 3, 1, -1, 1]
@@ -158,53 +158,53 @@
 )

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_08_callback_and_tracking/index.html b/dev/examples/docs_08_callback_and_tracking/index.html index 1f48bf7d0..29d9dcfeb 100644 --- a/dev/examples/docs_08_callback_and_tracking/index.html +++ b/dev/examples/docs_08_callback_and_tracking/index.html @@ -1,5 +1,5 @@ -Tracking, counters and custom callbacks for Frank Wolfe · FrankWolfe.jl
Tracking, counters and custom callbacks for Frank Wolfe

In this example we will run the standard Frank-Wolfe algorithm while tracking the number of calls to the different oracles, namely function, gradient evaluations, and LMO calls. In order to track each of these metrics, "Tracking" versions of the objective, gradient, and LMO have to be supplied to the frank_wolfe algorithm, each wrapping a standard one.

using FrankWolfe
 using Test
 using LinearAlgebra
 using FrankWolfe: ActiveSet

The trackers for primal objective, gradient and LMO.

In order to count the number of function calls, a TrackingObjective is built from a standard objective function f, which will act in the same way as the original function does, but with an additional .counter field which tracks the number of calls.

f(x) = norm(x)^2
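
For instance, a hedged sketch of the wrapper in isolation (assuming it is callable like f and exposes a .counter field):

using FrankWolfe
using LinearAlgebra

f(x) = norm(x)^2
tf = FrankWolfe.TrackingObjective(f)
tf(rand(3))     # evaluates like f
tf.counter      # one recorded call
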
@@ -79,4 +79,4 @@
 total_iterations = 500
 tf.counter = 501
 tgrad!.counter = 501
tlmo_prob.counter = 13

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_09_extra_vertex_storage/index.html b/dev/examples/docs_09_extra_vertex_storage/index.html index b729d8c6f..0351e515d 100644 --- a/dev/examples/docs_09_extra_vertex_storage/index.html +++ b/dev/examples/docs_09_extra_vertex_storage/index.html @@ -1,5 +1,5 @@ -Extra-lazification · FrankWolfe.jl
Extra-lazification

Sometimes the Frank-Wolfe algorithm will be run multiple times with slightly different settings under which vertices collected in a previous run are still valid.

The extra-lazification feature can be used for this purpose. It consists of a storage that can collect dropped vertices during a run, and the ability to use these vertices in another run, when they are not part of the current active set. The vertices that are part of the active set do not need to be duplicated in the extra-lazification storage. The extra-vertices can be used instead of calling the LMO when it is a relatively expensive operation.
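
A hedged sketch of wiring up the storage; the constructor arguments and keyword names follow this example's source and are assumptions here:

using FrankWolfe

n = 100
lmo = FrankWolfe.KSparseLMO(5, 1.0)
x0 = collect(FrankWolfe.compute_extreme_point(lmo, ones(n)))
storage = FrankWolfe.DeletedVertexStorage(typeof(x0)[], 1)

# dropped vertices are collected in one run and reused in the next:
# FrankWolfe.blended_pairwise_conditional_gradient(
#     f, grad!, lmo, x0;
#     add_dropped_vertices=true,
#     extra_vertex_storage=storage,
#     use_extra_vertex_storage=true,
# )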

using FrankWolfe
 using Test
 using LinearAlgebra

We will use a parameterized objective function $1/2 \|x - c\|^2$ over the unit simplex.

const n = 100
 const center0 = 5.0 .+ 3 * rand(n)
@@ -66,4 +66,4 @@
 [ Info: Number of LMO calls in iter 9: 17
 [ Info: Vertex storage size: 77
 [ Info: Number of LMO calls in iter 10: 16
[ Info: Vertex storage size: 82

This page was generated using Literate.jl.



Alternating methods

In this example we will compare FrankWolfe.alternating_linear_minimization and FrankWolfe.alternating_projections for a very simple feasibility problem.

We consider the probability simplex

\[P = \{ x \in \mathbb{R}^n \colon \sum_{i=1}^n x_i = 1, x_i \geq 0 ~~ i=1,\dots,n\} ~.\]

and a scaled, shifted $\ell^{\infty}$ norm ball

\[Q = [-1,0]^n ~.\]

The goal is to find a point that lies in both $P$ and $Q$. We do this by first reformulating the problem. Instead of finding a point in the intersection $P \cap Q$, we search for a pair of points $(x_P, x_Q)$ in the Cartesian product $P \times Q$ attaining the minimal distance between $P$ and $Q$,

\[\|x_P - x_Q\|_2 = \min_{(x,y) \in P \times Q} \|x - y \|_2 ~.\]

using FrankWolfe
include("../examples/plot_utils.jl")

Setting up objective, gradient and linear minimization oracles

Alternating Linear Minimization (ALM) allows for an additional objective, so that one can optimize over an intersection of sets instead of only finding feasible points. Since this example only considers feasibility, we set both the objective function and the gradient to zero; the full setup and the first run are sketched below.

n = 20
 
 f(x) = 0
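The rest of the setup and the first run are elided in the diff. A plausible sketch, assuming FrankWolfe.ProbabilitySimplexOracle and FrankWolfe.ScaledBoundLInfNormBall for the two sets:

function grad!(storage, x)
    # zero gradient; storage is assumed to be a FrankWolfe.BlockVector here
    for b in storage.blocks
        b .= 0
    end
end

lmos = (
    FrankWolfe.ProbabilitySimplexOracle(1.0),               # P
    FrankWolfe.ScaledBoundLInfNormBall(-ones(n), zeros(n)), # Q = [-1,0]^n
)
x0 = rand(n)

# ALM with the full (simultaneous) block-coordinate update
_, _, _, _, _, trajectory_bcfw = FrankWolfe.alternating_linear_minimization(
    FrankWolfe.block_coordinate_frank_wolfe,
    f,
    grad!,
    lmos,
    x0;
    verbose=true,
    trajectory=true,
)

The tables below show the verbose output of the three BCFW-based ALM runs (full, cyclic, and stochastic update order):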
 
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec          Dist2
 ----------------------------------------------------------------------------------------------------------------
      I             1   3.778464e+00  -4.022154e+01   4.400000e+01   0.000000e+00            Inf   3.778464e+00
    FW          1000   5.000047e-02   4.988536e-02   1.151053e-04   1.194305e+00   8.373069e+02   5.000047e-02
    FW          2000   5.000000e-02   4.999558e-02   4.423306e-06   1.208702e+00   1.654668e+03   5.000000e-02
  Last          2445   5.000000e-02   4.999898e-02   1.022986e-06   1.377965e+00   1.774356e+03   5.000000e-02
 ----------------------------------------------------------------------------------------------------------------
 
 Alternating Linear Minimization (ALM).
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec          Dist2
 ----------------------------------------------------------------------------------------------------------------
      I             1   3.778464e+00  -4.022154e+01   4.400000e+01   0.000000e+00            Inf   3.778464e+00
    FW          1000   5.000047e-02   4.988833e-02   1.121389e-04   1.891192e-01   5.287670e+03   5.000047e-02
    FW          2000   5.000000e-02   4.999569e-02   4.308009e-06   2.106586e-01   9.494035e+03   5.000000e-02
  Last          2446   5.000000e-02   4.999898e-02   1.018029e-06   2.204669e-01   1.109464e+04   5.000000e-02
 ----------------------------------------------------------------------------------------------------------------
 
 Alternating Linear Minimization (ALM).
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec          Dist2
 ----------------------------------------------------------------------------------------------------------------
      I             1   5.732358e-01  -4.342676e+01   4.400000e+01   0.000000e+00            Inf   5.732358e-01
    FW          1000   5.000037e-02   4.990066e-02   9.970862e-05   9.045997e-02   1.105461e+04   5.000037e-02
    FW          2000   5.000000e-02   4.999619e-02   3.813015e-06   1.121890e-01   1.782706e+04   5.000000e-02
  Last          2402   5.000000e-02   4.999899e-02   1.008292e-06   1.211730e-01   1.982290e+04   5.000000e-02
 ----------------------------------------------------------------------------------------------------------------

As an alternative to Block-Coordinate Frank-Wolfe (BCFW), one can also run alternating linear minimization with a standard Frank-Wolfe algorithm. These methods then perform the full (simultaneous) update at each iteration. In this example we also use FrankWolfe.away_frank_wolfe.

_, _, _, _, _, afw_trajectory = FrankWolfe.alternating_linear_minimization(
    FrankWolfe.away_frank_wolfe,
    f,
    grad!,
    lmos, x0;
    trajectory=true,  # further arguments elided in the diff; plausible completion
)
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec          Dist2     #ActiveSet
 -------------------------------------------------------------------------------------------------------------------------------
      I             1   2.300000e+01           -Inf            Inf   0.000000e+00            Inf   2.010582e+00              2
  Last           171   5.000000e-02   4.999908e-02   9.177388e-07   5.553152e-01   3.079332e+02   5.000000e-02             84
 -------------------------------------------------------------------------------------------------------------------------------
    PP           171   5.000000e-02   4.999908e-02   9.177388e-07   6.507932e-01   2.627563e+02   5.000000e-02             84
 -------------------------------------------------------------------------------------------------------------------------------

Running Alternating Projections

Unlike ALM, Alternating Projections (AP) is only suitable for feasibility problems. One omits the objective and gradient as parameters.

_, _, _, _, ap_trajectory = FrankWolfe.alternating_projections(
    lmos,
    x0;
    trajectory=true,  # further arguments elided in the diff; plausible completion
)
   Type     Iteration       Dual Gap         Infeas           Time         It/sec
 ----------------------------------------------------------------------------------
      I             1   4.040716e-01   2.020358e-01   0.000000e+00            Inf
    FW           100   1.045308e-04   5.000040e-02   9.642665e-01   1.037058e+02
    FW           200   2.524997e-05   5.000002e-02   1.623748e+00   1.231718e+02
    FW           300   1.122811e-05   5.000000e-02   2.454423e+00   1.222283e+02
    FW           400   6.334101e-06   5.000000e-02   3.365214e+00   1.188632e+02
    FW           500   4.028960e-06   5.000000e-02   4.348019e+00   1.149949e+02
    FW           600   2.823896e-06   5.000000e-02   5.376481e+00   1.115972e+02
    FW           700   2.088682e-06   5.000000e-02   6.501181e+00   1.076727e+02
    FW           800   1.569986e-06   5.000000e-02   7.665167e+00   1.043682e+02
    FW           900   1.258972e-06   5.000000e-02   8.871550e+00   1.014479e+02
    FW          1000   9.944106e-07   5.000000e-02   1.010268e+01   9.898368e+01
  Last          1000   9.944106e-07   5.000000e-02   1.011353e+01   9.887740e+01
 ----------------------------------------------------------------------------------

Plotting the resulting trajectories

labels = ["BCFW - Full", "BCFW - Cyclic", "BCFW - Stochastic", "AFW", "AP"]
 
 plot_trajectories(trajectories, labels, xscalelog=true)
[Plot of the five trajectories: the three BCFW variants, AFW, and AP]

This page was generated using Literate.jl.


Block-Coordinate Frank-Wolfe and Block-Vectors

In this example, we demonstrate the usage of the FrankWolfe.block_coordinate_frank_wolfe and FrankWolfe.BlockVector. We consider the problem of minimizing the squared Euclidean distance between two sets. We compare different update orders and different update steps.

Import and setup

We first import the necessary packages and include the code for plotting the results.

using FrankWolfe
 using LinearAlgebra
 
include("plot_utils.jl")

Next, we define the objective function and its gradient. The iterates x are instances of the FrankWolfe.BlockVector type. The different blocks of the vector can be accessed via the blocks field.

f(x) = dot(x.blocks[1] - x.blocks[2], x.blocks[1] - x.blocks[2])
 
function grad!(storage, x)
    @. storage.blocks = [x.blocks[1] - x.blocks[2], x.blocks[2] - x.blocks[1]]
end
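For illustration, a two-block iterate can be constructed directly (a sketch; the BlockVector constructor taking a vector of arrays is assumed):

x = FrankWolfe.BlockVector([ones(3), zeros(3)])
x.blocks[1]  # first block: [1.0, 1.0, 1.0]
f(x)         # squared distance between the two blocks, here 3.0

The verbose logs of the elided runs follow, one per update order (full, cyclic, stochastic, custom):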
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   5.732358e-01  -1.014268e+02   1.020000e+02   0.000000e+00            Inf
    FW          1000   1.006522e-02   9.790473e-03   2.747434e-04   3.066148e-01   3.261422e+03
    FW          2000   1.002905e-02   9.883695e-03   1.453591e-04   3.248936e-01   6.155861e+03
    FW          3000   1.001690e-02   9.923899e-03   9.300514e-05   3.430174e-01   8.745912e+03
    FW          4000   1.001088e-02   9.940501e-03   7.037823e-05   4.098118e-01   9.760578e+03
    FW          5000   1.000730e-02   9.952154e-03   5.515099e-05   4.288965e-01   1.165782e+04
    FW          6000   1.000507e-02   9.960902e-03   4.416633e-05   4.471006e-01   1.341980e+04
    FW          7000   1.000360e-02   9.967424e-03   3.617261e-05   4.652583e-01   1.504541e+04
    FW          8000   1.000260e-02   9.972504e-03   3.009367e-05   4.832743e-01   1.655375e+04
    FW          9000   1.000190e-02   9.976620e-03   2.528359e-05   5.010452e-01   1.796245e+04
    FW         10000   1.000141e-02   9.979993e-03   2.141696e-05   5.188395e-01   1.927378e+04
  Last         10001   1.000141e-02   9.979981e-03   2.142870e-05   6.233634e-01   1.604361e+04
 -------------------------------------------------------------------------------------------------
 
 Block coordinate Frank-Wolfe (BCFW).
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   5.732358e-01  -1.014268e+02   1.020000e+02   0.000000e+00            Inf
    FW          1000   1.006522e-02   9.790473e-03   2.747434e-04   1.798858e-01   5.559084e+03
    FW          2000   1.002905e-02   9.883695e-03   1.453591e-04   2.048092e-01   9.765188e+03
    FW          3000   1.001690e-02   9.923899e-03   9.300514e-05   2.298036e-01   1.305462e+04
    FW          4000   1.001088e-02   9.940501e-03   7.037823e-05   2.547445e-01   1.570201e+04
    FW          5000   1.000730e-02   9.952154e-03   5.515099e-05   2.791033e-01   1.791452e+04
    FW          6000   1.000507e-02   9.960902e-03   4.416633e-05   3.033585e-01   1.977858e+04
    FW          7000   1.000360e-02   9.967424e-03   3.617261e-05   3.275410e-01   2.137137e+04
    FW          8000   1.000260e-02   9.972504e-03   3.009367e-05   3.518700e-01   2.273567e+04
    FW          9000   1.000190e-02   9.976620e-03   2.528359e-05   3.761325e-01   2.392774e+04
    FW         10000   1.000141e-02   9.979993e-03   2.141696e-05   4.004089e-01   2.497447e+04
  Last         10001   1.000141e-02   9.979981e-03   2.142870e-05   4.007906e-01   2.495318e+04
 -------------------------------------------------------------------------------------------------
 
 Block coordinate Frank-Wolfe (BCFW).
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   1.553033e+01  -8.646967e+01   1.020000e+02   0.000000e+00            Inf
    FW          1000   1.007412e-02   9.802631e-03   2.714844e-04   5.825607e-02   1.716559e+04
    FW          2000   1.003644e-02   9.880057e-03   1.563873e-04   8.295064e-02   2.411072e+04
    FW          3000   1.002143e-02   9.912465e-03   1.089657e-04   1.072164e-01   2.798080e+04
    FW          4000   1.001338e-02   9.934712e-03   7.866350e-05   1.314341e-01   3.043350e+04
    FW          5000   1.000873e-02   9.946578e-03   6.215714e-05   1.555428e-01   3.214550e+04
    FW          6000   1.000604e-02   9.956508e-03   4.953581e-05   1.794727e-01   3.343126e+04
    FW          7000   1.000429e-02   9.963627e-03   4.065807e-05   2.033656e-01   3.442077e+04
    FW          8000   1.000307e-02   9.970732e-03   3.233383e-05   2.963333e-01   2.699663e+04
    FW          9000   1.000221e-02   9.974771e-03   2.744119e-05   3.227836e-01   2.788246e+04
    FW         10000   1.000163e-02   9.978927e-03   2.269950e-05   3.477737e-01   2.875433e+04
  Last         10001   1.000163e-02   9.978643e-03   2.298276e-05   3.481719e-01   2.872431e+04
 -------------------------------------------------------------------------------------------------
 
 Block coordinate Frank-Wolfe (BCFW).
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   1.024074e+02           -Inf            Inf   0.000000e+00            Inf
    FW          1000   1.003847e-02   9.875995e-03   1.624779e-04   6.479003e-02   1.543447e+04
    FW          2000   1.001380e-02   9.931922e-03   8.188221e-05   9.342510e-02   2.140752e+04
    FW          3000   1.000624e-02   9.955789e-03   5.044741e-05   1.220782e-01   2.457440e+04
    FW          4000   1.000315e-02   9.969360e-03   3.378772e-05   2.072547e-01   1.929993e+04
    FW          5000   1.000169e-02   9.978459e-03   2.323352e-05   2.383329e-01   2.097906e+04
    FW          6000   1.000095e-02   9.983540e-03   1.740806e-05   2.678919e-01   2.239709e+04
    FW          7000   1.000055e-02   9.987713e-03   1.283319e-05   2.968469e-01   2.358118e+04
    FW          8000   1.000032e-02   9.990836e-03   9.486923e-06   3.255050e-01   2.457719e+04
    FW          9000   1.000019e-02   9.992954e-03   7.238509e-06   3.555002e-01   2.531644e+04
    FW         10000   1.000012e-02   9.994356e-03   5.760023e-06   3.845053e-01   2.600744e+04
  Last         10001   1.000012e-02   9.994357e-03   5.759399e-06   3.849047e-01   2.598306e+04
 -------------------------------------------------------------------------------------------------

Plotting the results

labels = ["Full update", "Cyclic order", "Stochstic order", "Custom order"]
 plot_trajectories(trajectories, labels, xscalelog=true)
[Plot of the four trajectories, one per update order]

Running BCFW with different update methods

As a second step, we compare different update steps. We consider the FrankWolfe.BPCGStep and the FrankWolfe.FrankWolfeStep. One can either pass a tuple of FrankWolfe.UpdateStep to define the update procedure for each block individually, or pass a single update step so that each block uses the same procedure; see the sketch below.

trajectories = []
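The loop populating trajectories is elided in the diff; it presumably resembles the following sketch, assuming the update_step keyword argument of block_coordinate_frank_wolfe and a product LMO prod_lmo from the elided setup:

for us in [
    (FrankWolfe.BPCGStep(), FrankWolfe.FrankWolfeStep()),  # one step per block
    (FrankWolfe.FrankWolfeStep(), FrankWolfe.BPCGStep()),
    FrankWolfe.BPCGStep(),                                 # same step for all blocks
    FrankWolfe.FrankWolfeStep(),
]
    _, _, _, _, traj_data = FrankWolfe.block_coordinate_frank_wolfe(
        f,
        grad!,
        prod_lmo,
        x0;
        update_step=us,
        trajectory=true,
    )
    push!(trajectories, traj_data)
end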
 

Plotting the results

labels = ["BPCG FW", "FW BPCG", "BPCG", "FW"]
 plot_trajectories(trajectories, labels, xscalelog=true)
[Plot of the four trajectories, one per update-step combination]

This page was generated using Literate.jl.


Accelerations for quadratic functions and symmetric problems

This example illustrates how to exploit symmetry to reduce the dimension of the problem via SymmetricLMO. Moreover, active set based algorithms can be accelerated by using the specialized structure ActiveSetQuadratic.

The specific problem we consider here comes from quantum information, and some context can be found here. Formally, we want to find the distance between a tensor of size $m^N$ and the $N$-partite local polytope, which is defined by its vertices

\[d^{\vec{a}^{(1)}\ldots \vec{a}^{(N)}}_{x_1\ldots x_N}\coloneqq\prod_{n=1}^Na^{(n)}_{x_n}\]

labeled by $\vec{a}^{(n)}=a^{(n)}_1\ldots a^{(n)}_m$ for $n\in[1,N]$, where $a^{(n)}_x=\pm1$. In the bipartite case (N=2), this polytope is affinely equivalent to the cut polytope.

Import and setup

We first import the necessary packages.

import Combinatorics
 import FrankWolfe
 import LinearAlgebra
 import Tullio

Then we can define our custom LMO, together with the method compute_extreme_point, which simply enumerates the vertices $d^{\vec{a}^{(1)}}$ defined above. This structure is specialized for the case N=5 and contains pre-allocated fields used to accelerate the enumeration. Note that the output type (full tensor) is quite naive, but this is enough to illustrate the syntax in this toy example.

struct BellCorrelationsLMO{T} <: FrankWolfe.LinearMinimizationOracle # fields and run elided in the diff
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   4.132812e+01  -4.029553e+01   8.162365e+01   0.000000e+00            Inf              1
    LD            46   1.102219e+01  -2.889653e+01   3.991872e+01   4.708395e+00   9.769783e+00             46
    LD           103   4.741326e+00  -1.484609e+01   1.958742e+01   8.443191e+00   1.219918e+01             79
    LD           217   1.129314e+00  -5.004581e+00   6.133895e+00   1.347365e+01   1.610550e+01            126
    LD           351   7.257875e-01  -2.271567e+00   2.997354e+00   1.703386e+01   2.060602e+01            157
     P          1000   4.532655e-01  -2.544089e+00   2.997354e+00   3.333667e+01   2.999700e+01            310
    LD          1621   2.613257e-01  -1.230782e+00   1.492108e+00   4.793890e+01   3.381387e+01            445
     P          2000   2.268145e-01  -1.265294e+00   1.492108e+00   4.829450e+01   4.141258e+01            440
     P          3000   1.381013e-01  -1.354007e+00   1.492108e+00   5.626761e+01   5.331664e+01            496
     P          4000   6.040867e-02  -1.431699e+00   1.492108e+00   6.971927e+01   5.737295e+01            610
    LD          4415   3.072837e-02  -4.006321e-01   4.313605e-01   7.486578e+01   5.897220e+01            647
     P          5000   1.738588e-02  -4.139746e-01   4.313605e-01   7.527048e+01   6.642711e+01            627
    LD          5663   1.229003e-02  -1.045655e-01   1.168556e-01   7.555396e+01   7.495305e+01            625
     P          6000   1.113787e-02  -1.057177e-01   1.168556e-01   7.579390e+01   7.916204e+01            625
     P          7000   9.735387e-03  -1.071202e-01   1.168556e-01   7.623175e+01   9.182526e+01            625
    LD          7469   9.517109e-03  -2.020889e-02   2.972600e-02   7.643088e+01   9.772229e+01            625
     P          8000   9.392674e-03  -2.033332e-02   2.972600e-02   7.675736e+01   1.042245e+02            625
     P          9000   9.304994e-03  -2.042100e-02   2.972600e-02   7.718718e+01   1.165997e+02            625
    LD          9490   9.290088e-03   1.701123e-03   7.588965e-03   7.739612e+01   1.226160e+02            625
     P         10000   9.282251e-03   1.693286e-03   7.588965e-03   7.770382e+01   1.286938e+02            625
  Last         10001   9.282230e-03   3.629714e-03   5.652516e-03   7.786347e+01   1.284428e+02            625
 ----------------------------------------------------------------------------------------------------------------
    PP         10001   9.282230e-03   3.629714e-03   5.652516e-03   7.796427e+01   1.282767e+02            625
 ----------------------------------------------------------------------------------------------------------------
 78.056789 seconds (378.66 M allocations: 45.133 GiB, 11.90% gc time, 0.18% compilation time)

Faster active set for quadratic functions

A first acceleration can be obtained by using the active set specialized for the quadratic objective function, whose gradient is here $x-p$, which explains the Hessian and the linear part provided as arguments. The speedup is obtained by pre-computing some scalar products to quickly obtain, in each iteration, the best and worst atoms currently in the active set.

asq_naive = FrankWolfe.ActiveSetQuadratic([(one(T), x0)], LinearAlgebra.I, -p)
 @time FrankWolfe.blended_pairwise_conditional_gradient(f, grad!, lmo_naive, asq_naive; verbose, lazy=true, line_search=FrankWolfe.Shortstep(one(T)), max_iteration)

 Blended Pairwise Conditional Gradient Algorithm.
 MEMORY_MODE: FrankWolfe.InplaceEmphasis() STEPSIZE: Shortstep EPSILON: 1.0e-7 MAXITERATION: 10000 TYPE: Float64
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   4.132812e+01  -4.029553e+01   8.162365e+01   0.000000e+00            Inf              1
    LD            46   1.102219e+01  -2.889653e+01   3.991872e+01   4.642015e+00   9.909489e+00             46
    LD           104   4.682362e+00  -1.505762e+01   1.973998e+01   8.357524e+00   1.244388e+01             79
    LD           217   1.153286e+00  -4.834945e+00   5.988231e+00   1.341283e+01   1.617854e+01            126
    LD           361   7.246254e-01  -2.234291e+00   2.958916e+00   1.705435e+01   2.116762e+01            158
     P          1000   4.580028e-01  -2.500913e+00   2.958916e+00   3.261350e+01   3.066215e+01            305
    LD          1759   2.293608e-01  -1.236982e+00   1.466343e+00   4.987715e+01   3.526665e+01            460
     P          2000   2.035603e-01  -1.262782e+00   1.466343e+00   4.997727e+01   4.001819e+01            458
     P          3000   1.211665e-01  -1.345176e+00   1.466343e+00   5.572143e+01   5.383925e+01            500
     P          4000   4.749301e-02  -1.418850e+00   1.466343e+00   7.016498e+01   5.700850e+01            632
    LD          4227   3.237072e-02  -3.805380e-01   4.129088e-01   7.272396e+01   5.812390e+01            654
     P          5000   1.624641e-02  -3.966624e-01   4.129088e-01   7.283870e+01   6.864483e+01            625
    LD          5566   1.241685e-02  -1.000254e-01   1.124422e-01   7.285848e+01   7.639468e+01            625
     P          6000   1.102951e-02  -1.014127e-01   1.124422e-01   7.296316e+01   8.223329e+01            625
     P          7000   9.754199e-03  -1.026880e-01   1.124422e-01   7.299519e+01   9.589673e+01            625
    LD          7558   9.511291e-03  -1.988317e-02   2.939446e-02   7.301474e+01   1.035133e+02            625
     P          8000   9.410900e-03  -1.998356e-02   2.939446e-02   7.319008e+01   1.093044e+02            625
     P          9000   9.314629e-03  -2.007983e-02   2.939446e-02   7.322208e+01   1.229137e+02            625
    LD          9702   9.291671e-03   1.444824e-03   7.846848e-03   7.324649e+01   1.324569e+02            625
     P         10000   9.286464e-03   1.439616e-03   7.846848e-03   7.334863e+01   1.363352e+02            625
  Last         10001   9.286436e-03   2.102908e-03   7.183528e-03   7.344825e+01   1.361639e+02            625
 ----------------------------------------------------------------------------------------------------------------
    PP         10001   9.286436e-03   2.102908e-03   7.183528e-03   7.354701e+01   1.359811e+02            625
 ----------------------------------------------------------------------------------------------------------------
 73.641378 seconds (374.82 M allocations: 44.686 GiB, 12.30% gc time, 0.16% compilation time)

In this small example, the acceleration is quite minimal, but as soon as one of the following conditions is met, significant speedups (factor ten at least) can be expected:

  • quite expensive scalar product between atoms, for instance, due to a high dimension (say, more than 10000),
  • high number of atoms in the active set (say, more than 1000),
  • high number of iterations (say, more than 100000), spending most of the time redistributing the weights in the active set.

Dimension reduction via symmetrization

Permutation of the tensor axes

It is easy to see that our specific instance remains invariant under permutation of the dimensions of the tensor. This means that all computations can be performed in the symmetric subspace, which leads to an important speedup, owing to the reduced dimension (hence reduced size of the final active set and reduced number of iterations).

The way to operate this in the FrankWolfe package is to use a symmetrized LMO, which basically does the following:

  • symmetrize the gradient, which is not necessary here as the gradient remains symmetric throughout the algorithm,
  • call the standard LMO,
  • symmetrize its output, which amounts to averaging over its orbit with respect to the group considered (here the symmetric group permuting the dimensions of the tensor).
function reynolds_permutedims(atom::Array{T, N}, lmo::BellCorrelationsLMO{T}) where {T <: Number, N}
    res = zeros(T, size(atom))
    for per in Combinatorics.permutations(1:N)
        res .+= permutedims(atom, per)
    end
    # closing lines elided in the diff; a plausible completion averages over the orbit
    return res ./ factorial(N)
end
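The symmetrized LMO can then be built from the naive one and used exactly as before (a sketch, assuming the two-argument SymmetricLMO constructor with the default adjoint, valid here since the gradient stays symmetric throughout the algorithm):

lmo_permutedims = FrankWolfe.SymmetricLMO(lmo_naive, reynolds_permutedims)
asq_permutedims = FrankWolfe.ActiveSetQuadratic([(one(T), x0)], LinearAlgebra.I, -p)
@time FrankWolfe.blended_pairwise_conditional_gradient(f, grad!, lmo_permutedims, asq_permutedims; verbose, lazy=true, line_search=FrankWolfe.Shortstep(one(T)), max_iteration)

Its verbose output: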
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   4.132812e+01  -4.029553e+01   8.162365e+01   0.000000e+00            Inf              1
    LD            11   1.153403e+01  -2.734161e+01   3.887563e+01   8.546162e-01   1.287128e+01              9
    LD            29   3.294910e+00  -1.375281e+01   1.704772e+01   1.427754e+00   2.031162e+01             13
    LD            47   1.630202e+00  -6.744183e+00   8.374385e+00   1.706338e+00   2.754437e+01             15
    LD           100   4.530833e-01  -2.912142e+00   3.365226e+00   2.838300e+00   3.523235e+01             23
    LD           175   1.770240e-01  -1.434036e+00   1.611060e+00   3.362644e+00   5.204238e+01             23
    LD           268   8.360800e-02  -6.726422e-01   7.562502e-01   4.074344e+00   6.577745e+01             29
    LD           555   2.209150e-02  -2.970870e-01   3.191785e-01   4.698991e+00   1.181105e+02             30
    LD           773   1.139297e-02  -6.500159e-02   7.639456e-02   4.860580e+00   1.590345e+02             26
     P          1000   9.617158e-03  -6.677740e-02   7.639456e-02   4.964775e+00   2.014190e+02             26
    LD          1086   9.450474e-03  -9.519297e-03   1.896977e-02   4.977450e+00   2.181840e+02             26
    LD          1500   9.283632e-03   4.858918e-03   4.424714e-03   5.081382e+00   2.951953e+02             26
    LD          1900   9.274785e-03   8.183461e-03   1.091323e-03   5.183873e+00   3.665213e+02             26
     P          2000   9.274488e-03   8.183165e-03   1.091323e-03   5.343303e+00   3.743003e+02             26
    LD          2326   9.274249e-03   9.015738e-03   2.585111e-04   5.351815e+00   4.346189e+02             26
    LD          2740   9.274216e-03   9.214614e-03   5.960203e-05   5.455796e+00   5.022182e+02             26
     P          3000   9.274214e-03   9.214612e-03   5.960203e-05   5.556525e+00   5.399058e+02             26
    LD          3178   9.274214e-03   9.262304e-03   1.190966e-05   5.561342e+00   5.714448e+02             26
    LD          3636   9.274214e-03   9.271595e-03   2.619296e-06   5.665722e+00   6.417540e+02             26
     P          4000   9.274214e-03   9.271595e-03   2.619296e-06   5.767400e+00   6.935534e+02             26
    LD          4064   9.274214e-03   9.273578e-03   6.357779e-07   5.769292e+00   7.044192e+02             26
    LD          4470   9.274214e-03   9.274066e-03   1.484091e-07   5.936401e+00   7.529815e+02             26
    LD          4865   9.274214e-03   9.274179e-03   3.537488e-08   6.040748e+00   8.053638e+02             26
  Last          4865   9.274214e-03   9.274179e-03   3.537488e-08   6.227180e+00   7.812525e+02             26
 ----------------------------------------------------------------------------------------------------------------
    PP          4865   9.274214e-03   9.274179e-03   3.537488e-08   6.387196e+00   7.616801e+02             26
 ----------------------------------------------------------------------------------------------------------------
  6.482109 seconds (31.79 M allocations: 3.930 GiB, 12.87% gc time, 2.10% compilation time)

Now, convergence is reached within 10000 iterations, and the size of the final active set is considerably smaller than before, thanks to the reduced dimension.

Uniqueness pattern

In this specific case, there is a bigger symmetry group that we can exploit. Its action roughly allows us to work in the subspace respecting the structure of the objective point p, that is, to average over tensor entries that have the same value in p. Although quite general, this kind of symmetry is not always applicable, and great care has to be taken when using it, in particular, to ensure that there exists a suitable group action whose Reynolds operator corresponds to this averaging procedure. In our current case, the theoretical study enabling this further symmetrization can be found here.

function build_reynolds_unique(p::Array{T, N}) where {T <: Number, N}
    ptol = round.(p; digits=8)
    ptol[ptol .== zero(T)] .= zero(T) # transform -0.0 into 0.0 as isequal(0.0, -0.0) is false
    uniquetol = unique(ptol[:])
    # the rest of the function is elided in the diff; it builds and returns the operator
    # averaging the entries of its argument over each uniqueness class of p
end
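The resulting averaging operator would then be wrapped into a SymmetricLMO as before (a sketch; the names reynolds_unique and lmo_unique are illustrative):

reynolds_unique = build_reynolds_unique(p)
lmo_unique = FrankWolfe.SymmetricLMO(lmo_naive, reynolds_unique)

The corresponding run: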
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   4.132812e+01  -4.029553e+01   8.162365e+01   0.000000e+00            Inf              1
    LD             4   2.991558e+00  -4.896575e+00   7.888134e+00   2.456645e-01   1.628237e+01              3
    LD            19   4.246112e-01  -2.691388e+00   3.115999e+00   3.846965e-01   4.938959e+01              3
    LD            64   1.802165e-01  -4.514355e-01   6.316521e-01   8.325422e-01   7.687298e+01              5
    LD            81   2.002637e-02  -1.558821e-01   1.759085e-01   1.109586e+00   7.300020e+01              5
    LD           118   1.364017e-02  -4.146471e-02   5.510488e-02   1.201949e+00   9.817385e+01              3
    LD           189   9.505756e-03  -8.772067e-03   1.827782e-02   1.456834e+00   1.297334e+02              4
    LD           204   9.297335e-03   2.741829e-03   6.555506e-03   1.549693e+00   1.316390e+02              4
    LD           219   9.278805e-03   6.432857e-03   2.845948e-03   1.641719e+00   1.333968e+02              4
    LD           237   9.274768e-03   8.268553e-03   1.006215e-03   1.733713e+00   1.367008e+02              4
    LD           255   9.274310e-03   8.844241e-03   4.300684e-04   1.894026e+00   1.346338e+02              4
    LD           270   9.274227e-03   9.117815e-03   1.564121e-04   1.987603e+00   1.358420e+02              4
    LD           288   9.274216e-03   9.207613e-03   6.660359e-05   2.080196e+00   1.384485e+02              4
    LD           303   9.274214e-03   9.249991e-03   2.422305e-05   2.172205e+00   1.394896e+02              4
    LD           321   9.274214e-03   9.263899e-03   1.031492e-05   2.325451e+00   1.380378e+02              4
    LD           336   9.274214e-03   9.270465e-03   3.748623e-06   2.418494e+00   1.389294e+02              4
    LD           354   9.274214e-03   9.272618e-03   1.595962e-06   2.511971e+00   1.409252e+02              4
    LD           369   9.274214e-03   9.273636e-03   5.784864e-07   2.604002e+00   1.417050e+02              4
    LD           384   9.274214e-03   9.273964e-03   2.499339e-07   2.696361e+00   1.424142e+02              4
    LD           397   9.274214e-03   9.274104e-03   1.101735e-07   2.859052e+00   1.388572e+02              4
    LD           412   9.274214e-03   9.274166e-03   4.841374e-08   2.952435e+00   1.395458e+02              4
  Last           412   9.274214e-03   9.274166e-03   4.841374e-08   3.141217e+00   1.311593e+02              4
 ----------------------------------------------------------------------------------------------------------------
    PP           412   9.274214e-03   9.274166e-03   4.841374e-08   3.232711e+00   1.274472e+02              4
 ----------------------------------------------------------------------------------------------------------------
  3.327729 seconds (16.53 M allocations: 1.955 GiB, 12.53% gc time, 3.78% compilation time)

Reduction of the memory footprint of the iterate

In the previous run, the dimension reduction is exploited mathematically to accelerate the algorithm, but the computations still take place in the full space: the iterate, although symmetric, is stored as a full tensor. As a last example of the speedup obtainable through symmetry reduction, we show how to map the computations into a space whose physical dimension is also reduced during the algorithm. This makes every in-place operation marginally faster, which can add up to significant accelerations on bigger instances, especially for active-set-based algorithms in the regime where many lazy iterations are performed. We refer to the example symmetric.jl for a small benchmark with symmetric matrices.

function build_reduce_inflate(p::Array{T, N}) where {T <: Number, N}
     ptol = round.(p; digits=8)
     ptol[ptol .== zero(T)] .= zero(T) # transform -0.0 into 0.0 as isequal(0.0, -0.0) is false
     uniquetol = unique(ptol[:])
@@ -284,27 +284,27 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   4.132812e+01  -4.029553e+01   8.162365e+01   0.000000e+00            Inf              1
-    LD             4   2.991558e+00  -4.896575e+00   7.888134e+00   2.498473e-01   1.600978e+01              3
-    LD            13   2.369634e-01  -2.280882e+00   2.517846e+00   4.745954e-01   2.739175e+01              4
-    LD            19   1.668285e-01  -3.158581e-01   4.826866e-01   8.123035e-01   2.339027e+01              5
-    LD            32   1.699135e-02  -5.050512e-02   6.749647e-02   9.909681e-01   3.229166e+01              5
-    LD           108   9.519110e-03  -6.845621e-03   1.636473e-02   1.166391e+00   9.259329e+01              4
-    LD           121   9.297134e-03   3.819920e-03   5.477214e-03   1.322093e+00   9.152154e+01              4
-    LD           130   9.276520e-03   7.486464e-03   1.790056e-03   1.412268e+00   9.205052e+01              4
-    LD           139   9.274453e-03   8.643094e-03   6.313589e-04   1.501267e+00   9.258846e+01              4
-    LD           148   9.274257e-03   8.982108e-03   2.921491e-04   1.590043e+00   9.307926e+01              4
-    LD           161   9.274223e-03   9.143488e-03   1.307348e-04   1.678482e+00   9.592003e+01              4
-    LD           174   9.274215e-03   9.222314e-03   5.190121e-05   1.835258e+00   9.480957e+01              4
-    LD           187   9.274214e-03   9.250977e-03   2.323707e-05   1.925660e+00   9.710956e+01              4
-    LD           202   9.274214e-03   9.265798e-03   8.416106e-06   2.014556e+00   1.002702e+02              4
-    LD           215   9.274214e-03   9.270455e-03   3.759433e-06   2.102993e+00   1.022353e+02              4
-    LD           228   9.274214e-03   9.272829e-03   1.384972e-06   2.191493e+00   1.040386e+02              4
-    LD           244   9.274214e-03   9.273611e-03   6.025805e-07   2.342278e+00   1.041721e+02              4
-    LD           257   9.274214e-03   9.273974e-03   2.395475e-07   2.431981e+00   1.056752e+02              4
-    LD           270   9.274214e-03   9.274107e-03   1.073458e-07   2.520579e+00   1.071182e+02              4
-    LD           285   9.274214e-03   9.274175e-03   3.860101e-08   2.609371e+00   1.092217e+02              4
-  Last           285   9.274214e-03   9.274175e-03   3.860101e-08   2.848189e+00   1.000636e+02              4
+    LD             4   2.991558e+00  -4.896575e+00   7.888134e+00   1.787093e-01   2.238272e+01              3
+    LD            13   2.369634e-01  -2.280882e+00   2.517846e+00   4.850443e-01   2.680167e+01              4
+    LD            19   1.668285e-01  -3.158581e-01   4.826866e-01   8.347361e-01   2.276168e+01              5
+    LD            32   1.699135e-02  -5.050512e-02   6.749647e-02   1.017029e+00   3.146420e+01              5
+    LD           108   9.519110e-03  -6.845621e-03   1.636473e-02   1.200171e+00   8.998716e+01              4
+    LD           121   9.297134e-03   3.819920e-03   5.477214e-03   1.371231e+00   8.824189e+01              4
+    LD           130   9.276520e-03   7.486464e-03   1.790056e-03   1.463872e+00   8.880556e+01              4
+    LD           139   9.274453e-03   8.643094e-03   6.313589e-04   1.557083e+00   8.926950e+01              4
+    LD           148   9.274257e-03   8.982108e-03   2.921491e-04   1.648896e+00   8.975704e+01              4
+    LD           161   9.274223e-03   9.143488e-03   1.307348e-04   1.740566e+00   9.249863e+01              4
+    LD           174   9.274215e-03   9.222314e-03   5.190121e-05   1.900202e+00   9.156922e+01              4
+    LD           187   9.274214e-03   9.250977e-03   2.323707e-05   1.993124e+00   9.382255e+01              4
+    LD           202   9.274214e-03   9.265798e-03   8.416106e-06   2.086213e+00   9.682618e+01              4
+    LD           215   9.274214e-03   9.270455e-03   3.759433e-06   2.177926e+00   9.871777e+01              4
+    LD           228   9.274214e-03   9.272829e-03   1.384972e-06   2.269885e+00   1.004456e+02              4
+    LD           244   9.274214e-03   9.273611e-03   6.025805e-07   2.422318e+00   1.007300e+02              4
+    LD           257   9.274214e-03   9.273974e-03   2.395475e-07   2.515263e+00   1.021762e+02              4
+    LD           270   9.274214e-03   9.274107e-03   1.073458e-07   2.608002e+00   1.035275e+02              4
+    LD           285   9.274214e-03   9.274175e-03   3.860101e-08   2.699726e+00   1.055662e+02              4
+  Last           285   9.274214e-03   9.274175e-03   3.860101e-08   2.943724e+00   9.681615e+01              4
 ----------------------------------------------------------------------------------------------------------------
-    PP           285   9.274214e-03   9.274175e-03   3.860101e-08   2.937802e+00   9.701130e+01              4
+    PP           285   9.274214e-03   9.274175e-03   3.860101e-08   3.036717e+00   9.385136e+01              4
 ----------------------------------------------------------------------------------------------------------------
-  3.034269 seconds (15.45 M allocations: 1.825 GiB, 13.22% gc time, 4.57% compilation time)

+ 3.135840 seconds (15.45 M allocations: 1.825 GiB, 13.01% gc time, 4.40% compilation time)

This page was generated using Literate.jl.

diff --git a/dev/examples/plot_utils.jl b/dev/examples/plot_utils.jl
index c0d938223..048ea4382 100644
--- a/dev/examples/plot_utils.jl
+++ b/dev/examples/plot_utils.jl
@@ -1,5 +1,4 @@
 using Plots
-using FiniteDifferences
 
 """
     plot_results
@@ -421,18 +420,3 @@ function plot_sparsity(
     end
     return fp
 end
-
-"""
-Check if the gradient using finite differences matches the grad! provided.
-"""
-function check_gradients(grad!, f, gradient, num_tests=10, tolerance=1.0e-5)
-    for i in 1:num_tests
-        random_point = similar(gradient)
-        random_point .= rand(length(gradient))
-        grad!(gradient, random_point)
-        if norm(grad(central_fdm(5, 1), f, random_point)[1] - gradient) > tolerance
-            @warn "There is a noticeable difference between the gradient provided and
-            the gradient computed using finite differences.:\n$(norm(grad(central_fdm(5, 1), f, random_point)[1] - gradient))"
-        end
-    end
-end
diff --git a/dev/index.html b/dev/index.html
index e516de85e..8546ab147 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -40,4 +40,4 @@ ...

If you need the plotting utilities in your own code, make sure Plots.jl is included in your current project and run:

using Plots
 using FrankWolfe
 
-include(joinpath(dirname(pathof(FrankWolfe)), "../examples/plot_utils.jl"))
+include(joinpath(dirname(pathof(FrankWolfe)), "../examples/plot_utils.jl"))
diff --git a/dev/reference/0_reference/index.html b/dev/reference/0_reference/index.html
index 56da9b463..8ed55545d 100644
--- a/dev/reference/0_reference/index.html
+++ b/dev/reference/0_reference/index.html
@@ -1,2 +1,2 @@
-API Reference · FrankWolfe.jl
+API Reference · FrankWolfe.jl
diff --git a/dev/reference/1_algorithms/index.html b/dev/reference/1_algorithms/index.html
index 321b1ffab..4cd916511 100644
--- a/dev/reference/1_algorithms/index.html
+++ b/dev/reference/1_algorithms/index.html
@@ -1,2 +1,2 @@
-Algorithms · FrankWolfe.jl

Algorithms

This section contains all main algorithms of the package. These are the ones typical users will call.

The typical signature for these algorithms is:

my_algorithm(f, grad!, lmo, x0)
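
For instance, a minimal end-to-end run of FrankWolfe.frank_wolfe could look as follows; the quadratic objective and the probability simplex feasible set are illustrative choices for this sketch, not fixed by the interface:

using FrankWolfe
using LinearAlgebra

n = 100
b = randn(n)
f(x) = norm(x - b)^2
function grad!(storage, x)
    @. storage = 2 * (x - b)   # gradients are written in-place into storage
end

lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)
x0 = FrankWolfe.compute_extreme_point(lmo, zeros(n))

x, v, primal, dual_gap, traj_data = FrankWolfe.frank_wolfe(
    f, grad!, lmo, x0;
    max_iteration=1000,
    verbose=true,
)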

Standard algorithms

FrankWolfe.frank_wolfeMethod
frank_wolfe(f, grad!, lmo, x0; ...)

Simplest form of the Frank-Wolfe algorithm. Returns a tuple (x, v, primal, dual_gap, traj_data) with:

  • x final iterate
  • v last vertex from the LMO
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • traj_data vector of trajectory information.
source
FrankWolfe.stochastic_frank_wolfeMethod
stochastic_frank_wolfe(f::StochasticObjective, lmo, x0; ...)

Stochastic version of Frank-Wolfe, evaluates the objective and gradient stochastically, implemented through the FrankWolfe.StochasticObjective interface.

Keyword arguments include batch_size to pass a fixed batch_size or a batch_iterator implementing batch_size = FrankWolfe.batchsize_iterate(batch_iterator) for algorithms like Variance-reduced and projection-free stochastic optimization, E Hazan, H Luo, 2016.

Similarly, a constant momentum can be passed or replaced by a momentum_iterator implementing momentum = FrankWolfe.momentum_iterate(momentum_iterator).

source
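
A hedged sketch of this stochastic interface follows; the per-sample loss, the data, and the batch and momentum values are illustrative assumptions:

using FrankWolfe

dim = 10
xs = [randn(dim) for _ in 1:100]                     # data points
loss(θ, x) = 0.5 * sum(abs2, θ - x)                  # loss for a single data point
grad_point!(storage, θ, x) = (storage .+= θ .- x)    # adds the partial gradient; must not reset storage

obj = FrankWolfe.StochasticObjective(loss, grad_point!, xs, zeros(dim))
lmo = FrankWolfe.LpNormLMO{Float64,2}(1.0)
x0 = collect(FrankWolfe.compute_extreme_point(lmo, randn(dim)))

x, v, primal, dual_gap, traj_data = FrankWolfe.stochastic_frank_wolfe(
    obj, lmo, x0;
    batch_size=10,     # fixed batch size, as described above
    momentum=0.9,      # constant momentum, as described above
    max_iteration=500,
)
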
FrankWolfe.block_coordinate_frank_wolfeFunction
block_coordinate_frank_wolfe(f, grad!, lmo::ProductLMO{N}, x0; ...) where {N}

Block-coordinate version of the Frank-Wolfe algorithm. Minimizes objective f over the product of feasible domains specified by the lmo. The optional argument update_order is of type FrankWolfe.BlockCoordinateUpdateOrder and controls the order in which the blocks are updated. The argument update_step is a single instance or tuple of FrankWolfe.UpdateStep and defines which FW algorithms to use to update the iterates in the different blocks.

The method returns a tuple (x, v, primal, dual_gap, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • traj_data vector of trajectory information.

See S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher 2013 and A. Beck, E. Pauwels and S. Sabach 2015 for more details about Block-Coordinate Frank-Wolfe.

source
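
A rough usage sketch, assuming the FrankWolfe.BlockVector container with a blocks field and a default-constructed CyclicUpdate, as used in the package examples; the coupling objective is illustrative:

using FrankWolfe
using LinearAlgebra

n = 10
lmo = FrankWolfe.ProductLMO(
    FrankWolfe.LpNormLMO{Float64,1}(1.0),
    FrankWolfe.LpNormLMO{Float64,Inf}(1.0),
)
f(x) = 0.5 * norm(x.blocks[1] - x.blocks[2])^2   # pull the two blocks toward each other
function grad!(storage, x)
    d = x.blocks[1] - x.blocks[2]
    storage.blocks[1] .= d
    storage.blocks[2] .= -d
    return storage
end
x0 = FrankWolfe.BlockVector([ones(n) / n, -ones(n)])

x, v, primal, dual_gap, traj_data = FrankWolfe.block_coordinate_frank_wolfe(
    f, grad!, lmo, x0;
    update_order=FrankWolfe.CyclicUpdate(),
)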

Active-set based methods

The following algorithms maintain the representation of the iterates as a convex combination of vertices.

Away-step

Pairwise Frank-Wolfe

Blended Conditional Gradient

FrankWolfe.blended_conditional_gradientMethod
blended_conditional_gradient(f, grad!, lmo, x0)

Entry point for the Blended Conditional Gradient algorithm. See Braun, Gábor, et al. "Blended conditional gradients" ICML 2019. The method works on an active set like FrankWolfe.away_frank_wolfe, performing gradient descent over the convex hull of active vertices, removing vertices when their weight drops to 0, and adding new vertices by calling the linear oracle in a lazy fashion.

source
FrankWolfe.build_reduced_problemMethod
build_reduced_problem(atoms::AbstractVector{<:AbstractVector}, hessian, weights, gradient, tolerance)

Given an active set formed by vectors, a (constant) Hessian, and a gradient, this constructs a quadratic problem over the unit probability simplex that is equivalent to minimizing the original function over the convex hull of the active set. If λ are the barycentric coordinates of dimension equal to the cardinality of the active set, the objective function is:

f(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ

In the case where the current iterate has a strong-Wolfe gap over the convex hull of the active set below the tolerance, we return nothing (as there is nothing to do).

source
FrankWolfe.lp_separation_oracleMethod

Returns either a tuple (y, val), with y an atom from the active set satisfying the progress criterion and val the corresponding gap dot(y, direction), or the same tuple with y obtained from the LMO.

inplace_loop controls whether the iterate type allows in-place writes. kwargs are passed on to the LMO oracle.

source
FrankWolfe.minimize_over_convex_hull!Method
minimize_over_convex_hull!

Given a function f with gradient grad! and an active set active_set, this function minimizes the function over the convex hull of the active set until the strong-Wolfe gap over the active set is below tolerance.

It will either directly minimize over the convex hull using simplex gradient descent, or it will transform the problem to barycentric coordinates and minimize over the unit probability simplex using gradient descent or Nesterov's accelerated gradient descent.

source
FrankWolfe.simplex_gradient_descent_over_convex_hullMethod
simplex_gradient_descent_over_convex_hull(f, grad!, gradient, active_set, tolerance, t, time_start, non_simplex_iter)

Minimizes an objective function over the convex hull of the active set until the strong-Wolfe gap is below tolerance, using simplex gradient descent.

source

Blended Pairwise Conditional Gradient

Alternating Methods

Problems over intersections of convex sets, i.e.

\[\min_{x \in \bigcap_{i=1}^n P_i} f(x),\]

pose a challenge as one has to combine the information of two or more LMOs.

FrankWolfe.alternating_linear_minimization converts the problem into a series of subproblems over single sets. To find a point within the intersection, one minimizes both the distance to the iterates of the other subproblems and the original objective function.

FrankWolfe.alternating_projections solves feasibility problems over intersections of feasible regions.
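
A hedged sketch of the feasibility variant; the two norm balls are illustrative, and keyword defaults should be checked against the docstrings below:

using FrankWolfe

n = 10
lmos = (
    FrankWolfe.LpNormLMO{Float64,1}(1.0),
    FrankWolfe.LpNormLMO{Float64,Inf}(0.5),
)
x0 = rand(n)

# returns a point (approximately) in the intersection of the two balls
x, v, dual_gap, infeas, traj_data = FrankWolfe.alternating_projections(
    lmos, x0;
    max_iteration=500,
)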

FrankWolfe.alternating_linear_minimizationMethod
alternating_linear_minimization(bc_algo::BlockCoordinateMethod, f, grad!, lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Alternating Linear Minimization minimizes the objective f over the intersections of the feasible domains specified by lmos. The tuple x0 defines the initial points for each domain. Returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • infeas sum of squared, pairwise distances between iterates
  • traj_data vector of trajectory information.
source
FrankWolfe.alternating_projectionsMethod
alternating_projections(lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Computes a point in the intersection of feasible domains specified by lmos. Returns a tuple (x, v, dual_gap, infeas, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • dual_gap final Frank-Wolfe gap
  • infeas sum of squared, pairwise distances between iterates
  • traj_data vector of trajectory information.
source

Index

+Algorithms · FrankWolfe.jl

diff --git a/dev/reference/2_lmo/index.html b/dev/reference/2_lmo/index.html
index 46001bbcd..6c9d12e66 100644
--- a/dev/reference/2_lmo/index.html
+++ b/dev/reference/2_lmo/index.html
@@ -1,2 +1,2 @@
-Linear Minimization Oracles · FrankWolfe.jl

      Linear Minimization Oracles

      The Linear Minimization Oracle (LMO) is a key component called at each iteration of the FW algorithm. Given $d\in \mathcal{X}$, it returns a vertex of the feasible set:

      \[v\in \argmin_{x\in \mathcal{C}} \langle d,x \rangle.\]

      See Combettes, Pokutta 2021 for references on most LMOs implemented in the package and their comparison with projection operators.

      Interface and wrappers

      FrankWolfe.LinearMinimizationOracleType

      Supertype for linear minimization oracles.

      All LMOs must implement compute_extreme_point(lmo::LMO, direction) and return a vector v of the appropriate type.

      source

      All of them are subtypes of FrankWolfe.LinearMinimizationOracle and implement the following method:

      FrankWolfe.compute_extreme_pointFunction
      compute_extreme_point(lmo::LinearMinimizationOracle, direction; kwargs...)

      Computes the point argmin_{v ∈ C} v ⋅ direction with C the set represented by the LMO. Most LMOs feature v as a keyword argument that allows for an in-place computation whenever v is dense. All LMOs should accept keyword arguments that they can ignore.

      source
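
      For example, with the L1-ball LMO documented below, the returned vertex concentrates on the coordinate of largest absolute value in the direction:

      using FrankWolfe

      lmo = FrankWolfe.LpNormLMO{Float64,1}(1.0)   # L1 ball of radius 1
      direction = [0.5, -2.0, 1.0]
      v = FrankWolfe.compute_extreme_point(lmo, direction)
      # v minimizes ⟨direction, v⟩ over the ball; the second coordinate has the
      # largest magnitude and is negative, so v ≈ [0.0, 1.0, 0.0]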

      We also provide some meta-LMOs wrapping another one with extended behavior:

      FrankWolfe.CachedLinearMinimizationOracleType
      CachedLinearMinimizationOracle{LMO}

      Oracle wrapping another one of type lmo. Subtypes of CachedLinearMinimizationOracle contain a cache of previous solutions.

      By convention, the inner oracle is named inner. Cached optimizers are expected to implement Base.empty! and Base.length.

      source
      FrankWolfe.SingleLastCachedLMOType
      SingleLastCachedLMO{LMO, VT}

      Caches only the last result from an LMO and stores it in last_vertex. Vertices of LMO have to be of type VT if provided.

      source
      FrankWolfe.MultiCacheLMOType
      MultiCacheLMO{N, LMO, A}

      Cache for an LMO storing up to N vertices, removed in FIFO style. oldest_idx keeps track of the oldest index in the tuple, i.e., the next one to be replaced. VT, if provided, must be the type of vertices returned by the LMO.

      source
      FrankWolfe.VectorCacheLMOType
      VectorCacheLMO{LMO, VT}

      Cache for an LMO storing an unbounded number of vertices of type VT. VT, if provided, must be the type of vertices returned by the LMO.

      source

      Norm balls

      FrankWolfe.EllipsoidLMOType
      EllipsoidLMO(A, c, r)

      Linear minimization over an ellipsoid centered at c of radius r:

      x: (x - c)^T A (x - c) ≤ r

      The LMO stores the factorization F of A, which is used to solve linear systems A⁻¹ x. The result of the linear system solve is stored in buffer. The ellipsoid is assumed to be full-dimensional, i.e., A is positive definite.

      source
      FrankWolfe.KNormBallLMOType
      KNormBallLMO{T}(K::Int, right_hand_side::T)

      LMO with feasible set being the K-norm ball in the sense of 2010.07243, i.e., the convex hull over the union of an L1-ball with radius τ and an L∞-ball with radius τ/K:

      C_{K,τ} = conv { B_1(τ) ∪ B_∞(τ / K) }

      with τ the right_hand_side parameter. The K-norm is defined as the sum of the largest K absolute entries in a vector.

      source
      FrankWolfe.LpNormLMOType
      LpNormLMO{T, p}(right_hand_side)

      LMO with feasible set being an L-p norm ball:

      C = {x ∈ R^n, norm(x, p) ≤ right_hand_side}
      source
      FrankWolfe.NuclearNormLMOType
      NuclearNormLMO{T}(radius)

      LMO over matrices that have a nuclear norm less than radius. The LMO returns the best rank-one approximation matrix with singular value radius, computed with Arpack.

      source
      FrankWolfe.OrderWeightNormLMOType
      OrderWeightNormLMO(weights,radius)

      LMO with feasible set being the atomic ordered weighted l1 norm: https://arxiv.org/pdf/1409.4271

      C = {x ∈ R^n, Ω_w(x) ≤ R} 

      The weights are assumed to be positive.

      source
      FrankWolfe.SpectraplexLMOType
      SpectraplexLMO{T,M}(radius::T,gradient_container::M,ensure_symmetry::Bool=true)

      Feasible set

      {X ∈ 𝕊_n^+, trace(X) == radius}

      gradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.

      source
      FrankWolfe.UnitSpectrahedronLMOType
      UnitSpectrahedronLMO{T,M}(radius::T, gradient_container::M)

      Feasible set of PSD matrices with bounded trace:

      {X ∈ 𝕊_n^+, trace(X) ≤ radius}

      gradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.

      source

      Simplex

      FrankWolfe.HyperSimplexOracleType
      HyperSimplexOracle(radius)

      Represents the scaled hypersimplex of radius τ, the convex hull of vectors v such that:

      • v_i ∈ {0, τ}
      • ||v||_0 = k

      Equivalently, this is the convex hull of the vertices of the K-sparse polytope lying in the nonnegative orthant.

      source
      FrankWolfe.UnitHyperSimplexOracleType
      UnitHyperSimplexOracle(radius)

      Represents the scaled unit hypersimplex of radius τ, the convex hull of vectors v such that:

      • v_i ∈ {0, τ}
      • ||v||_0 ≤ k

      Equivalently, this is the intersection of the K-sparse polytope and the nonnegative orthant.

      source
      FrankWolfe.compute_dual_solutionMethod

      Dual costs for a given primal solution to form a primal dual pair for scaled probability simplex. Returns two vectors. The first one is the dual costs associated with the constraints and the second is the reduced costs for the variables.

      source
      FrankWolfe.compute_dual_solutionMethod

      Dual costs for a given primal solution to form a primal dual pair for scaled unit simplex. Returns two vectors. The first one is the dual costs associated with the constraints and the second is the reduced costs for the variables.

      source
      FrankWolfe.compute_extreme_pointMethod

      LMO for scaled probability simplex. Returns a vector with one active value equal to RHS in the most improving (or least degrading) direction.

      source
      FrankWolfe.compute_extreme_pointMethod

      LMO for the scaled unit simplex: ∑ x_i ≤ τ. Returns either the vector of zeros or a vector with one active value equal to the RHS if there exists an improving direction.

      source

      Polytope

      FrankWolfe.BirkhoffPolytopeLMOType
      BirkhoffPolytopeLMO

      The Birkhoff polytope encodes doubly stochastic matrices. Its extreme vertices are all permutation matrices of side-dimension dimension.

      source
      FrankWolfe.KSparseLMOType
      KSparseLMO{T}(K::Int, right_hand_side::T)

      LMO for the K-sparse polytope:

      C = B_1(τK) ∩ B_∞(τ)

      with τ the right_hand_side parameter. The LMO results in a vector with the K largest absolute values of direction, taking values -τ sign(x_i).

      source
      FrankWolfe.ScaledBoundL1NormBallType
      ScaledBoundL1NormBall(lower_bounds, upper_bounds)

      Polytope similar to a L1-ball with shifted bounds. It is the convex hull of two scaled and shifted unit vectors for each axis (shifted to the center of the polytope, i.e., the elementwise midpoint of the bounds). Lower and upper bounds are passed on as abstract vectors, possibly of different types. For the standard L1-ball, all lower and upper bounds would be -1 and 1.

      source
      FrankWolfe.ScaledBoundLInfNormBallType
      ScaledBoundLInfNormBall(lower_bounds, upper_bounds)

      Polytope similar to a L-inf-ball with shifted bounds or general box constraints. Lower- and upper-bounds are passed on as abstract vectors, possibly of different types. For the standard L-inf ball, all lower- and upper-bounds would be -1 and 1.

      source

      MathOptInterface

      FrankWolfe.MathOptLMOType
      MathOptLMO{OT <: MOI.Optimizer} <: LinearMinimizationOracle

      Linear minimization oracle with feasible space defined through a MathOptInterface.Optimizer. The oracle call sets the direction and reruns the optimizer.

      The direction vector has to be set in the same order of variables as the MOI.ListOfVariableIndices() getter.

      The Boolean use_modify determines if the objective in compute_extreme_point is updated with MOI.modify(o, ::MOI.ObjectiveFunction, ::MOI.ScalarCoefficientChange) or with MOI.set(o, ::MOI.ObjectiveFunction, f). use_modify = true decreases the runtime and memory allocation for models created as an optimizer object and defined directly with MathOptInterface. use_modify = false should be used for CachingOptimizers.

      source
      FrankWolfe.convert_mathoptFunction
      convert_mathopt(lmo::LMO, optimizer::OT; kwargs...) -> MathOptLMO{OT}

      Converts the given LMO to its equivalent MathOptInterface representation using optimizer. Must be implemented by LMOs.

      source

      Index

+Linear Minimization Oracles · FrankWolfe.jl

diff --git a/dev/reference/3_backend/index.html b/dev/reference/3_backend/index.html
index 0069c637d..7499d489a 100644
--- a/dev/reference/3_backend/index.html
+++ b/dev/reference/3_backend/index.html
@@ -1,5 +1,5 @@
-Utilities and data structures · FrankWolfe.jl

          Utilities and data structures

          Active set

          FrankWolfe.AbstractActiveSetType
          AbstractActiveSet{AT, R, IT}

          Abstract type for an active set of atoms of type AT with weights of type R and iterate of type IT. An active set is typically expected to have a field weights, a field atoms, and a field x. Otherwise, all active set methods from src/active_set.jl can be overwritten.

          source
          FrankWolfe.ActiveSetType
          ActiveSet{AT, R, IT}

          Represents an active set of extreme vertices collected in a FW algorithm, along with their coefficients (λ_i, a_i). R is the type of the λ_i, AT is the type of the atoms a_i. The iterate x = ∑λ_i a_i is stored in x with type IT.

          source
          Base.copyMethod

          Copies an active set, the weight and atom vectors and the iterate. Individual atoms are not copied.

          source
          FrankWolfe.active_set_argminMethod
          active_set_argmin(active_set::AbstractActiveSet, direction)

          Computes the linear minimizer in the direction on the active set. Returns (λ_i, a_i, i)

          source
          FrankWolfe.active_set_argminmaxMethod
          active_set_argminmax(active_set::AbstractActiveSet, direction)

          Computes the linear minimizer in the direction on the active set. Returns (λ_min, a_min, i_min, val_min, λ_max, a_max, i_max, val_max, val_max-val_min ≥ Φ)

          source
          FrankWolfe.active_set_update!Method
          active_set_update!(active_set::AbstractActiveSet, lambda, atom)

          Adds the atom to the active set with weight lambda or adds lambda to existing atom.

          source
          FrankWolfe.compute_active_set_iterate!Method
          compute_active_set_iterate!(active_set::AbstractActiveSet) -> x

          Recomputes from scratch the iterate x from the current weights and vertices of the active set. Returns the iterate x.

          source
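
          A small sketch of these primitives; the two vertices and weights are illustrative:

          using FrankWolfe

          a1 = [1.0, 0.0]
          a2 = [0.0, 1.0]
          active_set = FrankWolfe.ActiveSet([(0.25, a1), (0.75, a2)])

          x = FrankWolfe.compute_active_set_iterate!(active_set)          # x == [0.25, 0.75]
          λ, a, i = FrankWolfe.active_set_argmin(active_set, [1.0, 0.0])  # selects a2, the atom minimizing the inner product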

          Functions and gradients

          FrankWolfe.ObjectiveFunctionType
          ObjectiveFunction

          Represents an objective function optimized by algorithms. Subtypes of ObjectiveFunction must implement at least

          • compute_value(::ObjectiveFunction, x) for primal value evaluation
          • compute_gradient(::ObjectiveFunction, x) for gradient evaluation.

          and optionally compute_value_gradient(::ObjectiveFunction, x) returning the (primal, gradient) pair. compute_gradient may always use the same storage and return a reference to it.

          source
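
          A minimal sketch of a custom subtype satisfying this interface; the quadratic objective is an illustrative assumption:

          using FrankWolfe

          struct MyQuadratic <: FrankWolfe.ObjectiveFunction
              b::Vector{Float64}
              storage::Vector{Float64}
          end

          FrankWolfe.compute_value(f::MyQuadratic, x; kwargs...) = sum(abs2, x - f.b)

          function FrankWolfe.compute_gradient(f::MyQuadratic, x; kwargs...)
              @. f.storage = 2 * (x - f.b)   # returning a reference to internal storage is allowed
              return f.storage
          end
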
          FrankWolfe.SimpleFunctionObjectiveType
          SimpleFunctionObjective{F,G,S}

          An objective function built from a separate primal objective f(x) and an in-place gradient function grad!(storage, x). It keeps an internal storage of type S used to evaluate the gradient in-place.

          source
          FrankWolfe.StochasticObjectiveType
          StochasticObjective{F, G, XT, S}(f::F, grad!::G, xs::XT, storage::S)

          Represents a composite function evaluated with stochastic gradient. f(θ, x) evaluates the loss for a single data point x and parameter θ. grad!(storage, θ, x) adds to storage the partial gradient with respect to data point x at parameter θ. xs must be an indexable iterable (Vector{Vector{Float64}} for instance). Functions using a StochasticObjective have optional keyword arguments rng, batch_size and full_evaluation controlling whether the function should be evaluated over all data points.

          Note: grad! must not reset the storage to 0 before adding to it.

          source
          FrankWolfe.compute_gradientFunction
          compute_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes the gradient of f at x. May return a reference to an internal storage.

          source
          FrankWolfe.compute_value_gradientMethod
          compute_value_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes in one call the pair (value, gradient) evaluated at x. By default, calls compute_value and compute_gradient with keywords kwargs passed down to both.

          source

          Callbacks

          Custom vertex storage

          Custom extreme point types

          For some feasible sets, the extreme points of the feasible set returned by the LMO possess a specific structure that can be represented in an efficient manner both for storage and for common operations like scaling and addition with an iterate. They are presented below:

          Utils

          FrankWolfe.DeletedVertexStorageType

          Vertex storage to store dropped vertices or find a suitable direction in lazy settings. The algorithm will look for at most return_kth suitable atoms before returning the best. See Extra-lazification with a vertex storage for usage.

          A vertex storage can be any type that implements two operations:

          1. Base.push!(storage, atom) to add an atom to the storage.

          Note that it is the storage type's responsibility to ensure uniqueness of the atoms present.

          2. storage_find_argmin_vertex(storage, direction, lazy_threshold) -> (found, vertex)

          returning whether a vertex with sufficient progress was found and the vertex. It is up to the storage to remove vertices (or not) when they have been picked up.

          source
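
          A hedged sketch of a storage type implementing these two operations; the deduplication strategy and the progress test are illustrative:

          using FrankWolfe
          using LinearAlgebra

          struct SimpleVertexStorage{AT}
              vertices::Vector{AT}
          end

          function Base.push!(storage::SimpleVertexStorage, atom)
              atom in storage.vertices || push!(storage.vertices, atom)   # keep atoms unique
              return storage
          end

          function FrankWolfe.storage_find_argmin_vertex(storage::SimpleVertexStorage, direction, lazy_threshold)
              isempty(storage.vertices) && return (false, nothing)
              vals = [dot(direction, v) for v in storage.vertices]
              i = argmin(vals)
              return (vals[i] <= lazy_threshold, storage.vertices[i])     # found only if it makes enough progress
          end
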
          FrankWolfe.ExpMomentumIteratorType
          ExpMomentumIterator{T}

          Iterator for the momentum used in the variant of Stochastic Frank-Wolfe. Momentum coefficients are the values of the iterator: ρ_t = 1 - num / (offset + t)^exp

          The state corresponds to the iteration count.

          Source: Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization Aryan Mokhtari, Hamed Hassani, Amin Karbasi, JMLR 2020.

          source
          FrankWolfe.IncrementBatchIteratorType
          IncrementBatchIterator(starting_batch_size, max_batch_size, [increment = 1])

          Batch size starting at starting_batch_size and incrementing by increment at every iteration until reaching max_batch_size.

          source
          FrankWolfe.batchsize_iterateFunction
          batchsize_iterate(iter::BatchSizeIterator) -> b

          Method to implement for a batch size iterator of type BatchSizeIterator. Calling batchsize_iterate returns the next batch size and typically updates the internal state of iter.

          source
          FrankWolfe.momentum_iterateFunction
          momentum_iterate(iter::MomentumIterator) -> ρ

          Method to implement for a type MomentumIterator. Returns the next momentum value ρ and updates the iterator internal state.

          source
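
          For instance, a constant-momentum iterator could be sketched as follows, assuming the MomentumIterator supertype named in the docstring above:

          using FrankWolfe

          struct FixedMomentum <: FrankWolfe.MomentumIterator
              ρ::Float64
          end

          FrankWolfe.momentum_iterate(iter::FixedMomentum) = iter.ρ   # stateless: always returns the same value
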
          FrankWolfe.muladd_memory_modeMethod
          muladd_memory_mode(memory_mode::MemoryEmphasis, storage, x, gamma::Real, d)

          Performs storage = x - gamma * d in-place or not depending on MemoryEmphasis

          source
          FrankWolfe.trajectory_callbackMethod
          trajectory_callback(storage)

          Callback pushing the state at each iteration to the passed storage. The stored data consists of only the first 5 fields of the state, usually (t, primal, dual, dual_gap, time).

          source
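
          Typical usage is to create the callback over an array and pass it to an algorithm run through its callback keyword argument:

          using FrankWolfe

          traj_states = []
          callback = FrankWolfe.trajectory_callback(traj_states)

          # then, e.g.: FrankWolfe.frank_wolfe(f, grad!, lmo, x0; callback=callback)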

          Oracle counting trackers

          The following structures are wrapping given oracles to behave similarly but additionally track the number of calls.

          Also see the example Tracking, counters and custom callbacks for Frank Wolfe.

          Update order for block-coordinate methods

          Block-coordinate methods can be run with different update orders. All update orders are subtypes of FrankWolfe.BlockCoordinateUpdateOrder. They have to implement the method FrankWolfe.select_update_indices which selects which blocks to update in what order.

          FrankWolfe.BlockCoordinateUpdateOrderType

          Update order for a block-coordinate method. A BlockCoordinateUpdateOrder must implement

          select_update_indices(::BlockCoordinateUpdateOrder, s::CallbackState, dual_gaps)
          source
          FrankWolfe.select_update_indicesFunction
          select_update_indices(::BlockCoordinateUpdateOrder, s::CallbackState, dual_gaps)

          Returns a list of lists of block indices. Each sublist represents one round of updates in an iteration. The indices in a sublist indicate which blocks should be updated in parallel in one round. For example, a full update is given by [1:l] and a blockwise update by [[i] for i=1:l], where l is the number of blocks.

          source
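
          As a sketch, a custom order updating all blocks one at a time, last block first, could look like this (blockwise rounds as in the example above):

          using FrankWolfe

          struct ReverseCyclicUpdate <: FrankWolfe.BlockCoordinateUpdateOrder end

          function FrankWolfe.select_update_indices(::ReverseCyclicUpdate, s::FrankWolfe.CallbackState, dual_gaps)
              l = length(dual_gaps)
              return [[i] for i in l:-1:1]   # one block per round, reverse order
          end
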
          FrankWolfe.CyclicUpdateType

          The cyclic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is determined by the given order of the LMOs.

          source
          FrankWolfe.StochasticUpdateType

          The stochastic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is random.

          source

          Update step for block-coordinate Frank-Wolfe

          Block-coordinate Frank-Wolfe (BCFW) can run different FW algorithms on different blocks. All update steps are subtypes of FrankWolfe.UpdateStep and implement FrankWolfe.update_iterate which defines one iteration of the corresponding method.

          FrankWolfe.UpdateStepType

          Update step for block-coordinate Frank-Wolfe. These are implementations of different FW-algorithms to be used in a blockwise manner. Each update step must implement

          update_iterate(
          +Utilities and data structures · FrankWolfe.jl

          Utilities and data structures

          Active set

          FrankWolfe.AbstractActiveSetType
          AbstractActiveSet{AT, R, IT}

          Abstract type for an active set of atoms of type AT with weights of type R and iterate of type IT. An active set is typically expected to have a field weights, a field atoms, and a field x. Otherwise, all active set methods from src/active_set.jl can be overwritten.

          source
          FrankWolfe.ActiveSetType
          ActiveSet{AT, R, IT}

          Represents an active set of extreme vertices collected in a FW algorithm, along with their coefficients (λ_i, a_i). R is the type of the λ_i, AT is the type of the atoms a_i. The iterate x = ∑λ_i a_i is stored in x with type IT.

          source
          Base.copyMethod

          Copies an active set, the weight and atom vectors and the iterate. Individual atoms are not copied.

          source
          FrankWolfe.active_set_argminMethod
          active_set_argmin(active_set::AbstractActiveSet, direction)

          Computes the linear minimizer in the direction on the active set. Returns (λ_i, a_i, i)

          source
          FrankWolfe.active_set_argminmaxMethod
          active_set_argminmax(active_set::AbstractActiveSet, direction)

          Computes the linear minimizer in the direction on the active set. Returns (λ_min, a_min, i_min, val_min, λ_max, a_max, i_max, val_max, val_max-val_min ≥ Φ)

          source
          FrankWolfe.active_set_update!Method
          active_set_update!(active_set::AbstractActiveSet, lambda, atom)

          Adds the atom to the active set with weight lambda or adds lambda to existing atom.

          source
          FrankWolfe.compute_active_set_iterate!Method
          compute_active_set_iterate!(active_set::AbstractActiveSet) -> x

          Recomputes from scratch the iterate x from the current weights and vertices of the active set. Returns the iterate x.

          source

          Functions and gradients

          FrankWolfe.ObjectiveFunctionType
          ObjectiveFunction

          Represents an objective function optimized by algorithms. Subtypes of ObjectiveFunction must implement at least

          • compute_value(::ObjectiveFunction, x) for primal value evaluation
          • compute_gradient(::ObjectiveFunction, x) for gradient evaluation.

          and optionally compute_value_gradient(::ObjectiveFunction, x) returning the (primal, gradient) pair. compute_gradient may always use the same storage and return a reference to it.

          source
          FrankWolfe.SimpleFunctionObjectiveType
          SimpleFunctionObjective{F,G,S}

          An objective function built from separate primal objective f(x) and in-place gradient function grad!(storage, x). It keeps an internal storage of type s used to evaluate the gradient in-place.

          source
          FrankWolfe.StochasticObjectiveType
          StochasticObjective{F, G, XT, S}(f::F, grad!::G, xs::XT, storage::S)

          Represents a composite function evaluated with stochastic gradient. f(θ, x) evaluates the loss for a single data point x and parameter θ. grad!(storage, θ, x) adds to storage the partial gradient with respect to data point x at parameter θ. xs must be an indexable iterable (Vector{Vector{Float64}} for instance). Functions using a StochasticObjective have optional keyword arguments rng, batch_size and full_evaluation controlling whether the function should be evaluated over all data points.

          Note: grad! must not reset the storage to 0 before adding to it.

          source
          FrankWolfe.compute_gradientFunction
          compute_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes the gradient of f at x. May return a reference to an internal storage.

          source
          FrankWolfe.compute_value_gradientMethod
          compute_value_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes in one call the pair (value, gradient) evaluated at x. By default, calls compute_value and compute_gradient with keywords kwargs passed down to both.

          source

          Callbacks

          Custom vertex storage

          Custom extreme point types

          For some feasible sets, the extreme points of the feasible set returned by the LMO possess a specific structure that can be represented in an efficient manner both for storage and for common operations like scaling and addition with an iterate. They are presented below:

          Utils

          FrankWolfe.DeletedVertexStorageType

          Vertex storage to store dropped vertices or find a suitable direction in lazy settings. The algorithm will look for at most return_kth suitable atoms before returning the best. See Extra-lazification with a vertex storage for usage.

          A vertex storage can be any type that implements two operations:

          1. Base.push!(storage, atom) to add an atom to the storage.

          Note that it is the storage type responsibility to ensure uniqueness of the atoms present.

          1. storage_find_argmin_vertex(storage, direction, lazy_threshold) -> (found, vertex)

          returning whether a vertex with sufficient progress was found and the vertex. It is up to the storage to remove vertices (or not) when they have been picked up.

          source
          FrankWolfe.ExpMomentumIteratorType
          ExpMomentumIterator{T}

          Iterator for the momentum used in the variant of Stochastic Frank-Wolfe. Momentum coefficients are the values of the iterator: ρ_t = 1 - num / (offset + t)^exp

          The state corresponds to the iteration count.

          Source: Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization Aryan Mokhtari, Hamed Hassani, Amin Karbasi, JMLR 2020.

          source
          FrankWolfe.IncrementBatchIteratorType
          IncrementBatchIterator(starting_batch_size, max_batch_size, [increment = 1])

          Batch size starting at startingbatchsize and incrementing by increment at every iteration.

          source
          FrankWolfe.batchsize_iterateFunction
          batchsize_iterate(iter::BatchSizeIterator) -> b

Method to implement for a batch size iterator of type BatchSizeIterator. Calling batchsize_iterate returns the next batch size and typically updates the internal state of iter.

          source
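For instance, a hypothetical doubling schedule, assuming BatchSizeIterator is the abstract type referenced above:

using FrankWolfe

mutable struct DoublingBatchIterator <: FrankWolfe.BatchSizeIterator
    batch_size::Int
    max_batch_size::Int
end

function FrankWolfe.batchsize_iterate(iter::DoublingBatchIterator)
    b = iter.batch_size
    iter.batch_size = min(2b, iter.max_batch_size)  # update the internal state
    return b
end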
          FrankWolfe.momentum_iterateFunction
          momentum_iterate(iter::MomentumIterator) -> ρ

          Method to implement for a type MomentumIterator. Returns the next momentum value ρ and updates the iterator internal state.

          source
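A minimal sketch, assuming MomentumIterator is the abstract type referenced above; the constant rule is purely illustrative:

using FrankWolfe

struct ConstantMomentum <: FrankWolfe.MomentumIterator
    ρ::Float64
end

# a constant rule needs no internal state update
FrankWolfe.momentum_iterate(iter::ConstantMomentum) = iter.ρ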
          FrankWolfe.muladd_memory_modeMethod
muladd_memory_mode(memory_mode::MemoryEmphasis, storage, x, gamma::Real, d)

Performs storage = x - gamma * d, in-place or not depending on the MemoryEmphasis.

          source
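For example, with the in-place memory emphasis:

using FrankWolfe

x = rand(5); d = rand(5); storage = similar(x)
# overwrites storage with x - 0.5 * d without allocating a new vector
FrankWolfe.muladd_memory_mode(FrankWolfe.InplaceEmphasis(), storage, x, 0.5, d)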
          FrankWolfe.trajectory_callbackMethod
          trajectory_callback(storage)

Callback pushing the state at each iteration to the passed storage. The pushed state data consists only of the first 5 fields, usually (t, primal, dual, dual_gap, time).

          source
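Typical usage is to collect the states of a run; the solver call is commented out and assumes f, grad!, lmo, and x0 are defined:

using FrankWolfe

traj_data = []  # receives one (t, primal, dual, dual_gap, time) state per iteration
callback = FrankWolfe.trajectory_callback(traj_data)
# FrankWolfe.frank_wolfe(f, grad!, lmo, x0; callback=callback)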

          Oracle counting trackers

The following structures wrap given oracles so that they behave like the original ones while additionally tracking the number of calls.

          Also see the example Tracking, counters and custom callbacks for Frank Wolfe.

          Update order for block-coordinate methods

          Block-coordinate methods can be run with different update orders. All update orders are subtypes of FrankWolfe.BlockCoordinateUpdateOrder. They have to implement the method FrankWolfe.select_update_indices which selects which blocks to update in what order.

          FrankWolfe.BlockCoordinateUpdateOrderType

          Update order for a block-coordinate method. A BlockCoordinateUpdateOrder must implement

          select_update_indices(::BlockCoordinateUpdateOrder, s::CallbackState, dual_gaps)
          source
          FrankWolfe.select_update_indicesFunction
          select_update_indices(::BlockCoordinateUpdateOrder, s::CallbackState, dual_gaps)

Returns a list of lists of the block indices. Each sublist represents one round of updates in an iteration. The indices in a list show which blocks should be updated in parallel in one round. For example, a full update is given by [1:l] and a blockwise update by [[i] for i=1:l], where l is the number of blocks.

          source
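As an illustration, a hypothetical order updating the blocks one by one in reverse, assuming dual_gaps holds one value per block:

using FrankWolfe

struct ReverseCyclicOrder <: FrankWolfe.BlockCoordinateUpdateOrder end

function FrankWolfe.select_update_indices(::ReverseCyclicOrder, s::FrankWolfe.CallbackState, dual_gaps)
    # one round per block, last block first
    return [[i] for i in length(dual_gaps):-1:1]
end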
          FrankWolfe.CyclicUpdateType

          The cyclic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is determined by the given order of the LMOs.

          source
          FrankWolfe.StochasticUpdateType

The stochastic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is random.

          source

          Update step for block-coordinate Frank-Wolfe

          Block-coordinate Frank-Wolfe (BCFW) can run different FW algorithms on different blocks. All update steps are subtypes of FrankWolfe.UpdateStep and implement FrankWolfe.update_iterate which defines one iteration of the corresponding method.

          FrankWolfe.UpdateStepType

          Update step for block-coordinate Frank-Wolfe. These are implementations of different FW-algorithms to be used in a blockwise manner. Each update step must implement

          update_iterate(
               step::UpdateStep,
               x,
               lmo,
     ...
     linesearch_workspace,
     memory_mode,
     epsilon,
)
          source
          FrankWolfe.update_iterateFunction
          update_iterate(
               step::UpdateStep,
               x,
               lmo,
     ...
     linesearch_workspace,
     memory_mode,
     epsilon,
)

          Executes one iteration of the defined FrankWolfe.UpdateStep and updates the iterate x implicitly. The function returns a tuple (dual_gap, v, d, gamma, tt):

          • dual_gap is the updated FrankWolfe gap
          • v is the used vertex
          • d is the update direction
          • gamma is the applied step-size
          • tt is the applied step-type
          source
          FrankWolfe.BPCGStepType

          Implementation of the blended pairwise conditional gradient (BPCG) method as an update step for block-coordinate Frank-Wolfe.

          source
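A sketch of mixing update steps in a two-block problem; the calls are commented out and assume f, grad!, lmo1, lmo2, and x0 are defined, and that the update_step keyword accepts one step per block:

using FrankWolfe

# steps = (FrankWolfe.FrankWolfeStep(), FrankWolfe.BPCGStep())
# FrankWolfe.block_coordinate_frank_wolfe(
#     f, grad!, FrankWolfe.ProductLMO(lmo1, lmo2), x0;
#     update_step=steps,
# )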

          Block vector

          FrankWolfe.BlockVectorType
          BlockVector{T, MT <: AbstractArray{T}, ST <: Tuple} <: AbstractVector{T}

          Represents a vector consisting of blocks. T is the element type of the vector, MT is the type of the underlying data array, and ST is the type of the tuple representing the sizes of each block. Each block can be accessed with the blocks field, and the sizes of the blocks are stored in the block_sizes field.

          source
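A small sketch; the constructor from a plain vector of blocks is an assumption:

using FrankWolfe

b = FrankWolfe.BlockVector([rand(2, 2), rand(3)])
b.blocks[1]    # the 2×2 first block
b.block_sizes  # sizes of the individual blocks
length(b)      # 7, since b behaves like one flat AbstractVector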

          Index


          Line search and step size settings

          The step size dictates how far one traverses along a local descent direction. More specifically, the step size $\gamma_t$ is used at each iteration to determine how much the next iterate moves towards the new vertex:

          \[x_{t+1} = x_t - \gamma_t (x_t - v_t).\]

$\gamma_t = 1$ implies that the next iterate is exactly the vertex, while $\gamma_t = 0$ implies that the iterate does not move.

          The following are step size selection rules for Frank Wolfe algorithms. Some methodologies (e.g. FixedStep and Agnostic) depend only on the iteration number and induce series $\gamma_t$ that are independent of the problem data, while others (e.g. GoldenSearch and Adaptive) change according to local information about the function; the adaptive methods often require extra function and/or gradient computations. The typical options for convex optimization are Agnostic or Adaptive.

          All step size computation strategies are subtypes of FrankWolfe.LineSearchMethod. The key method they have to implement is FrankWolfe.perform_line_search which is called at every iteration to compute the step size gamma.

          FrankWolfe.LineSearchMethodType

          Line search method to apply once the direction is computed. A LineSearchMethod must implement

          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          with d = x - v. It may also implement build_linesearch_workspace(x, gradient) which creates a workspace structure that is passed as last argument to perform_line_search.

          source
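A minimal sketch of a custom method, a fixed step clipped to gamma_max (purely illustrative):

using FrankWolfe

struct ClippedFixedStep <: FrankWolfe.LineSearchMethod
    gamma::Float64
end

function FrankWolfe.perform_line_search(
    ls::ClippedFixedStep, t, f, grad!, gradient, x, d, gamma_max, workspace,
)
    return min(ls.gamma, gamma_max)  # never exceed the maximal feasible step
end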
          FrankWolfe.perform_line_searchFunction
          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          Returns the step size gamma for step size strategy ls.

          source
          FrankWolfe.AdaptiveType

          Modified adaptive line search test from:

S. Pokutta "The Frank-Wolfe algorithm: a short introduction" (2023), preprint, https://arxiv.org/abs/2311.05313

          It replaces the original test implemented in the AdaptiveZerothOrder line search based on:

          Pedregosa, F., Negiar, G., Askari, A., and Jaggi, M. (2020). "Linearly convergent Frank–Wolfe with backtracking line-search", Proceedings of AISTATS.

          source
          FrankWolfe.AdaptiveZerothOrderType

          Slight modification of the Adaptive Step Size strategy from Pedregosa, Negiar, Askari, Jaggi (2018)

\[ f(x_t - \gamma_t (x_t - v_t)) - f(x_t) \leq - \alpha \gamma_t \langle \nabla f(x_t), x_t - v_t \rangle + \alpha^2 \frac{\gamma_t^2 \|x_t - v_t\|^2}{2} M ~.\]

The parameter alpha ∈ (0,1] relaxes the original smoothness condition to mitigate issues with numerical errors. Its default value is 0.5. The struct keeps track of the Lipschitz constant estimate L_est. The keyword argument relaxed_smoothness allows testing with an alternative smoothness condition,

\[ \langle \nabla f(x_t - \gamma_t (x_t - v_t)) - \nabla f(x_t), x_t - v_t \rangle \leq \gamma_t M \|x_t - v_t\|^2 ~.\]

          This condition yields potentially smaller and more stable estimations of the Lipschitz constant while being more computationally expensive due to the additional gradient computation.

It is also the fallback when the Lipschitz constant estimation fails due to numerical errors. perform_line_search also has a should_upgrade keyword argument controlling whether the computation should be temporarily upgraded to BigFloat for extended precision.

          source
          FrankWolfe.AgnosticType

          Computes step size: l/(l + t) at iteration t, given l > 0.

Using l > 2 leads to faster convergence rates than l = 2 over strongly convex and some uniformly convex sets.

          Accelerated Affine-Invariant Convergence Rates of the Frank-Wolfe Algorithm with Open-Loop Step-Sizes, Wirth, Peña, Pokutta (2023), https://arxiv.org/abs/2310.04096

          See also the paper that introduced the study of open-loop step-sizes with l > 2:

          Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta, (2023), https://arxiv.org/abs/2205.12838

Fixing l = -1 results in the step size gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1)):

S. Pokutta "The Frank-Wolfe algorithm: a short introduction" (2023), https://arxiv.org/abs/2311.05313

          source
          FrankWolfe.GeneralizedAgnosticType

          Computes step size: g(t)/(t + g(t)) at iteration t, given g: R_{>= 0} -> R_{>= 0}.

          Defaults to the best open-loop step-size gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))

S. Pokutta "The Frank-Wolfe algorithm: a short introduction" (2023), https://arxiv.org/abs/2311.05313

This step-size is as fast as the step-size gamma_t = 2 / (t + 2) up to polylogarithmic factors. Further, over strongly convex and some uniformly convex sets, it is faster than any traditional step-size gamma_t = l / (t + l) for any l in N.

          source
          FrankWolfe.MonotonicNonConvexStepSizeType
          MonotonicNonConvexStepSize{F}

Represents a monotonic open-loop non-convex step size. Contains a halving factor N increased at each iteration until there is primal progress. The step size is gamma = 1 / sqrt(t + 1) * 2^(-N).

          source
          FrankWolfe.MonotonicStepSizeType
          MonotonicStepSize{F}

Represents a monotonic open-loop step size. Contains a halving factor N increased at each iteration until there is primal progress. The step size is gamma = 2 / (t + 2) * 2^(-N).

          source
          FrankWolfe.ShortstepType

          Computes the 'Short step' step size: dual_gap / (L * norm(x - v)^2), where L is the Lipschitz constant of the gradient, x is the current iterate, and v is the current Frank-Wolfe vertex.

          source
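A usage sketch; the solver calls are commented out and assume f, grad!, lmo, and x0 are defined. Shortstep takes the Lipschitz constant of the gradient as argument:

using FrankWolfe

# FrankWolfe.frank_wolfe(f, grad!, lmo, x0; line_search=FrankWolfe.Adaptive())
# FrankWolfe.frank_wolfe(f, grad!, lmo, x0; line_search=FrankWolfe.Shortstep(2.0))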

See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size and Carderera, Besançon, Pokutta (2021) for the monotonic step size.

          Index
