From 1102ae6a0cf933e9d0020133d638157486e6cbad Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Wed, 23 Oct 2024 14:43:23 +0000
Subject: [PATCH] build based on fa0ab28

---
 previews/PR513/advanced/index.html | 2 +-
 previews/PR513/basics/index.html | 2 +-
 previews/PR513/contributing/index.html | 2 +-
 .../examples/docs_00_fw_visualized/index.html | 418 ++++++++--------
 .../examples/docs_01_mathopt_lmo/index.html | 328 ++++++------
 .../docs_02_polynomial_regression/index.html | 466 +++++++++---------
 .../docs_03_matrix_completion/index.html | 434 ++++++++--------
 .../examples/docs_04_rational_opt/index.html | 46 +-
 .../examples/docs_05_blended_cg/index.html | 154 +++---
 .../examples/docs_06_spectrahedron/index.html | 164 +++---
 .../docs_07_shifted_norm_polytopes/index.html | 162 +++---
 .../docs_08_callback_and_tracking/index.html | 2 +-
 .../docs_09_extra_vertex_storage/index.html | 2 +-
 .../docs_10_alternating_methods/index.html | 208 ++++----
 .../docs_11_block_coordinate_fw/index.html | 462 ++++++++---------
 .../docs_12_quadratic_symmetric/index.html | 228 ++++-----
 previews/PR513/index.html | 2 +-
 .../PR513/reference/0_reference/index.html | 2 +-
 .../PR513/reference/1_algorithms/index.html | 2 +-
 previews/PR513/reference/2_lmo/index.html | 2 +-
 previews/PR513/reference/3_backend/index.html | 6 +-
 .../PR513/reference/4_linesearch/index.html | 2 +-
 previews/PR513/search/index.html | 2 +-
 23 files changed, 1549 insertions(+), 1549 deletions(-)

diff --git a/previews/PR513/advanced/index.html b/previews/PR513/advanced/index.html
index b089db467..108b8c0af 100644
--- a/previews/PR513/advanced/index.html
+++ b/previews/PR513/advanced/index.html
@@ -76,4 +76,4 @@
 Base.:-(x1::IT, x2::IT)
 LinearAlgebra.dot(x1::IT, x2::IT)
 LinearAlgebra.norm(::IT)

For methods using a FrankWolfe.ActiveSet, the atoms or individual extreme points of the feasible region are not necessarily of the same type as the iterate. They are assumed to be immutable and must implement LinearAlgebra.dot with a gradient object. See for example FrankWolfe.RankOneMatrix or FrankWolfe.ScaledHotVector.

The iterate type IT must be a broadcastable mutable object or implement FrankWolfe.compute_active_set_iterate!:

FrankWolfe.compute_active_set_iterate!(active_set::FrankWolfe.ActiveSet{AT, R, IT}) where {AT, R}

which recomputes the iterate from the current convex decomposition, as well as the following methods FrankWolfe.active_set_update_scale! and FrankWolfe.active_set_update_iterate_pairwise!:

FrankWolfe.active_set_update_scale!(x::IT, lambda, atom)
-FrankWolfe.active_set_update_iterate_pairwise!(x::IT, lambda, fw_atom, away_atom)

Symmetry reduction

Example: examples/reynolds.jl

Suppose that there is a group $G$ acting on the underlying vector space and such that for all $x\in\mathcal{C}$ and $g\in G$

\[f(g\cdot x)=f(x)\quad\text{and}\quad g\cdot x\in\mathcal{C}.\]

Then, the computations can be performed in the subspace invariant under $G$. This subspace is the image of the Reynolds operator defined by

\[\mathcal{R}(x)=\frac{1}{|G|}\sum_{g\in G}g\cdot x.\]

In practice, the type SymmetricLMO allows the user to provide the Reynolds operator $\mathcal{R}$ as well as its adjoint $\mathcal{R}^\ast$. The gradient is symmetrised with $\mathcal{R}^\ast$, then passed to the non-symmetric LMO, and the resulting output is symmetrised with $\mathcal{R}$. In many cases, the gradient is already symmetric so that reynolds_adjoint(gradient, lmo) = gradient is a fast and valid choice.

+FrankWolfe.active_set_update_iterate_pairwise!(x::IT, lambda, fw_atom, away_atom)

Symmetry reduction

Example: examples/reynolds.jl

Suppose that there is a group $G$ acting on the underlying vector space and such that for all $x\in\mathcal{C}$ and $g\in G$

\[f(g\cdot x)=f(x)\quad\text{and}\quad g\cdot x\in\mathcal{C}.\]

Then, the computations can be performed in the subspace invariant under $G$. This subspace is the image of the Reynolds operator defined by

\[\mathcal{R}(x)=\frac{1}{|G|}\sum_{g\in G}g\cdot x.\]

In practice, the type SymmetricLMO allows the user to provide the Reynolds operator $\mathcal{R}$ as well as its adjoint $\mathcal{R}^\ast$. The gradient is symmetrised with $\mathcal{R}^\ast$, then passed to the non-symmetric LMO, and the resulting output is symmetrised with $\mathcal{R}$. In many cases, the gradient is already symmetric so that reynolds_adjoint(gradient, lmo) = gradient is a fast and valid choice.
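To make the Reynolds operator concrete, here is a minimal sketch that does not use the package at all. It assumes, purely for illustration, that $G$ is the cyclic group acting on $\mathbb{R}^n$ by shifting coordinates and that $f(x)=\|x\|^2$; the names `reynolds` and `f` are hypothetical helpers, not part of the FrankWolfe.jl API.

```julia
using LinearAlgebra

# Illustrative group (an assumption): the cyclic group C_n acting on R^n
# by shifting coordinates. The Reynolds operator averages x over all shifts.
reynolds(x) = sum(circshift(x, k) for k in 0:length(x)-1) / length(x)

# An objective that is invariant under coordinate shifts.
f(x) = sum(abs2, x)

x = [1 // 1, 2 // 1, 6 // 1]   # rationals keep the checks below exact
y = [1 // 1, 0 // 1, 0 // 1]
rx = reynolds(x)

@assert f(circshift(x, 1)) == f(x)          # f(g . x) = f(x)
@assert circshift(rx, 1) == rx              # the image is invariant under G
@assert reynolds(rx) == rx                  # the operator is a projection
@assert dot(rx, y) == dot(x, reynolds(y))   # self-adjoint for this action
```

For a group acting by permutations, as here, the operator coincides with its adjoint, so the same function could serve for both roles in this toy setting.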

diff --git a/previews/PR513/basics/index.html b/previews/PR513/basics/index.html
index 247280f39..e255c8e00 100644
--- a/previews/PR513/basics/index.html
+++ b/previews/PR513/basics/index.html
@@ -1,2 +1,2 @@
-How does it work? · FrankWolfe.jl

How does it work?

FrankWolfe.jl contains generic routines to solve optimization problems of the form

\[\min_{x \in \mathcal{C}} f(x)\]

where $\mathcal{C}$ is a compact convex set and $f$ is a differentiable function. These routines work by solving a sequence of linear subproblems:

\[\min_{x \in \mathcal{C}} \langle d_k, x \rangle \quad \text{where} \quad d_k = \nabla f(x_k)\]

Linear Minimization Oracles

The Linear Minimization Oracle (LMO) is a key component, which is called at each iteration of the FW algorithm. Given a direction $d$, it returns an optimal vertex of the feasible set:

\[v \in \arg \min_{x\in \mathcal{C}} \langle d,x \rangle.\]

Custom LMOs

To be used by the algorithms provided here, an LMO must be a subtype of FrankWolfe.LinearMinimizationOracle and implement the following method:

compute_extreme_point(lmo::LMO, direction; kwargs...) -> v

This method should minimize $v \mapsto \langle d, v \rangle$ over the set $\mathcal{C}$ defined by the LMO. Note that this means the set $\mathcal{C}$ doesn't have to be represented explicitly: all we need is to be able to minimize a linear function over it, even if the minimization procedure is a black box.

Pre-defined LMOs

If you don't want to define your LMO manually, several common implementations are available out-of-the-box:

  • Simplices: unit simplex, probability simplex
  • Balls in various norms
  • Polytopes: K-sparse, Birkhoff

You can use an oracle defined via a Linear Programming solver (e.g. SCIP or HiGHS) with MathOptInterface: see FrankWolfe.MathOptLMO.

Finally, we provide wrappers to combine oracles easily, for example in a product.

See Combettes, Pokutta (2021) for references on most LMOs implemented in the package and their comparison with projection operators.

Optimization algorithms

The package features several variants of Frank-Wolfe that share the same basic API.

Most of the algorithms listed below also have a lazified version: see Braun, Pokutta, Zink (2016).

Standard Frank-Wolfe (FW)

It is implemented in the frank_wolfe function.

See Jaggi (2013) for an overview.

This algorithm works both for convex and non-convex functions (use step size rule FrankWolfe.Nonconvex() in the second case).

Away-step Frank-Wolfe (AFW)

It is implemented in the away_frank_wolfe function.

See Lacoste-Julien, Jaggi (2015) for an overview.

Stochastic Frank-Wolfe (SFW)

It is implemented in the FrankWolfe.stochastic_frank_wolfe function.

Blended Conditional Gradients (BCG)

It is implemented in the blended_conditional_gradient function, with a built-in stability feature that temporarily increases accuracy.

See Braun, Pokutta, Tu, Wright (2018).

Pairwise Frank-Wolfe (PFW)

It is implemented in the pairwise_frank_wolfe function. See Lacoste-Julien, Jaggi (2015) for an overview.

Blended Pairwise Conditional Gradients (BPCG)

It is implemented in the FrankWolfe.blended_pairwise_conditional_gradient function, with a minor modification to improve sparsity.

See Tsuji, Tanaka, Pokutta (2021).

Comparison

The following table compares the characteristics of the algorithms presented in the package:

Algorithm  Progress/Iteration  Time/Iteration  Sparsity  Numerical Stability  Active Set  Lazifiable
FW         Low                 Low             Low       High                 No          Yes
AFW        Medium              Medium-High     Medium    Medium-High          Yes         Yes
B(P)CG     High                Medium-High     High      Medium               Yes         By design
SFW        Low                 Low             Low       High                 No          No

While the standard Frank-Wolfe algorithm can only move towards extreme points of the compact convex set $\mathcal{C}$, Away-step Frank-Wolfe can move away from them. The following figure from our paper illustrates this behaviour:

FW vs AFW.

Both algorithms minimize a quadratic function (whose contour lines are depicted) over a simple polytope (the black square). When the minimizer lies on a face, the standard Frank-Wolfe algorithm zig-zags towards the solution, while its Away-step variant converges more quickly.

Block-Coordinate Frank-Wolfe (BCFW)

It is implemented in the FrankWolfe.block_coordinate_frank_wolfe function.

See Lacoste-Julien, Jaggi, Schmidt, Pletscher (2013) and Beck, Pauwels, Sabach (2015) for more details about different variants of Block-Coordinate Frank-Wolfe.

Alternating Linear Minimization (ALM)

It is implemented in the FrankWolfe.alternating_linear_minimization function.

+How does it work? · FrankWolfe.jl

How does it work?

FrankWolfe.jl contains generic routines to solve optimization problems of the form

\[\min_{x \in \mathcal{C}} f(x)\]

where $\mathcal{C}$ is a compact convex set and $f$ is a differentiable function. These routines work by solving a sequence of linear subproblems:

\[\min_{x \in \mathcal{C}} \langle d_k, x \rangle \quad \text{where} \quad d_k = \nabla f(x_k)\]
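The sequence of linear subproblems can be sketched as a short loop. This is only an illustration under stated assumptions, not the package's implementation: the feasible set is taken to be the probability simplex, the objective is $f(x)=\|x-p\|^2$, the step size is the classical $2/(k+2)$ rule, and `simplex_vertex` and `fw_sketch` are hypothetical names.

```julia
using LinearAlgebra

# Linear subproblem over the probability simplex (assumed feasible set):
# the minimizer of <d, x> is the unit vector at the smallest entry of d.
function simplex_vertex(d)
    v = zeros(length(d))
    v[argmin(d)] = 1.0
    return v
end

# Bare-bones Frank-Wolfe loop with the classical 2/(k+2) step size.
function fw_sketch(grad, x0; iterations=10_000)
    x = copy(x0)
    for k in 0:iterations-1
        d = grad(x)                  # d_k = gradient at the current iterate
        v = simplex_vertex(d)        # solve the linear subproblem
        gamma = 2 / (k + 2)
        x = (1 - gamma) .* x .+ gamma .* v   # convex combination stays feasible
    end
    return x
end

p = [0.2, 0.5, 0.3]                  # target point inside the simplex
x = fw_sketch(x -> 2 .* (x .- p), [1.0, 0.0, 0.0])
```

Each iterate is a convex combination of vertices, so feasibility is kept without any projection step.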

Linear Minimization Oracles

The Linear Minimization Oracle (LMO) is a key component, which is called at each iteration of the FW algorithm. Given a direction $d$, it returns an optimal vertex of the feasible set:

\[v \in \arg \min_{x\in \mathcal{C}} \langle d,x \rangle.\]

Custom LMOs

To be used by the algorithms provided here, an LMO must be a subtype of FrankWolfe.LinearMinimizationOracle and implement the following method:

compute_extreme_point(lmo::LMO, direction; kwargs...) -> v

This method should minimize $v \mapsto \langle d, v \rangle$ over the set $\mathcal{C}$ defined by the LMO. Note that this means the set $\mathcal{C}$ doesn't have to be represented explicitly: all we need is to be able to minimize a linear function over it, even if the minimization procedure is a black box.
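As a rough sketch of what such an oracle can look like, consider a box constraint. To keep the snippet runnable without the package, the abstract type below is a local stand-in (an assumption); in real use the struct would subtype FrankWolfe.LinearMinimizationOracle and the method would extend FrankWolfe.compute_extreme_point. `BoxLMO` is a hypothetical name.

```julia
# Local stand-in so the sketch runs on its own (assumption). With the package,
# use FrankWolfe.LinearMinimizationOracle and extend
# FrankWolfe.compute_extreme_point instead.
abstract type LinearMinimizationOracle end

# Hypothetical oracle for the box {x : lower .<= x .<= upper}.
struct BoxLMO <: LinearMinimizationOracle
    lower::Vector{Float64}
    upper::Vector{Float64}
end

# A linear function over a box is minimized coordinate by coordinate:
# take the lower bound where the direction is positive, the upper bound otherwise.
function compute_extreme_point(lmo::BoxLMO, direction; kwargs...)
    return [d > 0 ? l : u for (d, l, u) in zip(direction, lmo.lower, lmo.upper)]
end

lmo = BoxLMO([0.0, 0.0], [1.0, 2.0])
v = compute_extreme_point(lmo, [1.0, -1.0])
```

The box itself is never stored as a list of vertices; the oracle only needs the bounds, which is the point made above about implicit representations.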

Pre-defined LMOs

If you don't want to define your LMO manually, several common implementations are available out-of-the-box:

  • Simplices: unit simplex, probability simplex
  • Balls in various norms
  • Polytopes: K-sparse, Birkhoff

You can use an oracle defined via a Linear Programming solver (e.g. SCIP or HiGHS) with MathOptInterface: see FrankWolfe.MathOptLMO.

Finally, we provide wrappers to combine oracles easily, for example in a product.

See Combettes, Pokutta (2021) for references on most LMOs implemented in the package and their comparison with projection operators.

Optimization algorithms

The package features several variants of Frank-Wolfe that share the same basic API.

Most of the algorithms listed below also have a lazified version: see Braun, Pokutta, Zink (2016).

Standard Frank-Wolfe (FW)

It is implemented in the frank_wolfe function.

See Jaggi (2013) for an overview.

This algorithm works both for convex and non-convex functions (use step size rule FrankWolfe.Nonconvex() in the second case).

Away-step Frank-Wolfe (AFW)

It is implemented in the away_frank_wolfe function.

See Lacoste-Julien, Jaggi (2015) for an overview.

Stochastic Frank-Wolfe (SFW)

It is implemented in the FrankWolfe.stochastic_frank_wolfe function.

Blended Conditional Gradients (BCG)

It is implemented in the blended_conditional_gradient function, with a built-in stability feature that temporarily increases accuracy.

See Braun, Pokutta, Tu, Wright (2018).

Pairwise Frank-Wolfe (PFW)

It is implemented in the pairwise_frank_wolfe function. See Lacoste-Julien, Jaggi (2015) for an overview.

Blended Pairwise Conditional Gradients (BPCG)

It is implemented in the FrankWolfe.blended_pairwise_conditional_gradient function, with a minor modification to improve sparsity.

See Tsuji, Tanaka, Pokutta (2021).

Comparison

The following table compares the characteristics of the algorithms presented in the package:

Algorithm  Progress/Iteration  Time/Iteration  Sparsity  Numerical Stability  Active Set  Lazifiable
FW         Low                 Low             Low       High                 No          Yes
AFW        Medium              Medium-High     Medium    Medium-High          Yes         Yes
B(P)CG     High                Medium-High     High      Medium               Yes         By design
SFW        Low                 Low             Low       High                 No          No

While the standard Frank-Wolfe algorithm can only move towards extreme points of the compact convex set $\mathcal{C}$, Away-step Frank-Wolfe can move away from them. The following figure from our paper illustrates this behaviour:

FW vs AFW.

Both algorithms minimize a quadratic function (whose contour lines are depicted) over a simple polytope (the black square). When the minimizer lies on a face, the standard Frank-Wolfe algorithm zig-zags towards the solution, while its Away-step variant converges more quickly.

Block-Coordinate Frank-Wolfe (BCFW)

It is implemented in the FrankWolfe.block_coordinate_frank_wolfe function.

See Lacoste-Julien, Jaggi, Schmidt, Pletscher (2013) and Beck, Pauwels, Sabach (2015) for more details about different variants of Block-Coordinate Frank-Wolfe.

Alternating Linear Minimization (ALM)

It is implemented in the FrankWolfe.alternating_linear_minimization function.

diff --git a/previews/PR513/contributing/index.html b/previews/PR513/contributing/index.html
index 33f629f5c..adec4dd78 100644
--- a/previews/PR513/contributing/index.html
+++ b/previews/PR513/contributing/index.html
@@ -4,4 +4,4 @@
 """
 function f(x)
     # ...
-end

Provide a new example or test

If you fix a bug, you are typically expected to add a test validating that the bug is gone. Tests are added in a file in the test/ folder, for which the entry point is runtests.jl.

The examples/ folder features several examples covering different problem settings and algorithms. The examples are expected to run with the same environment and dependencies as the tests using TestEnv. If the example is lightweight enough, it can be added to the docs/src/examples/ folder which generates pages for the documentation based on Literate.jl.

Provide a new feature

Contributions bringing new features are also welcome. If the feature is likely to impact performance, some benchmarks should be run with BenchmarkTools on several of the examples to assert the effect at different problem sizes. If the feature should only be active in some cases, a keyword should be added to the main algorithms to support it.

Some typical features to implement are:

  1. A new Linear Minimization Oracle (LMO)
  2. A new step size
  3. A new algorithm (less frequent) following the same API.

Code style

We try to follow the Julia documentation guidelines. We run JuliaFormatter.jl on the repo in the way set in the .JuliaFormatter.toml file, which enforces a number of conventions.

This contribution guide was inspired by ColPrac and the one in Manopt.jl.

+end

Provide a new example or test

If you fix a bug, you are typically expected to add a test validating that the bug is gone. Tests are added in a file in the test/ folder, for which the entry point is runtests.jl.
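A regression test of this kind might look like the following sketch, written with the Test standard library. The function `simplex_vertex` is a hypothetical stand-in for whatever code the bug fix touches; it is not part of the package.

```julia
using Test
using LinearAlgebra

# Hypothetical function under test: vertex of the probability simplex
# minimizing the inner product with d.
function simplex_vertex(d)
    v = zeros(length(d))
    v[argmin(d)] = 1.0
    return v
end

@testset "simplex vertex minimizes the linear objective" begin
    d = [0.3, -1.2, 0.7]
    v = simplex_vertex(d)
    @test sum(v) == 1               # the result is a vertex of the simplex
    @test dot(d, v) == minimum(d)   # and it attains the minimum of <d, x>
end
```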

The examples/ folder features several examples covering different problem settings and algorithms. The examples are expected to run with the same environment and dependencies as the tests using TestEnv. If the example is lightweight enough, it can be added to the docs/src/examples/ folder which generates pages for the documentation based on Literate.jl.

Provide a new feature

Contributions bringing new features are also welcome. If the feature is likely to impact performance, some benchmarks should be run with BenchmarkTools on several of the examples to assert the effect at different problem sizes. If the feature should only be active in some cases, a keyword should be added to the main algorithms to support it.

Some typical features to implement are:

  1. A new Linear Minimization Oracle (LMO)
  2. A new step size
  3. A new algorithm (less frequent) following the same API.

Code style

We try to follow the Julia documentation guidelines. We run JuliaFormatter.jl on the repo in the way set in the .JuliaFormatter.toml file, which enforces a number of conventions.

This contribution guide was inspired by ColPrac and the one in Manopt.jl.

diff --git a/previews/PR513/examples/docs_00_fw_visualized/index.html b/previews/PR513/examples/docs_00_fw_visualized/index.html
index 75781e464..882a373de 100644
--- a/previews/PR513/examples/docs_00_fw_visualized/index.html
+++ b/previews/PR513/examples/docs_00_fw_visualized/index.html
@@ -122,123 +122,123 @@
 )
[SVG plot markup omitted]

plot chosen vertices

scatter!([vertices[1][1]], [vertices[1][2]], m=:diamond, markersize=6, color=colors[1], label="v_1")
 scatter!(
     [vertices[2][1]],
@@ -252,125 +252,125 @@
 )
[SVG plot markup omitted]
-This page was generated using Literate.jl.

+This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_01_mathopt_lmo/index.html b/previews/PR513/examples/docs_01_mathopt_lmo/index.html
index 782c782e5..e7f68d899 100644
--- a/previews/PR513/examples/docs_01_mathopt_lmo/index.html
+++ b/previews/PR513/examples/docs_01_mathopt_lmo/index.html
@@ -130,191 +130,191 @@
 )
[SVG plot markup omitted]
-This page was generated using Literate.jl.

+This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_02_polynomial_regression/index.html b/previews/PR513/examples/docs_02_polynomial_regression/index.html
index 8f162bfb5..6ce5a8603 100644
--- a/previews/PR513/examples/docs_02_polynomial_regression/index.html
+++ b/previews/PR513/examples/docs_02_polynomial_regression/index.html
@@ -246,260 +246,260 @@
 )
[SVG plot markup omitted]
-This page was generated using Literate.jl.

+This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_03_matrix_completion/index.html b/previews/PR513/examples/docs_03_matrix_completion/index.html
index 170ce71ef..934da48c3 100644
--- a/previews/PR513/examples/docs_03_matrix_completion/index.html
+++ b/previews/PR513/examples/docs_03_matrix_completion/index.html
@@ -265,244 +265,244 @@
 )
[SVG plot markup omitted]
-This page was generated using Literate.jl.

+This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_04_rational_opt/index.html b/previews/PR513/examples/docs_04_rational_opt/index.html
index 48eb32003..2e7f64bcb 100644
--- a/previews/PR513/examples/docs_04_rational_opt/index.html
+++ b/previews/PR513/examples/docs_04_rational_opt/index.html
@@ -34,17 +34,17 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   1.000000e+00  -1.000000e+00   2.000000e+00   0.000000e+00            Inf
-    FW            10   1.407407e-01  -1.407407e-01   2.814815e-01   4.793737e-01   2.086055e+01
-    FW            20   6.842105e-02  -6.842105e-02   1.368421e-01   4.810763e-01   4.157345e+01
-    FW            30   4.521073e-02  -4.521073e-02   9.042146e-02   4.827817e-01   6.213988e+01
-    FW            40   3.376068e-02  -3.376068e-02   6.752137e-02   4.847566e-01   8.251563e+01
-    FW            50   2.693878e-02  -2.693878e-02   5.387755e-02   4.867635e-01   1.027193e+02
-    FW            60   2.241055e-02  -2.241055e-02   4.482109e-02   4.888561e-01   1.227355e+02
-    FW            70   1.918565e-02  -1.918565e-02   3.837129e-02   4.912267e-01   1.425004e+02
-    FW            80   1.677215e-02  -1.677215e-02   3.354430e-02   4.934608e-01   1.621203e+02
-    FW            90   1.489804e-02  -1.489804e-02   2.979609e-02   4.959325e-01   1.814763e+02
-    FW           100   1.340067e-02  -1.340067e-02   2.680135e-02   4.985971e-01   2.005627e+02
-  Last           101   1.314422e-02  -1.236767e-02   2.551189e-02   4.995164e-01   2.021956e+02
+    FW            10   1.407407e-01  -1.407407e-01   2.814815e-01   4.516895e-01   2.213910e+01
+    FW            20   6.842105e-02  -6.842105e-02   1.368421e-01   4.533268e-01   4.411829e+01
+    FW            30   4.521073e-02  -4.521073e-02   9.042146e-02   4.548831e-01   6.595101e+01
+    FW            40   3.376068e-02  -3.376068e-02   6.752137e-02   4.565929e-01   8.760539e+01
+    FW            50   2.693878e-02  -2.693878e-02   5.387755e-02   4.583418e-01   1.090889e+02
+    FW            60   2.241055e-02  -2.241055e-02   4.482109e-02   4.602469e-01   1.303648e+02
+    FW            70   1.918565e-02  -1.918565e-02   3.837129e-02   4.623328e-01   1.514061e+02
+    FW            80   1.677215e-02  -1.677215e-02   3.354430e-02   4.644758e-01   1.722372e+02
+    FW            90   1.489804e-02  -1.489804e-02   2.979609e-02   4.668986e-01   1.927613e+02
+    FW           100   1.340067e-02  -1.340067e-02   2.680135e-02   4.695284e-01   2.129797e+02
+  Last           101   1.314422e-02  -1.236767e-02   2.551189e-02   4.704333e-01   2.146957e+02
 -------------------------------------------------------------------------------------------------

 Output type of solution: BigFloat

Another possible step-size rule is rationalshortstep which computes the step size by minimizing the smoothness inequality as $\gamma_t=\frac{\langle \nabla f(x_t),x_t-v_t\rangle}{2L||x_t-v_t||^2}$. However, as this step size depends on an upper bound on the Lipschitz constant $L$ as well as the inner product with the gradient $\nabla f(x_t)$, both have to be of a rational type.
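The requirement that both quantities be rational can be illustrated with a small type-propagation sketch. It evaluates the formula exactly as quoted above; the helper `short_step` is a hypothetical name, and the objective $f(x)=\|x\|^2$ with gradient $2x$ is an assumption made only for this illustration.

```julia
using LinearAlgebra

# Hypothetical helper: evaluates the step-size formula quoted above.
short_step(g, x, v, L) = dot(g, x - v) / (2 * L * dot(x - v, x - v))

n = 4
x = [big(1) // 1; zeros(Rational{BigInt}, n - 1)]   # current iterate, vertex e_1
v = [zeros(Rational{BigInt}, n - 1); big(1) // 1]   # vertex e_n returned by the LMO
g = 2 .* x                                          # gradient of the assumed objective

gamma_exact = short_step(g, x, v, 2 // 1)   # rational L: the result stays rational
gamma_float = short_step(g, x, v, 2.0)      # floating-point L: exactness is lost

x_next = x - gamma_exact * (x - v)          # still a vector of exact rationals
```

With a floating-point `L`, promotion turns the step size into a floating-point number, and every subsequent iterate would inherit that type.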

@time x, v, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(
@@ -67,16 +67,16 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   1.000000e+00  -1.000000e+00   2.000000e+00   0.000000e+00            Inf
-    FW            10   1.000000e-01  -1.000000e-01   2.000000e-01   4.110217e-01   2.432961e+01
-    FW            20   5.000000e-02  -5.000000e-02   1.000000e-01   4.129754e-01   4.842903e+01
-    FW            30   3.333333e-02  -3.333333e-02   6.666667e-02   4.143989e-01   7.239401e+01
-    FW            40   2.500000e-02  -2.500000e-02   5.000000e-02   4.161889e-01   9.611020e+01
-    FW            50   2.000000e-02  -2.000000e-02   4.000000e-02   4.182392e-01   1.195488e+02
-    FW            60   1.666667e-02  -1.666667e-02   3.333333e-02   4.206905e-01   1.426227e+02
-    FW            70   1.428571e-02  -1.428571e-02   2.857143e-02   4.233435e-01   1.653504e+02
-    FW            80   1.250000e-02  -1.250000e-02   2.500000e-02   4.261979e-01   1.877062e+02
-    FW            90   1.111111e-02  -1.111111e-02   2.222222e-02   4.293863e-01   2.096015e+02
-    FW           100   1.000000e-02   1.000000e-02   1.889162e-78   4.328681e-01   2.310173e+02
-  Last           100   1.000000e-02   1.000000e-02   2.159042e-78   4.336161e-01   2.306188e+02
+    FW            10   1.000000e-01  -1.000000e-01   2.000000e-01   3.892469e-01   2.569064e+01
+    FW            20   5.000000e-02  -5.000000e-02   1.000000e-01   3.908916e-01   5.116508e+01
+    FW            30   3.333333e-02  -3.333333e-02   6.666667e-02   3.922957e-01   7.647293e+01
+    FW            40   2.500000e-02  -2.500000e-02   5.000000e-02   3.940428e-01   1.015118e+02
+    FW            50   2.000000e-02  -2.000000e-02   4.000000e-02   3.960325e-01   1.262523e+02
+    FW            60   1.666667e-02  -1.666667e-02   3.333333e-02   3.983541e-01   1.506198e+02
+    FW            70   1.428571e-02  -1.428571e-02   2.857143e-02   4.009073e-01   1.746039e+02
+    FW            80   1.250000e-02  -1.250000e-02   2.500000e-02   4.037054e-01   1.981643e+02
+    FW            90   1.111111e-02  -1.111111e-02   2.222222e-02   4.068236e-01   2.212261e+02
+    FW           100   1.000000e-02   1.000000e-02   1.889162e-78   4.102987e-01   2.437249e+02
+  Last           100   1.000000e-02   1.000000e-02   2.159042e-78   4.110375e-01   2.432868e+02
 -------------------------------------------------------------------------------------------------
-  0.700205 seconds (1.63 M allocations: 92.355 MiB, 1.58% compilation time)

Note: at the last step, we exactly close the gap, finding the solution 1//n * ones(n)


This page was generated using Literate.jl.

+ 0.652839 seconds (1.63 M allocations: 92.354 MiB, 1.60% compilation time)

Note: at the last step, we exactly close the gap, finding the solution 1//n * ones(n)


This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_05_blended_cg/index.html b/previews/PR513/examples/docs_05_blended_cg/index.html
index e4ecf1a2e..dea7c2cc8 100644
--- a/previews/PR513/examples/docs_05_blended_cg/index.html
+++ b/previews/PR513/examples/docs_05_blended_cg/index.html
@@ -154,104 +154,104 @@
 plot_trajectories(data, label, xscalelog=true)
[SVG plot markup omitted]
-This page was generated using Literate.jl.

+This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_06_spectrahedron/index.html b/previews/PR513/examples/docs_06_spectrahedron/index.html
index 2a46e3093..67483529d 100644
--- a/previews/PR513/examples/docs_06_spectrahedron/index.html
+++ b/previews/PR513/examples/docs_06_spectrahedron/index.html
@@ -67,7 +67,7 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   1.018597e+00   1.014119e+00   4.477824e-03   0.000000e+00            Inf
-  Last            25   1.014314e+00   1.014314e+00   9.179324e-09   1.055601e+00   2.368320e+01
+  Last            25   1.014314e+00   1.014314e+00   9.179324e-09   9.996542e-01   2.500865e+01
 -------------------------------------------------------------------------------------------------

 Lazified Conditional Gradient (Frank-Wolfe + Lazification).
@@ -80,113 +80,113 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     Cache Size
 ----------------------------------------------------------------------------------------------------------------
      I             1   1.018597e+00   1.014119e+00   4.477824e-03   0.000000e+00            Inf              1
-    LD             2   1.014317e+00   1.014314e+00   3.630596e-06   6.273272e-01   3.188129e+00              2
-    LD             3   1.014315e+00   1.014314e+00   1.025225e-06   6.696799e-01   4.479752e+00              3
-    LD             4   1.014315e+00   1.014314e+00   5.032060e-07   7.122760e-01   5.615801e+00              4
-    LD             6   1.014314e+00   1.014314e+00   1.996252e-07   7.680653e-01   7.811836e+00              5
-    LD             9   1.014314e+00   1.014314e+00   8.299030e-08   8.394617e-01   1.072116e+01              6
-    LD            13   1.014314e+00   1.014314e+00   3.827847e-08   9.253858e-01   1.404819e+01              7
-    LD            19   1.014314e+00   1.014314e+00   1.745621e-08   1.046495e+00   1.815584e+01              8
-    LD            27   1.014314e+00   1.014314e+00   8.503621e-09   1.198615e+00   2.252601e+01              9
-  Last            27   1.014314e+00   1.014314e+00   7.896182e-09   1.282149e+00   2.105840e+01             10
+    LD             2   1.014317e+00   1.014314e+00   3.630596e-06   6.113811e-01   3.271282e+00              2
+    LD             3   1.014315e+00   1.014314e+00   1.025225e-06   6.540339e-01   4.586918e+00              3
+    LD             4   1.014315e+00   1.014314e+00   5.032060e-07   6.922260e-01   5.778459e+00              4
+    LD             6   1.014314e+00   1.014314e+00   1.996252e-07   7.449581e-01   8.054144e+00              5
+    LD             9   1.014314e+00   1.014314e+00   8.299030e-08   8.114365e-01   1.109144e+01              6
+    LD            13   1.014314e+00   1.014314e+00   3.827847e-08   8.934091e-01   1.455100e+01              7
+    LD            19   1.014314e+00   1.014314e+00   1.745621e-08   1.006580e+00   1.887580e+01              8
+    LD            27   1.014314e+00   1.014314e+00   8.503621e-09   1.155285e+00   2.337085e+01              9
+  Last            27   1.014314e+00   1.014314e+00   7.896182e-09   1.232215e+00   2.191176e+01             10
 ----------------------------------------------------------------------------------------------------------------

Plotting the resulting trajectories

data = [trajectory, trajectory_lazy]
 label = ["FW", "LCG"]
 plot_trajectories(data, label, xscalelog=true)
[SVG plot markup omitted]
-This page was generated using Literate.jl.

+This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_07_shifted_norm_polytopes/index.html b/previews/PR513/examples/docs_07_shifted_norm_polytopes/index.html
index fae25f877..501403785 100644
--- a/previews/PR513/examples/docs_07_shifted_norm_polytopes/index.html
+++ b/previews/PR513/examples/docs_07_shifted_norm_polytopes/index.html
@@ -67,27 +67,27 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   2.000000e+00  -6.000000e+00   8.000000e+00   0.000000e+00            Inf
-    FW            50   2.198243e-01   1.859119e-01   3.391239e-02   1.015024e-01   4.925991e+02
-    FW           100   2.104540e-01   1.927834e-01   1.767061e-02   1.017787e-01   9.825238e+02
-    FW           150   2.071345e-01   1.951277e-01   1.200679e-02   1.020459e-01   1.469926e+03
-    FW           200   2.054240e-01   1.963167e-01   9.107240e-03   1.023027e-01   1.954983e+03
-    FW           250   2.043783e-01   1.970372e-01   7.341168e-03   1.025589e-01   2.437623e+03
-    FW           300   2.036722e-01   1.975209e-01   6.151268e-03   1.028142e-01   2.917886e+03
-    FW           350   2.031630e-01   1.978684e-01   5.294582e-03   1.030667e-01   3.395860e+03
-    FW           400   2.027782e-01   1.981301e-01   4.648079e-03   1.033194e-01   3.871490e+03
-    FW           450   2.024772e-01   1.983344e-01   4.142727e-03   1.035715e-01   4.344823e+03
-    FW           500   2.022352e-01   1.984984e-01   3.736776e-03   1.038218e-01   4.815944e+03
-    FW           550   2.020364e-01   1.986329e-01   3.403479e-03   1.040738e-01   5.284711e+03
-    FW           600   2.018701e-01   1.987452e-01   3.124906e-03   1.043254e-01   5.751237e+03
-    FW           650   2.017290e-01   1.988404e-01   2.888583e-03   1.045763e-01   6.215556e+03
-    FW           700   2.016078e-01   1.989222e-01   2.685564e-03   1.048266e-01   6.677694e+03
-    FW           750   2.015024e-01   1.989932e-01   2.509264e-03   1.050761e-01   7.137681e+03
-    FW           800   2.014101e-01   1.990554e-01   2.354727e-03   1.053275e-01   7.595360e+03
-    FW           850   2.013284e-01   1.991103e-01   2.218154e-03   1.056031e-01   8.049008e+03
-    FW           900   2.012558e-01   1.991592e-01   2.096580e-03   1.058567e-01   8.502062e+03
-    FW           950   2.011906e-01   1.992030e-01   1.987662e-03   1.061109e-01   8.952898e+03
-    FW          1000   2.011319e-01   1.992424e-01   1.889519e-03   1.063658e-01   9.401518e+03
-  Last          1001   2.011297e-01   1.992439e-01   1.885794e-03   1.065131e-01   9.397905e+03
+    FW            50   2.198243e-01   1.859119e-01   3.391239e-02   9.696344e-02   5.156583e+02
+    FW           100   2.104540e-01   1.927834e-01   1.767061e-02   9.723661e-02   1.028419e+03
+    FW           150   2.071345e-01   1.951277e-01   1.200679e-02   9.749763e-02   1.538499e+03
+    FW           200   2.054240e-01   1.963167e-01   9.107240e-03   9.775243e-02   2.045985e+03
+    FW           250   2.043783e-01   1.970372e-01   7.341168e-03   9.800608e-02   2.550862e+03
+    FW           300   2.036722e-01   1.975209e-01   6.151268e-03   9.826108e-02   3.053091e+03
+    FW           350   2.031630e-01   1.978684e-01   5.294582e-03   9.851163e-02   3.552880e+03
+    FW           400   2.027782e-01   1.981301e-01   4.648079e-03   9.876137e-02   4.050167e+03
+    FW           450   2.024772e-01   1.983344e-01   4.142727e-03   9.901863e-02   4.544599e+03
+    FW           500   2.022352e-01   1.984984e-01   3.736776e-03   9.926868e-02   5.036835e+03
+    FW           550   2.020364e-01   1.986329e-01   3.403479e-03   9.951886e-02   5.526590e+03
+    FW           600   2.018701e-01   1.987452e-01   3.124906e-03   9.976657e-02   6.014039e+03
+    FW           650   2.017290e-01   1.988404e-01   2.888583e-03   1.000165e-01   6.498927e+03
+    FW           700   2.016078e-01   1.989222e-01   2.685564e-03   1.002641e-01   6.981561e+03
+    FW           750   2.015024e-01   1.989932e-01   2.509264e-03   1.005144e-01   7.461615e+03
+    FW           800   2.014101e-01   1.990554e-01   2.354727e-03   1.007631e-01   7.939417e+03
+    FW           850   2.013284e-01   1.991103e-01   2.218154e-03   1.010313e-01   8.413234e+03
+    FW           900   2.012558e-01   1.991592e-01   2.096580e-03   1.012877e-01   8.885581e+03
+    FW           950   2.011906e-01   1.992030e-01   1.987662e-03   1.015386e-01   9.356051e+03
+    FW          1000   2.011319e-01   1.992424e-01   1.889519e-03   1.017875e-01   9.824387e+03
+  Last          1001   2.011297e-01   1.992439e-01   1.885794e-03   1.019392e-01   9.819582e+03
 -------------------------------------------------------------------------------------------------

 Final solution: [1.799813188674937, 0.5986834801090863]
@@ -102,27 +102,27 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1
1.300000e+01 -1.900000e+01 3.200000e+01 0.000000e+00 Inf - FW 50 1.084340e-02 -7.590380e-02 8.674720e-02 5.393923e-02 9.269691e+02 - FW 100 5.509857e-03 -3.856900e-02 4.407886e-02 5.420225e-02 1.844942e+03 - FW 150 3.695414e-03 -2.586790e-02 2.956331e-02 5.446009e-02 2.754310e+03 - FW 200 2.780453e-03 -1.946317e-02 2.224362e-02 5.471665e-02 3.655194e+03 - FW 250 2.228830e-03 -1.560181e-02 1.783064e-02 5.496938e-02 4.547986e+03 - FW 300 1.859926e-03 -1.301948e-02 1.487941e-02 5.522845e-02 5.431983e+03 - FW 350 1.595838e-03 -1.117087e-02 1.276670e-02 5.550368e-02 6.305888e+03 - FW 400 1.397443e-03 -9.782098e-03 1.117954e-02 5.576413e-02 7.173070e+03 - FW 450 1.242935e-03 -8.700548e-03 9.943483e-03 5.601527e-02 8.033524e+03 - FW 500 1.119201e-03 -7.834409e-03 8.953610e-03 5.626873e-02 8.885930e+03 - FW 550 1.017878e-03 -7.125146e-03 8.143024e-03 5.652592e-02 9.730050e+03 - FW 600 9.333816e-04 -6.533671e-03 7.467053e-03 5.677976e-02 1.056715e+04 - FW 650 8.618413e-04 -6.032889e-03 6.894730e-03 5.703222e-02 1.139707e+04 - FW 700 8.004890e-04 -5.603423e-03 6.403912e-03 5.728866e-02 1.221882e+04 - FW 750 7.472928e-04 -5.231050e-03 5.978342e-03 5.754350e-02 1.303362e+04 - FW 800 7.007275e-04 -4.905093e-03 5.605820e-03 5.779380e-02 1.384232e+04 - FW 850 6.596259e-04 -4.617381e-03 5.277007e-03 5.804625e-02 1.464350e+04 - FW 900 6.230796e-04 -4.361557e-03 4.984637e-03 5.829990e-02 1.543742e+04 - FW 950 5.903710e-04 -4.132597e-03 4.722968e-03 5.854999e-02 1.622545e+04 - FW 1000 5.609256e-04 -3.926479e-03 4.487405e-03 5.879934e-02 1.700699e+04 - Last 1001 5.598088e-04 -3.918661e-03 4.478470e-03 5.894935e-02 1.698068e+04 + FW 50 1.084340e-02 -7.590380e-02 8.674720e-02 5.063746e-02 9.874112e+02 + FW 100 5.509857e-03 -3.856900e-02 4.407886e-02 5.090520e-02 1.964436e+03 + FW 150 3.695414e-03 -2.586790e-02 2.956331e-02 5.116671e-02 2.931594e+03 + FW 200 2.780453e-03 -1.946317e-02 2.224362e-02 5.142035e-02 3.889511e+03 + FW 250 2.228830e-03 -1.560181e-02 1.783064e-02 5.167687e-02 
4.837754e+03 + FW 300 1.859926e-03 -1.301948e-02 1.487941e-02 5.193809e-02 5.776108e+03 + FW 350 1.595838e-03 -1.117087e-02 1.276670e-02 5.219277e-02 6.705910e+03 + FW 400 1.397443e-03 -9.782098e-03 1.117954e-02 5.244688e-02 7.626765e+03 + FW 450 1.242935e-03 -8.700548e-03 9.943483e-03 5.269784e-02 8.539250e+03 + FW 500 1.119201e-03 -7.834409e-03 8.953610e-03 5.294670e-02 9.443460e+03 + FW 550 1.017878e-03 -7.125146e-03 8.143024e-03 5.319896e-02 1.033855e+04 + FW 600 9.333816e-04 -6.533671e-03 7.467053e-03 5.344968e-02 1.122551e+04 + FW 650 8.618413e-04 -6.032889e-03 6.894730e-03 5.369907e-02 1.210449e+04 + FW 700 8.004890e-04 -5.603423e-03 6.403912e-03 5.394700e-02 1.297570e+04 + FW 750 7.472928e-04 -5.231050e-03 5.978342e-03 5.419529e-02 1.383884e+04 + FW 800 7.007275e-04 -4.905093e-03 5.605820e-03 5.446258e-02 1.468899e+04 + FW 850 6.596259e-04 -4.617381e-03 5.277007e-03 5.471685e-02 1.553452e+04 + FW 900 6.230796e-04 -4.361557e-03 4.984637e-03 5.496748e-02 1.637332e+04 + FW 950 5.903710e-04 -4.132597e-03 4.722968e-03 5.521665e-02 1.720496e+04 + FW 1000 5.609256e-04 -3.926479e-03 4.487405e-03 5.546593e-02 1.802908e+04 + Last 1001 5.598088e-04 -3.918661e-03 4.478470e-03 5.561105e-02 1.800002e+04 ------------------------------------------------------------------------------------------------- Final solution: [2.0005598087769556, 0.9763463450796975]

We plot the polytopes alongside the solutions from above:

xcoord1 = [1, 3, 1, -1, 1]
@@ -158,53 +158,53 @@
 )
[Figure: the polytopes with the solutions from above; SVG markup omitted]

This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_08_callback_and_tracking/index.html b/previews/PR513/examples/docs_08_callback_and_tracking/index.html index 2e3930902..07e1ad31d 100644 --- a/previews/PR513/examples/docs_08_callback_and_tracking/index.html +++ b/previews/PR513/examples/docs_08_callback_and_tracking/index.html @@ -79,4 +79,4 @@ total_iterations = 500 tf.counter = 501 tgrad!.counter = 501 -tlmo_prob.counter = 13

This page was generated using Literate.jl.

+tlmo_prob.counter = 13

This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_09_extra_vertex_storage/index.html b/previews/PR513/examples/docs_09_extra_vertex_storage/index.html index ff6c97b7f..022cd4940 100644 --- a/previews/PR513/examples/docs_09_extra_vertex_storage/index.html +++ b/previews/PR513/examples/docs_09_extra_vertex_storage/index.html @@ -66,4 +66,4 @@ [ Info: Number of LMO calls in iter 9: 17 [ Info: Vertex storage size: 77 [ Info: Number of LMO calls in iter 10: 16 -[ Info: Vertex storage size: 82

This page was generated using Literate.jl.

+[ Info: Vertex storage size: 82

This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_10_alternating_methods/index.html b/previews/PR513/examples/docs_10_alternating_methods/index.html
index cedc73fab..9d846bc2e 100644
--- a/previews/PR513/examples/docs_10_alternating_methods/index.html
+++ b/previews/PR513/examples/docs_10_alternating_methods/index.html
@@ -43,9 +43,9 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec          Dist2
 ----------------------------------------------------------------------------------------------------------------
      I             1   1.889232e+00  -2.011077e+01   2.200000e+01   0.000000e+00            Inf   1.889232e+00
-    FW          1000   2.500023e-02   2.494268e-02   5.755263e-05   1.215416e+00   8.227635e+02   2.500023e-02
-    FW          2000   2.500000e-02   2.499779e-02   2.211653e-06   1.229341e+00   1.626888e+03   2.500000e-02
-  Last          2236   2.500000e-02   2.499900e-02   1.003006e-06   1.403859e+00   1.592753e+03   2.500000e-02
+    FW          1000   2.500023e-02   2.494268e-02   5.755263e-05   1.160082e+00   8.620083e+02   2.500023e-02
+    FW          2000   2.500000e-02   2.499779e-02   2.211653e-06   1.174166e+00   1.703337e+03   2.500000e-02
+  Last          2236   2.500000e-02   2.499900e-02   1.003006e-06   1.339325e+00   1.669498e+03   2.500000e-02
 ----------------------------------------------------------------------------------------------------------------
 
 Alternating Linear Minimization (ALM).
@@ -59,9 +59,9 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec          Dist2
 ----------------------------------------------------------------------------------------------------------------
      I             1   1.889232e+00  -2.011077e+01   2.200000e+01   0.000000e+00            Inf   1.889232e+00
-    FW          1000   2.500023e-02   2.494416e-02   5.606946e-05   1.932408e-01   5.174890e+03   2.500023e-02
-    FW          2000   2.500000e-02   2.499785e-02   2.154004e-06   2.142539e-01   9.334719e+03   2.500000e-02
-  Last          2237   2.500000e-02   2.499900e-02   9.981445e-07   2.195598e-01   1.018857e+04   2.500000e-02
+    FW          1000   2.500023e-02   2.494416e-02   5.606946e-05   1.829194e-01   5.466889e+03   2.500023e-02
+    FW          2000   2.500000e-02   2.499785e-02   2.154004e-06   2.045137e-01   9.779298e+03   2.500000e-02
+  Last          2237   2.500000e-02   2.499900e-02   9.981445e-07   2.097340e-01   1.066589e+04   2.500000e-02
 ----------------------------------------------------------------------------------------------------------------
 
 Alternating Linear Minimization (ALM).
@@ -75,9 +75,9 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec          Dist2
 ----------------------------------------------------------------------------------------------------------------
      I             1   2.866179e-01  -2.171338e+01   2.200000e+01   0.000000e+00            Inf   2.866179e-01
-    FW          1000   2.500018e-02   2.495033e-02   4.985431e-05   9.041288e-02   1.106037e+04   2.500018e-02
-    FW          2000   2.500000e-02   2.499809e-02   1.906507e-06   1.114202e-01   1.795007e+04   2.500000e-02
-  Last          2194   2.500000e-02   2.499900e-02   9.989094e-07   1.157296e-01   1.895799e+04   2.500000e-02
+    FW          1000   2.500018e-02   2.495033e-02   4.985431e-05   8.423371e-02   1.187173e+04   2.500018e-02
+    FW          2000   2.500000e-02   2.499809e-02   1.906507e-06   1.049965e-01   1.904826e+04   2.500000e-02
+  Last          2194   2.500000e-02   2.499900e-02   9.989094e-07   1.091852e-01   2.009430e+04   2.500000e-02
 ----------------------------------------------------------------------------------------------------------------

As an alternative to Block-Coordinate Frank-Wolfe (BCFW), one can also run alternating linear minimization with a standard Frank-Wolfe algorithm. These methods then perform the full (simultaneous) update at each iteration. In this example we also use FrankWolfe.away_frank_wolfe.

_, _, _, _, _, afw_trajectory = FrankWolfe.alternating_linear_minimization(
     FrankWolfe.away_frank_wolfe,
     f,
@@ -100,9 +100,9 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec          Dist2     #ActiveSet
 -------------------------------------------------------------------------------------------------------------------------------
      I             1   1.150000e+01           -Inf            Inf   0.000000e+00            Inf   1.005291e+00              2
-  Last           161   2.500000e-02   2.499904e-02   9.621137e-07   5.341242e-01   3.014280e+02   2.500000e-02             78
+  Last           161   2.500000e-02   2.499904e-02   9.621137e-07   5.178975e-01   3.108723e+02   2.500000e-02             78
 -------------------------------------------------------------------------------------------------------------------------------
-    PP           161   2.500000e-02   2.499904e-02   9.621137e-07   6.174085e-01   2.607674e+02   2.500000e-02             78
+    PP           161   2.500000e-02   2.499904e-02   9.621137e-07   6.005910e-01   2.680693e+02   2.500000e-02             78
 -------------------------------------------------------------------------------------------------------------------------------

Running Alternating Projections

Unlike ALM, Alternating Projections (AP) is only suitable for feasibility problems, so the objective and gradient are omitted from the arguments.

_, _, _, _, ap_trajectory = FrankWolfe.alternating_projections(
     lmos,
     x0,
@@ -121,125 +121,125 @@
   Type     Iteration       Dual Gap          dist2           Time         It/sec
 ----------------------------------------------------------------------------------
      I             1   6.021733e-01   1.505433e-01   0.000000e+00            Inf
-    FW           100   9.968329e-05   2.500020e-02   1.417199e+00   7.056174e+01
-    FW           200   2.316528e-05   2.500001e-02   1.421192e+00   1.407269e+02
-    FW           300   1.122389e-05   2.500000e-02   1.424584e+00   2.105878e+02
-    FW           400   6.105757e-06   2.500000e-02   1.427673e+00   2.801762e+02
-    FW           500   3.822934e-06   2.500000e-02   1.430676e+00   3.494851e+02
-    FW           600   2.624313e-06   2.500000e-02   1.433502e+00   4.185553e+02
-    FW           700   1.881535e-06   2.500000e-02   1.436331e+00   4.873528e+02
-    FW           800   1.505897e-06   2.500000e-02   1.487170e+00   5.379343e+02
-    FW           900   1.216129e-06   2.500000e-02   1.489649e+00   6.041690e+02
-  Last           992   9.990101e-07   2.500000e-02   1.857529e+00   5.340427e+02
+    FW           100   9.968329e-05   2.500020e-02   1.446472e+00   6.913372e+01
+    FW           200   2.316528e-05   2.500001e-02   1.450406e+00   1.378924e+02
+    FW           300   1.122389e-05   2.500000e-02   1.453856e+00   2.063478e+02
+    FW           400   6.105757e-06   2.500000e-02   1.456907e+00   2.745543e+02
+    FW           500   3.822934e-06   2.500000e-02   1.459888e+00   3.424921e+02
+    FW           600   2.624313e-06   2.500000e-02   1.462767e+00   4.101817e+02
+    FW           700   1.881535e-06   2.500000e-02   1.465646e+00   4.776053e+02
+    FW           800   1.505897e-06   2.500000e-02   1.468377e+00   5.448193e+02
+    FW           900   1.216129e-06   2.500000e-02   1.471144e+00   6.117689e+02
+  Last           992   9.990101e-07   2.500000e-02   1.809626e+00   5.481794e+02
 ----------------------------------------------------------------------------------
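
The alternating-projections idea can be sketched in plain Julia. This is a minimal illustration and not the package's implementation: it assumes two hypothetical disjoint Euclidean balls with closed-form projections, whereas the call above is given `lmos`. The names `project_ball` and `alternating_projections_sketch` and all numbers are made up for the example.

```julia
using LinearAlgebra

# Hypothetical helper: Euclidean projection of x onto the ball
# {z : norm(z - center) <= radius}.
function project_ball(x, center, radius)
    d = x - center
    nd = norm(d)
    return nd <= radius ? copy(x) : center + (radius / nd) * d
end

# Alternate between the two projections until the iterates stop moving
# or the iteration limit is reached.
function alternating_projections_sketch(proj_a, proj_b, x0; max_iteration=1000, tol=1e-12)
    a = proj_a(x0)
    b = proj_b(a)
    for _ in 1:max_iteration
        a_next = proj_a(b)
        b_next = proj_b(a_next)
        moved = norm(a_next - a) + norm(b_next - b)
        a, b = a_next, b_next
        moved <= tol && break
    end
    return a, b
end

# Assumed feasible regions: unit balls centered at (0, 0) and (3, 0).
proj_a(x) = project_ball(x, [0.0, 0.0], 1.0)
proj_b(x) = project_ball(x, [3.0, 0.0], 1.0)

a, b = alternating_projections_sketch(proj_a, proj_b, [0.0, 2.0])
println("closest pair: ", a, " and ", b)
println("squared distance between the sets: ", norm(a - b)^2)
```

Because the two assumed balls do not intersect, the iterates settle on a pair of closest points, one in each set, rather than on a common point. No objective or gradient appears anywhere in the loop, which is the sense in which AP addresses feasibility only.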

Plotting the resulting trajectories

labels = ["BCFW - Full", "BCFW - Cyclic", "BCFW - Stochastic", "AFW", "AP"]
 
 plot_trajectories(trajectories, labels, xscalelog=true)
[Figure: output of plot_trajectories for the labels above; SVG markup omitted]

This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_11_block_coordinate_fw/index.html b/previews/PR513/examples/docs_11_block_coordinate_fw/index.html index ff97b3c5e..6265f773b 100644 --- a/previews/PR513/examples/docs_11_block_coordinate_fw/index.html +++ b/previews/PR513/examples/docs_11_block_coordinate_fw/index.html @@ -42,17 +42,17 @@ Type Iteration Primal Dual Dual Gap Time It/sec ------------------------------------------------------------------------------------------------- I 1 5.732358e-01 -1.014268e+02 1.020000e+02 0.000000e+00 Inf - FW 1000 1.006522e-02 9.790473e-03 2.747434e-04 3.295141e-01 3.034772e+03 - FW 2000 1.002905e-02 9.883695e-03 1.453591e-04 3.476061e-01 5.753639e+03 - FW 3000 1.001690e-02 9.923899e-03 9.300514e-05 3.658116e-01 8.200942e+03 - FW 4000 1.001088e-02 9.940501e-03 7.037823e-05 3.836702e-01 1.042562e+04 - FW 5000 1.000730e-02 9.952154e-03 5.515099e-05 4.012299e-01 1.246168e+04 - FW 6000 1.000507e-02 9.960902e-03 4.416633e-05 4.189263e-01 1.432233e+04 - FW 7000 1.000360e-02 9.967424e-03 3.617261e-05 4.955349e-01 1.412615e+04 - FW 8000 1.000260e-02 9.972504e-03 3.009367e-05 5.134293e-01 1.558150e+04 - FW 9000 1.000190e-02 9.976620e-03 2.528359e-05 5.367377e-01 1.676797e+04 - FW 10000 1.000141e-02 9.979993e-03 2.141696e-05 5.591346e-01 1.788478e+04 - Last 10001 1.000141e-02 9.979981e-03 2.142870e-05 6.635091e-01 1.507289e+04 + FW 1000 1.006522e-02 9.790473e-03 2.747434e-04 3.316003e-01 3.015679e+03 + FW 2000 1.002905e-02 9.883695e-03 1.453591e-04 3.497754e-01 5.717955e+03 + FW 3000 1.001690e-02 9.923899e-03 9.300514e-05 3.679625e-01 8.153005e+03 + FW 4000 1.001088e-02 9.940501e-03 7.037823e-05 3.860523e-01 1.036129e+04 + FW 5000 1.000730e-02 9.952154e-03 5.515099e-05 4.042436e-01 1.236878e+04 + FW 6000 1.000507e-02 9.960902e-03 4.416633e-05 5.119320e-01 1.172031e+04 + FW 7000 1.000360e-02 9.967424e-03 3.617261e-05 5.297714e-01 1.321325e+04 + FW 8000 1.000260e-02 9.972504e-03 3.009367e-05 5.475845e-01 1.460962e+04 + FW 9000 1.000190e-02 
9.976620e-03 2.528359e-05 5.671811e-01 1.586795e+04 + FW 10000 1.000141e-02 9.979993e-03 2.141696e-05 5.848760e-01 1.709764e+04 + Last 10001 1.000141e-02 9.979981e-03 2.142870e-05 6.873764e-01 1.454952e+04 ------------------------------------------------------------------------------------------------- Block coordinate Frank-Wolfe (BCFW). @@ -64,17 +64,17 @@ Type Iteration Primal Dual Dual Gap Time It/sec ------------------------------------------------------------------------------------------------- I 1 5.732358e-01 -1.014268e+02 1.020000e+02 0.000000e+00 Inf - FW 1000 1.006522e-02 9.790473e-03 2.747434e-04 1.191503e-01 8.392761e+03 - FW 2000 1.002905e-02 9.883695e-03 1.453591e-04 1.436735e-01 1.392045e+04 - FW 3000 1.001690e-02 9.923899e-03 9.300514e-05 2.284796e-01 1.313027e+04 - FW 4000 1.001088e-02 9.940501e-03 7.037823e-05 2.532521e-01 1.579454e+04 - FW 5000 1.000730e-02 9.952154e-03 5.515099e-05 2.799791e-01 1.785848e+04 - FW 6000 1.000507e-02 9.960902e-03 4.416633e-05 3.047957e-01 1.968532e+04 - FW 7000 1.000360e-02 9.967424e-03 3.617261e-05 3.293741e-01 2.125243e+04 - FW 8000 1.000260e-02 9.972504e-03 3.009367e-05 3.533800e-01 2.263852e+04 - FW 9000 1.000190e-02 9.976620e-03 2.528359e-05 3.773976e-01 2.384753e+04 - FW 10000 1.000141e-02 9.979993e-03 2.141696e-05 4.014466e-01 2.490991e+04 - Last 10001 1.000141e-02 9.979981e-03 2.142870e-05 4.018036e-01 2.489027e+04 + FW 1000 1.006522e-02 9.790473e-03 2.747434e-04 1.161970e-01 8.606075e+03 + FW 2000 1.002905e-02 9.883695e-03 1.453591e-04 2.008908e-01 9.955657e+03 + FW 3000 1.001690e-02 9.923899e-03 9.300514e-05 2.267382e-01 1.323112e+04 + FW 4000 1.001088e-02 9.940501e-03 7.037823e-05 2.510029e-01 1.593607e+04 + FW 5000 1.000730e-02 9.952154e-03 5.515099e-05 2.753545e-01 1.815841e+04 + FW 6000 1.000507e-02 9.960902e-03 4.416633e-05 2.992369e-01 2.005100e+04 + FW 7000 1.000360e-02 9.967424e-03 3.617261e-05 3.230816e-01 2.166635e+04 + FW 8000 1.000260e-02 9.972504e-03 3.009367e-05 3.469557e-01 2.305770e+04 + 
FW 9000 1.000190e-02 9.976620e-03 2.528359e-05 3.709311e-01 2.426327e+04 + FW 10000 1.000141e-02 9.979993e-03 2.141696e-05 3.949740e-01 2.531812e+04 + Last 10001 1.000141e-02 9.979981e-03 2.142870e-05 3.952584e-01 2.530243e+04 ------------------------------------------------------------------------------------------------- Block coordinate Frank-Wolfe (BCFW). @@ -86,17 +86,17 @@ Type Iteration Primal Dual Dual Gap Time It/sec ------------------------------------------------------------------------------------------------- I 1 5.732358e-01 -1.014268e+02 1.020000e+02 0.000000e+00 Inf - FW 1000 1.006649e-02 9.797679e-03 2.688115e-04 5.495573e-02 1.819647e+04 - FW 2000 1.003282e-02 9.887544e-03 1.452767e-04 7.942121e-02 2.518219e+04 - FW 3000 1.001920e-02 9.919860e-03 9.934003e-05 1.042008e-01 2.879056e+04 - FW 4000 1.001224e-02 9.935454e-03 7.678946e-05 1.288594e-01 3.104159e+04 - FW 5000 1.000812e-02 9.949492e-03 5.863302e-05 1.532046e-01 3.263609e+04 - FW 6000 1.000561e-02 9.958403e-03 4.720724e-05 1.771921e-01 3.386155e+04 - FW 7000 1.000400e-02 9.964922e-03 3.907771e-05 2.011037e-01 3.480791e+04 - FW 8000 1.000288e-02 9.971244e-03 3.163293e-05 2.251155e-01 3.553732e+04 - FW 9000 1.000209e-02 9.975610e-03 2.647559e-05 2.495517e-01 3.606467e+04 - FW 10000 1.000153e-02 9.979405e-03 2.212997e-05 3.362944e-01 2.973585e+04 - Last 10001 1.000153e-02 9.979008e-03 2.252590e-05 3.366863e-01 2.970421e+04 + FW 1000 1.006649e-02 9.797679e-03 2.688115e-04 5.552946e-02 1.800846e+04 + FW 2000 1.003282e-02 9.887544e-03 1.452767e-04 8.004421e-02 2.498619e+04 + FW 3000 1.001920e-02 9.919860e-03 9.934003e-05 1.045313e-01 2.869953e+04 + FW 4000 1.001224e-02 9.935454e-03 7.678946e-05 1.284396e-01 3.114303e+04 + FW 5000 1.000812e-02 9.949492e-03 5.863302e-05 1.524339e-01 3.280111e+04 + FW 6000 1.000561e-02 9.958403e-03 4.720724e-05 1.763499e-01 3.402327e+04 + FW 7000 1.000400e-02 9.964922e-03 3.907771e-05 2.001234e-01 3.497842e+04 + FW 8000 1.000288e-02 9.971244e-03 3.163293e-05 
2.241914e-01 3.568379e+04 + FW 9000 1.000209e-02 9.975610e-03 2.647559e-05 3.079444e-01 2.922606e+04 + FW 10000 1.000153e-02 9.979405e-03 2.212997e-05 3.321041e-01 3.011104e+04 + Last 10001 1.000153e-02 9.979008e-03 2.252590e-05 3.324496e-01 3.008275e+04 ------------------------------------------------------------------------------------------------- Block coordinate Frank-Wolfe (BCFW). @@ -108,133 +108,133 @@ Type Iteration Primal Dual Dual Gap Time It/sec ------------------------------------------------------------------------------------------------- I 1 1.024074e+02 -Inf Inf 0.000000e+00 Inf - FW 1000 1.003847e-02 9.875995e-03 1.624779e-04 6.797181e-02 1.471198e+04 - FW 2000 1.001380e-02 9.931922e-03 8.188221e-05 9.738295e-02 2.053747e+04 - FW 3000 1.000624e-02 9.955789e-03 5.044741e-05 1.261059e-01 2.378954e+04 - FW 4000 1.000315e-02 9.969360e-03 3.378772e-05 1.547301e-01 2.585147e+04 - FW 5000 1.000169e-02 9.978459e-03 2.323352e-05 1.833911e-01 2.726414e+04 - FW 6000 1.000095e-02 9.983540e-03 1.740806e-05 2.699255e-01 2.222835e+04 - FW 7000 1.000055e-02 9.987713e-03 1.283319e-05 2.992921e-01 2.338853e+04 - FW 8000 1.000032e-02 9.990836e-03 9.486923e-06 3.288950e-01 2.432388e+04 - FW 9000 1.000019e-02 9.992954e-03 7.238509e-06 3.598142e-01 2.501291e+04 - FW 10000 1.000012e-02 9.994356e-03 5.760023e-06 3.886874e-01 2.572761e+04 - Last 10001 1.000012e-02 9.994357e-03 5.759399e-06 3.890735e-01 2.570465e+04 + FW 1000 1.003847e-02 9.875995e-03 1.624779e-04 6.712743e-02 1.489704e+04 + FW 2000 1.001380e-02 9.931922e-03 8.188221e-05 9.616761e-02 2.079702e+04 + FW 3000 1.000624e-02 9.955789e-03 5.044741e-05 1.251347e-01 2.397416e+04 + FW 4000 1.000315e-02 9.969360e-03 3.378772e-05 1.542512e-01 2.593172e+04 + FW 5000 1.000169e-02 9.978459e-03 2.323352e-05 2.398626e-01 2.084527e+04 + FW 6000 1.000095e-02 9.983540e-03 1.740806e-05 2.689215e-01 2.231134e+04 + FW 7000 1.000055e-02 9.987713e-03 1.283319e-05 2.984747e-01 2.345258e+04 + FW 8000 1.000032e-02 9.990836e-03 
9.486923e-06 3.278976e-01 2.439786e+04 + FW 9000 1.000019e-02 9.992954e-03 7.238509e-06 3.579607e-01 2.514242e+04 + FW 10000 1.000012e-02 9.994356e-03 5.760023e-06 3.868124e-01 2.585232e+04 + Last 10001 1.000012e-02 9.994357e-03 5.759399e-06 3.871642e-01 2.583142e+04 -------------------------------------------------------------------------------------------------

Plotting the results

labels = ["Full update", "Cyclic order", "Stochastic order", "Custom order"]
 plot_trajectories(trajectories, labels, xscalelog=true)
[Figure: output of plot_trajectories for the labels above; SVG markup omitted]

Running BCFW with different update methods

As a second step, we compare different update steps. We consider FrankWolfe.BPCGStep and FrankWolfe.FrankWolfeStep. One can either pass a tuple of FrankWolfe.UpdateStep to define the update procedure for each block, or pass a single update step so that every block uses the same procedure.

trajectories = []
 
@@ -260,17 +260,17 @@ 

Plotting the results

labels = ["BPCG FW", "FW BPCG", "BPCG", "FW"]
 plot_trajectories(trajectories, labels, xscalelog=true)
[Figure: output of plot_trajectories for the labels above; SVG markup omitted]

This page was generated using Literate.jl.

diff --git a/previews/PR513/examples/docs_12_quadratic_symmetric/index.html b/previews/PR513/examples/docs_12_quadratic_symmetric/index.html index bda8bc840..33dbb0ae4 100644 --- a/previews/PR513/examples/docs_12_quadratic_symmetric/index.html +++ b/previews/PR513/examples/docs_12_quadratic_symmetric/index.html @@ -86,30 +86,30 @@ Type Iteration Primal Dual Dual Gap Time It/sec #ActiveSet ---------------------------------------------------------------------------------------------------------------- I 1 4.132812e+01 -4.029553e+01 8.162365e+01 0.000000e+00 Inf 1 - LD 46 1.102219e+01 -2.889653e+01 3.991872e+01 4.464432e+00 1.030366e+01 46 - LD 103 4.741326e+00 -1.484609e+01 1.958742e+01 7.935975e+00 1.297887e+01 79 - LD 217 1.129314e+00 -5.004581e+00 6.133895e+00 1.277452e+01 1.698693e+01 126 - LD 351 7.257875e-01 -2.271567e+00 2.997354e+00 1.607671e+01 2.183282e+01 157 - P 1000 4.532655e-01 -2.544089e+00 2.997354e+00 3.159278e+01 3.165280e+01 310 - LD 1621 2.613257e-01 -1.230782e+00 1.492108e+00 4.542664e+01 3.568391e+01 445 - P 2000 2.268145e-01 -1.265294e+00 1.492108e+00 4.576705e+01 4.369956e+01 440 - P 3000 1.381013e-01 -1.354007e+00 1.492108e+00 5.325867e+01 5.632885e+01 496 - P 4000 6.040867e-02 -1.431699e+00 1.492108e+00 6.597965e+01 6.062476e+01 610 - LD 4415 3.072837e-02 -4.006321e-01 4.313605e-01 7.089815e+01 6.227243e+01 647 - P 5000 1.738588e-02 -4.139746e-01 4.313605e-01 7.123025e+01 7.019490e+01 627 - LD 5663 1.229003e-02 -1.045655e-01 1.168556e-01 7.150418e+01 7.919817e+01 625 - P 6000 1.113787e-02 -1.057177e-01 1.168556e-01 7.179710e+01 8.356884e+01 625 - P 7000 9.735387e-03 -1.071202e-01 1.168556e-01 7.221146e+01 9.693752e+01 625 - LD 7469 9.517109e-03 -2.020889e-02 2.972600e-02 7.241001e+01 1.031487e+02 625 - P 8000 9.392674e-03 -2.033332e-02 2.972600e-02 7.271643e+01 1.100164e+02 625 - P 9000 9.304994e-03 -2.042100e-02 2.972600e-02 7.312849e+01 1.230711e+02 625 - LD 9490 9.290088e-03 1.701123e-03 7.588965e-03 7.333721e+01 1.294023e+02 625 - P 
10000 9.282251e-03 1.693286e-03 7.588965e-03 7.363172e+01 1.358110e+02 625 - Last 10001 9.282230e-03 3.629714e-03 5.652516e-03 7.372584e+01 1.356512e+02 625 + LD 46 1.102219e+01 -2.889653e+01 3.991872e+01 4.671683e+00 9.846559e+00 46 + LD 103 4.741326e+00 -1.484609e+01 1.958742e+01 8.273696e+00 1.244909e+01 79 + LD 217 1.129314e+00 -5.004581e+00 6.133895e+00 1.328597e+01 1.633302e+01 126 + LD 351 7.257875e-01 -2.271567e+00 2.997354e+00 1.674921e+01 2.095621e+01 157 + P 1000 4.532655e-01 -2.544089e+00 2.997354e+00 3.304772e+01 3.025927e+01 310 + LD 1621 2.613257e-01 -1.230782e+00 1.492108e+00 4.761263e+01 3.404559e+01 445 + P 2000 2.268145e-01 -1.265294e+00 1.492108e+00 4.797089e+01 4.169195e+01 440 + P 3000 1.381013e-01 -1.354007e+00 1.492108e+00 5.599450e+01 5.357669e+01 496 + P 4000 6.040867e-02 -1.431699e+00 1.492108e+00 6.946517e+01 5.758282e+01 610 + LD 4415 3.072837e-02 -4.006321e-01 4.313605e-01 7.463783e+01 5.915231e+01 647 + P 5000 1.738588e-02 -4.139746e-01 4.313605e-01 7.498688e+01 6.667833e+01 627 + LD 5663 1.229003e-02 -1.045655e-01 1.168556e-01 7.526177e+01 7.524405e+01 625 + P 6000 1.113787e-02 -1.057177e-01 1.168556e-01 7.549272e+01 7.947786e+01 625 + P 7000 9.735387e-03 -1.071202e-01 1.168556e-01 7.590415e+01 9.222158e+01 625 + LD 7469 9.517109e-03 -2.020889e-02 2.972600e-02 7.609931e+01 9.814807e+01 625 + P 8000 9.392674e-03 -2.033332e-02 2.972600e-02 7.646579e+01 1.046219e+02 625 + P 9000 9.304994e-03 -2.042100e-02 2.972600e-02 7.687653e+01 1.170708e+02 625 + LD 9490 9.290088e-03 1.701123e-03 7.588965e-03 7.708196e+01 1.231157e+02 625 + P 10000 9.282251e-03 1.693286e-03 7.588965e-03 7.738424e+01 1.292253e+02 625 + Last 10001 9.282230e-03 3.629714e-03 5.652516e-03 7.748345e+01 1.290727e+02 625 ---------------------------------------------------------------------------------------------------------------- - PP 10001 9.282230e-03 3.629714e-03 5.652516e-03 7.383236e+01 1.354555e+02 625 + PP 10001 9.282230e-03 3.629714e-03 5.652516e-03 7.758147e+01 
1.289096e+02 625 ---------------------------------------------------------------------------------------------------------------- - 73.921499 seconds (378.65 M allocations: 45.133 GiB, 12.33% gc time, 0.18% compilation time)

Faster active set for quadratic functions

A first acceleration can be obtained by using the active set specialized for the quadratic objective function, whose gradient is here $x-p$, which explains the Hessian and linear part provided as arguments. The speedup is obtained by pre-computing some scalar products so that the best and worst atoms currently in the active set can be found quickly in each iteration.

asq_naive = FrankWolfe.ActiveSetQuadratic([(one(T), x0)], LinearAlgebra.I, -p)
+ 77.673449 seconds (378.65 M allocations: 45.133 GiB, 11.43% gc time, 0.16% compilation time)

Faster active set for quadratic functions

A first acceleration can be obtained by using the active set specialized for the quadratic objective function, whose gradient is here $x-p$, which explains the Hessian and linear part provided as arguments. The speedup is obtained by pre-computing some scalar products so that the best and worst atoms currently in the active set can be found quickly in each iteration.
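
The pre-computation idea can be sketched in plain Julia. This is a minimal illustration under stated assumptions, not the internals of FrankWolfe.ActiveSetQuadratic: it assumes a quadratic with Hessian `A = I` and linear part `b = -p`, so that the gradient is `A*x + b = x - p` as stated above. The atoms, weights, and `p` are made-up data, and `atom_scores` is a hypothetical helper. Taking "best" as the smallest and "worst" as the largest scalar product with the gradient is also an assumption of the sketch.

```julia
using LinearAlgebra

# Illustrative data (not from the example): three unit-vector atoms with
# convex weights, and a made-up target p.
n = 5
p = collect(1.0:n)
atoms = [Float64.((1:n) .== i) for i in 1:3]
weights = [0.5, 0.3, 0.2]

A = I          # assumed Hessian of the quadratic
b = -p         # assumed linear part, so the gradient A*x + b equals x - p

# Scalar products that depend only on the atoms, computed once.
gram = [dot(ai, A * aj) for ai in atoms, aj in atoms]
lin = [dot(b, a) for a in atoms]

# Hypothetical helper: <gradient at x, atom_j> for every j, where
# x = sum_i weights[i] * atoms[i], without forming x or the gradient.
atom_scores(gram, lin, weights) = gram' * weights .+ lin

scores = atom_scores(gram, lin, weights)

# Cross-check against the direct computation through the gradient.
x = sum(weights .* atoms)
grad = A * x + b
direct = [dot(grad, a) for a in atoms]
@assert scores ≈ direct

best_atom = argmin(scores)    # smallest scalar product with the gradient
worst_atom = argmax(scores)   # largest scalar product with the gradient
println("best atom: ", best_atom, ", worst atom: ", worst_atom)
```

With the scalar products stored, scoring all atoms costs one small matrix-vector product in the number of atoms, independent of the ambient dimension.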

asq_naive = FrankWolfe.ActiveSetQuadratic([(one(T), x0)], LinearAlgebra.I, -p)
 @time FrankWolfe.blended_pairwise_conditional_gradient(f, grad!, lmo_naive, asq_naive; verbose, lazy=true, line_search=FrankWolfe.Shortstep(one(T)), max_iteration)

 Blended Pairwise Conditional Gradient Algorithm.
 MEMORY_MODE: FrankWolfe.InplaceEmphasis() STEPSIZE: Shortstep EPSILON: 1.0e-7 MAXITERATION: 10000 TYPE: Float64
@@ -120,30 +120,30 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   4.132812e+01  -4.029553e+01   8.162365e+01   0.000000e+00            Inf              1
-    LD            46   1.102219e+01  -2.889653e+01   3.991872e+01   4.467849e+00   1.029578e+01             46
-    LD           104   4.682362e+00  -1.505762e+01   1.973998e+01   7.968033e+00   1.305216e+01             79
-    LD           217   1.153286e+00  -4.834945e+00   5.988231e+00   1.281589e+01   1.693210e+01            126
-    LD           361   7.246254e-01  -2.234291e+00   2.958916e+00   1.623445e+01   2.223666e+01            158
-     P          1000   4.580028e-01  -2.500913e+00   2.958916e+00   3.109401e+01   3.216054e+01            305
-    LD          1759   2.293608e-01  -1.236982e+00   1.466343e+00   4.754201e+01   3.699886e+01            460
-     P          2000   2.035603e-01  -1.262782e+00   1.466343e+00   4.769910e+01   4.192951e+01            458
-     P          3000   1.211665e-01  -1.345176e+00   1.466343e+00   5.318739e+01   5.640435e+01            500
-     P          4000   4.749301e-02  -1.418850e+00   1.466343e+00   6.715686e+01   5.956204e+01            632
-    LD          4227   3.237072e-02  -3.805380e-01   4.129088e-01   6.960508e+01   6.072833e+01            654
-     P          5000   1.624641e-02  -3.966624e-01   4.129088e-01   6.971699e+01   7.171853e+01            625
-    LD          5566   1.241685e-02  -1.000254e-01   1.124422e-01   6.973744e+01   7.981366e+01            625
-     P          6000   1.102951e-02  -1.014127e-01   1.124422e-01   6.983699e+01   8.591436e+01            625
-     P          7000   9.754199e-03  -1.026880e-01   1.124422e-01   6.986990e+01   1.001862e+02            625
-    LD          7558   9.511291e-03  -1.988317e-02   2.939446e-02   6.989033e+01   1.081409e+02            625
-     P          8000   9.410900e-03  -1.998356e-02   2.939446e-02   6.999027e+01   1.143016e+02            625
-     P          9000   9.314629e-03  -2.007983e-02   2.939446e-02   7.002250e+01   1.285301e+02            625
-    LD          9702   9.291671e-03   1.444824e-03   7.846848e-03   7.004648e+01   1.385080e+02            625
-     P         10000   9.286464e-03   1.439616e-03   7.846848e-03   7.020584e+01   1.424383e+02            625
-  Last         10001   9.286436e-03   2.102908e-03   7.183528e-03   7.030181e+01   1.422581e+02            625
+    LD            46   1.102219e+01  -2.889653e+01   3.991872e+01   4.665216e+00   9.860207e+00             46
+    LD           104   4.682362e+00  -1.505762e+01   1.973998e+01   8.272103e+00   1.257238e+01             79
+    LD           217   1.153286e+00  -4.834945e+00   5.988231e+00   1.333240e+01   1.627614e+01            126
+    LD           361   7.246254e-01  -2.234291e+00   2.958916e+00   1.687736e+01   2.138960e+01            158
+     P          1000   4.580028e-01  -2.500913e+00   2.958916e+00   3.243212e+01   3.083363e+01            305
+    LD          1759   2.293608e-01  -1.236982e+00   1.466343e+00   4.949695e+01   3.553754e+01            460
+     P          2000   2.035603e-01  -1.262782e+00   1.466343e+00   4.959422e+01   4.032728e+01            458
+     P          3000   1.211665e-01  -1.345176e+00   1.466343e+00   5.530252e+01   5.424707e+01            500
+     P          4000   4.749301e-02  -1.418850e+00   1.466343e+00   6.964896e+01   5.743086e+01            632
+    LD          4227   3.237072e-02  -3.805380e-01   4.129088e-01   7.219526e+01   5.854955e+01            654
+     P          5000   1.624641e-02  -3.966624e-01   4.129088e-01   7.237182e+01   6.908766e+01            625
+    LD          5566   1.241685e-02  -1.000254e-01   1.124422e-01   7.239191e+01   7.688704e+01            625
+     P          6000   1.102951e-02  -1.014127e-01   1.124422e-01   7.249875e+01   8.276004e+01            625
+     P          7000   9.754199e-03  -1.026880e-01   1.124422e-01   7.253142e+01   9.650990e+01            625
+    LD          7558   9.511291e-03  -1.988317e-02   2.939446e-02   7.255153e+01   1.041742e+02            625
+     P          8000   9.410900e-03  -1.998356e-02   2.939446e-02   7.265685e+01   1.101066e+02            625
+     P          9000   9.314629e-03  -2.007983e-02   2.939446e-02   7.269691e+01   1.238017e+02            625
+    LD          9702   9.291671e-03   1.444824e-03   7.846848e-03   7.272336e+01   1.334097e+02            625
+     P         10000   9.286464e-03   1.439616e-03   7.846848e-03   7.282387e+01   1.373176e+02            625
+  Last         10001   9.286436e-03   2.102908e-03   7.183528e-03   7.292288e+01   1.371449e+02            625
 ----------------------------------------------------------------------------------------------------------------
-    PP         10001   9.286436e-03   2.102908e-03   7.183528e-03   7.039613e+01   1.420675e+02            625
+    PP         10001   9.286436e-03   2.102908e-03   7.183528e-03   7.308292e+01   1.368446e+02            625
 ----------------------------------------------------------------------------------------------------------------
- 70.551759 seconds (374.84 M allocations: 44.687 GiB, 13.17% gc time, 0.18% compilation time)

In this small example, the acceleration is quite minimal, but as soon as one of the following conditions is met, significant speedups (factor ten at least) can be expected:

  • quite expensive scalar product between atoms, for instance, due to a high dimension (say, more than 10000),
  • high number of atoms in the active set (say, more than 1000),
  • high number of iterations (say, more than 100000), spending most of the time redistributing the weights in the active set.

Dimension reduction via symmetrization

Permutation of the tensor axes

It is easy to see that our specific instance remains invariant under permutation of the dimensions of the tensor. This means that all computations can be performed in the symmetric subspace, which leads to an important speedup, owing to the reduced dimension (hence reduced size of the final active set and reduced number of iterations).

The way to operate this in the FrankWolfe package is to use a symmetrized LMO, which basically does the following:

  • symmetrize the gradient, which is not necessary here as the gradient remains symmetric throughout the algorithm,
  • call the standard LMO,
  • symmetrize its output, which amounts to averaging over its orbit with respect to the group considered (here the symmetric group permuting the dimensions of the tensor).
function reynolds_permutedims(atom::Array{T, N}, lmo::BellCorrelationsLMO{T}) where {T <: Number, N}
+ 73.175567 seconds (374.84 M allocations: 44.687 GiB, 11.93% gc time, 0.26% compilation time)

In this small example, the acceleration is quite minimal, but as soon as one of the following conditions is met, significant speedups (factor ten at least) can be expected:

  • quite expensive scalar product between atoms, for instance, due to a high dimension (say, more than 10000),
  • high number of atoms in the active set (say, more than 1000),
  • high number of iterations (say, more than 100000), spending most of the time redistributing the weights in the active set.

Dimension reduction via symmetrization

Permutation of the tensor axes

It is easy to see that our specific instance remains invariant under permutation of the dimensions of the tensor. This means that all computations can be performed in the symmetric subspace, which leads to an important speedup, owing to the reduced dimension (hence reduced size of the final active set and reduced number of iterations).

The way to operate this in the FrankWolfe package is to use a symmetrized LMO, which basically does the following:

  • symmetrize the gradient, which is not necessary here as the gradient remains symmetric throughout the algorithm,
  • call the standard LMO,
  • symmetrize its output, which amounts to averaging over its orbit with respect to the group considered (here the symmetric group permuting the dimensions of the tensor).
function reynolds_permutedims(atom::Array{T, N}, lmo::BellCorrelationsLMO{T}) where {T <: Number, N}
     res = zeros(T, size(atom))
     for per in Combinatorics.permutations(1:N)
         res .+= permutedims(atom, per)
@@ -162,33 +162,33 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   4.132812e+01  -4.029553e+01   8.162365e+01   0.000000e+00            Inf              1
-    LD            11   1.153403e+01  -2.734161e+01   3.887563e+01   7.633553e-01   1.441007e+01              9
-    LD            29   3.294910e+00  -1.375281e+01   1.704772e+01   1.311445e+00   2.211301e+01             13
-    LD            47   1.630202e+00  -6.744183e+00   8.374385e+00   1.655254e+00   2.839444e+01             15
-    LD           100   4.530833e-01  -2.912142e+00   3.365226e+00   2.675991e+00   3.736933e+01             23
-    LD           175   1.770240e-01  -1.434036e+00   1.611060e+00   3.179006e+00   5.504866e+01             23
-    LD           268   8.360800e-02  -6.726422e-01   7.562502e-01   3.921765e+00   6.833659e+01             29
-    LD           555   2.209150e-02  -2.970870e-01   3.191785e-01   4.520332e+00   1.227786e+02             30
-    LD           773   1.139297e-02  -6.500159e-02   7.639456e-02   4.613920e+00   1.675365e+02             26
-     P          1000   9.617158e-03  -6.677740e-02   7.639456e-02   4.769654e+00   2.096588e+02             26
-    LD          1086   9.450474e-03  -9.519297e-03   1.896977e-02   4.782440e+00   2.270807e+02             26
-    LD          1500   9.283632e-03   4.858918e-03   4.424714e-03   4.880118e+00   3.073696e+02             26
-    LD          1900   9.274785e-03   8.183461e-03   1.091323e-03   4.979823e+00   3.815397e+02             26
-     P          2000   9.274488e-03   8.183165e-03   1.091323e-03   5.069750e+00   3.944968e+02             26
-    LD          2326   9.274249e-03   9.015738e-03   2.585111e-04   5.078455e+00   4.580133e+02             26
-    LD          2740   9.274216e-03   9.214614e-03   5.960203e-05   5.176481e+00   5.293171e+02             26
-     P          3000   9.274214e-03   9.214612e-03   5.960203e-05   5.336979e+00   5.621158e+02             26
-    LD          3178   9.274214e-03   9.262304e-03   1.190966e-05   5.341973e+00   5.949113e+02             26
-    LD          3636   9.274214e-03   9.271595e-03   2.619296e-06   5.442855e+00   6.680317e+02             26
-     P          4000   9.274214e-03   9.271595e-03   2.619296e-06   5.541201e+00   7.218652e+02             26
-    LD          4064   9.274214e-03   9.273578e-03   6.357779e-07   5.543160e+00   7.331558e+02             26
-    LD          4470   9.274214e-03   9.274066e-03   1.484091e-07   5.653608e+00   7.906456e+02             26
-    LD          4865   9.274214e-03   9.274179e-03   3.537488e-08   5.751012e+00   8.459380e+02             26
-  Last          4865   9.274214e-03   9.274179e-03   3.537488e-08   5.993356e+00   8.117322e+02             26
+    LD            11   1.153403e+01  -2.734161e+01   3.887563e+01   8.513509e-01   1.292064e+01              9
+    LD            29   3.294910e+00  -1.375281e+01   1.704772e+01   1.422571e+00   2.038563e+01             13
+    LD            47   1.630202e+00  -6.744183e+00   8.374385e+00   1.766867e+00   2.660075e+01             15
+    LD           100   4.530833e-01  -2.912142e+00   3.365226e+00   2.837175e+00   3.524633e+01             23
+    LD           175   1.770240e-01  -1.434036e+00   1.611060e+00   3.359796e+00   5.208650e+01             23
+    LD           268   8.360800e-02  -6.726422e-01   7.562502e-01   4.135818e+00   6.479975e+01             29
+    LD           555   2.209150e-02  -2.970870e-01   3.191785e-01   4.746034e+00   1.169397e+02             30
+    LD           773   1.139297e-02  -6.500159e-02   7.639456e-02   4.843513e+00   1.595949e+02             26
+     P          1000   9.617158e-03  -6.677740e-02   7.639456e-02   4.941127e+00   2.023830e+02             26
+    LD          1086   9.450474e-03  -9.519297e-03   1.896977e-02   4.953738e+00   2.192284e+02             26
+    LD          1500   9.283632e-03   4.858918e-03   4.424714e-03   5.055926e+00   2.966816e+02             26
+    LD          1900   9.274785e-03   8.183461e-03   1.091323e-03   5.219232e+00   3.640383e+02             26
+     P          2000   9.274488e-03   8.183165e-03   1.091323e-03   5.315377e+00   3.762668e+02             26
+    LD          2326   9.274249e-03   9.015738e-03   2.585111e-04   5.324081e+00   4.368829e+02             26
+    LD          2740   9.274216e-03   9.214614e-03   5.960203e-05   5.426222e+00   5.049554e+02             26
+     P          3000   9.274214e-03   9.214612e-03   5.960203e-05   5.525634e+00   5.429241e+02             26
+    LD          3178   9.274214e-03   9.262304e-03   1.190966e-05   5.530520e+00   5.746296e+02             26
+    LD          3636   9.274214e-03   9.271595e-03   2.619296e-06   5.695901e+00   6.383538e+02             26
+     P          4000   9.274214e-03   9.271595e-03   2.619296e-06   5.797390e+00   6.899656e+02             26
+    LD          4064   9.274214e-03   9.273578e-03   6.357779e-07   5.799338e+00   7.007696e+02             26
+    LD          4470   9.274214e-03   9.274066e-03   1.484091e-07   5.901273e+00   7.574637e+02             26
+    LD          4865   9.274214e-03   9.274179e-03   3.537488e-08   6.002983e+00   8.104305e+02             26
+  Last          4865   9.274214e-03   9.274179e-03   3.537488e-08   6.260506e+00   7.770938e+02             26
 ----------------------------------------------------------------------------------------------------------------
-    PP          4865   9.274214e-03   9.274179e-03   3.537488e-08   6.083480e+00   7.997068e+02             26
+    PP          4865   9.274214e-03   9.274179e-03   3.537488e-08   6.354452e+00   7.656050e+02             26
 ----------------------------------------------------------------------------------------------------------------
-  6.241631 seconds (31.77 M allocations: 3.930 GiB, 13.49% gc time, 2.10% compilation time)

Now, convergence is reached within 10000 iterations, and the size of the final active set is considerably smaller than before, thanks to the reduced dimension.

Uniqueness pattern

In this specific case, there is a bigger symmetry group that we can exploit. Its action roughly allows us to work in the subspace respecting the structure of the objective point p, that is, to average over tensor entries that have the same value in p. Although quite general, this kind of symmetry is not always applicable, and great care has to be taken when using it, in particular, to ensure that there exists a suitable group action whose Reynolds operator corresponds to this averaging procedure. In our current case, the theoretical study enabling this further symmetrization can be found here.

function build_reynolds_unique(p::Array{T, N}) where {T <: Number, N}
+  6.446800 seconds (31.77 M allocations: 3.930 GiB, 12.39% gc time, 3.03% compilation time)

Now, convergence is reached within 10000 iterations, and the size of the final active set is considerably smaller than before, thanks to the reduced dimension.

Uniqueness pattern

In this specific case, there is a bigger symmetry group that we can exploit. Its action roughly allows us to work in the subspace respecting the structure of the objective point p, that is, to average over tensor entries that have the same value in p. Although quite general, this kind of symmetry is not always applicable, and great care has to be taken when using it, in particular, to ensure that there exists a suitable group action whose Reynolds operator corresponds to this averaging procedure. In our current case, the theoretical study enabling this further symmetrization can be found here.

function build_reynolds_unique(p::Array{T, N}) where {T <: Number, N}
     ptol = round.(p; digits=8)
     ptol[ptol .== zero(T)] .= zero(T) # transform -0.0 into 0.0 as isequal(0.0, -0.0) is false
     uniquetol = unique(ptol[:])
@@ -214,31 +214,31 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   4.132812e+01  -4.029553e+01   8.162365e+01   0.000000e+00            Inf              1
-    LD             4   2.991558e+00  -4.896575e+00   7.888134e+00   1.669692e-01   2.395651e+01              3
-    LD            19   4.246112e-01  -2.691388e+00   3.115999e+00   2.979763e-01   6.376345e+01              3
-    LD            64   1.802165e-01  -4.514355e-01   6.316521e-01   7.264890e-01   8.809494e+01              5
-    LD            81   2.002637e-02  -1.558821e-01   1.759085e-01   1.064853e+00   7.606686e+01              5
-    LD           118   1.364017e-02  -4.146471e-02   5.510488e-02   1.153261e+00   1.023186e+02              3
-    LD           189   9.505756e-03  -8.772067e-03   1.827782e-02   1.328081e+00   1.423106e+02              4
-    LD           204   9.297335e-03   2.741829e-03   6.555506e-03   1.488714e+00   1.370311e+02              4
-    LD           219   9.278805e-03   6.432857e-03   2.845948e-03   1.577779e+00   1.388027e+02              4
-    LD           237   9.274768e-03   8.268553e-03   1.006215e-03   1.669422e+00   1.419653e+02              4
-    LD           255   9.274310e-03   8.844241e-03   4.300684e-04   1.765338e+00   1.444483e+02              4
-    LD           270   9.274227e-03   9.117815e-03   1.564121e-04   1.918723e+00   1.407186e+02              4
-    LD           288   9.274216e-03   9.207613e-03   6.660359e-05   2.007212e+00   1.434826e+02              4
-    LD           303   9.274214e-03   9.249991e-03   2.422305e-05   2.096149e+00   1.445508e+02              4
-    LD           321   9.274214e-03   9.263899e-03   1.031492e-05   2.183235e+00   1.470295e+02              4
-    LD           336   9.274214e-03   9.270465e-03   3.748623e-06   2.270358e+00   1.479943e+02              4
-    LD           354   9.274214e-03   9.272618e-03   1.595962e-06   2.423908e+00   1.460452e+02              4
-    LD           369   9.274214e-03   9.273636e-03   5.784864e-07   2.512888e+00   1.468430e+02              4
-    LD           384   9.274214e-03   9.273964e-03   2.499339e-07   2.601606e+00   1.476011e+02              4
-    LD           397   9.274214e-03   9.274104e-03   1.101735e-07   2.688539e+00   1.476638e+02              4
-    LD           412   9.274214e-03   9.274166e-03   4.841374e-08   2.786392e+00   1.478615e+02              4
-  Last           412   9.274214e-03   9.274166e-03   4.841374e-08   3.026762e+00   1.361191e+02              4
+    LD             4   2.991558e+00  -4.896575e+00   7.888134e+00   1.680742e-01   2.379902e+01              3
+    LD            19   4.246112e-01  -2.691388e+00   3.115999e+00   2.990617e-01   6.353204e+01              3
+    LD            64   1.802165e-01  -4.514355e-01   6.316521e-01   7.392607e-01   8.657298e+01              5
+    LD            81   2.002637e-02  -1.558821e-01   1.759085e-01   1.086352e+00   7.456148e+01              5
+    LD           118   1.364017e-02  -4.146471e-02   5.510488e-02   1.181202e+00   9.989825e+01              3
+    LD           189   9.505756e-03  -8.772067e-03   1.827782e-02   1.366817e+00   1.382774e+02              4
+    LD           204   9.297335e-03   2.741829e-03   6.555506e-03   1.460140e+00   1.397126e+02              4
+    LD           219   9.278805e-03   6.432857e-03   2.845948e-03   1.619136e+00   1.352573e+02              4
+    LD           237   9.274768e-03   8.268553e-03   1.006215e-03   1.712524e+00   1.383922e+02              4
+    LD           255   9.274310e-03   8.844241e-03   4.300684e-04   1.803948e+00   1.413566e+02              4
+    LD           270   9.274227e-03   9.117815e-03   1.564121e-04   1.897124e+00   1.423207e+02              4
+    LD           288   9.274216e-03   9.207613e-03   6.660359e-05   2.003137e+00   1.437745e+02              4
+    LD           303   9.274214e-03   9.249991e-03   2.422305e-05   2.154556e+00   1.406322e+02              4
+    LD           321   9.274214e-03   9.263899e-03   1.031492e-05   2.247762e+00   1.428087e+02              4
+    LD           336   9.274214e-03   9.270465e-03   3.748623e-06   2.338998e+00   1.436513e+02              4
+    LD           354   9.274214e-03   9.272618e-03   1.595962e-06   2.431534e+00   1.455871e+02              4
+    LD           369   9.274214e-03   9.273636e-03   5.784864e-07   2.523873e+00   1.462039e+02              4
+    LD           384   9.274214e-03   9.273964e-03   2.499339e-07   2.675746e+00   1.435114e+02              4
+    LD           397   9.274214e-03   9.274104e-03   1.101735e-07   2.766621e+00   1.434963e+02              4
+    LD           412   9.274214e-03   9.274166e-03   4.841374e-08   2.857552e+00   1.441794e+02              4
+  Last           412   9.274214e-03   9.274166e-03   4.841374e-08   3.096753e+00   1.330426e+02              4
 ----------------------------------------------------------------------------------------------------------------
-    PP           412   9.274214e-03   9.274166e-03   4.841374e-08   3.114500e+00   1.322845e+02              4
+    PP           412   9.274214e-03   9.274166e-03   4.841374e-08   3.188152e+00   1.292285e+02              4
 ----------------------------------------------------------------------------------------------------------------
-  3.205463 seconds (16.52 M allocations: 1.955 GiB, 13.19% gc time, 3.85% compilation time)

Reduction of the memory footprint of the iterate

In the previous run, the dimension reduction is mathematically exploited to accelerate the algorithm, but it is not used to effectively work in a subspace of reduced dimension. Indeed, the iterate, although symmetric, was still a full tensor. As a last example of the speedup obtainable through symmetry reduction, we show how to map the computations into a space whose physical dimension is also reduced during the algorithm. This makes all in-place operations marginally faster, which can lead, in bigger instances, to significant accelerations, especially for active set based algorithms in the regime where many lazy iterations are performed. We refer to the example symmetric.jl for a small benchmark with symmetric matrices.

function build_reduce_inflate(p::Array{T, N}) where {T <: Number, N}
+  3.362287 seconds (16.52 M allocations: 1.955 GiB, 13.99% gc time, 3.44% compilation time)

Reduction of the memory footprint of the iterate

In the previous run, the dimension reduction is mathematically exploited to accelerate the algorithm, but it is not used to effectively work in a subspace of reduced dimension. Indeed, the iterate, although symmetric, was still a full tensor. As a last example of the speedup obtainable through symmetry reduction, we show how to map the computations into a space whose physical dimension is also reduced during the algorithm. This makes all in-place operations marginally faster, which can lead, in bigger instances, to significant accelerations, especially for active set based algorithms in the regime where many lazy iterations are performed. We refer to the example symmetric.jl for a small benchmark with symmetric matrices.

function build_reduce_inflate(p::Array{T, N}) where {T <: Number, N}
     ptol = round.(p; digits=8)
     ptol[ptol .== zero(T)] .= zero(T) # transform -0.0 into 0.0 as isequal(0.0, -0.0) is false
     uniquetol = unique(ptol[:])
@@ -284,27 +284,27 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   4.132812e+01  -4.029553e+01   8.162365e+01   0.000000e+00            Inf              1
-    LD             4   2.991558e+00  -4.896575e+00   7.888134e+00   1.752391e-01   2.282596e+01              3
-    LD            13   2.369634e-01  -2.280882e+00   2.517846e+00   4.725638e-01   2.750951e+01              4
-    LD            19   1.668285e-01  -3.158581e-01   4.826866e-01   7.316609e-01   2.596831e+01              5
-    LD            32   1.699135e-02  -5.050512e-02   6.749647e-02   9.793102e-01   3.267606e+01              5
-    LD           108   9.519110e-03  -6.845621e-03   1.636473e-02   1.155463e+00   9.346901e+01              4
-    LD           121   9.297134e-03   3.819920e-03   5.477214e-03   1.254563e+00   9.644789e+01              4
-    LD           130   9.276520e-03   7.486464e-03   1.790056e-03   1.341637e+00   9.689657e+01              4
-    LD           139   9.274453e-03   8.643094e-03   6.313589e-04   1.498937e+00   9.273239e+01              4
-    LD           148   9.274257e-03   8.982108e-03   2.921491e-04   1.587442e+00   9.323173e+01              4
-    LD           161   9.274223e-03   9.143488e-03   1.307348e-04   1.675531e+00   9.608891e+01              4
-    LD           174   9.274215e-03   9.222314e-03   5.190121e-05   1.762325e+00   9.873321e+01              4
-    LD           187   9.274214e-03   9.250977e-03   2.323707e-05   1.849018e+00   1.011348e+02              4
-    LD           202   9.274214e-03   9.265798e-03   8.416106e-06   1.999576e+00   1.010214e+02              4
-    LD           215   9.274214e-03   9.270455e-03   3.759433e-06   2.088111e+00   1.029639e+02              4
-    LD           228   9.274214e-03   9.272829e-03   1.384972e-06   2.175815e+00   1.047883e+02              4
-    LD           244   9.274214e-03   9.273611e-03   6.025805e-07   2.263391e+00   1.078029e+02              4
-    LD           257   9.274214e-03   9.273974e-03   2.395475e-07   2.350174e+00   1.093536e+02              4
-    LD           270   9.274214e-03   9.274107e-03   1.073458e-07   2.499995e+00   1.080002e+02              4
-    LD           285   9.274214e-03   9.274175e-03   3.860101e-08   2.588411e+00   1.101062e+02              4
-  Last           285   9.274214e-03   9.274175e-03   3.860101e-08   2.762260e+00   1.031764e+02              4
+    LD             4   2.991558e+00  -4.896575e+00   7.888134e+00   1.772067e-01   2.257252e+01              3
+    LD            13   2.369634e-01  -2.280882e+00   2.517846e+00   4.016847e-01   3.236369e+01              4
+    LD            19   1.668285e-01  -3.158581e-01   4.826866e-01   7.477080e-01   2.541099e+01              5
+    LD            32   1.699135e-02  -5.050512e-02   6.749647e-02   9.287048e-01   3.445659e+01              5
+    LD           108   9.519110e-03  -6.845621e-03   1.636473e-02   1.176455e+00   9.180123e+01              4
+    LD           121   9.297134e-03   3.819920e-03   5.477214e-03   1.267403e+00   9.547080e+01              4
+    LD           130   9.276520e-03   7.486464e-03   1.790056e-03   1.358734e+00   9.567730e+01              4
+    LD           139   9.274453e-03   8.643094e-03   6.313589e-04   1.450988e+00   9.579677e+01              4
+    LD           148   9.274257e-03   8.982108e-03   2.921491e-04   1.611444e+00   9.184306e+01              4
+    LD           161   9.274223e-03   9.143488e-03   1.307348e-04   1.698784e+00   9.477370e+01              4
+    LD           174   9.274215e-03   9.222314e-03   5.190121e-05   1.791677e+00   9.711570e+01              4
+    LD           187   9.274214e-03   9.250977e-03   2.323707e-05   1.883381e+00   9.928955e+01              4
+    LD           202   9.274214e-03   9.265798e-03   8.416106e-06   1.974227e+00   1.023185e+02              4
+    LD           215   9.274214e-03   9.270455e-03   3.759433e-06   2.126143e+00   1.011221e+02              4
+    LD           228   9.274214e-03   9.272829e-03   1.384972e-06   2.218957e+00   1.027510e+02              4
+    LD           244   9.274214e-03   9.273611e-03   6.025805e-07   2.311197e+00   1.055730e+02              4
+    LD           257   9.274214e-03   9.273974e-03   2.395475e-07   2.404381e+00   1.068882e+02              4
+    LD           270   9.274214e-03   9.274107e-03   1.073458e-07   2.497234e+00   1.081196e+02              4
+    LD           285   9.274214e-03   9.274175e-03   3.860101e-08   2.652593e+00   1.074420e+02              4
+  Last           285   9.274214e-03   9.274175e-03   3.860101e-08   2.836403e+00   1.004794e+02              4
 ----------------------------------------------------------------------------------------------------------------
-    PP           285   9.274214e-03   9.274175e-03   3.860101e-08   2.848886e+00   1.000391e+02              4
+    PP           285   9.274214e-03   9.274175e-03   3.860101e-08   2.927928e+00   9.733847e+01              4
 ----------------------------------------------------------------------------------------------------------------
-  2.945935 seconds (15.43 M allocations: 1.825 GiB, 11.84% gc time, 4.63% compilation time)

This page was generated using Literate.jl.

+ 3.101475 seconds (15.43 M allocations: 1.825 GiB, 13.13% gc time, 4.20% compilation time)

This page was generated using Literate.jl.

diff --git a/previews/PR513/index.html b/previews/PR513/index.html index 0c89605ed..4389caf20 100644 --- a/previews/PR513/index.html +++ b/previews/PR513/index.html @@ -40,4 +40,4 @@ ...

If you need the plotting utilities in your own code, make sure Plots.jl is included in your current project and run:

using Plots
 using FrankWolfe
 
-include(joinpath(dirname(pathof(FrankWolfe)), "../examples/plot_utils.jl"))
+include(joinpath(dirname(pathof(FrankWolfe)), "../examples/plot_utils.jl")) diff --git a/previews/PR513/reference/0_reference/index.html b/previews/PR513/reference/0_reference/index.html index 837756942..bf748f2bb 100644 --- a/previews/PR513/reference/0_reference/index.html +++ b/previews/PR513/reference/0_reference/index.html @@ -1,2 +1,2 @@ -API Reference · FrankWolfe.jl +API Reference · FrankWolfe.jl diff --git a/previews/PR513/reference/1_algorithms/index.html b/previews/PR513/reference/1_algorithms/index.html index 89e7d0c25..d1ef6b2bf 100644 --- a/previews/PR513/reference/1_algorithms/index.html +++ b/previews/PR513/reference/1_algorithms/index.html @@ -1,2 +1,2 @@ -Algorithms · FrankWolfe.jl

Algorithms

This section contains all main algorithms of the package. These are the ones typical users will call.

The typical signature for these algorithms is:

my_algorithm(f, grad!, lmo, x0)

Standard algorithms

FrankWolfe.frank_wolfe (Method)
frank_wolfe(f, grad!, lmo, x0; ...)

Simplest form of the Frank-Wolfe algorithm. Returns a tuple (x, v, primal, dual_gap, traj_data) with:

  • x final iterate
  • v last vertex from the LMO
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • traj_data vector of trajectory information.
source
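
As a rough illustration of the `(f, grad!, lmo, x0)` calling convention and of the quantities in the returned tuple, here is a minimal sketch of a Frank-Wolfe loop written in plain Julia. It is not the package's implementation: `toy_frank_wolfe` and `simplex_lmo` are hypothetical names, the LMO is a plain function rather than a FrankWolfe.LinearMinimizationOracle, the step size 2 / (t + 2) is the classical open-loop rule, and no trajectory data is collected.

```julia
using LinearAlgebra

# Hypothetical stand-in for an LMO over the probability simplex:
# returns the vertex e_i that minimizes dot(direction, v).
function simplex_lmo(direction::AbstractVector)
    v = zeros(eltype(direction), length(direction))
    v[argmin(direction)] = one(eltype(direction))
    return v
end

# Toy Frank-Wolfe loop (assumption: open-loop step size 2 / (t + 2)).
function toy_frank_wolfe(f, grad!, lmo, x0; max_iteration=1000, epsilon=1e-7)
    x = copy(x0)
    gradient = similar(x)
    for t in 0:max_iteration-1
        grad!(gradient, x)                       # in-place gradient evaluation
        v = lmo(gradient)                        # vertex from the linear oracle
        dot(gradient, x - v) <= epsilon && break # stop on a small Frank-Wolfe gap
        x .+= (2 / (t + 2)) .* (v .- x)          # convex-combination update
    end
    # Recompute the vertex and gap at the final iterate so the outputs are consistent.
    grad!(gradient, x)
    v = lmo(gradient)
    dual_gap = dot(gradient, x - v)
    return x, v, f(x), dual_gap
end

# Example problem: squared distance to a point inside the simplex.
target = [0.1, 0.2, 0.7]
f(x) = 0.5 * norm(x - target)^2
grad!(storage, x) = (storage .= x .- target)
x0 = [1.0, 0.0, 0.0]

x, v, primal, dual_gap = toy_frank_wolfe(f, grad!, simplex_lmo, x0)
```

The in-place `grad!(storage, x)` form mirrors the signature shown above; the iterate stays feasible because every update is a convex combination of the current point and a vertex.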
FrankWolfe.stochastic_frank_wolfe (Method)
stochastic_frank_wolfe(f::StochasticObjective, lmo, x0; ...)

Stochastic version of Frank-Wolfe. It evaluates the objective and gradient stochastically and is implemented through the FrankWolfe.StochasticObjective interface.

Keyword arguments include batch_size to pass a fixed batch size, or a batch_iterator implementing batch_size = FrankWolfe.batchsize_iterate(batch_iterator) for algorithms like the one in "Variance-reduced and projection-free stochastic optimization" (E. Hazan, H. Luo, 2016).

Similarly, a constant momentum can be passed or replaced by a momentum_iterator implementing momentum = FrankWolfe.momentum_iterate(momentum_iterator).

source
FrankWolfe.block_coordinate_frank_wolfe (Function)
block_coordinate_frank_wolfe(f, grad!, lmo::ProductLMO{N}, x0; ...) where {N}

Block-coordinate version of the Frank-Wolfe algorithm. Minimizes objective f over the product of feasible domains specified by the lmo. The optional argument update_order is of type FrankWolfe.BlockCoordinateUpdateOrder and controls the order in which the blocks are updated. The argument update_step is a single instance or a tuple of FrankWolfe.UpdateStep and defines which FW algorithms to use to update the iterates in the different blocks.

The method returns a tuple (x, v, primal, dual_gap, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • traj_data vector of trajectory information.

See S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher 2013 and A. Beck, E. Pauwels and S. Sabach 2015 for more details about Block-Coordinate Frank-Wolfe.

source
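
To make the block structure concrete, the following is a minimal sketch of a cyclic block-coordinate loop over a product of two probability simplices, in plain Julia. All names (`toy_block_coordinate_fw`, `simplex_vertex`) are hypothetical, the cyclic order and the step size 2 / (t + 2) are assumptions chosen for simplicity, and the package's ProductLMO, update orders, and update steps are not used here.

```julia
using LinearAlgebra

# Hypothetical LMO for the probability simplex: vertex minimizing dot(direction, v).
function simplex_vertex(direction)
    v = zeros(length(direction))
    v[argmin(direction)] = 1.0
    return v
end

# Toy cyclic block-coordinate loop; `grad!` fills one gradient block per block of x.
function toy_block_coordinate_fw(grad!, lmos, x0; max_iteration=500)
    x = [copy(block) for block in x0]
    gradient = [similar(block) for block in x0]
    for t in 0:max_iteration-1
        gamma = 2 / (t + 2)
        for i in eachindex(lmos)          # visit the blocks in a fixed cyclic order
            grad!(gradient, x)            # gradient at the current (partially updated) point
            v = lmos[i](gradient[i])      # the oracle of block i only sees its own gradient block
            xi = x[i]
            xi .+= gamma .* (v .- xi)     # update block i in place, other blocks unchanged
        end
    end
    return x
end

# Coupled objective over two simplices: pull x[1] towards `a` and x[2] towards x[1].
a = [0.2, 0.3, 0.5]
f(x) = 0.5 * norm(x[1] - x[2])^2 + 0.5 * norm(x[1] - a)^2
function grad!(gradient, x)
    gradient[1] .= (x[1] .- x[2]) .+ (x[1] .- a)
    gradient[2] .= x[2] .- x[1]
    return gradient
end

x0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lmos = (simplex_vertex, simplex_vertex)
x = toy_block_coordinate_fw(grad!, lmos, x0)
```

Each block remains in its own simplex because it is only ever moved towards a vertex of that simplex, which is the reason the result can be read as a cartesian product of iterates.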

Active-set based methods

The following algorithms maintain the representation of the iterates as a convex combination of vertices.

Away-step

Pairwise Frank-Wolfe

Blended Conditional Gradient

FrankWolfe.blended_conditional_gradient (Method)
blended_conditional_gradient(f, grad!, lmo, x0)

Entry point for the Blended Conditional Gradient algorithm. See Braun, Gábor, et al. "Blended conditional gradients", ICML 2019. The method works on an active set like FrankWolfe.away_frank_wolfe, performing gradient descent over the convex hull of active vertices, removing vertices when their weight drops to 0 and adding new vertices by calling the linear oracle in a lazy fashion.

source
FrankWolfe.build_reduced_problem (Method)
build_reduced_problem(atoms::AbstractVector{<:AbstractVector}, hessian, weights, gradient, tolerance)

Given an active set formed by vectors, a (constant) Hessian, and a gradient, constructs a quadratic problem over the unit probability simplex that is equivalent to minimizing the original function over the convex hull of the active set. If λ are the barycentric coordinates, of dimension equal to the cardinality of the active set, the objective function is:

f(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ

If the current iterate has a strong-Wolfe gap over the convex hull of the active set that is below the tolerance, we return nothing (as there is nothing to do).

source
FrankWolfe.lp_separation_oracleMethod

Returns either a tuple (y, val) with y an atom from the active set satisfying the progress criterion and val the corresponding gap dot(y, direction) or the same tuple with y from the LMO.

inplace_loop controls whether the iterate type allows in-place writes. kwargs are passed on to the LMO oracle.

source
FrankWolfe.minimize_over_convex_hull!Method
minimize_over_convex_hull!

Given a function f with gradient grad! and an active set active_set, this function will minimize the function over the convex hull of the active set until the strong-Wolfe gap over the active set is below tolerance.

It will either directly minimize over the convex hull using simplex gradient descent, or it will transform the problem to barycentric coordinates and minimize over the unit probability simplex using gradient descent or Nesterov's accelerated gradient descent.

source
FrankWolfe.simplex_gradient_descent_over_convex_hullMethod
simplex_gradient_descent_over_convex_hull(f, grad!, gradient, active_set, tolerance, t, time_start, non_simplex_iter)

Minimizes an objective function over the convex hull of the active set until the Strong-Wolfe gap is below tolerance using simplex gradient descent.

source

Blended Pairwise Conditional Gradient

Alternating Methods

Problems over intersections of convex sets, i.e.

\[\min_{x \in \bigcap_{i=1}^n P_i} f(x),\]

pose a challenge as one has to combine the information of two or more LMOs.

FrankWolfe.alternating_linear_minimization converts the problem into a series of subproblems over single sets. To find a point within the intersection, one minimizes both the distance to the iterates of the other subproblems and the original objective function.

FrankWolfe.alternating_projections solves feasibility problems over intersections of feasible regions.

FrankWolfe.alternating_linear_minimizationMethod
alternating_linear_minimization(bc_algo::BlockCoordinateMethod, f, grad!, lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Alternating Linear Minimization minimizes the objective f over the intersections of the feasible domains specified by lmos. The tuple x0 defines the initial points for each domain. Returns a tuple (x, v, primal, dual_gap, dist2, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • dist2 is 1/2 of the sum of squared, pairwise distances between iterates
  • traj_data vector of trajectory information.
source
FrankWolfe.alternating_projectionsMethod
alternating_projections(lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Computes a point in the intersection of feasible domains specified by lmos. Returns a tuple (x, v, dual_gap, dist2, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • dual_gap final Frank-Wolfe gap
  • dist2 is 1/2 * sum of squared, pairwise distances between iterates
  • traj_data vector of trajectory information.
source

Index

    +Algorithms · FrankWolfe.jl

    Algorithms

    This section contains all main algorithms of the package. These are the ones typical users will call.

    The typical signature for these algorithms is:

    my_algorithm(f, grad!, lmo, x0)

    Standard algorithms

    FrankWolfe.frank_wolfeMethod
    frank_wolfe(f, grad!, lmo, x0; ...)

    Simplest form of the Frank-Wolfe algorithm. Returns a tuple (x, v, primal, dual_gap, traj_data) with:

    • x final iterate
    • v last vertex from the LMO
    • primal primal value f(x)
    • dual_gap final Frank-Wolfe gap
    • traj_data vector of trajectory information.
    source
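
The loop behind this signature can be sketched in plain Julia. This is a minimal illustration, not the package implementation: `fw_sketch` and `simplex_lmo` are hypothetical names, the step size 2 / (t + 2) is one common choice assumed here, and the keyword names are picked for the sketch only.

```julia
using LinearAlgebra

# Hypothetical helper: LMO of the probability simplex {x ≥ 0, sum(x) = 1}.
# Returns the unit vector at the index where the direction is smallest.
function simplex_lmo(direction)
    v = zeros(length(direction))
    v[argmin(direction)] = 1.0
    return v
end

# Plain Frank-Wolfe with the open-loop step size 2 / (t + 2) (an assumption of this sketch).
function fw_sketch(f, grad!, lmo, x0; max_iteration=1000, epsilon=1e-7)
    x = copy(x0)
    gradient = similar(x)
    v = copy(x0)
    dual_gap = Inf
    for t in 0:max_iteration-1
        grad!(gradient, x)                 # in-place gradient evaluation
        v = lmo(gradient)                  # vertex minimizing <gradient, v>
        dual_gap = dot(gradient, x - v)    # Frank-Wolfe gap at the current x
        dual_gap <= epsilon && break
        gamma = 2 / (t + 2)
        x .+= gamma .* (v .- x)            # convex combination keeps x feasible
    end
    return (x, v, f(x), dual_gap)
end

target = [0.1, 0.2, 0.7]
f(x) = 0.5 * norm(x - target)^2
grad!(storage, x) = (storage .= x .- target)

x, v, primal, dual_gap = fw_sketch(f, grad!, simplex_lmo, [1.0, 0.0, 0.0])
```

The returned tuple mirrors the first four entries listed above; trajectory data is left out of the sketch.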
    FrankWolfe.stochastic_frank_wolfeMethod
    stochastic_frank_wolfe(f::StochasticObjective, lmo, x0; ...)

    Stochastic version of Frank-Wolfe that evaluates the objective and gradient stochastically, implemented through the FrankWolfe.StochasticObjective interface.

    Keyword arguments include batch_size to pass a fixed batch_size or a batch_iterator implementing batch_size = FrankWolfe.batchsize_iterate(batch_iterator) for algorithms like Variance-reduced and projection-free stochastic optimization, E Hazan, H Luo, 2016.

    Similarly, a constant momentum can be passed or replaced by a momentum_iterator implementing momentum = FrankWolfe.momentum_iterate(momentum_iterator).

    source
    FrankWolfe.block_coordinate_frank_wolfeFunction
    block_coordinate_frank_wolfe(f, grad!, lmo::ProductLMO{N}, x0; ...) where {N}

    Block-coordinate version of the Frank-Wolfe algorithm. Minimizes objective f over the product of feasible domains specified by the lmo. The optional argument update_order is of type FrankWolfe.BlockCoordinateUpdateOrder and controls the order in which the blocks are updated. The argument update_step is a single instance or tuple of FrankWolfe.UpdateStep and defines which FW algorithms to use to update the iterates in the different blocks.

    The method returns a tuple (x, v, primal, dual_gap, traj_data) with:

    • x cartesian product of final iterates
    • v cartesian product of last vertices of the LMOs
    • primal primal value f(x)
    • dual_gap final Frank-Wolfe gap
    • traj_data vector of trajectory information.

    See S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher 2013 and A. Beck, E. Pauwels and S. Sabach 2015 for more details about Block-Coordinate Frank-Wolfe.

    source

    Active-set based methods

    The following algorithms maintain the representation of the iterates as a convex combination of vertices.

    Away-step

    Pairwise Frank-Wolfe

    Blended Conditional Gradient

    FrankWolfe.blended_conditional_gradientMethod
    blended_conditional_gradient(f, grad!, lmo, x0)

    Entry point for the Blended Conditional Gradient algorithm. See Braun, Gábor, et al. "Blended conditional gradients" ICML 2019. The method works on an active set like FrankWolfe.away_frank_wolfe, performing gradient descent over the convex hull of active vertices, removing vertices when their weight drops to 0 and adding new vertices by calling the linear oracle in a lazy fashion.

    source
    FrankWolfe.build_reduced_problemMethod
    build_reduced_problem(atoms::AbstractVector{<:AbstractVector}, hessian, weights, gradient, tolerance)

    Given an active set formed by vectors, a (constant) Hessian, and a gradient, constructs a quadratic problem over the unit probability simplex that is equivalent to minimizing the original function over the convex hull of the active set. If λ are the barycentric coordinates of dimension equal to the cardinality of the active set, the objective function is:

    f(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ

    In the case where we find that the current iterate has a strong-Wolfe gap over the convex hull of the active set that is below the tolerance we return nothing (as there is nothing to do).

    source
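
The change of variables x = Σ λ_i a_i can be sketched as follows. This is an illustration under stated assumptions, not the package code: `reduced_quadratic` is a hypothetical name, the atoms are dense vectors, and the gradient passed in is assumed to be the gradient at the current iterate x = Σ w_i a_i.

```julia
using LinearAlgebra

# Hypothetical helper, not the package implementation.
# atoms: vector of atoms a_i; hessian: constant Hessian H;
# weights: current barycentric coordinates w; gradient: ∇f at x = Σ w_i a_i.
function reduced_quadratic(atoms, hessian, weights, gradient)
    A = reduce(hcat, atoms)                  # atoms as columns
    reduced_hessian = A' * hessian * A
    # For a quadratic, ∇f(x) - H x is the constant linear term, so
    # A' * gradient - (A' H A) w = A' * (∇f(x) - H x).
    reduced_linear = A' * gradient - reduced_hessian * weights
    return reduced_linear, reduced_hessian
end

H = [2.0 0.5; 0.5 1.0]
b = [1.0, -1.0]
f(x) = 0.5 * dot(x, H * x) + dot(b, x)

atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [0.5, 0.25, 0.25]
x = sum(w .* atoms)

lin, hess = reduced_quadratic(atoms, H, w, H * x + b)
q(λ) = dot(lin, λ) + 0.5 * dot(λ, hess * λ)
```

For this f, which has no constant term, q(λ) agrees with f evaluated at Σ λ_i a_i for any λ; in general the two differ by a constant that does not affect the minimizer.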
    FrankWolfe.lp_separation_oracleMethod

    Returns either a tuple (y, val) with y an atom from the active set satisfying the progress criterion and val the corresponding gap dot(y, direction) or the same tuple with y from the LMO.

    inplace_loop controls whether the iterate type allows in-place writes. kwargs are passed on to the LMO oracle.

    source
    FrankWolfe.minimize_over_convex_hull!Method
    minimize_over_convex_hull!

    Given a function f with gradient grad! and an active set active_set, this function will minimize the function over the convex hull of the active set until the strong-Wolfe gap over the active set is below tolerance.

    It will either directly minimize over the convex hull using simplex gradient descent, or it will transform the problem to barycentric coordinates and minimize over the unit probability simplex using gradient descent or Nesterov's accelerated gradient descent.

    source
    FrankWolfe.simplex_gradient_descent_over_convex_hullMethod
    simplex_gradient_descent_over_convex_hull(f, grad!, gradient, active_set, tolerance, t, time_start, non_simplex_iter)

    Minimizes an objective function over the convex hull of the active set until the Strong-Wolfe gap is below tolerance using simplex gradient descent.

    source

    Blended Pairwise Conditional Gradient

    Alternating Methods

    Problems over intersections of convex sets, i.e.

    \[\min_{x \in \bigcap_{i=1}^n P_i} f(x),\]

    pose a challenge as one has to combine the information of two or more LMOs.

    FrankWolfe.alternating_linear_minimization converts the problem into a series of subproblems over single sets. To find a point within the intersection, one minimizes both the distance to the iterates of the other subproblems and the original objective function.

    FrankWolfe.alternating_projections solves feasibility problems over intersections of feasible regions.

    FrankWolfe.alternating_linear_minimizationMethod
    alternating_linear_minimization(bc_algo::BlockCoordinateMethod, f, grad!, lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

    Alternating Linear Minimization minimizes the objective f over the intersections of the feasible domains specified by lmos. The tuple x0 defines the initial points for each domain. Returns a tuple (x, v, primal, dual_gap, dist2, traj_data) with:

    • x cartesian product of final iterates
    • v cartesian product of last vertices of the LMOs
    • primal primal value f(x)
    • dual_gap final Frank-Wolfe gap
    • dist2 is 1/2 of the sum of squared, pairwise distances between iterates
    • traj_data vector of trajectory information.
    source
    FrankWolfe.alternating_projectionsMethod
    alternating_projections(lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

    Computes a point in the intersection of feasible domains specified by lmos. Returns a tuple (x, v, dual_gap, dist2, traj_data) with:

    • x cartesian product of final iterates
    • v cartesian product of last vertices of the LMOs
    • dual_gap final Frank-Wolfe gap
    • dist2 is 1/2 * sum of squared, pairwise distances between iterates
    • traj_data vector of trajectory information.
    source
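
The dist2 quantity can be illustrated with a small helper. This is a sketch only: `half_pairwise_dist2` is a hypothetical name, and counting each unordered pair of iterates once is an assumption, since the text above does not say whether pairs are counted once or twice.

```julia
using LinearAlgebra

# Hypothetical helper: half the sum of squared distances over all unordered
# pairs (i < j) of per-set iterates. The pair-counting convention is assumed.
function half_pairwise_dist2(xs)
    total = 0.0
    for i in 1:length(xs), j in i+1:length(xs)
        total += norm(xs[i] - xs[j])^2
    end
    return total / 2
end

xs = ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
d2 = half_pairwise_dist2(xs)
```

A value of zero means all per-set iterates coincide, so that common point lies in every set and hence in the intersection.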

    Index

      diff --git a/previews/PR513/reference/2_lmo/index.html b/previews/PR513/reference/2_lmo/index.html index 4ff29fb15..7a2a75acf 100644 --- a/previews/PR513/reference/2_lmo/index.html +++ b/previews/PR513/reference/2_lmo/index.html @@ -1,2 +1,2 @@ -Linear Minimization Oracles · FrankWolfe.jl

      Linear Minimization Oracles

      The Linear Minimization Oracle (LMO) is a key component called at each iteration of the FW algorithm. Given $d\in \mathcal{X}$, it returns a vertex of the feasible set:

      \[v\in \argmin_{x\in \mathcal{C}} \langle d,x \rangle.\]

      See Combettes, Pokutta 2021 for references on most LMOs implemented in the package and their comparison with projection operators.

      Interface and wrappers

      FrankWolfe.LinearMinimizationOracleType

      Supertype for linear minimization oracles.

      All LMOs must implement compute_extreme_point(lmo::LMO, direction) and return a vector v of the appropriate type.

      source

      All of them are subtypes of FrankWolfe.LinearMinimizationOracle and implement the following method:

      FrankWolfe.compute_extreme_pointFunction
      compute_extreme_point(lmo::LinearMinimizationOracle, direction; kwargs...)

      Computes the point argmin_{v ∈ C} v ⋅ direction with C the set represented by the LMO. Most LMOs feature v as a keyword argument that allows for an in-place computation whenever v is dense. All LMOs should accept keyword arguments that they can ignore.

      source

      We also provide some meta-LMOs wrapping another one with extended behavior:

      FrankWolfe.CachedLinearMinimizationOracleType
      CachedLinearMinimizationOracle{LMO}

      Oracle wrapping another one of type lmo. Subtypes of CachedLinearMinimizationOracle contain a cache of previous solutions.

      By convention, the inner oracle is named inner. Cached optimizers are expected to implement Base.empty! and Base.length.

      source
      FrankWolfe.SingleLastCachedLMOType
      SingleLastCachedLMO{LMO, VT}

      Caches only the last result from an LMO and stores it in last_vertex. Vertices of LMO have to be of type VT if provided.

      source
      FrankWolfe.MultiCacheLMOType
      MultiCacheLMO{N, LMO, A}

      Cache for an LMO storing up to N vertices in the cache, removed in FIFO style. oldest_idx keeps track of the oldest index in the tuple, i.e., the one to replace next. VT, if provided, must be the type of vertices returned by LMO.

      source
      FrankWolfe.VectorCacheLMOType
      VectorCacheLMO{LMO, VT}

      Cache for an LMO storing an unbounded number of vertices of type VT in the cache. VT, if provided, must be the type of vertices returned by LMO.

      source

      Norm balls

      FrankWolfe.EllipsoidLMOType
      EllipsoidLMO(A, c, r)

      Linear minimization over an ellipsoid centered at c of radius r:

      x: (x - c)^T A (x - c) ≤ r

      The LMO stores the factorization F of A that is used to solve linear systems A⁻¹ x. The result of the linear system solve is stored in buffer. The ellipsoid is assumed to be full-dimensional, i.e., A is positive definite.

      source
      FrankWolfe.KNormBallLMOType
      KNormBallLMO{T}(K::Int, right_hand_side::T)

      LMO with feasible set being the K-norm ball in the sense of arXiv:2010.07243, i.e., the convex hull over the union of an L1-ball with radius τ and an L∞-ball with radius τ/K:

      C_{K,τ} = conv { B_1(τ) ∪ B_∞(τ / K) }

      with τ the right_hand_side parameter. The K-norm is defined as the sum of the largest K absolute entries in a vector.

      source
      FrankWolfe.LpNormLMOType
      LpNormLMO{T, p}(right_hand_side)

      LMO with feasible set being an L-p norm ball:

      C = {x ∈ R^n, norm(x, p) ≤ right_hand_side}
      source
      FrankWolfe.NuclearNormLMOType
      NuclearNormLMO{T}(radius)

      LMO over matrices that have a nuclear norm less than radius. The LMO returns the best rank-one approximation matrix with singular value radius, computed with Arpack.

      source
      FrankWolfe.OrderWeightNormLMOType
      OrderWeightNormLMO(weights,radius)

      LMO with feasible set being the ball of the atomic ordered weighted l1 norm (https://arxiv.org/pdf/1409.4271):

      C = {x ∈ R^n, Ω_w(x) ≤ R} 

      The weights are assumed to be positive.

      source
      FrankWolfe.SpectraplexLMOType
      SpectraplexLMO{T,M}(radius::T,gradient_container::M,ensure_symmetry::Bool=true)

      Feasible set

      {X ∈ 𝕊_n^+, trace(X) == radius}

      gradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.

      source
      FrankWolfe.UnitSpectrahedronLMOType
      UnitSpectrahedronLMO{T,M}(radius::T, gradient_container::M)

      Feasible set of PSD matrices with bounded trace:

      {X ∈ 𝕊_n^+, trace(X) ≤ radius}

      gradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.

      source

      Simplex

      FrankWolfe.HyperSimplexOracleType
      HyperSimplexOracle(radius)

      Represents the scaled hypersimplex of radius τ, the convex hull of vectors v such that:

      • v_i ∈ {0, τ}
      • ||v||_0 = k

      Equivalently, this is the convex hull of the vertices of the K-sparse polytope lying in the nonnegative orthant.

      source
      FrankWolfe.UnitHyperSimplexOracleType
      UnitHyperSimplexOracle(radius)

      Represents the scaled unit hypersimplex of radius τ, the convex hull of vectors v such that:

      • v_i ∈ {0, τ}
      • ||v||_0 ≤ k

      Equivalently, this is the intersection of the K-sparse polytope and the nonnegative orthant.

      source
      FrankWolfe.compute_dual_solutionMethod

      Dual costs for a given primal solution to form a primal dual pair for scaled probability simplex. Returns two vectors. The first one is the dual costs associated with the constraints and the second is the reduced costs for the variables.

      source
      FrankWolfe.compute_dual_solutionMethod

      Dual costs for a given primal solution to form a primal dual pair for scaled unit simplex. Returns two vectors. The first one is the dual costs associated with the constraints and the second is the reduced costs for the variables.

      source
      FrankWolfe.compute_extreme_pointMethod

      LMO for scaled probability simplex. Returns a vector with one active value equal to RHS in the most improving (or least degrading) direction.

      source
      FrankWolfe.compute_extreme_pointMethod

      LMO for scaled unit simplex: ∑ x_i ≤ τ, x ≥ 0. Returns either the zero vector or a vector with one active value equal to RHS if there exists an improving direction.

      source

      Polytope

      FrankWolfe.BirkhoffPolytopeLMOType
      BirkhoffPolytopeLMO

      The Birkhoff polytope encodes doubly stochastic matrices. Its vertices are all permutation matrices of side length dimension.

      source
      FrankWolfe.KSparseLMOType
      KSparseLMO{T}(K::Int, right_hand_side::T)

      LMO for the K-sparse polytope:

      C = B_1(τK) ∩ B_∞(τ)

      with τ the right_hand_side parameter. The LMO returns a vector supported on the K entries of direction with the largest absolute values, each taking the value -τ sign(direction_i).

      source
      FrankWolfe.ScaledBoundL1NormBallType
      ScaledBoundL1NormBall(lower_bounds, upper_bounds)

      Polytope similar to a L1-ball with shifted bounds. It is the convex hull of two scaled and shifted unit vectors for each axis (shifted to the center of the polytope, i.e., the elementwise midpoint of the bounds). Lower and upper bounds are passed on as abstract vectors, possibly of different types. For the standard L1-ball, all lower and upper bounds would be -1 and 1.

      source
      FrankWolfe.ScaledBoundLInfNormBallType
      ScaledBoundLInfNormBall(lower_bounds, upper_bounds)

      Polytope similar to a L-inf-ball with shifted bounds or general box constraints. Lower- and upper-bounds are passed on as abstract vectors, possibly of different types. For the standard L-inf ball, all lower- and upper-bounds would be -1 and 1.

      source

      MathOptInterface

      FrankWolfe.MathOptLMOType
      MathOptLMO{OT <: MOI.Optimizer} <: LinearMinimizationOracle

      Linear minimization oracle with feasible space defined through a MathOptInterface.Optimizer. The oracle call sets the direction and reruns the optimizer.

      The direction vector has to be set in the same order of variables as the MOI.ListOfVariableIndices() getter.

      The Boolean use_modify determines if the objective in compute_extreme_point is updated with MOI.modify(o, ::MOI.ObjectiveFunction, ::MOI.ScalarCoefficientChange) or with MOI.set(o, ::MOI.ObjectiveFunction, f). use_modify = true decreases the runtime and memory allocation for models created as an optimizer object and defined directly with MathOptInterface. use_modify = false should be used for CachingOptimizers.

      source
      FrankWolfe.convert_mathoptFunction
      convert_mathopt(lmo::LMO, optimizer::OT; kwargs...) -> MathOptLMO{OT}

      Converts the given LMO to its equivalent MathOptInterface representation using optimizer. Must be implemented by LMOs.

      source

      Index

        +Linear Minimization Oracles · FrankWolfe.jl

        Linear Minimization Oracles

        The Linear Minimization Oracle (LMO) is a key component called at each iteration of the FW algorithm. Given $d\in \mathcal{X}$, it returns a vertex of the feasible set:

        \[v\in \argmin_{x\in \mathcal{C}} \langle d,x \rangle.\]

        See Combettes, Pokutta 2021 for references on most LMOs implemented in the package and their comparison with projection operators.

        Interface and wrappers

        FrankWolfe.LinearMinimizationOracleType

        Supertype for linear minimization oracles.

        All LMOs must implement compute_extreme_point(lmo::LMO, direction) and return a vector v of the appropriate type.

        source

        All of them are subtypes of FrankWolfe.LinearMinimizationOracle and implement the following method:

        FrankWolfe.compute_extreme_pointFunction
        compute_extreme_point(lmo::LinearMinimizationOracle, direction; kwargs...)

        Computes the point argmin_{v ∈ C} v ⋅ direction with C the set represented by the LMO. Most LMOs feature v as a keyword argument that allows for an in-place computation whenever v is dense. All LMOs should accept keyword arguments that they can ignore.

        source
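
To make the argmin concrete, here is a sketch of what such an oracle computes for an L1-ball. The type `L1BallSketch` and the function `extreme_point_sketch` are hypothetical stand-ins written for illustration; they do not subtype or extend anything from the package.

```julia
# Hypothetical stand-in for an LMO type; the package defines its own abstract type.
struct L1BallSketch
    radius::Float64
end

# argmin over the L1-ball of <direction, v>: put all the mass on the coordinate
# with the largest |direction_i|, with sign opposite to direction_i.
# Extra keyword arguments are accepted and ignored, as the text above asks of LMOs.
function extreme_point_sketch(lmo::L1BallSketch, direction; kwargs...)
    v = zeros(length(direction))
    i = argmax(abs.(direction))
    v[i] = -lmo.radius * sign(direction[i])
    return v
end

v = extreme_point_sketch(L1BallSketch(3.0), [0.5, -2.0, 1.0])
```

If the direction is entirely zero, this sketch returns the zero vector, which is feasible but not a vertex; a production oracle would have to pick a convention for that case.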

        We also provide some meta-LMOs wrapping another one with extended behavior:

        FrankWolfe.CachedLinearMinimizationOracleType
        CachedLinearMinimizationOracle{LMO}

        Oracle wrapping another one of type lmo. Subtypes of CachedLinearMinimizationOracle contain a cache of previous solutions.

        By convention, the inner oracle is named inner. Cached optimizers are expected to implement Base.empty! and Base.length.

        source
        FrankWolfe.SingleLastCachedLMOType
        SingleLastCachedLMO{LMO, VT}

        Caches only the last result from an LMO and stores it in last_vertex. Vertices of LMO have to be of type VT if provided.

        source
        FrankWolfe.MultiCacheLMOType
        MultiCacheLMO{N, LMO, A}

        Cache for an LMO storing up to N vertices in the cache, removed in FIFO style. oldest_idx keeps track of the oldest index in the tuple, i.e., the one to replace next. VT, if provided, must be the type of vertices returned by LMO.

        source
        FrankWolfe.VectorCacheLMOType
        VectorCacheLMO{LMO, VT}

        Cache for an LMO storing an unbounded number of vertices of type VT in the cache. VT, if provided, must be the type of vertices returned by LMO.

        source

        Norm balls

        FrankWolfe.EllipsoidLMOType
        EllipsoidLMO(A, c, r)

        Linear minimization over an ellipsoid centered at c of radius r:

        x: (x - c)^T A (x - c) ≤ r

        The LMO stores the factorization F of A that is used to solve linear systems A⁻¹ x. The result of the linear system solve is stored in buffer. The ellipsoid is assumed to be full-dimensional, i.e., A is positive definite.

        source
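
The role of the factorization can be seen from the closed-form minimizer. For a nonzero direction d, the KKT conditions give x = c - sqrt(r / (dᵀ A⁻¹ d)) A⁻¹ d. The sketch below uses a hypothetical helper name and a Cholesky factorization as one possible choice; it is not the package implementation.

```julia
using LinearAlgebra

# Hypothetical helper, not the package implementation.
# Minimizes <d, x> over {x : (x - c)' A (x - c) ≤ r} with A positive definite
# and d nonzero.
function ellipsoid_extreme_point(A, c, r, d)
    F = cholesky(Symmetric(A))    # factorization reused for the linear solve
    buffer = F \ d                # A⁻¹ d
    return c - sqrt(r / dot(d, buffer)) * buffer
end

A = [2.0 0.0; 0.0 0.5]
c = [1.0, -1.0]
v = ellipsoid_extreme_point(A, c, 4.0, [1.0, 0.0])
```

The returned point lies on the boundary of the ellipsoid, which is where a linear function attains its minimum.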
        FrankWolfe.KNormBallLMOType
        KNormBallLMO{T}(K::Int, right_hand_side::T)

        LMO with feasible set being the K-norm ball in the sense of arXiv:2010.07243, i.e., the convex hull over the union of an L1-ball with radius τ and an L∞-ball with radius τ/K:

        C_{K,τ} = conv { B_1(τ) ∪ B_∞(τ / K) }

        with τ the right_hand_side parameter. The K-norm is defined as the sum of the largest K absolute entries in a vector.

        source
        FrankWolfe.LpNormLMOType
        LpNormLMO{T, p}(right_hand_side)

        LMO with feasible set being an L-p norm ball:

        C = {x ∈ R^n, norm(x, p) ≤ right_hand_side}
        source
        FrankWolfe.NuclearNormLMOType
        NuclearNormLMO{T}(radius)

        LMO over matrices that have a nuclear norm less than radius. The LMO returns the best rank-one approximation matrix with singular value radius, computed with Arpack.

        source
        FrankWolfe.OrderWeightNormLMOType
        OrderWeightNormLMO(weights,radius)

        LMO with feasible set being the ball of the atomic ordered weighted l1 norm (https://arxiv.org/pdf/1409.4271):

        C = {x ∈ R^n, Ω_w(x) ≤ R} 

        The weights are assumed to be positive.

        source
        FrankWolfe.SpectraplexLMOType
        SpectraplexLMO{T,M}(radius::T,gradient_container::M,ensure_symmetry::Bool=true)

        Feasible set

        {X ∈ 𝕊_n^+, trace(X) == radius}

        gradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.

        source
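
Over this set, a linear function is minimized at radius · u uᵀ, where u is a unit eigenvector for the smallest eigenvalue of the symmetrized direction. The following sketch uses a full dense eigendecomposition for simplicity, which is an assumption of the example and probably not what one would do at scale; `spectraplex_extreme_point` is a hypothetical name.

```julia
using LinearAlgebra

# Hypothetical helper: vertex of {X ⪰ 0, trace(X) == radius} minimizing <D, X>.
function spectraplex_extreme_point(radius, D)
    S = Symmetric((D + D') / 2)     # symmetrize the linear function
    u = eigen(S).vectors[:, 1]      # eigenvalues of a Symmetric matrix come sorted ascending
    return radius * (u * u')
end

D = [1.0 2.0; 0.0 -1.0]
V = spectraplex_extreme_point(2.0, D)
```

Because V is symmetric, ⟨D, V⟩ equals ⟨S, V⟩, which is radius times the smallest eigenvalue of S.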
        FrankWolfe.UnitSpectrahedronLMOType
        UnitSpectrahedronLMO{T,M}(radius::T, gradient_container::M)

        Feasible set of PSD matrices with bounded trace:

        {X ∈ 𝕊_n^+, trace(X) ≤ radius}

        gradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.

        source

        Simplex

        FrankWolfe.HyperSimplexOracleType
        HyperSimplexOracle(radius)

        Represents the scaled hypersimplex of radius τ, the convex hull of vectors v such that:

        • v_i ∈ {0, τ}
        • ||v||_0 = k

        Equivalently, this is the convex hull of the vertices of the K-sparse polytope lying in the nonnegative orthant.

        source
        FrankWolfe.UnitHyperSimplexOracleType
        UnitHyperSimplexOracle(radius)

        Represents the scaled unit hypersimplex of radius τ, the convex hull of vectors v such that:

        • v_i ∈ {0, τ}
        • ||v||_0 ≤ k

        Equivalently, this is the intersection of the K-sparse polytope and the nonnegative orthant.

        source
        FrankWolfe.compute_dual_solutionMethod

        Dual costs for a given primal solution to form a primal dual pair for scaled probability simplex. Returns two vectors. The first one is the dual costs associated with the constraints and the second is the reduced costs for the variables.

        source
        FrankWolfe.compute_dual_solutionMethod

        Dual costs for a given primal solution to form a primal dual pair for scaled unit simplex. Returns two vectors. The first one is the dual costs associated with the constraints and the second is the reduced costs for the variables.

        source
        FrankWolfe.compute_extreme_pointMethod

        LMO for scaled probability simplex. Returns a vector with one active value equal to RHS in the most improving (or least degrading) direction.

        source
        FrankWolfe.compute_extreme_pointMethod

        LMO for scaled unit simplex: ∑ x_i ≤ τ, x ≥ 0. Returns either the zero vector or a vector with one active value equal to RHS if there exists an improving direction.

        source
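
The difference between the two simplex oracles can be sketched as follows. The helper names are hypothetical and `rhs` plays the role of the right-hand side; the sketch assumes the unit simplex contains the origin, so that the zero vector is a valid answer when no coordinate improves the objective.

```julia
# Hypothetical helpers, not the package methods.

# Scaled probability simplex {x ≥ 0, sum(x) == rhs}: always returns a vertex rhs * e_i.
function probability_simplex_point(rhs, direction)
    v = zeros(length(direction))
    v[argmin(direction)] = rhs
    return v
end

# Scaled unit simplex {x ≥ 0, sum(x) ≤ rhs}: the origin is also a vertex and is
# optimal when no coordinate of the direction is negative.
function unit_simplex_point(rhs, direction)
    v = zeros(length(direction))
    i = argmin(direction)
    if direction[i] < 0
        v[i] = rhs
    end
    return v
end

p = probability_simplex_point(2.0, [3.0, 1.0, 2.0])
u0 = unit_simplex_point(2.0, [3.0, 1.0, 2.0])
u1 = unit_simplex_point(2.0, [3.0, -1.0, 2.0])
```

With an all-positive direction the probability simplex picks the least degrading coordinate, while the unit simplex stays at the origin.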

        Polytope

        FrankWolfe.BirkhoffPolytopeLMOType
        BirkhoffPolytopeLMO

        The Birkhoff polytope encodes doubly stochastic matrices. Its vertices are all permutation matrices of side length dimension.

        source
        FrankWolfe.KSparseLMOType
        KSparseLMO{T}(K::Int, right_hand_side::T)

        LMO for the K-sparse polytope:

        C = B_1(τK) ∩ B_∞(τ)

        with τ the right_hand_side parameter. The LMO returns a vector supported on the K entries of direction with the largest absolute values, each taking the value -τ sign(direction_i).

        source
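
A dense sketch of this oracle, under the assumption that the direction is a plain vector. The helper name `ksparse_extreme_point` is hypothetical and the code is only meant to illustrate the selection of the K largest entries.

```julia
# Hypothetical helper: vertex of B_1(τK) ∩ B_∞(τ) minimizing <direction, v>.
function ksparse_extreme_point(K, tau, direction)
    v = zeros(length(direction))
    # indices of the K entries of direction with the largest absolute value
    idx = partialsortperm(abs.(direction), 1:min(K, length(direction)); rev=true)
    for i in idx
        v[i] = -tau * sign(direction[i])
    end
    return v
end

v = ksparse_extreme_point(2, 1.5, [0.5, -3.0, 2.0, 0.1])
```

Each selected coordinate moves against the sign of the direction by τ, so the result has L∞-norm τ and L1-norm at most τK.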
        FrankWolfe.ScaledBoundL1NormBallType
        ScaledBoundL1NormBall(lower_bounds, upper_bounds)

        Polytope similar to a L1-ball with shifted bounds. It is the convex hull of two scaled and shifted unit vectors for each axis (shifted to the center of the polytope, i.e., the elementwise midpoint of the bounds). Lower and upper bounds are passed on as abstract vectors, possibly of different types. For the standard L1-ball, all lower and upper bounds would be -1 and 1.

        source
        FrankWolfe.ScaledBoundLInfNormBallType
        ScaledBoundLInfNormBall(lower_bounds, upper_bounds)

        Polytope similar to a L-inf-ball with shifted bounds or general box constraints. Lower- and upper-bounds are passed on as abstract vectors, possibly of different types. For the standard L-inf ball, all lower- and upper-bounds would be -1 and 1.

        source

        MathOptInterface

        FrankWolfe.MathOptLMOType
        MathOptLMO{OT <: MOI.Optimizer} <: LinearMinimizationOracle

        Linear minimization oracle with feasible space defined through a MathOptInterface.Optimizer. The oracle call sets the direction and reruns the optimizer.

        The direction vector has to be set in the same order of variables as the MOI.ListOfVariableIndices() getter.

        The Boolean use_modify determines if the objective in compute_extreme_point is updated with MOI.modify(o, ::MOI.ObjectiveFunction, ::MOI.ScalarCoefficientChange) or with MOI.set(o, ::MOI.ObjectiveFunction, f). use_modify = true decreases the runtime and memory allocation for models created as an optimizer object and defined directly with MathOptInterface. use_modify = false should be used for CachingOptimizers.

        source
        FrankWolfe.convert_mathoptFunction
        convert_mathopt(lmo::LMO, optimizer::OT; kwargs...) -> MathOptLMO{OT}

        Converts the given LMO to its equivalent MathOptInterface representation using optimizer. Must be implemented by LMOs.

        source

        Index

          diff --git a/previews/PR513/reference/3_backend/index.html b/previews/PR513/reference/3_backend/index.html index 21e95aa38..f229cbcb6 100644 --- a/previews/PR513/reference/3_backend/index.html +++ b/previews/PR513/reference/3_backend/index.html @@ -1,5 +1,5 @@ -Utilities and data structures · FrankWolfe.jl

          Utilities and data structures

          Active set

          FrankWolfe.AbstractActiveSetType
          AbstractActiveSet{AT, R, IT}

          Abstract type for an active set of atoms of type AT with weights of type R and iterate of type IT. An active set is typically expected to have a field weights, a field atoms, and a field x. Otherwise, all active set methods from src/active_set.jl can be overwritten.

          source
          FrankWolfe.ActiveSetType
          ActiveSet{AT, R, IT}

          Represents an active set of extreme vertices collected in a FW algorithm, along with their coefficients (λ_i, a_i). R is the type of the λ_i, AT is the type of the atoms a_i. The iterate x = ∑λ_i a_i is stored in x with type IT.

          source
          Base.copyMethod

          Copies an active set, the weight and atom vectors and the iterate. Individual atoms are not copied.

          source
          FrankWolfe.active_set_argminMethod
          active_set_argmin(active_set::AbstractActiveSet, direction)

          Computes the linear minimizer in the direction on the active set. Returns (λ_i, a_i, i)

          source
          FrankWolfe.active_set_argminmaxMethod
          active_set_argminmax(active_set::AbstractActiveSet, direction)

          Computes the linear minimizer and maximizer in the direction on the active set. Returns (λ_min, a_min, i_min, val_min, λ_max, a_max, i_max, val_max, val_max-val_min ≥ Φ)

          source
          FrankWolfe.active_set_update!Method
          active_set_update!(active_set::AbstractActiveSet, lambda, atom)

          Adds the atom to the active set with weight lambda or adds lambda to existing atom.

          source
          FrankWolfe.compute_active_set_iterate!Method
          compute_active_set_iterate!(active_set::AbstractActiveSet) -> x

          Recomputes from scratch the iterate x from the current weights and vertices of the active set. Returns the iterate x.

          source
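
The bookkeeping described in this section can be sketched with a minimal container. Everything here is hypothetical: the struct, the function names, and in particular the rescaling of existing weights by (1 - lambda), which is an assumption made so that the weights remain a convex combination and is not stated in the text above.

```julia
# Hypothetical minimal active set; the package type carries more functionality.
mutable struct ActiveSetSketch
    weights::Vector{Float64}
    atoms::Vector{Vector{Float64}}
    x::Vector{Float64}
end

# Recompute x = Σ λ_i a_i from scratch.
function recompute_iterate!(aset::ActiveSetSketch)
    fill!(aset.x, 0.0)
    for (λ, a) in zip(aset.weights, aset.atoms)
        aset.x .+= λ .* a
    end
    return aset.x
end

# Add `atom` with weight `lambda`, or add `lambda` to its weight if already present.
function update_sketch!(aset::ActiveSetSketch, lambda, atom)
    aset.weights .*= (1 - lambda)          # assumed rescaling, keeps the weights summing to one
    idx = findfirst(==(atom), aset.atoms)
    if idx === nothing
        push!(aset.weights, lambda)
        push!(aset.atoms, atom)
    else
        aset.weights[idx] += lambda
    end
    return recompute_iterate!(aset)
end

aset = ActiveSetSketch([1.0], [[1.0, 0.0]], [1.0, 0.0])
update_sketch!(aset, 0.25, [0.0, 1.0])   # new atom
update_sketch!(aset, 0.5, [1.0, 0.0])    # existing atom gains weight
```

Recomputing the iterate from scratch after every update is wasteful but keeps the sketch short; an incremental update of x would avoid the full sum.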

          Functions and gradients

          FrankWolfe.ObjectiveFunctionType
          ObjectiveFunction

          Represents an objective function optimized by algorithms. Subtypes of ObjectiveFunction must implement at least

          • compute_value(::ObjectiveFunction, x) for primal value evaluation
          • compute_gradient(::ObjectiveFunction, x) for gradient evaluation.

          and optionally compute_value_gradient(::ObjectiveFunction, x) returning the (primal, gradient) pair. compute_gradient may always use the same storage and return a reference to it.

          source
          FrankWolfe.SimpleFunctionObjectiveType
          SimpleFunctionObjective{F,G,S}

          An objective function built from separate primal objective f(x) and in-place gradient function grad!(storage, x). It keeps an internal storage of type S used to evaluate the gradient in-place.

          source
          FrankWolfe.StochasticObjectiveType
          StochasticObjective{F, G, XT, S}(f::F, grad!::G, xs::XT, storage::S)

          Represents a composite function evaluated with stochastic gradient. f(θ, x) evaluates the loss for a single data point x and parameter θ. grad!(storage, θ, x) adds to storage the partial gradient with respect to data point x at parameter θ. xs must be an indexable iterable (Vector{Vector{Float64}} for instance). Functions using a StochasticObjective have optional keyword arguments rng, batch_size and full_evaluation controlling whether the function should be evaluated over all data points.

          Note: grad! must not reset the storage to 0 before adding to it.

          source
          FrankWolfe.compute_gradientFunction
          compute_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes the gradient of f at x. May return a reference to an internal storage.

          source
          FrankWolfe.compute_value_gradientMethod
          compute_value_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes in one call the pair (value, gradient) evaluated at x. By default, calls compute_value and compute_gradient with keywords kwargs passed down to both.

          source

          Callbacks

          Custom vertex storage

          Custom extreme point types

          For some feasible sets, the extreme points of the feasible set returned by the LMO possess a specific structure that can be represented in an efficient manner both for storage and for common operations like scaling and addition with an iterate. They are presented below:

          Utils

          FrankWolfe.DeletedVertexStorageType

          Vertex storage to store dropped vertices or find a suitable direction in lazy settings. The algorithm will look for at most return_kth suitable atoms before returning the best. See Extra-lazification with a vertex storage for usage.

          A vertex storage can be any type that implements two operations:

          1. Base.push!(storage, atom) to add an atom to the storage.

          Note that it is the storage type's responsibility to ensure uniqueness of the atoms present.

          2. storage_find_argmin_vertex(storage, direction, lazy_threshold) -> (found, vertex)

          returning whether a vertex with sufficient progress was found and the vertex. It is up to the storage to remove vertices (or not) when they have been picked up.

          source
          FrankWolfe.ExpMomentumIteratorType
          ExpMomentumIterator{T}

          Iterator for the momentum used in the variant of Stochastic Frank-Wolfe. Momentum coefficients are the values of the iterator: ρ_t = 1 - num / (offset + t)^exp

          The state corresponds to the iteration count.

          Source: Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization Aryan Mokhtari, Hamed Hassani, Amin Karbasi, JMLR 2020.

          source
          FrankWolfe.IncrementBatchIteratorType
          IncrementBatchIterator(starting_batch_size, max_batch_size, [increment = 1])

          Batch size starting at starting_batch_size and incrementing by increment at every iteration.

          source
          FrankWolfe.batchsize_iterateFunction
          batchsize_iterate(iter::BatchSizeIterator) -> b

          Method to implement for a batch size iterator of type BatchSizeIterator. Calling batchsize_iterate returns the next batch size and typically updates the internal state of iter.

          source
          FrankWolfe.momentum_iterateFunction
          momentum_iterate(iter::MomentumIterator) -> ρ

          Method to implement for a type MomentumIterator. Returns the next momentum value ρ and updates the iterator internal state.

          source
          FrankWolfe.muladd_memory_modeMethod
          muladd_memory_mode(memory_mode::MemoryEmphasis, storage, x, gamma::Real, d)

          Performs storage = x - gamma * d in-place or not depending on MemoryEmphasis

          source
          FrankWolfe.trajectory_callbackMethod
          trajectory_callback(storage)

          Callback pushing the state at each iteration to the passed storage. Only the first 5 fields of the state are stored, usually: (t, primal, dual, dual_gap, time)

          source

          Oracle counting trackers

          The following structures wrap given oracles so that they behave the same way but additionally track the number of calls.

          Also see the example Tracking, counters and custom callbacks for Frank Wolfe.

          Update order for block-coordinate methods

          Block-coordinate methods can be run with different update orders. All update orders are subtypes of FrankWolfe.BlockCoordinateUpdateOrder. They have to implement the method FrankWolfe.select_update_indices which selects which blocks to update in what order.

          FrankWolfe.BlockCoordinateUpdateOrderType

          Update order for a block-coordinate method. A BlockCoordinateUpdateOrder must implement

          select_update_indices(::BlockCoordinateUpdateOrder, s::CallbackState, dual_gaps)
          source
          FrankWolfe.select_update_indicesFunction
          select_update_indices(::BlockCoordinateUpdateOrder, s::CallbackState, dual_gaps)

          Returns a list of lists of the block indices. Each sublist represents one round of updates in an iteration. The indices in a sublist show which blocks should be updated in parallel in one round. For example, a full update is given by [1:l] and a blockwise update by [[i] for i=1:l], where l is the number of blocks.

          source
          FrankWolfe.CyclicUpdateType

          The cyclic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is determined by the given order of the LMOs.

          source
          FrankWolfe.StochasticUpdateType

          The stochastic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is random.

          source
          FrankWolfe.LazyUpdateType

          The Lazy update order is discussed in "Flexible block-iterative analysis for the Frank-Wolfe algorithm," by Braun, Pokutta, & Woodstock (2024). 'lazy_block' is an index of a computationally expensive block to update; 'refresh_rate' describes the frequency at which we perform a full activation; and 'block_size' describes the number of "faster" blocks (i.e., those excluding 'lazy_block') activated (chosen uniformly at random) during each of the "faster" iterations; for more detail, see the article. If 'block_size' is unspecified, this defaults to

          Note: This methodology is currently only proven to work with 'FrankWolfe.Shortstep' linesearches and a (not-yet implemented) adaptive method; see the article for details.

          source

          Update step for block-coordinate Frank-Wolfe

          Block-coordinate Frank-Wolfe (BCFW) can run different FW algorithms on different blocks. All update steps are subtypes of FrankWolfe.UpdateStep and implement FrankWolfe.update_iterate which defines one iteration of the corresponding method.

          FrankWolfe.UpdateStepType

          Update step for block-coordinate Frank-Wolfe. These are implementations of different FW-algorithms to be used in a blockwise manner. Each update step must implement

          update_iterate(
          +Utilities and data structures · FrankWolfe.jl

          Utilities and data structures

          Active set

          FrankWolfe.AbstractActiveSetType
          AbstractActiveSet{AT, R, IT}

          Abstract type for an active set of atoms of type AT with weights of type R and iterate of type IT. An active set is typically expected to have a field weights, a field atoms, and a field x. Otherwise, all active set methods from src/active_set.jl can be overwritten.

          source
          FrankWolfe.ActiveSetType
          ActiveSet{AT, R, IT}

          Represents an active set of extreme vertices collected in a FW algorithm, along with their coefficients (λ_i, a_i). R is the type of the λ_i, AT is the type of the atoms a_i. The iterate x = ∑λ_i a_i is stored in x with type IT.

          source
          Base.copyMethod

          Copies an active set, the weight and atom vectors and the iterate. Individual atoms are not copied.

          source
          FrankWolfe.active_set_argminMethod
          active_set_argmin(active_set::AbstractActiveSet, direction)

          Computes the linear minimizer in the direction on the active set. Returns (λ_i, a_i, i)

          source
          FrankWolfe.active_set_argminmaxMethod
          active_set_argminmax(active_set::AbstractActiveSet, direction)

          Computes the linear minimizer and maximizer in the direction on the active set. Returns (λ_min, a_min, i_min, val_min, λ_max, a_max, i_max, val_max, val_max-val_min ≥ Φ)

          source
          FrankWolfe.active_set_update!Method
          active_set_update!(active_set::AbstractActiveSet, lambda, atom)

          Adds the atom to the active set with weight lambda or adds lambda to existing atom.

          source
          FrankWolfe.compute_active_set_iterate!Method
          compute_active_set_iterate!(active_set::AbstractActiveSet) -> x

          Recomputes from scratch the iterate x from the current weights and vertices of the active set. Returns the iterate x.

          source
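
The recomputation described for compute_active_set_iterate! amounts to evaluating x = ∑λ_i a_i from the stored weights and atoms. A minimal sketch, assuming plain vectors stand in for the active set; recompute_iterate! is a hypothetical helper and not part of FrankWolfe.jl:

```julia
# Hypothetical stand-in for an active set: parallel vectors of weights and atoms.
weights = [0.5, 0.3, 0.2]
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Recompute x = sum_i λ_i * a_i from scratch, writing into a preallocated buffer.
function recompute_iterate!(x, weights, atoms)
    fill!(x, 0)
    for (λ, a) in zip(weights, atoms)
        x .+= λ .* a
    end
    return x
end

x = recompute_iterate!(zeros(2), weights, atoms)
```

Recomputing from scratch like this avoids the rounding drift that incremental updates of x can accumulate over many iterations.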

          Functions and gradients

          FrankWolfe.ObjectiveFunctionType
          ObjectiveFunction

          Represents an objective function optimized by algorithms. Subtypes of ObjectiveFunction must implement at least

          • compute_value(::ObjectiveFunction, x) for primal value evaluation
          • compute_gradient(::ObjectiveFunction, x) for gradient evaluation.

          and optionally compute_value_gradient(::ObjectiveFunction, x) returning the (primal, gradient) pair. compute_gradient may always use the same storage and return a reference to it.

          source
          FrankWolfe.SimpleFunctionObjectiveType
          SimpleFunctionObjective{F,G,S}

          An objective function built from a separate primal objective f(x) and an in-place gradient function grad!(storage, x). It keeps an internal storage of type S used to evaluate the gradient in-place.

          source
          FrankWolfe.StochasticObjectiveType
          StochasticObjective{F, G, XT, S}(f::F, grad!::G, xs::XT, storage::S)

          Represents a composite function evaluated with stochastic gradient. f(θ, x) evaluates the loss for a single data point x and parameter θ. grad!(storage, θ, x) adds to storage the partial gradient with respect to data point x at parameter θ. xs must be an indexable iterable (Vector{Vector{Float64}} for instance). Functions using a StochasticObjective have optional keyword arguments rng, batch_size and full_evaluation controlling whether the function should be evaluated over all data points.

          Note: grad! must not reset the storage to 0 before adding to it.

          source
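
The accumulation contract for grad! can be illustrated with a small least-squares loss. This is only a sketch: the loss, the data points and the final averaging step are assumptions made for illustration, and the loop stands in for whatever the library does internally.

```julia
using LinearAlgebra

# Hypothetical per-point loss f(θ, (a, b)) = (a'θ - b)^2 / 2.
loss(θ, point) = (dot(point[1], θ) - point[2])^2 / 2

# Adds the per-point gradient to `storage` without zeroing it first.
function grad!(storage, θ, point)
    a, b = point
    storage .+= (dot(a, θ) - b) .* a
    return storage
end

xs = [([1.0, 0.0], 1.0), ([0.0, 2.0], 2.0)]  # illustrative data points
θ = [0.0, 0.0]

# The caller resets the storage once, then accumulates over the batch.
storage = zeros(2)
for point in xs
    grad!(storage, θ, point)
end
storage ./= length(xs)  # averaging is an assumption of this sketch
```

If grad! zeroed the storage itself, only the last data point of the batch would contribute to the result.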
          FrankWolfe.compute_gradientFunction
          compute_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes the gradient of f at x. May return a reference to an internal storage.

          source
          FrankWolfe.compute_value_gradientMethod
          compute_value_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes in one call the pair (value, gradient) evaluated at x. By default, calls compute_value and compute_gradient with keywords kwargs passed down to both.

          source

          Callbacks

          Custom vertex storage

          Custom extreme point types

          For some feasible sets, the extreme points of the feasible set returned by the LMO possess a specific structure that can be represented in an efficient manner both for storage and for common operations like scaling and addition with an iterate. They are presented below:

          Utils

          FrankWolfe.DeletedVertexStorageType

          Vertex storage to store dropped vertices or find a suitable direction in lazy settings. The algorithm will look for at most return_kth suitable atoms before returning the best. See Extra-lazification with a vertex storage for usage.

          A vertex storage can be any type that implements two operations:

          1. Base.push!(storage, atom) to add an atom to the storage.

          Note that it is the storage type's responsibility to ensure uniqueness of the atoms present.

          2. storage_find_argmin_vertex(storage, direction, lazy_threshold) -> (found, vertex)

          returning whether a vertex with sufficient progress was found and the vertex. It is up to the storage to remove vertices (or not) when they have been picked up.

          source
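
The two-operation interface above could be satisfied by a very small type. The sketch below is self-contained and therefore defines its own storage_find_argmin_vertex; a real storage would extend the FrankWolfe.jl function instead. The type name SimpleVertexStorage and the acceptance test (inner product at most lazy_threshold) are assumptions, not the library's implementation.

```julia
using LinearAlgebra

# Hypothetical storage type holding atoms in a plain vector.
struct SimpleVertexStorage{AT}
    atoms::Vector{AT}
end

# Operation 1: add an atom, keeping atoms unique (the storage's responsibility).
function Base.push!(storage::SimpleVertexStorage, atom)
    atom in storage.atoms || push!(storage.atoms, atom)
    return storage
end

# Operation 2: return (found, vertex) for the stored atom minimizing <direction, atom>.
# The comparison against `lazy_threshold` is an assumed acceptance criterion.
function storage_find_argmin_vertex(storage::SimpleVertexStorage, direction, lazy_threshold)
    isempty(storage.atoms) && return (false, nothing)
    vals = [dot(direction, a) for a in storage.atoms]
    val, idx = findmin(vals)
    return (val <= lazy_threshold, storage.atoms[idx])
end

storage = SimpleVertexStorage(Vector{Float64}[])
push!(storage, [1.0, 0.0])
push!(storage, [0.0, 1.0])
push!(storage, [1.0, 0.0])  # duplicate, ignored
found, vertex = storage_find_argmin_vertex(storage, [1.0, -2.0], -1.0)
```

This sketch never removes atoms once they are picked; the interface leaves that choice to the storage.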
          FrankWolfe.ExpMomentumIteratorType
          ExpMomentumIterator{T}

          Iterator for the momentum used in the variant of Stochastic Frank-Wolfe. Momentum coefficients are the values of the iterator: ρ_t = 1 - num / (offset + t)^exp

          The state corresponds to the iteration count.

          Source: Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization Aryan Mokhtari, Hamed Hassani, Amin Karbasi, JMLR 2020.

          source
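
To get a feel for the momentum schedule ρ_t = 1 - num / (offset + t)^exp, the formula can be evaluated directly. The helper name and the parameter values below are chosen for illustration and are assumptions, not necessarily the library defaults:

```julia
# Hypothetical helper evaluating ρ_t = 1 - num / (offset + t)^expo.
momentum(t; num=4.0, offset=8.0, expo=2 / 3) = 1 - num / (offset + t)^expo

# First few momentum coefficients, t being the iteration count.
ρs = [momentum(t) for t in 0:3]
```

With these values the coefficients start near 0 and increase towards 1 as t grows.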
          FrankWolfe.IncrementBatchIteratorType
          IncrementBatchIterator(starting_batch_size, max_batch_size, [increment = 1])

          Batch size starting at starting_batch_size and incrementing by increment at every iteration.

          source
          FrankWolfe.batchsize_iterateFunction
          batchsize_iterate(iter::BatchSizeIterator) -> b

          Method to implement for a batch size iterator of type BatchSizeIterator. Calling batchsize_iterate returns the next batch size and typically updates the internal state of iter.

          source
          FrankWolfe.momentum_iterateFunction
          momentum_iterate(iter::MomentumIterator) -> ρ

          Method to implement for a type MomentumIterator. Returns the next momentum value ρ and updates the iterator internal state.

          source
          FrankWolfe.muladd_memory_modeMethod
          muladd_memory_mode(memory_mode::MemoryEmphasis, storage, x, gamma::Real, d)

          Performs storage = x - gamma * d in-place or not depending on MemoryEmphasis

          source
          FrankWolfe.trajectory_callbackMethod
          trajectory_callback(storage)

          Callback pushing the state at each iteration to the passed storage. Only the first 5 fields of the state are stored, usually: (t, primal, dual, dual_gap, time)

          source

          Oracle counting trackers

          The following structures wrap given oracles so that they behave the same way but additionally track the number of calls.

          Also see the example Tracking, counters and custom callbacks for Frank Wolfe.

          Update order for block-coordinate methods

          Block-coordinate methods can be run with different update orders. All update orders are subtypes of FrankWolfe.BlockCoordinateUpdateOrder. They have to implement the method FrankWolfe.select_update_indices which selects which blocks to update in what order.

          FrankWolfe.BlockCoordinateUpdateOrderType

          Update order for a block-coordinate method. A BlockCoordinateUpdateOrder must implement

          select_update_indices(::BlockCoordinateUpdateOrder, s::CallbackState, dual_gaps)
          source
          FrankWolfe.select_update_indicesFunction
          select_update_indices(::BlockCoordinateUpdateOrder, s::CallbackState, dual_gaps)

          Returns a list of lists of the block indices. Each sublist represents one round of updates in an iteration. The indices in a sublist show which blocks should be updated in parallel in one round. For example, a full update is given by [1:l] and a blockwise update by [[i] for i=1:l], where l is the number of blocks.

          source
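
The two return shapes mentioned for select_update_indices can be written out for a small case. A sketch assuming l = 3 blocks; the variable names are illustrative only:

```julia
l = 3  # number of blocks (hypothetical)

full_update = [1:l]                    # one round, all blocks updated together
blockwise_update = [[i] for i in 1:l]  # l rounds, one block per round

# Each sublist is one round; its entries are the blocks updated in that round.
for (k, indices) in enumerate(blockwise_update)
    println("round ", k, " updates blocks ", collect(indices))
end
```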
          FrankWolfe.CyclicUpdateType

          The cyclic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is determined by the given order of the LMOs.

          source
          FrankWolfe.StochasticUpdateType

          The stochastic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is random.

          source
          FrankWolfe.LazyUpdateType

          The Lazy update order is discussed in "Flexible block-iterative analysis for the Frank-Wolfe algorithm," by Braun, Pokutta, & Woodstock (2024). 'lazy_block' is an index of a computationally expensive block to update; 'refresh_rate' describes the frequency at which we perform a full activation; and 'block_size' describes the number of "faster" blocks (i.e., those excluding 'lazy_block') activated (chosen uniformly at random) during each of the "faster" iterations; for more detail, see the article. If 'block_size' is unspecified, this defaults to

          Note: This methodology is currently only proven to work with 'FrankWolfe.Shortstep' linesearches and a (not-yet implemented) adaptive method; see the article for details.

          source

          Update step for block-coordinate Frank-Wolfe

          Block-coordinate Frank-Wolfe (BCFW) can run different FW algorithms on different blocks. All update steps are subtypes of FrankWolfe.UpdateStep and implement FrankWolfe.update_iterate which defines one iteration of the corresponding method.

          FrankWolfe.UpdateStepType

          Update step for block-coordinate Frank-Wolfe. These are implementations of different FW-algorithms to be used in a blockwise manner. Each update step must implement

          update_iterate(
               step::UpdateStep,
               x,
               lmo,
          @@ -12,7 +12,7 @@
               linesearch_workspace,
               memory_mode,
               epsilon,
          -)
          source
          FrankWolfe.update_iterateFunction
          update_iterate(
               step::UpdateStep,
               x,
               lmo,
          @@ -25,4 +25,4 @@
               linesearch_workspace,
               memory_mode,
               epsilon,
          -)

          Executes one iteration of the defined FrankWolfe.UpdateStep and updates the iterate x implicitly. The function returns a tuple (dual_gap, v, d, gamma, step_type):

          • dual_gap is the updated FrankWolfe gap
          • v is the used vertex
          • d is the update direction
          • gamma is the applied step-size
          • step_type is the applied step-type
          source
          FrankWolfe.BPCGStepType

          Implementation of the blended pairwise conditional gradient (BPCG) method as an update step for block-coordinate Frank-Wolfe.

          source

          Block vector

          FrankWolfe.BlockVectorType
          BlockVector{T, MT <: AbstractArray{T}, ST <: Tuple} <: AbstractVector{T}

          Represents a vector consisting of blocks. T is the element type of the vector, MT is the type of the underlying data array, and ST is the type of the tuple representing the sizes of each block. Each block can be accessed with the blocks field, and the sizes of the blocks are stored in the block_sizes field.

          source

          Index

          +)

          Executes one iteration of the defined FrankWolfe.UpdateStep and updates the iterate x implicitly. The function returns a tuple (dual_gap, v, d, gamma, step_type):

          • dual_gap is the updated FrankWolfe gap
          • v is the used vertex
          • d is the update direction
          • gamma is the applied step-size
          • step_type is the applied step-type
          source
          FrankWolfe.BPCGStepType

          Implementation of the blended pairwise conditional gradient (BPCG) method as an update step for block-coordinate Frank-Wolfe.

          source

          Block vector

          FrankWolfe.BlockVectorType
          BlockVector{T, MT <: AbstractArray{T}, ST <: Tuple} <: AbstractVector{T}

          Represents a vector consisting of blocks. T is the element type of the vector, MT is the type of the underlying data array, and ST is the type of the tuple representing the sizes of each block. Each block can be accessed with the blocks field, and the sizes of the blocks are stored in the block_sizes field.

          source

          Index

          diff --git a/previews/PR513/reference/4_linesearch/index.html b/previews/PR513/reference/4_linesearch/index.html index 1d77de7c5..ad73f80fe 100644 --- a/previews/PR513/reference/4_linesearch/index.html +++ b/previews/PR513/reference/4_linesearch/index.html @@ -1,2 +1,2 @@ -Line search and step size settings · FrankWolfe.jl

          Line search and step size settings

          The step size dictates how far one traverses along a local descent direction. More specifically, the step size $\gamma_t$ is used at each iteration to determine how much the next iterate moves towards the new vertex:

          \[x_{t+1} = x_t - \gamma_t (x_t - v_t).\]

          $\gamma_t = 1$ implies that the next iterate is exactly the vertex; $\gamma_t = 0$ implies that the iterate does not move.

          The following are step size selection rules for Frank Wolfe algorithms. Some methodologies (e.g. FixedStep and Agnostic) depend only on the iteration number and induce series $\gamma_t$ that are independent of the problem data, while others (e.g. GoldenSearch and Adaptive) change according to local information about the function; the adaptive methods often require extra function and/or gradient computations. The typical options for convex optimization are Agnostic or Adaptive.

          All step size computation strategies are subtypes of FrankWolfe.LineSearchMethod. The key method they have to implement is FrankWolfe.perform_line_search which is called at every iteration to compute the step size gamma.

          FrankWolfe.LineSearchMethodType

          Line search method to apply once the direction is computed. A LineSearchMethod must implement

          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          with d = x - v. It may also implement build_linesearch_workspace(x, gradient) which creates a workspace structure that is passed as last argument to perform_line_search.

          source
          FrankWolfe.perform_line_searchFunction
          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          Returns the step size gamma for step size strategy ls.

          source
          FrankWolfe.AdaptiveType

          Modified adaptive line search test from:

          S. Pokutta "The Frank-Wolfe algorithm: a short introduction" (2023), preprint, https://arxiv.org/abs/2311.05313

          It replaces the original test implemented in the AdaptiveZerothOrder line search based on:

          Pedregosa, F., Negiar, G., Askari, A., and Jaggi, M. (2020). "Linearly convergent Frank–Wolfe with backtracking line-search", Proceedings of AISTATS.

          source
          FrankWolfe.AdaptiveZerothOrderType

          Slight modification of the Adaptive Step Size strategy from Pedregosa, Negiar, Askari, Jaggi (2018)

          \[ f(x_t + \gamma_t (x_t - v_t)) - f(x_t) \leq - \alpha \gamma_t \langle \nabla f(x_t), x_t - v_t \rangle + \alpha^2 \frac{\gamma_t^2 \|x_t - v_t\|^2}{2} M ~.\]

          The parameter alpha ∈ (0,1] relaxes the original smoothness condition to mitigate issues with numerical errors. Its default value is 0.5. The Adaptive struct keeps track of the Lipschitz constant estimate L_est. The keyword argument relaxed_smoothness allows testing with an alternative smoothness condition,

          \[ \langle \nabla f(x_t + \gamma_t (x_t - v_t) ) - \nabla f(x_t), x_t - v_t \rangle \leq \gamma_t M \|x_t - v_t\|^2 ~.\]

          This condition yields potentially smaller and more stable estimations of the Lipschitz constant while being more computationally expensive due to the additional gradient computation.

          It is also the fallback when the Lipschitz constant estimation fails due to numerical errors. perform_line_search also has a should_upgrade keyword argument on whether there should be a temporary upgrade to BigFloat for extended precision.

          source
          FrankWolfe.AgnosticType

          Computes step size: l/(l + t) at iteration t, given l > 0.

          Using l > 2 leads to faster convergence rates than l = 2 over strongly convex and some uniformly convex sets.

          Accelerated Affine-Invariant Convergence Rates of the Frank-Wolfe Algorithm with Open-Loop Step-Sizes, Wirth, Peña, Pokutta (2023), https://arxiv.org/abs/2310.04096

          See also the paper that introduced the study of open-loop step-sizes with l > 2:

          Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta, (2023), https://arxiv.org/abs/2205.12838

          Fixing l = -1 results in the step size gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))

          S. Pokutta "The Frank-Wolfe algorithm: a short introduction" (2023), https://arxiv.org/abs/2311.05313

          source
          FrankWolfe.GeneralizedAgnosticType

          Computes step size: g(t)/(t + g(t)) at iteration t, given g: R_{>= 0} -> R_{>= 0}.

          Defaults to the best open-loop step-size gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))

          S. Pokutta "The Frank-Wolfe algorithm: a short introduction" (2023), https://arxiv.org/abs/2311.05313

          This step-size is as fast as the step-size gamma_t = 2 / (t + 2) up to polylogarithmic factors. Further, over strongly convex and some uniformly convex sets, it is faster than any traditional step-size gamma_t = l / (t + l) for any l in N.

          source
          FrankWolfe.MonotonicNonConvexStepSizeType
          MonotonicNonConvexStepSize{F}

          Represents a monotonic open-loop non-convex step size. Contains a halving factor N increased at each iteration until there is primal progress: gamma = 1 / sqrt(t + 1) * 2^(-N).

          source
          FrankWolfe.MonotonicStepSizeType
          MonotonicStepSize{F}

          Represents a monotonic open-loop step size. Contains a halving factor N increased at each iteration until there is primal progress: gamma = 2 / (t + 2) * 2^(-N).

          source
          FrankWolfe.SecantType
          Secant(limit_num_steps, tol, domain_oracle)

          Secant line search strategy, which iteratively refines the step size using the secant method. This method is geared towards problems with self-concordant functions (but might require extra structure) and potentially faster than the backtracking line search. The order of convergence is superlinear with exponent 1.618 (Golden Ratio) but not quite quadratic. Convergence is not guaranteed in general.

          Arguments

          • limit_num_steps::Int: Maximum number of iterations for the secant method. (default 40)
          • tol::Float64: Tolerance for convergence. (default 1e-8)
          • domain_oracle::Function: Returns true if the argument x is in the domain of the objective function f.

          References

          source
          FrankWolfe.ShortstepType

          Computes the 'Short step' step size: dual_gap / (L * norm(x - v)^2), where L is the Lipschitz constant of the gradient, x is the current iterate, and v is the current Frank-Wolfe vertex.

          source

          See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size, Carderera, Besançon, Pokutta (2021) for the monotonic step size.

          Index

          +Line search and step size settings · FrankWolfe.jl

          Line search and step size settings

          The step size dictates how far one traverses along a local descent direction. More specifically, the step size $\gamma_t$ is used at each iteration to determine how much the next iterate moves towards the new vertex:

          \[x_{t+1} = x_t - \gamma_t (x_t - v_t).\]

          $\gamma_t = 1$ implies that the next iterate is exactly the vertex; $\gamma_t = 0$ implies that the iterate does not move.
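
The update rule and its two extreme cases can be checked numerically. A minimal sketch; fw_step and the sample vectors are illustrative and not part of FrankWolfe.jl:

```julia
# One Frank-Wolfe step: x_{t+1} = x_t - γ_t (x_t - v_t).
fw_step(x, v, γ) = x .- γ .* (x .- v)

x = [0.2, 0.8]  # current iterate (hypothetical)
v = [1.0, 0.0]  # vertex returned by the LMO (hypothetical)

stay = fw_step(x, v, 0.0)  # γ = 0: the iterate does not move
jump = fw_step(x, v, 1.0)  # γ = 1: the iterate lands on the vertex
half = fw_step(x, v, 0.5)  # γ = 0.5: halfway between x and v
```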

          The following are step size selection rules for Frank Wolfe algorithms. Some methodologies (e.g. FixedStep and Agnostic) depend only on the iteration number and induce series $\gamma_t$ that are independent of the problem data, while others (e.g. GoldenSearch and Adaptive) change according to local information about the function; the adaptive methods often require extra function and/or gradient computations. The typical options for convex optimization are Agnostic or Adaptive.

          All step size computation strategies are subtypes of FrankWolfe.LineSearchMethod. The key method they have to implement is FrankWolfe.perform_line_search which is called at every iteration to compute the step size gamma.

          FrankWolfe.LineSearchMethodType

          Line search method to apply once the direction is computed. A LineSearchMethod must implement

          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          with d = x - v. It may also implement build_linesearch_workspace(x, gradient) which creates a workspace structure that is passed as last argument to perform_line_search.

          source
          FrankWolfe.perform_line_searchFunction
          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          Returns the step size gamma for step size strategy ls.

          source
          FrankWolfe.AdaptiveType

          Modified adaptive line search test from:

          S. Pokutta "The Frank-Wolfe algorithm: a short introduction" (2023), preprint, https://arxiv.org/abs/2311.05313

          It replaces the original test implemented in the AdaptiveZerothOrder line search based on:

          Pedregosa, F., Negiar, G., Askari, A., and Jaggi, M. (2020). "Linearly convergent Frank–Wolfe with backtracking line-search", Proceedings of AISTATS.

          source
          FrankWolfe.AdaptiveZerothOrderType

          Slight modification of the Adaptive Step Size strategy from Pedregosa, Negiar, Askari, Jaggi (2018)

          \[ f(x_t + \gamma_t (x_t - v_t)) - f(x_t) \leq - \alpha \gamma_t \langle \nabla f(x_t), x_t - v_t \rangle + \alpha^2 \frac{\gamma_t^2 \|x_t - v_t\|^2}{2} M ~.\]

          The parameter alpha ∈ (0,1] relaxes the original smoothness condition to mitigate issues with numerical errors. Its default value is 0.5. The Adaptive struct keeps track of the Lipschitz constant estimate L_est. The keyword argument relaxed_smoothness allows testing with an alternative smoothness condition,

          \[ \langle \nabla f(x_t + \gamma_t (x_t - v_t) ) - \nabla f(x_t), x_t - v_t \rangle \leq \gamma_t M \|x_t - v_t\|^2 ~.\]

          This condition yields potentially smaller and more stable estimations of the Lipschitz constant while being more computationally expensive due to the additional gradient computation.

          It is also the fallback when the Lipschitz constant estimation fails due to numerical errors. perform_line_search also has a should_upgrade keyword argument on whether there should be a temporary upgrade to BigFloat for extended precision.

          source
          FrankWolfe.AgnosticType

          Computes step size: l/(l + t) at iteration t, given l > 0.

          Using l > 2 leads to faster convergence rates than l = 2 over strongly convex and some uniformly convex sets.

          Accelerated Affine-Invariant Convergence Rates of the Frank-Wolfe Algorithm with Open-Loop Step-Sizes, Wirth, Peña, Pokutta (2023), https://arxiv.org/abs/2310.04096

          See also the paper that introduced the study of open-loop step-sizes with l > 2:

          Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta, (2023), https://arxiv.org/abs/2205.12838

          Fixing l = -1, results in the step size gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))

          S. Pokutta "The Frank-Wolfe algorithm: a short introduction" (2023), https://arxiv.org/abs/2311.05313
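As a quick illustration of the open-loop rule described above, the following standalone sketch evaluates l/(l + t) and the special case l = -1. The function name `agnostic_step` is hypothetical and does not correspond to a FrankWolfe.jl call.

```python
import math


def agnostic_step(t, l=2):
    """Open-loop step size at iteration t (t = 0, 1, 2, ...)."""
    if l == -1:
        # Special case from the text: gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))
        g = 2 + math.log(t + 1)
        return g / (t + g)
    return l / (l + t)


# First few step sizes for l = 2 and l = 4; a larger l decays more slowly.
steps_l2 = [agnostic_step(t, 2) for t in range(4)]
steps_l4 = [agnostic_step(t, 4) for t in range(4)]
```

Both rules start at 1 for t = 0, so the first iterate is replaced entirely by the first Frank-Wolfe vertex.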

          source
          FrankWolfe.GeneralizedAgnosticType

          Computes step size: g(t)/(t + g(t)) at iteration t, given g: R_{>= 0} -> R_{>= 0}.

          Defaults to the best open-loop step-size gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))

          S. Pokutta "The Frank-Wolfe algorithm: a short introduction" (2023), https://arxiv.org/abs/2311.05313

          This step-size is as fast as the step-size gamma_t = 2 / (t + 2) up to polylogarithmic factors. Further, over strongly convex and some uniformly convex sets, it is faster than any traditional step-size gamma_t = l / (t + l) for any l in N.
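The generalized rule can be sketched by passing g as a callable. This is a standalone illustration under the assumption that t starts at 0; `generalized_agnostic_step` and `default_g` are hypothetical names, not the library's API.

```python
import math


def default_g(t):
    """g(t) = 2 + log(t + 1), the default named in the text."""
    return 2 + math.log(t + 1)


def generalized_agnostic_step(t, g=default_g):
    """Step size g(t) / (t + g(t)) for a nonnegative function g."""
    gt = g(t)
    return gt / (t + gt)


# A constant g recovers the classical rule l / (t + l), here with l = 2.
classical = [generalized_agnostic_step(t, g=lambda _t: 2) for t in range(1, 6)]
logarithmic = [generalized_agnostic_step(t) for t in range(1, 6)]
```

Because g(t)/(t + g(t)) is increasing in g(t) for t > 0, the logarithmic default always takes a slightly larger step than 2 / (t + 2) after the first iteration.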

          source
          FrankWolfe.MonotonicNonConvexStepSizeType
          MonotonicNonConvexStepSize{F}

          Represents a monotonic open-loop non-convex step size. Contains a halving factor N that is increased at each iteration until there is primal progress: gamma = 1 / sqrt(t + 1) * 2^(-N).

          source
          FrankWolfe.MonotonicStepSizeType
          MonotonicStepSize{F}

          Represents a monotonic open-loop step size. Contains a halving factor N that is increased at each iteration until there is primal progress: gamma = 2 / (t + 2) * 2^(-N).
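The halving logic shared by the two monotonic rules can be sketched as follows. This is a hedged, one-dimensional illustration: `monotonic_step`, `base`, and `max_halvings` are hypothetical names, the trial point is assumed to be x - gamma * d with d = x - v, and "primal progress" is assumed to mean a strict decrease of the objective.

```python
import math


def monotonic_step(f, x, d, t, N=0, base=lambda t: 2 / (t + 2), max_halvings=60):
    """Return (gamma, N) with gamma = base(t) * 2^(-N) giving primal progress.

    N is returned so that the caller can carry it over to the next iteration.
    """
    fx = f(x)
    while N < max_halvings:
        gamma = base(t) * 2.0 ** (-N)
        if f(x - gamma * d) < fx:   # primal progress: accept this step
            return gamma, N
        N += 1                      # otherwise halve the step once more
    return 0.0, N                   # no progress found, do not move


# Convex variant: base step 2 / (t + 2).
gamma, N = monotonic_step(lambda z: z * z, x=1.0, d=11.0, t=0)

# Non-convex variant: base step 1 / sqrt(t + 1).
gamma_nc, N_nc = monotonic_step(lambda z: z * z, x=1.0, d=11.0, t=0,
                                base=lambda t: 1 / math.sqrt(t + 1))
```

In this example the full step overshoots the minimizer, so three halvings are needed before the objective decreases.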

          source
          FrankWolfe.SecantType
          Secant(limit_num_steps, tol, domain_oracle)

          Secant line search strategy, which iteratively refines the step size using the secant method. This method is geared towards problems with self-concordant functions (but might require extra structure) and potentially faster than the backtracking line search. The order of convergence is superlinear with exponent 1.618 (Golden Ratio) but not quite quadratic. Convergence is not guaranteed in general.

          Arguments

          • limit_num_steps::Int: Maximum number of iterations for the secant method. (default 40)
          • tol::Float64: Tolerance for convergence. (default 1e-8)
          • domain_oracle::Function: Returns true if the argument x is in the domain of the objective function f.
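The secant iteration behind such a line search can be sketched on the derivative of the one-dimensional function phi(gamma) = f(x - gamma * d). This is a minimal standalone illustration, not the library's implementation: `secant_step`, `dphi`, and `gamma_max` are hypothetical names, and the domain_oracle check is omitted for brevity.

```python
def secant_step(dphi, gamma_max=1.0, limit_num_steps=40, tol=1e-8):
    """Approximate a root of dphi on [0, gamma_max] with the secant method.

    dphi(gamma) is the derivative of phi(gamma) = f(x - gamma * d).
    """
    g0, g1 = 0.0, gamma_max
    d0, d1 = dphi(g0), dphi(g1)
    if d1 <= 0.0:
        return gamma_max            # still descending at the boundary
    for _ in range(limit_num_steps):
        if abs(d1) < tol or d1 == d0:
            break                   # converged, or secant slope is degenerate
        g2 = g1 - d1 * (g1 - g0) / (d1 - d0)
        g2 = min(max(g2, 0.0), gamma_max)   # keep the step feasible
        g0, d0 = g1, d1
        g1, d1 = g2, dphi(g2)
    return g1


# For a quadratic phi the derivative is linear, so one secant update suffices.
gamma = secant_step(lambda g: 2.0 * (g - 0.3))
```

No convergence safeguard beyond the iteration limit is included, which matches the remark above that convergence is not guaranteed in general.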

          References

          source
          FrankWolfe.ShortstepType

          Computes the 'Short step' step size: dual_gap / (L * norm(x - v)^2), where L is the Lipschitz constant of the gradient, x is the current iterate, and v is the current Frank-Wolfe vertex.
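The short-step formula can be evaluated directly, as in the sketch below. The function name `short_step` is hypothetical, and clipping the result to `gamma_max` is an assumption made here to keep the iterate feasible; the text above only states the unclipped ratio.

```python
def short_step(grad_x, x, v, L, gamma_max=1.0):
    """Return min(dual_gap / (L * ||x - v||^2), gamma_max)."""
    d = [xi - vi for xi, vi in zip(x, v)]
    dual_gap = sum(gi * di for gi, di in zip(grad_x, d))   # <grad f(x), x - v>
    nd2 = sum(di * di for di in d)                         # ||x - v||^2
    if nd2 == 0.0:
        return 0.0                  # x already equals the vertex v
    return min(dual_gap / (L * nd2), gamma_max)


# Example: gradient [2, 0] at x = [1, 0], vertex v = [0, 1], L = 2.
gamma = short_step([2.0, 0.0], [1.0, 0.0], [0.0, 1.0], L=2.0)
```

This step minimizes the quadratic upper bound implied by L-smoothness along the segment from x to v, which is why it needs L to be known in advance.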

          source

          See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size, and Carderera, Besançon, Pokutta (2021) for the monotonic step size.

          Index

          diff --git a/previews/PR513/search/index.html b/previews/PR513/search/index.html index a80f215fd..aa4a6adff 100644 --- a/previews/PR513/search/index.html +++ b/previews/PR513/search/index.html @@ -1,2 +1,2 @@ -Search · FrankWolfe.jl +Search · FrankWolfe.jl