Compass optimizer

A modification of the original AdamW optimizer that replaces the momentum term with a smoothing filter.

Weight decay and centralization help training stability, but they are not required and can be omitted.

Initialize:
θ₀ ∈ ℝᵈ (initial parameters)
m₀ ← 0 (initialize 1st moment vector)
v₀ ← 0 (initialize 2nd moment vector)
t ← 0 (initialize timestep)

Hyperparameters:
α (learning rate)
γ (smoothing factor)
λ (optional! weight decay)
τ (optional! centralization)
β₁, β₂ ∈ [0,1) (exponential decay rates for the moment estimates)
ε (small constant to prevent division by zero)

Repeat until convergence:
t ← t + 1
gₜ ← ∇θ fₜ(θₜ₋₁) (compute gradients of the stochastic objective at timestep t)
gₜ ← gₜ - τ·μ(gₜ) (optional! subtract the gradient mean μ(gₜ), scaled by τ; τ = 1 removes the mean completely)
mₜ ← β₁ mₜ₋₁ + (1 - β₁) gₜ (update biased first moment estimate)
ĝₜ ← gₜ + mₜγ (smooth out gradients)
vₜ ← β₂ vₜ₋₁ + (1 - β₂) ĝₜ² (update biased second moment estimate)
m̂ₜ ← mₜ / (1 - β₁ᵗ) (compute bias-corrected first moment estimate)
v̂ₜ ← vₜ / (1 - β₂ᵗ) (compute bias-corrected second moment estimate)
Θₜ₋₁ ← θₜ₋₁(1 - αλ) (optional! compute decoupled weight decay)
θₜ ← Θₜ₋₁ - αĝₜ / (m̂ₜsqrt(v̂ₜ) + ε) (update parameters)
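
The update loop above can be sketched in plain Python on flat lists of floats. This is a minimal illustration, not the repository's `compass.py`: the function name `compass_step` and its argument names are hypothetical, the default values are placeholders, and two readings are assumed. First, the centralization step is taken to subtract the gradient mean scaled by τ. Second, the final denominator is taken to be the AdamW-style sqrt(v̂ₜ) + ε; the m̂ₜ factor written in the listing is left out because its role there is ambiguous.

```python
import math


def compass_step(theta, grad, m, v, t, lr=1e-3, gamma=2.0,
                 betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0.0, centralization=0.0):
    """One Compass-style update on flat lists of floats.

    Hypothetical helper for illustration; returns (theta, m, v, t).
    """
    beta1, beta2 = betas
    t += 1

    # Optional centralization (ASSUMPTION: subtract tau * mean of the gradient).
    if centralization:
        mean = sum(grad) / len(grad)
        grad = [g - centralization * mean for g in grad]

    # Biased first moment: exponential moving average of the gradient.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]

    # Smoothed gradient: raw gradient plus gamma times the moving average.
    g_hat = [g + gamma * mi for g, mi in zip(grad, m)]

    # Biased second moment, computed from the smoothed gradient.
    v = [beta2 * vi + (1 - beta2) * gh * gh for vi, gh in zip(v, g_hat)]

    # Bias correction for the second moment.
    v_hat = [vi / (1 - beta2 ** t) for vi in v]

    # Optional decoupled weight decay.
    if weight_decay:
        theta = [p * (1 - lr * weight_decay) for p in theta]

    # Parameter update (ASSUMPTION: denominator is sqrt(v_hat) + eps).
    theta = [p - lr * gh / (math.sqrt(vh) + eps)
             for p, gh, vh in zip(theta, g_hat, v_hat)]
    return theta, m, v, t


if __name__ == "__main__":
    # Toy objective f(x) = sum(x_i ** 2), whose gradient is 2 * x.
    theta = [1.0, -2.0]
    m, v, t = [0.0, 0.0], [0.0, 0.0], 0
    for _ in range(50):
        grad = [2.0 * p for p in theta]
        theta, m, v, t = compass_step(theta, grad, m, v, t, lr=0.05)
    print(theta)
```

With `centralization=0.0` and `weight_decay=0.0` both optional branches are skipped, which matches the note that weight decay and centralization can be omitted.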

lodestone-rock/compass_optimizer