Upgrade-Manager

An autodiscovery tool to help you know what and when to update.

Grafana dashboard displaying upgrade-manager's metrics, showing which apps need to be upgraded and their associated versions

Key Features:

Service discovery: finds deployed software (see the supported software sources)
Obsolescence score SLI (Service Level Indicator) calculation: computes each software's obsolescence score and exposes it as a Prometheus metric (see calculators)
New eligible release detection: automatically finds the version that matches a tailor-made selection logic
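To illustrate what a selection logic could look like, here is a minimal sketch that picks the newest stable release above the running version. The helper names (`parse_stable`, `newest_eligible`), the "skip pre-releases" rule, and the sample versions are assumptions made for this example; they are not upgrade-manager's actual implementation.

```python
import re
from typing import List, Optional, Tuple

# Matches plain "MAJOR.MINOR.PATCH" with an optional leading "v".
# Anything with a suffix (e.g. "-rc1") is treated as a pre-release and skipped.
_STABLE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_stable(version: str) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch) for a stable version, else None."""
    match = _STABLE.match(version)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def newest_eligible(current: str, candidates: List[str]) -> Optional[str]:
    """Hypothetical selection logic: newest stable release above `current`."""
    best_key = parse_stable(current)
    if best_key is None:
        raise ValueError(f"unsupported current version: {current!r}")
    best = None
    for candidate in candidates:
        key = parse_stable(candidate)
        # Keep the candidate only if it is stable and strictly newer.
        if key is not None and key > best_key:
            best, best_key = candidate, key
    return best


print(newest_eligible("1.4.2", ["1.4.3", "1.5.0-rc1", "v1.5.0", "1.3.9"]))
```

A real selection logic may differ per software, for example by staying within the current major version; the point here is only that eligibility is a filter applied before picking the newest candidate.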

Managing IT system obsolescence is like being in a boat full of holes

Let's pretend your IT systems are a boat:

Each piece of software is a hole in the boat.
Each hole gets wider and wider as new software versions are released.
In such a situation, you need to decide:
- When to patch a hole (meaning updating software) so that the boat does not sink.
  - This means you need to define the minimum size (diameter) a hole must reach before the crew considers patching it.
- When to row (do projects delivering business value) so that the boat makes progress.
- Which hole to patch first, and it probably makes sense to patch the widest.
  - This means you need a simple way to quickly get the list of the widest holes.
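The two decisions above (a minimum hole size, and patching the widest holes first) amount to a filter followed by a sort. The sketch below shows this with a hypothetical `holes_to_patch` helper; the threshold and the sample scores are illustrative values only.

```python
from typing import Dict, List, Tuple


def holes_to_patch(scores: Dict[str, float], min_size: float) -> List[Tuple[str, float]]:
    """Return (software, score) pairs at or above `min_size`, widest hole first."""
    eligible = [(name, score) for name, score in scores.items() if score >= min_size]
    # Sort by descending score; ties are broken by name so the order is deterministic.
    return sorted(eligible, key=lambda item: (-item[1], item[0]))


# Illustrative values only: a higher score means a wider hole.
sample_scores = {"postgres": 120.0, "redis": 35.0, "ingress-nginx": 80.0}
for name, score in holes_to_patch(sample_scores, min_size=50.0):
    print(f"{name}: {score}")
```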

How upgrade-manager responds to this issue

In real life, a software's obsolescence (how big the hole is) is hard to measure, since we don't have a clear metric to quantify it.

This is why upgrade-manager computes an obsolescence score, exposing it as a Prometheus metric:

It represents how obsolete a system is, based on different calculation strategies (see calculators). It is essentially an indicator of obsolescence at the software level.
It can be summed or averaged across all apps to obtain a company-wide freshness/obsolescence metric.
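As a rough illustration of the company-wide aggregation, the sketch below parses a few samples in the Prometheus text exposition format and computes their sum and average. The sample values and the simplified parser are assumptions for this example (it does not handle escaped quotes or braces inside label values); in practice you would more likely run `sum(...)` or `avg(...)` in PromQL.

```python
import re
from statistics import mean
from typing import Dict, List, Tuple

METRIC = "upgrade_manager_software_obsolescence_score"

# Simplified matchers for lines of the form: name{label="value",...} number
_SAMPLE = re.compile(r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$")
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_scores(exposition: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs for the obsolescence score metric."""
    samples = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _SAMPLE.match(line)
        if match is None or match.group("name") != METRIC:
            continue
        labels = dict(_LABEL.findall(match.group("labels")))
        samples.append((labels, float(match.group("value"))))
    return samples


# Illustrative samples, not real upgrade-manager output.
EXPOSITION = """
# made-up values for demonstration
upgrade_manager_software_obsolescence_score{app="postgres"} 120
upgrade_manager_software_obsolescence_score{app="redis"} 30
"""

values = [value for _, value in parse_scores(EXPOSITION)]
total = sum(values)
average = mean(values)
print(f"sum={total} avg={average}")
```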

Installation

You can use the default Helm chart to deploy upgrade-manager as follows:

helm install upgrade-manager oci://public.ecr.aws/qonto/upgrade-manager-chart \
  -n upgrade-manager --create-namespace

Configuration

upgrade-manager uses a YAML config file (the default location is /app/config/config.yaml).

For more information about each source's configuration block, see sources.

global:
  interval: 10m # How often upgrade-manager runs the main loop (discovering all software, finding new versions, and computing scores)
  aws:
    region: us-east-1 # AWS region to look for resources in
sources:
  deployments:
    - <deploymentConfig>
  argocdHelm:
    - <argoCDHelmConfig>
  filesystemHelm:
    - <filesystemHelmConfig>
  aws:
    eks: <eksConfig>
    rds: <rdsConfig>
    msk: <mskConfig>
    elasticache: <elasticacheConfig>
    lambda: <lambdaConfig>
http:
  host: 0.0.0.0 # local server address
  port: 10000 # local port to listen on
  write-timeout: 10
  read-timeout: 10
  read-header-timeout: 10

Required AWS Privileges when running AWS sources

To automatically discover AWS resources and their newer versions, upgrade-manager needs the following AWS privileges (represented in Terraform HCL):

data "aws_iam_policy_document" "upgrade-manager" {
  statement {
    sid = "elasticache"

    actions = [
      "elasticache:DescribeCacheClusters",
      "elasticache:DescribeCacheEngineVersions",
    ]

    resources = [
      "*",
    ]
  }
  statement {
    sid = "ecr"

    actions = [
      "ecr:ListImages",
      "ecr:GetAuthorizationToken",
      "ecr:BatchGetImage",
      "ecr:GetDownloadUrlForLayer",
    ]

    resources = [
      "*",
    ]
  }
  statement {
    sid = "eks"

    actions = [
      "eks:DescribeAddon",
      "eks:DescribeAddonVersions",
      "eks:ListClusters",
      "eks:DescribeCluster",
      "eks:ListAddons",
    ]

    resources = [
      "*",
    ]
  }
  statement {
    sid = "rds"

    actions = [
      "rds:DescribeDBInstances",
      "rds:DescribeDBEngineVersions",
    ]

    resources = [
      "*",
    ]
  }
  statement {
    sid = "lambda"

    actions = [
      "lambda:ListFunctions",
    ]

    resources = [
      "*",
    ]
  }
  statement {
    sid = "kafka"

    actions = [
      "kafka:ListClustersV2",
      "kafka:GetCompatibleKafkaVersions",
    ]

    resources = [
      "*",
    ]
  }
}
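If you do not manage IAM with Terraform, the same permissions can be written as a plain IAM JSON policy. The sketch below generates that JSON from the actions listed above; the `build_policy` helper is a hypothetical name, and the `"Effect": "Allow"` and `"Version": "2012-10-17"` fields are the standard IAM policy equivalents of the Terraform defaults.

```python
import json
from typing import Dict, List

# Actions copied from the Terraform policy document above, keyed by statement ID.
PERMISSIONS: Dict[str, List[str]] = {
    "elasticache": [
        "elasticache:DescribeCacheClusters",
        "elasticache:DescribeCacheEngineVersions",
    ],
    "ecr": [
        "ecr:ListImages",
        "ecr:GetAuthorizationToken",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
    ],
    "eks": [
        "eks:DescribeAddon",
        "eks:DescribeAddonVersions",
        "eks:ListClusters",
        "eks:DescribeCluster",
        "eks:ListAddons",
    ],
    "rds": ["rds:DescribeDBInstances", "rds:DescribeDBEngineVersions"],
    "lambda": ["lambda:ListFunctions"],
    "kafka": ["kafka:ListClustersV2", "kafka:GetCompatibleKafkaVersions"],
}


def build_policy(permissions: Dict[str, List[str]]) -> dict:
    """Build an IAM policy document with one Allow statement per service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": sid, "Effect": "Allow", "Action": actions, "Resource": "*"}
            for sid, actions in permissions.items()
        ],
    }


print(json.dumps(build_policy(PERMISSIONS), indent=2))
```

If you only enable some AWS sources, you can pass a subset of `PERMISSIONS` to keep the policy as narrow as possible.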

Problem Statement: why use upgrade-manager?

As we scale, we deploy more and more software. We end up with hundreds of applications to maintain across many different platforms.

Each of them needs to be updated regularly to benefit from security fixes, new features, etc.

Dashboard

To visualize metrics, you can use the dashboard available in the dashboards/ directory of the repository.

Alerting Patterns: deciding when to update software

As SREs, we like to define SLOs/SLAs based on SLIs. Among other things, this helps us take action based on specific thresholds. Using the obsolescence score metric, we can define thresholds that tell us when we need to act and update software.

Each company has its own way to define alerting rules, but a sane default could be:

The obsolescence score for each individual software should be < 100

- alert: ObsolescenceScoreTooHigh
  expr: upgrade_manager_software_obsolescence_score{isparent="1"} > 99
  for: 5m
  labels:
    team: foo
  annotations:
    summary: "The software {{ $labels.app }} is obsolete (>99) and needs to be updated"
    runbook_url: https://letmegooglethat.com/?q=how+to+update+softwares

The average obsolescence score for all apps should be < 80

- alert: AverageObsolescenceScoreTooHigh
  expr: avg(upgrade_manager_software_obsolescence_score{isparent="1"}) >= 80
  for: 5m
  labels:
    team: foo
  annotations:
    summary: "The average obsolescence score across all software is too high, software needs to be updated"
    runbook_url: https://letmegooglethat.com/?q=how+to+update+softwares

When an alert is triggered, an engineer should acknowledge it and upgrade the applications with the highest obsolescence scores.
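To make the two objectives concrete, the sketch below checks a set of scores against them outside of Prometheus and lists the offenders with the highest score first. It follows the stated objectives (each score below 100, average below 80); the `evaluate` helper and the sample scores are hypothetical.

```python
from statistics import mean
from typing import Dict

PER_SOFTWARE_LIMIT = 100.0  # each individual score should stay below this
AVERAGE_LIMIT = 80.0        # the average across all apps should stay below this


def evaluate(scores: Dict[str, float]) -> Dict[str, object]:
    """Check both objectives; `scores` must contain at least one entry."""
    # Offenders are listed with the highest score first, i.e. in upgrade order.
    offenders = sorted(
        (name for name, score in scores.items() if score >= PER_SOFTWARE_LIMIT),
        key=lambda name: (-scores[name], name),
    )
    average = mean(list(scores.values()))
    return {
        "offenders": offenders,
        "average": average,
        "average_breached": average >= AVERAGE_LIMIT,
    }


# Illustrative values only.
report = evaluate({"postgres": 130.0, "redis": 40.0, "kafka": 100.0})
print(report)
```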
