INFRASTRUCTURE.md


This diagram shows the current changelog.com setup:

%% https://fontawesome.com/search
graph TD
    classDef link stroke:#59b287,stroke-width:3px;
    
    %% Code & assets
    subgraph GitHub
        repo{{ fab:fa-github thechangelog/changelog.com }}:::link
        click repo "https://github.com/thechangelog/changelog.com"

        cicd[/ fa:fa-circle-check GitHub Action - Ship It \]:::link
        click cicd "https://github.com/thechangelog/changelog.com/actions/workflows/ship_it.yml"
        
        automation[\ fab:fa-golang Dagger Go SDK /]:::link
        click automation "https://github.com/thechangelog/changelog.com/blob/master/magefiles/magefiles.go"

        registry(( fab:fa-github ghcr.io )):::link
        click registry "https://github.com/orgs/thechangelog/packages"

        chat(( fab:fa-slack Slack )):::link
        click chat "https://changelog.slack.com/archives/C03SA8VE2"

        repo -.-> |.github/workflows/ship_it.yml| cicd
        cicd --> |magefiles/magefiles.go| automation
        
        cicd --> |success #dev| chat
    end
    
    repo -.- |fly.io/changelog-2024-01-12| app
    
    registry --> |ghcr.io/changelog/changelog-prod| app
    runner --> |flyctl deploy| app
        
    repo -.- |fly.io/dagger-engine-2024-03-28| dagger010

    repo -.- |fly.io/pghero-2024-03-27| pghero

    %% PaaS - https://fly.io/dashboard/changelog
    subgraph Fly.io
        proxy{fa:fa-globe Proxy}
        proxy ==> |https| app

        dagger010([ fa:fa-project-diagram Dagger Engine v0.10 2024-03-28 ]):::link
        click dagger010 "https://fly.io/apps/dagger-engine-2024-03-28"
            
        app(( fab:fa-phoenix-framework App changelog-2024-01-12.fly.dev )):::link
        style app fill:#488969;
        click app "https://fly.io/apps/changelog-2024-01-12"

        pghero([ fa:fa-gem PgHero 2024-03-27 ]):::link
        click pghero "https://fly.io/apps/pghero-2024-03-27"

        grafana[ fa:fa-columns Grafana fly-metrics.net ]:::link
        click grafana "https://fly-metrics.net"
        grafana -.- |metrics| app
        grafana -.- |metrics| dagger010
        grafana -.- |metrics| pghero
    end

    app <===> |Postgres| dbrw
    pghero --> dbrw

    subgraph Neon.tech
        dbrw([ fa:fa-database main branch primary ]):::link
        click dbrw "https://console.neon.tech/app/projects/orange-sound-86604986/branches/br-wandering-smoke-78468159"

        dbro1([ fa:fa-database main branch replica ])
        dbrw -.-> |replicate| dbro1
    end

    subgraph Namespace.so
        runner([ fa:fa-person-running GitHub Runner ]):::link
        click runner "https://cloud.namespace.so/9s8hfvousnlae/ghrunners"

        automation --> |runs-on: namespace-profile-changelog| runner
        runner --> |ghcr.io/changelog/changelog-runtime| registry
        runner --> |ghcr.io/changelog/changelog-prod| registry

    end

    %% Secrets
    secrets(( fa:fa-key 1Password )):::link
    click secrets "https://changelog.1password.com/"
    secrets -.-> |secrets| app
    secrets -.-> |secrets| repo

    %% Search
    search(( fa:fa-magnifying-glass Typesense ))
    app -..-> |search| search

    %% Exceptions
    exceptions(( fa:fa-car-crash Sentry )):::link
    click exceptions "https://sentry.io/organizations/changelog-media/issues/?project=5668962"
    app -..-> |exceptions| exceptions

    %% CDN - https://manage.fastly.com/configure/services/7gKbcKSKGDyqU7IuDr43eG
    subgraph Fastly
        apex[ changelog.com ]:::link
        click apex "https://changelog.com"
        
        subgraph Ashburn
            cdn[ cdn.changelog.com ]
        end
    end

    subgraph AWS.S3
        logs[ fab:fa-aws changelog-logs ]
    end
    apex & cdn-.-> |logs| logs

    %% Observability
    observability(( fa:fa-bug Honeycomb )):::link
    click observability "https://ui.honeycomb.io/changelog/datasets/changelog_opentelemetry/home"
    app -.-> |traces| observability
    logs -.-> |logs| observability
    
    %% Object storage
    apex ==> |https| proxy
    subgraph Cloudflare.R2
        assets[ fab:fa-cloudflare changelog-assets changelog.place ]
        feeds[ fab:fa-cloudflare changelog-feeds feeds.changelog.place ]
    end
    cdn ==> |https| assets & feeds

    %% Monitoring
    subgraph BetterStack
        status[ fa:fa-layer-group status.changelog.com ]:::link
        click status "https://status.changelog.com"

        monitoring(( fa:fa-table-tennis Uptime )):::link
        click monitoring "https://uptime.betterstack.com/team/133302/monitors"
        monitoring -....-> |monitors| apex
        monitoring -.-> |monitors| cdn
        monitoring -.-> |monitors| proxy
        monitoring -.-> |monitors| status
    end

Let's dig into how all the above pieces fit together.

A three-tier monolith

TL;DR:

  • Front-end
    • Fastly
    • Fly.io Proxy
    • Cloudflare R2
  • Application
    • Elixir / Phoenix
    • Typesense search
  • Database
    • PostgreSQL (Neon.tech)

changelog.com is a monolithic Elixir application built with the Phoenix web framework. It uses PostgreSQL for persistence & Node.js to digest & compile static assets (CSS & JS).

Static assets, including all our mp3 episodes, are stored on Cloudflare R2. They are served via Fastly, specifically https://cdn.changelog.com.

Fastly (cdn.changelog.com)
↓
Cloudflare R2 (changelog.place)

The production instance of our application is running on Fly.io. All https://changelog.com requests are served via Fastly. Each Fastly request gets proxied to our application instance via the Fly.io Proxy.

Fastly (changelog.com)
↓
Fly.io Proxy
↓
Application (changelog-2024-01-12.fly.dev)

The production database - PostgreSQL - is running on Neon.tech. It is a replicated setup, with one leader (RW) & one replica (RO). We are not currently using the replica, and since Neon.tech scales down to 0, the idle replica doesn't cost anything.

Application (changelog-2024-01-12.fly.dev)
↓
PostgreSQL Leader (RW)
↓
PostgreSQL Replica (RO)

Production deploys

Each commit made against our primary branch gets deployed straight into production. The "Ship It!" GitHub Actions workflow is responsible for this. From the perspective of the workflow jobs, it is fairly standard.

Secrets

All our secrets are stored in 1Password, in the changelog vault. We declare a single secret in Fly.io, OP_SERVICE_ACCOUNT_TOKEN, and then load all other secrets into memory as part of app boot via op & env.op.

In GitHub Actions secrets, we are still pasting them manually.

Note

We should use op here too.
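As a rough illustration of the op & env.op approach, here is a minimal sketch. It assumes env.op is a dotenv-style file whose values are 1Password secret references, which is the format that op run --env-file accepts; the actual env.op contents and boot command are not shown in this document, and EXAMPLE_API_KEY and example-item are hypothetical names.

```shell
#!/usr/bin/env bash
# Assumption: env.op is a dotenv-style file whose values are 1Password secret
# references (op://vault/item/field). The entry below is hypothetical.
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
EXAMPLE_API_KEY=op://changelog/example-item/credential
EOF

# op authenticates as a service account when OP_SERVICE_ACCOUNT_TOKEN is set.
if command -v op >/dev/null 2>&1 && [ -n "${OP_SERVICE_ACCOUNT_TOKEN:-}" ]; then
  # op run resolves each reference and exposes the value only to the child process.
  op run --env-file="$env_file" -- sh -c 'test -n "$EXAMPLE_API_KEY" && echo "secret loaded"'
else
  echo "op CLI or OP_SERVICE_ACCOUNT_TOKEN not available, skipping"
fi
```

The file itself holds only references, so it can live next to the app config without exposing any secret values.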

Metrics & observability

Since our application is running on Fly.io, we benefit from free infrastructure metrics: https://fly-metrics.net

All logs from Fastly are streamed into Honeycomb.io. This allows us to ask questions we did not anticipate about how various HTTP clients interact with our content. It also helps us explore how Fastly interacts with Fly.io.

We also send app traces via OpenTelemetry to Honeycomb.io.

App errors - e.g. Plug.Conn.InvalidQueryError - show up in Sentry.io.

BetterStack.com monitors our public HTTPS endpoints & alerts us when they become unhealthy.
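To make the idea of an endpoint health check concrete, here is a minimal sketch of the kind of probe an uptime monitor performs. This is not how BetterStack is implemented; is_healthy and check_endpoint are hypothetical helpers, and treating any 2xx or 3xx response as healthy is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration, not BetterStack's implementation.

# Succeeds for 2xx and 3xx status codes, fails for anything else.
is_healthy() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints "<status> <url>" and returns non-zero when the endpoint looks unhealthy.
check_endpoint() {
  local url="$1" status
  # curl reports 000 when the request fails before any HTTP response arrives.
  status="$(curl --silent --output /dev/null --max-time 10 --write-out '%{http_code}' "$url")"
  echo "${status} ${url}"
  is_healthy "$status"
}

# Example usage (needs network access):
#   for url in https://changelog.com https://cdn.changelog.com https://status.changelog.com; do
#     check_endpoint "$url" || echo "unhealthy: $url"
#   done
```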

Search

We use Typesense for search. It's near-instant & it just works.

What is missing?

The above is what we have so far. While we like to keep things simple, our setup is a constant work in progress. We keep making small improvements all the time, and we talk about them every few months in the context of our Ship It! Kaizen episodes.

For example, this diagram and document were created in the context of 🎧 Kaizen 8: 24 improvements & a lot more. If you would prefer to stay in reading mode, check out GitHub discussion #433.

If anything on this page is missing, or could be clearer, please open an issue. Thank you very much!


How to create a new app instance?

  1. Start by creating a new app, e.g. flyctl apps create changelog-2024-01-12 --org changelog

  2. Copy the existing app instance config, e.g. cp -r fly.io/changelog-{2023-12-17,2024-01-12}

  3. Run all following commands in the app directory, e.g. cd fly.io/changelog-2024-01-12

  4. Update the app name in e.g. fly.toml to match the newly created app

  5. From within the app directory, set a few secrets required by the app to work correctly while testing

     flyctl secrets set --stage \
         OP_SERVICE_ACCOUNT_TOKEN="$(op read op://changelog/op/credential --account changelog.1password.com --cache)" \
         R2_FEEDS_BUCKET=changelog-feeds-dev \
         URL_HOST=changelog-2024-01-12.fly.dev
    
  6. Deploy the latest app image from https://github.com/thechangelog/changelog.com/pkgs/container/changelog-prod

     flyctl deploy --vm-size performance-4x --image <LATEST_IMAGE_SHA>
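
The steps above can be collected into a dry-run sketch that prints the commands for a given pair of dates instead of running them. new_app_plan is a hypothetical helper and not part of the changelog.com repo; the fly.toml edit is left as a manual step, and <LATEST_IMAGE_SHA> stays a placeholder unless an image reference is passed as the third argument.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands for the steps above, runs nothing.
# new_app_plan is a hypothetical helper, not part of the changelog.com repo.
new_app_plan() {
  local old_date="$1" new_date="$2" image="${3:-<LATEST_IMAGE_SHA>}"
  local app="changelog-${new_date}"

  echo "flyctl apps create ${app} --org changelog"
  echo "cp -r fly.io/changelog-${old_date} fly.io/${app}"
  echo "cd fly.io/${app}"
  echo "# edit fly.toml so that the app name is ${app}"
  # \$ keeps the op read substitution literal, so it is printed and not executed here.
  echo "flyctl secrets set --stage OP_SERVICE_ACCOUNT_TOKEN=\"\$(op read op://changelog/op/credential --account changelog.1password.com --cache)\" R2_FEEDS_BUCKET=changelog-feeds-dev URL_HOST=${app}.fly.dev"
  echo "flyctl deploy --vm-size performance-4x --image ${image}"
}

new_app_plan 2023-12-17 2024-01-12
```

Printing the plan first makes it easy to review the app name and URL_HOST before anything is created in the changelog org.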
    

How to branch the production db instance?

See Enable changelog.com devs to create prod db forks with a single command.