Knowledge share: SecureDrop Continuous Integration
- Speed up CI: CI has been a frequent pain point in terms of flakiness, reliability, and performance
- Renew collaboration with the infra team
Current state: we are running the staging VM environment for the app and mon servers, performing a clean install (which pulls in the Ansible code used by admins), building the Debian packages, and installing them in the VMs. The job also fetches app-test artifacts. We are effectively saying that every PR (actually, every commit!) should run this.
- Do we need to run this if the Ansible code didn't change? Yes, because the AppArmor rules still need to be checked (see the sketch below).
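  Illustratively, this is the kind of check the staging run covers even when Ansible is untouched (a sketch; the profile name grepped for is an assumption, not the actual test):

  ```bash
  # Confirm that the expected AppArmor profiles are loaded and enforcing on
  # the provisioned VM. "apache2" is an illustrative profile name.
  sudo aa-status                  # summary of loaded/enforced profiles
  sudo aa-status | grep apache2   # nonzero exit if the expected profile is missing
  ```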
- How does it work?
  - When the CircleCI job starts, we run the Vagrant-based VM setup on Google Cloud to make sure the VM setup used by developers still works. (There was another reason that Conor mentioned that I missed.)
  - `i18n-` branches are ignored.
  - `devops/gce-nested/gce-start.sh`: this script provisions the cloud VMs. It explicitly pins `ci-nested-virt-buster-IMAGE_NUMBER` and pre-fetches the VM images so we don't waste more wall time when we run `vagrant up` (see the sketch below).
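  A hedged sketch of what this provisioning step amounts to; the instance name, zone handling, and Vagrant box below are illustrative assumptions, not the script's actual values:

  ```bash
  set -e

  # Pin the exact CI image so runs stay reproducible (zone comes from the
  # gcloud config default in this sketch):
  gcloud compute instances create "sd-ci-${CIRCLE_BUILD_NUM:-local}" \
    --image "ci-nested-virt-buster-${IMAGE_NUMBER:?set to the pinned image number}" \
    --labels "securedrop-ci=true"

  # Pre-fetch the Vagrant box now so `vagrant up` later doesn't spend wall
  # time downloading it (box name is an illustrative assumption):
  vagrant box add --provider libvirt bento/debian-10 || true
  ```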
- Should we run this on hardware?
- When do we need to run this? [focus on this]
  - We don't need to run this on every commit.
  - This is a pain during the release process.
- We rebase onto the target branch to make sure the run is against the latest `develop`, then run staging tests on GCE (see the `ci-go` script, a wrapper for everything: `gce-start.sh`, `gce-runner.sh`, `gce-stop.sh`; sketched below), and then destroy all our `securedrop-ci`-tagged VMs. As a backstop, a cron job (over in infra) polls for VMs with this label that have been running for longer than 6 hours and destroys them, so we're not charged a bunch of $ for leaked instances (see the reaper sketch below).
- `gce-runner.sh` does the SSH bootstrapping; at lines 60-61 we run `make build-debs-notest` and `make staging` (provisioning the system probably takes ~30 minutes), then we verify the state of the provisioned VMs.
- After the GCE/GCP run, there's a brief step to extract test results in a machine-readable format.
- After test results are stored, the environment is torn down completely, regardless of pass or fail.
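A minimal sketch of that wrapper flow, assuming `gce-runner.sh` and `gce-stop.sh` sit alongside `gce-start.sh` in `devops/gce-nested/` (the trap wiring here is illustrative, not verbatim):

```bash
#!/bin/bash
set -eu

devops/gce-nested/gce-start.sh     # provision the cloud VM

# Tear the environment down unconditionally, pass or fail, so we aren't
# billed for idle VMs:
trap 'devops/gce-nested/gce-stop.sh' EXIT

devops/gce-nested/gce-runner.sh    # SSH bootstrap, build-debs-notest, make staging, tests
# (test results are copied out before the EXIT trap destroys the VM)
```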
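And a sketch of the infra-side reaper cron job, under the assumption that the VMs carry a `securedrop-ci` label (the filter syntax is hedged; check `gcloud topic filters`):

```bash
# Delete securedrop-ci labeled instances older than six hours.
CUTOFF="$(date -u -d '6 hours ago' '+%Y-%m-%dT%H:%M:%SZ')"

gcloud compute instances list \
  --filter="labels.securedrop-ci:* AND creationTimestamp<'${CUTOFF}'" \
  --format="value(name,zone)" |
while read -r name zone; do
  gcloud compute instances delete "${name}" --zone "${zone}" --quiet
done
```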
- Should we move away from CircleCI now that GitHub has nested virtualization support?
  - We should look into shaving some time off the Google Cloud Platform (GCP) runs.
- Should we use CircleCI orbs for filtering when a job should be run?
  - Could we look at diffs and determine what should be run? (A sketch follows this list.)
  - File and branch filtering already helps us determine what we should run. How would orbs improve on this? They might be more maintainable, but perhaps not a performance improvement.
  - We're still mostly interested in performance improvements, so research is needed to see whether orbs actually shave off time on environment setup.
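A hypothetical sketch of diff-based gating, with illustrative path patterns and `develop` assumed as the base branch:

```bash
# Skip the staging run when no infrastructure-relevant files changed.
CHANGED="$(git diff --name-only origin/develop...HEAD)"

if ! grep -qE '^(install_files/|molecule/|devops/)' <<< "${CHANGED}"; then
  echo "No infra-relevant changes detected; halting this job early."
  circleci-agent step halt   # CircleCI built-in that ends the job as successful
fi
```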
- Could someone give some background info on `circleci/` vs `cimg/` images?
  - `cimg/` images are the newer CircleCI-maintained images, which we should be using.
- We're losing a lot of time on container builds (in `.circleci/config.yml`).
- We could also combine some of these test steps, e.g. is there a reason to run `make build-debs-notest` on both CircleCI and GCP? The debs do not have a commit hash appended, just the version.
- We could start building debs in nightlies and then CI could pull them from apt-test, like we do in SecureDrop Workstation land (see the sketch below). Or just build them once in a job and share them.
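A sketch of the apt-test idea, assuming the test repository lives at apt-test.freedom.press; the suite, key handling, and package name are illustrative assumptions:

```bash
# Pull a nightly-built deb from apt-test instead of rebuilding it in CI.
echo "deb https://apt-test.freedom.press buster main" |
  sudo tee /etc/apt/sources.list.d/apt-test.list

sudo apt-get update
sudo apt-get install -y securedrop-app-code   # illustrative package name
```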
Current state: `app-tests` is parallelized via `--split-by=timings` in `.circleci/config.yml` (line 107); see the sketch after this list.
- There might be a more sophisticated way to parallelize.
- This is the only place in CI where we use the `parallelism` tag (is this correct?).
- Lint is taking an unexpectedly long time, probably because of environmental setup.
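For reference, the timing-based split pattern amounts to something like this (a sketch assuming pytest and an illustrative glob, not the verbatim config):

```bash
# CircleCI distributes the globbed files across the parallel containers
# based on stored timing data from previous runs.
TESTFILES="$(circleci tests glob "tests/**/test_*.py" | circleci tests split --split-by=timings)"
pytest -v ${TESTFILES}   # left unquoted deliberately so the shell splits the file list
```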
- `parallelism: 20` for translation-tests; we need to bump this up each time we add a new language.
- If a branch is prefixed with `i18n-`, then this CI job is run.
- There is a `devops/scripts` script somewhere that determines whether the translation-tests run; instead, you could use CircleCI's built-in filtering, which could save ~5 minutes spent figuring out if this should be run (a sketch of the check follows).
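A sketch of doing the same check with CircleCI built-ins instead of a custom script; the real script's name and logic are unknown here, so this is illustrative only:

```bash
# Skip translation tests unless the branch has the i18n- prefix.
# CIRCLE_BRANCH is set by CircleCI; circleci-agent step halt ends the job
# early and marks it successful.
if [[ "${CIRCLE_BRANCH:-}" != i18n-* ]]; then
  echo "Not an i18n- branch; skipping translation tests."
  circleci-agent step halt
fi
```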