Knowledge share: SecureDrop Continuous Integration
- Speed up CI: CI has been a frequent pain point in terms of flakiness, reliability, and performance
- Renew collaboration with the infra team
Current state: we are running the staging VM environment for the app and mon servers, performing a clean install (which pulls in the Ansible code used by admins), building the Debian packages, and installing them in the VMs. The job also fetches app-test artifacts. We are effectively saying that every PR (actually, every commit!) should run this.
- Do we need to run this if the Ansible code didn't change? Yes, because the AppArmor rules still need to be checked (see the sketch below).
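  Illustratively, this is the kind of check the staging run covers even when Ansible is untouched (a sketch; the profile name grepped for is an assumption, not the actual test):

  ```bash
  # Confirm that the expected AppArmor profiles are loaded and enforcing on
  # the provisioned VM. "apache2" is an illustrative profile name.
  sudo aa-status                  # summary of loaded/enforced profiles
  sudo aa-status | grep apache2   # nonzero exit if the expected profile is missing
  ```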
- How does it work?
  - When the CircleCI job starts, we run the Vagrant-based VM setup on Google Cloud to make sure the VM setup used by developers still works. (There was another reason that Conor mentioned that I missed.)
  - `i18n-` branches are ignored.
  - `devops/gce-nested/gce-start.sh`: this script provisions the cloud VMs. It explicitly pins `ci-nested-virt-buster-IMAGE_NUMBER` and pre-fetches the VM images so we don't waste more wall time when we run `vagrant up` (see the sketch below).
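  A hedged sketch of what this provisioning step amounts to; the instance name, zone handling, and Vagrant box below are illustrative assumptions, not the script's actual values:

  ```bash
  set -e

  # Pin the exact CI image so runs stay reproducible (zone comes from the
  # gcloud config default in this sketch):
  gcloud compute instances create "sd-ci-${CIRCLE_BUILD_NUM:-local}" \
    --image "ci-nested-virt-buster-${IMAGE_NUMBER:?set to the pinned image number}" \
    --labels "securedrop-ci=true"

  # Pre-fetch the Vagrant box now so `vagrant up` later doesn't spend wall
  # time downloading it (box name is an illustrative assumption):
  vagrant box add --provider libvirt bento/debian-10 || true
  ```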
- Should we run this on hardware?
- When do we need to run this? [focus on this]
  - We don't need to run this on every commit.
  - This is a pain during the release process.
- We rebase onto the target branch to make sure the run is against the latest `develop`, then run staging tests on GCE (see the `ci-go` script, a wrapper for everything: `gce-start.sh`, `gce-runner.sh`, `gce-stop.sh`; sketched below), and then destroy all our `securedrop-ci`-tagged VMs. As a backstop, a cron job (over in infra) polls for VMs with this label that have been running for longer than 6 hours and destroys them, so we're not charged a bunch of $ for leaked instances (see the reaper sketch below).
- `gce-runner.sh` does the SSH bootstrapping; at lines 60-61 we run `make build-debs-notest` and `make staging` (provisioning the system probably takes ~30 minutes), then we verify the state of the provisioned VMs.
- After the GCE/GCP run, there's a brief step to extract test results in a machine-readable format.
- After test results are stored, the environment is torn down completely, regardless of pass or fail.
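A minimal sketch of that wrapper flow, assuming `gce-runner.sh` and `gce-stop.sh` sit alongside `gce-start.sh` in `devops/gce-nested/` (the trap wiring here is illustrative, not verbatim):

```bash
#!/bin/bash
set -eu

devops/gce-nested/gce-start.sh     # provision the cloud VM

# Tear the environment down unconditionally, pass or fail, so we aren't
# billed for idle VMs:
trap 'devops/gce-nested/gce-stop.sh' EXIT

devops/gce-nested/gce-runner.sh    # SSH bootstrap, build-debs-notest, make staging, tests
# (test results are copied out before the EXIT trap destroys the VM)
```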
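And a sketch of the infra-side reaper cron job, under the assumption that the VMs carry a `securedrop-ci` label (the filter syntax is hedged; check `gcloud topic filters`):

```bash
# Delete securedrop-ci labeled instances older than six hours.
CUTOFF="$(date -u -d '6 hours ago' '+%Y-%m-%dT%H:%M:%SZ')"

gcloud compute instances list \
  --filter="labels.securedrop-ci:* AND creationTimestamp<'${CUTOFF}'" \
  --format="value(name,zone)" |
while read -r name zone; do
  gcloud compute instances delete "${name}" --zone "${zone}" --quiet
done
```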
- Should we move away from CircleCI now that GitHub has nested virtualization support?
  - We should look into shaving some time off the Google Cloud Platform (GCP) runs.
- Should we use CircleCI orbs for filtering when a job should be run?
  - Could we look at diffs and determine what should be run? (A sketch follows this list.)
  - File and branch filtering already helps us determine what we should run. How would orbs improve on this? They might be more maintainable, but perhaps not a performance improvement.
  - We're still mostly interested in performance improvements, so research is needed to see whether orbs actually shave off time on environment setup.
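A hypothetical sketch of diff-based gating, with illustrative path patterns and `develop` assumed as the base branch:

```bash
# Skip the staging run when no infrastructure-relevant files changed.
CHANGED="$(git diff --name-only origin/develop...HEAD)"

if ! grep -qE '^(install_files/|molecule/|devops/)' <<< "${CHANGED}"; then
  echo "No infra-relevant changes detected; halting this job early."
  circleci-agent step halt   # CircleCI built-in that ends the job as successful
fi
```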
- Could someone give some background info on `circleci/` vs `cimg/` images?
  - `cimg/` images are the newer CircleCI-maintained images, which we should be using.
- We're losing a lot of time on container builds (in `.circleci/config.yml`).
- We could also combine some of these test steps, e.g. is there a reason to run `make build-debs-notest` on both CircleCI and GCP? The debs do not have a commit hash appended, just the version.
- We could start building debs in nightlies and then CI could pull them from apt-test, like we do in SecureDrop Workstation land (see the sketch below). Or just build them once in a job and share them.
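A sketch of the apt-test idea, assuming the test repository lives at apt-test.freedom.press; the suite, key handling, and package name are illustrative assumptions:

```bash
# Pull a nightly-built deb from apt-test instead of rebuilding it in CI.
echo "deb https://apt-test.freedom.press buster main" |
  sudo tee /etc/apt/sources.list.d/apt-test.list

sudo apt-get update
sudo apt-get install -y securedrop-app-code   # illustrative package name
```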
Current state: `app-tests` is parallelized via `--split-by=timings` in `.circleci/config.yml` (line 107); see the sketch after this list.
- There might be a more sophisticated way to parallelize.
- This is the only place in CI where we use the `parallelism` tag (is this correct?).
- Lint is taking an unexpectedly long time, probably because of environmental setup.
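For reference, the timing-based split pattern amounts to something like this (a sketch assuming pytest and an illustrative glob, not the verbatim config):

```bash
# CircleCI distributes the globbed files across the parallel containers
# based on stored timing data from previous runs.
TESTFILES="$(circleci tests glob "tests/**/test_*.py" | circleci tests split --split-by=timings)"
pytest -v ${TESTFILES}   # left unquoted deliberately so the shell splits the file list
```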
- `parallelism: 20` for translation-tests; we need to bump this up each time we add a new language.
- If a branch is prefixed with `i18n-`, then this CI job is run.
- There is a `devops/scripts` script somewhere that determines whether the translation-tests run; instead, you could use CircleCI's built-in filtering, which could save ~5 minutes spent figuring out if this should be run (a sketch of the check follows).
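A sketch of doing the same check with CircleCI built-ins instead of a custom script; the real script's name and logic are unknown here, so this is illustrative only:

```bash
# Skip translation tests unless the branch has the i18n- prefix.
# CIRCLE_BRANCH is set by CircleCI; circleci-agent step halt ends the job
# early and marks it successful.
if [[ "${CIRCLE_BRANCH:-}" != i18n-* ]]; then
  echo "Not an i18n- branch; skipping translation tests."
  circleci-agent step halt
fi
```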