
Add an alternative standalone, container-based dev environment (and fix a few things exposed) #5565

Merged: 4 commits into fedora-infra:develop from the container-dev-env branch, Jan 19, 2024

Conversation

@AdamWill (Contributor) commented Dec 15, 2023

The current development environment requires four separate VMs (three from tinystage and one for Bodhi itself), plus some containers running inside the Bodhi VM. It's pretty heavy!

This provides an alternative development environment that is standalone and entirely based on containers: a postgres container, a waiverdb container, a greenwave container, a rabbitmq container, an ipsilon container, and a bodhi container that functions similarly to the bodhi VM from the existing environment. There is no FreeIPA or FAS backing the ipsilon instance; we just use ipsilon's testauth mode, with some configuration to allow for testing different group memberships.

The containers are Podman containers orchestrated by an ansible playbook, around which the bcd command is a light wrapper. They share a network namespace via a pod, and everything is accessed from the host as localhost, without SSL or Apache (these could be added if we really want, but it seemed unnecessary).

I suspect there are some issues in the current dev environment which I fixed in passing for this one: I don't think the way the docker-compose definition tries to launch the waiverdb and greenwave containers is valid any more, so they probably don't work.

I have tested this environment from a clean Fedora 39 host: it comes up in about four minutes and works. Usage details are in the doc file.

@AdamWill requested a review from a team as a code owner, December 15, 2023 01:45
@AdamWill (Contributor Author) commented Dec 15, 2023

This removes the docs for the virtualenv-based dev env because they are wildly outdated and I'm pretty sure you just can't get a usable dev env using only a virtualenv any more; I did try for a bit and failed.

Outdated stuff about the initial implementation:

Some limitations and caveats about this environment are documented in the vagrant.rst file. To go into more detail:

In order for the containers to network with each other we have to expose their ports to the host; at least, that's the only way I could get it to work. Theoretically we could put them all in a pod, but vagrant's docker provider plugin does not support pods (not surprisingly, as it's meant for docker, not podman). I tried kinda hacking it up using a pre-up trigger to create the pod and --pod extra args when running the containers, but it doesn't work: the docker provider seems to be hardwired to try and map port 22 on each container when running it, and when using a pod you can't map ports at the container level - it has to be done at the pod level - so it just fails. Possibly I could make this work using custom networking configuration, but I have no experience doing that with vagrant and just didn't want to spend another day on it.

I did try using an ipsilon container to provide a more sophisticated auth experience (based on the one the CI environment uses), but couldn't get that to fly either: problems with URLs redirecting in unexpected ways. Again, it's probably fixable, but I decided I'd probably spent too much time on this already.

Because of the simple auth, anything authenticated doesn't work through the CLI client (which is hardwired to expect OIDC auth, it seems). Because this is, itself, a container-based environment, you can't run more containers in it, so you can't run the CI environment in it (so I disabled bci), and bstartdeps etc. are irrelevant (so I disabled those too).

I tried to name the control variables for the ansible play 'functionally', but this does mean they imply possibilities I just haven't tested (like using the basic auth stuff in a VM-based environment) or I know don't currently work (like using FreeIPA auth in the container environment; we probably could make that work, but it doesn't seem super useful, as you still need the three tiny-stage VMs that way). The only configs that are really 'supported' are the ones expressed in the two Vagrantfiles.

I didn't bother setting up an httpd layer on top of the Bodhi instance and making it accessible as 'https://bodhi-dev.example.com', because it just doesn't seem very useful. It could only have a self-signed certificate, since there's no FreeIPA issuer, so you'd just have to always trust that certificate, and I couldn't really see what value that would add. Again, we could probably add that if someone wanted it; I just don't want to spend the time.

This PR includes a fix for a subtle bug that this dev env exposed, because it has to set base_address to http://localhost.localdomain:6543 in the dev env config, which is different from the value in the testing.ini that the test environment is supposed to use. Turns out the test process will actually use the values from the dev env config during early setup, then switch to using the values from testing.ini when the setup step for the first test runs. Full details are in the very long commit message for the fairly small change...
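
To illustrate the timing difference the fix addresses, here's a minimal sketch; the names and the regex body are illustrative, not Bodhi's actual ffmarkdown.py code:

```python
import re

from bodhi.server.config import config  # a mutable, dict-like singleton

# Module level: runs on first import, i.e. before any test's setup method
# has applied testing.ini, so this captures the development.ini value of
# base_address.
EAGER_UPDATE_RE = re.compile(re.escape(config['base_address']) + r'/updates/(\S+)')


def lazy_update_re():
    # Function level: runs only when called - in the real code, inside
    # extendMarkdown, after the setup method has updated the config
    # singleton from testing.ini - so this sees the testing.ini value.
    return re.compile(re.escape(config['base_address']) + r'/updates/(\S+)')
```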

@AdamWill (Contributor Author):

As for the "localhost" mystery, it seems...quite deep. Now that I look, there are several places in the tests that rely on this rather nonsensical behaviour: when things in ffmarkdown.py use request = pyramid.threadlocal.get_current_request() and then request.route_url, they wind up producing a URL which starts with http://localhost/, even though base_address is not http://localhost in any config we use.

At a quick glance it seems like this is because, in the request we get from pyramid.threadlocal.get_current_request(), the host for some reason is 'localhost:80'. Still, that just raises the question of why that's the case. I've no idea.

I do note that https://docs.pylonsproject.org/projects/pyramid/en/latest/api/threadlocal.html says "This function should be used extremely sparingly, usually only in unit testing code. It's almost always usually a mistake to use get_current_request outside a testing context", but we're using it in app code. I guess the reason is we want to avoid having to get an app instance into this markdown code somehow, but still.
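
As a reminder of the pattern being discussed, here's a minimal sketch (the route name and helper are illustrative, not the actual ffmarkdown.py code):

```python
from pyramid import threadlocal


def update_url(update_alias):
    # Non-view code grabbing the current request from Pyramid's
    # thread-local registry and building a URL from it. If the request's
    # host is 'localhost:80' (as it is under WebTest, see below), this
    # yields http://localhost/... regardless of base_address.
    request = threadlocal.get_current_request()
    return request.route_url('update', id=update_alias)
```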

@AdamWill (Contributor Author):

Update: so...I think it might be because we're using WebTest to set up our test app, and that's just sort of how WebTest works? In real use the request will I guess have come from Apache or something and should have the 'real' public host and port for the instance. When you use webtest and just ask for a relative URL like this, it seems like the host and port will just always be localhost and 80, as best I can tell...
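
A tiny self-contained check of that theory (assuming pyramid and webtest are installed; this is not Bodhi code):

```python
from pyramid.config import Configurator
from pyramid.response import Response
from webtest import TestApp


def show_host(request):
    # Echo back the host the request thinks it was made against.
    return Response(request.host)


pyramid_config = Configurator()
pyramid_config.add_route('root', '/')
pyramid_config.add_view(show_host, route_name='root')
app = TestApp(pyramid_config.make_wsgi_app())

response = app.get('/')
print(response.text)         # localhost:80
print(response.request.url)  # http://localhost/
```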

@AdamWill force-pushed the container-dev-env branch 3 times, most recently from 0307f73 to cfe6e53, December 15, 2023 17:03
@AdamWill marked this pull request as draft, December 15, 2023 19:29

@AdamWill (Contributor Author):

Oooh, okay, now it's much better! I hope, anyway. Worked out the networking, I think.

@AdamWill marked this pull request as ready for review, December 15, 2023 20:57
@AdamWill (Contributor Author) commented Dec 16, 2023

time vagrant up with this environment, after podman system reset to get a completely clean slate:

real    5m9.091s
user    1m39.174s
sys     0m19.905s

free -h with this env running: 31Gi total, 2.6Gi used, 20Gi free, 272Mi shared, 8.4Gi buff/cache, 27Gi available

edit note: those stats were from the vagrant container version. With the current ansible container version, time to start from scratch is around 4 minutes and memory usage is around 3.1Gi.

By comparison, time vagrant up for tiny-stage:

real    29m19.333s
user    1m20.371s
sys     0m28.149s

and time vagrant up for the VM-based bodhi env after tiny-stage is up:

real    9m45.181s
user    0m28.031s
sys    0m9.601s

So getting a full VM-based env from scratch takes about 39 minutes. free -h with the VM env: 31Gi total, 9.8Gi used, 8.0Gi free, 1.6Mi shared, 13Gi buff/cache, 21Gi available. So that's about a 6-7G difference.

edit: and note that after just vagrant up in the VM env you don't have greenwave or waiverdb; you have to vagrant ssh in and run bstartdeps to get them. That takes a few more minutes (and currently fails - I don't think my PR broke it, either). And the postgres container doesn't work.


@mattiaverga (Contributor):

Yeah, I was aware that Vagrant became non-FOSS, so it will have to be dropped from Fedora. Also, the fork doesn't seem to have gained much momentum.
My wish is to get rid of everything Vagrant-based and just use podman and containers to set up a devel environment, but I haven't looked into it because I'm a newbie with podman and containers, so I'll need to educate myself. Moreover, since this is going to be a problem for several Fedora apps, maybe it could be a task for CPE to migrate all apps from vagrant-based to podman-based dev envs... in a way that they could nicely interact with each other (tinystage, waiverdb, greenwave, bodhi, etc.).


@AdamWill (Contributor Author) commented Dec 20, 2023

more update: okay, now I've implemented it again in ansible-driven podman. I'm kinda 50:50 on whether I like that or podman-compose more, but ansible is likely to be better supported for longer.

Outdated idea about container build approach:

I'm now fiddling with optimizing the actual dev container generation along the lines of [how greenwave does it](https://github.com/release-engineering/greenwave/blob/master/Dockerfile), using a 'builder' environment to produce a more stripped-down final container. Tomorrow I'm gonna try a setup where I have the 'builder' environment produce a kind of 'core' image that just has the bodhi runtime deps in it (but no actual bodhi files), then produce two containers from that: a 'prod' container which has a specific version of the bodhi code added in, and a 'dev' container which has no bodhi code - you are expected to provide it as a volume, as that's what makes sense for a dev env, so you can live-modify it - but has all the additional stuff we want in a dev environment (debugging tools and all the rest of it). Potentially we could publish both of those to a registry, and then the dev env ansible playbook would just have to run the dev container. Right now it builds it on the fly.

I also want to take another shot at adding an ipsilon container, now I've had a bit more go-round with networking issues and CORS errors and stuff...

moar update: okay, quite happy with the ansible approach now, but it's not PR-ready yet: my working tree is in a bit of a messy state, I still need to add some sugar (some kind of wrapper around the ansible commands), and I'm taking another shot at getting an ipsilon container working.

@AdamWill (Contributor Author) commented Dec 22, 2023

wew! I finally got to the point of pushing the ansible-based version, now called 'bcd'. It's possible the CI may fail on this, because the bcd environment shares its ipsilon container definition (which I messed with a bit), but I wanted to push it so I can check it out on a clean machine and verify it works from scratch (and time it).

@AdamWill force-pushed the container-dev-env branch 3 times, most recently from 1463fdf to 2ac01dd, December 22, 2023 23:37
@AdamWill (Contributor Author) commented Dec 23, 2023

some notes, while I can still think of them...

  • There is some duplication and overlap with the ansible plays for the Vagrant environment. I could clean this up somewhat, but I don't intend to, because I'm hoping people will agree this is good enough to throw the Vagrant environment away entirely. If folks would rather keep it, I can do some spring cleaning; if not, I'll send a follow-up PR to just delete all the Vagrant stuff.
  • I ultimately gave up on the idea of combining the CI and dev env container definitions for now; they're just a bit too necessarily different for it to really work.
  • The Ipsilon stuff relies on these patches to Ipsilon, hence the side repo for it. They make the OpenIDC provider respect 'secure=no', provide a way to specify a non-standard listen port during server installation, and add a mechanism to customize reported group membership when logging in with Ipsilon's testauth authentication.
  • I reworked how the dev env itself is implemented, compared to the basis in the Vagrant environment. The way that currently works is a bit odd: we install all of Bodhi's dependencies, then set up a virtualenv with system packages and run poetry install in it, which updates all the dependencies to the newer versions listed in the poetry lock file. So you waste a lot of time installing things twice, and you wind up testing in a somewhat odd environment. Bodhi's real-world deployments use containers based on distribution packages, so to me it seems most appropriate for the dev env to also use distribution packages. My version throws away the virtualenv (which wasn't really doing us any good) and uses poetry install --only-root to 'install' only the bodhi packages themselves (in fact this just provides a pointer, really); it tells poetry not to install or update the dependencies. This makes setup (container build, here) faster, makes the container smaller, and means we're testing an environment that's closer to how Bodhi is ultimately deployed. I also decided that effectively duplicating Bodhi's dependency list in the dev env definition (here, the Containerfile) is silly, so instead I install the official Bodhi packages to get all their dependencies, then remove them again (all in one step in the Containerfile, so there's no intermediate layer with the packages present). That way the Containerfile only has to specify additional dependencies, like test deps and things we want in the container for debugging (see the Containerfile sketch after these notes).

There's probably more, I'll think about it.
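
To make that last note about the Containerfile concrete, here's a hypothetical sketch of the approach; the base image, package names and paths are assumptions, not the PR's actual Containerfile:

```dockerfile
FROM registry.fedoraproject.org/fedora:latest

# Pull in Bodhi's dependencies by installing the distribution packages,
# then remove the bodhi packages themselves again. Doing it in a single
# RUN means no intermediate layer carries them; --noautoremove keeps the
# dependencies installed. Only extras (test/debug tools) are listed
# explicitly.
RUN dnf install -y bodhi-server bodhi-client python3-pytest poetry \
    && dnf remove -y --noautoremove bodhi-server bodhi-client \
    && dnf clean all

# The bodhi source tree is expected to be bind-mounted at /bodhi at
# runtime; running `poetry install --only-root` there registers the
# bodhi packages themselves without installing or updating dependencies.
WORKDIR /bodhi
```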

@AdamWill (Contributor Author) commented Dec 23, 2023

I've just noticed there's an interesting race with group memberships. The first time you log in as a user who is in the database dump, you get their "real" group memberships applied to any actions you perform. If you then log out and back in as the same user, you get the fake group memberships from ipsilon testauth applied to any actions you perform. I suspect this might be a real bug in Bodhi - it might be the case that, if your groups change in FAS, the first time you log into Bodhi afterwards, the changes might not be reflected. I think perhaps I'll look into it more later.

edit: oh, but I forgot I buried the lede - I fixed authed CLI usage! whee. Turns out things go sideways with cookies if the server URL doesn't use an FQDN, so I changed it to localhost.localdomain.


@AdamWill (Contributor Author):

Oh, and I sent the Ipsilon patches: https://pagure.io/ipsilon/pull-request/400

@AdamWill force-pushed the container-dev-env branch 3 times, most recently from 458c007 to ed13b39, December 27, 2023 19:28
@AdamWill (Contributor Author):

Tweaked bcd to have the common vagrant commands (up, halt, destroy) as aliases.

@mattiaverga (Contributor):

I've finally had time to look at this. Maybe something has changed recently, because I cannot get bodhi to work properly in the container.
After running ./bcd run, trying to open http://localhost.localdomain:6543/ fails. Logs in the bodhi container show that an error prevents bodhi from starting.
Also, trying to run pytest within the bodhi container fails:

```
# pytest -x -v --no-cov --disable-warnings bodhi-messages/
================================================================================= test session starts =================================================================================
platform linux -- Python 3.10.0, pytest-6.2.4, py-1.11.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /bodhi, configfile: pyproject.toml
plugins: cov-3.0.0, mock-3.6.1
collected 0 items / 1 error                                                                                                                                                           

======================================================================================= ERRORS ========================================================================================
_________________________________________________________________ ERROR collecting bodhi-messages/tests/test_base.py __________________________________________________________________
ImportError while importing test module '/bodhi/bodhi-messages/tests/test_base.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
bodhi-messages/tests/test_base.py:24: in <module>
    from bodhi.messages.schemas import base
bodhi-messages/bodhi/messages/__init__.py:20: in <module>
    METADATA = importlib.metadata.metadata('bodhi-messages')
/usr/lib64/python3.10/importlib/metadata/__init__.py:936: in metadata
    return Distribution.from_name(distribution_name).metadata
/usr/lib64/python3.10/importlib/metadata/__init__.py:518: in from_name
    raise PackageNotFoundError(name)
E   importlib.metadata.PackageNotFoundError: No package metadata was found for bodhi-messages
=============================================================================== short test summary info ===============================================================================
ERROR bodhi-messages/tests/test_base.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================================================== 1 error in 0.04s ===================================================================================
```

@mattiaverga (Contributor):

Oh, wait... why does the bodhi container have an F39 kernel but use dnf repos from F35?... 🙃

@mattiaverga (Contributor):

I tried modifying the Bodhi Containerfile from FROM fedora:latest to FROM quay.io/fedora/fedora:latest; however, I can't manage to recreate the container... I've run ./bcd remove and ./bcd destroy multiple times, but ./bcd run doesn't recreate the container.

@AdamWill (Contributor Author) commented Jan 17, 2024

huh. I did test this on a completely fresh system with no registry customizations, and it didn't do that. I'll test it again today. On my system I have a /etc/containers/registries.conf.d/000-shortnames.conf from the containers-common package which specifies "fedora" = "registry.fedoraproject.org/fedora", so that seems like it should be 'normal' on Fedora, but maybe something overrides it if you have docker installed, or something? I guess it would indeed be safer to explicitly specify quay.io or registry.fp.o; I will change that. edit: changed

To force a container regen you have to edit the ansible play that builds the container image to have "force: true" (or just go ahead and do it directly with podman commands). I didn't really provide an interface to this, as it seemed unlikely to be needed in 'normal' use, but I guess I could (maybe a command to remove images, forcing them to be rebuilt on the next run).

The PR's four commit messages follow.

Testing the container-based Bodhi development environment I'm
working on exposed a subtle behavior issue in this test/code.
The tests have a global fixture - `mock_settings_env_vars` -
which sets an environment variable, and a setup method -
`BaseTestCaseMixin._setup_method` - which updates
`bodhi.server.config.config` (which is a mutable singleton)
from the file that env var points to (`tests/testing.ini`).
Before the first time the setup method is run, when running
the tests in a development environment,
`bodhi.server.config.config` will contain the values from
`/home/vagrant/development.ini`, the configuration intended
for the development deployment, and so any code that's run at
this point - including code at the global level in modules
that get imported before any test case is run, I believe - uses
those values, not the ones from testing.ini.

You can test this by ensuring the value is different in
the two config files, then adding a line
`raise ValueError(config['base_address'])` first at the top of
`ffmarkdown.py` - where `UPDATE_RE` was defined, before this
commit - and running the test, then moving the line into the
`extendMarkdown` method - where this commit moves it to - and
running the test. When we check the value at the top of the
file, it will be the one from `/home/vagrant/development.ini`;
when we check the value from inside
`extendMarkdown`, it is the value from `testing.ini`. I think
this is because code at the global level gets executed
immediately on the first import of the module, which happens
before any test case has started to run and so before the
setup method has changed the config settings; but
`extendMarkdown` will only get called *after* the test case
has started running and the setup method has been called.

To confuse matters, the test case isn't super explicit about what
it's really doing, because it just hardcodes the base_address
value from testing.ini into the URL. Changing the test case
to show that it's using `config['base_address']` makes it
clearer, and moving the update regex definition inside
`extendMarkdown` ensures the test and the app code agree on what
that value actually is. The regex is not used anywhere else.

Alternatively we could leave the regex definition at the global
level and have the test code read base_address from
`base.original_config`, which is a copy of the 'original' config
that `base.py` takes before doing anything to it (so probably
somebody ran into this before). But I think I like this way.

I noticed this issue because I had to change the base_address
value in `/home/vagrant/development.ini` in my container-based
environment. In the VM-based environment the base_address value
in both development.ini and testing.ini is the same, so the
issue is hidden.

Signed-off-by: Adam Williamson <[email protected]>

This makes things behave more intuitively in an interactive
development environment which uses dummy ACLs (it's weird if
you're logged in as bob and you can't edit an update owned by
bob).

Signed-off-by: Adam Williamson <[email protected]>

I think we can stop caring about python2 now, and this means we
don't have to install python-unversioned-command any time we want
to use this.

Signed-off-by: Adam Williamson <[email protected]>

The current development environment requires four separate VMs
(three from tinystage and one for Bodhi itself), plus some
containers running inside the Bodhi VM. It's pretty heavy!

This provides an alternative development environment that is
standalone and entirely based on containers: a postgres
container, a waiverdb container, a greenwave container, a
rabbitmq container, an ipsilon container, and a bodhi container
that functions similarly to the bodhi VM from the existing
environment. There is no FreeIPA or FAS backing the ipsilon
instance, we just use ipsilon's testauth mode, with some
configuration to allow for testing different group memberships.

The containers are Podman containers orchestrated by an ansible
playbook, around which the `bcd` command is a light wrapper.
They share a network namespace via a pod, and everything is
accessed from the host as `localhost`, without SSL or Apache
(these could be added if we really want, but it seemed
unnecessary).

Signed-off-by: Adam Williamson <[email protected]>
@mattiaverga (Contributor):

yep, podman image rm ... + FROM quay.io/fedora/fedora:latest did the job. Now it seems to work.

@mattiaverga (Contributor) left a review comment:

Works great!

@mergify bot merged commit 9b9cf54 into fedora-infra:develop, Jan 19, 2024
31 checks passed
@AdamWill (Contributor Author):

Awesome, thanks!

Now that this is merged I'd like to get rid of the Ipsilon side build, so I've poked the upstream PR; if I don't hear back on it soon I'm just going to go ahead and backport it to the Fedora packages.

What do you think about removing the Vagrant env as a follow-up to this? It would clean things up quite a bit, but if you still think the Vagrant env may be useful I won't send a PR for it.

@mattiaverga (Contributor):

I think we can remove the Vagrant dev env and just focus on maintaining the podman-based one, tweaking it if any issues arise.

@mattiaverga (Contributor):

BTW, it would be nice to have the containers use local URLs instead of localhost:port, like we used with vagrant/tinystage - for example bodhi.bodhi-dev.local, ipsilon.bodhi-dev.local, etc. Not sure if that's possible with podman, though.

@AdamWill (Contributor Author):

So, that only really worked with vagrant because it hacked up the host's /etc/hosts, which I honestly did not love, personally. It was a handy feature in a way, but it was also why you had to babysit vagrant up for the half hour it took to run, so you could enter your root password occasionally for it to mess with your /etc/hosts (this is what Vagrant calls the "hostmanager" feature).

We would have to reimplement...something...to do that, if we wanted the feature, which, as I said, I'm personally not a big fan of. I don't know if something like that exists in a way that's compatible with this ansible+podman approach or if we'd have to invent it. We could of course do it in the ansible plays, but then that play would have to run as root, making it interactive in the way I kinda hated with vagrant, I guess.

I think the networking would also have to work differently in that case. I don't think TCP/IP networking between the host and the containers actually works at all, the way it's currently implemented. I didn't go super deep into podman networking, just came up with something simple that worked and went with it. It was a goal for me for this to work rootless, which changes the options when it comes to networking, which is also something to keep in mind.

@AdamWill (Contributor Author):

Ipsilon changes got merged, so F38 update and F39 update are pending. Once those go stable we can drop the custom Ipsilon build from this.

AdamWill added a commit to AdamWill/bodhi that referenced this pull request Feb 10, 2024
This removes all traces (I hope) of the Vagrant development
environment, as discussed in
fedora-infra#5565 (comment) .
I preserved the VS Code docs in a very half-assed way as it
seemed nicer than just deleting them, but I hope somebody who
actually uses VS Code can test it out and improve those.

Signed-off-by: Adam Williamson <[email protected]>
mergify bot pushed a commit that referenced this pull request Feb 10, 2024 (same commit message as above).
@AdamWill (Contributor Author):

Hrmm. I've just noticed what seems like some odd behaviour with this and, I think, Podman 5.

I can access the dev server via http://localhost:6543 in Firefox, but not Epiphany or curl from a console. http://127.0.0.1:6543 works from all three. http://localhost.localdomain:6543 doesn't work from any of the three, which is a problem, because that's the URL you have to use for CORS to be happy (anything else causes things like posting comments or creating updates to be rejected by CORS).

Don't know why this is yet. Will try and confirm that it's a podman 5 thing later.
