
Should we instantiate a k8s cluster within CI for rhg_compute_tools.kubernetes unit tests? #88

Open
bolliger32 opened this issue Sep 14, 2020 · 3 comments


@bolliger32
Collaborator

In theory, you could do this with https://github.com/rancher/k3s. It would allow better testing of the get_cluster-related commands. For example, right now the _get_cluster_dask_gateway unit test creates a dask-gateway server with a local backend. The local backend doesn't support the same config options as a kubernetes backend, so not all of our config options are testable. If we instead created a local k8s cluster and spun up the gateway server on it, they would be. I'd imagine some other existing or planned functions could also make use of this for unit testing.
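For reference, a minimal sketch of what this might look like in CI (assuming GitHub Actions; the install one-liner is the standard k3s installer, but the job layout and test module path are hypothetical):

```yaml
jobs:
  k8s-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install k3s (single-node Kubernetes)
        run: |
          curl -sfL https://get.k3s.io | sh -
          # k3s bundles kubectl; wait for the node to become schedulable
          sudo k3s kubectl wait --for=condition=Ready node --all --timeout=120s
      - name: Run kubernetes-backend tests
        run: pytest tests/test_kubernetes.py  # hypothetical test module
```

A gateway server with a kubernetes backend could then be deployed onto that node before the test step runs.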

Might be overkill... but could be useful

@brews
Member

brews commented Sep 14, 2020

Interesting! Have you by chance looked at coverage stats for what we have vs. what we're missing because of this limitation?

@bolliger32
Collaborator Author

I haven't - I'm not sure it would show up. TBH I don't really know how coverage works, but it's not that we're missing tests for some functions; it's that we're not able to test everything some functions can do. So my guess (again, without much knowledge on the subject) is that this would show up as fully covered. As an example, I haven't found a way to let a memory specification actually adjust the memory allocated to a worker on the local dask-gateway cluster. I think this is down to how the local backend manages memory given whatever is available locally (so it wouldn't be an issue on kubernetes), but it's also possible that it's a bug in my code and we actually can't control the memory allocated to a worker even with the kubernetes backend. I was able to test this manually on adrastea, but couldn't test it within CI.
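With a real kubernetes backend available in CI, a test could assert that the memory spec we pass actually lands on the worker pod. A hypothetical helper (not part of rhg_compute_tools) for that kind of assertion, converting Kubernetes-style memory strings to bytes for comparison:

```python
import re

# Multipliers for Kubernetes memory suffixes (binary and decimal).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
          "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

def parse_memory(spec):
    """Convert a Kubernetes-style memory string (e.g. '8Gi') to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]i?)?", spec)
    if match is None:
        raise ValueError(f"unrecognized memory spec: {spec!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS.get(unit, 1))

print(parse_memory("8Gi"))   # 8589934592
print(parse_memory("512M"))  # 512000000
```

A test could then compare parse_memory(requested_spec) against the memory limit reported on the worker pod's container.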

@bolliger32
Collaborator Author

Oh! One other change users will see is that it sometimes takes a while (on the order of seconds to minutes) for the get_*_cluster command to return the client and cluster objects. This is because the scheduler is now always its own remote pod. At a minimum, a new container has to be started from an image that already exists on a node; at worst, a new node has to be triggered, the image pulled, and then loaded. There is potentially a way to do this asynchronously, but for now it's implemented synchronously, so the response time of this function can be a lot slower.
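As a rough sketch of what an asynchronous version could look like (names are illustrative only, not the current rhg_compute_tools API): kick off scheduler-pod startup immediately and hand the caller a task to await, so their code keeps running while the image pulls and the container starts.

```python
import asyncio

async def _start_scheduler():
    # Stand-in for waiting on the remote scheduler pod to become ready
    # (image pull, container start, scheduler handshake).
    await asyncio.sleep(0.01)
    return "scheduler-ready"

async def get_cluster_async():
    # Return immediately with a task the caller can await later,
    # instead of blocking until the scheduler pod is up.
    return asyncio.create_task(_start_scheduler())

async def main():
    task = await get_cluster_async()
    # ... user code can do other setup here while the scheduler starts ...
    return await task

result = asyncio.run(main())
print(result)  # scheduler-ready
```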

Relatedly, I believe schedulers are currently going into the core pool. I think this is fine, since that pool autoscales and has what seems like a more appropriate machine size. But if we wanted to keep them separate, we could create a scheduler-specific node pool and add some tolerations to the scheduler pods by default.
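If we went that route, the default scheduler pod spec could gain something like the following (the pool name and taint key are hypothetical; this assumes the node pool is tainted dedicated=scheduler-pool:NoSchedule):

```yaml
spec:
  nodeSelector:
    dedicated: scheduler-pool
  tolerations:
    - key: dedicated
      operator: Equal
      value: scheduler-pool
      effect: NoSchedule
```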
