Add KEDA HPA TriggerAuthentication and postgresql ScaledObject. #2384
Closed: pt247 wants to merge 162 commits into nebari-dev:develop from pt247:2284-keda-conda-store-worker-hpa (+432 −12)
90532c6
Add KEDA HPA TriggerAuthentication and postgresql ScaledObject.
pt247 57c188f
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] bf87d9d
Terraform fmt.
pt247 6b1e7ff
More reactive scale up and down.
pt247 e57aa74
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 d62ecea
Formatting changes.
pt247 6dbfcef
Formating changes.
pt247 d07928c
Tweak default parameters.
pt247 b1be5b5
Code refactor.
pt247 c6a38cb
Set max nodes of general node to 5.
pt247 5d0607c
Add node affinity for KEDA pods to general node.
pt247 4d61350
Set maxReplicaCount for conda worker scaling.
pt247 bbae748
Move keda resources to conda.
pt247 bcb8a82
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 4106d7d
Fix variable discription.
pt247 22e1fbe
Keeping default as more aggressive polling of postgresql is resulting…
pt247 bd5b051
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 701eb6e
Add resource limits for conda pods.
pt247 bff0c4c
Set CondaStoreWorker.concurrency = 1
pt247 0c2af1d
Expose worker resources and replica count to Nebari config.
pt247 75a194c
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 7db539d
Merge branch 'nebari-dev:develop' into 2284-keda-conda-store-worker-hpa
pt247 f01bda9
Add integration test for KEDA.
e0442bc
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 3236832
Fix integration test for KEDA.
d78153b
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] bef082e
disable ssl verify.
5339861
Ignore insecure request warning.
7d19315
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] c10457c
Increase timer for scaledown.
92537ae
Keep replica count for conda-store-worker deployment as 0 to start with.
164f548
Merge branch 'nebari-dev:develop' into 2284-keda-conda-store-worker-hpa
pt247 2dfffce
Modify test.
01975f8
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] c538bc1
Add more memory to conda-store worker.
832613a
Add more CPU to conda-store worker.
7fcadf9
Reduce cpu back to 250 for conda-store-workers.
c0a1a3e
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 479f708
Increase replicas back to 1.
4851b05
Revert resource constraints for conda-store-worker.
0431fca
Fix memory and cpu for conda store workers.
d5e4703
Adjust CPU and Memory consumptions.
b0349a1
Increase CPU to 1 core.
9a9ed3e
Debug keda test.
5b84bd2
Reduce memory for conda worker and add more info logs.
f8e03eb
Fix test.
d10c05e
Try CONDA_STORE_TOKEN from env
609ee7f
Fix logging.
72bfb2d
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 3fa05ae
Re-enable configmap patch in test.
d5370e6
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 2f23a23
Merge branch 'nebari-dev:develop' into 2284-keda-conda-store-worker-hpa
pt247 b3b7b82
Setup tmate.
ebdb6e4
Update test.
1073877
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] b7d6e45
Fix env url.
21af4d7
Test refactor. rebase master.
225b887
Pause CI on failour.
7458aa2
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 2edb084
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 9702f2d
Fix test.
1cea90c
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] b1e51a0
Fix test.
94dd480
Skip failing cypress tests.
330a507
Skip failing cypress tests.
cfdafce
Fix test.
f04a4be
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 4bc8c88
Fix test.
df7efd7
Add cyprus tests back.
c4bad9c
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 c6eccf7
Remove changes from ci.
57df7fe
Remove node affinity for testing.
be4e4fa
Run pytest first.
c2c894c
Reduce cooldown period for tests.
59d1412
Change test for CI.
02fa687
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] e955d6d
Still increase timeout for test to finish
b701900
ignore::pytest.PytestUnraisableExceptionWarning
2ef1414
Fix test decorators.
118f3e0
Limit to 2 envs.
9cec9cf
IncreaseCI memort.
df63165
Test refactor.
40d129d
Revert ci workflow changes.
b2e1567
Remove unrelated changes.
8780325
Skip Cyprus tests.
52b4b63
Minor test refactor.
71cdf88
Revert inctance change.
f1208d0
Revert inctance change.
ee385db
Revert test_local_integration.yaml chanes.
49161f4
Add nodeselector for Keda.
7f78618
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] c7791e8
Remove cyprus tests.
14a79f6
Reduce pollingInterval and cooldownPeriod for tests.
41ff5c5
Reduce number of deployments to 1 for testing.
a9860ec
Add tmate on failour.
3c6dae1
tqdm instead of pandas for test.
71e572c
Fix tmate location.
ff5fbc1
r5a.12xlarge
a4bb7f2
Remove tmate.
2cbf117
Fix terraform format.
c104554
r5ad.4xlarge
19f8b61
Skip test_scale_up_and_down.
2fd929a
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 85856d2
Rebase
306b5c0
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 24cb0c8
Remove commentes.
fbd4f73
Remove print statements.
e18835d
Refactor test.
f1de525
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 0954b6e
Add logs.
1c73150
Add more logging.
e839b61
Update timer.
a0a66fe
Remove ignore::pytest.PytestUnraisableExceptionWarning fixture from t…
c315a24
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] b5ced72
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 daf3a5f
Test cleanup.
9703916
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 68966b1
Revert cirun instance_type.
3ec7df7
Add variable needed for pytest and sync file with develop.
789b461
Refactor test.
4b35cef
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] d9f3502
Upgrade python client for kubernetes version to 29.0.0
2c94f6e
add comment in test.
10140bb
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 d9d2338
Include cypress tests.
52716f8
Minor change to trigger local-integration-tests.
4889c8b
Ingore DeprecationWarning in tests.
8321804
Update test_local_integration.yaml
pt247 fe70439
Update test_conda_store_scaling.py
pt247 3d84840
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 13aa233
Revert to hardcoded namespace for testing.
04a21da
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] b46a3a3
Re-add cypress tests in CI.
388aca5
Remove hardocded namespace from test.
ecd50f3
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 6c40448
Revert version upgrade for kubernetes client.
a593758
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 a54f291
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 01e6ac4
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 6c9ce07
Fix node_slector lookup.
a1bce8f
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 73b933d
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 bcddb7b
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 40ee18f
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 c4a748b
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 9a782e6
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 87089a4
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 93b51e7
Deployment and pod logs.
dd948f6
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 797c682
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 2eba1bf
Merge branch 'nebari-dev:develop' into 2284-keda-conda-store-worker-hpa
pt247 95c84b4
KEDA scaling based on conda-store API.
7fd2670
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 9b5b0e7
Cleanup tests.
44ac977
Fix conda-store-worker terrafrom file format and syntax.
cdad85f
Update Azure general node group max nodes to 5 to be consistent with …
96b70d6
Make verbose conda-store-worker logs ad debug.
d4378b3
Fix typo.
2e7b1b6
Cleanup KEDA scaleed object config.
2c39a05
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] 142633c
Terrafrom fmt.
c72f95a
Reduce default max workers to 4.
Add integration test for KEDA.
@@ -2,6 +2,9 @@
.nox
_build
.env
.venv
nebari-aws
nebari-local

# setuptools scm
src/_nebari/_version.py

@@ -0,0 +1,229 @@
import base64
import json
import logging
import sys
import time
import uuid
from unittest import TestCase

import kubernetes.client
import pytest
import requests
from kubernetes import client, config, dynamic
from kubernetes.client import api_client
from kubernetes.client.rest import ApiException
from timeout_function_decorator import timeout

from tests.tests_deployment import constants

CONDA_STORE_API_ENDPOINT = "conda-store/api/v1"

service_permissions = {"primary_namespace": "", "role_bindings": {"*/*": ["admin"]}}

NEBARI_HOSTNAME = constants.NEBARI_HOSTNAME
# NEBARI_HOSTNAME = "pt.quansight.dev" ## Override for local testing


@pytest.mark.filterwarnings("error")
class TestCondaStoreWorkerHPA(TestCase):
    """
    Creates 5 conda environments.
    Checks that conda-store-worker scales up to 5 pods.
    Checks that conda-store-worker scales down to 0 pods.
    """

    log = logging.getLogger()
    logging.basicConfig(
        format="%(asctime)s %(module)s %(levelname)s: %(message)s",
        datefmt="%m/%d/%Y %I:%M:%S %p",
        level=logging.INFO,
    )
    stream_handler = logging.StreamHandler(sys.stdout)
    log.addHandler(stream_handler)

    def fetch_token(self):
        v1 = client.CoreV1Api()
        secret = v1.read_namespaced_secret("conda-store-secret", "dev")

        token = [
            k
            for k in json.loads(base64.b64decode(secret.data["config.json"]))[
                "service-tokens"
            ].keys()
        ][0]
        return token

    def read_namespaced_config_map(self):
        with kubernetes.client.ApiClient(self.configuration) as api_client:
            api_instance = kubernetes.client.CoreV1Api(api_client)
            try:
                api_response = api_instance.read_namespaced_config_map(
                    "conda-store-config", "dev"
                )
                return api_response
            except ApiException as e:
                self.log.exception(
                    "Exception when calling CoreV1Api->read_namespaced_config_map: %s\n" % e
                )
            finally:
                api_client.close()

    def patch_namespaced_config_map(self, config_map):
        with kubernetes.client.ApiClient(self.configuration) as api_client:
            api_instance = kubernetes.client.CoreV1Api(api_client)
            try:
                api_response = api_instance.patch_namespaced_config_map(
                    "conda-store-config", "dev", config_map
                )
                self.log.debug(api_response)
            except ApiException as e:
                self.log.exception(
                    "Exception when calling CoreV1Api->patch_namespaced_config_map: %s\n"
                    % e
                )
            finally:
                api_client.close()

    def setUp(self):
        """
        Get token for conda API.
        Create an API client.
        """
        self.log.info("Setting up the test case.")
        self.configuration = config.load_kube_config()
        # Get token from pre-defined tokens.
        token = self.fetch_token()
        self.headers = {"Authorization": f"Bearer {token}"}

        # Read conda-store-config
        self.config_map = self.read_namespaced_config_map()

        # Patch conda-store-config
        self.config_map.data["conda_store_config.py"] = self.config_map.data[
            "conda_store_config.py"
        ].replace(
            '{default_namespace}/*": {"viewer"}', '{default_namespace}/*": {"admin"}'
        )
        self.patch_namespaced_config_map(self.config_map)

        # Delete existing environments
        self.delete_conda_environments()
        self.log.info("Wait for existing conda-store-worker pods to terminate.")
        self.timed_wait_for_deployments(0)
        self.log.info("Ready to start tests.")

    def test_scale_up_and_down(self):
        """
        Create 5 conda environments.
        Wait for 5 conda-store-worker pods to start.
        Fail if 5 conda-store-worker pods don't spin up in 2 minutes.
        Wait till all the conda environments are created. (max 5 minutes)
        Fail if they don't scale down in another 5 minutes.
        """
        # Create 5 conda environments.
        count = 5
        self.build_n_environments(count)
        self.log.info("Wait for 5 conda-store-worker pods to start.")
        self.timed_wait_for_deployments(count)
        self.log.info(
            "Waiting (max 5 minutes) for all the conda environments to be created."
        )
        self.timed_wait_for_environment_creation(count)
        self.log.info("Wait till worker deployment scales down to 0")
        self.timed_wait_for_deployments(0)
        self.log.info("Test passed.")

    def tearDown(self):
        """
        Delete all conda environments.
        """
        self.delete_conda_environments()

        # Revert conda-store-config
        self.config_map.data["conda_store_config.py"] = self.config_map.data[
            "conda_store_config.py"
        ].replace(
            '{default_namespace}/*": {"admin"}', '{default_namespace}/*": {"viewer"}'
        )
        self.patch_namespaced_config_map(self.config_map)
        self.log.info("Teardown complete.")
        self.stream_handler.close()

    def delete_conda_environments(self):
        existing_envs_url = f"https://{NEBARI_HOSTNAME}/{CONDA_STORE_API_ENDPOINT}/environment/?namespace=global"
        response = requests.get(existing_envs_url, headers=self.headers)
        for env in response.json()["data"]:
            env_name = env["name"]
            delete_url = f"https://{NEBARI_HOSTNAME}/{CONDA_STORE_API_ENDPOINT}/environment/global/{env_name}"
            self.log.info(f"Deleting {delete_url}")
            response = requests.delete(delete_url, headers=self.headers)
        self.log.info("All conda environments deleted.")

    @timeout(6 * 60)
    def timed_wait_for_environment_creation(self, target_count):
        created_count = 0
        while created_count != target_count:
            created_count = 0
            response = requests.get(
                f"https://{NEBARI_HOSTNAME}/{CONDA_STORE_API_ENDPOINT}/environment/",
                headers=self.headers,
            )
            for env in response.json().get("data"):
                build_id = env["current_build_id"]
                _res = requests.get(
                    f"https://{NEBARI_HOSTNAME}/{CONDA_STORE_API_ENDPOINT}/build/{build_id}",
                    headers=self.headers,
                )
                status = _res.json().get("data")["status"]
                if status == "COMPLETED":
                    created_count += 1
            self.log.info(f"{created_count}/{target_count} Environments created")
            time.sleep(5)

        self.log.info("timed_wait_for_environment_creation finished successfully.")

    @timeout(10)
    def build_n_environments(self, n):
        self.log.info(f"Building {n} conda environments...")
        for _ in range(n):
            time.sleep(1)
            self.create_conda_store_env()

    @timeout(10 * 60)
    def timed_wait_for_deployments(self, target_deployment_count):
        self.log.info(
            f"Waiting for deployments to reach target value {target_deployment_count} ..."
        )
        client = dynamic.DynamicClient(
            api_client.ApiClient(configuration=self.configuration)
        )
        replica_count = -1
        while replica_count != target_deployment_count:
            deployment_api = client.resources.get(
                api_version="apps/v1", kind="Deployment"
            )
            deployment = deployment_api.get(
                name="nebari-conda-store-worker", namespace="dev"
            )
            replica_count = deployment.spec.replicas
            direction = "up" if target_deployment_count > replica_count else "down"
            self.log.info(
                f"Scaling {direction} deployments: {replica_count}/{target_deployment_count}"
            )
            time.sleep(5)
        self.log.info(f"Deployment count: {replica_count}")

    def create_conda_store_env(self):
        _url = f"https://{NEBARI_HOSTNAME}/{CONDA_STORE_API_ENDPOINT}/specification/"
        name = str(uuid.uuid4())
        request_json = {
            "namespace": "global",
            "specification": f"dependencies:\n - pandas\nvariables: {{}}\nchannels: []\n\ndescription: ''\nname: {name}\nprefix: null",
        }
        response = requests.post(_url, json=request_json, headers=self.headers)
        self.log.debug(request_json)
        self.log.debug(self.headers)
        self.log.debug(response.json())
        return response.json()["data"]["build_id"]
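
The wait loops in this test poll every 5 seconds and rely on the `@timeout` decorator to abort a run that never converges. As a possible alternative, the same pattern can be written as one helper with an explicit deadline. This is only a sketch: `wait_until` and its parameters are hypothetical names that are not part of this PR, and the injectable `clock` and `sleep` arguments exist only so the helper can be exercised without real delays.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True if the predicate succeeded, False if the deadline passed.
    (Hypothetical helper, not part of the PR.)
    """
    deadline = clock() + timeout_s
    while True:
        # Always check at least once, even with a zero timeout.
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In a test like the one above, the predicate could compare `deployment.spec.replicas` against the target count, and the caller could assert on the returned boolean rather than depending on the decorator raising.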
@viniciusdc This trick of patching the config works when running the tests against AWS, but not in CI. You suggested deploying Nebari with an admin token. Can we override that from the Nebari CI config?
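
One way the test side of that idea might look, as a rough sketch only: prefer a token supplied through the environment and fall back to the Kubernetes secret lookup when none is set. The variable name `CONDA_STORE_TOKEN` is taken from the commit message "Try CONDA_STORE_TOKEN from env"; the helper `resolve_token` and its parameters are hypothetical, and whether CI can actually export such a token is the open question here.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_token(
    fetch_from_cluster: Callable[[], str],
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "CONDA_STORE_TOKEN",
) -> str:
    """Return the token from the environment if set, else from the cluster.

    `fetch_from_cluster` stands in for a lookup such as `fetch_token()`.
    (Hypothetical helper, not part of the PR.)
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if token:
        return token
    # No usable token in the environment: fall back to the secret lookup.
    return fetch_from_cluster()
```

In `setUp`, this could be called as `resolve_token(self.fetch_token)`, so local runs against AWS keep the current behaviour while CI supplies its own token.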