chore: bump katib-config minimum supported versions to 0.14. Update various readmes. Add issue templates.
a9p committed Oct 26, 2022
1 parent 5638c0c commit 3020df7
Showing 6 changed files with 68 additions and 35 deletions.
35 changes: 35 additions & 0 deletions .github/ISSUE_TEMPLATE/bug.md
@@ -0,0 +1,35 @@
---
name: Bug Report
about: Report a bug encountered during use
labels: kind/bug

---

### Summary

<!-- Concisely summarize the defect -->


### Steps to reproduce

<!-- *Detailed* steps to reproduce the issue. Include OS and hardware information, and attach any necessary *minimal* data files where applicable. -->


### Current bug behavior

<!-- What actually occurs -->


### Expected correct behavior

<!-- What you expected to observe -->


### Relevant logs and/or screenshots

<!-- Paste any relevant logs - please use code blocks (```) to format console output, logs, and code -->


### Possible fixes

<!-- If you can, link to the line of code that might be responsible for the problem -->
17 changes: 17 additions & 0 deletions .github/ISSUE_TEMPLATE/task.md
@@ -0,0 +1,17 @@
---
name: Task
about: General purpose template for feature/chore/documentation

---

### Description

<!--- What development is encompassed in this feature? -->

### Proposal

<!--- What is the proposed solution? -->

### Metrics for Success

<!--- If no way to measure success, link to an issue that will implement a way to measure this. -->
2 changes: 1 addition & 1 deletion README.md
@@ -114,7 +114,7 @@ Running AutoLFADS using Ray enables scaling your processing jobs to many worker

Running AutoLFADS using KubeFlow enables scaling your experiments across an entire cluster. This workflow allows for isolated multi-user utilization and is ideal for running on managed infrastructure (e.g. University, public or private cloud) or on service-oriented clusters (i.e. no direct access to compute instances). It leverages industry standard tooling and enables scalable compute workflows beyond AutoLFADS for groups looking to adopt a framework for scalable machine learning.

-If you are using a cloud provider, KubeFlow provides a series of [tutorials](https://www.kubeflow.org/docs/started/installing-kubeflow/#install-a-packaged-kubeflow-distribution) to get you setup with a completely configured install. We currently require a [feature](https://github.com/kubeflow/katib/pull/1833) that will be released in KubeFlow 1.6 (Katib 0.14). The below installation provides a pathway for installing KubeFlow on a _vanilla_ Kubernetes cluster integrating the noted changes.
+If you are using a cloud provider, KubeFlow provides a series of [tutorials](https://www.kubeflow.org/docs/started/installing-kubeflow/#install-a-packaged-kubeflow-distribution) to get you setup with a completely configured install. We currently require a [feature](https://github.com/kubeflow/katib/pull/1833) that was introduced in Katib 0.14. The below installation provides a pathway for installing KubeFlow on a _vanilla_ Kubernetes cluster integrating the noted changes.

**Prerequisites:** Kubernetes cluster access and Ansible (installed locally; only needed when deploying KubeFlow)

17 changes: 1 addition & 16 deletions kubeflow/roles/kubeflow/README.md
@@ -2,7 +2,7 @@

[![](https://img.shields.io/badge/Kubeflow-v1.5.0--rc.0-informational)](https://github.com/kubeflow/manifests/releases/tag/v1.5.0-rc.0)

-This playbook provides a core kubeflow installation. Specific components are removed (Notebooks, KNative, KFServing) as they are unused in our lab, but can be added back in by modifying the `files/kubeflow/kustomization.yaml`.
+This playbook provides a core _kubeflow_ installation. Specific components are removed (Notebooks, KNative, KFServing) as they are unused in this example, but can be added back in by modifying the `files/kubeflow/kustomization.yaml`.

## Requirements

@@ -31,18 +31,3 @@ ansible-playbook kubeflow.yml --extra-vars "run_option=uninstall"
- Individual configurations can be inspected with kubctl using `kubectl kustomize <path>`
- Individual targets can be directly applied using `kubectl apply -k <path>`
- Quick inspection can be done using port forwarding: `kubectl port-forward svc/istio-ingressgateway -n istio-system --address 0.0.0.0 5901:80`
-
-
-## WIP: Deployment
-- Dex IDP (HAS CHANGES)
-- OIDC AuthService (HAS CHANGE FOR FUTURE DEPRECATION)
-- Pipelines (HAS CHANGE; BUT THIS SEEMS TO BE A BUG OR INCOMPLETE 1.22 SUPPORT)
-NOTE: upstream/base/installs/multi-user/istio-authorization-config.yaml: ml-pipeline security is disabled...should be fixed
-- Central dashboard (Modified for deployment)
-- Admission webhook (HAS CHANGES)
-- Tensorboard controller (HAS CHANGES)
-- MPI Operator (README section should be deleted upstream...)
-- Default User Namespace (ns: `kubeflow-user-example-com`, un: `[email protected]`, pw: `12341234` )
-
-TODO: expose endpoint
-- staging files need to be manually deleted from nuc-03:/tmp
@@ -7,13 +7,13 @@ data:
metrics-collector-sidecar: |-
{
"StdOut": {
-"image": "docker.io/kubeflowkatib/file-metrics-collector:latest"
+"image": "docker.io/kubeflowkatib/file-metrics-collector:v0.14.0"
},
"File": {
-"image": "docker.io/kubeflowkatib/file-metrics-collector:latest"
+"image": "docker.io/kubeflowkatib/file-metrics-collector:v0.14.0"
},
"TensorFlowEvent": {
-"image": "docker.io/kubeflowkatib/tfevent-metrics-collector:v0.13.0",
+"image": "docker.io/kubeflowkatib/tfevent-metrics-collector:v0.14.0",
"resources": {
"limits": {
"memory": "1Gi"
@@ -24,39 +24,39 @@ data:
suggestion: |-
{
"random": {
-"image": "docker.io/kubeflowkatib/suggestion-hyperopt:v0.13.0"
+"image": "docker.io/kubeflowkatib/suggestion-hyperopt:v0.14.0"
},
"tpe": {
-"image": "docker.io/kubeflowkatib/suggestion-hyperopt:v0.13.0"
+"image": "docker.io/kubeflowkatib/suggestion-hyperopt:v0.14.0"
},
"grid": {
-"image": "docker.io/kubeflowkatib/suggestion-chocolate:v0.13.0"
+"image": "docker.io/kubeflowkatib/suggestion-chocolate:v0.14.0"
},
"hyperband": {
-"image": "docker.io/kubeflowkatib/suggestion-hyperband:v0.13.0"
+"image": "docker.io/kubeflowkatib/suggestion-hyperband:v0.14.0"
},
"bayesianoptimization": {
-"image": "docker.io/kubeflowkatib/suggestion-skopt:v0.13.0"
+"image": "docker.io/kubeflowkatib/suggestion-skopt:v0.14.0"
},
"cmaes": {
-"image": "docker.io/kubeflowkatib/suggestion-goptuna:v0.13.0"
+"image": "docker.io/kubeflowkatib/suggestion-goptuna:v0.14.0"
},
"sobol": {
-"image": "docker.io/kubeflowkatib/suggestion-goptuna:v0.13.0"
+"image": "docker.io/kubeflowkatib/suggestion-goptuna:v0.14.0"
},
"multivariate-tpe": {
-"image": "docker.io/kubeflowkatib/suggestion-optuna:v0.13.0"
+"image": "docker.io/kubeflowkatib/suggestion-optuna:v0.14.0"
},
"enas": {
-"image": "docker.io/kubeflowkatib/suggestion-enas:v0.13.0",
+"image": "docker.io/kubeflowkatib/suggestion-enas:v0.14.0",
"resources": {
"limits": {
"memory": "200Mi"
}
}
},
"darts": {
-"image": "docker.io/kubeflowkatib/suggestion-darts:v0.13.0"
+"image": "docker.io/kubeflowkatib/suggestion-darts:v0.14.0"
},
"pbt": {
"image": "docker.io/kubeflowkatib/suggestion-pbt:v0.14.0",
@@ -78,6 +78,6 @@ data:
early-stopping: |-
{
"medianstop": {
-"image": "docker.io/kubeflowkatib/earlystopping-medianstop:v0.13.0"
+"image": "docker.io/kubeflowkatib/earlystopping-medianstop:v0.14.0"
}
}
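
The image bumps in this config could be checked mechanically with something like the sketch below. It is not part of the commit. It assumes each section of the ConfigMap (`metrics-collector-sidecar`, `suggestion`, `early-stopping`) is a JSON object mapping an entry name to a spec with an `image` key, as shown in the hunks above, and it treats `(0, 14)` as the minimum version based on the commit title. The function name `outdated_images` and the `example` entry in the sample are hypothetical.

```python
import json
import re

# Assumed minimum (major, minor), taken from the commit title "minimum supported versions to 0.14".
MIN_VERSION = (0, 14)

# Matches tags of the form v<major>.<minor>.<patch>, e.g. v0.14.0.
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)")


def outdated_images(section_json, min_version=MIN_VERSION):
    """Return {entry name: image} for images not pinned to at least min_version.

    An image is flagged when it has no tag, a non-version tag such as
    `latest`, or a version tag below min_version.
    """
    flagged = {}
    for name, spec in json.loads(section_json).items():
        image = spec["image"]
        # Look for the tag only in the last path component, so a registry
        # port (e.g. localhost:5000/...) is not mistaken for a tag.
        last_component = image.rpartition("/")[2]
        _, sep, tag = last_component.partition(":")
        match = TAG_PATTERN.match(tag) if sep else None
        if match is None or (int(match.group(1)), int(match.group(2))) < min_version:
            flagged[name] = image
    return flagged


# Sample section mixing a bumped image, an old pin, and a floating tag.
suggestion = """
{
  "random": {"image": "docker.io/kubeflowkatib/suggestion-hyperopt:v0.14.0"},
  "grid": {"image": "docker.io/kubeflowkatib/suggestion-chocolate:v0.13.0"},
  "example": {"image": "docker.io/kubeflowkatib/file-metrics-collector:latest"}
}
"""

for name, image in outdated_images(suggestion).items():
    print(f"{name}: {image}")
```

Run against each section's JSON string, a check like this would flag both the `v0.13.0` pins and the floating `latest` tags that this commit replaces.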
4 changes: 0 additions & 4 deletions kubeflow/roles/nfs/README.md
@@ -23,7 +23,3 @@ ansible-playbook nfs_storage_class.yml --extra-vars "run_option=install"
# Cluster should first be initialized using: `ops init <cluster>`
ansible-playbook nfs_storage_class.yml --extra-vars "run_option=uninstall"
```
-
-## WIP: Deployment
-- Test should be run on installation
-- Volumes should be reflected in variables specific to initialized deployment (e.g. staging, production)
