Add WG Batch with charter
alculquicondor committed Feb 9, 2022
1 parent 8795466 commit 0cf4239
Showing 10 changed files with 201 additions and 0 deletions.
6 changes: 6 additions & 0 deletions OWNERS_ALIASES
@@ -111,6 +111,12 @@ aliases:
  wg-api-expression-leads:
    - apelisse
    - kwiesmueller
  wg-batch-leads:
    - Huang-Wei
    - ahg-g
    - endocrimes
    - soltysh
    - swatisehgal
  wg-data-protection-leads:
    - xing-yang
    - yuxiangqian
1 change: 1 addition & 0 deletions liaisons.md
@@ -56,6 +56,7 @@ members will assume one of the departing members groups.
| [SIG Usability](sig-usability/README.md) | Davanum Srinivas (**[@dims](https://github.com/dims)**) |
| [SIG Windows](sig-windows/README.md) | Jordan Liggitt (**[@liggitt](https://github.com/liggitt)**) |
| [WG API Expression](wg-api-expression/README.md) | Jordan Liggitt (**[@liggitt](https://github.com/liggitt)**) |
| [WG Batch](wg-batch/README.md) | Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**) |
| [WG Data Protection](wg-data-protection/README.md) | Christoph Blecker (**[@cblecker](https://github.com/cblecker)**) |
| [WG IoT Edge](wg-iot-edge/README.md) | Christoph Blecker (**[@cblecker](https://github.com/cblecker)**) |
| [WG Multitenancy](wg-multitenancy/README.md) | Jordan Liggitt (**[@liggitt](https://github.com/liggitt)**) |
1 change: 1 addition & 0 deletions sig-apps/README.md
@@ -49,6 +49,7 @@ The Chairs of the SIG run operations and processes governing the SIG.
## Working Groups

The following [working groups][working-group-definition] are sponsored by sig-apps:
* [WG Batch](/wg-batch)
* [WG Data Protection](/wg-data-protection)


6 changes: 6 additions & 0 deletions sig-autoscaling/README.md
@@ -39,6 +39,12 @@ The Chairs of the SIG run operations and processes governing the SIG.
- [@kubernetes/sig-autoscaling-test-failures](https://github.com/orgs/kubernetes/teams/sig-autoscaling-test-failures) - Test Failures and Triage
- Steering Committee Liaison: Tim Pepper (**[@tpepper](https://github.com/tpepper)**)

## Working Groups

The following [working groups][working-group-definition] are sponsored by sig-autoscaling:
* [WG Batch](/wg-batch)


## Subprojects

The following [subprojects][subproject-definition] are owned by sig-autoscaling:
1 change: 1 addition & 0 deletions sig-list.md
@@ -63,6 +63,7 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
| Name | Label | Stakeholder SIGs |Organizers | Contact | Meetings |
|------|-------|------------------|-----------|---------|----------|
|[API Expression](wg-api-expression/README.md)|[api-expression](https://github.com/kubernetes/kubernetes/labels/wg%2Fapi-expression)|* API Machinery<br>* Architecture<br>|* [Antoine Pelisse](https://github.com/apelisse), Google<br>* [Kevin Wiesmueller](https://github.com/kwiesmueller), //SEIBERT/MEDIA GmbH<br>|* [Slack](https://kubernetes.slack.com/messages/wg-api-expression)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-api-expression)|* Regular WG Meeting: [Tuesdays at 9:30 PT (Pacific Time) (biweekly)](https://zoom.us/j/94238112084)<br>
|[Batch](wg-batch/README.md)|[batch](https://github.com/kubernetes/kubernetes/labels/wg%2Fbatch)|* Apps<br>* Autoscaling<br>* Node<br>* Scheduling<br>|* [Wei Huang](https://github.com/Huang-Wei), Apple<br>* [Abdullah Gharaibeh](https://github.com/ahg-g), Google<br>* [Danielle Lancashire](https://github.com/endocrimes), VMware<br>* [Maciej Szulik](https://github.com/soltysh), Red Hat<br>* [Swati Sehgal](https://github.com/swatisehgal), Intel<br>|* [Slack](https://kubernetes.slack.com/messages/wg-batch)<br>* [Mailing List](TBD)|* Regular Meeting: [TBDs at TBD UTC (biweekly)](TBD)<br>
|[Data Protection](wg-data-protection/README.md)|[data-protection](https://github.com/kubernetes/kubernetes/labels/wg%2Fdata-protection)|* Apps<br>* Storage<br>|* [Xing Yang](https://github.com/xing-yang), VMware<br>* [Xiangqian Yu](https://github.com/yuxiangqian), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-data-protection)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-data-protection)|* Regular WG Meeting: [Wednesdays at 9:00 PT (Pacific Time) (bi-weekly)](https://zoom.us/j/6933410772)<br>
|[IoT Edge](wg-iot-edge/README.md)|[iot-edge](https://github.com/kubernetes/kubernetes/labels/wg%2Fiot-edge)|* Multicluster<br>* Network<br>|* [Steve Wong](https://github.com/cantbewong), VMware<br>* [Cindy Xing](https://github.com/cindyxing), Microsoft<br>* [Dejan Bosanac](https://github.com/dejanb), Red Hat<br>|* [Slack](https://kubernetes.slack.com/messages/wg-iot-edge)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-iot-edge)|* APAC WG Meeting: [Wednesdays at 5:00 UTC (every four weeks)](https://zoom.us/j/91251176046?pwd=cmdqclovM3R3eDB1VlpuL1ZGU1hnZz09)<br>* Regular WG Meeting (Pacific Time): [Wednesdays at 09:00 PT (every four weeks)](https://zoom.us/j/92778512626?pwd=MXhlemwvYnhkQmkxeXllQ0Z5VGs4Zz09)<br>
|[Multitenancy](wg-multitenancy/README.md)|[multitenancy](https://github.com/kubernetes/kubernetes/labels/wg%2Fmultitenancy)|* API Machinery<br>* Auth<br>* Network<br>* Node<br>* Scheduling<br>* Storage<br>|* [Sanjeev Rampal](https://github.com/srampal), Cisco<br>* [Tasha Drew](https://github.com/tashimi), VMware<br>|* [Slack](https://kubernetes.slack.com/messages/wg-multitenancy)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-multitenancy)|* Regular WG Meeting: [Tuesdays at 11:00 PT (Pacific Time) (biweekly)](https://zoom.us/my/k8s.sig.auth)<br>
1 change: 1 addition & 0 deletions sig-node/README.md
@@ -44,6 +44,7 @@ The Chairs of the SIG run operations and processes governing the SIG.
## Working Groups

The following [working groups][working-group-definition] are sponsored by sig-node:
* [WG Batch](/wg-batch)
* [WG Multitenancy](/wg-multitenancy)
* [WG Policy](/wg-policy)
* [WG Structured Logging](/wg-structured-logging)
1 change: 1 addition & 0 deletions sig-scheduling/README.md
@@ -56,6 +56,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
## Working Groups

The following [working groups][working-group-definition] are sponsored by sig-scheduling:
* [WG Batch](/wg-batch)
* [WG Multitenancy](/wg-multitenancy)
* [WG Policy](/wg-policy)
* [WG Structured Logging](/wg-structured-logging)
46 changes: 46 additions & 0 deletions sigs.yaml
@@ -2873,6 +2873,52 @@ workinggroups:
      liaison:
        github: liggitt
        name: Jordan Liggitt
  - dir: wg-batch
    name: Batch
    mission_statement: >
      Discuss and enhance the support for Batch (eg. HPC, AI/ML, data analytics, CI)
      workloads in core Kubernetes. We want to unify the way users deploy batch workloads
      to improve portability and to simplify supportability for Kubernetes providers.
    charter_link: charter.md
    stakeholder_sigs:
      - Apps
      - Autoscaling
      - Node
      - Scheduling
    label: batch
    leadership:
      chairs:
        - github: Huang-Wei
          name: Wei Huang
          company: Apple
        - github: ahg-g
          name: Abdullah Gharaibeh
          company: Google
        - github: endocrimes
          name: Danielle Lancashire
          company: VMware
        - github: soltysh
          name: Maciej Szulik
          company: Red Hat
        - github: swatisehgal
          name: Swati Sehgal
          company: Intel
    meetings:
      - description: Regular Meeting
        day: TBD
        time: TBD
        tz: UTC
        frequency: biweekly
        url: TBD
        archive_url: TBD
        recordings_url: TBD
    contact:
      slack: wg-batch
      mailing_list: TBD
      liaison:
        github: mrbobbytables
        name: Bob Killen
  - dir: wg-data-protection
    name: Data Protection
    mission_statement: >
42 changes: 42 additions & 0 deletions wg-batch/README.md
@@ -0,0 +1,42 @@
<!---
This is an autogenerated file!
Please do not edit this file directly, but instead make changes to the
sigs.yaml file in the project root.
To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
--->
# Batch Working Group

Discuss and enhance the support for Batch (eg. HPC, AI/ML, data analytics, CI) workloads in core Kubernetes. We want to unify the way users deploy batch workloads to improve portability and to simplify supportability for Kubernetes providers.

The [charter](charter.md) defines the scope and governance of the Batch Working Group.

## Stakeholder SIGs
* [SIG Apps](/sig-apps)
* [SIG Autoscaling](/sig-autoscaling)
* [SIG Node](/sig-node)
* [SIG Scheduling](/sig-scheduling)

## Meetings
*Joining the [mailing list](TBD) for the group will typically add invites for the following meetings to your calendar.*
* Regular Meeting: [TBDs at TBD UTC](TBD) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=TBD&tz=UTC).
* [Meeting notes and Agenda](TBD).
* [Meeting recordings](TBD).

## Organizers

* Wei Huang (**[@Huang-Wei](https://github.com/Huang-Wei)**), Apple
* Abdullah Gharaibeh (**[@ahg-g](https://github.com/ahg-g)**), Google
* Danielle Lancashire (**[@endocrimes](https://github.com/endocrimes)**), VMware
* Maciej Szulik (**[@soltysh](https://github.com/soltysh)**), Red Hat
* Swati Sehgal (**[@swatisehgal](https://github.com/swatisehgal)**), Intel

## Contact
- Slack: [#wg-batch](https://kubernetes.slack.com/messages/wg-batch)
- [Mailing list](TBD)
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fbatch)
- Steering Committee Liaison: Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**)
<!-- BEGIN CUSTOM CONTENT -->

<!-- END CUSTOM CONTENT -->
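
The README above is generated from the `wg-batch` entry in `sigs.yaml`. As a rough illustration only, the mapping from entry fields to the README's list items could be sketched in Python as below. This is not the code under `generator/`: the `wg_batch` dictionary is a hand-copied subset of the entry, the function names are hypothetical, and the `sig-<name>` directory rule is an assumption inferred from the links shown in this commit.

```python
# Hypothetical sketch, NOT the real kubernetes/community generator.
# Hand-copied subset of the wg-batch entry added to sigs.yaml in this commit.
wg_batch = {
    "dir": "wg-batch",
    "name": "Batch",
    "stakeholder_sigs": ["Apps", "Autoscaling", "Node", "Scheduling"],
    "chairs": [
        {"github": "Huang-Wei", "name": "Wei Huang", "company": "Apple"},
        {"github": "ahg-g", "name": "Abdullah Gharaibeh", "company": "Google"},
    ],
}


def sig_dir(sig_name):
    """Assumed naming rule: 'Apps' -> 'sig-apps', 'API Machinery' -> 'sig-api-machinery'."""
    return "sig-" + sig_name.lower().replace(" ", "-")


def render_stakeholders(entry):
    """Build the bullet lines of the 'Stakeholder SIGs' section."""
    return ["* [SIG {0}](/{1})".format(name, sig_dir(name))
            for name in entry["stakeholder_sigs"]]


def render_organizers(entry):
    """Build the bullet lines of the 'Organizers' section."""
    return [
        "* {name} (**[@{github}](https://github.com/{github})**), {company}".format(**chair)
        for chair in entry["chairs"]
    ]


if __name__ == "__main__":
    print("\n".join(render_stakeholders(wg_batch)))
    print("\n".join(render_organizers(wg_batch)))
```

Keeping `sigs.yaml` as the single source and deriving the Markdown from it is why the README header asks contributors not to edit the file directly.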
96 changes: 96 additions & 0 deletions wg-batch/charter.md
@@ -0,0 +1,96 @@
# WG Batch Charter

This charter adheres to the conventions described in the [Kubernetes Charter README] and uses
the Roles and Organization Management outlined in [wg-governance].

[Kubernetes Charter README]: /committee-steering/governance/README.md

## Scope

Discuss and enhance the support for Batch (eg. HPC, AI/ML, data analytics, CI)
workloads in core Kubernetes. We want to unify the way users deploy batch
workloads to improve portability and to simplify supportability for Kubernetes
providers.

### In scope

- To reduce fragmentation in the k8s batch ecosystem: congregate leads and users from
  different external and internal projects and user groups (CNCF TAGs, k8s sub-projects
  focused on batch-related features such as topology-aware scheduling) in the batch ecosystem to
  gather requirements, validate designs and encourage reutilization of core kubernetes APIs.
- The following recommendations for enhancements:
  - Additions to the batch API group, currently including Job and CronJob resources
    that benefit batch use cases such as HPC, AI/ML, data analytics and CI.
  - Primitives for job-level queueing, not limited to the k8s Job resource. Long-term,
    this could include multi-cluster support.
  - Primitives to control and maximize utilization of resources in fixed-size clusters
    (on-prem) and elastic clusters (cloud).
  - Runtime and scheduling support for specialized hardware (GPUs, NUMA, RDMA, etc.)

### Out of scope

- Addition of new API kinds that serve a specialized type of workload. The focus
should be on general APIs that specialized controllers can build on top of.
- Uses of the batch APIs as support for serving workloads (eg. backups,
upgrades, migrations). These can be served by existing SIGs.
- Proposals that duplicate the functionality of core kubernetes components
(job-controller, kube-scheduler, cluster-autoscaler).
- Job workflows or pipelines. Mature third party frameworks serve these
use cases with the current kubernetes primitives. But additional primitives
to support these frameworks could be in scope.

## Stakeholders

Stakeholders in this working group span multiple SIGs that own parts of the
code in core kubernetes components and addons.

- Apps
- Autoscaling
- Node
- Scheduling

## Deliverables

The list of deliverables includes the following high-level features:

- To SIG Apps:
  - Updated Job API that fulfills the needs of a wider range of batch applications.
  - A performant job controller that can scale to thousands of pods per minute.
- To SIG Scheduling and Autoscaling:
  - A set of APIs to support job queueing, a framework to support different
    queueing policies and a ready-to-use implementation as a subproject.
  - Scheduling plugin(s) to support different batch needs.
- To SIG Autoscaling:
  - Capabilities for job-level provisioning.
- To SIG Node:
  - Runtime support for specialized hardware.

## Roles and Organization Management

This wg adheres to the Roles and Organization Management outlined in [wg-governance]
and opts in to updates and modifications to [wg-governance].

[wg-governance]: /committee-steering/governance/wg-governance.md

Additionally, the wg commits to:

- maintain a solid communication line between the Kubernetes groups and the wider CNCF community;
- submit a proposal to the KubeCon/CloudNativeCon maintainers track; if not selected, a video update will be recorded and listed below.

## Timelines and Disbanding

As a first mandate, the wg will define a roadmap in the first quarter
of operation. We envision three timelines for the exit criteria. The focus will
be on early exit; the decision on whether to go beyond
that is left until we reach that milestone.

1. Early exit: define "recommendations" for the deliverables mentioned above; those
   recommendations would be left to the respective sigs to implement. The WG could
   start implementing those recommendations in the context of the owning sig to generate
   some momentum.
2. Milestone 2, Late exit: The WG continues the implementation of the recommendations until they reach GA,
   and then disbands.
3. Convert to SIG: The WG observes a constant influx of requirements for the artifacts and there
   is the risk that the SIGs don't have enough capacity to maintain them.
   Then, the wg will propose the graduation into a SIG, taking ownership of the
   APIs, controllers and scheduling plugins.
