Releases: NVIDIA/ais-k8s
v1.7.0
AIS Operator v1.7.0
- Fixed bug with shutdown that could cause a cluster to be stuck in "Shutting Down" state. Operator no longer makes a separate API call to specifically shut down AIS cluster before scaling down.
- Optimized rebalance condition to patch only when changed
- Removed several unused environment variables from the statefulset spec. Refactored construction of the set of ENV vars to use.
- Minor updates to tests, linting, proxy statefulset update status
- Updated all minor dependencies including AIS
- Deprecated EnablePromExporter option. On all recent AIS releases this is always enabled and the associated environment variable has been removed.
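The "patch only when changed" item above describes a common controller pattern: compare the desired condition with the current one and skip the API write when they match. A minimal sketch of that idea follows. The `Condition` type, the `Patcher` interface, and `setConditionIfChanged` are hypothetical names for illustration and are not taken from the operator source.

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for a status condition on the AIS resource.
type Condition struct {
	Type   string
	Status bool
	Reason string
}

// Patcher is a hypothetical interface for whatever writes status to the API server.
type Patcher interface {
	PatchCondition(c Condition) error
}

// setConditionIfChanged issues a patch only when desired differs from current.
// It reports whether a patch was sent.
func setConditionIfChanged(p Patcher, current, desired Condition) (bool, error) {
	if current == desired {
		return false, nil // nothing changed, so no API call is made
	}
	if err := p.PatchCondition(desired); err != nil {
		return false, err
	}
	return true, nil
}

// countingPatcher is a test double that counts how many patches were sent.
type countingPatcher struct{ calls int }

func (c *countingPatcher) PatchCondition(Condition) error {
	c.calls++
	return nil
}

func main() {
	p := &countingPatcher{}
	cur := Condition{Type: "Rebalance", Status: true, Reason: "Enabled"}

	// Same condition: no patch expected.
	changed, _ := setConditionIfChanged(p, cur, cur)
	fmt.Println("changed:", changed, "patches:", p.calls)

	// Different condition: exactly one patch expected.
	next := Condition{Type: "Rebalance", Status: false, Reason: "Upgrade"}
	changed, _ = setConditionIfChanged(p, cur, next)
	fmt.Println("changed:", changed, "patches:", p.calls)
}
```

Skipping no-op patches of this kind typically reduces API server load and avoids triggering extra reconcile events.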
Helm
- Updated CA duration and renewal option in TLS charts
- Added cloud cert secrets generation chart
- Added config for internal test cluster and internal deployment
- Added pod resource values option
Full Changelog: v1.6.1...v1.7.0
v1.6.1
See https://github.com/NVIDIA/ais-k8s/releases/tag/v1.6.0
AIS Operator v1.6.1
- Added reconciliation of target and proxy container resources spec
Full Changelog: v1.6.0...v1.6.1
v1.6.0
IMPORTANT: Please see compatibility docs for information on deploying clusters with this new version. It requires a new aisinit container >= v3.25 to generate configs for AIS pods.
AIS Operator v1.6.0
- Added support for init container managed configs. See compatibility docs. This will improve compatibility between versions and help with upgrade paths.
- Operator will now reconcile the entire pod spec for aisnode when image changes
- Operator will now reconcile the entire init pod spec when init image changes
- Added resource management options to AIS spec
- Added MY_NODE env var to aisnode container
- Added support for deployments with distributed tracing
Full Changelog: v1.5.0...v1.6.0
v1.5.0
AIS Operator v1.5.0
- Updated to go 1.23 and latest dependencies
- Added support for custom annotations passed from spec to aisnode containers via Annotations spec option
- Added support for custom environment variables passed from spec to aisnode containers via Env spec option
- Fixed a bug where rebalance would not properly disable and re-enable for upgrades if it had been modified manually
- Removed the option for the operator manager to run external to the k8s cluster
- Internal logic refactoring of AIS API and AuthN clients
- Added Sync option to version config
- Changed net.http.UseHttps option to solely control whether aisnode expects to use HTTPS rather than relying on presence of TLS secrets or cert manager issuer
- Improved logging and requeue logic to make it easier to follow deployment progress and debug issues
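To illustrate the net.http.UseHttps change above, here is a small sketch in which the HTTPS decision reads only the explicit flag and ignores whether TLS material happens to be configured. The `TLSSpec` struct, its field names, and `expectsHTTPS` are hypothetical simplifications, not the operator's actual types.

```go
package main

import "fmt"

// TLSSpec is a hypothetical, simplified view of the TLS-related settings.
type TLSSpec struct {
	UseHTTPS      bool   // stands in for the net.http.UseHttps config value
	TLSSecretName string // optional secret holding certificates
	IssuerName    string // optional cert-manager issuer
}

// expectsHTTPS returns the explicit flag only. The presence of a secret or
// an issuer does not influence the result.
func expectsHTTPS(s TLSSpec) bool {
	return s.UseHTTPS
}

func main() {
	// A secret is configured, but the flag is off: HTTPS is not expected.
	withSecretOnly := TLSSpec{TLSSecretName: "tls-certs"}
	fmt.Println(expectsHTTPS(withSecretOnly))

	// The flag is on: HTTPS is expected regardless of how certs are supplied.
	explicit := TLSSpec{UseHTTPS: true, IssuerName: "ca-issuer"}
	fmt.Println(expectsHTTPS(explicit))
}
```

In this sketch, a spec that carries TLS secrets but leaves the flag unset no longer implies HTTPS, so the flag needs to be set explicitly.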
Helm
- Moved operator repository to github pages. The operator will now use a constant repo and update chart versions along with each new version. See https://github.com/NVIDIA/ais-k8s/tree/main/helm#install-charts for instructions.
Full Changelog: v1.4.1...v1.5.0
v1.4.1
AIS Operator v1.4.1
- Fixed an issue where the operator would modify the rebalance config in the provided spec and not restore previous config after upgrades
- Cleaned up logging and handling of DNS resolution on proxy startup
Major release v1.4.0: https://github.com/NVIDIA/ais-k8s/releases/tag/v1.4.0
Full Changelog: v1.4.0...v1.4.1
v1.4.0
AIS Operator v1.4.0
- Improved state management to reconcile based on state rather than using blocking waits
- Disabled rebalance at the AIS level before cluster modifications -- scaling, rolling upgrades, cluster re-creation
- Added a watch on AIS spec configToUpdate for changes and keep those in sync with the cluster
- Added ability to reconcile statefulset status
- Updated default AIS config generation and improved compatibility through version changes
- Added new AIS states for the following:
  - Scaling
  - HostCleanup
  - Finalized
- Bug fixes
  - Fixed deep equal comparison with spec
  - Fixed cleanup jobs with proper status and termination
  - Improved wait behavior when waiting for AIS cluster readiness or decommissioning
- QOL improvements -- Cleaned up logging, Added unit testing
API Changes
- New options
  - cleanupMetadata -- Allows for cluster decommission while preserving cluster metadata for future deployments
  - tlsCertManagerIssuerName -- Specifies a cert-manager CSI issuer
Full Changelog: v1.3.0...v1.4.0
v1.3.0
AIS Operator v1.3.0
- Added sidecar container for accessing stdout logs via k8s
- Test improvements including unit tests for controller
- Improved state management including new states for in-progress shutdown, in-progress decommission, and cleanup. See ClusterCondition list in aistore_types.go
- Improved state logging and event recording
- Removed unused "env-mount" volume mount
- Added AuthN support
API changes
- New cleanupMetadata option. Previous behavior matches cleanupMetadata=true. This option can now be disabled to allow preservation of cluster metadata (such as buckets) when decommissioning and transitioning to an entirely new cluster (new AIS custom resource).
- New authNSecretName option to add secret signing key for JWT tokens in AIStore.
Full Changelog: v1.2.0...v1.3.0
v1.2.0
AIS Operator v1.2.0
Operator:
- Breaking Change
  - Deployments with Operator versions >= 1.2.0 must specify an ais-init image >= 1.2.0
- Changes
  - Added stateStorageClass field to AIS spec for dynamic state storage
  - Handle destroying statefulsets in unready state
  - Wait for cleanup job success before continuing decommission
  - Added internal shutdown status
  - Fixed duration type in AIS config
  - Added ais-init docker build (moved from aistore repo)
  - Move bash script logic into the init image
  - Use proper HTTP probes for liveness/readiness
- Deprecated
  - Deprecated hostPathPrefix. See docs/state_storage.md
Full Changelog: v1.1.1...v1.2.0
v1.1.1
AIS Operator v1.1.1
Highlights:
- General Improvements:
  - Updated AIStore version to v3.23 in Helm chart, operator tests, deployment roles, and config samples.
  - Enhanced security and execution efficiency by refining the use of 'become: true' in Ansible playbooks, restricting elevated privileges to necessary tasks only.
  - Transitioned the default branch name from 'master' to 'main'.
- Monitoring Enhancements:
  - Improved Grafana dashboard visuals and organization, enhancing panel visibility and highlighting unavailable numbers.
  - Updated AlertManager timings and Slack titles to better distinguish between alert statuses.
  - Fixed and optimized Grafana dashboard metrics, including throughput calculation and error graph adjustments.
  - Added more alerts for various AIS node states, including restart and maintenance mode alerts.
- Operator Enhancements:
  - Fixed Backend field marshaling in the operator.
  - Made .spec.size optional, simplifying operator configuration.
  - Simplified the waitForDNSEntry method.
  - Explicitly disallowed multiple proxies on a single node for better stability.
  - Bumped AIStore dependency and default version to v1.1.0.
- Documentation and Miscellaneous:
  - Added a compatibility matrix for AIStore and ais-operator.
  - Updated generated files and lint configurations.
Full Changelog: v1.1.0...v1.1.1
v1.1.0
AIS-Operator v1.1.0 Release Notes
Operator Enhancements:
- New logsDir field to mount logs.
- New cleanup jobs after decommissioning.
- Automatic cluster decommission upon deletion.
- Added mountLabel field to CRD; support for backward compatibility.
- Enhanced DNS checks for proxies before resolving targets.
- Improved flows for startup, restart, shutdown, and decommission.
- Added shutdownCluster field to CRD spec.
- Added hostNetwork parameter to target specifications in CRD.
- General fixes and updates.
Documentation Updates:
- Guidelines for deploying multiple targets per Kubernetes node.
- General documentation fixes and updates.
Playbooks:
- Updated to accommodate new operator field enhancements.
Additional Updates:
- Experimental Helm chart for deploying AIS.
- New ais-operator-helper Docker image for post-decommission cleanup jobs.
- Various test fixes and improvements.