Releases: NVIDIA/ais-k8s
v1.6.1
See the major release notes for v1.6.0: https://github.com/NVIDIA/ais-k8s/releases/tag/v1.6.0
AIS Operator v1.6.1
- Added reconciliation of the resources spec for target and proxy containers
Full Changelog: v1.6.0...v1.6.1
v1.6.0
IMPORTANT: Please see the compatibility docs for information on deploying clusters with this new version. It requires a new aisinit container (>= v3.25) to generate configs for AIS pods.
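As an illustration of this requirement, a deployment script could check the init image tag before applying a spec. The following is a minimal Go sketch, not part of the operator: the helper names are hypothetical, and it assumes the caller passes only the tag portion of the image reference (for example v3.25 or v3.25.1) in a purely numeric vMAJOR.MINOR[.PATCH] format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Minimum init image version stated in the v1.6.0 release notes (>= v3.25).
const minInitMajor, minInitMinor = 3, 25

// parseMajorMinor extracts MAJOR and MINOR from a tag such as "v3.25" or
// "v3.25.1". Hypothetical helper: assumes purely numeric, dot-separated tags.
func parseMajorMinor(tag string) (int, int, error) {
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected tag format %q", tag)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", tag, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", tag, err)
	}
	return major, minor, nil
}

// initImageSupported reports whether an init image tag meets the minimum.
func initImageSupported(tag string) (bool, error) {
	major, minor, err := parseMajorMinor(tag)
	if err != nil {
		return false, err
	}
	if major != minInitMajor {
		return major > minInitMajor, nil
	}
	return minor >= minInitMinor, nil
}

func main() {
	for _, tag := range []string{"v3.24", "v3.25", "v3.25.1", "v4.0", "latest"} {
		ok, err := initImageSupported(tag)
		fmt.Printf("%-8s supported=%v err=%v\n", tag, ok, err)
	}
}
```

Non-numeric tags such as "latest" are reported as errors rather than guessed at, since their version cannot be determined from the tag alone.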
AIS Operator v1.6.0
- Added support for configs managed by the init container (see the compatibility docs). This improves compatibility between versions and helps with upgrade paths.
- The operator now reconciles the entire aisnode pod spec when the image changes
- The operator now reconciles the entire init pod spec when the init image changes
- Added resource management options to the AIS spec
- Added the MY_NODE environment variable to the aisnode container
- Added support for deployments with distributed tracing
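The image-triggered reconciliation above can be pictured with a reduced model. This Go sketch is a hypothetical illustration, not the operator's code: podTemplate stands in for the real Kubernetes pod template, and only the image-change path from v1.6.0 is modeled (other reconciliation, such as the resources spec added in v1.6.1, is left out).

```go
package main

import "fmt"

// podTemplate is a hypothetical, heavily reduced stand-in for the aisnode pod
// template; the real operator works with Kubernetes pod template objects.
type podTemplate struct {
	Image string
	Env   []string
}

// reconcileOnImageChange models the v1.6.0 rule: when the desired image differs
// from the running one, adopt the entire desired template (not only the image
// field), so other spec changes shipped with the upgrade are applied as well.
// It returns the template to use and whether an update is needed.
func reconcileOnImageChange(current, desired podTemplate) (podTemplate, bool) {
	if current.Image != desired.Image {
		return desired, true
	}
	return current, false
}

func main() {
	current := podTemplate{Image: "aisnode:v3.24", Env: []string{"A=1"}}
	desired := podTemplate{Image: "aisnode:v3.25", Env: []string{"A=1", "B=2"}}
	next, changed := reconcileOnImageChange(current, desired)
	fmt.Println(changed, next.Image, next.Env)
}
```

The image names and environment entries here are placeholders; the point is that an image change carries the whole template along with it.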
Full Changelog: v1.5.0...v1.6.0
v1.5.0
AIS Operator v1.5.0
- Updated to Go 1.23 and latest dependencies
- Added support for custom annotations passed from the spec to aisnode containers via the Annotations spec option
- Added support for custom environment variables passed from the spec to aisnode containers via the Env spec option
- Fixed a bug where rebalance would not be properly disabled and re-enabled for upgrades if it had been modified manually
- Removed the option for the operator manager to run external to the k8s cluster
- Internal logic refactoring of AIS API and AuthN clients
- Added the Sync option to the version config
- Changed the net.http.UseHttps option to solely control whether aisnode expects to use HTTPS, rather than relying on the presence of TLS secrets or a cert-manager issuer
- Improved logging and requeue logic to make it easier to follow deployment progress and debug issues
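To make the net.http.UseHttps change concrete, the Go sketch below contrasts the two behaviors. It is a simplified model, not the operator's code: tlsInputs and both functions are hypothetical, and the pre-v1.5.0 rule is approximated as "HTTPS if UseHttps is set or any TLS input is present".

```go
package main

import "fmt"

// tlsInputs is a hypothetical summary of TLS-related settings in an AIS spec.
type tlsInputs struct {
	TLSSecretName string // name of a provided TLS secret, if any
	IssuerName    string // name of a cert-manager issuer, if any
}

// schemeBeforeV150 approximates the earlier behavior (an assumption): the
// presence of TLS inputs could switch HTTPS on even when UseHttps was false.
func schemeBeforeV150(useHTTPS bool, tls tlsInputs) string {
	if useHTTPS || tls.TLSSecretName != "" || tls.IssuerName != "" {
		return "https"
	}
	return "http"
}

// schemeV150 models v1.5.0 and later: net.http.UseHttps alone decides, and
// TLS inputs are ignored for this decision.
func schemeV150(useHTTPS bool, _ tlsInputs) string {
	if useHTTPS {
		return "https"
	}
	return "http"
}

func main() {
	tls := tlsInputs{TLSSecretName: "example-tls-secret"}
	fmt.Println("before v1.5.0:", schemeBeforeV150(false, tls))
	fmt.Println("v1.5.0, UseHttps=false:", schemeV150(false, tls))
	fmt.Println("v1.5.0, UseHttps=true:", schemeV150(true, tls))
}
```

With a TLS secret present but UseHttps unset, the two models disagree, which is exactly the case the v1.5.0 change makes explicit.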
Helm
- Moved the operator repository to GitHub Pages. The operator now uses a constant repo, with chart versions updated along with each new release. See https://github.com/NVIDIA/ais-k8s/tree/main/helm#install-charts for instructions.
Full Changelog: v1.4.1...v1.5.0
v1.4.1
AIS Operator v1.4.1
- Fixed an issue where the operator would modify the rebalance config in the provided spec and not restore previous config after upgrades
- Cleaned up logging and handling of DNS resolution on proxy startup
Major release v1.4.0: https://github.com/NVIDIA/ais-k8s/releases/tag/v1.4.0
Full Changelog: v1.4.0...v1.4.1
v1.4.0
AIS Operator v1.4.0
- Improved state management to reconcile based on state rather than using blocking waits
- Disabled rebalance at the AIS level before cluster modifications -- scaling, rolling upgrades, cluster re-creation
- Added a watch on the AIS spec's configToUpdate for changes, keeping those changes in sync with the cluster
- Added ability to reconcile statefulset status
- Updated default AIS config generation and improved compatibility through version changes
- Added new AIS states for the following:
  - Scaling
  - HostCleanup
  - Finalized
- Bug fixes
  - Fixed deep equal comparison with spec
  - Fixed cleanup jobs with proper status and termination
  - Improved wait behavior when waiting for AIS cluster readiness or decommissioning
- QOL improvements: cleaned up logging, added unit testing
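The shift from blocking waits to state-based reconciliation can be sketched as a function that inspects the cluster once, returns the next state, and asks to be requeued instead of looping. In this Go sketch, Scaling, HostCleanup, and Finalized come from the v1.4.0 state list; the Ready state, the observation fields, the 5-second delay, and all transitions are assumptions for illustration, not the operator's actual state machine.

```go
package main

import (
	"fmt"
	"time"
)

// clusterState: Scaling, HostCleanup, and Finalized are named in the v1.4.0
// notes; Ready is an assumption added for this sketch.
type clusterState string

const (
	stateReady       clusterState = "Ready"
	stateScaling     clusterState = "Scaling"
	stateHostCleanup clusterState = "HostCleanup"
	stateFinalized   clusterState = "Finalized"
)

// observation is a hypothetical snapshot gathered at the start of one pass.
type observation struct {
	ReadyPods, DesiredPods int
	Deleting, CleanupDone  bool
}

// result carries the next state and how long to wait before the next pass,
// similar in spirit to a controller requeue (no controller library is used).
type result struct {
	Next         clusterState
	RequeueAfter time.Duration
}

// reconcileOnce performs a single non-blocking step: instead of looping until
// pods are ready or cleanup finishes, it records the state and asks to be
// called again later.
func reconcileOnce(obs observation) result {
	switch {
	case obs.Deleting && !obs.CleanupDone:
		return result{stateHostCleanup, 5 * time.Second}
	case obs.Deleting:
		return result{stateFinalized, 0}
	case obs.ReadyPods != obs.DesiredPods:
		return result{stateScaling, 5 * time.Second}
	default:
		return result{stateReady, 0}
	}
}

func main() {
	passes := []observation{
		{ReadyPods: 1, DesiredPods: 3},
		{ReadyPods: 3, DesiredPods: 3},
		{Deleting: true},
		{Deleting: true, CleanupDone: true},
	}
	for _, obs := range passes {
		r := reconcileOnce(obs)
		fmt.Printf("%-11s requeueAfter=%v\n", r.Next, r.RequeueAfter)
	}
}
```

Each pass returns quickly, so the controller stays responsive to other events while a slow operation such as scaling or host cleanup is still in progress.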
API Changes
- New options
  - cleanupMetadata -- Allows for cluster decommission while preserving cluster metadata for future deployments
  - tlsCertManagerIssuerName -- Specifies a cert-manager CSI issuer
Full Changelog: v1.3.0...v1.4.0
v1.3.0
AIS Operator v1.3.0
- Added sidecar container for accessing stdout logs via k8s
- Test improvements including unit tests for controller
- Improved state management including new states for in-progress shutdown, in-progress decommission, and cleanup. See ClusterCondition list in aistore_types.go
- Improved state logging and event recording
- Removed unused "env-mount" volume mount
- Added AuthN support
API changes
- New cleanupMetadata option. Previous behavior matches cleanupMetadata=true. This option can now be disabled to allow preservation of cluster metadata (such as buckets) when decommissioning and transitioning to an entirely new cluster (new AIS custom resource).
- New authNSecretName option to provide the secret signing key for JWT tokens in AIStore.
Full Changelog: v1.2.0...v1.3.0
v1.2.0
AIS Operator v1.2.0
Operator:
- Breaking Change
  - Deployments with Operator versions >= 1.2.0 must specify an ais-init image >= 1.2.0
- Changes
  - Added stateStorageClass field to AIS spec for dynamic state storage
  - Handled destroying statefulsets in an unready state
  - Decommission now waits for cleanup job success before continuing
  - Added internal shutdown status
  - Fixed duration type in AIS config
  - Added ais-init docker build (moved from the aistore repo)
  - Moved bash script logic into the init image
  - Switched to proper HTTP probes for liveness/readiness
- Deprecated
  - Deprecated hostPathPrefix. See docs/state_storage.md
Full Changelog: v1.1.1...v1.2.0
v1.1.1
AIS Operator v1.1.1
Highlights:
General Improvements:
- Updated AIStore version to v3.23 in Helm chart, operator tests, deployment roles, and config samples.
- Enhanced security and execution efficiency by refining the use of 'become: true' in Ansible playbooks, restricting elevated privileges to necessary tasks only.
- Transitioned the default branch name from 'master' to 'main'.
Monitoring Enhancements:
- Improved Grafana dashboard visuals and organization, enhancing panel visibility and highlighting unavailable numbers.
- Updated AlertManager timings and Slack titles to better distinguish between alert statuses.
- Fixed and optimized Grafana dashboard metrics, including throughput calculation and error graph adjustments.
- Added more alerts for various AIS node states, including restart and maintenance mode alerts.
Operator Enhancements:
- Fixed Backend field marshaling in the operator.
- Made .spec.size optional, simplifying operator configuration.
- Simplified the waitForDNSEntry method.
- Explicitly disallowed multiple proxies on a single node for better stability.
- Bumped AIStore dependency and default version to v1.1.0.
Documentation and Miscellaneous:
- Added a compatibility matrix for AIStore and ais-operator.
- Updated generated files and lint configurations.
Full Changelog: v1.1.0...v1.1.1
v1.1.0
AIS-Operator v1.1.0 Release Notes
Operator Enhancements:
- New logsDir field to mount logs.
- New cleanup jobs after decommissioning.
- Automatic cluster decommission upon deletion.
- Added mountLabel field to CRD, with support for backward compatibility.
- Enhanced DNS checks for proxies before resolving targets.
- Improved flows for startup, restart, shutdown, and decommission.
- Added shutdownCluster field to CRD spec.
- Added hostNetwork parameter to target specifications in CRD.
- General fixes and updates.
Documentation Updates:
- Guidelines for deploying multiple targets per Kubernetes node.
- General documentation fixes and updates.
Playbooks:
- Updated to accommodate new operator field enhancements.
Additional Updates:
- Experimental Helm chart for deploying AIS.
- New ais-operator-helper Docker image for post-decommission cleanup jobs.
- Various test fixes and improvements.
v1.0.0
AIS Operator v1.0.0
New Features
- Support for different sizes of proxy/target stateful sets.
- Enabled TLS certificate verification.
- Operator client updated for TLS support.
- Added multi-home support.
- Helm chart creation added to Makefile bundle-manifests.
Fixes
- Webhook fix for proxy/target spec size adjustments.
Updates
- Upgraded to AISNode image v3.22.
- Updated operator dependencies for better performance and security.