
Releases: kubernetes-sigs/jobset

v0.7.3

24 Jan 18:48
v0.7.3
152f624

What's Changed

  • [Release 0.7]: cherry-pick: allow installing JobSet in a different namespace by @kannon92 in #751
  • [release-0.7]: cherry-pick turn off internal cert management via config by @kannon92 in #757

Full Changelog: v0.7.2...v0.7.3

Release v0.7.2

10 Dec 21:36
v0.7.2
9cb030b

What's Changed

  • Update docs for v0.7.0 (release branch) by @danielvegamyhre in #691
  • Automated cherry pick of #705: Propagate schedulingGates set on PodTemplate when resuming by @mimowo in #706

Full Changelog: v0.7.0...v0.7.2

Release v0.7.1

18 Nov 23:35
v0.7.1

What's Changed

  • Update docs for v0.7.0 (release branch) by @danielvegamyhre in #691
  • Automated cherry pick of #705: Propagate schedulingGates set on PodTemplate when resuming by @mimowo in #706

Full Changelog: v0.7.0...v0.7.1

v0.7.0

26 Oct 18:20
f76f2a7

Highlights

What's Changed

  • fix: delete active jobs right away when job finishes even when TTLSecondsAfterFinished is set by @CecileRobertMichon in #667
  • Bump github.com/onsi/ginkgo/v2 from 2.20.0 to 2.20.1 by @dependabot in #663
  • Bump github.com/prometheus/client_golang from 1.20.0 to 1.20.2 by @dependabot in #664
  • Bump kubernetes dependencies to v0.31.x. by @mbobrovskyi in #670
  • Bump github.com/onsi/ginkgo/v2 from 2.20.1 to 2.20.2 by @dependabot in #668
  • Bump github.com/onsi/gomega from 1.34.1 to 1.34.2 by @dependabot in #669
  • chore: update README.md e2e test version for v1.31.0 by @googs1025 in #671
  • Add test-python-sdk on Makefile test. by @mbobrovskyi in #673
  • Bump github.com/prometheus/client_golang from 1.20.2 to 1.20.3 by @dependabot in #674
  • feat: add component config by @rainfd in #609
  • Bump the kubernetes group with 6 updates by @dependabot in #675
  • Add global-job-replicas label/annotation by @GiuseppeTT in #677
  • Add examples for three existing failure policy actions. by @jedwins1998 in #601
  • Bump github.com/prometheus/client_golang from 1.20.3 to 1.20.4 by @dependabot in #679
  • chore: use symbolic link instead of directory by @googs1025 in #630
  • Priority-based exclusive placement by @ahg-g in #687
  • Bump github.com/prometheus/client_golang from 1.20.4 to 1.20.5 by @dependabot in #688
  • Add restart strategy by @nstogner in #686

New Contributors

Full Changelog: v0.7.0-devel...v0.7.0

v0.6.0

20 Aug 16:20
d66f1d5

Highlights

  • New JobSet Failure Policy API: allows users to configure different behavior for different types of errors, enabling them to use compute resources more efficiently and improve ML training goodput.
  • Add the Coordinator field to the JobSet spec, enabling users to define a global coordinator pod for distributed ML/HPC workloads. The stable network endpoint for this pod is added as a label and annotation to every Job and Pod in the JobSet for easy use in application code. A common use case is TPU Multislice training with multiple different Job templates. See the linked issue for details.
  • Add a global Job index label/annotation to every Job and Pod, which is needed to support TPU Multislice training with multiple different Job templates. See the linked issue for details.
  • Added new metrics
  • Improved test coverage
  • Bug fixes
  • New examples and documentation
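The global Job index and coordinator endpoint described above can be computed by hand. The Go program below is a minimal sketch and is not JobSet's implementation. It assumes that global indices are assigned by walking the replicated jobs in spec order and adding the replica counts of every preceding replicated job. It also assumes that pod hostnames follow the pattern `<jobset>-<replicatedJob>-<jobIndex>-<podIndex>.<subdomain>`. The `replicatedJob` type and both function names are hypothetical. Check the JobSet documentation for the actual label keys and naming rules.

```go
package main

import "fmt"

// replicatedJob is a hypothetical, simplified stand-in for a JobSet
// ReplicatedJob: only the name and replica count matter for indexing.
type replicatedJob struct {
	Name     string
	Replicas int
}

// globalJobIndex returns a JobSet-wide index for job jobIdx of the
// replicated job named rjobName. ASSUMPTION: indices are assigned by
// walking replicated jobs in spec order, offsetting by all earlier replicas.
func globalJobIndex(rjobs []replicatedJob, rjobName string, jobIdx int) (int, error) {
	offset := 0
	for _, rj := range rjobs {
		if rj.Name == rjobName {
			if jobIdx < 0 || jobIdx >= rj.Replicas {
				return 0, fmt.Errorf("job index %d out of range for %q (replicas=%d)", jobIdx, rjobName, rj.Replicas)
			}
			return offset + jobIdx, nil
		}
		offset += rj.Replicas // every Job of an earlier replicated job takes one index
	}
	return 0, fmt.Errorf("replicated job %q not found", rjobName)
}

// coordinatorEndpoint builds a pod DNS name. ASSUMPTION: hostnames follow
// <jobset>-<replicatedJob>-<jobIndex>-<podIndex>.<subdomain>.
func coordinatorEndpoint(jobSet, rjobName string, jobIdx, podIdx int, subdomain string) string {
	return fmt.Sprintf("%s-%s-%d-%d.%s", jobSet, rjobName, jobIdx, podIdx, subdomain)
}

func main() {
	rjobs := []replicatedJob{{Name: "driver", Replicas: 1}, {Name: "workers", Replicas: 3}}
	idx, err := globalJobIndex(rjobs, "workers", 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(idx) // 3
	fmt.Println(coordinatorEndpoint("trainer", "driver", 0, 0, "trainer"))
}
```

With one `driver` replica followed by three `workers` replicas, Job 2 of `workers` gets global index 3, because the single driver Job occupies index 0.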

What's Changed

New Contributors

Full Changelog: v0.6.0-devel...v0.6.0

JobSet v0.5.2

04 Jun 17:42
8637f29

What's Changed

Full Changelog: v0.5.1...v0.5.2

v0.5.1

09 May 17:50
43f8137

Highlights

  • Fixed a bug that prevented the foreground cascading deletion policy from working properly on JobSets #562
  • Fixed the field path in the validation error message for the ManagedBy field #527
  • Test coverage improvements, refactoring, additional documentation

What's Changed

Full Changelog: v0.6.0-devel...v0.5.1

v0.5.0

15 Apr 20:12
cb941fc

What's Changed

Highlights

  • JobSet TTL support added in #443
  • Docsite is live at https://jobset.sigs.k8s.io/ with updated documentation and examples.
  • Include the name of the first failed Job in the event emitted when a JobSet fails, to speed up debugging of large, complex workloads #477
  • Lower the default resource request for the JobSet controller manager so it fits on default cloud CPU VMs, while keeping a high limit to support maximum performance #480
  • Perform only one JobSet status update per reconcile attempt to reduce pressure on the Kubernetes API server #494
  • Introduced the ManagedBy field to the JobSet spec to enable MultiKueue support

Detailed release notes

New Contributors

Full Changelog: v0.5.0-devel...v0.5.0

v0.4.0

28 Feb 21:12
9f2cb14

What's Changed

New Contributors

Full Changelog: v0.4.0-devel...v0.4.0

JobSet v0.3.2

13 Feb 19:51
5eb9a2a

What's Changed

Full Changelog: v0.3.1...v0.3.2