Release v0.6.0 requirements #523

Closed
8 of 13 tasks
danielvegamyhre opened this issue Apr 16, 2024 · 9 comments

Comments

@danielvegamyhre danielvegamyhre pinned this issue Apr 16, 2024
@ahg-g
Contributor

ahg-g commented Apr 16, 2024

v1 API?

@danielvegamyhre
Contributor Author

> v1 API?

Thanks for the reminder, added to must-haves

@danielvegamyhre
Contributor Author

Removed API v1 graduation, pending a finalized design for Kubeflow training v2, which may require JobSet API changes.

@danielvegamyhre
Contributor Author

danielvegamyhre commented Jul 15, 2024

Before cutting the 0.6 release, I'm waiting for the cherry picks of kubernetes/kubernetes#126046 to be completed and a patch release of k8s.io/api to be published, so we can bump the dependency in JobSet. This upstream bug blocks our ability to use configurable failure policy rules with PodFailurePolicies, which is the main use case for the new configurable failure policy feature (#262). Configurable failure policy is the main feature in 0.6, and one that many customers/users are waiting on, so I don't want to cut the release with it in an incomplete state.

valayDave added a commit to Netflix/metaflow that referenced this issue Jul 18, 2024
> supported :
- foreach + parallel
- parallel with Argo
- dynamically set worker-counts.
- should work with @timeout / @project /@card etc.
- retries working with native Argo
- fully self contained jobset with argo support
    - Requires Jobset v0.6.0 [kubernetes-sigs/jobset#523]

> not-supported:
- support for catch

> Notes
- not using the `{{retries}}` like we do in container templates
- instead, passing down `{{retries}}` via `inputs.parameters`, which will
be accessible in the Jobset manifest.
- Temporary tweak to boto dep to ensure that boto install failures don't fail deployment.
- instead of relying on the kubernetes object, we freshly create an object in the ArgoContainer templates.
- Code in the same style as the kubernetes/argo integrations with explicit filling of variables and decoupled abstractions
- setting annotations explicitly as they won't be passed down from WorkflowTemplate level.
- support for jobset native success conditions (requires Jobset v0.6 on controller)
- REFACTORS THAT HAVE GONE INTO THIS COMMIT:
    - [argo][feedback] refactor dag template parameter /output setting
        - just move conditional block around
    - [argo][feedback] refactor references to `task_id_base` to `task_id_entropy`
        - these are set/used in the argo outputs and variable names
    - [argo][feedback] refactor references to `task-id-base` to `task-id-entropy`
        - these are used as Argo parameter names.
    - [argo][feedback] refactor to match code style
    - [argo][feedback] refactor to match code style (refactor some conditionals)
    - [argo][feedback] remove k8s client and make `KubernetesArgoJobSet` directly use `kubernetes_sdk`
    - [argo][feedback] added `environment_variables_from_selectors` for code simplification
    - [argo][feedback] fix comment.
    - [argo][feedback] refactor condition for readability.
    - [argo][feedback] rollback temp boto3 installation change in metaflow env
    - [argo][feedback] remove rogue type hint
valayDave added a commit to Netflix/metaflow that referenced this issue Jul 19, 2024
valayDave added a commit to valayDave/metaflow that referenced this issue Jul 19, 2024
savingoyal pushed a commit to Netflix/metaflow that referenced this issue Jul 19, 2024
@danielvegamyhre
Contributor Author

Update: the upstream k8s issue has been fixed and the cherry picks merged. The fix will be included in the patch release on 08/13, at which point we can bump our k8s api dependency packages.

@danielvegamyhre
Contributor Author

danielvegamyhre commented Aug 14, 2024

k8s api 1.31 packages with the fix mentioned above were released, but after attempting to bump our dependencies I ran into a compatibility issue with controller-runtime, which I found others have hit as well: kubernetes-sigs/controller-runtime#2925

Maintainers say controller-runtime v0.19.0, which will support k8s api v0.31.0, will be released soon, so we'll have to wait a bit longer for the JobSet v0.6.0 release.

In the meantime I've added a couple more features to the "must have" feature list for this release, since due to these delays we ended up having time to implement them and include them in the release.

@danielvegamyhre
Contributor Author

danielvegamyhre commented Aug 19, 2024

Update: the changes for #617 and #649 are needed urgently by customers and we cannot wait for dependency compatibility issues described in #523 to be resolved. So I will publish the v0.6.0 release today, then include the dependency version bumps in a patch release v0.6.1 once they are ready.

After upgrading k8s.io packages to v0.31.0 and controller-runtime to v0.19.0, I ran into this issue which I haven't debugged yet.

@danielvegamyhre
Contributor Author

> Update: the changes for #617 and #649 are needed urgently by customers and we cannot wait for dependency compatibility issues described in #523 to be resolved. So I will publish the v0.6.0 release today, then include the dependency version bumps in a patch release v0.6.1 once they are ready.
>
> After upgrading k8s.io packages to v0.31.0 and controller-runtime to v0.19.0, I ran into this issue which I haven't debugged yet.

Actually, since @mimowo completed the cherry picks for the fix we should be able to use k8s v0.30.4 packages, which we just bumped to before the v0.6.0 release. Testing this now.

@danielvegamyhre
Contributor Author

Release v0.6.0 published
