Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python SDK - Steps Template not serialized correctly with model_to_dict #7374

Closed
samuelBB opened this issue Dec 9, 2021 · 23 comments
Closed
Assignees
Labels
area/sdks problem/stale This has not had a response in some time type/bug

Comments

@samuelBB
Copy link

samuelBB commented Dec 9, 2021

Summary

Version. I am using the Python SDK on master:latest which at the moment corresponds to commit 22b0326 (though the last commit to actually modify the sdks folder appears to be aba6599).

Briefly, I am constructing a Workflow using the objects in the Python SDK which includes a steps template. When I try to serialize the object using the model_to_dict function, it appears that the steps template is not properly serialized to a dict but instead remains in its unserialized object form.

Reproducible Example

import yaml

from argo_workflows.model.container import Container
from argo_workflows.model.object_meta import ObjectMeta
from argo_workflows.model.io_argoproj_workflow_v1alpha1_template import \
    IoArgoprojWorkflowV1alpha1Template
from argo_workflows.model.io_argoproj_workflow_v1alpha1_workflow import \
    IoArgoprojWorkflowV1alpha1Workflow
from argo_workflows.model.io_argoproj_workflow_v1alpha1_workflow_spec import \
    IoArgoprojWorkflowV1alpha1WorkflowSpec
from argo_workflows.model.io_argoproj_workflow_v1alpha1_metadata import \
    IoArgoprojWorkflowV1alpha1Metadata
from argo_workflows.model.io_argoproj_workflow_v1alpha1_parallel_steps import \
    IoArgoprojWorkflowV1alpha1ParallelSteps
from argo_workflows.model.io_argoproj_workflow_v1alpha1_workflow_step import \
    IoArgoprojWorkflowV1alpha1WorkflowStep

from argo_workflows.model_utils import model_to_dict

manifest = IoArgoprojWorkflowV1alpha1Workflow(
    metadata=ObjectMeta(generate_name='myApp-'),
    spec=IoArgoprojWorkflowV1alpha1WorkflowSpec(
        entrypoint='myTemplate',
        templates=[
            IoArgoprojWorkflowV1alpha1Template(
                name="myTemplate",
                metadata=IoArgoprojWorkflowV1alpha1Metadata(
                    labels={"app": "myApp"}
                ),
                steps=[
                    IoArgoprojWorkflowV1alpha1ParallelSteps(
                        [
                            IoArgoprojWorkflowV1alpha1WorkflowStep(
                                name="mySteps",
                                template="whalesay",
                            )
                        ]
                    )
                ]
            ),
            IoArgoprojWorkflowV1alpha1Template(
                name='whalesay',
                container=Container(
                    image='docker/whalesay:latest',
                    command=['cowsay'],
                    args=['hello world']
                )
            )
        ]
    )
)

print(yaml.dump(model_to_dict(manifest)))

Output:

metadata:
  generateName: myApp-
spec:
  entrypoint: myTemplate
  templates:
  - metadata:
      labels:
        app: myApp
    name: myTemplate
    steps:
    - - !!python/object:argo_workflows.model.io_argoproj_workflow_v1alpha1_workflow_step.IoArgoprojWorkflowV1alpha1WorkflowStep
        _check_type: true
        _configuration: null
        _data_store:
          name: mySteps
          template: whalesay
        _path_to_item: !!python/tuple []
        _spec_property_naming: false
        _visited_composed_classes: !!python/tuple
        - !!python/name:argo_workflows.model.io_argoproj_workflow_v1alpha1_workflow_step.IoArgoprojWorkflowV1alpha1WorkflowStep ''
  - container:
      args:
      - hello world
      command:
      - cowsay
      image: docker/whalesay:latest
    name: whalesay

Hack to temporarily fix the issue

On the other hand, when I add a custom YAML representer to again apply model_to_dict whenever the YAML dumper sees a WorkflowStep type, I do get the correct result. To expand, this is just adding the following before the yaml.dump call:

def representer(dumper, data): 
     return dumper.represent_dict(model_to_dict(data))
yaml.add_representer(IoArgoprojWorkflowV1alpha1WorkflowStep, representer)

Now output is correct:

metadata:
  generateName: myApp-
spec:
  entrypoint: myTemplate
  templates:
  - metadata:
      labels:
        app: myApp
    name: myTemplate
    steps:
    - - name: mySteps
        template: whalesay
  - container:
      args:
      - hello world
      command:
      - cowsay
      image: docker/whalesay:latest
    name: whalesay

Hypotheses Regarding the Issue

While it's possible I am not specifying the Steps/WorkflowSteps correctly, I doubt this, as the type checking valdiation is not producing any error messages from this. I tried fiddling with this (removing the objects from a list, specifying as kwargs instead of args, etc) but to no avail.

As such, my hypothesis is that the recursive serialization in model_to_dict fails on the fact that Steps is specified with a nested list, so maybe it bottoms out before actually reaching the nested WorkflowStep object to serialize. Haven't traced exactly how, but thats my guess.

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@terrytangyuan
Copy link
Member

Do you see any exceptions like in #7293?

@samuelBB
Copy link
Author

samuelBB commented Dec 9, 2021

@terrytangyuan I'm not submitting the constructed workflow with the API. At the moment I'm only interested in constructing a valid YAML, so I don't think that should be relevant?

@terrytangyuan
Copy link
Member

I see. That's an interesting use case. How are you planning to submit the constructed workflow? K8s Python SDK?

@samuelBB
Copy link
Author

samuelBB commented Dec 9, 2021

Or just plain kubectl on the output YAML. The main thing is that in my use case, I'm decoupling the workflow construction from the deployment. The reason to use the Python SDK for the construction is that we have a large amount of templating and similar k8s objects which is awkward and not-DRY in plain YAML but quite easy to work with in Python.

@terrytangyuan
Copy link
Member

terrytangyuan commented Dec 9, 2021

This might be an issue with openapi-gen's generated model_to_dict. Perhaps we should wrap it around with the representer() hack you mentioned.

@terrytangyuan
Copy link
Member

@flaviuvadan Any thoughts?

@samuelBB
Copy link
Author

samuelBB commented Dec 9, 2021

This might be an issue with openapi-gen's generatedmodel_to_dict. Perhaps we should wrap it around with the representer() hack you mentioned.

this makes me slightly nervous because what if there are other Argo objects which need to be hacked like this?

@terrytangyuan
Copy link
Member

terrytangyuan commented Dec 9, 2021

I would guess other objects are experiencing the same issue. A couple of options:

  1. We leave this to users to hack but document it (if model_to_dict is used often like this)
  2. We wrap this for all Argo objects. One problem is that some users may use other yaml modules import pyyaml as yaml and then this approach probably won't work.
  3. We find the root cause in openapi-gen. If this is a common use case, others might have found some fixes we can reference.

@samuelBB
Copy link
Author

samuelBB commented Dec 9, 2021

Re 1 and 2 - is there an easy way to generate a list of all Argo objects? So that you can just use the hack in a for loop.

Re 3 - I'm up for looking into the logic a bit, and as you said, maybe others have ideas here too.

@terrytangyuan
Copy link
Member

Re 1 and 2 - is there an easy way to generate a list of all Argo objects? So that you can just use the hack in a for loop.

You should be able find the list in the models module (with a filtering): https://github.com/argoproj/argo-workflows/blob/master/sdks/python/client/argo_workflows/models/__init__.py#L213-L343

@samuelBB
Copy link
Author

samuelBB commented Dec 9, 2021

This line appears to be the issue: https://github.com/argoproj/argo-workflows/blob/master/sdks/python/client/argo_workflows/model_utils.py#L1658

When the function recursively gets to manifest.spec.templates[0].steps, this is a list which means the case of L1649 from the file linked above is invoked. The for-loop on L1655 will process the only element in steps, i.e., steps[0], which is an instance of ModelSimple with repr:

[{'name': 'mySteps', 'template': 'whalesay'}]

As such, it will be captured by the case on L1658, and its value attribute is added to res. The problem is that this value is a list which contains the WorkflowStep object, which will now never be serialized.

I think the fix should be to an additional boolean check to L1658 that ensures that v.value is a non-iterable primitive type (as far as I can tell, this is any primitive type except list). This way, if v.value is an iterable or otherwise non-primitive, it will fall back to the last case in the loop of calling the function recursively on v. Alternatively, are we sure ParallelSteps should be a ModelSimple and not a ModelNormal? It's hard for me to tell why objects inherit from specific classes here.

TL;DR This issue should appear anytime we nest anything except a non-iterable primitive under a ModelSimple.

EDIT: I've revised this post as I believe what I said before was incorrect. It should now be accurate. I have tested my proposed "fix" of adding an additional check to L1658 which seems to work, though I am not 100% confident its full-proof as I am not totally familiar with the inner workings of the SDK.

@samuelBB
Copy link
Author

samuelBB commented Dec 10, 2021

Ok so here is an interesting thing I noticed, following up on my previous post: the class IoArgoprojWorkflowV1alpha1ParallelSteps appears to be the only class imported in argo_workflows.models which inherits from ModelSimple - perhaps this is the issue? Maybe this is supposed to be inheritance from ModelNormal as with the other objects?

EDIT: The docstring for ModelSimple explains that classes inherit this when "the parent class of models whose type != object in their swagger/openapi" which I checked is true for ParallelSteps (and see issue #2454 as I referenced in my comment below). So this is correct inheritance. In which case I think my analysis of model_to_dict above is the correct identification of the issue.

@samuelBB
Copy link
Author

this issue appears to be relevant, which explains a uniqueness of ParallelSteps regarding serialization: #2454

@flaviuvadan
Copy link
Contributor

flaviuvadan commented Dec 13, 2021

ParallelSteps is the only object that has an OpenAPI schema definition that defines ParallelSteps as an array. When we generate the SDK with OpenAPI 4.3.1, as suggested by #7363, the ParallelSteps object gets the following two OpenAPI fields:

    openapi_types = {
    }

    attribute_map = {
    }

The object has a method called to_dict, which is the equivalent of model_to_dict when we generate the SDK using OpenAPI 5.x. This method uses the openapi_types on objects to deserialize whatever, but ParallelSteps clearly doesn't have anything on it... When I print out the manifest, replicated based on this issue, I get an empty steps field in the template. I fiddled around with the OpenAPI schema side and wrote the following definition for ParallelSteps:

    "io.argoproj.workflow.v1alpha1.ParallelSteps": {
      "type": "array",
      "required": [
        "steps"
      ],
      "properties": {
        "steps": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/io.argoproj.workflow.v1alpha1.WorkflowStep"
          }
        }
      }
    }

with the original being:

    "io.argoproj.workflow.v1alpha1.ParallelSteps": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/io.argoproj.workflow.v1alpha1.WorkflowStep"
      }
    }

I generated the SDK and was able to obtain the steps from the template, but now we have a steps within a steps... which cannot be deserialized into a map[string]interface on the Argo Server side.

This issue with OpenAPI seems to be the closest to what we have here - an issue with generating an object that "extends"/"overwrites" a primitive. #7363 suggests we downgrade OpenAPI to 4.x so that we (1) merge the PR in a stable way to auto-publish the SDK and (2) fix #7293. Going forward I think there are 2 approaches to fix this, and I'd love to discuss them, but maybe Alex should be involved:

  • we modify the way we represent parallel steps so that it's not necessary to use array definitions in the OpenAPI schema. Rather, we represent it as an object with a steps field, as showcased above, and we get proper construction from OpenAPI. I like this one because it shields us from OpenAPI, which is not in our full control
  • we contribute, or maybe wait for, a fix in OpenAPI, so that the whole community benefits from a fix for building objects that extend arrays

I welcome any feedback on my analysis and proposed ways to move forward! I also played with the generated.proto file of Argo Workflows as an attempt to control the codegen behavior but to no avail. I think what @samuelBB showcased through his analysis is the equivalent of this but with code generated by OpenAPI 5.x, which faces the same problem, but with a different interface perhaps.

@terrytangyuan
Copy link
Member

we modify the way we represent parallel steps so that it's not necessary to use array definitions in the OpenAPI schema. Rather, we represent it as an object with a steps field, as showcased above, and we get proper construction from OpenAPI. I like this one because it shields us from OpenAPI, which is not in our full control

This would be a breaking change so we probably don't want to do it.

we contribute, or maybe wait for, a fix in OpenAPI, so that the whole community benefits from a fix for building objects that extend arrays

This will take time to get the fix in OpenAPI and require us to use the latest version of OpenAPI, which would introduce the more commonly seen issue #7293 back.

@samuelBB Since this is a small use case, I'd recommend that we defer this to OpenAPI-gen itself to fix and use the workaround you found for now. Once this is fixed on OpenAPI-gen, we'll explore upgrading to see if this has been fixed.

@samuelBB
Copy link
Author

@terrytangyuan

(1) I guess someone should open a new issue with OpenAPI-gen?

(2) Regarding using the workaround I found: Perhaps I'm misunderstanding, but I thought that according to what @flaviuvadan commented above, under OpenAPI-4.x (which will be used in #7363), ParallelSteps has an "empty" spec, so my workaround for generating YAML won't work, but moreover, you won't even be able to use steps at all, i.e., the server API can't even read it?

@terrytangyuan
Copy link
Member

@flaviuvadan Anything else we need to/can address here before the v3.3 release?

@flaviuvadan
Copy link
Contributor

@flaviuvadan Anything else we need to/can address here before the v3.3 release?

I think we need the custom object we discussed for passing to with_items.

@stale
Copy link

stale bot commented Feb 7, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the problem/stale This has not had a response in some time label Feb 7, 2022
@terrytangyuan
Copy link
Member

Bump. Assigning to @flaviuvadan to migrate the related code from Hera.

@stale stale bot removed the problem/stale This has not had a response in some time label Feb 7, 2022
@terrytangyuan terrytangyuan removed their assignment Feb 7, 2022
@stale
Copy link

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the problem/stale This has not had a response in some time label Mar 2, 2022
@stale stale bot closed this as completed Apr 16, 2022
@alexec alexec reopened this Apr 16, 2022
@stale stale bot removed the problem/stale This has not had a response in some time label Apr 16, 2022
@stale
Copy link

stale bot commented May 1, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

@stale stale bot added the problem/stale This has not had a response in some time label May 1, 2022
@stale
Copy link

stale bot commented Jun 19, 2022

This issue has been closed due to inactivity. Feel free to re-open if you still encounter this issue.

@stale stale bot closed this as completed Jun 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sdks problem/stale This has not had a response in some time type/bug
Projects
None yet
Development

No branches or pull requests

4 participants