Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve argument streaming for remote job execution (part 1) #1397

Draft
wants to merge 18 commits into
base: devel
Choose a base branch
from

Conversation

Shrews
Copy link
Contributor

@Shrews Shrews commented Sep 24, 2024

Changes to Remote Job Execution Argument Streaming

Reason For Change

Remote job execution involves a Transmitter node streaming the job keyword arguments as a JSON object to a Worker node. Adding new keyword arguments to the interface.run() or interface.run_async() methods can cause errors within the remote job streaming interface when an older Worker node receives the new, unrecognized keyword argument from a newer Transmitter node (worker/transmitter version mismatch).

For more background information on the streaming process, see this page of the documentation.

Fixes #1324

Solution

The total fix will require a multi-stage solution:

  • First stage of the solution (implemented in this PR) will have the Transmitter process stream only keyword arguments that are actually specified and have a value different from their default value. This does not fix the case when a new keyword argument is introduced and used with a non-default value, but does fix older Worker nodes from failing when the new keyword is not used since the new keyword will not be sent in the argument stream.
  • The second stage (not implemented here) will have the Worker process gracefully fail on unrecognized keywords.

Change Details

  • RunnerConfig object will become the main source for supplying job parameters instead of kwargs.
    • Turns the BaseConfig and RunnerConfig classes into dataclasses to allow us to properly define arguments, their data types, and their default values.
      • Since the __init__() method is now autogenerated, care is taken to keep the same name for a few class attributes that did not have the same name as their argument. This is done through the use of Python class property decorators. Internally, the attribute is referenced by the non-alias version.
    • By default, all RunnerConfig attributes are streamable to a Worker. Attributes that are NEVER streamable will have explicit dataclass metadata to mark it as such (example, field(metadata={MetaValues.TRANSMIT: False})).
  • interface.run() and interface.run_async() changed to accept a RunnerConfig object.
    • Can now be used in one of two ways:
      • Old-style: method takes only kwargs parameters; backwards compatible
      • New-style: method takes a RunnerConfig object and a few select parameters.
    • Non-public methods lower in the call hierarchy (init_runner(), dump_artifacts(), etc) are modified to allow for receiving a RunnerConfig object instead of keyword arguments.
  • Transmitter modified to query the RunnerConfig object to retrieve the keyword arguments to transmit.
  • Includes a new porting guide in the documentation describing the change.
    • Parameter docs for interface.run() method moved to BaseConfig and RunnerConfig docs.

TODO

  • Testing with Controller?

@Shrews Shrews force-pushed the kwarg-protocol-3 branch 3 times, most recently from 1931b5f to 7f88c89 Compare September 24, 2024 19:32
@Shrews Shrews force-pushed the kwarg-protocol-3 branch 2 times, most recently from 2eadf31 to 61cb018 Compare October 14, 2024 16:45
@github-actions github-actions bot added the docs Changes to documentation label Oct 14, 2024
@Shrews Shrews force-pushed the kwarg-protocol-3 branch 2 times, most recently from 5e28eea to 5fbae20 Compare October 14, 2024 18:36
@Shrews Shrews changed the title Keyword argument protocol - config dataclasses Improve argument streaming for remote job execution (part 1) Oct 14, 2024
@Shrews Shrews force-pushed the kwarg-protocol-3 branch 5 times, most recently from 15fe5e5 to 9be5cd1 Compare October 17, 2024 18:48
@Shrews Shrews force-pushed the kwarg-protocol-3 branch 2 times, most recently from 9964051 to 7e82b6a Compare November 4, 2024 20:39
@Shrews Shrews force-pushed the kwarg-protocol-3 branch 3 times, most recently from a5d2b2f to 630fc50 Compare November 14, 2024 20:15
Copy link
Member

@nitzmahone nitzmahone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't get all the way through the fine-toothed comb review before I ran out of days, but looking great- I think this solves the most important parts of the internal arg-splatting problem well.

TRANSMIT = 'transmit'


@dataclass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest conditionally adding kw_only=True to the decorator when >= 3.10 (until it becomes the minimum, then just hardcode), and document that keyword args are required. For an object with this many fields, posargs in init are kind of a maintenance nightmare for both us and users. Nothing we can do for 3.9, but at least we can dodge the ordering bullet for the future.

I see you're using posargs in some of the tests for private_data_dir, but even that one might potentially not be required in the future if some of the things people have been asking for come to pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this when you return.

I did it this way to maintain full backward compatibility. For the cases where users instantiate their own RunnerConfig object, the first parameter (for private_data_dir) is positional. Making it kw_only=True will break that particular use case.

@@ -60,52 +64,179 @@ class RunnerConfig(BaseConfig):
>>> r = Runner(config=rc)
>>> r.run()

This class inherites all the initialization parameters of the `BaseConfig` parent class, plus:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This class inherites all the initialization parameters of the `BaseConfig` parent class, plus:
This class inherits all the initialization parameters of the `BaseConfig` parent class, plus:

@@ -44,13 +48,13 @@ class ExecutionMode():
RAW = 3


@dataclass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion around conditionally setting kw_only=True when >= 3.10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Changes to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Account for streaming payload splatting version incompatibilities
2 participants