-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve argument streaming for remote job execution (part 1) #1397
base: devel
Are you sure you want to change the base?
Conversation
1931b5f
to
7f88c89
Compare
2eadf31
to
61cb018
Compare
5e28eea
to
5fbae20
Compare
15fe5e5
to
9be5cd1
Compare
9964051
to
7e82b6a
Compare
7e82b6a
to
d599164
Compare
d599164
to
9ef249d
Compare
a5d2b2f
to
630fc50
Compare
630fc50
to
728b9ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't get all the way through the fine-toothed comb review before I ran out of days, but looking great- I think this solves the most important parts of the internal arg-splatting problem well.
TRANSMIT = 'transmit' | ||
|
||
|
||
@dataclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest conditionally adding kw_only=True
to the decorator when >= 3.10 (until it becomes the minimum, then just hardcode), and document that keyword args are required. For an object with this many fields, posargs in init are kind of a maintenance nightmare for both us and users. Nothing we can do for 3.9, but at least we can dodge the ordering bullet for the future.
I see you're using posargs in some of the tests for private_data_dir
, but even that one might potentially not be required in the future if some of the things people have been asking for come to pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss this when you return.
I did it this way to maintain full backward compatibility. For the cases where users instantiate their own RunnerConfig
object, the first parameter (for private_data_dir
) is positional. Making it kw_only=True
will break that particular use case.
@@ -60,52 +64,179 @@ class RunnerConfig(BaseConfig): | |||
>>> r = Runner(config=rc) | |||
>>> r.run() | |||
|
|||
This class inherites all the initialization parameters of the `BaseConfig` parent class, plus: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class inherites all the initialization parameters of the `BaseConfig` parent class, plus: | |
This class inherits all the initialization parameters of the `BaseConfig` parent class, plus: |
@@ -44,13 +48,13 @@ class ExecutionMode(): | |||
RAW = 3 | |||
|
|||
|
|||
@dataclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestion around conditionally setting kw_only=True
when >= 3.10
Changes to Remote Job Execution Argument Streaming
Reason For Change
Remote job execution involves a
Transmitter
node streaming the job keyword arguments as a JSON object to aWorker
node. Adding new keyword arguments to theinterface.run()
orinterface.run_async()
methods can cause errors within the remote job streaming interface when an olderWorker
node receives the new, unrecognized keyword argument from a newerTransmitter
node (worker/transmitter version mismatch).For more background information on the streaming process, see this page of the documentation.
Fixes #1324
Solution
The total fix will require a multi-stage solution:
Transmitter
process stream only keyword arguments that are actually specified and have a value different from their default value. This does not fix the case when a new keyword argument is introduced and used with a non-default value, but does fix olderWorker
nodes from failing when the new keyword is not used since the new keyword will not be sent in the argument stream.Worker
process gracefully fail on unrecognized keywords.Change Details
RunnerConfig
object will become the main source for supplying job parameters instead ofkwargs
.BaseConfig
andRunnerConfig
classes into dataclasses to allow us to properly define arguments, their data types, and their default values.__init__()
method is now autogenerated, care is taken to keep the same name for a few class attributes that did not have the same name as their argument. This is done through the use of Python classproperty
decorators. Internally, the attribute is referenced by the non-alias version.RunnerConfig
attributes are streamable to aWorker
. Attributes that are NEVER streamable will have explicit dataclass metadata to mark it as such (example,field(metadata={MetaValues.TRANSMIT: False})
).interface.run()
andinterface.run_async()
changed to accept aRunnerConfig
object.kwargs
parameters; backwards compatibleRunnerConfig
object and a few select parameters.init_runner()
,dump_artifacts()
, etc) are modified to allow for receiving aRunnerConfig
object instead of keyword arguments.Transmitter
modified to query theRunnerConfig
object to retrieve the keyword arguments to transmit.interface.run()
method moved toBaseConfig
andRunnerConfig
docs.TODO