Move Dask Client configuration to Component class and use multi-GPU in `embed_images` component
#852
No explicit tests? Do we need to update the docs?
Do you have a proposal for a useful test scenario?
I'll add something indeed.
I think we should set the `LocalCluster` as the default for all components and let the user overwrite the client if needed. The other components are still not using the `LocalCluster`, but it is the recommended way. Is there any reason why we shouldn't set the `LocalCluster` as default for the other components?
@mrchtr with this change we are setting the |
Ah my bad, nvm
ee00432 to 41adcbb
👍
This PR changes the `dask_client` method introduced in #852 into a general `setup` method. This method differs from `__init__` in that it allows users to return a state, which is passed into the `teardown` method by Fondant. This is necessary since the Dask client is not pickleable, and setting it as an instance attribute leads to issues when executing the `transform` method across processes (as is the case in the `PandasTransformComponent`).
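To make the pickling concern concrete, here is a minimal sketch of the `setup`/`teardown` idea. It is not Fondant's actual code: the `Component` base class, the two subclasses, and the `run` and `can_cross_processes` helpers are hypothetical, and a `threading.Lock` stands in for the Dask client because it is similarly unpicklable.

```python
import pickle
import threading


class Component:
    """Hypothetical base class; a stand-in, not Fondant's actual code."""

    def setup(self):
        """Create resources and return them as state (default: none)."""
        return None

    def teardown(self, state):
        """Release whatever `setup` returned (default: nothing to do)."""


class ClientOnInstance(Component):
    def __init__(self):
        # A lock stands in for the Dask client: neither can be pickled.
        self.client = threading.Lock()


class ClientInSetup(Component):
    def __init__(self):
        self.batch_size = 8

    def setup(self):
        # The unpicklable resource is returned, never stored on `self`.
        return threading.Lock()


def can_cross_processes(component):
    """Check whether the component's attributes survive pickling."""
    try:
        pickle.dumps(vars(component))
    except TypeError:
        return False
    return True


def run(component):
    """Sketch of the lifecycle: setup, use the component, teardown."""
    state = component.setup()  # the framework holds the state
    try:
        return can_cross_processes(component)
    finally:
        component.teardown(state)  # the state is handed back for cleanup
```

In this sketch, `run(ClientInSetup())` succeeds because the lock only lives in the state held by the caller, while `run(ClientOnInstance())` reports that the instance attributes cannot be pickled.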
It doesn't really make sense to let the user configure the Dask client used by a component, since this is specific to the component implementation. I therefore moved it to the Component class with a default implementation of the Dask `LocalCluster`. Note that due to a bug the `LocalCluster` was not yet used as default. This therefore changes the default for all components, which were using the threaded scheduler before.

One downside to the current implementation is that we require the component developer to call `super().__init__()` in their `__init__` method. This is best practice though and highlighted by IDEs.
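The `super().__init__()` requirement can be illustrated with a small sketch. The class names and the `client_configured` attribute are hypothetical; the flag merely stands in for whatever default Dask client setup the base class performs in its `__init__`.

```python
class Component:
    """Hypothetical base class, not Fondant's actual code."""

    def __init__(self):
        # Assumption: the default client configuration happens here.
        self.client_configured = True


class ForgetfulComponent(Component):
    def __init__(self, model_name):
        # The base __init__ is never called, so the default setup is skipped.
        self.model_name = model_name


class CorrectComponent(Component):
    def __init__(self, model_name):
        super().__init__()  # runs the base class setup first
        self.model_name = model_name
```

With these definitions, a `ForgetfulComponent` instance has no `client_configured` attribute at all, whereas a `CorrectComponent` instance does, which is the failure mode the `super().__init__()` call guards against.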