Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Dask Client configuration to Component class and use multi-GPU in embed_images component #852

Merged
merged 6 commits into from
Feb 15, 2024

Conversation

RobbeSneyders
Copy link
Member

It doesn't really make sense to let the user configure the Dask client used by a component, since this is specific to the component implementation. I therefore moved it to the Component class with a default implementation of the Dask LocalCluster. Note that due to a bug the LocalCluster was not yet used as default. This therefore changes the default for all components, which were using the threaded scheduler before.

One downside to the current implementation is that we require the component developer to call super().__init__() in their __init__ method. This is best practice though and highlighted by IDEs.

@GeorgesLorre
Copy link
Collaborator

No explicit tests ?

Do we need to update the docs ?

@RobbeSneyders
Copy link
Member Author

No explicit tests ?

Do you have a proposal for a useful test scenario?

Do we need to update the docs ?

I'll add something indeed.

Copy link
Contributor

@mrchtr mrchtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should set the LocalCluster for as default for all components and let the user overwrite the client if needed.
The other components still not using the LocalCluster, but it is the recommended way. Is there any reason why we shouldn't set the LocalCluster as default for the other components?

@RobbeSneyders
Copy link
Member Author

@mrchtr with this change we are setting the LocalCluster as default for all components.

@mrchtr
Copy link
Contributor

mrchtr commented Feb 15, 2024

@mrchtr with this change we are setting the LocalCluster as default for all components.

Ah my bad, nvm

Copy link
Collaborator

@GeorgesLorre GeorgesLorre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@RobbeSneyders RobbeSneyders merged commit b73aa3e into main Feb 15, 2024
11 checks passed
@RobbeSneyders RobbeSneyders deleted the feature/dask-client branch February 15, 2024 10:02
RobbeSneyders added a commit that referenced this pull request Feb 20, 2024
This PR changes the `dask_client` method introduced in #852 into a
general `setup` method.

This method differs from `__init__` in that it allows users to return a
state, which is passed into the `teardown` method by Fondant. This is
necessary since the Dask client is not pickleable, and setting it as an
instance attribute leads to issues when executing the `transform` method
across processes (as is the case in the `PandasTransformComponent`).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants