Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix illegal memory access embedding #735

Closed
wants to merge 1 commit into from

Conversation

PhilippeMoussalli
Copy link
Contributor

PR that fixes issue that arises with error 139 when running embedding component with large number of rows (~100k). The issue seems to stem from multiple processes trying to access the GPU at the same time.

This should be a temporary fix until #566 is addressed

more info here

@PhilippeMoussalli PhilippeMoussalli force-pushed the fix-illegal-memory-access-embedding branch from f3872af to 026ec51 Compare December 19, 2023 14:09
@@ -16,6 +17,8 @@

logger = logging.getLogger(__name__)

dask.config.set(scheduler="processes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the LocalCluster instead?

I thought it was used by default, but I think there is a bug. The docstrings say the default for cluster_type should be "local", but it seems to be "default" instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed seems to be that the wrong default was incorporated with the new pipeline interface changes.
I think we can set the cluster type to local again. But it will probably not fix the issue here, cause the issue was occurring with the LocalCluster already, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this bug was introduced with the interface changes. I think it was always there. I assume the issue triggering this PR is due to the usage of threads (since processes fixes it), while the local cluster uses processes by default as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we then change the default cluster_type from default to local in the component spec or just explicitly change the cluster type for this component?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local_cluster.txt

So using the local cluster does seem to work since it uses processes. One small downside it's that it's currently very verbose and returns a lot of warnings compared to just explicitly setting the scheduler to use processes instead (see logs), but it did manage to complete successfully.

I'm fine with either solutions for the moment, any preferences?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the local cluster. since it's the recommended one by Dask.

@PhilippeMoussalli
Copy link
Contributor Author

Closing this PR in favor of defining cluster type in this PR

@RobbeSneyders RobbeSneyders deleted the fix-illegal-memory-access-embedding branch January 11, 2024 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants