-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix illegal memory access embedding #735
Conversation
f3872af
to
026ec51
Compare
@@ -16,6 +17,8 @@ | |||
|
|||
logger = logging.getLogger(__name__) | |||
|
|||
dask.config.set(scheduler="processes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the LocalCluster
instead?
I thought it was used by default, but I think there is a bug. The docstrings say the default for cluster_type
should be "local", but it seems to be "default" instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed seems to be that the wrong default was incorporated with the new pipeline interface changes.
I think we can set the cluster type to local again. But it will probably not fix the issue here, cause the issue was occurring with the LocalCluster already, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this bug was introduced with the interface changes. I think it was always there. I assume the issue triggering this PR is due to the usage of threads (since processes fixes it), while the local cluster uses processes by default as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we then change the default cluster_type
from default
to local
in the component spec or just explicitly change the cluster type for this component?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So using the local cluster does seem to work since it uses processes. One small downside it's that it's currently very verbose and returns a lot of warnings compared to just explicitly setting the scheduler to use processes instead (see logs), but it did manage to complete successfully.
I'm fine with either solutions for the moment, any preferences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the local cluster. since it's the recommended one by Dask.
Closing this PR in favor of defining cluster type in this PR |
PR that fixes issue that arises with error 139 when running embedding component with large number of rows (~100k). The issue seems to stem from multiple processes trying to access the GPU at the same time.
This should be a temporary fix until #566 is addressed
more info here