[FEA] CPU based UDF to run efficiently and transfer data back to GPU for supported operations #3855

Closed
firestarman opened this issue Oct 19, 2021 · 3 comments
Labels
feature request New feature or request P0 Must have for release

Comments

@firestarman
Collaborator

firestarman commented Oct 19, 2021

The general idea is that if we see a UDF that has to run on the CPU, we can pull back to the CPU just the columns it needs, do the processing on the CPU, and then send the resulting column back to the GPU for more processing.
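To make the data flow concrete, here is a minimal sketch in plain Python. It is not the plugin's code: a "batch" is modeled as a dict of column name to list, the device-to-host and host-to-device copies are only marked by comments, and `run_cpu_udf` and its parameters are hypothetical names chosen for illustration. It assumes the UDF reads at least one input column.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a columnar batch: column name -> list of values.
Batch = Dict[str, List]


def run_cpu_udf(batch: Batch, input_cols: Sequence[str],
                udf: Callable, out_col: str) -> Batch:
    """Evaluate a row-based UDF over only the columns it reads."""
    # 1. Pull back just the columns the UDF needs
    #    (stand-in for a device-to-host copy).
    host_cols = [batch[name] for name in input_cols]

    # 2. Walk the columns row by row and call the UDF on the CPU.
    result = [udf(*row) for row in zip(*host_cols)]

    # 3. Attach the result as a single new column
    #    (stand-in for a host-to-device copy). The input batch is not mutated.
    out = dict(batch)
    out[out_col] = result
    return out


batch = {"a": [1, 2, 3], "b": [10, 20, 30], "c": ["x", "y", "z"]}
out = run_cpu_udf(batch, ["a", "b"], lambda a, b: a + b, "sum")
```

Column `c` is never touched in this sketch, which is the point of the approach: only the UDF's inputs and its one output column cross the CPU/GPU boundary.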

There are a few things that make this hard.

  1. There is no good way to know whether we should run this on the GPU or not. There may be a few situations where it is easy to tell, but for now I think we just want a config that enables or disables this feature. Once we have more experience, we can start to come up with a plan for heuristics.
  2. Memory management. We have no good way right now to avoid going over the target batch size when doing a project. Most of the code written for going from rows to columns assumes that it can batch things however it wants, so we might have to rewrite some code to support dynamically growing the data or, more likely, produce multiple smaller batches that we concatenate before the expression is done.
  3. Metrics. It would be really nice to know how much time we spend on the CPU processing a UDF, but we have not set up anything to measure it right now, and the APIs are not designed for it either. This might be follow-on work, but it would be great for project and other operators to have a relatively clean way to report a metric for the time spent in these UDFs.
  4. Multiple UDFs. There is the possibility that the output of one UDF is fed into another UDF, or there is a little bit of processing in between them. When we hit situations like this it probably would be best to keep as much of that processing on the CPU at once, but when to do the cutoff and how to bind all of them together starts to be difficult. For now, I would say that we just do the basic thing, and we will have sub-optimal performance when we run into these types of odd situations. But we should have a follow-on issue for us to go back and look at how to do this properly once we have more experience with it.
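Points 2 and 3 can be sketched together. The following plain-Python example is only an illustration under stated assumptions: rows are tuples, a row count (`max_rows_per_chunk`) stands in for the byte-based target batch size, and the function name is hypothetical. It evaluates the UDF over several smaller chunks, accumulates the wall-clock time spent inside the UDF calls, and concatenates the chunk results at the end.

```python
import time
from typing import Callable, List, Sequence, Tuple


def eval_udf_in_chunks(rows: Sequence[tuple], udf: Callable,
                       max_rows_per_chunk: int) -> Tuple[List, float]:
    """Run a row-based UDF chunk by chunk and report time spent in it."""
    if max_rows_per_chunk <= 0:
        raise ValueError("max_rows_per_chunk must be positive")

    chunk_results: List[List] = []
    udf_seconds = 0.0  # wall-clock time spent evaluating the UDF

    for start in range(0, len(rows), max_rows_per_chunk):
        chunk = rows[start:start + max_rows_per_chunk]
        t0 = time.perf_counter()
        chunk_results.append([udf(*row) for row in chunk])
        udf_seconds += time.perf_counter() - t0

    # Concatenate the smaller result batches into one output column.
    result = [value for part in chunk_results for value in part]
    return result, udf_seconds


rows = [(i, i + 1) for i in range(10)]
result, udf_seconds = eval_udf_in_chunks(rows, lambda a, b: a * b, 3)
```

A real implementation would size chunks by estimated bytes rather than rows and would report the timing through the operator's metrics instead of returning it, but the shape of the loop would likely be similar.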

We can probably begin with the simple case of a single UDF, and tackle the more complicated case of multiple UDFs later.

Stage 1: (Release 21.12)

Stage 2: (#3979)

  • Consider the case of exceeding column batch size after UDF
  • Investigate how multiple UDFs work together and what we can do to support them.

How will it work alongside the current implementations (the UDF compiler and RapidsUDF)? The order of precedence for where a UDF runs is as follows.

  1. The UDF to Catalyst compiler is enabled and it was able to translate it.
  2. It is tagged with GPU support (a RapidsUDF), so it should run on the GPU using the provided APIs.
  3. This new feature is enabled, and we move the data to the CPU to run this and then move the result back.
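The precedence above can be written as a small decision function. This is a sketch only: the enum, the function, and its boolean parameters are hypothetical names rather than the plugin's API or config keys, and the final `NOT_ACCELERATED` case is an assumption, since the list does not say what happens when none of the three conditions holds.

```python
from enum import Enum


class UdfPlacement(Enum):
    COMPILED_TO_CATALYST = 1   # case 1: UDF compiler translated the UDF
    GPU_RAPIDS_UDF = 2         # case 2: UDF provides a GPU implementation
    CPU_WITH_TRANSFER = 3      # case 3: run on CPU, move result back to GPU
    NOT_ACCELERATED = 4        # assumption: none of the above applies


def choose_placement(compiler_enabled: bool, compiled_ok: bool,
                     is_rapids_udf: bool,
                     cpu_transfer_enabled: bool) -> UdfPlacement:
    """Return where a UDF runs, checking the options in precedence order."""
    if compiler_enabled and compiled_ok:
        return UdfPlacement.COMPILED_TO_CATALYST
    if is_rapids_udf:
        return UdfPlacement.GPU_RAPIDS_UDF
    if cpu_transfer_enabled:
        return UdfPlacement.CPU_WITH_TRANSFER
    return UdfPlacement.NOT_ACCELERATED


# A RapidsUDF wins over the CPU transfer path even when both are available.
placement = choose_placement(compiler_enabled=False, compiled_ok=False,
                             is_rapids_udf=True, cpu_transfer_enabled=True)
```
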
@jlowe
Contributor

jlowe commented Oct 29, 2021

@firestarman this issue targets 21.12 but the description covers more than what will be done in 21.12. It would be good to file a separate issue or set of issues to cover the desired functionality that will be addressed beyond 21.12 so this can track what needs to be done for the 21.12 release. For example, #3897 is merged and is the bulk of the work, but will we try to address #3942 or #3904 for 21.12?

Salonijain27 added this to the Nov 1 - Nov 12 milestone on Oct 31, 2021
@firestarman
Collaborator Author

> @firestarman this issue targets 21.12 but the description covers more than what will be done in 21.12. It would be good to file a separate issue or set of issues to cover the desired functionality that will be addressed beyond 21.12 so this can track what needs to be done for the 21.12 release. For example, #3897 is merged and is the bulk of the work, but will we try to address #3942 or #3904 for 21.12?

Filed #3979 to track the remaining tasks.
I plan to fix #3942 first so that it makes the 21.12 release, while #3904 will probably be addressed after 21.12.

@firestarman
Collaborator Author

Closing this issue
