The general idea is that if we see a UDF that has to run on the CPU, we can pull back to the CPU just the columns it needs, do the processing on the CPU, and then send the resulting column back to the GPU for more processing.
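To make that round trip concrete, here is a minimal sketch, assuming a cudf-style Java/Scala API where the input is a device `ColumnVector` and the UDF is a plain Scala function. The helper name and the int-only, null-free handling are simplifications for illustration, not the plugin's actual code.

```scala
import ai.rapids.cudf.{ColumnVector => CudfColumnVector}

// Sketch only: pull just the column the UDF needs back to the host, run the
// CPU UDF row by row, then build a new device column from the results so the
// rest of the expression keeps running on the GPU.
// (Int-only, nulls ignored, names hypothetical.)
def runCpuUdfOnGpuColumn(
    deviceInput: CudfColumnVector, // the one column the UDF needs, on the GPU
    udf: Int => Int): CudfColumnVector = {
  // 1. Copy only the required column from device to host memory.
  val hostInput = deviceInput.copyToHost()
  try {
    // 2. Evaluate the UDF on the CPU, one row at a time.
    val rows = hostInput.getRowCount.toInt
    val results = new Array[Int](rows)
    var i = 0
    while (i < rows) {
      results(i) = udf(hostInput.getInt(i))
      i += 1
    }
    // 3. Send the resulting column back to the GPU for more processing.
    CudfColumnVector.fromInts(results: _*)
  } finally {
    hostInput.close()
  }
}
```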
There are a few things that make this hard.
Heuristics. There is no good way to know whether we should run this on the GPU or not. There may be a few situations where it is easy to tell, but for now I think we just want a config that enables or disables this feature. Once we have more experience, we can start to come up with a plan for heuristics.
Memory Management. We have no good way right now to avoid going over the target batch size when doing a project. Most of the code written for going from rows to columns assumes it can batch things however it wants, so we might have to rewrite some code to support dynamically growing the data, or more likely process multiple smaller batches that we concatenate before the expression is done.
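A rough sketch of the multiple-smaller-batches approach, assuming we copy the needed column to the host once, evaluate the UDF over fixed-size row slices, and use cudf's `ColumnVector.concatenate` to stitch the per-slice results back together. Names, slice sizing, and the int-only handling are all illustrative.

```scala
import scala.collection.mutable.ArrayBuffer
import ai.rapids.cudf.{ColumnVector => CudfColumnVector}

// Sketch only: evaluate the CPU UDF over fixed-size row slices so no single
// intermediate result exceeds the target batch size, then concatenate the
// per-slice device columns into one output column.
// (Int-only, nulls ignored, names hypothetical.)
def runCpuUdfInSlices(
    deviceInput: CudfColumnVector,
    udf: Int => Int,
    targetRowsPerSlice: Int): CudfColumnVector = {
  val hostInput = deviceInput.copyToHost()
  val pieces = ArrayBuffer[CudfColumnVector]()
  try {
    val totalRows = hostInput.getRowCount.toInt
    var start = 0
    while (start < totalRows) {
      val end = math.min(start + targetRowsPerSlice, totalRows)
      // Evaluate just this slice of rows on the CPU.
      val results = (start until end).map(i => udf(hostInput.getInt(i))).toArray
      // Each slice becomes its own small device column.
      pieces += CudfColumnVector.fromInts(results: _*)
      start = end
    }
    // Concatenate the per-slice columns before the expression finishes.
    CudfColumnVector.concatenate(pieces.toSeq: _*)
  } finally {
    hostInput.close()
    pieces.foreach(_.close())
  }
}
```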
Metrics. It would be nice to know how much time we are spending on the CPU doing this processing for a UDF, but we have not set up anything to measure it, and the APIs are not configured for it either. This might be follow-on work, but it would be great to have a relatively clean way for project (or other operators) to expose a metric showing how much time is being spent on these UDFs.
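As a rough illustration of the kind of metric this could be, here is a self-contained timing accumulator. A real implementation would hook into the plugin's existing operator metrics rather than a standalone class, and the `cpuUdfTimeNs` name is made up.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical accumulator for time spent running row-based UDFs on the CPU.
// A real implementation would plug into the existing operator metrics instead
// of using a standalone class like this.
class CpuUdfTimeMetric {
  private val cpuUdfTimeNs = new AtomicLong(0L)

  /** Run `block` (the CPU UDF evaluation) and record how long it took. */
  def timed[T](block: => T): T = {
    val start = System.nanoTime()
    try {
      block
    } finally {
      cpuUdfTimeNs.addAndGet(System.nanoTime() - start)
    }
  }

  /** Total CPU UDF time so far, in nanoseconds. */
  def totalNanos: Long = cpuUdfTimeNs.get()
}
```

A project (or other operator) would then wrap each CPU UDF evaluation in `timed { ... }` and report `totalNanos` alongside its other metrics.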
Multiple UDFs. It is possible that the output of one UDF is fed into another UDF, or that there is a little processing in between them. In situations like this it would probably be best to keep as much of that processing on the CPU at once, but deciding where to make the cutoff and how to bind it all together gets difficult. For now I would say we just do the basic thing and accept sub-optimal performance when we run into these odd situations, but we should have a follow-on issue to look at how to do this properly once we have more experience with it.
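To illustrate the grouping idea on a toy expression tree (these are not Catalyst's real `Expression` classes), the sketch below treats a whole `udf2(udf1(c) + 1)` subtree as a single CPU block so the GPU/CPU boundary is only crossed once for the chain.

```scala
// Toy expression tree, only to illustrate the grouping idea; hypothetical types.
sealed trait Expr
case class ColumnRef(name: String) extends Expr
case class Literal(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class CpuUdf(name: String, child: Expr) extends Expr

// True if the whole subtree is made of things we would evaluate on the CPU
// anyway (UDFs plus the cheap glue between them), meaning it can be pulled
// back in a single CPU round trip instead of one round trip per UDF.
def cpuOnlySubtree(e: Expr): Boolean = e match {
  case CpuUdf(_, child) => cpuOnlySubtree(child)
  case Add(l, r)        => cpuOnlySubtree(l) && cpuOnlySubtree(r)
  case Literal(_)       => true
  case ColumnRef(_)     => true
}

// Example: udf2(udf1(c) + 1) is one CPU block, so we cross the GPU/CPU
// boundary once for the whole chain.
val chained = CpuUdf("udf2", Add(CpuUdf("udf1", ColumnRef("c")), Literal(1)))
assert(cpuOnlySubtree(chained))
```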
We can probably begin with the simple case of a single UDF, and handle the more complicated case of multiple UDFs later.
Stage 1: (Release 21.12)
Enable a single UDF to run on the CPU, along the lines of the pseudocode sketched above.
Add a rapids configuration to enable/disable this feature.
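Usage might look something like the following; the exact config key is an assumption here (the real one is defined by the work in #3897).

```scala
// Assuming an active SparkSession named `spark` (e.g. in spark-shell), and a
// hypothetical config key; see #3897 for the key the plugin actually defines.
spark.conf.set("spark.rapids.sql.rowBasedUDF.enabled", "true")  // turn the feature on
spark.conf.set("spark.rapids.sql.rowBasedUDF.enabled", "false") // turn it back off
```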
@firestarman this issue targets 21.12 but the description covers more than what will be done in 21.12. It would be good to file a separate issue or set of issues to cover the desired functionality that will be addressed beyond 21.12 so this can track what needs to be done for the 21.12 release. For example, #3897 is merged and is the bulk of the work, but will we try to address #3942 or #3904 for 21.12?
Filed #3979 to track the remaining tasks.
I plan to try to fix #3942 first to catch the 21.12 release, while #3904 will probably be addressed beyond 21.12.
Stage 2: (#3979) covers the remaining tasks beyond 21.12.
How will this work with the current implementations (the UDF compiler and RapidsUDF)? Below is the order of precedence for where a UDF runs.