Replies: 2 comments
-
Very usefull feature. |
Beta Was this translation helpful? Give feedback.
0 replies
-
Having version-controlled UDFs that do not slow the whole pipeline (hooks) and better management of grants would be a perfect addition! |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Kicking off a discussion after conversation with @dataders on Slack, with an original mention back in Feb in #i-made-this
Straightforward UDF as a materialization implementation
I experimented with UDFs as a materialization in BigQuery a few months ago and it went pretty well - UDF as materialization seem to play quite nicely with the dependency graph and tests (writeup). I created a public repo as part of that work and I've seen a bit of traffic there and been contacted about it a couple of times.
My implementation is deliberately minimal, involving only a new materialization macro that could perhaps be implemented by adapters without any changes elsewhere in the dbt ecosystem. Improving the param list from a string felt like a good improvement but as it is, it actually solves the problem quite neatly in practice for my usecases. Those include testing, ref-ing within models and UDFs ref-ing one another. I've undertaken a significant refactor from hooks to this materialization, and some subsequent changes to the UDFs themselves without issues.
Usage examples
Examples come from https://github.com/brabster/pypi_vulnerabilities
Simple UDF
UDF Reference another UDF
Model reference UDF
UDFs deploying as part of DAG
Evidence that UDF support is a persistent demand for dbt
I wrote about testing and deploying UDFs via a hooks a couple of years ago and according to Google Search Console that post attracts thousands of impressions and over a hundred clicks a month, consistently, for the past couple of years - 2,500 clicks in the last 12 months - for queries like "dbt udf". I conclude that UDF handling in dbt is a long-running problem affecting lots of practitioners. I can see demand back to 2018. There are at least a couple of issues on the topic across adapters and core
UDFs vs. Macros
Why not just use dbt macros? Amazon Athena does not support SQL macros for some reason. I was forced to replicate what I'd achieved with UDFs on more capable platforms with macros, and I wrote up some advice for others in the same position... so I can speak to why I had a much less pleasant experience. In no particular order:
add_one
with contentx + 1
will work fine in most contexts, but when used like this:{{ add_one(100) }} * 2
, you get 102 instead of the expected 202In the examples I gave above, there are maybe eight or more small functions that assemble into a pure SQL semver matching function. I would not have fancied trying to do that as macros, and the SQL that would have been generated would, I suspect, be the stuff of nightmares. UDFs + dbt encouraged active decomposition, TDD and ended in a well-tested result I could confidently build on.
Non-SQL UDFs
I can only really speak to my experience on BigQuery - I think JS UDFs might be able participate directly with the implementation I've given. If it were me, I would focus on getting consistent support for SQL UDFs, as I think they should be supported across most platforms, rather than trying to come up with a solution for all possible variants of UDF functionality.
Current Situation
Example repo in https://github.com/brabster/dbt_materialized_udf
It looks like the materialization approach works quite well, solves a real problem, and doesn't require any changes or direct support in dbt itself. I don''t want to create a central repo that will attract the UDF materializations for multiple adapters and become a support problem like dbt-external-tables. It seems sensible to add this materialization to adapters where the communities, expertise and quality controls are already in place, which is why I'm asking here. What do you think is a good way forward? For example, should I raise a PR in the BigQuery adapter and PostgreSQL adapter repos to adopt the materializations that are in that repo (PostgreSQL support is proposed in an issue)
Beta Was this translation helpful? Give feedback.
All reactions