Check use cases for dbt-athena Python support #388
Comments
I'm going to keep some running notes here as I investigate:
Note: I stopped investigating individual transformation scripts in detail once I was confident that I had considered all of the possible edge cases.
@dfsnow After some investigation, my expectation is that Python models will have only limited utility for our ingests and transformations. Python models via PySpark are still very experimental (there have been no new releases since the one that added preliminary support in February), and Athena PySpark has two limitations that are currently dealbreakers for many of our most complex ingest/transformation scripts:
Still, I think there are some ingest scripts that could be refactored to use Python models, particularly ones that read data from public URLs and perform limited transformations on them (e.g.
On the transformation front, I think we should continue to focus on refactoring transformations to SQL where possible (#99), which is tractable and involves a well-supported dbt approach (SQL models). I expect that in the process of doing so, we'll also end up identifying transformations that would be good candidates for Python (i.e. transformations that don't work well in SQL but are simple enough to work in Python without needing external libraries), and our work on the ingest front will help us determine how viable those transformations would be as Python models.
Edited to add: I think we can also give the refactor of
I'll create issues for all of these tasks and then close this one out.
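For reference, here is a rough sketch of what one of the simpler ingest candidates described above (read from a public URL, apply limited transformations, no external libraries) might look like as a dbt-athena Python model. The `model(dbt, spark_session)` entry point and `dbt.config(materialized="table")` follow the pattern in the dbt-athena docs; `SOURCE_URL` and `parse_rows` are hypothetical names made up for illustration, and the sketch assumes the Athena Spark environment is able to reach the URL, which I have not verified.

```python
import csv
import io
import urllib.request

# Hypothetical placeholder URL, not one of our real sources.
SOURCE_URL = "https://example.com/data.csv"


def parse_rows(csv_text):
    """Parse CSV text into a lowercased header and a list of row tuples.

    Cells are stripped of surrounding whitespace and fully blank rows
    are dropped. Uses only the standard library.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [col.strip().lower() for col in next(reader)]
    rows = [
        tuple(cell.strip() for cell in row)
        for row in reader
        if any(cell.strip() for cell in row)
    ]
    return header, rows


def model(dbt, spark_session):
    # Entry point that dbt calls; the returned DataFrame is materialized.
    dbt.config(materialized="table")

    # Assumption: outbound network access is available from the Spark session.
    with urllib.request.urlopen(SOURCE_URL) as response:
        csv_text = response.read().decode("utf-8")

    header, rows = parse_rows(csv_text)
    # All columns are strings here; casting would be a separate step.
    return spark_session.createDataFrame(rows, header)
```

Keeping the parsing in a plain function like `parse_rows` means the transformation logic can be checked locally without a Spark session, which seems useful while Python model support is still experimental.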
We may be able to: