[CT-1229] [Feature] Cross-database cast macro #84
Option X

Calling it for a known, explicit type would look like this:
{{ cast(expression, type_integer()) }}

Calling it for a variable data type would look like this:
{{ cast(expression, api.Column.translate_type(data_type)) }}

The default implementation might look like this:
-- core/dbt/include/global_project/macros/utils/cast.sql
{% macro cast(expression, data_type) %}
  {{ adapter.dispatch('cast', 'dbt') (expression, data_type) }}
{% endmacro %}
{% macro default__cast(expression, data_type) -%}
  cast({{ expression }} as {{ data_type }})
{%- endmacro %}

Option Y

Calling it for an explicit type would look like this:
{{ cast(expression, "integer") }}

Calling it for a variable data type would look like this:
{{ cast(expression, data_type) }}

The default implementation would look like:
-- core/dbt/include/global_project/macros/utils/cast.sql
{% macro cast(expression, data_type) %}
  {{ adapter.dispatch('cast', 'dbt') (expression, data_type) }}
{% endmacro %}
{% macro default__cast(expression, data_type) -%}
  cast({{ expression }} as {{ api.Column.translate_type(data_type) }})
{%- endmacro %}
@dbeatty10 Love the way you're thinking about this. I like Option Y. It's elegant, and it should return the right result given all inputs. The only case where it will be incorrect is if the adapter's type translation code is wrong, by asserting that two types are aliases when they're not identical.

The only downside I can think of is surprise for end users. Imagine the following scenario on BigQuery:
# user writes this code
{{ cast(some_column, "integer") }}
# expects to see
cast(some_column as integer)
# actually sees (in logs / compiled code)
cast(some_column as int64)

Which is ... actually correct! Just potentially surprising, that there's an implicit type translation happening. (To be fair, BigQuery now supports

I don't think we need a "back-out" to avoid that. It would be a bug in the adapter's
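
To make the implicit translation concrete, here is a minimal Python sketch of how the two options would differ in what they render. It is not dbt code: `translate_type` is a simplified stand-in for `api.Column.translate_type`, and the `BIGQUERY_LIKE_LABELS` table, `render_cast_option_x`, and `render_cast_option_y` are hypothetical names chosen only to reproduce the `integer` to `int64` example from the comment above.

```python
# Hypothetical alias table for a BigQuery-like adapter. The single entry is an
# assumption chosen to match the example in this thread, not a real adapter's table.
BIGQUERY_LIKE_LABELS = {"INTEGER": "int64"}


def translate_type(data_type: str, type_labels: dict) -> str:
    """Simplified stand-in for api.Column.translate_type.

    Looks the type up case-insensitively and falls back to the input
    unchanged when the adapter has no alias for it.
    """
    return type_labels.get(data_type.upper(), data_type)


def render_cast_option_x(expression: str, data_type: str) -> str:
    """Option X: the macro renders the type exactly as the caller passed it."""
    return f"cast({expression} as {data_type})"


def render_cast_option_y(expression: str, data_type: str, type_labels: dict) -> str:
    """Option Y: the macro translates the type before rendering."""
    return f"cast({expression} as {translate_type(data_type, type_labels)})"


if __name__ == "__main__":
    # Option X leaves "integer" alone unless the caller translates it first.
    print(render_cast_option_x("some_column", "integer"))
    # Option Y silently swaps in the adapter's alias.
    print(render_cast_option_y("some_column", "integer", BIGQUERY_LIKE_LABELS))
    # Types without an alias pass through untouched under both options.
    print(render_cast_option_y("some_column", "geography", BIGQUERY_LIKE_LABELS))
```

Under these assumptions, the only observable difference between the options is who calls the translation: the caller (Option X) or the macro (Option Y). Types the alias table does not know about render identically either way.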
Documenting a little more research here.
@dbeatty10 I'm trying to implement this functionality in my dbt project right now, to work across Redshift and SQL Server for a few very specific examples. Could you please explain this more? "Calling it for a variable data type would look like this:" Why/when would I use a variable data type that isn't something like

One thing I would highlight for you and @jtcohen6 is that strings and integers are easy, and anything about timestamps and timezones is HARD. They're a better concept to use for anything cross-SQL dialect. I was in charge of scheduling software for two years of my life; please believe me when I say timezones may be the most bug-prone, infuriating area of programming. Knowing that, I did this analysis last year for Redshift, because each column below does something different. 😭 😭 😭
select
current_setting('timezone')
, current_date --current_date() errors out
--No milliseconds and no timezone
, getdate() --getdate does not exist
--milliseconds no timezone (because UTC is the currently set timezone)
, sysdate --timestamp format
, current_timestamp at time zone 'UTC' as ct_utc
, current_timestamp::timestamptz at time zone 'UTC' as ct_tz_utc
, sysdate::timestamptz at time zone 'UTC' as tz_utc
, sysdate::timestamptz at time zone 'utc' as tz_utc_lowercase
--With milliseconds and timezone
, current_timestamp at time zone 'UTC'::timestamptz as ct_utc_tz
, now()
, sysdate at time zone 'UTC' as utc
, current_timestamp as ct
, current_timestamp::timestamptz as ct_tz
, sysdate::timestamptz --timestamptz format
-- UTC to CDT
-- In CDT with timezone in the value
, sysdate::timestamptz at time zone 'CDT'::timestamptz as tz_cdt_tz
-- In CDT no timezone in the value
, sysdate::timestamptz at time zone 'cdt' as tz_cdt_lowercase
-- CDT to UTC
-- In CDT time(the next day) with timezone in the value
, sysdate at time zone 'cdt' as cdt
, sysdate at time zone 'CDT'::timestamptz as cdt_tz
(Now to throw a wrench in the whole thing: Redshift is deprecating current_timestamp() for getdate(). Because of course when you want a timestamp you'd call a date function... 🤦)

For your two options above, my first impression was that Option X was better because
My very specific cases at the moment are:
A) Cast a null value to a timestamp to make sure the table creates that field with the right data type. (I seem to remember BigQuery being the worst when having to deal with this.)
B) Cast the current_timestamp (or a field with a value that's already a timestamp) to a timestamp. Think coalesce() and concat() cases, or in the future timestamp to timestamptz.
C) Cast a timestamp to a timestamp that either maintains or adds its timezone.
Note that:
[Preview](https://docs-getdbt-com-git-dbeatty10-patch-3-dbt-labs.vercel.app/reference/dbt-jinja-functions/cross-database-macros)

> [!NOTE]
> I didn't make much of an attempt at versioning this thoughtfully. So please update this as-needed to bring it in line with expectations.

## What are you changing in this pull request and why?

dbt-labs/dbt-adapters#84 was implemented in commit dbt-labs/dbt-adapters@5a50be7 within PR dbt-labs/dbt-adapters#55

So this docs PR adds it to the listing of [cross-database macros](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros).

## Checklist

- [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
Is this your first time submitting a feature request?
Describe the feature
Per conversation with @jtcohen6, we are interested in creating a cast() macro meant to behave like the CAST operator described in the SQL standard.
Usage is likely to be something like:
{{ cast("some expression", type_string()) }}
Which some databases might render as:
Describe alternatives you've considered
Who will this benefit?
Like all cross-database macros, we anticipate this being most relevant to dbt package maintainers.
It would also be useful in implementing this:
Are you interested in contributing this feature?
Yep!
Anything else?
N/A