-
There isn't a Base version of TestIncrementalForeignKeyConstraint, so pytest is trying to run it as is when I import it (which is very strange behavior indeed).
-
TestGetLastRelationModified doesn't seem to be a thing (this is the only search result in your repo).
-
Shouldn't
-
Hey @dataders, the adapter-specific fixtures in #8616 are irritating to work with inside of the tests for other adapters (and as far as I can tell, there has never been adapter-specific fixture logic in there before). Could y'all fix that up so that it's easy for other adapter developers to specify our preferred date format and whatnot, instead of hardcoding it?
-
@dataders In dbt-athena we avoid using information_schema because it's really slow. I'm trying to implement the same for the get_catalog_relations macro, and I took a similar approach; in the end it all comes down to overriding _get_one_catalog_by_relations. Do we have a snippet somewhere that I can use as a reference to test that my macro works?
-
@dataders For dbt-oracle, as this will be a breaking change for users, I am trying to understand if there is some change expected in the adapter.
-
@dataders, dbt-teradata also sees the same error after upgrading to dbt-core 1.7.x
-
@dataders, I made the changes you proposed in Teradata/dbt-teradata@416189d, but it still shows the same error. Below is the complete backtrace:
```
01:14:22 Running with dbt=1.7.1
01:14:22 dbt version: 1.7.1
01:14:22 python version: 3.9.12
01:14:22 python path: C:\Users\mt255026\AppData\Local\Programs\Python\Python39\python.exe
01:14:22 os info: Windows-10-10.0.19045-SP0
01:14:22 Encountered an error:
non-default argument 'schema' follows default argument
01:14:22 Traceback (most recent call last):
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\cli\requires.py", line 90, in wrapper
result, success = func(*args, **kwargs)
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\cli\requires.py", line 75, in wrapper
return func(*args, **kwargs)
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\cli\main.py", line 444, in debug
results = task.run()
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\task\debug.py", line 123, in run
load_profile_status: SubtaskStatus = self._load_profile()
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\task\debug.py", line 212, in _load_profile
profile: Profile = Profile.render(
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\config\profile.py", line 436, in render
return cls.from_raw_profiles(
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\config\profile.py", line 401, in from_raw_profiles
return cls.from_raw_profile_info(
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\config\profile.py", line 355, in from_raw_profile_info
credentials: Credentials = cls._credentials_from_profile(
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\config\profile.py", line 165, in _credentials_from_profile
cls = load_plugin(typename)
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\adapters\factory.py", line 212, in load_plugin
return FACTORY.load_plugin(name)
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt\adapters\factory.py", line 58, in load_plugin
mod: Any = import_module("." + name, "dbt.adapters")
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "c:\dbt-teradata\dbt\adapters\teradata\__init__.py", line 1, in <module>
from dbt.adapters.teradata.connections import TeradataConnectionManager
File "c:\dbt-teradata\dbt\adapters\teradata\connections.py", line 17, in <module>
class TeradataCredentials(Credentials):
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1021, in dataclass
return wrap(cls)
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 1013, in wrap
return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 927, in _process_class
_init_fn(flds,
File "C:\Users\mt255026\AppData\Local\Programs\Python\Python39\lib\dataclasses.py", line 504, in _init_fn
raise TypeError(f'non-default argument {f.name!r} '
TypeError: non-default argument 'schema' follows default argument
```
-
Folks, I updated the materialized view section of this discussion. Please let me know if this is helpful, and if there's anything that I'm missing that would be useful to add. Thanks!
-
@mikealfare that would be for 1.8, right? Or is it coming earlier? If it's 1.8, how does it interact with #9171?
-
Starting with 1.7, the generated catalog.json is empty. I am debugging this problem. Any pointers? Catalog generation works fine with dbt-core 1.6.x. We have a bug reported by a user: oracle/dbt-oracle#125
-
fal-ai/dbt-fal#919
-
Overview
This discussion is for communicating to adapter maintainers the scope of work needed to make use of the changes in 1.7.0. If you have questions and concerns, please ask them here for posterity.
Please consider this a living document between now and the date of final release. If there's something missing, please comment below!
Loom video overview (12 min)
Adapter Maintainers: Upgrading to dbt-core v1.7.0
release timeline
The table below gives the milestones up to and including the final release. It will be updated with each subsequent release.
prior maintainer upgrade guides
Example Diffs
TL;DR
Excluding the work to consume our materialized view refactor, there should be no breaking changes. Please tell me if I'm wrong, but I believe that things should work by bumping package and dependency versions. I recommend adding tests. Everything below is nice-to-have: first steps toward long-term stable interfaces we will develop in future versions.
Breaking changes
#8846 is a regression in `1.7.0rc1` that will be remediated for the final release of `1.7.0`. The issue only impacts adapters that have overridden the `BaseAdapter.get_catalog()` method. However, this does not affect adapters that have overridden the `get_catalog()` macro.
If you wish to temporarily unblock yourself, you may change your method's function signature to match that of the base class, but ultimately this call will be wrapped in the capability flag, so you only need to update it if you plan on supporting the more precise `get_catalog` functionality.
Changes
new capability support structure for adapters
`dbt-core` 1.7.0 improves the performance of two existing features: source freshness and catalog metadata fetching (see below for more). However, these improvements are only available contingent upon underlying data platform support.
Traditionally, the adapter interface for calling this out would be a single jinja macro or python method that may be overridden accordingly, but the end result is one macro that varies in what it does across adapters. Our solution here is to define adapter/database capability in a top-level structure that then "dispatches" to the corresponding macro or function. This way there's a stronger separation of concerns.
A new static member variable on `BaseAdapter` called `_capabilities`, with type `CapabilityDict`, can be overridden by adapter implementations. This new member variable is introduced to more easily flag to dbt-core whether or not an adapter supports a given dbt feature, as well as whether the database itself supports the feature. `dbt/adapters/capability.py` has great information on how it's structured.
From `1.7` forward, for a given relevant feature we'll define a new Capability within the `Capability` class, so that it may be declared within an adapter. The possible support states are defined within the `Support` class, namely: "Unknown", "Unsupported", "NotImplemented", "Versioned", or "Full".
If you as an adapter maintainer want to declare that your data platform does not support a new feature, `SomeNewFeature`, you'd define it as the below.
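`SomeNewFeature` is a stand-in rather than a real dbt-core capability, so the sketch below wires up the two real 1.7 capabilities discussed later in this post and leaves the hypothetical entry as a comment:

```python
from dbt.adapters.base import BaseAdapter
from dbt.adapters.capability import (
    Capability,
    CapabilityDict,
    CapabilitySupport,
    Support,
)


class MyAdapter(BaseAdapter):
    # Declare feature support up front instead of overriding a macro per feature.
    _capabilities = CapabilityDict(
        {
            Capability.SchemaMetadataByRelations: CapabilitySupport(support=Support.Full),
            Capability.TableLastModifiedMetadata: CapabilitySupport(support=Support.Full),
            # A platform that lacks a feature would declare that explicitly, e.g.:
            # Capability.SomeNewFeature: CapabilitySupport(support=Support.Unsupported),
        }
    )
```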
I'm very excited at the potential of this feature. I can imagine that much of an adapter might be abstracted into a structure like this, so that where previously a macro override was required, now it's just another dict entry.
metadata freshness checks
#7012 #8704 #8795 dbt-labs/dbt-snowflake#796
Excerpt from #7102
How to implement
Note
This is the first of the two new features enabled via the new Capability dict.
Warning
If you're overriding `Adapter.calculate_freshness()`, the type signature has changed, so please update accordingly: dbt-core#8795#discussion_r1355356418
If your data platform has `Full` support for this new functionality, you will need to:
- Update the `_capabilities` dict (see above section) to define support; in this case, the feature is `TableLastModifiedMetadata`. See dbt-snowflake#796#impl.py for the change that's needed.
- Add a new macro, `get_relation_last_modified()`, to define how to fetch this freshness metadata (a sketch follows this list). Note that `default__get_relation_last_modified()` is not implemented, so you must implement your own. See `snowflake__get_relation_last_modified()`. The function signature is similar to that of the below-described `get_catalog_tables_sql()`, but it returns a table with four columns:
  - `schema`
  - `identifier`
  - `last_modified`: the actual freshness metadata of the table
  - `snapshotted_at`: the current timestamp, so we know when dbt checked
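The exact SQL is platform-specific; this sketch is loosely modeled on `snowflake__get_relation_last_modified()` and assumes your information schema exposes a last-altered timestamp (the column names in the `select` are illustrative):

```sql
{% macro myadapter__get_relation_last_modified(information_schema, relations) -%}
    {%- call statement('last_modified', fetch_result=True) -%}
        select
            table_schema as schema,
            table_name as identifier,
            last_altered as last_modified,              -- platform-specific column
            {{ current_timestamp() }} as snapshotted_at
        from {{ information_schema }}.tables
        where (
            {%- for relation in relations -%}
                (upper(table_schema) = upper('{{ relation.schema }}')
                 and upper(table_name) = upper('{{ relation.identifier }}'))
                {%- if not loop.last %} or {% endif -%}
            {%- endfor -%}
        )
    {%- endcall -%}
    {{ return(load_result('last_modified')) }}
{%- endmacro %}
```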
Freshness Tests
- `TestGetLastRelationModified`
- `TestListRelationsWithoutCaching` has been deprecated and broken into the following:
  - `TestListRelationsWithoutCachingSingle`
  - `TestListRelationsWithoutCachingFull`
Catalog fetch performance improvements
#8521 #8648 dbt-snowflake#758
Note
This is the second of the two new features enabled via the new Capability dict.
The performance of parsing projects and generating documentation can become a bottleneck when one of two conditions is met:
A change for `1.7` addresses the second scenario. The `get_catalog()` macro has always accepted a list of schemas as an argument. All the relations that exist for the given list of schemas are output, regardless of whether or not dbt interacts with the object.
The adapter implementation to solve this problem is three-fold:
- `get_catalog_relations()`, which accepts a list of relations rather than a list of schemas.
- the `_capabilities` dict (see above section), updated to define whether support for `SchemaMetadataByRelations` is `Full`, `Unsupported`, or `NotImplemented`. See dbt-snowflake#758#impl.py for the change that's needed.
- the `get_catalog()` macro, refactored in order to minimize redundancy with the body of `get_catalog_relations()`. This introduces some additional macros:
  - `get_catalog_tables_sql()`: copied straight from the pre-existing `get_catalog()`; everything you would normally fetch from `INFORMATION_SCHEMA.tables`
  - `get_catalog_columns_sql()`: copied straight from the pre-existing `get_catalog()`; everything you would normally fetch from `INFORMATION_SCHEMA.columns`
  - `get_catalog_schemas_where_clause_sql(schemas)`: copied straight from the pre-existing `get_catalog()`. This uses jinja to loop through the provided schema list and make a big `WHERE` clause of the form: `WHERE info_schema.tables.table_schema IN "schema1" OR info_schema.tables.table_schema IN "schema2" [OR ...]`
  - `get_catalog_relations_where_clause_sql(relations)`: this is likely the only new thing
get_catalog_relations_where_clause_sql
This macro is effectively an evolution of the old `get_catalog` `WHERE` clause, except now it has the following control flow.
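The snippet showing that control flow didn't survive in this copy; the sketch below approximates the default implementation in dbt-core 1.7 (match on schema plus identifier, fall back to schema-only, otherwise raise):

```sql
{% macro default__get_catalog_relations_where_clause_sql(relations) -%}
    (
        {%- for relation in relations -%}
            {% if relation.schema and relation.identifier %}
                (
                    upper(table_schema) = upper('{{ relation.schema }}')
                    and upper(table_name) = upper('{{ relation.identifier }}')
                )
            {% elif relation.schema %}
                (
                    upper(table_schema) = upper('{{ relation.schema }}')
                )
            {% else %}
                {% do exceptions.raise_compiler_error(
                    '`get_catalog_relations` requires a list of relations, each with a schema'
                ) %}
            {% endif %}
            {%- if not loop.last %} or {% endif -%}
        {%- endfor -%}
    )
{%- endmacro %}
```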
Keep in mind that `relation` in this instance can be a standalone schema, not necessarily an object with a three-part identifier.
The result of the above is that dbt can, with "laser precision", fetch metadata for only the relations it needs, rather than the superset of "all the relations in all the schemas in which dbt has relations".
Catalog Query Structure
Where the below-mentioned `OBJECT_LIST` is `relations` or `schemas`, depending on the macro:
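The original illustration of the query shape is missing here; the following is a rough reconstruction of how the helper macros compose (the final join of the two CTEs is an assumption about how a given adapter stitches them together):

```sql
with tables as (
    {{ get_catalog_tables_sql(information_schema) }}
    {{ get_catalog_relations_where_clause_sql(OBJECT_LIST) }}  -- or the schemas variant
),
columns as (
    {{ get_catalog_columns_sql(information_schema) }}
    {{ get_catalog_relations_where_clause_sql(OBJECT_LIST) }}
)
select *
from tables
join columns using ("table_database", "table_schema", "table_name")
order by "column_index"
```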
Tests
`TestDocsGenerateOverride` in `tests/functional/artifacts/test_override.py` was modified to cover this new behavior. If you are not already implementing this, I suggest that you should.
Behavior of `dbt show`'s `--limit` flag
#8496 #8641
`dbt show` shipped in `1.5.0` with a `--limit` flag that, when provided, would limit the number of results that dbt would grab to display. It does not modify the original query, which means that even if you provide `--limit 5`, the command will not complete until the entire underlying query is complete, which can take a long time if the query's result set is large. This is especially evident because `dbt show` is now also used for dbt Cloud IDE's "preview" button.
The fix was #8641, which extends the implementation of the `--limit` flag to wrap the underlying query into a CTE and append a `LIMIT {limit}` clause.
No change is needed if the adapter's data platform supports ANSI SQL `LIMIT` clauses. Any changes can be made in a dispatched version of `get_limit_subquery_sql()`, see `default__get_limit_subquery()`. This was merged into `1.7.0`, but also back-ported to `1.5` and `1.6`, so if a change is needed, it will likely also need to be back-ported and patched accordingly.
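For adapters that do need to override it, the dispatched macro is small; this sketch mirrors the shape of the default implementation (double-check the exact default macro name in your dbt-core version against the reference above):

```sql
{% macro myadapter__get_limit_subquery_sql(sql, limit) %}
    select *
    from (
        {{ sql }}
    ) as model_limit_subq
    limit {{ limit }}
{% endmacro %}
```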
`dbt.tests.adapter.dbt_show.test_dbt_show` contains new tests to ensure the new behavior functions properly:
- `BaseShowSqlHeader`
- `BaseShowLimit`
Migrate `date_spine()` Macro from dbt-utils to Core
#8172 #8616
Following on from initiative #5520, dbt-utils.date_spine() (and the macros on which it directly depends) now lives within dbt-core in order to better support the semantic layer, which requires a date spine to get started.
Macros that are now testable and overridable within an adapter:
- `date_spine`
- `generate_series`
- `get_intervals_between`
- `get_powers_of_two`
The `default__` version of these macros is compatible with the adapters we currently support in dbt Cloud, with the exception of the following, which should be migrated: `dbt-trino-utils.date_spine`, `tsql-utils.date_spine`
Date Spine Tests
Important
The following test cases must be added to the adapter to ensure compatibility now and moving forward. See the #8616 changes to the `dbt/tests/adapter/utils/` directory for more info; a minimal wiring example follows this list.
- `TestDateSpine`
- `TestGenerateSeries`
- `TestGetIntervalsBetween`
- `TestGetPowersOfTwo`
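A sketch of how an adapter repo typically wires these up (the module paths follow the dbt-tests-adapter 1.7 layout; the file path is illustrative and fixtures can be overridden per platform):

```python
# tests/functional/adapter/utils/test_data_spine.py  (illustrative path)
from dbt.tests.adapter.utils.test_date_spine import BaseDateSpine
from dbt.tests.adapter.utils.test_generate_series import BaseGenerateSeries
from dbt.tests.adapter.utils.test_get_intervals_between import BaseGetIntervalsBetween
from dbt.tests.adapter.utils.test_get_powers_of_two import BaseGetPowersOfTwo


# Inheriting the Base classes pulls in the shared test cases; override their
# fixtures only if your platform needs different seed data or date formats.
class TestDateSpine(BaseDateSpine):
    pass


class TestGenerateSeries(BaseGenerateSeries):
    pass


class TestGetIntervalsBetween(BaseGetIntervalsBetween):
    pass


class TestGetPowersOfTwo(BaseGetPowersOfTwo):
    pass
```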
Storing Test Failures as View
#6914 #8653
There's a new config for tests, `store_failures_as`, which can be `table`, `view`, or `ephemeral`. `table` and `view` will commonly be used to turn on the store-failures feature. `ephemeral` is mostly used to turn off the store-failures feature at a lower grain (e.g. model) after it has been turned on at a higher level (e.g. project). The new `store_failures_as` config, when set by the user, will override the existing config `store_failures`, which will be deprecated within the next two minor releases, i.e. either `1.8` or `1.9`. If the user does not set `store_failures_as`, `store_failures` will continue to behave as it has in the past until deprecation.
Note
Provided that your adapter doesn't have a custom `test` materialization overriding Core's default, there shouldn't be work required here beyond adding the below tests. If the `test` materialization is overridden, then the change is to inspect `config.get("store_failures_as", "table")` and use that as the relation type instead of a hard-coded "table" (a small sketch appears after the test list below). There could be other changes required, but that's the gist of it. I'll leave this as a standalone section rather than listing the available tests, just in case.
Store Failures Tests
- `TestStoreTestFailuresAsInteractions`
- `TestStoreTestFailuresAsProjectLevelOff`
- `TestStoreTestFailuresAsProjectLevelView`
- `TestStoreTestFailuresAsGeneric`
- `TestStoreTestFailuresAsProjectLevelEphemeral`
- `TestStoreTestFailuresAsExceptions`
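If you do maintain your own `test` materialization, the core of the change looks roughly like this (the surrounding materialization supplies `identifier`, `schema`, and `database`; the `"table"` default mirrors the note above):

```sql
{#- Illustrative fragment for a custom `test` materialization override:
    read the configured relation type rather than hard-coding "table". -#}
{% set relation_type = config.get("store_failures_as", "table") %}
{% set target_relation = api.Relation.create(
    identifier=identifier, schema=schema, database=database, type=relation_type
) %}
```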
Additional Tests
Important
These are new tests introduced into the adapter zone that you should have in your adapter.
- `TestIncrementalForeignKeyConstraint`
- `TestCloneSameTargetAndState` (covers `dbt clone`)
- `SeedUniqueDelimiterTestBase`
- `TestSeedWithWrongDelimiter`
- `TestSeedWithEmptyDelimiter`
Materialized Views Refactor
Note
Under construction
Materialized views introduced a few pieces of purely new functionality within `dbt-core`. This resulted in some refactoring work that extended beyond materialized views to fold them into the mix. The highlights worth calling out:
- configuration change monitoring (`on_configuration_change`), which requires inspecting the materialized view at run time!
- `view == table` is no longer a safe assumption
This limits the following functionality:
At a high level, the approach that was taken establishes a relation-type-agnostic operation layer that dispatches down to the relation-type-specific layer, where possible. This allows for more granular control, though it requires some copy-paste. We determined the benefits outweigh the costs of violating DRY principles. Let's discuss the jinja templates first.
Jinja Updates
While the file structure ultimately does not matter for jinja, we adopted the structure below to more easily navigate the files. Additionally, we're striving for a one-to-one relation between files and macros (effectively the opposite of `adapters.sql`).
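The original tree didn't survive in this copy; this is an approximation of the layout used in the dbt Labs adapters (the exact file names, e.g. `describe.sql` and `refresh.sql`, are indicative rather than mandatory):

```
macros/
  relations/
    materialized_view/
      alter.sql
      create.sql
      describe.sql
      drop.sql
      refresh.sql
    table/
      create.sql
      drop.sql
    view/
      create.sql
      drop.sql
  materializations/
    materialized_view.sql
```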
These files all represent macros that need to be defined. Some of them likely already exist, like `create` for `table` and `view`. In that case, I would move them here for clarity. And `drop` may also exist, but likely in the form `drop {{ relation.type }} {{ relation }}`. Unfortunately, that no longer works with materialized views. Instead of parsing `relation.type`, and to be consistent with breaking it out by type (e.g. `create`), we elected to go with a macro per type. If this really irks you, you can keep one `drop` macro and call it within each of the `drop` macros by type, but we need the `drop` macros by type since they are called within `dbt-core`.
The standout file, `materializations/materialized_view.sql`, contains a macro that gets used while building the materialization in `dbt-core`. It determines the differences between the current and future states. Some folks (e.g. the author) just put this in `materialized_view/alter.sql` since there is a heavy correlation between the two; the output of one is the input of the other.
In addition to creating these macros, you'll need to update things like `list_relations_without_caching`, `get_catalog`, etc. to ensure that they include this new object type.
dbt-redshift and dbt-bigquery are good examples of how these macros were implemented. There are some differences because they are different platforms, but there are many similarities as well. This should give you a good feel for what's required and how to extend that to match your needs.
Python Updates - Jinja Config
There are a few configuration updates on your `MyAdapterRelation` that are needed to support the jinja updates. In particular, these are needed:
Python Updates - Relation Models
The approach we took for the jinja templates differs from that of tables and views. In the latter, the `config` object that's in the global jinja context gets parsed within the template. This embeds a lot of logic into the template. This is still a valid approach if you wish to go down this path; however, it will make change monitoring very difficult to implement. The alternative is to parse the `config` object into a python object. This allows for much easier comparison of current state and future state. However, if we're already parsing `config` for the future state, you might as well use that for the rest of the jinja macros as well. Ultimately, this is your choice, but we will talk about how we did this in the adapters dbt Labs maintains, which implement relations as a python object.
I'll start by calling out the caveat that we wanted to implement base classes slowly, to avoid getting locked into functionality that we would like to change in the future. The base classes are sparse, which aligns with our goal of minimal (hopefully no) breaking changes. With that in mind, there is no reason that you need to inherit from the base relation object in `dbt-core`. If you'd like to build your own object, go for it. As mentioned above, it's possible to implement all (well, almost all; I'll point that out later) of this within jinja, we just think that would be a terrible time. In general, you can think of the materialized view as a class that marshalls `config` or database metadata into a dataclass. Here's the suggested structure of a materialized view python object:
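The structure itself is missing from this copy; below is a rough shape based on how dbt-redshift and dbt-bigquery model it (the fields and the `auto_refresh` option are placeholders for whatever your platform lets you configure on a materialized view):

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class MyAdapterMaterializedViewConfig:
    """Marshalls either the model config or database metadata into one comparable shape."""

    mv_name: str
    schema_name: str
    query: str
    # Platform-specific options go here; `auto_refresh` is only an example.
    auto_refresh: Optional[bool] = None

    @classmethod
    def from_model_node(cls, model_node: Any) -> "MyAdapterMaterializedViewConfig":
        """Future state: built from the model node / `config` that dbt-core resolves."""
        return cls(
            mv_name=model_node.identifier,
            schema_name=model_node.schema,
            query=model_node.compiled_code,
            auto_refresh=model_node.config.extra.get("auto_refresh"),
        )

    @classmethod
    def from_relation_results(cls, relation_results: Dict[str, Any]) -> "MyAdapterMaterializedViewConfig":
        """Current state: built from the rows returned by your describe/show macro."""
        return cls(
            mv_name=relation_results["name"],
            schema_name=relation_results["schema"],
            query=relation_results["query"],
            auto_refresh=relation_results.get("auto_refresh"),
        )
```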
Again, dbt-redshift and dbt-bigquery are examples of how we implemented this object. This structure allows you to easily compare the current object (`from_relation_results`) to the future object (`from_model_node`). We also created this comparison as a python object by creating a method on `MyAdapterRelation`: see dbt-redshift and dbt-bigquery. Then our `get_materialized_view_configuration_changes` macro simply passes the two versions of the relation into this method and hands back the results.
Note
If you elect to go with a jinja implementation of `get_materialized_view_configuration_changes`, make sure you hand back an object with a property such that `.has_changes == False` when there are no changes (versus an empty object or empty list). This value gets checked in the materialization to see if there are changes to apply, or if a refresh should occur.