[pull] main from dbt-labs:main #29

Open

pull[bot] wants to merge 681 commits into main from dbt-labs:main
Conversation

pull[bot] commented Aug 16, 2023

See Commits and Changes for more details.

Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

pull[bot] added the ⤵️ pull label Aug 16, 2023
peterallenwebb and others added 29 commits May 21, 2024 11:01
* Add more accurate RSS high water mark measurement for Linux

* Add changelog entry.

* Checks to avoid exception based flow control, per review.

* Fix review nit.
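
As a rough illustration of the RSS high-water-mark commit above, here is a minimal sketch of reading a process's peak RSS via `resource.getrusage`; the actual implementation in the commit may differ.

```python
import platform
import resource


def peak_rss_bytes() -> int:
    """Return the process's peak resident set size in bytes (sketch only)."""
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if platform.system() == "Darwin":
        return max_rss
    return max_rss * 1024
```
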
* Add fixtures for setting and resetting flags for unit tests

* Remove unnecessary `set_from_args` in non `unittest.TestCase` based unit tests

In the previous commit we added a pytest fixture which sets and tears down
the global flags via `set_from_args` for every pytest-based unit test.
Previously we had added a `set_from_args` in tests or test files to reset
the global flags if they were modified by a previous test. This is no
longer necessary because of the work done in the previous commit.

Note: We did not modify any tests that use the `unittest.TestCase` class
because they don't use pytest fixtures. Thus those tests need to continue
operating as they currently do until we shift them to pytest unit tests.
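
A minimal sketch of what such an autouse fixture could look like; the fixture name and flag values here are illustrative, not necessarily what the commit added.

```python
from argparse import Namespace

import pytest

from dbt.flags import set_from_args


@pytest.fixture(autouse=True)
def set_test_flags():
    # Set the global flags before each pytest-based unit test...
    set_from_args(Namespace(USE_COLORS=False), {})
    yield
    # ...and reset them afterwards so later tests start from a known state.
    set_from_args(Namespace(), {})
```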

* Utilize the new `args_for_flags` fixture for setting of flags in `test_contracts_graph_parsed.py`

* Convert `test_compilation.py` from `TestCase` tests to pytest tests

We did this so that in the next commit we can drop the unnecessary `set_from_args`.
That will be its own commit because converting these tests is a restructuring, and
doing it separately makes the changes easier to follow.
That is to say, all changes in this commit were just to convert the tests to
pytest; no other changes were made.

* Drop unnecessary `set_from_args` in `test_compilation.py`

* Add return types to all methods in `test_compilation.py`

* Reduce imports from `compilation` in `test_compilation.py`

* Update `test_logging.py` now that we don't need to worry about global flags

* Conditionally import `Generator` type for python 3.8

In Python 3.9 the `typing.Generator` alias was deprecated in favor of
`collections.abc.Generator`. We still support Python 3.8 and thus need to
import `Generator` conditionally. We should remove this in the future when
we drop support for 3.8.
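
One way such a conditional import can look (a sketch, not necessarily the exact code in the commit):

```python
import sys

if sys.version_info >= (3, 9):
    from collections.abc import Generator
else:
    # typing.Generator is deprecated from 3.9 onward but still needed on 3.8,
    # where collections.abc.Generator is not subscriptable in annotations.
    from typing import Generator
```
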
* add saved query resource type

* nits
* add new mode

* reflect dbt-common changes

* changelog

* rename var
* Create `runtime_config` fixture and necessary upstream fixtures

* Check for better scoped `ProjectContractError` in test_runtime tests

Previously in `test_unsupported_version_extra_config` and
`test_archive_not_allowed` we were checking for `DbtProjectError`. This
worked because `ProjectContractError` is a subclass of `DbtProjectError`.
However, if we check for `DbtProjectError` in these tests, then some tangential
failure which raises a `DbtProjectError` type error would go undetected. As
we plan on converting these tests to pytest in the coming commits, we want to
ensure that the tests are succeeding for the right reason.
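
For illustration, the difference is roughly the following; the test body, fixture name, and import path are assumptions, not taken from the actual diff.

```python
import pytest

# Import path assumed; ProjectContractError subclasses DbtProjectError.
from dbt.exceptions import ProjectContractError


def test_unsupported_version_extra_config(build_config):  # hypothetical fixture
    # Asserting on the narrower ProjectContractError means an unrelated
    # DbtProjectError raised elsewhere cannot make this test pass.
    with pytest.raises(ProjectContractError):
        build_config(extra_config={"not-a-real-key": True})
```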

* Convert `test_str` of `TestRuntimeConfig` to a pytest test using fixtures

* Convert `test_from_parts` of `TestRuntimeConfig` to a pytest test using fixtures

While converting `test_from_parts` I noticed the comment
>  TODO(jeb): Adapters must assert that quoting is populated?

This led me to believe that `quoting` shouldn't be "fully" realized
in our project fixture unless we're saying that it's gone through
adapter instantiation. Thus I updated the `quoting` on our project
fixture to be an empty dict. This change affected `test__str__` in
`test_project.py`, which we thus needed to update accordingly.

* Convert runtime version specifier tests to pytest tests and move to test_project

We've done two things in this commit, which arguably _should_ have been done in
two commits. First, we moved the version specifier tests from `test_runtime.py::TestRuntimeConfig`
to `test_project.py::TestGetRequiredVersion`; this is because what is really being
tested is the `_get_required_version` method. Doing it via the `RuntimeConfig.from_parts` method
made actually testing it a lot harder, as it requires setting up more of the world and
running with a _full_ project config dict.

The second thing we did was convert it from the old unittest implementation to a pytest
implementation. This saves us from having to create most of the world as we were doing
previously in these tests.

Of note, I did not move the test `test_unsupported_version_range_bad_config`. This test
is a bit different from the rest of the version specifier tests. It was introduced in
[1eb5857](1eb5857)
of [#2726](#2726) to resolve [#2638](#2638).
The focus of #2726 was to ensure the version specifier checks were run _before_ the validation
of the `dbt_project.yml`. Thus what this test is actually testing for is the order of
operations at parse time. As such, this is really more a _functional_ test than a
unit test. In the next commit we'll get this test moved (and renamed).

* Create a better test for checking that version checks come before project schema validation

* Convert `test_get_metadata` to pytest test

* Refactor `test_archive_not_allowed` to functional test

We do already have tests that ensure "extra" keys aren't allowed in
the dbt_project.yml. This test is different because it's checking that
a specific key, `archive`, isn't allowed. We do this because at one point
in time `archive` _was_ an allowed key. Specifically, we stopped allowing
`archive` in dbt-core 0.15.0 via commit [f26948d](f26948d).
Given that it's been 5 years and a major version, we could probably remove
this test, but let's keep it around unless we start supporting `archive` again.

* Convert `warn_for_unused_resource_config_paths` tests to use pytest
* init push arbitrary configs for generic tests pr

* iterative work

* initial test design attempts

* test reformatting

* test rework, have basic structure for 3 of 4 passing, need to figure out how to best represent same key error, failing correctly though

* swap up test formats for the new config dict and mixed variety, running dbt parse and inspecting the manifest

* modify tests to get passing, then modify the TestBuilder class work from earlier to be more DRY

* add changelog

* modify code to match suggested changes around separate methods and test id fix

* add column_name reference to init so deeper nested _render_values can use the input

* add type annotations

* feedback based on mike review
* Fire skipped events at debug level

Closes #8774

* add changelog entry

* Update to work with 1.9.*.

* Add tests for --fail-fast not showing skip messages unless --debug.

* Update test that works by itself, but assumes too much to work in integration tests.

---------

Co-authored-by: Scott Gigante <scottgigante@users.noreply.github.com>
* Update semantic interface requirement

* Update setup.py
* Rename `maniest` fixture in `test_selector` to `mock_manifest`

We have a globally available `manifest` fixture in our unit tests. In the
coming commits we're going to add tests to this file which use the globally
available `manifest` fixture. Prior to this commit, the locally defined
`manifest` fixture was taking precedence. To get around this, the easiest
solution was to rename the locally defined fixture.

I had tried to isolate the locally defined fixture by moving it, and the relevant
tests, to a test class like `TestNodeSelector`. However, because of _how_ the relevant
tests were parameterized, this proved difficult. Basically, for readability we define
a variable which holds a list of all the parameterization variables. By moving to a
test class, the definition of the variables would have had to be placed directly in
the parameterization macro call. Although possible, it made the readability slightly
worse. It might be worth doing anyway in the long run, but instead I used a less
heavy-handed alternative (already described).
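
As a small illustration of the shadowing issue (file layout and values are hypothetical): pytest resolves fixtures by name, so a local `manifest` fixture hides the one provided by `conftest.py`, which is why the local fixture was renamed.

```python
# tests/unit/graph/test_selector.py (sketch)
import pytest


@pytest.fixture
def mock_manifest() -> str:
    # Previously named `manifest`, which shadowed the globally available
    # `manifest` fixture defined in conftest.py for all unit tests.
    return "locally built manifest"


def test_uses_local_fixture(mock_manifest: str) -> None:
    assert mock_manifest == "locally built manifest"
```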

* Improve type hinting in `tests/unit/utils/manifest.py`

* Ensure `args` get set from global flags for `runtime_config` fixture in unit tests

The `Compiler.compile` method accesses `self.config.args.which`. The `config`
is the `RuntimeConfig` the `Compiler` was instantiated with. Our `runtime_config`
fixture was being instantiated with an empty dict for the `args` property. Thus
a `which` property of the args wasn't being made available, and if `compile` was run,
a runtime error would occur. To solve this, we've begun instantiating the args from
the global flags via `get_flags()`. This works because we ensure the `set_test_flags`
fixture is run first, which calls `set_from_args`.
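
A hedged sketch of what that fixture change could look like; the upstream fixture names and the exact `RuntimeConfig.from_parts` signature are assumptions here.

```python
import pytest

from dbt.config import RuntimeConfig
from dbt.flags import get_flags


@pytest.fixture
def runtime_config(project, profile):  # upstream fixture names assumed
    # Pull args from the global flags (set earlier by the autouse
    # set_test_flags fixture) so attributes like `which` exist at runtime.
    return RuntimeConfig.from_parts(project=project, profile=profile, args=get_flags())
```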

* Create a `make_manifest` utility function for use in unit tests and fixture creation

* Refactor `Compiler` and `NodeSelector` tests in `test_selector.py` to use pytesting methodology

* Remove parsing tests that exist in `test_selector.py`

We had some tests in `test_selector.py::GraphTest` that didn't add
anything on top of what was already being tested elsewhere in the file,
except the parsing of models. However, the tests in `test_parser.py::ModelParserTest`
cover everything being tested here (and then some). Thus these tests
in `test_selector.py::GraphTest` are unnecessary and can be deleted.

* Move `test__partial_parse` from `test_selector.py` to `test_manifest.py`

There was a test `test__partial_parse` in `test_selector.py` which tested
the functionality of `is_partial_parsable` of the `ManifestLoader`. This
doesn't really make sense to exist in `test_selector.py` where we are
testing selectors. We test the `ManifestLoader` class in `test_manifest.py`
which seemed like a more appropriate place for the test. Additionally we
renamed the test to `test_is_partial_parsable_by_version` to more accurately
describe what is being tested.

* Make `test_selector`'s manifest fixture name even more specific

* Add type hint to `expected_nodes` in `test_selector.py` tests

In the test `tests/unit/graph/test_selector.py::TestCompiler::test_two_models_simple_ref`
we have a variable `expected_nodes` that we are setting via a list comprehension.
At a glance it isn't immediately obvious what `expected_nodes` actually is. It's a
list, but a list of what? One suggestion was to explicitly write out the list of strings.
However, I worry about the brittleness of doing so. That might be the way we head long
term, but as a compromise for now, I've added a type hint to the variable definition.
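
For illustration, the compromise looks roughly like this (names assumed, not the actual test code):

```python
from typing import List


def expected_unique_ids(manifest) -> List[str]:
    # The explicit List[str] annotation documents what the comprehension
    # yields without hard-coding a brittle list of unique_id strings.
    expected_nodes: List[str] = [node.unique_id for node in manifest.nodes.values()]
    return expected_nodes
```
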
* update on_skip to adjust for nodes that do not have a schema

* changelog
gshank and others added 30 commits November 21, 2024 17:24
* Add changelog entry.

* Update schemas and test fixtures for new snapshot meta-column

* Add back comment.
…1056)

* Rename `batch_info` to `previous_batch_results`

* Exclude `previous_batch_results` from serialization of model node to avoid jinja context bloat

* Drop `previous_batch_results` key from `test_manifest.py` unit tests

In 4050e37 we began excluding
`previous_batch_results` from the serialized representation of the
ModelNode. As such, we no longer need to check for it in `test_manifest.py`.
* Add `batch_id` to jinja context of microbatch batches

* Add changie doc

* Update `format_batch_start` to assume `batch_start` is always provided

* Add "runtime only" property `batch_context` to `ModelNode`

By it being "runtime only" we mean that it doesn't exist on the artifact
and thus won't be written out to the manifest artifact.

* Begin populating `batch_context` during materialization execution for microbatch batches

* Fix circular import

* Fixup MicrobatchBuilder.batch_id property method

* Ensure MicrobatchModelRunner doesn't double compile batches

We were compiling the node for each batch _twice_. Besides making microbatch
models more expensive than they needed to be, double compiling wasn't
causing any issue in itself. However, the first compilation was happening _before_ we
had added the batch context information to the model node for the batch. This
was leading models which try to access the `batch_context` information on the
model to blow up, which was undesirable. As such, we've now gone and skipped
the first compilation. We've done this similarly to how SavedQuery nodes skip
compilation.

* Add `__post_serialize__` method to `BatchContext` to ensure correct dict shape

This is weird, but necessary, I apologize. Mashumaro handles the
dictification of this class via a compile-time generated `to_dict`
method based off of the _typing_ of the class. By default `datetime`
types are converted to strings. We don't want that; we want them to
stay datetimes.
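
A minimal sketch of the pattern with mashumaro; the field names and base class are assumptions, not the exact dbt-core code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict

from mashumaro import DataClassDictMixin


@dataclass
class BatchContextSketch(DataClassDictMixin):
    id: str
    event_time_start: datetime
    event_time_end: datetime

    def __post_serialize__(self, d: Dict[str, Any]) -> Dict[str, Any]:
        # to_dict() has already turned the datetimes into ISO strings here;
        # put the original datetime objects back so downstream consumers
        # (e.g. the batch jinja context) keep working with real datetimes.
        d["event_time_start"] = self.event_time_start
        d["event_time_end"] = self.event_time_end
        return d
```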

* Update tests to check for `batch_context`

* Update `resolve_event_time_filter` to use new `batch_context`

* Stop testing for batchless compiled code for microbatch models

In 45daec7 we stopped an extra compilation
that was happening per batch prior to the batch_context being loaded. Stopping
this extra compilation means that compiled sql for the microbatch model without
the event time filter / batch context is no longer produced. We have discussed
this and _believe_ it is okay given that this is a new node type that has not
hit GA yet.

* Rename `ModelNode.batch_context` to `ModelNode.batch`

* Rename `build_batch_context` to `build_jinja_context_for_batch`

The name `build_batch_context` was confusing because:
1) We have a `BatchContext` object, which the method was not building
2) The method builds the jinja context for the batch
As such, it felt appropriate to rename the method to more accurately
communicate what it does.

* Rename test macro `invalid_batch_context_macro_sql` to `invalid_batch_jinja_context_macro_sql`

This rename was to make it more clear that the jinja context for a
batch was being checked, as a batch_context has a slightly different
connotation.

* Update changie doc
Co-authored-by: Courtney Holcomb <courtneyeholcomb@gmail.com>
* Improve performance of select_children() and select_parents()

* Add changelog entry.
* New function to add graph edges.

* Clean up, leave out flag temporarily for testing.

* Put new test edge behavior behind flag.

* Final draft of documentation.
* microbatch: split out first and last batch to run in serial

* only run pre_hook on first batch, post_hook on last batch

* refactor: internalize parallel to RunTask._submit_batch

* Add optional `force_sequential` to `_submit_batch` to allow for skipping parallelism check

* Force last batch to run sequentially

* Force first batch to run sequentially

* Remove batch_idx check in `should_run_in_parallel`

`should_run_in_parallel` shouldn't, and no longer needs to, take into
consideration where a batch exists in the larger run. The first and
last batch for a microbatch model are now forced to run sequentially
by `handle_microbatch_model`.
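
A simplified sketch of the scheduling idea (not dbt-core's actual implementation): the first and last batches act as serial bookends, while middle batches may use the thread pool when allowed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_batches(
    batches: Sequence[object],
    run_batch: Callable[[object], None],
    pool: ThreadPoolExecutor,
    allow_parallel: bool,
) -> None:
    if not batches:
        return
    run_batch(batches[0])  # first batch runs sequentially (it may create the relation)
    futures = []
    for batch in batches[1:-1]:
        if allow_parallel:
            futures.append(pool.submit(run_batch, batch))
        else:
            run_batch(batch)
    for future in futures:
        future.result()  # wait for middle batches before running the final one
    if len(batches) > 1:
        run_batch(batches[-1])  # last batch also runs sequentially
```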

* Begin skipping batches if first batch fails

* Write custom `on_skip` for `MicrobatchModelRunner` to better handle when batches are skipped

This was necessary specifically because the default `on_skip` set the `X of Y` part
of the skipped log using the `node_index` and the `num_nodes`. If there were 2
nodes and we were on the 4th batch of the second node, we'd get a message like
`SKIPPED 4 of 2...`, which didn't make much sense. In a future commit we're likely
going to add a custom event for logging the start, result, and skipping of batches
for better readability of the logs.

* Add microbatch pre-hook, post-hook, and sequential first/last batch tests

* Fix/Add tests around first batch failure vs latter batch failure

* Correct MicrobatchModelRunner.on_skip to handle skipping the entire node

Previously `MicrobatchModelRunner.on_skip` only handled the case where a _batch_ of
the model was being skipped. However, that method is also used when the
entire microbatch model is being skipped due to an upstream node error. Because
we previously _weren't_ handling this second case, it'd cause an unhandled
runtime exception. Thus, we now check whether we're running a batch or not;
if there is no batch, we use the super's on_skip method.
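
A rough sketch of the distinction; the class and attribute names are illustrative, not the actual runner API.

```python
from typing import Optional


class NodeRunnerSketch:
    """Stand-in for the base node runner's skip handling."""

    def __init__(self, node_name: str):
        self.node_name = node_name

    def on_skip(self) -> None:
        print(f"SKIP {self.node_name}")


class MicrobatchRunnerSketch(NodeRunnerSketch):
    def __init__(self, node_name: str, batch_idx: Optional[int] = None, total_batches: int = 0):
        super().__init__(node_name)
        self.batch_idx = batch_idx
        self.total_batches = total_batches

    def on_skip(self) -> None:
        if self.batch_idx is None:
            # The whole node is being skipped (e.g. an upstream error):
            # defer to the node-level skip handling.
            return super().on_skip()
        # Otherwise report the skip in terms of this node's batches.
        print(f"SKIP batch {self.batch_idx + 1} of {self.total_batches} ({self.node_name})")
```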

* Correct conditional logic for setting pre- and post-hooks for batches

Previously we were doing an if+elif for setting pre- and post-hooks
for batches, wherein the `if` matched if the batch wasn't the first
batch, and the `elif` matched if the batch wasn't the last batch. The
issue with this is that if the `if` was hit, the `elif` _wouldn't_ be hit.
This caused the first batch to appropriately not run the `post-hook`, but
then every batch after would run the `post-hook`.
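
A small illustration of why two independent `if` statements are needed here; the hook representation is hypothetical.

```python
from typing import List


def hooks_for_batch(
    pre_hooks: List[str], post_hooks: List[str], is_first: bool, is_last: bool
) -> List[str]:
    """Pre-hooks should run only on the first batch, post-hooks only on the last."""
    hooks = list(pre_hooks) + list(post_hooks)
    # With `if`/`elif`, a middle batch that matched the first branch never
    # reached the second branch, so it kept (and ran) its post-hooks.
    if not is_first:
        hooks = [h for h in hooks if h not in pre_hooks]
    if not is_last:
        hooks = [h for h in hooks if h not in post_hooks]
    return hooks
```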

* Add two new event types `LogStartBatch` and `LogBatchResult`

* Update MicrobatchModelRunner to use new batch specific log events

* Fix event testing

* Update microbatch integration tests to catch batch specific event types

---------

Co-authored-by: Quigley Malcolm <quigley.malcolm@dbtlabs.com>
* Update single batch test case to check for generic exceptions

* Explicitly skip last final batch execution when there is only one batch

Previously, if there was only one batch, we would try to execute _two_
batches: the first batch, and a "last" non-existent batch. This would
result in an unhandled exception.

* Changie doc
…it code (#11115)

* Update partial success test to assert partial successes mean that the run failed

* Update results interpretation to include `PartialSuccess` as failure status
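
Roughly, the interpretation change amounts to treating `PartialSuccess` like the other failing statuses when deciding the exit code; the status names below are a sketch, not the exact enum.

```python
from enum import Enum
from typing import Iterable


class Status(str, Enum):
    Success = "success"
    Error = "error"
    PartialSuccess = "partial success"


def run_succeeded(statuses: Iterable[Status]) -> bool:
    # PartialSuccess now counts as a failure, so the run exits non-zero
    # when any batch of a microbatch model failed.
    failing = {Status.Error, Status.PartialSuccess}
    return not any(status in failing for status in statuses)
```
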
* fix MicrobatchExecutionDebug message

* Fix typing in `describe_batch` to convince mypy `batch_start` exists when needed

---------

Co-authored-by: Quigley Malcolm <quigley.malcolm@dbtlabs.com>
…le adapter doesn't support it (#11145)

* Begin producing warning when attempting to force concurrent batches without adapter support

Batches of microbatch models can be executed sequentially or concurrently. We try to figure out which to do intelligently. As part of that, we implemented an override, the model config `concurrent_batches`, to allow the user to bypass _some_ of our automatic detection. However, a user _cannot_ force batches to run concurrently if the adapter doesn't support concurrent batches (declaring support is opt-in). Thus, if an adapter _doesn't_ support running batches concurrently and a user tries to force concurrent execution via `concurrent_batches`, we need to warn the user that that isn't happening.
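
A hedged sketch of that resolution logic; the adapter-capability check and the warning text follow the description above, but the exact APIs are assumptions.

```python
from typing import Callable, Optional


def resolve_concurrent_batches(
    user_setting: Optional[bool],
    adapter_supports_concurrency: bool,
    warn: Callable[[str], None],
) -> bool:
    """Decide whether batches may run concurrently."""
    if user_setting and not adapter_supports_concurrency:
        # The user forced concurrent_batches=True, but the adapter has not
        # opted in to concurrent batches: fall back to sequential execution
        # and emit a (silenceable) warning.
        warn("concurrent_batches=True ignored: adapter does not support concurrent batches")
        return False
    if user_setting is not None:
        return user_setting
    return adapter_supports_concurrency
```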

* Add custom event type for warning about invalid `concurrent_batches` config

* Fire `InvalidConcurrentBatchesConfig` warning via `warn_or_error` so it can be silenced
Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>
* roadmap post december 2024

* fix yml spacing

* fix code snippet format