[pull] main from dbt-labs:main #29

Open

pull[bot] wants to merge 681 commits into main from dbt-labs:main
Conversation

pull[bot] commented Aug 16, 2023

See Commits and Changes for more details.

Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

pull[bot] added the ⤵️ pull label Aug 16, 2023
peterallenwebb and others added 29 commits May 21, 2024 11:01
* Add more accurate RSS high water mark measurement for Linux

* Add changelog entry.

* Checks to avoid exception based flow control, per review.

* Fix review nit.
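
As a rough illustration of the RSS high-water-mark commit above, here is a minimal sketch of reading a process's peak RSS via `resource.getrusage`; the actual implementation in the commit may differ.

```python
import platform
import resource


def peak_rss_bytes() -> int:
    """Return the process's peak resident set size in bytes (sketch only)."""
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if platform.system() == "Darwin":
        return max_rss
    return max_rss * 1024
```
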
* Add fixtures for setting and resetting flags for unit tests

* Remove unnecessary `set_from_args` in non `unittest.TestCase` based unit tests

In the previous commit we added a pytest fixture which sets and tears down
the global flags via `set_from_args` for every pytest-based unit test.
Previously we had added a `set_from_args` in tests or test files to reset
the global flags if they were modified by a previous test. This is no
longer necessary because of the work done in the previous commit.

Note: We did not modify any tests that use the `unittest.TestCase` class
because they don't use pytest fixtures. Thus those tests need to continue
operating as they currently do until we shift them to pytest unit tests.
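
A minimal sketch of what such an autouse fixture could look like; the fixture name and flag values here are illustrative, not necessarily what the commit added.

```python
from argparse import Namespace

import pytest

from dbt.flags import set_from_args


@pytest.fixture(autouse=True)
def set_test_flags():
    # Set the global flags before each pytest-based unit test...
    set_from_args(Namespace(USE_COLORS=False), {})
    yield
    # ...and reset them afterwards so later tests start from a known state.
    set_from_args(Namespace(), {})
```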

* Utilize the new `args_for_flags` fixture for setting of flags in `test_contracts_graph_parsed.py`

* Convert `test_compilation.py` from `TestCase` tests to pytest tests

We did this so that in the next commit we can drop the unnecessary `set_from_args`.
That will be its own commit because converting these tests is a restructuring, and
doing it separately makes the changes easier to follow.
That is to say, all changes in this commit were just to convert the tests to
pytest; no other changes were made.

* Drop unnecessary `set_from_args` in `test_compilation.py`

* Add return types to all methods in `test_compilation.py`

* Reduce imports from `compilation` in `test_compilation.py`

* Update `test_logging.py` now that we don't need to worry about global flags

* Conditionally import `Generator` type for python 3.8

In Python 3.9 the `typing.Generator` alias was deprecated in favor of
`collections.abc.Generator`. We still support Python 3.8 and thus need to
import `Generator` conditionally. We should remove this in the future when
we drop support for 3.8.
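
One way such a conditional import can look (a sketch, not necessarily the exact code in the commit):

```python
import sys

if sys.version_info >= (3, 9):
    from collections.abc import Generator
else:
    # typing.Generator is deprecated from 3.9 onward but still needed on 3.8,
    # where collections.abc.Generator is not subscriptable in annotations.
    from typing import Generator
```
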
* add saved query resource type

* nits
* add new mode

* reflect dbt-common changes

* changelog

* rename var
* Create `runtime_config` fixture and necessary upstream fixtures

* Check for better scoped `ProjectContractError` in test_runtime tests

Previously in `test_unsupported_version_extra_config` and
`test_archive_not_allowed` we were checking for `DbtProjectError`. This
worked because `ProjectContractError` is a subclass of `DbtProjectError`.
However, if we check for `DbtProjectError` in these tests, then some tangential
failure which raises a `DbtProjectError` type error would go undetected. As
we plan on converting these tests to pytest in the coming commits, we want to
ensure that the tests are succeeding for the right reason.
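
For illustration, the difference is roughly the following; the test body, fixture name, and import path are assumptions, not taken from the actual diff.

```python
import pytest

# Import path assumed; ProjectContractError subclasses DbtProjectError.
from dbt.exceptions import ProjectContractError


def test_unsupported_version_extra_config(build_config):  # hypothetical fixture
    # Asserting on the narrower ProjectContractError means an unrelated
    # DbtProjectError raised elsewhere cannot make this test pass.
    with pytest.raises(ProjectContractError):
        build_config(extra_config={"not-a-real-key": True})
```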

* Convert `test_str` of `TestRuntimeConfig` to a pytest test using fixtures

* Convert `test_from_parts` of `TestRuntimeConfig` to a pytest test using fixtures

While converting `test_from_parts` I noticed the comment
>  TODO(jeb): Adapters must assert that quoting is populated?

This led me to believe that `quoting` shouldn't be "fully" realized
in our project fixture unless we're saying that it's gone through
adapter instantiation. Thus I updated the `quoting` on our project
fixture to be an empty dict. This change affected `test__str__` in
`test_project.py`, which we thus needed to update accordingly.

* Convert runtime version specifier tests to pytest tests and move to test_project

We've done two things in this commit, which arguably _should_ have been done in
two commits. First, we moved the version specifier tests from `test_runtime.py::TestRuntimeConfig`
to `test_project.py::TestGetRequiredVersion`; this is because what is really being
tested is the `_get_required_version` method. Doing it via the `RuntimeConfig.from_parts` method
made actually testing it a lot harder, as it requires setting up more of the world and
running with a _full_ project config dict.

The second thing we did was convert it from the old unittest implementation to a pytest
implementation. This saves us from having to create most of the world as we were doing
previously in these tests.

Of note, I did not move the test `test_unsupported_version_range_bad_config`. This test
is a bit different from the rest of the version specifier tests. It was introduced in
[1eb5857](1eb5857)
of [#2726](#2726) to resolve [#2638](#2638).
The focus of #2726 was to ensure the version specifier checks were run _before_ the validation
of the `dbt_project.yml`. Thus what this test is actually testing for is the order of
operations at parse time. As such, this is really more a _functional_ test than a
unit test. In the next commit we'll get this test moved (and renamed).

* Create a better test for checking that version checks come before project schema validation

* Convert `test_get_metadata` to pytest test

* Refactor `test_archive_not_allowed` to functional test

We do already have tests that ensure "extra" keys aren't allowed in
the dbt_project.yml. This test is different because it's checking that
a specific key, `archive`, isn't allowed. We do this because at one point
in time `archive` _was_ an allowed key. Specifically, we stopped allowing
`archive` in dbt-core 0.15.0 via commit [f26948d](f26948d).
Given that it's been 5 years and a major version, we could probably remove
this test, but let's keep it around unless we start supporting `archive` again.

* Convert `warn_for_unused_resource_config_paths` tests to use pytest
* init push arbitrary configs for generic tests pr

* iterative work

* initial test design attempts

* test reformatting

* test rework, have basic structure for 3 of 4 passing, need to figure out how to best represent same key error, failing correctly though

* swap up test formats for the new config dict and mixed variety, running dbt parse and inspecting the manifest

* modify tests to get passing, then modify the TestBuilder class work from earlier to be more DRY

* add changelog

* modify code to match suggested changes around separate methods and test id fix

* add column_name reference to init so deeper nested _render_values can use the input

* add type annotations

* feedback based on mike review
* Fire skipped events at debug level

Closes #8774

* add changelog entry

* Update to work with 1.9.*.

* Add tests for --fail-fast not showing skip messages unless --debug.

* Update test that works by itself, but assumes too much to work in integration tests.

---------

Co-authored-by: Scott Gigante <scottgigante@users.noreply.github.com>
* Update semantic interface requirement

* Update setup.py
* Rename `maniest` fixture in `test_selector` to `mock_manifest`

We have a globally available `manifest` fixture in our unit tests. In the
coming commits we're going to add tests to this file which use the globally
available `manifest` fixture. Prior to this commit, the locally defined
`manifest` fixture was taking precedence. To get around this, the easiest
solution was to rename the locally defined fixture.

I had tried to isolate the locally defined fixture by moving it, and the relevant
tests, to a test class like `TestNodeSelector`. However, because of _how_ the relevant
tests were parameterized, this proved difficult. Basically, for readability we define
a variable which holds a list of all the parameterization variables. By moving to a
test class, the definition of the variables would have had to be placed directly in
the parameterization macro call. Although possible, it made the readability slightly
worse. It might be worth doing anyway in the long run, but instead I used a less
heavy-handed alternative (already described).
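
As a small illustration of the shadowing issue (file layout and values are hypothetical): pytest resolves fixtures by name, so a local `manifest` fixture hides the one provided by `conftest.py`, which is why the local fixture was renamed.

```python
# tests/unit/graph/test_selector.py (sketch)
import pytest


@pytest.fixture
def mock_manifest() -> str:
    # Previously named `manifest`, which shadowed the globally available
    # `manifest` fixture defined in conftest.py for all unit tests.
    return "locally built manifest"


def test_uses_local_fixture(mock_manifest: str) -> None:
    assert mock_manifest == "locally built manifest"
```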

* Improve type hinting in `tests/unit/utils/manifest.py`

* Ensure `args` get set from global flags for `runtime_config` fixture in unit tests

The `Compiler.compile` method accesses `self.config.args.which`. The `config`
is the `RuntimeConfig` the `Compiler` was instantiated with. Our `runtime_config`
fixture was being instantiated with an empty dict for the `args` property. Thus
a `which` property of the args wasn't being made available, and if `compile` was run,
a runtime error would occur. To solve this, we've begun instantiating the args from
the global flags via `get_flags()`. This works because we ensure the `set_test_flags`
fixture is run first, which calls `set_from_args`.
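
A hedged sketch of what that fixture change could look like; the upstream fixture names and the exact `RuntimeConfig.from_parts` signature are assumptions here.

```python
import pytest

from dbt.config import RuntimeConfig
from dbt.flags import get_flags


@pytest.fixture
def runtime_config(project, profile):  # upstream fixture names assumed
    # Pull args from the global flags (set earlier by the autouse
    # set_test_flags fixture) so attributes like `which` exist at runtime.
    return RuntimeConfig.from_parts(project=project, profile=profile, args=get_flags())
```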

* Create a `make_manifest` utility function for use in unit tests and fixture creation

* Refactor `Compiler` and `NodeSelector` tests in `test_selector.py` to use pytesting methodology

* Remove parsing tests that exist in `test_selector.py`

We had some tests in `test_selector.py::GraphTest` that didn't add
anything on top of what was already being tested elsewhere in the file,
except the parsing of models. However, the tests in `test_parser.py::ModelParserTest`
cover everything being tested here (and then some). Thus these tests
in `test_selector.py::GraphTest` are unnecessary and can be deleted.

* Move `test__partial_parse` from `test_selector.py` to `test_manifest.py`

There was a test `test__partial_parse` in `test_selector.py` which tested
the functionality of `is_partial_parsable` of the `ManifestLoader`. This
doesn't really make sense to exist in `test_selector.py` where we are
testing selectors. We test the `ManifestLoader` class in `test_manifest.py`
which seemed like a more appropriate place for the test. Additionally we
renamed the test to `test_is_partial_parsable_by_version` to more accurately
describe what is being tested.

* Make `test_selector`'s manifest fixture name even more specific

* Add type hint to `expected_nodes` in `test_selector.py` tests

In the test `tests/unit/graph/test_selector.py::TestCompiler::test_two_models_simple_ref`
we have a variable `expected_nodes` that we are setting via a list comprehension.
At a glance it isn't immediately obvious what `expected_nodes` actually is. It's a
list, but a list of what? One suggestion was to explicitly write out the list of strings.
However, I worry about the brittleness of doing so. That might be the way we head long
term, but as a compromise for now, I've added a type hint to the variable definition.
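
For illustration, the compromise looks roughly like this (names assumed, not the actual test code):

```python
from typing import List


def expected_unique_ids(manifest) -> List[str]:
    # The explicit List[str] annotation documents what the comprehension
    # yields without hard-coding a brittle list of unique_id strings.
    expected_nodes: List[str] = [node.unique_id for node in manifest.nodes.values()]
    return expected_nodes
```
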
* update on_skip to adjust for nodes that do not have a schema

* changelog
gshank and others added 30 commits November 21, 2024 17:24
* Add changelog entry.

* Update schemas and test fixtures for new snapshot meta-column

* Add back comment.
…1056)

* Rename `batch_info` to `previous_batch_results`

* Exclude `previous_batch_results` from serialization of model node to avoid jinja context bloat

* Drop `previous_batch_results` key from `test_manifest.py` unit tests

In 4050e37 we began excluding
`previous_batch_results` from the serialized representation of the
ModelNode. As such, we no longer need to check for it in `test_manifest.py`.
* Add `batch_id` to jinja context of microbatch batches

* Add changie doc

* Update `format_batch_start` to assume `batch_start` is always provided

* Add "runtime only" property `batch_context` to `ModelNode`

By it being "runtime only" we mean that it doesn't exist on the artifact
and thus won't be written out to the manifest artifact.

* Begin populating `batch_context` during materialization execution for microbatch batches

* Fix circular import

* Fixup MicrobatchBuilder.batch_id property method

* Ensure MicrobatchModelRunner doesn't double compile batches

We were compiling the node for each batch _twice_. Besides making microbatch
models more expensive than they needed to be, double compiling wasn't
causing any issue in itself. However, the first compilation was happening _before_ we
had added the batch context information to the model node for the batch. This
was leading models which try to access the `batch_context` information on the
model to blow up, which was undesirable. As such, we've now gone and skipped
the first compilation. We've done this similarly to how SavedQuery nodes skip
compilation.

* Add `__post_serialize__` method to `BatchContext` to ensure correct dict shape

This is weird, but necessary, I apologize. Mashumaro handles the
dictification of this class via a compile-time generated `to_dict`
method based off of the _typing_ of the class. By default `datetime`
types are converted to strings. We don't want that; we want them to
stay datetimes.
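
A minimal sketch of the pattern with mashumaro; the field names and base class are assumptions, not the exact dbt-core code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict

from mashumaro import DataClassDictMixin


@dataclass
class BatchContextSketch(DataClassDictMixin):
    id: str
    event_time_start: datetime
    event_time_end: datetime

    def __post_serialize__(self, d: Dict[str, Any]) -> Dict[str, Any]:
        # to_dict() has already turned the datetimes into ISO strings here;
        # put the original datetime objects back so downstream consumers
        # (e.g. the batch jinja context) keep working with real datetimes.
        d["event_time_start"] = self.event_time_start
        d["event_time_end"] = self.event_time_end
        return d
```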

* Update tests to check for `batch_context`

* Update `resolve_event_time_filter` to use new `batch_context`

* Stop testing for batchless compiled code for microbatch models

In 45daec7 we stopped an extra compilation
that was happening per batch prior to the batch_context being loaded. Stopping
this extra compilation means that compiled sql for the microbatch model without
the event time filter / batch context is no longer produced. We have discussed
this and _believe_ it is okay given that this is a new node type that has not
hit GA yet.

* Rename `ModelNode.batch_context` to `ModelNode.batch`

* Rename `build_batch_context` to `build_jinja_context_for_batch`

The name `build_batch_context` was confusing because:
1) We have a `BatchContext` object, which the method was not building
2) The method builds the jinja context for the batch
As such, it felt appropriate to rename the method to more accurately
communicate what it does.

* Rename test macro `invalid_batch_context_macro_sql` to `invalid_batch_jinja_context_macro_sql`

This rename was to make it more clear that the jinja context for a
batch was being checked, as a batch_context has a slightly different
connotation.

* Update changie doc
Co-authored-by: Courtney Holcomb <courtneyeholcomb@gmail.com>
* Improve performance of select_children() and select_parents()

* Add changelog entry.
* New function to add graph edges.

* Clean up, leave out flag temporarily for testing.

* Put new test edge behavior behind flag.

* Final draft of documentation.
* microbatch: split out first and last batch to run in serial

* only run pre_hook on first batch, post_hook on last batch

* refactor: internalize parallel to RunTask._submit_batch

* Add optional `force_sequential` to `_submit_batch` to allow for skipping parallelism check

* Force last batch to run sequentially

* Force first batch to run sequentially

* Remove batch_idx check in `should_run_in_parallel`

`should_run_in_parallel` shouldn't, and no longer needs to, take into
consideration where a batch exists in the larger run. The first and
last batch for a microbatch model are now forced to run sequentially
by `handle_microbatch_model`.
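
A simplified sketch of the scheduling idea (not dbt-core's actual implementation): the first and last batches act as serial bookends, while middle batches may use the thread pool when allowed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_batches(
    batches: Sequence[object],
    run_batch: Callable[[object], None],
    pool: ThreadPoolExecutor,
    allow_parallel: bool,
) -> None:
    if not batches:
        return
    run_batch(batches[0])  # first batch runs sequentially (it may create the relation)
    futures = []
    for batch in batches[1:-1]:
        if allow_parallel:
            futures.append(pool.submit(run_batch, batch))
        else:
            run_batch(batch)
    for future in futures:
        future.result()  # wait for middle batches before running the final one
    if len(batches) > 1:
        run_batch(batches[-1])  # last batch also runs sequentially
```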

* Begin skipping batches if first batch fails

* Write custom `on_skip` for `MicrobatchModelRunner` to better handle when batches are skipped

This was necessary specifically because the default `on_skip` set the `X of Y` part
of the skipped log using the `node_index` and the `num_nodes`. If there were 2
nodes and we were on the 4th batch of the second node, we'd get a message like
`SKIPPED 4 of 2...`, which didn't make much sense. In a future commit we're likely
going to add a custom event for logging the start, result, and skipping of batches
for better readability of the logs.

* Add microbatch pre-hook, post-hook, and sequential first/last batch tests

* Fix/Add tests around first batch failure vs latter batch failure

* Correct MicrobatchModelRunner.on_skip to handle skipping the entire node

Previously `MicrobatchModelRunner.on_skip` only handled the case where a _batch_ of
the model was being skipped. However, that method is also used when the
entire microbatch model is being skipped due to an upstream node error. Because
we previously _weren't_ handling this second case, it'd cause an unhandled
runtime exception. Thus, we now check whether we're running a batch or not;
if there is no batch, we use the super's on_skip method.
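
A rough sketch of the distinction; the class and attribute names are illustrative, not the actual runner API.

```python
from typing import Optional


class NodeRunnerSketch:
    """Stand-in for the base node runner's skip handling."""

    def __init__(self, node_name: str):
        self.node_name = node_name

    def on_skip(self) -> None:
        print(f"SKIP {self.node_name}")


class MicrobatchRunnerSketch(NodeRunnerSketch):
    def __init__(self, node_name: str, batch_idx: Optional[int] = None, total_batches: int = 0):
        super().__init__(node_name)
        self.batch_idx = batch_idx
        self.total_batches = total_batches

    def on_skip(self) -> None:
        if self.batch_idx is None:
            # The whole node is being skipped (e.g. an upstream error):
            # defer to the node-level skip handling.
            return super().on_skip()
        # Otherwise report the skip in terms of this node's batches.
        print(f"SKIP batch {self.batch_idx + 1} of {self.total_batches} ({self.node_name})")
```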

* Correct conditional logic for setting pre- and post-hooks for batches

Previously we were doing an if+elif for setting pre- and post-hooks
for batches, wherein the `if` matched if the batch wasn't the first
batch, and the `elif` matched if the batch wasn't the last batch. The
issue with this is that if the `if` was hit, the `elif` _wouldn't_ be hit.
This caused the first batch to appropriately not run the `post-hook`, but
then every batch after would run the `post-hook`.
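
A small illustration of why two independent `if` statements are needed here; the hook representation is hypothetical.

```python
from typing import List


def hooks_for_batch(
    pre_hooks: List[str], post_hooks: List[str], is_first: bool, is_last: bool
) -> List[str]:
    """Pre-hooks should run only on the first batch, post-hooks only on the last."""
    hooks = list(pre_hooks) + list(post_hooks)
    # With `if`/`elif`, a middle batch that matched the first branch never
    # reached the second branch, so it kept (and ran) its post-hooks.
    if not is_first:
        hooks = [h for h in hooks if h not in pre_hooks]
    if not is_last:
        hooks = [h for h in hooks if h not in post_hooks]
    return hooks
```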

* Add two new event types `LogStartBatch` and `LogBatchResult`

* Update MicrobatchModelRunner to use new batch specific log events

* Fix event testing

* Update microbatch integration tests to catch batch specific event types

---------

Co-authored-by: Quigley Malcolm <quigley.malcolm@dbtlabs.com>
* Update single batch test case to check for generic exceptions

* Explicitly skip last final batch execution when there is only one batch

Previously, if there was only one batch, we would try to execute _two_
batches: the first batch, and a "last" non-existent batch. This would
result in an unhandled exception.

* Changie doc
…it code (#11115)

* Update partial success test to assert partial successes mean that the run failed

* Update results interpretation to include `PartialSuccess` as failure status
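
Roughly, the interpretation change amounts to treating `PartialSuccess` like the other failing statuses when deciding the exit code; the status names below are a sketch, not the exact enum.

```python
from enum import Enum
from typing import Iterable


class Status(str, Enum):
    Success = "success"
    Error = "error"
    PartialSuccess = "partial success"


def run_succeeded(statuses: Iterable[Status]) -> bool:
    # PartialSuccess now counts as a failure, so the run exits non-zero
    # when any batch of a microbatch model failed.
    failing = {Status.Error, Status.PartialSuccess}
    return not any(status in failing for status in statuses)
```
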
* fix MicrobatchExecutionDebug message

* Fix typing in `describe_batch` to convince mypy `batch_start` exists when needed

---------

Co-authored-by: Quigley Malcolm <quigley.malcolm@dbtlabs.com>
…le adapter doesn't support it (#11145)

* Begin producing warning when attempting to force concurrent batches without adapter support

Batches of microbatch models can be executed sequentially or concurrently. We try to figure out which to do intelligently. As part of that, we implemented an override, the model config `concurrent_batches`, to allow the user to bypass _some_ of our automatic detection. However, a user _cannot_ force batches to run concurrently if the adapter doesn't support concurrent batches (declaring support is opt-in). Thus, if an adapter _doesn't_ support running batches concurrently and a user tries to force concurrent execution via `concurrent_batches`, we need to warn the user that that isn't happening.
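
A hedged sketch of that resolution logic; the adapter-capability check and the warning text follow the description above, but the exact APIs are assumptions.

```python
from typing import Callable, Optional


def resolve_concurrent_batches(
    user_setting: Optional[bool],
    adapter_supports_concurrency: bool,
    warn: Callable[[str], None],
) -> bool:
    """Decide whether batches may run concurrently."""
    if user_setting and not adapter_supports_concurrency:
        # The user forced concurrent_batches=True, but the adapter has not
        # opted in to concurrent batches: fall back to sequential execution
        # and emit a (silenceable) warning.
        warn("concurrent_batches=True ignored: adapter does not support concurrent batches")
        return False
    if user_setting is not None:
        return user_setting
    return adapter_supports_concurrency
```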

* Add custom event type for warning about invalid `concurrent_batches` config

* Fire `InvalidConcurrentBatchesConfig` warning via `warn_or_error` so it can be silenced
Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>
* roadmap post december 2024

* fix yml spacing

* fix code snippet format