[Bug] Defaulting lookback to 0 results in consistently incomplete batches #10867

Closed
Tracked by #10624
QMalcolm opened this issue Oct 16, 2024 · 2 comments · Fixed by #10876
Assignees: QMalcolm
Labels: bug, microbatch, user docs

Comments

@QMalcolm
Contributor

QMalcolm commented Oct 16, 2024

We should switch the default for lookback to 1.

Currently, lookback defaults to 0. The problem is that with a lookback of 0 the dataset will never have a “complete” batch. When microbatch runs, the current time is used as the upper bound of the latest batch. We do that because we’re favoring freshness over ensuring “only complete batches”. Unfortunately, combined with lookback=0, this means that no batch is ever “complete”.

For example, consider a microbatch model with a batch_size of day that is run at noon every day (12:00:00). If our lookback is 0, then when the microbatch model is run today it’ll get data from today 00:00:00 to 12:00:00. Then tomorrow, when the microbatch model runs, it’ll get data for tomorrow 00:00:00 to 12:00:00, but it won’t go back and get the rest of “today’s” data (because the lookback is 0).
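The noon example can be sketched in a few lines of Python. This is a simplified model of the behavior described above, not dbt's actual implementation; `batch_windows` is a hypothetical helper, and the dates are arbitrary.

```python
from datetime import datetime, timedelta


def batch_windows(run_time: datetime, lookback: int) -> list[tuple[datetime, datetime]]:
    """Return the (start, end) event-time windows one run processes.

    Assumes batch_size=day. Hypothetical helper for illustration only.
    """
    current_day = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    windows = []
    # Walk from the oldest looked-back batch up to the current batch.
    for offset in range(lookback, -1, -1):
        start = current_day - timedelta(days=offset)
        # The latest batch is cut off at the run time; older batches span a full day.
        end = min(start + timedelta(days=1), run_time)
        windows.append((start, end))
    return windows


today_noon = datetime(2024, 10, 16, 12, 0, 0)  # arbitrary "today"
tomorrow_noon = today_noon + timedelta(days=1)

for run in (today_noon, tomorrow_noon):
    for start, end in batch_windows(run, lookback=0):
        print(f"run at {run}: loads {start} -> {end}")
```

With lookback=0, neither run ever touches the window from today 12:00:00 to tomorrow 00:00:00, which is the gap described above.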

Thus the “default” valid behavior should be a lookback of 1. This ensures that each batch is completed by the following run. The only caveats are regularly late-arriving data, for which one can set a larger lookback value, and one-off late-arriving data, for which one can use --event-time-start and --event-time-end to backfill the specific range.
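To see why a lookback of 1 is enough in the simple case, here is a rough simulation of several daily runs. It assumes one run per day at the same clock time, that each run fully replaces every batch it touches, and that no data arrives late. `simulate` and `complete_batches` are hypothetical names, not part of dbt.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)


def simulate(first_run: datetime, runs: int, lookback: int) -> dict[datetime, datetime]:
    """Map each daily batch start to the latest event time loaded for it.

    Simplified model for illustration; not dbt internals.
    """
    loaded: dict[datetime, datetime] = {}
    for i in range(runs):
        run_time = first_run + i * DAY
        current = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
        for offset in range(lookback + 1):
            start = current - offset * DAY
            # A reprocessed batch is replaced in full, up to the run time at most.
            loaded[start] = min(start + DAY, run_time)
    return loaded


def complete_batches(loaded: dict[datetime, datetime]) -> list[datetime]:
    """Return the batch starts whose full day has been loaded."""
    return sorted(start for start, end in loaded.items() if end == start + DAY)


first_run = datetime(2024, 10, 16, 12, 0, 0)  # arbitrary start date
print("lookback=0:", complete_batches(simulate(first_run, runs=3, lookback=0)))
print("lookback=1:", complete_batches(simulate(first_run, runs=3, lookback=1)))
```

Under these assumptions, lookback=0 never yields a complete batch, while lookback=1 leaves only the most recent batch partial.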

QMalcolm added the bug, microbatch, and user docs labels Oct 16, 2024
QMalcolm self-assigned this Oct 17, 2024
@mirnawong1
Contributor

Docs PR here to address this: https://github.com/dbt-labs/docs.getdbt.com/pull/6351/files

@FishtownBuildBot
Collaborator

Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#6360

3 participants