point-in-time endpoint for column-level lineage #2265

Merged

2 commits merged on Dec 1, 2022


pawel-big-lebowski (Collaborator) commented Nov 25, 2022

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

Problem

Implementation of the point-in-time column lineage endpoint.

Closes: #2262

Solution

  • The dataset or job version passed in the request is used to determine the date and time at which that version was created.
  • The column_lineage table is filtered to rows created earlier than that timestamp.
  • Column-lineage rows, dataset rows, and dataset_version rows are all filtered against the same timestamp.
  • Point-in-time queries do not combine well with the withDownstream option; support for that combination, if needed, deserves a separate PR.
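
The Solution steps above can be sketched in plain Java. This is a hypothetical, in-memory illustration only: the record ColumnLineageRow, the methods resolveVersionTime, lineageAsOf and pointInTimeLineage, and the version-to-timestamp map are assumptions for illustration, not Marquez's actual ColumnLineageService or DAO code, which applies these filters as SQL against the database. The strict "earlier than" comparison follows the PR wording; whether the real query treats the boundary as inclusive is not stated here.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.stream.Collectors;

/** Hypothetical sketch of point-in-time column lineage filtering (not Marquez's real classes). */
final class PointInTimeLineageSketch {

  /** One illustrative column_lineage row: an output field derived from an input field. */
  record ColumnLineageRow(String outputField, String inputField, Instant createdAt) {}

  /** Step 1: resolve the requested dataset or job version to the time it was created. */
  static Optional<Instant> resolveVersionTime(Map<UUID, Instant> versionCreatedAt, UUID version) {
    // Assumption: an in-memory map stands in for the version tables.
    return Optional.ofNullable(versionCreatedAt.get(version));
  }

  /** Step 2: keep only lineage rows created strictly earlier than the resolved timestamp. */
  static List<ColumnLineageRow> lineageAsOf(List<ColumnLineageRow> rows, Instant asOf) {
    return rows.stream()
        .filter(row -> row.createdAt().isBefore(asOf))
        .collect(Collectors.toList());
  }

  /**
   * Resolves the timestamp once and reuses it, so every filter (lineage, dataset,
   * dataset_version) would see the same point in time.
   */
  static List<ColumnLineageRow> pointInTimeLineage(
      Map<UUID, Instant> versionCreatedAt, UUID version, List<ColumnLineageRow> rows) {
    Instant asOf =
        resolveVersionTime(versionCreatedAt, version)
            .orElseThrow(() -> new IllegalArgumentException("Unknown version: " + version));
    return lineageAsOf(rows, asOf);
  }
}
```

Resolving the timestamp a single time and passing it to each filter is what keeps the lineage graph, the dataset rows, and the dataset_version rows consistent with one another for a given request.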

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

boring-cyborg bot added the api (API layer changes) label Nov 25, 2022
codecov bot commented Nov 25, 2022

Codecov Report

Merging #2265 (888a258) into main (f3b1cbd) will increase coverage by 0.27%.
The diff coverage is 94.73%.

@@             Coverage Diff              @@
##               main    #2265      +/-   ##
============================================
+ Coverage     76.47%   76.74%   +0.27%     
- Complexity     1113     1150      +37     
============================================
  Files           216      219       +3     
  Lines          5203     5260      +57     
  Branches        421      423       +2     
============================================
+ Hits           3979     4037      +58     
+ Misses          752      748       -4     
- Partials        472      475       +3     
Impacted Files Coverage Δ
api/src/main/java/marquez/db/ColumnLineageDao.java 100.00% <ø> (ø)
api/src/main/java/marquez/db/DatasetFieldDao.java 100.00% <ø> (ø)
...c/main/java/marquez/api/ColumnLineageResource.java 75.00% <75.00%> (-8.34%) ⬇️
...java/marquez/db/mappers/PairUuidInstantMapper.java 75.00% <75.00%> (ø)
...i/src/main/java/marquez/service/models/NodeId.java 72.13% <90.00%> (+7.64%) ⬆️
...ain/java/marquez/service/ColumnLineageService.java 97.20% <98.36%> (-0.12%) ⬇️
...a/marquez/common/models/DatasetFieldVersionId.java 100.00% <100.00%> (ø)
.../main/java/marquez/common/models/JobVersionId.java 100.00% <100.00%> (ø)
...arquez/db/mappers/ColumnLineageNodeDataMapper.java 90.47% <100.00%> (+0.47%) ⬆️


pawel-big-lebowski force-pushed the column-lineage-point-in-time branch from 7b816d4 to 8d6e907 November 28, 2022 09:45
boring-cyborg bot added the docs label Nov 28, 2022
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
pawel-big-lebowski force-pushed the column-lineage-point-in-time branch from 8d6e907 to 7ba6cc5 November 28, 2022 09:46
pawel-big-lebowski changed the title from "point-in-timea for column-level lineage" to "point-in-time for column-level lineage" Nov 28, 2022
pawel-big-lebowski changed the title from "point-in-time for column-level lineage" to "point-in-time endpoint for column-level lineage" Nov 28, 2022
pawel-big-lebowski marked this pull request as ready for review November 28, 2022 09:51
pawel-big-lebowski self-assigned this Nov 30, 2022
pawel-big-lebowski merged commit dedfe48 into main Dec 1, 2022
pawel-big-lebowski deleted the column-lineage-point-in-time branch December 1, 2022 07:30
Labels: api (API layer changes), docs
Projects: None yet
Development: merging this pull request may close the issue "Point in time column lineage endpoint - implementation"
3 participants