Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding indexed created_at column to lineage events table #2299

Conversation

prachim-collab
Copy link
Contributor

@prachim-collab prachim-collab commented Dec 8, 2022

Problem

For analytics use case we need to incrementally copy and export lineage_events to DWH. For this use case there is no good way to identify incrementally created events in database. The current event_time in lineage_events table is client generated and can be back dated.

Closes: #2300

Solution

  1. Proposing to alter lineage_events table to add a new created_at timestamp column.
  2. Also index this new column.
  3. This is a backwards-compatible change
  4. Existing rows in the lineage events table will be assigned null values.
  5. New rows will have default value as DB server's UTC now() which determines the time when event was inserted in lineage_events table.

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

I have tested by deploying the changes in my local database and migration script ran successfully on top of existing schema.

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added the api API layer changes label Dec 8, 2022
@codecov
Copy link

codecov bot commented Dec 8, 2022

Codecov Report

Merging #2299 (dc70917) into main (e501e36) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #2299   +/-   ##
=========================================
  Coverage     77.01%   77.01%           
  Complexity     1166     1166           
=========================================
  Files           222      222           
  Lines          5307     5307           
  Branches        424      424           
=========================================
  Hits           4087     4087           
  Misses          747      747           
  Partials        473      473           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@prachim-collab prachim-collab force-pushed the feature/lineage_event_created_at_indexed branch from c495a52 to 5d3185c Compare December 8, 2022 06:08
@wslulciuc
Copy link
Member

wslulciuc commented Dec 8, 2022

@prachim-collab, having a timestamp for when an OL event was received on the server is going to be very helpful. Mind also opening an issues to link to your PR about the timestamp usage and any changes to relevant APIs?

@prachim-collab
Copy link
Contributor Author

@prachim-collab, having a timestamp for when an OL event was received on the server is going to be very helpful. Mind also opening an issues to link to your PR about the timestamp usage and any changes to relevant APIs?

@wslulciuc I have created this new issue #2304

@prachim-collab prachim-collab force-pushed the feature/lineage_event_created_at_indexed branch from 6c15266 to 4897ff1 Compare December 9, 2022 01:08
@prachim-collab prachim-collab marked this pull request as ready for review December 9, 2022 01:35
@@ -0,0 +1,9 @@
ALTER TABLE lineage_events ADD created_at TIMESTAMP;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIMESTAMP WITH TIME ZONE. I think it's good practice to just always include the time zone in the timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, i kept it as TIMESTAMP following other created_at columns in the schema. But i will update this one to TIMESTAMP WITH TIME ZONE.

Additionally , i was before backfilling old entries with event_time. I have removed that step now because it might be a costly operation during migration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the type to TIMESTAMP WITH TIME ZONE

Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
@prachim-collab prachim-collab force-pushed the feature/lineage_event_created_at_indexed branch from 53b5cf7 to dc70917 Compare December 12, 2022 21:09
@collado-mike collado-mike merged commit b1ff80e into MarquezProject:main Dec 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api API layer changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add created_at indexed column to Lineage events table
3 participants