Removed jobs_fqn table and moved FQN into jobs directly in order to enforce unique constraints #2448
Problem
The introduction of parent jobs and the `jobs_fqn` table was intended to let Marquez support jobs that share a name but are triggered by different parents (e.g., a Spark job fired by different Airflow DAGs). The `jobs` table tracked the simple name of the job, while the `jobs_fqn` table tracked the fully qualified name (FQN). The `jobs_fqn` table also became responsible for tracking the FQN of symlinked jobs, because determining a job's new FQN by following symlinks at query time was too expensive. Instead, the FQN of a symlinked job is updated when the symlink is created, so we return only the FQN of the symlink target rather than the FQN of the original job.

Unfortunately, this means that neither the `jobs` table nor the `jobs_fqn` table can enforce the uniqueness constraint we had on a job's fully qualified name. As a result, in production we see errors when trying to load a job by its name, since the lookup can match more than one job. In particular, this happens on two occasions when receiving Airflow OpenLineage events:
- A `FAIL` event with no `START` event: the parent facet of the run is omitted, so Marquez creates a job with no parent but with the same FQN.
- A `FAIL` event that arrives before the `START` event: this usually happens when requests are queued by the load balancer, or sometimes when the `START` event itself is particularly large and takes longer to deserialize than the `FAIL` event.

Solution
This change eliminates the `jobs_fqn` table and reestablishes the uniqueness constraint on the `jobs` table's `name` column. It also adds a `simple_name` column to the table, which the view uses to return the column of the same name. Tests for the two cases above are added to ensure we can handle Airflow events that omit the parent facet.
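To illustrate how restoring a unique constraint on the FQN lets both Airflow cases converge on a single row, here is a minimal sketch using Python's built-in `sqlite3`. The schema, the `UPSERT` statement, the `upsert_job` helper, and the rule that an FQN is the parent's name, a dot, then the simple name are all hypothetical simplifications, not Marquez's actual migration or DAO code.

```python
import sqlite3
import uuid

# Hypothetical, simplified schema; Marquez's real PostgreSQL migration differs.
SCHEMA = """
CREATE TABLE jobs (
    uuid            TEXT PRIMARY KEY,
    namespace_name  TEXT NOT NULL,
    name            TEXT NOT NULL,  -- fully qualified name (FQN)
    simple_name     TEXT NOT NULL,  -- name without the parent prefix
    parent_job_uuid TEXT,           -- NULL when the parent facet is omitted
    UNIQUE (namespace_name, name)   -- the constraint this change restores
);
"""

# Insert-or-update keyed on the unique FQN. An event without a parent facet
# keeps the parent (and simple name) recorded by an earlier event.
UPSERT = """
INSERT INTO jobs (uuid, namespace_name, name, simple_name, parent_job_uuid)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (namespace_name, name) DO UPDATE SET
    parent_job_uuid = COALESCE(excluded.parent_job_uuid, parent_job_uuid),
    simple_name = CASE WHEN excluded.parent_job_uuid IS NULL
                       THEN simple_name ELSE excluded.simple_name END
"""


def upsert_job(conn, namespace, name, parent_name=None):
    """Record a job seen in an event (hypothetical helper); returns its uuid."""
    parent_uuid, simple_name = None, name
    if parent_name is not None:
        parent_uuid = upsert_job(conn, namespace, parent_name)
        # Assumption: the FQN is "<parent name>.<simple name>".
        if name.startswith(parent_name + "."):
            simple_name = name[len(parent_name) + 1:]
    conn.execute(UPSERT, (str(uuid.uuid4()), namespace, name, simple_name, parent_uuid))
    return conn.execute(
        "SELECT uuid FROM jobs WHERE namespace_name = ? AND name = ?",
        (namespace, name),
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# FAIL event processed first (or with no START at all): no parent facet.
upsert_job(conn, "airflow", "etl_dag.load_task")
# START event processed later: parent facet present.
upsert_job(conn, "airflow", "etl_dag.load_task", parent_name="etl_dag")

print(conn.execute(
    "SELECT name, simple_name, parent_job_uuid IS NOT NULL FROM jobs WHERE name = ?",
    ("etl_dag.load_task",),
).fetchall())  # [('etl_dag.load_task', 'load_task', 1)]
```

Because the upsert keys on `(namespace_name, name)`, the order in which `FAIL` and `START` are processed stops mattering: an event carrying a parent facet fills in `parent_job_uuid` and `simple_name`, and an event without one leaves them untouched. The sketch relies on SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` (available since SQLite 3.24); PostgreSQL supports the same clause.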
The `jobs_view` is also updated to omit symlinked jobs, so read queries no longer have to filter them out. `aliases` are moved from the `jobs_fqn` table to the `jobs` table so old job names can still be found.

Checklist
- `CHANGELOG.md` with details about your change under the "Unreleased" section (if relevant; depending on the change, this may not be necessary)
- `sql` database schema migration according to Flyway's naming convention (if relevant)