state: Introduce `DependsOn` for N-to-1 job dependencies #1021

radeksimko · 2022-08-01T16:36:17Z

Background

Prior to this PR, the only way we could chain jobs was via Defer. This had one unfortunate limit, which is that a job can only depend on exactly one other job, i.e. there was just 1-to-1 dependency.

As part of introducing a new job type for parsing provider versions in #1014 it became clear that the new job will need to rely on 2+ other jobs:

parsed files
decoded metadata (via earlydecoder) - so that we have required_providers
parsed module manifest (+ any/all modules decoded) - so that we have required_providers of any submodules

I originally attempted to solve it in a hacky way by introducing WaitForJobs() inside Defer, but that created a few other problems - need to run at least two go routines, so that one can be blocked (waiting) and the other one can be dispatching + I ran into some race conditions, so I binned that hacky approach.

A nice side-effect demonstrated in the 2nd commit is that using DependsOn instead of Defer makes the code (IMHO) clearer and more readable, as there's less nesting and with the right variable names for job IDs it makes it much more obvious what the relationships are.

There is unfortunately still a few remaining (valid) use cases for Defer though - specifically because we cannot schedule any jobs for module paths parsed from a module manifest, until that manifest is parsed - so those few lines of code to do the scheduling still need to run as part of Defer.

The only downside of DependsOn compared to Defer is that the job which we depend on may be long gone/finished by the time the dependent one is dispatched, which means that we don't have direct access to its error/outcome. However I plan to address that problem more holistically as part of #1006 - TL;DR we can still run the jobs and return early if we find out that the data which we were expecting the previous job to provide aren't there.

This is to prevent a race condition

dbanck

I really like the reduced nesting when enqueuing jobs!

I think, I've found two things which need further inspection

internal/state/jobs.go

internal/indexer/walker.go

dbanck

Thank you for fixing/addressing my concerns!

I'm still a bit unsure about the order/dependencies of some jobs.

internal/indexer/module_calls.go

internal/indexer/walker.go

dbanck · 2022-08-03T16:53:45Z

internal/indexer/module_calls.go


-			id, err = idx.jobStore.EnqueueJob(job.Job{
+		if parseId != "" {
+			ids, err := idx.collectReferences(mcHandle, refCollectionDeps)


I didn't notice this before: why are we calling collectReferences here, before parsing variables? While in the walker, we're calling it much later?

The ordering of scheduling is not significant - the *.tfvars parsing and *.tf references should be independent of each other so it doesn't matter in what order do they run - we just happen to schedule them in different order in the two places, DependsOn and Defer (if any) decide in what order they actually get executed.

Okay! Thanks for the explanation!

We could align the ordering of scheduling, but there are also other differences - e.g. that when we parse submodules here, we assume that the modules are not initialised by themselves (given the context) and don't attempt to parse module manifest or obtain schema - both of which would otherwise be two more jobs which reference collection should depend on.

Co-authored-by: Daniel Banck <dbanck@users.noreply.github.com>

github-actions · 2022-09-04T03:48:38Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

radeksimko added the enhancement New feature or request label Aug 1, 2022

radeksimko self-assigned this Aug 1, 2022

radeksimko force-pushed the f-jobs-depends-on branch 2 times, most recently from acf7f67 to cb47858 Compare August 1, 2022 19:44

radeksimko added this to the v0.29.0 milestone Aug 1, 2022

radeksimko marked this pull request as ready for review August 1, 2022 19:58

radeksimko requested a review from a team as a code owner August 1, 2022 19:58

radeksimko mentioned this pull request Aug 1, 2022

Parse provider versions from lock file before obtaining schema #1014

Merged

radeksimko added 3 commits August 2, 2022 10:34

state: Introduce DependsOn for N-to-N job dependencies

7bb6819

indexer: Use DependsOn for (most) job dependencies

4de477d

fix: check module existence & add in the same txn

458dd54

This is to prevent a race condition

radeksimko force-pushed the f-jobs-depends-on branch from cb47858 to 458dd54 Compare August 2, 2022 09:34

dbanck requested changes Aug 3, 2022

View reviewed changes

internal/state/jobs.go Outdated Show resolved Hide resolved

internal/indexer/walker.go Show resolved Hide resolved

fix: account for empty job IDs & wrong ID in cleanup

c4a8f77

radeksimko requested a review from dbanck August 3, 2022 14:53

dbanck reviewed Aug 3, 2022

View reviewed changes

Update internal/indexer/module_calls.go

3768eb7

Co-authored-by: Daniel Banck <dbanck@users.noreply.github.com>

dbanck approved these changes Aug 3, 2022

View reviewed changes

radeksimko merged commit 48ab782 into main Aug 4, 2022

radeksimko deleted the f-jobs-depends-on branch August 4, 2022 06:40

github-actions bot locked as resolved and limited conversation to collaborators Sep 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

state: Introduce `DependsOn` for N-to-1 job dependencies #1021

state: Introduce `DependsOn` for N-to-1 job dependencies #1021

radeksimko commented Aug 1, 2022 •

edited

Loading

dbanck left a comment

dbanck left a comment

dbanck Aug 3, 2022

radeksimko Aug 3, 2022

dbanck Aug 3, 2022

radeksimko Aug 3, 2022

github-actions bot commented Sep 4, 2022

state: Introduce DependsOn for N-to-1 job dependencies #1021

state: Introduce DependsOn for N-to-1 job dependencies #1021

Conversation

radeksimko commented Aug 1, 2022 • edited Loading

Background

dbanck left a comment

Choose a reason for hiding this comment

dbanck left a comment

Choose a reason for hiding this comment

dbanck Aug 3, 2022

Choose a reason for hiding this comment

radeksimko Aug 3, 2022

Choose a reason for hiding this comment

dbanck Aug 3, 2022

Choose a reason for hiding this comment

radeksimko Aug 3, 2022

Choose a reason for hiding this comment

github-actions bot commented Sep 4, 2022

state: Introduce `DependsOn` for N-to-1 job dependencies #1021

state: Introduce `DependsOn` for N-to-1 job dependencies #1021

radeksimko commented Aug 1, 2022 •

edited

Loading