Extend Execution Stages with a new Level - Execution Batches #72

Closed · mrpaulandrew opened this issue Nov 10, 2020 · 2 comments · Fixed by #82

Comments

mrpaulandrew (Owner) commented Nov 10, 2020

Building on community feedback in the following issues:

#61
#62

Allow the framework to support multiple executions of the parent pipeline using a new, higher-level concept called Batches:

  • A batch sits above execution stages.
  • A batch can be linked to one or many execution stages.
  • Batches can be enabled/disabled via properties.
  • Batches can run concurrently, e.g. Batch 1 = Running, Batch 2 = Running.
  • A single batch ID cannot run concurrently with itself.
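
To make the proposal concrete, here is a minimal T-SQL sketch of what the metadata might look like. The table and column names (procfwk.Batches, procfwk.BatchStageLink) are illustrative assumptions, not the framework's actual schema:

```sql
-- Illustrative sketch only; table/column names are assumptions.
CREATE TABLE procfwk.Batches
(
    BatchId UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
    BatchName VARCHAR(255) NOT NULL, -- e.g. 'Hourly', 'Daily'
    Enabled BIT NOT NULL DEFAULT 1,  -- enable/disable via properties
    CONSTRAINT PK_Batches PRIMARY KEY (BatchId)
);

-- A batch sits above stages and can link to one or many of them.
CREATE TABLE procfwk.BatchStageLink
(
    BatchId UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT FK_BatchStageLink_Batches
        REFERENCES procfwk.Batches (BatchId),
    StageId INT NOT NULL,
    CONSTRAINT PK_BatchStageLink PRIMARY KEY (BatchId, StageId)
);
```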

Examples of Batch names:

  1. Hourly
  2. Daily
  3. Weekly
  4. Monthly

Finally, it is expected that any Trigger calling the parent pipeline will include the batch name as a parameter.
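
As a sketch of that hand-off, the parent pipeline could pass the batch name from the trigger into a validation proc that also enforces the "no concurrent runs of the same batch" rule. All names here (CheckForValidBatch, CurrentExecution.BatchId) are assumptions for illustration, not the framework's actual code:

```sql
-- Illustrative sketch; proc and table names are assumptions.
CREATE PROCEDURE procfwk.CheckForValidBatch
    @BatchName VARCHAR(255)
AS
BEGIN
    -- Reject unknown or disabled batch names.
    IF NOT EXISTS (
        SELECT 1 FROM procfwk.Batches
        WHERE BatchName = @BatchName AND Enabled = 1
    )
        THROW 50001, 'Batch name is unknown or disabled.', 1;

    -- A single batch ID cannot run concurrently with itself.
    IF EXISTS (
        SELECT 1
        FROM procfwk.CurrentExecution AS ce
        INNER JOIN procfwk.Batches AS b ON b.BatchId = ce.BatchId
        WHERE b.BatchName = @BatchName
    )
        THROW 50002, 'This batch is already running.', 1;
END;
```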

@mrpaulandrew added the labels enhancement (New feature or request), Solution Area: Orchestrator (Azure Data Factory or Azure Synapse Analytics - Processing pipelines), Solution Area: SQL Database (Metadata database housed within a versioned SQL Server database) and Solution Area: Functions App (Azure Functions App - Middleware callers) on Nov 10, 2020

@mrpaulandrew changed the title from "Add a Property to Enable/Disable Multiple Concurrent Parent Triggers" to "Extend Execution Stages with a new Level - Execution Batches" on Nov 11, 2020
Mallik-G commented

Hi @mrpaulandrew,

I had a similar requirement to support multiple parallel executions. I approached it in much the same way, except that I called the higher-level concept a "Job".

  • A Job is comprised of one or more stages.
  • Multiple jobs can run concurrently.
  • The JobId is passed as a parameter from the parent pipeline's trigger and is made available to all lower levels.
  • The pipelines linked to all stages in a job are inserted into LocalExecution.
  • Once the job succeeds, its log records are moved to ExecutionLog.
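
Roughly, the lifecycle described above could look like this in T-SQL (an illustrative sketch; the column lists and names are assumptions, not the fork's actual code):

```sql
-- Illustrative sketch of the job lifecycle; names/columns are assumptions.
DECLARE @JobId INT = 1;

-- Queue every pipeline linked to the job's stages.
INSERT INTO procfwk.LocalExecution (JobId, StageId, PipelineId, PipelineStatus)
SELECT @JobId, s.StageId, p.PipelineId, 'Not Started'
FROM procfwk.Stages AS s
INNER JOIN procfwk.Pipelines AS p ON p.StageId = s.StageId
WHERE s.JobId = @JobId;

-- Once the job succeeds, archive its rows to the long-term log.
INSERT INTO procfwk.ExecutionLog
SELECT * FROM procfwk.LocalExecution WHERE JobId = @JobId;

DELETE FROM procfwk.LocalExecution WHERE JobId = @JobId;
```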

Metadata database changes include:
- changes to the Stages table to support job and stage linking
- JobId added as a parameter to stored procs where needed
- a new JobProperties table for properties that are job-specific, such as OverrideRestart and FailureHandling
- framework-level properties that are common across jobs kept in the Properties table
- related procs and functions developed to support the above
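
For illustration, the shape of those changes might be roughly as follows (a hypothetical sketch; the actual DDL in the fork may differ):

```sql
-- Illustrative sketch; names are assumptions.

-- Link each stage to its owning job.
ALTER TABLE procfwk.Stages ADD JobId INT NULL;

-- Job-specific properties (e.g. OverrideRestart, FailureHandling)
-- move out of the shared Properties table.
CREATE TABLE procfwk.JobProperties
(
    JobId INT NOT NULL,
    PropertyName VARCHAR(128) NOT NULL,
    PropertyValue NVARCHAR(MAX) NULL,
    CONSTRAINT PK_JobProperties PRIMARY KEY (JobId, PropertyName)
);

-- Stored procs that need job scoping gain a @JobId parameter
-- (proc name below is illustrative only):
-- EXEC procfwk.GetStages @JobId = 1;
```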

Examples of Jobs:

  • Daily Sales Ingestion Job (JobId 1)

    • Ingestion Stage (StageId 1)
      • CopyToDataLake (Pipeline 1)
    • Transformation Stage (StageId 2)
      • LaunchDatabricksJob (Pipeline 2)
    • Load Stage (StageId 3)
      • CopyToDW (Pipeline 3)
  • Daily Marketing Egression Job (JobId 2)

    • Prepare Stage (StageId 4)
      • LaunchDatabricksJob (Pipeline 4)
    • Extract Stage (StageId 5)
      • CopyToSFTP (Pipeline 5)
  • Weekly Campaign Ingestion Job (JobId 3)

    • Ingestion Stage (StageId 6)
      • CopyToDataLake (Pipeline 6)
    • Transformation Stage (StageId 7)
      • LaunchDatabricksJob (Pipeline 7)
    • Load Stage (StageId 8)
      • CopyToDW (Pipeline 8)

I didn't have much use for the Grandparent level in my work setup, so my idea was to give the ParentPipeline different triggers (schedule or tumbling window) and pass the JobId to the pipeline as a parameter.

Worker pipelines often get reused within a stage in the same job or across jobs. For example, LaunchDatabricksJob is a pipeline that gets reused a lot at my work, where I need to pass different invocation params, class name and jar name, to submit different apps/jars to the Databricks cluster.
To do that, I just add a new entry into the Pipelines table mapping the stage and pipeline (like a new pipeline instance), which causes a new PipelineId to be created, making that particular pipeline instance unique. This PipelineId is then used in the PipelineParameters table to associate the parameters needed for that particular instance, as sketched below.
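
A sketch of that pattern, with illustrative values and assumed column names:

```sql
-- Illustrative sketch; table/column names are assumptions.
DECLARE @NewPipelineId INT;

-- Map the reused ADF pipeline to another stage; the identity column
-- yields a new PipelineId, i.e. a new unique pipeline instance.
INSERT INTO procfwk.Pipelines (StageId, PipelineName)
VALUES (7, 'LaunchDatabricksJob');

SET @NewPipelineId = SCOPE_IDENTITY();

-- Attach instance-specific invocation params to that new PipelineId.
INSERT INTO procfwk.PipelineParameters (PipelineId, ParameterName, ParameterValue)
VALUES
    (@NewPipelineId, 'ClassName', 'com.example.CampaignTransform'),
    (@NewPipelineId, 'JarName',   'campaign-transform.jar');
```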

Examples of triggers:

DailyTrigger-SalesIngestion => Scheduled At 3 AM Daily in AUS Time Zone => Parameter: JobId = 1

DailyTrigger-MarketingEgression => TumblingWindow trigger Scheduled at 4 AM Daily UTC => Parameter: JobId = 2

WeeklyTrigger-CampaignIngestion => Scheduled At 3 AM On Mondays in AUS Time Zone => Parameter: JobId = 3

I had this as a work-in-progress project locally for some time, but after looking at the recent conversations around this issue, I have pulled the recent changes from the master branch, applied my changes (metadata database changes only) and pushed them to my fork. I'm planning to work on the ADF pipeline changes tonight and test end to end.

Please have a look at the changes and let me know if they help:
https://github.com/Mallik-G/ADF.procfwk/tree/feature/mallik

Thanks,
Mallik

NJLangley (Contributor) commented

@mrpaulandrew I have pushed my implementation of this onto my fork here:
https://github.com/NJLangley/ADF.procfwk/tree/feature/batches

There are a few outstanding bits to fix:

  • Removing the stage from the pipelines table; it's now on the BatchPipelineLink table.
  • Sorting out the check that ensures two batches with the same pipeline/params cannot run at the same time. Maybe we could do something clever here to see if another batch is running the pipeline with exactly the same params and wait for that to finish, instead of firing it off again? One possible shape for that check is sketched after this list.
  • Updating the sample data procs; I'll try to get that sorted in the next few days.
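
As one hypothetical shape for that check: since the parameters hang off the PipelineId, two runs of the same PipelineId imply the same params, so the guard can reduce to looking for a running row with that PipelineId (table and column names are assumptions, not the fork's actual code):

```sql
-- Illustrative sketch; names are assumptions.
IF EXISTS (
    SELECT 1
    FROM procfwk.CurrentExecution
    WHERE PipelineId = @PipelineId
      AND PipelineStatus = 'Running'
)
BEGIN
    -- Alternatively, wait and re-check rather than failing outright.
    RAISERROR('This pipeline instance is already running with the same parameters.', 16, 1);
END;
```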

Let me know if you have questions / any issues testing it

@mrpaulandrew removed the Solution Area: Functions App (Azure Functions App - Middleware callers) label on Nov 13, 2020

@mrpaulandrew linked pull request #82 on Nov 19, 2020 that will close this issue