DC currently executes tasks sequentially. This issue uses the London Cycle Traffic Air Quality recipe as a running example to explain the current approach and a possible way to make DC multithreaded.
The recipe executes 8 tasks in total when it runs:
Task 1 -> Download LocalAuthority data from OaImporter
Task 2 -> Download TrafficCounts data from TrafficCountImporter
Task 3 -> Download airQuality data from LAQNImporter
Task 4 -> Geographic aggregation of the NO2 40 ug/m3 and BicycleFraction fields
Task 5 -> Take the mean of NO2 40 ug/m3 using LatestValueField
Task 6 -> Calculate BicycleFraction by dividing the sum of CountPedalCycles by the sum of CountCarsTaxis
Task 7 -> Add CountPedalCycles using LatestValueField
Task 8 -> Add CountCarsTaxis using LatestValueField
Current Approach
In the current approach DC executes one task at a time, so the order of execution would be:
Task 1, Task 2, Task 3, Task 5, Task 7, Task 8, Task 6, Task 4 (one at a time)
Proposed Approach
We could execute certain tasks in parallel, because some tasks do not depend on any other task.
We could create a dependency graph, e.g.:
Tasks ------> Dependencies
Task 1------> 0
Task 2------> 0
Task 3------> 0
Task 4------> Task 5, 6
Task 5------> Task 1, 2, 3
Task 6------> Task 7, 8
Task 7------> Task 1, 2, 3
Task 8------> Task 1, 2, 3
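As a rough illustration, the table above could be held as a mapping from each task to the set of tasks it waits on. The sketch below is not DC code: the name `DEPENDENCIES` and the helper `ready_tasks` are made up for this example, and Python is used only to keep the sketch short.

```python
# Dependency graph from the table above: task number -> tasks it waits on.
# An empty set corresponds to "0" dependencies in the table.
DEPENDENCIES = {
    1: set(),
    2: set(),
    3: set(),
    4: {5, 6},
    5: {1, 2, 3},
    6: {7, 8},
    7: {1, 2, 3},
    8: {1, 2, 3},
}


def ready_tasks(dependencies, done):
    """Return, in sorted order, the tasks not yet run whose dependencies are all done."""
    return sorted(
        task
        for task, deps in dependencies.items()
        if task not in done and deps <= done
    )


print(ready_tasks(DEPENDENCIES, done=set()))      # [1, 2, 3]
print(ready_tasks(DEPENDENCIES, done={1, 2, 3}))  # [5, 7, 8]
```

Any task returned by `ready_tasks` has nothing left to wait for, so all tasks in one returned list are candidates for running at the same time.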
Now we execute in parallel only those tasks that have 0 dependencies, and keep updating the dependency graph as tasks finish, e.g.:
We could execute Tasks 1, 2 and 3 in parallel. Once they are done, we update the dependency graph and remove them from the dependencies of Tasks 5, 7 and 8.
Then we execute Tasks 5, 7 and 8 in parallel. Once they are done, we update the dependency graph again and remove them from the dependencies of Tasks 4 and 6.
Note that Task 4 still cannot be executed in parallel with Task 6, because it has Task 6 as a dependency, so we execute Task 6 and then Task 4 sequentially.
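The steps described above could be sketched roughly as follows. This is only an illustration of the scheduling idea under stated assumptions, not DC's implementation: `run_task` is a hypothetical stand-in for the real work of a task, `run_in_waves` and `DEPENDENCIES` are names invented for the example, and Python's standard `ThreadPoolExecutor` stands in for whatever thread pool DC would actually use.

```python
from concurrent.futures import ThreadPoolExecutor

# Dependency graph from the issue: task number -> tasks it waits on.
DEPENDENCIES = {
    1: set(), 2: set(), 3: set(),
    4: {5, 6},
    5: {1, 2, 3},
    6: {7, 8},
    7: {1, 2, 3},
    8: {1, 2, 3},
}


def run_task(task):
    """Hypothetical stand-in for the real work (download, aggregate, ...)."""
    return f"result of task {task}"


def run_in_waves(dependencies, run, max_workers=4):
    """Run all tasks with no outstanding dependencies in parallel, wave by wave."""
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    results, waves = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining:
            ready = sorted(t for t, deps in remaining.items() if not deps)
            if not ready:
                raise ValueError("dependency cycle detected")
            # Consuming pool.map blocks until every task in the wave is done.
            for task, result in zip(ready, pool.map(run, ready)):
                results[task] = result
            waves.append(ready)
            # Update the graph: drop finished tasks and the edges pointing at them.
            for task in ready:
                del remaining[task]
            for deps in remaining.values():
                deps.difference_update(ready)
    return waves, results


waves, results = run_in_waves(DEPENDENCIES, run_task)
print(waves)  # [[1, 2, 3], [5, 7, 8], [6], [4]]
```

Running in waves is the simplest reading of the proposal. A finer-grained scheduler could start each task as soon as its own dependencies finish instead of waiting for the whole wave, at the cost of more bookkeeping.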
Making DC multi-threaded could significantly improve run times.
Error log
None
This may be related: would this enable the export of completed results if a build were to fail halfway? For example, if a build fails because some resource is unavailable (e.g. Server returned HTTP response code: 429 for URL), it would be nice to get whatever had already been analysed, even though the build failed halfway.
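One way the question above might be approached with a dependency graph is to record a failure, skip only the tasks downstream of it, and keep every result that did complete. The sketch below is an assumption about how that could look, not a description of DC: `RateLimited`, `run_task` and `run_keeping_partial_results` are hypothetical names, the failure of Task 3 is simulated, and the tasks run sequentially to keep the example short.

```python
# Dependency graph from the issue: task number -> tasks it waits on.
DEPENDENCIES = {
    1: set(), 2: set(), 3: set(),
    4: {5, 6},
    5: {1, 2, 3},
    6: {7, 8},
    7: {1, 2, 3},
    8: {1, 2, 3},
}


class RateLimited(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def run_task(task):
    """Hypothetical task body; Task 3 is made to fail for the demonstration."""
    if task == 3:
        raise RateLimited("HTTP response code: 429")
    return f"result of task {task}"


def run_keeping_partial_results(dependencies, run):
    """Run tasks in dependency order and keep completed results if one fails."""
    results, failed, skipped = {}, {}, set()
    pending = dict(dependencies)
    while pending:
        progressed = False
        for task in sorted(pending):
            deps = pending[task]
            if deps & (set(failed) | skipped):
                skipped.add(task)         # an upstream task failed or was skipped
            elif deps <= set(results):
                try:
                    results[task] = run(task)
                except Exception as error:
                    failed[task] = error  # record the failure and carry on
            else:
                continue                  # still waiting on upstream tasks
            del pending[task]
            progressed = True
        if not progressed:
            raise ValueError("dependency cycle detected")
    return results, failed, skipped


results, failed, skipped = run_keeping_partial_results(DEPENDENCIES, run_task)
print(sorted(results))  # [1, 2]
print(sorted(failed))   # [3]
print(sorted(skipped))  # [4, 5, 6, 7, 8]
```

With the graph as written in this issue, Tasks 5, 7 and 8 all depend on all three downloads, so a failed download leaves only the other downloads exportable. A finer-grained graph would be needed for more of the analysis to survive a single failure.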