\documentclass{sig-alternate}
\input{head}
\input{include}
%\usepackage{listings}
\usepackage{paralist}
\usepackage{multirow}
% \usepackage{draftwatermark}
% \SetWatermarkLightness{0.8}
% \SetWatermarkText{Draft}
% \SetWatermarkVerCenter{13cm}
% \SetWatermarkHorCenter{10cm}
% \SetWatermarkScale{1}
\begin{document}
\CopyrightYear{2015}
\title{A Comprehensive Perspective on the \pilotjob Abstraction}
%\jhanote{either abstraction is singular and therefore ``the'', or we use the
%plural form}}
\numberofauthors{3}
\author{
\alignauthor Matteo Turilli \\
\affaddr{RADICAL Laboratory, ECE}\\
\affaddr{Rutgers University}\\
\affaddr{New Brunswick, NJ, USA}\\
\email{matteo.turilli@rutgers.edu}
% \email{}
% 2nd. author
\alignauthor Mark Santcroos\\
\affaddr{RADICAL Laboratory, ECE}\\
\affaddr{Rutgers University}\\
\affaddr{New Brunswick, NJ, USA}\\
\email{mark.santcroos@rutgers.edu}
% \email{}
\and
% 3rd. author
\alignauthor Shantenu Jha\titlenote{Corresponding author}\\
\affaddr{RADICAL Laboratory, ECE}\\
\affaddr{Rutgers University}\\
\affaddr{New Brunswick, NJ, USA}\\
\email{shantenu.jha@rutgers.edu}
% \email{}
% use '\and' if you need 'another row' of author names
% \and
% % 4th. author
% \alignauthor Lawrence P. Leipuner\\
% \affaddr{Brookhaven Laboratories}\\
% \affaddr{Brookhaven National Lab}\\
% \affaddr{P.O. Box 5000}\\
% \email{lleipuner@researchlabs.org}
% % 5th. author
% \alignauthor Sean Fogarty\\
% \affaddr{NASA Ames Research Center}\\
% \affaddr{Moffett Field}\\
% \affaddr{California 94035}\\
% \email{fogartys@amesres.org}
% % 6th. author
% \alignauthor Charles Palmer\\
% \affaddr{Palmer Research Laboratories}\\
% \affaddr{8600 Datapoint Drive}\\
% \affaddr{San Antonio, Texas 78229}\\
% \email{cpalmer@prl.com}
}
\maketitle
\begin{abstract}
\pilotjob systems play an important role in supporting distributed scientific
computing. They are used to consume more than 700 million CPU hours a year by
the Open Science Grid communities, and to process up to 5 million jobs a week
for the ATLAS experiment on the Worldwide LHC Computing Grid. With the
increasing importance of task-level parallelism in high-performance computing,
\pilotjob systems are also witnessing adoption beyond their traditional
domains. This paper offers a comprehensive analysis of the \pilotjob
abstraction, assessing its evolution, properties, and implementation as
multiple \pilotjob software systems. Notwithstanding the growing impact on
scientific research, there is no agreed-upon definition of a \pilotjob system
and no clear understanding of the underlying \pilot abstraction and paradigm.
This lack of foundational understanding has arguably contributed to a
proliferation of unsustainable \pilotjob implementations with no shared best
practices or interoperability, ultimately hindering a realization of the full
impact of \pilotjobs. This paper offers the conceptual tools to promote this
fundamental understanding while critically reviewing the state of the art of
\pilotjob implementations. The five main contributions of this paper are: (i)
an analysis of the motivations and evolution of the \pilotjob abstraction;
(ii) an outline of the minimal set of distinguishing functionalities; (iii)
the definition of a core vocabulary to reason consistently about \pilotjobs;
(iv) the description of core and auxiliary properties of \pilotjob systems;
and (v) a critical review of the current state of the art of their
implementations. These contributions are brought together to illustrate the
generality of the \pilotjob paradigm, to discuss the challenges in distributed
computing that it addresses, and to outline future opportunities.
% These contributions are brought together to illustrate the defining
% characteristics of the \pilotjob paradigm, its generality, and the main
% opportunities and challenges posed by its support of distributed computing.
% dispersing the available resources across a fragmented development landscape
% There is no agreed upon definition of \pilotjobs; however a functional
% attribute of \pilotjobs that is generally agreed upon is they are
% tools/services that support multi-level and/or application-level scheduling by
% providing a scheduling overlay on top of the system-provided schedulers.
% Nearly everything else is either specific to an implementation, open to
% interpretation or not agreed upon. For example, are \pilotjobs part of the
% application space, or part of the services provided by an infrastructure? We
% will see that close-formed answers to questions such as whether \pilotjobs are
% system-level or application-level capabilities are likely to be elusive.
% Hence, this paper does not make an attempt to provide close-formed answers,
% but aims to provide appropriate context, insight and analysis of a large
% number of \pilotjobs, and thereby bring about a hitherto missing consistence
% in the community's appreciation of \pilotjobs. Specifically this paper aims
% to provide a comprehensive perspective of \pilotjobs. A primary motivation for
% this work stems from our experience when looking for an interoperable,
% extensible and general-purpose \pilotjob; in the process, we realized that
% such a capability did not exist. The situation was however even more
% unsatisfactory: in fact there was no agreed upon definition or conceptual
% framework of \pilotjobs. To substantiate these points of view, we begin by
% discussing some existing \pilotjobs and the different aspects of these
% \pilotjobs, such as the applications scenarios that they have been used and
% how they have been used. The limited but sufficient sampling highlights the
% variation, and also provides both a motivation and the basis for developing an
% implementation agnostic terminology and vocabulary to understand
% \pilotjobs; Section \S3 attempts to survey the landscape/eco-system of
% \pilotjobs. With an agreed common framework/vocabulary to discuss and
% describe \pilotjobs, we proceed to analyze the most commonly utilized
% \pilotjobs and in the process provide a comprehensive survey of \pilotjobs,
% insight into their implementations, the infrastructure that they work on, the
% applications and application execution modes they support, and a frank
% assessment of their strengths and limitations. An inconvenient but important
% question -- both technically and from a sustainability perspective that must
% be asked: why are there so many similar seeming, but partial and slightly
% differing implementations of \pilotjobs, yet with very limited
% interoperability amongst them? Examining the reasons for this
% state-of-affairs provides a simple yet illustrative case-study to understand
% the state of the art and science of tools, services and middleware
% development. Beyond the motivation to understand the current landscape of
% \pilotjobs from both a technical and a historical perspective, we believe a
% survey of \pilotjobs is a useful and timely undertaking as it provides
% interesting insight into understanding issues of software sustainability.
%
% believe that a survey of \pilotjobs provides and appreciation for the richness
% of the \pilotjobs landscape. is not to discuss the \pstar conceptual
% framework, but That led to the \pstar model.
\end{abstract}
\keywords{\pilotjob, \pilot abstraction, distributed applications, distributed
systems, distributed resource management}
% -----------------------------------------------------------------------------
% INTRODUCTION
%
\section{Introduction}
\label{sec:intro}
% \jhanote{Building tools and components that have well-defined and
% well-characterized behavior, including performance. This leads to
% descriptive models of pilot-jobs, which while pervasive in distributed
% computing, are conspicuous by their absence in high-performance and
% data-intensive computing. By providing a firm theoretical underpinning
% pilot-jobs [3], one can provide a more “programmable” and flexible yet
% common pilot-job for different types of distributed infrastructure, and also
% extend the concept of pilot-jobs to high-performance and data-intensive
% computing [4]}
The seamless uptake of distributed computing infrastructures by scientific
applications has arguably been limited by the lack of pervasive and
simple-to-use abstractions at the development, deployment, and execution level.
As suggested by their proliferation on production distributed computing
infrastructures, \pilotjob systems are among the few widely-used abstractions.
A variety of \pilotjob systems have emerged:
Glidein/GlideinWMS~\cite{frey2002condorG}, the Coaster
System~\cite{wilde2011swift}, DIANE~\cite{moscicki2003diane},
DIRAC~\cite{casajus2010dirac}, \panda~\cite{chiu2010pilot},
GWPilot~\cite{rubio2015gwpilot}, Nimrod/G~\cite{buyya2000nimrod},
Falkon~\cite{raicu2007falkon}, and MyCluster~\cite{walker2006creating}, to name
a few. These systems are for the most part functionally equivalent and
motivated by similar objectives; nonetheless, their implementations often serve
specific use cases, target specific resources, and lack interoperability,
fragmenting development effort and limiting reuse across communities.
% as they support the decoupling of workload submission from resource
% assignment. N
% The situation is reminiscent of the proliferation of functionally similar yet
% incompatible workflow systems, where in spite of significant {\it a
% posteriori} effort on workflow system extensibility and interoperability,
% these objectives remain difficult if not unfeasible.
% \pilotjobs excel in terms of the number and types of applications that use
% them, as well as the number of production distributed cyberinfrastructures
% that support them. \msnote{ref?}
% \mtnote{Should we use just `pilot'}\jhanote{I think you have proposed a
% graceful transition: from pilotjobs to pilotsystems? If so, should we stick
% with pilotjobs here?}
The fundamental reason for the proliferation of \pilotjob systems is that they
provide a simple solution to the rigid and static resource management
historically found in high-performance and distributed computing. There are two
ways in which \pilotjobs break free of this rigid resource utilization model:
(i) through a process often referred to as
late-binding~\cite{moscicki2011,glatard2010,delgado2014}, \pilotjobs make the
selection of heterogeneous and dynamic resources easier and more effective; and
(ii) \pilotjobs decouple the workload specification from the task execution
management. The former results in the ability to utilize resources
``dynamically''; the latter simplifies the scheduling of workloads on those
resources.
% management of improving the efficiency of task assignment while shielding
% applications from having to manage tasks across such resources.
% \onote{I think the most important reasons why Pilot Jobs being so popular
% (and re-invented over and over again) is that they allow the execution of
% small (i.e., singe / few-core) tasks efficiently on HPC infrastrucutre by
% massively reducing queueing time. HPC sites (from schedulers to policies)
% have always been (and still are) discrimatory against this type of
% workload in favor of the large, tightly-coupled ones. Pilot-Jobs try to
% counteract. While this is certainly not the main story that we want to
% tell, this should IMHO still be mentioned. } \jhanote{This is definitely
% one of the main reasons, but as Melissa pointed out it during RADICAL
% call, it is by no means the only reason. Need to get the different reasons
% down here.. then find a nice balance and description}
% (thus providing {\it post-facto} justification of its needs)
% \mtnote{Should we have a paragraph explaining the core contribution offered by
% this paper?} \jhanote{yes} \mtnote{Should I write a first draft of it?}
% Though mostly as a pragmatic solution to the need of improving throughput
% performance of distributed applications.
\pilotjobs have been almost exclusively developed within pre-existing systems
and middleware satisfying specific scientific requirements. As a consequence,
the development of \pilotjobs has not been grounded on a robust understanding
of underpinning abstractions, or on a well-understood set of dedicated design
principles. Furthermore, the terminology used to describe functionally
equivalent systems is inconsistent; the proliferation and specificity are both
a manifestation of this lack of common understanding and vocabulary and a
factor compounding it. Not surprisingly, the functionalities and properties of
\pilotjobs have been understood mostly, if not exclusively, in relation to the
needs of the containing software systems or of the use cases justifying their
immediate development.
This approach is not problematic in itself and has led to effective
implementations that serve many millions of jobs a year on diverse computing
platforms~\cite{maeno2014evolution,katz2012}. However, the lack of conceptual
clarity and of an explicit enunciation of the \pilotjob computing paradigm has
arguably undermined the development of specific implementations and contributed
to an unsustainable software ecosystem. This limitation is illustrated not only
by the duplication of effort, but also by an overall immaturity of the
available systems in terms of functionalities, flexibility, portability,
interoperability, and, most often, robustness. Ultimately, these also
contribute to a high cost of development and low software sustainability,
although other factors, such as the relative ease of developing new software
over reusing existing systems, undoubtedly also play a role.
This paper is motivated by the fact that, in spite of the demonstrated
potential and proliferation of \pilotjob systems, there remains a significant
lack of clarity and understanding about the \pilotjob abstraction. As alluded
to, this has arguably resulted in significant overhead and repetition of
effort. Looking forward, with the growing importance of and need for scalable
task-level parallelism and dynamic resource management in high-performance
computing, the lack of conceptual clarity might have similar and potentially
profound consequences for the next generation of supercomputing.
This paper offers a critical analysis of the current state of the art,
providing the conceptual tools required to appreciate the properties of the
\pilot paradigm, i.e., the abstraction and the methodology underlying \pilotjob
systems. The remainder of this paper is divided into four sections.
\S\ref{sec:history} offers a critical review of the functional underpinnings of
the \pilot abstraction and how it has been evolving into \pilotjob systems and
systems with pilot-like characteristics.
In~\S\ref{sec:understanding}, the minimal set of capabilities and properties
characterizing the design of a \pilotjob system are derived. A vocabulary is
then defined to be consistently used across
\pilotjob system designs and implementations.
In~\S\ref{sec:analysis}, the focus shifts from analyzing the design of a
\pilotjob system to critically reviewing the characteristics of a representative
set of its implementations. Core and auxiliary implementation properties are
introduced and then used alongside the functionalities and terminology defined
in~\S\ref{sec:understanding} to describe and compare \pilotjob system
implementations.
Finally,~\S\ref{sec:discussion} closes the paper by outlining the \pilot
paradigm, arguing for its generality, and elaborating on how it impacts and
relates to both other middleware and the application layer. The outcome of the
critical review of the current implementation state of the art is used to give
insights about the future directions and challenges faced by the \pilot
paradigm.
% -----------------------------------------------------------------------------
% SECTION 2
%
%\section{Functional Underpinnings and Evolution of Pilot Abstraction}
\section{Evolution of Pilot Abstraction and Systems}
\label{sec:history}
% The origin and motivations for devising the \pilot abstraction, developing its
% many implementations and realize a full-fledge \pilot paradigm can be traced
% back to five main notions:
At least five features need elucidation to understand the technical origins and
motivations of the \pilot abstraction: task-level distribution and parallelism,
\MW pattern, multi-tenancy, multi-level scheduling, and resource placeholding.
Although none of these features taken individually is unique to the \pilot
abstraction, the \pilot abstraction brings them together into an integrated
and collective capability. This section offers an overview of these five
features and an analysis of their relationship with the \pilot abstraction. A
chronological perspective is taken so as to contextualize the evolution of the
\pilot abstraction into its diverse implementations.
% the variation in their scope, semantics, and implementation is one of the
% defining reasons for the multiplicity of this abstraction implementations.
\jhanote{I don't think the \MW pattern is a functional area..A candidate for
removal from detailed discussion in 2.1?}\mtnote{I would prefer to remove
`functional' than \MW discussion from 2.1 as that serves as a base for the
somewhat less extended discussion in 3. Would that work?} \mtnote{I am not sure
what it means for an abstraction to ``initiate'' a feature so I removed that
predicate.} \mtnote{I shortened the paragraph to make clearer the relationship
between the five features and the \pilot abstraction.}
\mtnote{Possibly add a paragraph summarizing the salient evolutionary steps of
\pilot systems as described in subsection 2.2.}
% ------------------------------------------------------------------------------
% 2.1
\subsection{Functional Underpinnings of the Pilot Abstraction}
\label{sec:histabstr}
To the best of the authors' knowledge, the term ``pilot'' was first coined in
2004 in the context of the Worldwide Large Hadron Collider (LHC) Computing
Grid (WLCG) Data Challenge\footnote{Based on private communication.}
\cite{lhc_url,lhc1995large,wlcg_url,bonacorsi2007wlcg}, and then introduced in
writing as ``pilot-agent'' in a 2005 LHCb
report~\cite{nobrega2005lhcb,lhcb_url}. Despite its relatively recent explicit
naming, the \pilot abstraction addresses a problem already well known at the
beginning of the twentieth century: {\bf task-level} distribution and
parallelism on multiple resources.
In 1922 Lewis Fry Richardson devised a Forecast
Factory~\cite{lynch1999richardson} (Figure~\ref{fig:forecast_factory}) to solve
systems of differential equations for weather
forecasting~\cite{richardson1922weather}. The factory required 64,000 ``human
computers'' supervised by a senior clerk, who would distribute portions of the
differential equations to the computers so that each could forecast the weather
of a specific region of the globe; the computers would perform their
calculations and send the results back to the clerk. The Forecast Factory was
thus an early conceptualization both of what is today called
``high-performance'' task-level parallelism and of the coordination pattern
for distributed and parallel computation called ``\MW''.
\begin{figure}[t]
\centering
\includegraphics[width=.45\textwidth]{figures/forecast-factory.jpg}
\caption{\textit{Forecast Factory} as envisioned by Lewis Fry Richardson.
Drawing by Fran{\c c}ois Schuiten.}
\label{fig:forecast_factory}
\end{figure}
The clerk of the Forecast Factory is the ``master'' while the human computers
are her ``workers''. Requests and responses go back and forth between the
master and all her workers. Each worker has no information about the overall
computation nor about the state of any other worker; the master has an
exclusive global view both of the overall problem and of its progress towards a
solution. As such, {\bf \MW} is a coordination pattern allowing for the
structured distribution of tasks so as to orchestrate their parallel and
concurrent execution. This invariably translates into a reduced time to
completion of the overall computation when compared to a coordination pattern
in which each equation is solved sequentially by a single worker.
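
To make the \MW pattern concrete, the following minimal sketch renders it in
code (Python; the \texttt{worker} and \texttt{master} names and the toy
``forecast'' payload are purely illustrative): a master distributes independent
tasks to workers through a shared queue, and only the master holds the global
view of the computation.

{\small
\begin{verbatim}
# Master-worker sketch: workers see one task at a
# time; only the master sees the whole computation.
from multiprocessing import Process, Queue

def worker(tasks, results):
    for region in iter(tasks.get, None):  # None=stop
        results.put((region, "forecast:" + region))

def master(regions, n_workers=4):
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker,
                     args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for r in regions:       # distribute the workload
        tasks.put(r)
    for _ in procs:         # one stop marker each
        tasks.put(None)
    out = dict(results.get() for _ in regions)
    for p in procs:
        p.join()
    return out

if __name__ == "__main__":
    print(master(["NA", "EU", "ASIA"]))
\end{verbatim}
}
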
Modern silicon-based, high-performance machines introduced at least three key
differences compared to the carbon-based Forecast Factory devised by
Richardson. First, most modern high-performance machines are meant to be used
by multiple users, i.e., they support multi-tenancy. Second, diverse
high-performance machines are made available to the scientific community, each
with distinctive properties in terms of architecture, capacity, capabilities,
and interfaces. Third, high-performance machines support different types of
applications, depending on the applications' communication and coordination
models.
{\bf Multi-tenancy} has defined the way in which high-performance computing
resources are exposed to their users. Job schedulers, often called ``batch
queuing systems''~\cite{czajkowski1998} and first used in the time of punch
cards~\cite{katz1966,silberschatz1998}, adopt the batch processing concept to
promote efficient and fair resource sharing. Job schedulers implement a
usability model where users submit computational tasks called ``jobs'' to a
queue. The execution of these jobs is delayed until the required amount of
resources becomes available. The extent of the delay depends mostly on the size
and duration of the submitted job, resource availability, and policies (e.g.,
fair usage).
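
As an illustration of how such delays arise, the following toy simulation
(node counts and durations are hypothetical) models a FIFO batch queue: a job
starts only once enough nodes are free, so a small job can wait a long time
behind a large one.

{\small
\begin{verbatim}
# Toy FIFO batch queue: jobs start only when enough
# nodes are free; small jobs wait behind large ones.
import heapq

def simulate(jobs, free_nodes):
    """jobs: list of (name, nodes, duration)."""
    running, now, start = [], 0, {}
    for name, need, dur in jobs:     # FIFO order
        while free_nodes < need:     # wait for space
            now, freed = heapq.heappop(running)
            free_nodes += freed
        start[name] = now
        free_nodes -= need
        heapq.heappush(running, (now + dur, need))
    return start

print(simulate([("big", 8, 100), ("small", 1, 5)], 8))
# -> {'big': 0, 'small': 100}
\end{verbatim}
}
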
High-performance machines are often characterized by several types of
heterogeneity and diversity. Users are faced with diverse job description
languages, submission commands, and configuration options. Furthermore, the
number of queues exposed to the users and their properties, such as walltime
limits and compute-node sharing policies, vary from machine to machine.
Finally, each machine may be designed and configured to support only specific
types of application.
The resource provisioning of high-performance machines is limited, irregular,
and largely unpredictable~\cite{downey1997,wolski2003,li2004,tsafrir2007}. By
definition, the resources accessible and available at any given time can be less
than those demanded by all the active users. Furthermore, the resource usage
patterns are not stable over time and alternating phases of resource
availability and starvation are common~\cite{Furlani2013,Lu2013}. This landscape
has led not only to a continuous optimization of the management of each resource
but also to the development of alternative strategies to expose and serve
resources to the users.
% \jhanote{I do not think the two are equivalent. At least not in common usage.
% i.e., you can definitely do meta scheduling without multilevel scheduling.
% Please argue otherwise, else I will remove meta scheduling.}
{\bf Multi-level scheduling} is one of the strategies devised to improve
resource access across multiple high-performance and distributed machines. The
idea is to hide the scheduling point of each high-performance machine behind a
single scheduler: users or applications submit their tasks to that scheduler,
which negotiates and orchestrates the distribution of the tasks via the
scheduler of each available machine. While this approach promises an increase
in both scale and usability of applications, it also introduces complexities
across resources, middleware, and applications.
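
The idea can be sketched as follows (interfaces and scheduling policy are
hypothetical): a meta-scheduler exposes a single submission point and delegates
each task to the scheduler of one of the available machines.

{\small
\begin{verbatim}
# Multi-level scheduling sketch: one meta-scheduler
# delegating to per-machine batch schedulers.
class MachineScheduler:
    def __init__(self, name, free_cores):
        self.name = name
        self.free_cores = free_cores

    def submit(self, task):          # second level
        print(task, "queued on", self.name)
        self.free_cores -= 1

class MetaScheduler:
    def __init__(self, machines):
        self.machines = machines

    def submit(self, task):          # first level
        best = max(self.machines,
                   key=lambda m: m.free_cores)
        best.submit(task)

meta = MetaScheduler([MachineScheduler("A", 16),
                      MachineScheduler("B", 64)])
for i in range(3):
    meta.submit("task-%d" % i)
\end{verbatim}
}
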
% \jhanote{need to be more specific than grid computing} \mtnote{Any insight in
% what kind of specificity you are thinking about? Grid and cloud computing
% are at the same level of generality so I will have to specify also the
% latter.}
Several approaches have been devised to manage the complexities associated with
multi-level scheduling. For example, some approaches target the resource
layer~\cite{raicu2007falkon,singh2005,ramakrishnan2006toward,foster2008,juve2008,villegas2012,song2009};
others the application layer as, for example, with workflow
systems~\cite{taylor2014,curcin2008scientific,juve2008,balderrama2012scalable}.
All these approaches offered and still offer some degree of success for
specific applications and use cases, but a general solution based on
well-defined and robust abstractions has yet to emerge; in its absence, each
new use case tends to require bespoke engineering.
% the approaches developed under the umbrellas of grid computing
% ~\cite{raicu2007,singh2005,ramakrishnan2006toward} or cloud
% computing~\cite{foster2008,juve2008,villegas2012,song2009},
One of the persistent issues besetting resource management across multiple
high-performance machines is the increase of the implementation complexity
imposed on the application layer. Even with solutions like grid
computing~\cite{berman2003grid,foster2003grid} aiming at effectively and, to
some extent, transparently integrating diverse resources, most of the
requirements involving the coordination of task execution still reside with the
application layer~\cite{legrand2003,krauter2002,darema2005}. This translates
into single-point solutions, extensive redesign and redevelopment of existing
applications when they need to be adapted to new use cases or new
high-performance machines, and lack of portability and interoperability.
Consider for example a simple distributed application implementing the \MW
pattern. With a single high-performance machine, the application must
concurrently submit tasks to the queue of the machine's scheduler, retrieve
their outputs, and aggregate them. When multiple high-performance machines are
available, the application must either directly manage submissions to several
queues or use a third-party scheduler with its specific execution model. In
both scenarios, the application needs a large amount of development effort and
capabilities that are not specific to the given scientific problem but pertain
instead to the coordination and management of its computation.
The notion of resource placeholder was devised as a pragmatic solution to
better manage the complexity of executing distributed applications. A resource
placeholder decouples the acquisition of remote compute resources from their
use to execute the tasks of a distributed application. Resources are acquired
by scheduling a job onto the remote high-performance machine; once executed,
the job is capable of retrieving and executing application tasks.
% Resources are acquired by scheduling a job onto the remote high-performance
% machine. Once executed, the job runs an agent capable of retrieving and
% executing application tasks.
{\bf Resource placeholders} leverage mul\-ti-\-le\-vel sche\-du\-ling to enable
the parallel execution of the tasks of distributed applications. Multi-level
scheduling is achieved by scheduling the placeholder and then by enabling
direct scheduling of application tasks onto that placeholder. Mul\-ti-\-le\-vel
sche\-du\-ling can be extended to multiple resources by instantiating resource
placeholders on diverse high-performance machines and then using a dedicated
scheduler to schedule tasks across all the placeholders.
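
A minimal sketch of the mechanism follows (all names are hypothetical). The
agent function is the code that the batch system eventually runs as a job; from
that moment on, application tasks are pulled and executed directly on the held
resources, bypassing the machine's queue.

{\small
\begin{verbatim}
# Resource-placeholder sketch: once the batch system
# starts the agent, tasks bypass the machine's queue.
import queue
import subprocess

task_queue = queue.Queue()  # master-side task store

def placeholder_agent():
    """Body of the job the batch system executes."""
    while True:
        task = task_queue.get()   # pull a task
        if task is None:          # sentinel: stop
            return
        subprocess.run(task, shell=True)

# master: tasks are late-bound to whichever
# placeholder becomes active, not to a machine
for cmd in ["echo one", "echo two"]:
    task_queue.put(cmd)
task_queue.put(None)
placeholder_agent()  # stands in for the remote agent
\end{verbatim}
}
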
% Mul\-ti-\-le\-vel sche\-du\-ling is achieved by scheduling the agent and then
% by enabling direct scheduling of application tasks to that agent. The \MW
% pattern is often an effective choice to manage the coordination of tasks
% execution on the available agent(s).
It should be noted that resource placeholders also mitigate the side-effects of
multi-tenancy. A placeholder still spends a variable amount of time waiting to
be executed by the batch system of the remote high-performance machine but,
once executed, the user, or the master process of the distributed application,
may hold total control over its resources. In this way, tasks are scheduled
directly on the placeholder without competing with other users for the
high-performance machine's scheduler. The practice predates the term: in the
1980s and 1990s it was fairly common for a user of a batch supercomputer who
wanted interactive access, for example for debugging, to submit a batch job
containing an xterm; when the job started, the xterm window would appear and
the user could use the system interactively, an early example of resource
placeholding.
% \msnote{I would like to either replace 'supercomputer' with i.e. 'cluster', or
% make it explicit in the beginning of 2.1 that we talk about multiple types
% of systems}
% \mtnote{Here I used supercomputer in its general meaning as computing device
% with a lot of computational capacity. I added a footnote, any better?}
% \jhanote{I think we should use neither. the generally acceptable albeit
% equally fuzzy term is high-performance machine or high-performance
% computing.}
% ------------------------------------------------------------------------------
% 2.2
\subsection{Brief History of \pilotjob Systems}
\label{sec:histimpl}
The \pilot abstraction has a rich set of properties~\cite{luckow2012towards}
that have been progressively implemented into multiple \pilotjob systems.
Figure~\ref{fig:timeline} shows the introduction of \pilotjob systems over time
while Figure~\ref{fig:pilotjob_clustering} shows their clustering along the axes
of workload management and pilot functionalities. Initially, \pilotjob systems
implemented core functionalities to utilize resources independently from the
resource management of the remote high-performance machines. Subsequently,
these systems progressively evolved to include advanced capabilities like
workload and data management.
% Starting from a set of core functionalities focused on acquiring remote
% resources and utilizing them independently from the resource management of the
% remote high-performance machine, \pilotjob systems progressively evolved to
% include advanced capabilities like workload and data management.
% As seen in Ref.~\cite{luckow2012towards}, the \pilot abstraction has a rich
% set of properties and its implementations offer a vast array of capabilities
% including multiple scheduling algorithms, data and compute placeholders, and
% late or early binding. Nonetheless, the capability of acquiring remote
% resources and directly utilizing them, independently from the supercomputer
% resource management, is a necessary property of the \pilot abstraction. As
% such, resource placeholders and their
% scheduling~\cite{Pinchak02practicalheterogeneous} should be seen as early
% \pilot system implementations.
% The progressive definition and implementation of the \pilot abstraction can be
% seen as the process of evolving both the understanding and implementation
% complexity of the notion of resource placeholder.
\begin{figure}[t]
% Put real dates in the comment here.
% Boinc: X
% BigJob: 200X
% etc.
\centering
\includegraphics[width=0.45\textwidth]{figures/timeline}
\caption{Introduction of systems over time. When available, the date of
first mention in a publication or otherwise the release date of software
implementation is used. \mtnote{Missing from from both Section 3 and 4:
WISDOM} \jhanote{I think PANDA is too far left..I would say post-2005?}}
\label{fig:timeline}
\end{figure}
%\footnote{http://wiki.nikhef.nl/biggrid/Using_the_Grid/ToPoS},
\begin{figure}[t]
\centering
\includegraphics[width=.45\textwidth]{figures/pilotjob-clustering.pdf}
\caption{A partial clustering of pilots along functionality. \mtnote{The
clustering is incomplete. Should we list all the \pilot systems we
mention?} \jhanote{We should for now just mention that its partial. More
importantly, should we revisit the axis labels?} \note{this is confusing, since Pegasus uses Glide-In - Pegasus is not the pilot system, Gilde-In is, and it should only show up in one oval, ideally}}
\label{fig:pilotjob_clustering}
\end{figure}
AppLeS~\cite{berman1996application} is a framework for application-level
scheduling and offers an example of an early implementation of resource
placeholders. AppLeS provides an agent that can be embedded into an
application, enabling the application to acquire resources and to schedule
tasks onto them. In addition to \MW, AppLeS provides application templates,
e.g., for parameter sweep and moldable parallel
applications~\cite{berman2003adaptive}. AppLeS offered user-level control of
scheduling but did not isolate the application layer from the management and
coordination of task execution: any change in the coordination mechanisms
directly translated into a change of the application code. The next
evolutionary step was to create a dedicated abstraction layer between those of
the application and of the various batch queuing systems available at remote
systems.
Around the same time as AppLeS was introduced, volunteer computing projects
started using the \MW coordination pattern to achieve high-throughput
calculations for a wide range of scientific problems. The workers of these
systems could be downloaded and installed on users' workstations. With an
installation base distributed across the globe, workers pulled and executed
computation tasks when CPU cycles were available. \note{may also want to
mention the Condor MW work here?}
The volunteer workers were essentially heterogeneous and dynamic as opposed to
the homogeneous and static AppLeS workers. Farming out tasks in a dynamic
distributed environment including personal computers promised to lower the
complexity of designing and implementing distributed applications. Each
volunteer worker behaves as an opportunistic resource placeholder and, as such,
implements the core functionality of the \pilot abstraction.
The first public volunteer computing project was the Great Internet Mersenne
Prime Search~\cite{woltman2004great}, followed shortly by
distributed.net~\cite{lawton2000distributed} in 1997, created to compete in the
RC5-56 secret-key challenge, and by the SETI@Home project, which set out to
analyze radio telescope data. The generic BOINC distributed master-worker
framework grew out of SETI@Home, becoming the {\it de facto} standard framework
for volunteer computing~\cite{anderson2004boinc}. \note{aren't there a number
of other systems that are in common use too?}
It should be noted that the process of resource acquisition differs between
AppLeS and volunteer computing: the former has prior knowledge of the available
resources while the latter has none. As a consequence, AppLeS can request and
orchestrate a set of resources, allocate tasks in advance to specific workers
(i.e., resource placeholders), and implement load balancing among resources. In
volunteer computing, tasks are pulled by the clients when they become active,
so specific resource availability is unknown in advance. In short, AppLeS
pushes tasks to known workers, while volunteer computing workers pull tasks
opportunistically. The pull model is a potential drawback, but it is mitigated
by the redundancy offered by the large scale that volunteer computing can reach
thanks to its simpler model of worker distribution and installation.
% The opportunistic use of geographically distributed resources championed by
% voluntary computing offers several advantages. The resource landscape
% available for scientific research is fragmented across multiple institutions,
% managed with different policies and protocols, and heterogeneous both in
% quantity and quality. Once aggregated, the sum of otherwise limited resources
% can support very large distributed computations and a great amount of
% multi-tenancy. Note that given the required capabilities, this model of
% resource provisioning can still support the execution of parallel applications
% on the few resources that offer low-latency network interconnect.\msnote{This
% paragraph is a good candidate for removal?}
% \jhanote{is ``batch'' redundant?} \mtnote{Probably
% (based on: http://research.cs.wisc.edu/htcondor/doc/condor-practice.pdf). I
% changed system to framework as system is used differently in the following
% sentence.}
HTCondor (formerly known as Condor) is a high-throughput distributed computing
framework that uses diverse and possibly geographically distributed
resources~\cite{thain2005}. Originally, HTCondor was created for systems within
one administrative domain, but Flocking~\cite{epema1996worldwide} made it
possible to group multiple machines into aggregated resource pools. However,
resource management required system-level software configuration performed by
the administrator of each individual machine of each resource pool.
% \jhanote{resource management could not be done on application level'' does not
% make sense to me. Are we referring to aggregation?}\mtnote{Better?}
This limitation was overcome by integrating a resource placeholder mechanism
within the HTCondor system. Gli\-de\-in~\cite{frey2002condorG} allowed users to
add grid resources to resource pools. In this way, users could uniformly execute
jobs on heterogeneous resource pools. Thanks to its use of resource
placeholders, Glidein was one of the systems pioneering the implementation of
the \pilot abstraction, enabling some \pilot capabilities also for third-party
systems like Bosco~\cite{weitzel2012campus}.
% \jhanote{Also there is a bit of care needed: we're implying glide-in is a
% resource placeholder -- which is part of a pilot, and not necessarily a full
% pilot.} \mtnote{We do not use \pilotjob system so I do not think we are
% implying that Glidein is a ``full pilot'' (even if I am not so sure what
% exactly a full pilot is). I slightly edited the sentence, any better?}
% \jhanote{as this is historical evolution, some parts need to be in the past
% tense. Care will be needed to get the tense right.} \mtnote{Better?}
The success of Glidein shows the relevance of the \pilot abstraction in
enabling scientific computation at scale and on heterogeneous resources. The
implementation of Glidein also highlighted at least two limitations:
user/system layer isolation, and the application development model. While
Glidein allows the user to manage resource placeholders directly, daemons must
still be running on the remote machines. This means that Glidein cannot be
deployed without involving the machine owners and system administrators.
Implemented as a service, Glidein supports integration with distributed
application frameworks but does not programmatically support the development of
distributed applications by means of dedicated APIs and libraries.
Concomitant with and correlated to developments at the LHC, there was a
``Cambrian explosion'' of \pilotjob systems in the first decade of the
millennium, e.\,g.\ DIANE~\cite{moscicki2003diane}, GlideinWMS,
DIRAC~\cite{casajus2010dirac}, \panda~\cite{zhao2011panda},
AliEn~\cite{saiz2003alien}, and Co-Pilot~\cite{buncicco2011co}. Each of these
\pilotjob systems serves a specific user
community and experiment at the LHC: DIRAC~\cite{casajus2010dirac} was developed
by the LHCb experiment~\cite{lhcb_url}; AliEn~\cite{saiz2003alien} by the ALICE
experiment; and \panda (Production and Distributed
Analysis)~\cite{zhao2011panda} by the ATLAS experiment~\cite{aad2008atlas}. Due
to socio-technical reasons, the CMS experiment at LHC mostly converged around
the HTCondor-Glidein-GlideinWMS~\cite{sfiligoi2008glideinwms} ecosystem.
Interestingly, these systems are functionally very similar, work on almost the
same underlying infrastructure, and serve applications with very similar (if not
identical) characteristics. Unsurprisingly,
Co-Pilot~\cite{buncicco2011co,harutyunyan2012cernvm}, another \pilotjob system
developed in the LHC context, promotes interoperability by integrating
grid-based \pilotjob systems (such as AliEn and \panda) with cloud and volunteer
computing resources.
% The BigJob \pilotjob system~\cite{luckow2010} was designed to address these
% limitations, to broaden the type of applications supported by the pilot-based
% execution model, and to extend the \pilot abstraction beyond the boundaries of
% compute tasks.
Alongside the systems tailored to the LHC experiments, other \pilotjob systems
were developed to serve different research purposes, to target diverse types of
resources and middleware, or to act as special-purpose subsystems and
frameworks.
The BigJob \pilotjob system~\cite{luckow2010} was designed to support task-level
parallelism on distributed HPC resources, to broaden the type of applications
supported by the pilot-based execution model, and, ultimately, to extend the
\pilot abstraction beyond the boundaries of compute tasks. BigJob offers
application-level programmability to provide the end-user with more flexibility
and control over the design of distributed applications and the isolation of the
management of their execution. BigJob uses an interoperability library called
``SAGA'' (Simple API for Grid Applications) to work on a variety of
infrastructures~\cite{merzky2015saga,goodale2006,luckow2010}.
% Additionally, BigJob has also been extended to work with data
% and, analogous to compute pilots, to abstract away direct user communication
% between different storage systems.
% was recently re-implemented as a production-level tool named
% ``RADICAL-Pilot''~\cite{radical_pilot_paper}. rep- resents one of the latest
% evolutionary stages of the Pilot ab- straction. from an initial phase in which
% \msnote{The latter brings it back to the stage of apples, thats probably not
% what we want to say ...} \mtnote{Apologies, I am not sure I understand this
% comment.}
BigJob has recently been re-implemented as a production-level tool named
``RADICAL-Pilot''~\cite{merzky2015radical}. BigJob and now RADICAL-Pilot
represent an evolution of the \pilot abstraction: initially, pilots were
implemented as \textit{ad hoc} placeholder machinery for a specific
application, but they evolved to be integrated with the middleware of remote
resources. Both BigJob and RADICAL-Pilot implement the \pilot abstraction as an
interoperable compute and data management system that can be programmatically
integrated into end-user applications, thus providing both sets of
capabilities.
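
As an illustration of this application-level programmability, the sketch below
follows the shape of published RADICAL-Pilot examples; the exact class and
method names are assumptions tied to the API of this era and may differ across
versions.

{\small
\begin{verbatim}
# Sketch in the style of RADICAL-Pilot examples;
# API names are era-specific and may have changed.
import radical.pilot as rp

session = rp.Session()
pmgr = rp.PilotManager(session=session)

pdesc = rp.ComputePilotDescription()
pdesc.resource = "local.localhost"  # target machine
pdesc.cores = 4
pdesc.runtime = 10                  # minutes
pilot = pmgr.submit_pilots(pdesc)   # provision

umgr = rp.UnitManager(session=session)
umgr.add_pilots(pilot)

cud = rp.ComputeUnitDescription()
cud.executable = "/bin/date"        # one task
umgr.submit_units([cud])            # dispatch
umgr.wait_units()                   # execute
session.close()
\end{verbatim}
}
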
% Another ongoing evolutionary trend has been to implement the \pilot
% abstractions into pilot-based workload managers, thus moving away from
% providing simple pilot capabilities in application space. These higher-level
% systems which are often centrally hosted, move critical functionality from the
% client to the server (i.e., a service model). These systems usually deploy
% pilot factories that automatically start new pilots on demand and integrate
% security mechanisms to support multiple users simultaneously.
% Several \pilotjob systems have been developed in the context of the LHC
% experiment at CERN, which is associated with a major increase in the uptake
% and availability of pilots, e.\,g.\ DIANE~\cite{moscicki2003diane},
% GlideinWMS, DIRAC~\cite{casajus2010dirac}, \panda~\cite{zhao2011panda},
% AliEn~\cite{saiz2003alien}, and Co-Pilot~\cite{buncicco2011co}. Each of these
% \pilotjob systems serves a particular user community and experiment.
% Interestingly, these systems are functionally very similar, work on almost the
% same underlying infrastructure, and serve applications with very similar (if
% not identical) characteristics.
% Co-Pilot provides components for building a framework for seamless and
% transparent integration of these resources into existing grid and batch
% computing infrastructures exploited by the High Energy Physics (HEP)
% community.
% The \pilot abstraction has also been integrated into scientific workflow
% systems.
GWPilot is a \pilot system designed to push the boundaries of implementation
efficiency~\cite{rubio2015gwpilot}. Aimed specifically at distributed computing
resources (DCRs) exposing diverse grid middleware, GWPilot builds upon the
GridWay meta-scheduler~\cite{huedo2007modular} to allow the implementation of
efficient and reliable scheduling algorithms. Scheduling can be customized at
the user level, and the application level is well isolated from the \pilot
system level.
\pilotjob systems have also proven an effective tool for managing the workloads
executed in the various stages of a scientific workflow. For example, the
Corral system~\cite{rynge2011experiences} was developed to serve as a frontend
to HTCondor Glidein and to optimize the placement of glideins (i.e., pilots)
for the Pegasus workflow system~\cite{deelman2015}. In contrast to GlideinWMS,
Corral provides the end-user with more explicit control over the placement and
start of pilots. Corral was later extended to serve also as a frontend to
GlideinWMS.
Swift~\cite{wilde2011swift} is a scripting language designed for expressing
abstract workflows and computations. The language also provides capabilities
for executing external applications as well as the implicit management of data
flows between application tasks. Swift uses a \pilot implementation called the
``Coaster System''~\cite{coasters_url} \note{should cite hategan UCC 2011 paper
elsewhere cited, rather than the URL I think} that supports various types of
infrastructure, including clouds and grids.
Swift has also been used in conjunction with Falkon~\cite{raicu2007falkon}.
Falkon was engineered for executing many small tasks on High Performance
Computing (HPC) systems and showed high performance compared to the native
queuing systems. Falkon is a paradigmatic example of how the \pilot abstraction
has been implemented to support specific workloads while investigating their
performance. Although Falkon is now unmaintained, the insight gained from its
development has been used to improve the Coaster System.
% The proliferation of \pilotjob systems and their integration within other type
% of application and middleware systems,
% to support the execution of distributed and, increasingly, of parallel
% applications.
The brief description of the many \pilotjob system implementations introduced in
this section underlines a progressive appreciation for the \pilot abstraction
and the emergence of a \pilot paradigm. Nonetheless, the proliferation of
\pilotjob systems has been uncoordinated, developing across multiple dimensions
(see Figure~\ref{fig:pilotjob_clustering}), and making it difficult to
coherently understand the \pilot components, their functionalities,
implementations, and usages. % This hinders attempts at distinguishing \pilotjob
% system functionalities from those of other middleware and at appreciating the
% distinguishing characteristics of the \pilot paradigm.
The evolution of \pilots attests to their usefulness across a wide range of
deployment environments and application scenarios, but the divergence in
specific functionality and the inconsistent terminology call for a standard
vocabulary to assist in understanding the varied approaches and their
commonalities and differences.
% Some distinctions in terms of design, usage, and operation modes can be
% identified. Figure~\ref{fig:pilotjob_clustering} is a graphical representation
% of this clustering.
% The evolution of the \pilot paradigm and proliferation of systems has been
% uncoordinated, leading to an inconsistent terminology related to the \pilot
% abstraction, its implementations and usage. A coherent understanding of the
% \pilot components and functionalities is still missing, thus hindering attempts
% at distinguishing it from other functionality and middleware systems.
%leading to a blurred definition of \pilot abstraction and how it should be
%------------------------------------------------------------------------------
% SECTION 3
%------------------------------------------------------------------------------
\newcommand{\vocab}[1]{\textbf{#1}\xspace}
\newcommand{\prop}[1]{\textit{#1}\xspace}
\newcommand{\impterm}[1]{\texttt{#1}\xspace}
\section{Understanding the Landscape: Developing a Vocabulary}
\label{sec:understanding}
The overview presented in \S\ref{sec:history} shows a degree of heterogeneity
both in the functionalities and in the vocabulary adopted by different
\pilotjob systems. Implementation details sometimes hide the functional
commonalities and differences among \pilotjob systems, and features and
capabilities tend to be named inconsistently, often with the same term
referring to multiple concepts or the same concept named in different ways.
This section offers a description of the logical components and functionalities
shared by every \pilotjob system and the definition of a consistent
terminology, extending earlier work towards a common model for
\pilotjobs~\cite{luckow2012towards}. The goal is to offer a paradigmatic
description of a \pilotjob system and a well-defined vocabulary to reason about
such a description and, eventually, about its multiple implementations.
% \jhanote{would ``description'' be better than ``analysis'' in the first
% sentence?} \mtnote{Done.}
%------------------------------------------------------------------------------
% 3.1
\subsection{Logical Components and Functionalities}
\label{sec:compsandfuncs}
All \pilotjob systems introduced in~\S\ref{sec:history} are engineered to allow
for the execution of multiple types of workloads on machines with diverse
middleware, e.g., grid, cloud, or HPC. This is achieved in many ways, as
different \pilotjob systems are optimized for different goals, depending on use
cases, design and implementation choices, and on the constraints imposed by the
middleware and policies of the targeted machines. The common denominators among
\pilotjob systems are defined along three dimensions: purpose, logical
components, and functionalities.

The purpose shared by every \pilotjob system is to improve workload execution
when compared to executing the same workload directly on one or more machines.
The performance of workload execution is usually measured by throughput and
time to completion, but other metrics could also be considered: data transfer
time, scale of the workload executed, power consumption, or a combination
thereof. Metrics that are not related to performance include reliability, ease
of application deployment, and generality of workload. In order to achieve the
required metrics under given constraints, each \pilotjob system exhibits
characteristics that are either common or specific to one or more
implementations. Discerning these characteristics requires isolating the
minimal set of logical components that characterize every \pilotjob system.
At some level, all \pilotjob systems provide three separate but coordinated
functions, embodied in three logical components: a \vocab{Pilot Manager}, a
\vocab{Workload Manager}, and a \vocab{Task Manager}. The Pilot Manager handles
the description, instantiation, and use of one or more resource placeholders
(i.e., pilots) on single or multiple machines. The Workload Manager handles the
scheduling of one or more workloads on the available resource placeholders. The
Task Manager takes care of executing the tasks of each workload by means of the
resources held by the placeholders.
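
To fix ideas, the following sketch (hypothetical interfaces, not the API of any
particular system) reduces each manager to its minimal responsibility; pilots
are modeled as plain dictionaries.

{\small
\begin{verbatim}
# Minimal shape of the three logical components; a
# real system adds policies, faults, data, security.
class PilotManager:
    def provision(self, machine, cores):
        """Submit a placeholder job to a machine."""
        return {"machine": machine, "cores": cores,
                "tasks": []}

class WorkloadManager:
    def schedule(self, workload, pilots):
        """Bind each task of a workload to a pilot."""
        for i, task in enumerate(workload):
            pilot = pilots[i % len(pilots)]
            pilot["tasks"].append(task)

class TaskManager:
    def execute(self, pilot):
        """Run tasks on the pilot's held resources."""
        return [task() for task in pilot["tasks"]]
\end{verbatim}
}
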
The implementation details of these three logical components vary significantly
across \pilotjob systems (see~\S\ref{sec:analysis}). For example, the functions
of two or more logical components may be implemented by a single software
module, or additional functionalities may be integrated into the three
management components. Nevertheless, the Pilot, Workload, and Task Managers can
always be distinguished across different \pilotjob systems.
% One or more logical components may be responsible for specific
% functionalities, both on application as well as machine level;
% \jhanote{Should we use ``execution of tasks'' in opening sentence. We talk
% about executing tasks before and after, and not necessarily workloads.
% Issue of consistency and granularity and not of correctness.} \mtnote{I
% reread from the beginning and I think we use workload and task consistently:
% ``All \pilotjob systems introduced in~\S\ref{sec:history} are engineered to
% allow for the execution of multiple types of workloads'', ``The purpose
% shared by every
% \pilotjob system is to improve workload execution'', ``The Workload Manager
% handles the scheduling of one or more workloads'', ``The Task Manager takes
% care of executing the tasks of each workload''}
Each \pilotjob system supports a minimal set of core functions that allow for
the execution of workloads: \vocab{Pilot Provisioning}, \vocab{Task
Dispatching}, and \vocab{Task Execution}. \pilotjob systems need to schedule
resource placeholders (i.e., pilots) on the target machines, schedule tasks on
the available placeholders, and use these placeholders to execute the tasks of
the given workload; depending on the implementation, tasks may be dispatched to
a placeholder before or after that placeholder becomes active. Further
functionality might be needed to implement a production-grade \pilotjob system:
for example, authentication, authorization, accounting, data management,
fault-tolerance, or load-balancing. While such functionality may be a critical
implementation detail, it depends on the specific characteristics of the given
use cases, workloads, or targeted resources. As such, it should not be
considered a necessary characteristic of a \pilotjob system.
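
Continuing the hypothetical sketch above, the three core functions compose as
follows; as noted, in pull-based implementations tasks may well be dispatched
before a pilot becomes active.

{\small
\begin{verbatim}
# Provisioning -> Dispatching -> Execution, using
# the hypothetical managers sketched earlier.
pm, wm, tm = (PilotManager(), WorkloadManager(),
              TaskManager())

pilots = [pm.provision("machineA", cores=32),
          pm.provision("machineB", cores=16)]
workload = [lambda i=i: "result-%d" % i
            for i in range(4)]
wm.schedule(workload, pilots)   # dispatch tasks
for p in pilots:                # execute on pilots
    print(tm.execute(p))
\end{verbatim}
}
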
Among the core functions that characterize every \pilotjob system, Pilot
Provisioning is essential because it allows for the creation of resource
placeholders, even if its implementation can be comparatively ad hoc, as in the
Coaster System. As seen in~\S\ref{sec:history}, this type of placeholder
enables tasks to utilize resources without directly depending on the
capabilities exposed by the target machines. Resource placeholders are
scheduled onto target machines by means of dedicated capabilities but, once
scheduled and then executed, these placeholders make their resources directly
available for the execution of the tasks of a workload.
% \jhanote{possibly use ``resources'' in lieu of remote machines?} \mtnote{Would
% that overload the term with two separate meanings: resource as what is held
% and resource as DCR?}\jhanote{I removed ``remote'', retained machine. I
% want to avoid implying placeholders have to be distributed from the point of
% submission.} \mtnote{Great.}
% \jhanote{We may want to introduce the vocabulary of resource/DCR/DCI that we
% developed for the proposals here, and make it consistent throughout the
% paper. In this para for example we use the term DCI resource, which is
% inconsistent with developed vocabulary} \mtnote{If we want the definitions
% here, then we should probably move all of them before this subsection. Do
% you want me to do it? Meanwhile, I rephrased the whole subsection avoiding
% DCI altogether and added the definition of DCR to the next subsection.}
% MS: I would move this comment to section 5 I think, as multi-tenant pilot
% systems do have to make these trade-offs, and it would be good to point that
% out. (not doing it now because of MT lock in 5) Furthermore, resource
% placeholders are logical partitions of resources that do not need to leverage
% trade-offs among competing user requirements as needed instead with large
% pools of resources adopting multi-tenancy.
The provisioning of resource placeholders depends on the capabilities exposed
by the middleware of the targeted machine and on the implementation of each
\pilot system. Typically, on middleware where resources are managed through
queues, batch systems, and schedulers, a placeholder is provisioned by
submitting it as a job. For such middleware, a job is a type of logical
container that includes configuration and execution parameters alongside
information on the application to be executed on the machine's compute
resources. Conversely, for machines without a job-based middleware, a resource
placeholder would be executed by means of other types of logical container,
for example, a virtual machine or a Docker
container~\cite{bernstein2014,felter2014}.
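
As a concrete illustration for job-based middleware, the sketch below assumes a
SLURM-managed machine and a pilot agent script (a hypothetical
\texttt{agent.py}) already staged on it; the placeholder is provisioned simply
by submitting a job whose body launches the agent.

{\small
\begin{verbatim}
# Provisioning a placeholder on a SLURM machine by
# wrapping a (hypothetical) agent.py in a batch job.
import subprocess
import tempfile

job = """#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:30:00
python agent.py  # agent starts pulling tasks
"""

with tempfile.NamedTemporaryFile(
        "w", suffix=".sh", delete=False) as f:
    f.write(job)
subprocess.run(["sbatch", f.name], check=True)
\end{verbatim}
}
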
% \mtnote{Too many execut*. Should we use ``code'' instead of `executable'? Any
% better option than ``code'' to replace `executable'?} \jhanote{used
% application, but task might be better? code is acceptable too.}
Once resource placeholders are bound to the resources of a machine, tasks need
to be dispatched to those placeholders for execution. Task dispatching does not
depend on the functionalities of the targeted machine's middleware, so it can
be implemented as part of the \pilotjob system. In this way, the control over
the execution of a workload is shifted from the machine's middleware to the
\pilot system. This shift is a defining characteristic of the \pilot paradigm,
as it decouples the execution of a workload from the need to submit its tasks
via the machine's scheduler. For example, the execution of individual tasks of
a workload will not depend upon the specifics of the targeted machine's state
or availability, but rather on those of the placeholder. More elaborate
execution patterns involving task and data dependencies can thus be implemented
independently of the capabilities and constraints of the target machine's
middleware. Ultimately, this is how \pilotjob systems allow for the direct
control of workload execution and the optimization, for example, of execution
throughput.
% For example, the tasks of a workload will not individually have to wait on the
% targeted machine's queues, but rather on the availability of the placeholder
% before being executed.