\documentclass{sig-alternate}
\input{head}
\input{include}
%\usepackage{listings}
\usepackage{paralist}
\usepackage{multirow}
% \usepackage{draftwatermark}
% \SetWatermarkLightness{0.8}
% \SetWatermarkText{Draft}
% \SetWatermarkVerCenter{13cm}
% \SetWatermarkHorCenter{10cm}
% \SetWatermarkScale{1}
\begin{document}
\CopyrightYear{2015}
\title{A Comprehensive Perspective on the \pilotjob Abstraction}
%\jhanote{either abstraction is singular and therefore ``the'', or we use the
%plural form}}
\numberofauthors{3}
\author{
\alignauthor Matteo Turilli \\
\affaddr{RADICAL Laboratory, ECE}\\
\affaddr{Rutgers University}\\
\affaddr{New Brunswick, NJ, USA}\\
\email{matteo.turilli@rutgers.edu}
% \email{}
% 2nd. author
\alignauthor Mark Santcroos\\
\affaddr{RADICAL Laboratory, ECE}\\
\affaddr{Rutgers University}\\
\affaddr{New Brunswick, NJ, USA}\\
\email{mark.santcroos@rutgers.edu}
% \email{}
\and
% 3rd. author
\alignauthor Shantenu Jha\titlenote{Corresponding author}\\
\affaddr{RADICAL Laboratory, ECE}\\
\affaddr{Rutgers University}\\
\affaddr{New Brunswick, NJ, USA}\\
\email{shantenu.jha@rutgers.edu}
% \email{}
% use '\and' if you need 'another row' of author names
% \and
% % 4th. author
% \alignauthor Lawrence P. Leipuner\\
% \affaddr{Brookhaven Laboratories}\\
% \affaddr{Brookhaven National Lab}\\
% \affaddr{P.O. Box 5000}\\
% \email{lleipuner@researchlabs.org}
% % 5th. author
% \alignauthor Sean Fogarty\\
% \affaddr{NASA Ames Research Center}\\
% \affaddr{Moffett Field}\\
% \affaddr{California 94035}\\
% \email{fogartys@amesres.org}
% % 6th. author
% \alignauthor Charles Palmer\\
% \affaddr{Palmer Research Laboratories}\\
% \affaddr{8600 Datapoint Drive}\\
% \affaddr{San Antonio, Texas 78229}\\
% \email{cpalmer@prl.com}
}
\maketitle
\begin{abstract}
\pilotjob systems play an important role in supporting distributed scientific
computing. They are used to consume more than 700 million CPU hours a year by
the Open Science Grid communities, and to process up to 5 million jobs a week
for the ATLAS experiment on the Worldwide LHC Computing Grid. With the
increasing importance of task-level parallelism in high-performance computing,
\pilotjob systems are also witnessing adoption beyond their traditional
domains. This paper offers a comprehensive analysis of the \pilotjob
abstraction, assessing its evolution, properties, and implementation as
multiple \pilotjob software systems. Notwithstanding the growing impact on
scientific research, there is no agreed-upon definition of a \pilotjob system
and no clear understanding of the underlying \pilot abstraction and paradigm.
This lack of foundational understanding has arguably contributed to a
proliferation of unsustainable \pilotjob implementations with no shared best
practices or interoperability, ultimately hindering a realization of the full
impact of \pilotjobs. This paper offers the conceptual tools to promote this
fundamental understanding while critically reviewing the state of the art of
\pilotjob implementations. The five main contributions of this paper are: (i)
an analysis of the motivations and evolution of the \pilotjob abstraction;
(ii) an outline of the minimal set of distinguishing functionalities; (iii)
the definition of a core vocabulary to reason consistently about \pilotjobs;
(iv) the description of core and auxiliary properties of \pilotjob systems;
and (v) a critical review of the current state of the art of their
implementations. These contributions are brought together to illustrate the
generality of the \pilotjob paradigm, to discuss the challenges in distributed
computing that it addresses, and to outline future opportunities.
% These contributions are brought together to illustrate the defining
% characteristics of the \pilotjob paradigm, its generality, and the main
% opportunities and challenges posed by its support of distributed computing.
% dispersing the available resources across a fragmented development landscape
% There is no agreed upon definition of \pilotjobs; however a functional
% attribute of \pilotjobs that is generally agreed upon is they are
% tools/services that support multi-level and/or application-level scheduling by
% providing a scheduling overlay on top of the system-provided schedulers.
% Nearly everything else is either specific to an implementation, open to
% interpretation or not agreed upon. For example, are \pilotjobs part of the
% application space, or part of the services provided by an infrastructure? We
% will see that close-formed answers to questions such as whether \pilotjobs are
% system-level or application-level capabilities are likely to be elusive.
% Hence, this paper does not make an attempt to provide close-formed answers,
% but aims to provide appropriate context, insight and analysis of a large
% number of \pilotjobs, and thereby bring about a hitherto missing consistence
% in the community's appreciation of \pilotjobs. Specifically this paper aims
% to provide a comprehensive perspective of \pilotjobs. A primary motivation for
% this work stems from our experience when looking for an interoperable,
% extensible and general-purpose \pilotjob; in the process, we realized that
% such a capability did not exist. The situation was however even more
% unsatisfactory: in fact there was no agreed upon definition or conceptual
% framework of \pilotjobs. To substantiate these points of view, we begin by
% discussing some existing \pilotjobs and the different aspects of these
% \pilotjobs, such as the applications scenarios that they have been used and
% how they have been used. The limited but sufficient sampling highlights the
% variation, and also provides both a motivation and the basis for developing an
% implementation agnostic terminology and vocabulary to understand
% \pilotjobs; Section \S3 attempts to survey the landscape/eco-system of
% \pilotjobs. With an agreed common framework/vocabulary to discuss and
% describe \pilotjobs, we proceed to analyze the most commonly utilized
% \pilotjobs and in the process provide a comprehensive survey of \pilotjobs,
% insight into their implementations, the infrastructure that they work on, the
% applications and application execution modes they support, and a frank
% assessment of their strengths and limitations. An inconvenient but important
% question -- both technically and from a sustainability perspective that must
% be asked: why are there so many similar seeming, but partial and slightly
% differing implementations of \pilotjobs, yet with very limited
% interoperability amongst them? Examining the reasons for this
% state-of-affairs provides a simple yet illustrative case-study to understand
% the state of the art and science of tools, services and middleware
% development. Beyond the motivation to understand the current landscape of
% \pilotjobs from both a technical and a historical perspective, we believe a
% survey of \pilotjobs is a useful and timely undertaking as it provides
% interesting insight into understanding issues of software sustainability.
%
% believe that a survey of \pilotjobs provides and appreciation for the richness
% of the \pilotjobs landscape. is not to discuss the \pstar conceptual
% framework, but That led to the \pstar model.
\end{abstract}
\keywords{\pilotjob, \pilot abstraction, distributed applications, distributed
systems, distributed resource management}
% -----------------------------------------------------------------------------
% INTRODUCTION
%
\section{Introduction}
\label{sec:intro}
% \jhanote{Building tools and components that have well-defined and
% well-characterized behavior, including performance. This leads to
% descriptive models of pilot-jobs, which while pervasive in distributed
% computing, are conspicuous by their absence in high-performance and
% data-intensive computing. By providing a firm theoretical underpinning
% pilot-jobs [3], one can provide a more “programmable” and flexible yet
% common pilot-job for different types of distributed infrastructure, and also
% extend the concept of pilot-jobs to high-performance and data-intensive
% computing [4]}
The seamless uptake of distributed computing infrastructures by scientific
applications has arguably been limited by the lack of pervasive and
simple-to-use abstractions at the development, deployment, and execution level.
As suggested by their proliferation on production distributed computing
infrastructures, \pilotjob systems are among the few widely-used abstractions.
A variety of \pilotjob systems have emerged:
Glidein/GlideinWMS~\cite{frey2002condorG}, the Coaster
System~\cite{wilde2011swift}, DIANE~\cite{moscicki2003diane},
DIRAC~\cite{casajus2010dirac}, \panda~\cite{chiu2010pilot},
GWPilot~\cite{rubio2015gwpilot}, Nimrod/G~\cite{buyya2000nimrod},
Falkon~\cite{raicu2007falkon}, and MyCluster~\cite{walker2006creating}, to name
a few. These systems are for the most part functionally equivalent and
motivated by similar objectives; nonetheless, their implementations often serve
specific use cases, target specific resources, and lack interoperability,
fragmenting development effort and limiting reuse across communities.
% as they support the decoupling of workload submission from resource
% assignment. N
% The situation is reminiscent of the proliferation of functionally similar yet
% incompatible workflow systems, where in spite of significant {\it a
% posteriori} effort on workflow system extensibility and interoperability,
% these objectives remain difficult if not unfeasible.
% \pilotjobs excel in terms of the number and types of applications that use
% them, as well as the number of production distributed cyberinfrastructures
% that support them. \msnote{ref?}
% \mtnote{Should we use just `pilot'}\jhanote{I think you have proposed a
% graceful transition: from pilotjobs to pilotsystems? If so, should we stick
% with pilotjobs here?}
The fundamental reason for the proliferation of \pilotjob systems is that they
provide a simple solution to the rigid and static resource management
historically found in high-performance and distributed computing. There are two
ways in which \pilotjobs break free of this rigid resource utilization model:
(i) through a process often referred to as
late-binding~\cite{moscicki2011,glatard2010,delgado2014}, \pilotjobs make the
selection of heterogeneous and dynamic resources easier and more effective; and
(ii) \pilotjobs decouple the workload specification from the task execution
management. The former results in the ability to utilize resources
``dynamically''; the latter simplifies the scheduling of workloads on those
resources.
% management of improving the efficiency of task assignment while shielding
% applications from having to manage tasks across such resources.
% \onote{I think the most important reasons why Pilot Jobs being so popular
% (and re-invented over and over again) is that they allow the execution of
% small (i.e., singe / few-core) tasks efficiently on HPC infrastrucutre by
% massively reducing queueing time. HPC sites (from schedulers to policies)
% have always been (and still are) discrimatory against this type of
% workload in favor of the large, tightly-coupled ones. Pilot-Jobs try to
% counteract. While this is certainly not the main story that we want to
% tell, this should IMHO still be mentioned. } \jhanote{This is definitely
% one of the main reasons, but as Melissa pointed out it during RADICAL
% call, it is by no means the only reason. Need to get the different reasons
% down here.. then find a nice balance and description}
% (thus providing {\it post-facto} justification of its needs)
% \mtnote{Should we have a paragraph explaining the core contribution offered by
% this paper?} \jhanote{yes} \mtnote{Should I write a first draft of it?}
% Though mostly as a pragmatic solution to the need of improving throughput
% performance of distributed applications.
\pilotjobs have been almost exclusively developed within pre-existing systems
and middleware satisfying specific scientific requirements. As a consequence,
the development of \pilotjobs has not been grounded on a robust understanding
of underpinning abstractions, or on a well-understood set of dedicated design
principles. Furthermore, the terminology used to describe functionally
equivalent systems is inconsistent; the proliferation and specificity are both
a manifestation of this lack of common understanding and vocabulary and a
factor compounding it. Not surprisingly, the functionalities and properties of
\pilotjobs have been understood mostly, if not exclusively, in relation to the
needs of the containing software systems or of the use cases justifying their
immediate development.
This approach is not problematic in itself and has led to effective
implementations that serve many millions of jobs a year on diverse computing
platforms~\cite{maeno2014evolution,katz2012}. However, the lack of conceptual
clarity and of an explicit enunciation of the \pilotjob computing paradigm has
arguably undermined the development of specific implementations and contributed
to an unsustainable software ecosystem. This limitation is illustrated not only
by the duplication of effort, but also by an overall immaturity of the
available systems in terms of functionalities, flexibility, portability,
interoperability, and, most often, robustness. Ultimately, these also
contribute to a high cost of development and low software sustainability,
although other factors, such as the relative ease of developing new software
over reusing existing systems, undoubtedly also play a role.
This paper is motivated by the fact that, in spite of the demonstrated
potential and proliferation of \pilotjob systems, there remains a significant
lack of clarity and understanding about the \pilotjob abstraction. As alluded
to, this has arguably resulted in significant overhead and repetition of
effort. Looking forward, with the growing importance of and need for scalable
task-level parallelism and dynamic resource management in high-performance
computing, the lack of conceptual clarity might have similar and potentially
profound consequences for the next generation of supercomputing.
This paper offers a critical analysis of the current state of the art,
providing the conceptual tools required to appreciate the properties of the
\pilot paradigm, i.e., the abstraction and the methodology underlying \pilotjob
systems. The remainder of this paper is divided into four sections.
\S\ref{sec:history} offers a critical review of the functional underpinnings of
the \pilot abstraction and how it has been evolving into \pilotjob systems and
systems with pilot-like characteristics.
In~\S\ref{sec:understanding}, the minimal set of capabilities and properties
characterizing the design of a \pilotjob system are derived. A vocabulary is
then defined to be consistently used across
\pilotjob system designs and implementations.
In~\S\ref{sec:analysis}, the focus shifts from analyzing the design of a
\pilotjob system to critically reviewing the characteristics of a representative
set of its implementations. Core and auxiliary implementation properties are
introduced and then used alongside the functionalities and terminology defined
in~\S\ref{sec:understanding} to describe and compare \pilotjob system
implementations.
Finally,~\S\ref{sec:discussion} closes the paper by outlining the \pilot
paradigm, arguing for its generality, and elaborating on how it impacts and
relates to both other middleware and the application layer. The outcome of the
critical review of the current implementation state of the art is used to give
insights about the future directions and challenges faced by the \pilot
paradigm.
% -----------------------------------------------------------------------------
% SECTION 2
%
%\section{Functional Underpinnings and Evolution of Pilot Abstraction}
\section{Evolution of Pilot Abstraction and Systems}
\label{sec:history}
% The origin and motivations for devising the \pilot abstraction, developing its
% many implementations and realize a full-fledge \pilot paradigm can be traced
% back to five main notions:
At least five features need elucidation to understand the technical origins and
motivations of the \pilot abstraction: task-level distribution and parallelism,
\MW pattern, multi-tenancy, multi-level scheduling, and resource placeholding.
Although none of these features taken individually is unique to the \pilot
abstraction, the \pilot abstraction brings them together into an integrated
and collective capability. This section offers an overview of these five
features and an analysis of their relationship with the \pilot abstraction. A
chronological perspective is taken so as to contextualize the evolution of the
\pilot abstraction into its diverse implementations.
% the variation in their scope, semantics, and implementation is one of the
% defining reasons for the multiplicity of this abstraction implementations.
\jhanote{I don't think the \MW pattern is a functional area..A candidate for
removal from detailed discussion in 2.1?}\mtnote{I would prefer to remove
`functional' than \MW discussion from 2.1 as that serves as a base for the
somewhat less extended discussion in 3. Would that work?} \mtnote{I am not sure
what it means for an abstraction to ``initiate'' a feature so I removed that
predicate.} \mtnote{I shortened the paragraph to make clearer the relationship
between the five features and the \pilot abstraction.}
\mtnote{Possibly add a paragraph summarizing the salient evolutionary steps of
\pilot systems as described in subsection 2.2.}
% ------------------------------------------------------------------------------
% 2.1
\subsection{Functional Underpinnings of the Pilot Abstraction}
\label{sec:histabstr}
To the best of the authors' knowledge, the term ``pilot'' was first coined in
2004 in the context of the Worldwide Large Hadron Collider (LHC) Computing
Grid (WLCG) Data Challenge\footnote{Based on private communication.}
\cite{lhc_url,lhc1995large,wlcg_url,bonacorsi2007wlcg}, and then introduced in
writing as ``pilot-agent'' in a 2005 LHCb
report~\cite{nobrega2005lhcb,lhcb_url}. Despite its relatively recent explicit
naming, the \pilot abstraction addresses a problem already well known at the
beginning of the twentieth century: {\bf task-level} distribution and
parallelism on multiple resources.
In 1922 Lewis Fry Richardson devised a Forecast
Factory~\cite{lynch1999richardson} (Figure~\ref{fig:forecast_factory}) to solve
systems of differential equations for weather
forecasting~\cite{richardson1922weather}. The factory required 64,000 ``human
computers'' supervised by a senior clerk, who would distribute portions of the
differential equations to the computers so that each could forecast the weather
of a specific region of the globe; the computers would perform their
calculations and send the results back to the clerk. The Forecast Factory was
thus an early conceptualization both of what is today called
``high-performance'' task-level parallelism and of the coordination pattern
for distributed and parallel computation called ``\MW''.
\begin{figure}[t]
\centering
\includegraphics[width=.45\textwidth]{figures/forecast-factory.jpg}
\caption{\textit{Forecast Factory} as envisioned by Lewis Fry Richardson.
Drawing by Fran{\c c}ois Schuiten.}
\label{fig:forecast_factory}
\end{figure}
The clerk of the Forecast Factory is the ``master'' while the human computers
are her ``workers''. Requests and responses go back and forth between the
master and all her workers. Each worker has no information about the overall
computation nor about the state of any other worker; the master has an
exclusive global view both of the overall problem and of its progress towards a
solution. As such, {\bf \MW} is a coordination pattern allowing for the
structured distribution of tasks so as to orchestrate their parallel and
concurrent execution. This invariably translates into a reduced time to
completion of the overall computation when compared to a coordination pattern
in which each equation is solved sequentially by a single worker.
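
To make the \MW pattern concrete, the following minimal sketch renders it in
code (Python; the \texttt{worker} and \texttt{master} names and the toy
``forecast'' payload are purely illustrative): a master distributes independent
tasks to workers through a shared queue, and only the master holds the global
view of the computation.

{\small
\begin{verbatim}
# Master-worker sketch: workers see one task at a
# time; only the master sees the whole computation.
from multiprocessing import Process, Queue

def worker(tasks, results):
    for region in iter(tasks.get, None):  # None=stop
        results.put((region, "forecast:" + region))

def master(regions, n_workers=4):
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker,
                     args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for r in regions:       # distribute the workload
        tasks.put(r)
    for _ in procs:         # one stop marker each
        tasks.put(None)
    out = dict(results.get() for _ in regions)
    for p in procs:
        p.join()
    return out

if __name__ == "__main__":
    print(master(["NA", "EU", "ASIA"]))
\end{verbatim}
}
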
Modern silicon-based, high-performance machines introduced at least three key
differences compared to the carbon-based Forecast Factory devised by
Richardson. First, most modern high-performance machines are meant to be used
by multiple users, i.e., they support multi-tenancy. Second, diverse
high-performance machines are made available to the scientific community, each
with distinctive properties in terms of architecture, capacity, capabilities,
and interfaces. Third, high-performance machines support different types of
applications, depending on the applications' communication and coordination
models.
{\bf Multi-tenancy} has defined the way in which high-performance computing
resources are exposed to their users. Job schedulers, often called ``batch
queuing systems''~\cite{czajkowski1998} and first used in the time of punch
cards~\cite{katz1966,silberschatz1998}, adopt the batch processing concept to
promote efficient and fair resource sharing. Job schedulers implement a
usability model where users submit computational tasks called ``jobs'' to a
queue. The execution of these jobs is delayed until the required amount of
resources becomes available. The extent of the delay depends mostly on the size
and duration of the submitted job, resource availability, and policies (e.g.,
fair usage).
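
As an illustration of how such delays arise, the following toy simulation
(node counts and durations are hypothetical) models a FIFO batch queue: a job
starts only once enough nodes are free, so a small job can wait a long time
behind a large one.

{\small
\begin{verbatim}
# Toy FIFO batch queue: jobs start only when enough
# nodes are free; small jobs wait behind large ones.
import heapq

def simulate(jobs, free_nodes):
    """jobs: list of (name, nodes, duration)."""
    running, now, start = [], 0, {}
    for name, need, dur in jobs:     # FIFO order
        while free_nodes < need:     # wait for space
            now, freed = heapq.heappop(running)
            free_nodes += freed
        start[name] = now
        free_nodes -= need
        heapq.heappush(running, (now + dur, need))
    return start

print(simulate([("big", 8, 100), ("small", 1, 5)], 8))
# -> {'big': 0, 'small': 100}
\end{verbatim}
}
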
High-performance machines are often characterized by several types of
heterogeneity and diversity. Users are faced with diverse job description
languages, submission commands, and configuration options. Furthermore, the
number of queues exposed to the users and their properties, such as walltime
limits and compute-node sharing policies, vary from machine to machine.
Finally, each machine may be designed and configured to support only specific
types of application.
The resource provisioning of high-performance machines is limited, irregular,
and largely unpredictable~\cite{downey1997,wolski2003,li2004,tsafrir2007}. By
definition, the resources accessible and available at any given time can be less
than those demanded by all the active users. Furthermore, the resource usage
patterns are not stable over time and alternating phases of resource
availability and starvation are common~\cite{Furlani2013,Lu2013}. This landscape
has led not only to a continuous optimization of the management of each resource
but also to the development of alternative strategies to expose and serve
resources to the users.
% \jhanote{I do not think the two are equivalent. At least not in common usage.
% i.e., you can definitely do meta scheduling without multilevel scheduling.
% Please argue otherwise, else I will remove meta scheduling.}
{\bf Multi-level scheduling} is one of the strategies devised to improve
resource access across multiple high-performance and distributed machines. The
idea is to hide the scheduling point of each high-performance machine behind a
single scheduler: users or applications submit their tasks to that scheduler,
which negotiates and orchestrates the distribution of the tasks via the
scheduler of each available machine. While this approach promises an increase
in both scale and usability of applications, it also introduces complexities
across resources, middleware, and applications.
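
The idea can be sketched as follows (interfaces and scheduling policy are
hypothetical): a meta-scheduler exposes a single submission point and delegates
each task to the scheduler of one of the available machines.

{\small
\begin{verbatim}
# Multi-level scheduling sketch: one meta-scheduler
# delegating to per-machine batch schedulers.
class MachineScheduler:
    def __init__(self, name, free_cores):
        self.name = name
        self.free_cores = free_cores

    def submit(self, task):          # second level
        print(task, "queued on", self.name)
        self.free_cores -= 1

class MetaScheduler:
    def __init__(self, machines):
        self.machines = machines

    def submit(self, task):          # first level
        best = max(self.machines,
                   key=lambda m: m.free_cores)
        best.submit(task)

meta = MetaScheduler([MachineScheduler("A", 16),
                      MachineScheduler("B", 64)])
for i in range(3):
    meta.submit("task-%d" % i)
\end{verbatim}
}
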
% \jhanote{need to be more specific than grid computing} \mtnote{Any insight in
% what kind of specificity you are thinking about? Grid and cloud computing
% are at the same level of generality so I will have to specify also the
% latter.}
Several approaches have been devised to manage the complexities associated with
multi-level scheduling. For example, some approaches target the resource
layer~\cite{raicu2007falkon,singh2005,ramakrishnan2006toward,foster2008,juve2008,villegas2012,song2009};
others the application layer as, for example, with workflow
systems~\cite{taylor2014,curcin2008scientific,juve2008,balderrama2012scalable}.
All these approaches offered and still offer some degree of success for
specific applications and use cases, but a general solution based on
well-defined and robust abstractions has yet to emerge; in its absence, each
new use case tends to require bespoke engineering.
% the approaches developed under the umbrellas of grid computing
% ~\cite{raicu2007,singh2005,ramakrishnan2006toward} or cloud
% computing~\cite{foster2008,juve2008,villegas2012,song2009},
One of the persistent issues besetting resource management across multiple
high-performance machines is the increase of the implementation complexity
imposed on the application layer. Even with solutions like grid
computing~\cite{berman2003grid,foster2003grid} aiming at effectively and, to
some extent, transparently integrating diverse resources, most of the
requirements involving the coordination of task execution still reside with the
application layer~\cite{legrand2003,krauter2002,darema2005}. This translates
into single-point solutions, extensive redesign and redevelopment of existing
applications when they need to be adapted to new use cases or new
high-performance machines, and lack of portability and interoperability.
Consider for example a simple distributed application implementing the \MW
pattern. With a single high-performance machine, the application must
concurrently submit tasks to the queue of the machine's scheduler, retrieve
their outputs, and aggregate them. When multiple high-performance machines are
available, the application must either directly manage submissions to several
queues or use a third-party scheduler with its specific execution model. In
both scenarios, the application needs a large amount of development effort and
capabilities that are not specific to the given scientific problem but pertain
instead to the coordination and management of its computation.
The notion of resource placeholder was devised as a pragmatic solution to
better manage the complexity of executing distributed applications. A resource
placeholder decouples the acquisition of remote compute resources from their
use to execute the tasks of a distributed application. Resources are acquired
by scheduling a job onto the remote high-performance machine; once executed,
the job is capable of retrieving and executing application tasks.
% Resources are acquired by scheduling a job onto the remote high-performance
% machine. Once executed, the job runs an agent capable of retrieving and
% executing application tasks.
{\bf Resource placeholders} leverage mul\-ti-\-le\-vel sche\-du\-ling to enable
the parallel execution of the tasks of distributed applications. Multi-level
scheduling is achieved by scheduling the placeholder and then by enabling
direct scheduling of application tasks onto that placeholder. Mul\-ti-\-le\-vel
sche\-du\-ling can be extended to multiple resources by instantiating resource
placeholders on diverse high-performance machines and then using a dedicated
scheduler to schedule tasks across all the placeholders.
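
A minimal sketch of the mechanism follows (all names are hypothetical). The
agent function is the code that the batch system eventually runs as a job; from
that moment on, application tasks are pulled and executed directly on the held
resources, bypassing the machine's queue.

{\small
\begin{verbatim}
# Resource-placeholder sketch: once the batch system
# starts the agent, tasks bypass the machine's queue.
import queue
import subprocess

task_queue = queue.Queue()  # master-side task store

def placeholder_agent():
    """Body of the job the batch system executes."""
    while True:
        task = task_queue.get()   # pull a task
        if task is None:          # sentinel: stop
            return
        subprocess.run(task, shell=True)

# master: tasks are late-bound to whichever
# placeholder becomes active, not to a machine
for cmd in ["echo one", "echo two"]:
    task_queue.put(cmd)
task_queue.put(None)
placeholder_agent()  # stands in for the remote agent
\end{verbatim}
}
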
% Mul\-ti-\-le\-vel sche\-du\-ling is achieved by scheduling the agent and then
% by enabling direct scheduling of application tasks to that agent. The \MW
% pattern is often an effective choice to manage the coordination of tasks
% execution on the available agent(s).
It should be noted that resource placeholders also mitigate the side-effects of
multi-tenancy. A placeholder still spends a variable amount of time waiting to
be executed by the batch system of the remote high-performance machine but,
once executed, the user, or the master process of the distributed application,
may hold total control over its resources. In this way, tasks are scheduled
directly on the placeholder without competing with other users for the
high-performance machine's scheduler. The practice predates the term: in the
1980s and 1990s it was fairly common for a user of a batch supercomputer who
wanted interactive access, for example for debugging, to submit a batch job
containing an xterm; when the job started, the xterm window would appear and
the user could use the system interactively, an early example of resource
placeholding.
% \msnote{I would like to either replace 'supercomputer' with i.e. 'cluster', or
% make it explicit in the beginning of 2.1 that we talk about multiple types
% of systems}
% \mtnote{Here I used supercomputer in its general meaning as computing device
% with a lot of computational capacity. I added a footnote, any better?}
% \jhanote{I think we should use neither. the generally acceptable albeit
% equally fuzzy term is high-performance machine or high-performance
% computing.}
% ------------------------------------------------------------------------------
% 2.2
\subsection{Brief History of \pilotjob Systems}
\label{sec:histimpl}
The \pilot abstraction has a rich set of properties~\cite{luckow2012towards}
that have been progressively implemented into multiple \pilotjob systems.
Figure~\ref{fig:timeline} shows the introduction of \pilotjob systems over time
while Figure~\ref{fig:pilotjob_clustering} shows their clustering along the axes
of workload management and pilot functionalities. Initially, \pilotjob systems
implemented core functionalities to utilize resources independently from the
resource management of the remote high-performance machines. Subsequently,
these systems progressively evolved to include advanced capabilities like
workload and data management.
% Starting from a set of core functionalities focused on acquiring remote
% resources and utilizing them independently from the resource management of the
% remote high-performance machine, \pilotjob systems progressively evolved to
% include advanced capabilities like workload and data management.
% As seen in Ref.~\cite{luckow2012towards}, the \pilot abstraction has a rich
% set of properties and its implementations offer a vast array of capabilities
% including multiple scheduling algorithms, data and compute placeholders, and
% late or early binding. Nonetheless, the capability of acquiring remote
% resources and directly utilizing them, independently from the supercomputer
% resource management, is a necessary property of the \pilot abstraction. As
% such, resource placeholders and their
% scheduling~\cite{Pinchak02practicalheterogeneous} should be seen as early
% \pilot system implementations.
% The progressive definition and implementation of the \pilot abstraction can be
% seen as the process of evolving both the understanding and implementation
% complexity of the notion of resource placeholder.
\begin{figure}[t]
% Put real dates in the comment here.
% Boinc: X
% BigJob: 200X
% etc.
\centering
\includegraphics[width=0.45\textwidth]{figures/timeline}
\caption{Introduction of systems over time. When available, the date of
first mention in a publication or otherwise the release date of software
implementation is used. \mtnote{Missing from from both Section 3 and 4:
WISDOM} \jhanote{I think PANDA is too far left..I would say post-2005?}}
\label{fig:timeline}
\end{figure}
%\footnote{http://wiki.nikhef.nl/biggrid/Using_the_Grid/ToPoS},
\begin{figure}[t]
\centering
\includegraphics[width=.45\textwidth]{figures/pilotjob-clustering.pdf}
\caption{A partial clustering of pilots along functionality. \mtnote{The
clustering is incomplete. Should we list all the \pilot systems we
mention?} \jhanote{We should for now just mention that its partial. More
importantly, should we revisit the axis labels?} \note{this is confusing, since Pegasus uses Glide-In - Pegasus is not the pilot system, Gilde-In is, and it should only show up in one oval, ideally}}
\label{fig:pilotjob_clustering}
\end{figure}
AppLeS~\cite{berman1996application} is a framework for application-level
scheduling and offers an example of an early implementation of resource
placeholders. AppLeS provides an agent that can be embedded into an
application, enabling the application to acquire resources and to schedule
tasks onto them. In addition to \MW, AppLeS provides application templates,
e.g., for parameter sweep and moldable parallel
applications~\cite{berman2003adaptive}. AppLeS offered user-level control of
scheduling but did not isolate the application layer from the management and
coordination of task execution: any change in the coordination mechanisms
directly translated into a change of the application code. The next
evolutionary step was to create a dedicated abstraction layer between those of
the application and of the various batch queuing systems available at remote
systems.
Around the same time as AppLeS was introduced, volunteer computing projects
started using the \MW coordination pattern to achieve high-throughput
calculations for a wide range of scientific problems. The workers of these
systems could be downloaded and installed on users' workstations. With an
installation base distributed across the globe, workers pulled and executed
computation tasks when CPU cycles were available. \note{may also want to
mention the Condor MW work here?}
The volunteer workers were essentially heterogeneous and dynamic as opposed to
the homogeneous and static AppLeS workers. Farming out tasks in a dynamic
distributed environment including personal computers promised to lower the
complexity of designing and implementing distributed applications. Each
volunteer worker behaves as an opportunistic resource placeholder and, as such,
implements the core functionality of the \pilot abstraction.
The first public volunteer computing project was the Great Internet Mersenne
Prime Search~\cite{woltman2004great}, followed shortly by
distributed.net~\cite{lawton2000distributed} in 1997, created to compete in the
RC5-56 secret-key challenge, and by the SETI@Home project, which set out to
analyze radio telescope data. The generic BOINC distributed master-worker
framework grew out of SETI@Home, becoming the {\it de facto} standard framework
for volunteer computing~\cite{anderson2004boinc}. \note{aren't there a number
of other systems that are in common use too?}
It should be noted that the process of resource acquisition differs between
AppLeS and volunteer computing: the former has prior knowledge of the available
resources while the latter has none. As a consequence, AppLeS can request and
orchestrate a set of resources, allocate tasks in advance to specific workers
(i.e., resource placeholders), and implement load balancing among resources. In
volunteer computing, tasks are pulled by the clients when they become active,
so specific resource availability is unknown in advance. In short, AppLeS
pushes tasks to known workers, while volunteer computing workers pull tasks
opportunistically. The pull model is a potential drawback, but it is mitigated
by the redundancy offered by the large scale that volunteer computing can reach
thanks to its simpler model of worker distribution and installation.
% The opportunistic use of geographically distributed resources championed by
% voluntary computing offers several advantages. The resource landscape
% available for scientific research is fragmented across multiple institutions,
% managed with different policies and protocols, and heterogeneous both in
% quantity and quality. Once aggregated, the sum of otherwise limited resources
% can support very large distributed computations and a great amount of
% multi-tenancy. Note that given the required capabilities, this model of
% resource provisioning can still support the execution of parallel applications
% on the few resources that offer low-latency network interconnect.\msnote{This
% paragraph is a good candidate for removal?}
% \jhanote{is ``batch'' redundant?} \mtnote{Probably
% (based on: http://research.cs.wisc.edu/htcondor/doc/condor-practice.pdf). I
% changed system to framework as system is used differently in the following
% sentence.}
HTCondor (formerly known as Condor) is a high-throughput distributed computing
framework that uses diverse and possibly geographically distributed
resources~\cite{thain2005}. Originally, HTCondor was created for systems within
one administrative domain, but Flocking~\cite{epema1996worldwide} made it
possible to group multiple machines into aggregated resource pools. However,
resource management required system-level software configuration performed by
the administrator of each individual machine of each resource pool.
% \jhanote{resource management could not be done on application level'' does not
% make sense to me. Are we referring to aggregation?}\mtnote{Better?}
This limitation was overcome by integrating a resource placeholder mechanism
within the HTCondor system. Gli\-de\-in~\cite{frey2002condorG} allowed users to
add grid resources to resource pools. In this way, users could uniformly execute
jobs on heterogeneous resource pools. Thanks to its use of resource
placeholders, Glidein was one of the systems pioneering the implementation of
the \pilot abstraction, enabling some \pilot capabilities also for third-party
systems like Bosco~\cite{weitzel2012campus}.
% \jhanote{Also there is a bit of care needed: we're implying glide-in is a
% resource placeholder -- which is part of a pilot, and not necessarily a full
% pilot.} \mtnote{We do not use \pilotjob system so I do not think we are
% implying that Glidein is a ``full pilot'' (even if I am not so sure what
% exactly a full pilot is). I slightly edited the sentence, any better?}
% \jhanote{as this is historical evolution, some parts need to be in the past
% tense. Care will be needed to get the tense right.} \mtnote{Better?}
The success of Glidein shows the relevance of the \pilot abstraction in
enabling scientific computation at scale and on heterogeneous resources. The
implementation of Glidein also highlighted at least two limitations:
user/system layer isolation, and the application development model. While
Glidein allows the user to manage resource placeholders directly, daemons must
still be running on the remote machines. This means that Glidein cannot be
deployed without involving the machine owners and system administrators.
Implemented as a service, Glidein supports integration with distributed
application frameworks but does not programmatically support the development of
distributed applications by means of dedicated APIs and libraries.
Concomitant with and correlated to developments at the LHC, there was a
``Cambrian explosion'' of \pilotjob systems in the first decade of the
millennium, e.\,g.\ DIANE~\cite{moscicki2003diane}, GlideinWMS,
DIRAC~\cite{casajus2010dirac}, \panda~\cite{zhao2011panda},
AliEn~\cite{saiz2003alien}, and Co-Pilot~\cite{buncicco2011co}. Each of these
\pilotjob systems serves a specific user
community and experiment at the LHC: DIRAC~\cite{casajus2010dirac} was developed
by the LHCb experiment~\cite{lhcb_url}; AliEn~\cite{saiz2003alien} by the ALICE
experiment; and \panda (Production and Distributed
Analysis)~\cite{zhao2011panda} by the ATLAS experiment~\cite{aad2008atlas}. Due
to socio-technical reasons, the CMS experiment at LHC mostly converged around
the HTCondor-Glidein-GlideinWMS~\cite{sfiligoi2008glideinwms} ecosystem.
Interestingly, these systems are functionally very similar, work on almost the
same underlying infrastructure, and serve applications with very similar (if not
identical) characteristics. Unsurprisingly,
Co-Pilot~\cite{buncicco2011co,harutyunyan2012cernvm}, another \pilotjob system
developed in the LHC context, promotes interoperability by integrating
grid-based \pilotjob systems (such as AliEn and \panda) with cloud and volunteer
computing resources.
% The BigJob \pilotjob system~\cite{luckow2010} was designed to address these
% limitations, to broaden the type of applications supported by the pilot-based
% execution model, and to extend the \pilot abstraction beyond the boundaries of
% compute tasks.
Alongside the systems tailored to the LHC experiments, other \pilotjob systems
were developed to serve different research purposes, to target diverse types of
resources and middleware, or to act as special-purpose subsystems and
frameworks.
The BigJob \pilotjob system~\cite{luckow2010} was designed to support task-level
parallelism on distributed HPC resources, to broaden the type of applications
supported by the pilot-based execution model, and, ultimately, to extend the
\pilot abstraction beyond the boundaries of compute tasks. BigJob offers
application-level programmability to provide the end-user with more flexibility
and control over the design of distributed applications and the isolation of the
management of their execution. BigJob uses an interoperability library called
``SAGA'' (Simple API for Grid Applications) to work on a variety of
infrastructures~\cite{merzky2015saga,goodale2006,luckow2010}.
% Additionally, BigJob has also been extended to work with data
% and, analogous to compute pilots, to abstract away direct user communication
% between different storage systems.
% was recently re-implemented as a production-level tool named
% ``RADICAL-Pilot''~\cite{radical_pilot_paper}. rep- resents one of the latest
% evolutionary stages of the Pilot ab- straction. from an initial phase in which
% \msnote{The latter brings it back to the stage of apples, thats probably not
% what we want to say ...} \mtnote{Apologies, I am not sure I understand this
% comment.}
BigJob has recently been re-implemented as a production-level tool named
``RADICAL-Pilot''~\cite{merzky2015radical}. BigJob and now RADICAL-Pilot
represent an evolution of the \pilot abstraction: initially, pilots were
implemented as \textit{ad hoc} placeholder machinery for a specific
application, but they evolved to be integrated with the middleware of remote
resources. Both BigJob and RADICAL-Pilot implement the \pilot abstraction as an
interoperable compute and data management system that can be programmatically
integrated into end-user applications, thus providing both sets of
capabilities.
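
As an illustration of this application-level programmability, the sketch below
follows the shape of published RADICAL-Pilot examples; the exact class and
method names are assumptions tied to the API of this era and may differ across
versions.

{\small
\begin{verbatim}
# Sketch in the style of RADICAL-Pilot examples;
# API names are era-specific and may have changed.
import radical.pilot as rp

session = rp.Session()
pmgr = rp.PilotManager(session=session)

pdesc = rp.ComputePilotDescription()
pdesc.resource = "local.localhost"  # target machine
pdesc.cores = 4
pdesc.runtime = 10                  # minutes
pilot = pmgr.submit_pilots(pdesc)   # provision

umgr = rp.UnitManager(session=session)
umgr.add_pilots(pilot)

cud = rp.ComputeUnitDescription()
cud.executable = "/bin/date"        # one task
umgr.submit_units([cud])            # dispatch
umgr.wait_units()                   # execute
session.close()
\end{verbatim}
}
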
% Another ongoing evolutionary trend has been to implement the \pilot
% abstractions into pilot-based workload managers, thus moving away from
% providing simple pilot capabilities in application space. These higher-level
% systems which are often centrally hosted, move critical functionality from the
% client to the server (i.e., a service model). These systems usually deploy
% pilot factories that automatically start new pilots on demand and integrate
% security mechanisms to support multiple users simultaneously.
% Several \pilotjob systems have been developed in the context of the LHC
% experiment at CERN, which is associated with a major increase in the uptake
% and availability of pilots, e.\,g.\ DIANE~\cite{moscicki2003diane},
% GlideinWMS, DIRAC~\cite{casajus2010dirac}, \panda~\cite{zhao2011panda},
% AliEn~\cite{saiz2003alien}, and Co-Pilot~\cite{buncicco2011co}. Each of these
% \pilotjob systems serves a particular user community and experiment.
% Interestingly, these systems are functionally very similar, work on almost the
% same underlying infrastructure, and serve applications with very similar (if
% not identical) characteristics.
% Co-Pilot provides components for building a framework for seamless and
% transparent integration of these resources into existing grid and batch
% computing infrastructures exploited by the High Energy Physics (HEP)
% community.
% The \pilot abstraction has also been integrated into scientific workflow
% systems.
GWPilot is a \pilot system designed to push the boundaries of implementation
efficiency~\cite{rubio2015gwpilot}. Aimed specifically at distributed computing
resources (DCRs) exposing diverse grid middleware, GWPilot builds upon the
GridWay meta-scheduler~\cite{huedo2007modular} to allow the implementation of
efficient and reliable scheduling algorithms. Scheduling can be customized at
the user level, and the application level is well isolated from the \pilot
system level.
\pilotjob systems have also proven an effective tool for managing the workloads
executed in the various stages of a scientific workflow. For example, the
Corral system~\cite{rynge2011experiences} was developed to serve as a frontend
to HTCondor Glidein and to optimize the placement of glideins (i.e., pilots)
for the Pegasus workflow system~\cite{deelman2015}. In contrast to GlideinWMS,
Corral provides the end-user with more explicit control over the placement and
start of pilots. Corral was later extended to serve also as a frontend to
GlideinWMS.
Swift~\cite{wilde2011swift} is a scripting language designed for expressing
abstract workflows and computations. The language also provides capabilities
for executing external applications as well as the implicit management of data
flows between application tasks. Swift uses a \pilot implementation called the
``Coaster System''~\cite{coasters_url} \note{should cite hategan UCC 2011 paper
elsewhere cited, rather than the URL I think} that supports various types of
infrastructure, including clouds and grids.
Swift has also been used in conjunction with Falkon~\cite{raicu2007falkon}.
Falkon was engineered for executing many small tasks on High Performance
Computing (HPC) systems and showed high performance compared to the native
queuing systems. Falkon is a paradigmatic example of how the \pilot abstraction
has been implemented to support specific workloads while investigating their
performance. Although Falkon is now unmaintained, the insight gained from its
development has been used to improve the Coaster System.
% The proliferation of \pilotjob systems and their integration within other type
% of application and middleware systems,
% to support the execution of distributed and, increasingly, of parallel
% applications.
The brief description of the many \pilotjob system implementations introduced in
this section underlines a progressive appreciation for the \pilot abstraction
and the emergence of a \pilot paradigm. Nonetheless, the proliferation of
\pilotjob systems has been uncoordinated, developing across multiple dimensions
(see Figure~\ref{fig:pilotjob_clustering}), and making it difficult to
coherently understand the \pilot components, their functionalities,
implementations, and usages. % This hinders attempts at distinguishing \pilotjob
% system functionalities from those of other middleware and at appreciating the
% distinguishing characteristics of the \pilot paradigm.
The evolution of \pilots attests to their usefulness across a wide range of
deployment environments and application scenarios, but the divergence in
specific functionality and the inconsistent terminology call for a standard
vocabulary to assist in understanding the varied approaches and their
commonalities and differences.
% Some distinctions in terms of design, usage, and operation modes can be
% identified. Figure~\ref{fig:pilotjob_clustering} is a graphical representation
% of this clustering.
% The evolution of the \pilot paradigm and proliferation of systems has been
% uncoordinated, leading to an inconsistent terminology related to the \pilot
% abstraction, its implementations and usage. A coherent understanding of the
% \pilot components and functionalities is still missing, thus hindering attempts
% at distinguishing it from other functionality and middleware systems.
%leading to a blurred definition of \pilot abstraction and how it should be
%------------------------------------------------------------------------------
% SECTION 3
%------------------------------------------------------------------------------
\newcommand{\vocab}[1]{\textbf{#1}\xspace}
\newcommand{\prop}[1]{\textit{#1}\xspace}
\newcommand{\impterm}[1]{\texttt{#1}\xspace}
\section{Understanding the Landscape: Developing a Vocabulary}
\label{sec:understanding}
The overview presented in \S\ref{sec:history} shows a degree of heterogeneity
both in the functionalities and in the vocabulary adopted by different
\pilotjob systems. Implementation details sometimes hide the functional
commonalities and differences among \pilotjob systems, and features and
capabilities tend to be named inconsistently, often with the same term
referring to multiple concepts or the same concept named in different ways.
This section offers a description of the logical components and functionalities
shared by every \pilotjob system and the definition of a consistent
terminology, extending earlier work towards a common model for
\pilotjobs~\cite{luckow2012towards}. The goal is to offer a paradigmatic
description of a \pilotjob system and a well-defined vocabulary to reason about
such a description and, eventually, about its multiple implementations.
% \jhanote{would ``description'' be better than ``analysis'' in the first
% sentence?} \mtnote{Done.}
%------------------------------------------------------------------------------
% 3.1
\subsection{Logical Components and Functionalities}
\label{sec:compsandfuncs}
All \pilotjob systems introduced in~\S\ref{sec:history} are engineered to allow
for the execution of multiple types of workloads on machines with diverse
middleware, e.g., grid, cloud, or HPC. This is achieved in many ways, as
different \pilotjob systems are optimized for different goals, depending on use
cases, design and implementation choices, and on the constraints imposed by the
middleware and policies of the targeted machines. The common denominators among
\pilotjob systems are defined along three dimensions: purpose, logical
components, and functionalities.

The purpose shared by every \pilotjob system is to improve workload execution
when compared to executing the same workload directly on one or more machines.
The performance of workload execution is usually measured by throughput and
time to completion, but other metrics could also be considered: data transfer
time, scale of the workload executed, power consumption, or a combination
thereof. Metrics that are not related to performance include reliability, ease
of application deployment, and generality of workload. In order to achieve the
required metrics under given constraints, each \pilotjob system exhibits
characteristics that are either common or specific to one or more
implementations. Discerning these characteristics requires isolating the
minimal set of logical components that characterize every \pilotjob system.
At some level, all \pilotjob systems provide three separate but coordinated
functions, embodied in three logical components: a \vocab{Pilot Manager}, a
\vocab{Workload Manager}, and a \vocab{Task Manager}. The Pilot Manager handles
the description, instantiation, and use of one or more resource placeholders
(i.e., pilots) on single or multiple machines. The Workload Manager handles the
scheduling of one or more workloads on the available resource placeholders. The
Task Manager takes care of executing the tasks of each workload by means of the
resources held by the placeholders.
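
To fix ideas, the following sketch (hypothetical interfaces, not the API of any
particular system) reduces each manager to its minimal responsibility; pilots
are modeled as plain dictionaries.

{\small
\begin{verbatim}
# Minimal shape of the three logical components; a
# real system adds policies, faults, data, security.
class PilotManager:
    def provision(self, machine, cores):
        """Submit a placeholder job to a machine."""
        return {"machine": machine, "cores": cores,
                "tasks": []}

class WorkloadManager:
    def schedule(self, workload, pilots):
        """Bind each task of a workload to a pilot."""
        for i, task in enumerate(workload):
            pilot = pilots[i % len(pilots)]
            pilot["tasks"].append(task)

class TaskManager:
    def execute(self, pilot):
        """Run tasks on the pilot's held resources."""
        return [task() for task in pilot["tasks"]]
\end{verbatim}
}
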
The implementation details of these three logical components vary significantly
across \pilotjob systems (see~\S\ref{sec:analysis}). For example, the functions
of two or more logical components may be implemented by a single software
module, or additional functionalities may be integrated into the three
management components. Nevertheless, the Pilot, Workload, and Task Managers can
always be distinguished across different \pilotjob systems.
% One or more logical components may be responsible for specific
% functionalities, both on application as well as machine level;
% \jhanote{Should we use ``execution of tasks'' in opening sentence. We talk
% about executing tasks before and after, and not necessarily workloads.
% Issue of consistency and granularity and not of correctness.} \mtnote{I
% reread from the beginning and I think we use workload and task consistently:
% ``All \pilotjob systems introduced in~\S\ref{sec:history} are engineered to
% allow for the execution of multiple types of workloads'', ``The purpose
% shared by every
% \pilotjob system is to improve workload execution'', ``The Workload Manager
% handles the scheduling of one or more workloads'', ``The Task Manager takes
% care of executing the tasks of each workload''}
Each \pilotjob system supports a minimal set of core functions that allow for
the execution of workloads: \vocab{Pilot Provisioning}, \vocab{Task
Dispatching}, and \vocab{Task Execution}. \pilotjob systems need to schedule
resource placeholders (i.e., pilots) on the target machines, schedule tasks on
the available placeholders, and use these placeholders to execute the tasks of
the given workload; depending on the implementation, tasks may be dispatched to
a placeholder before or after that placeholder becomes active. Further
functionality might be needed to implement a production-grade \pilotjob system:
for example, authentication, authorization, accounting, data management,
fault-tolerance, or load-balancing. While such functionality may be a critical
implementation detail, it depends on the specific characteristics of the given
use cases, workloads, or targeted resources. As such, it should not be
considered a necessary characteristic of a \pilotjob system.
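
Continuing the hypothetical sketch above, the three core functions compose as
follows; as noted, in pull-based implementations tasks may well be dispatched
before a pilot becomes active.

{\small
\begin{verbatim}
# Provisioning -> Dispatching -> Execution, using
# the hypothetical managers sketched earlier.
pm, wm, tm = (PilotManager(), WorkloadManager(),
              TaskManager())

pilots = [pm.provision("machineA", cores=32),
          pm.provision("machineB", cores=16)]
workload = [lambda i=i: "result-%d" % i
            for i in range(4)]
wm.schedule(workload, pilots)   # dispatch tasks
for p in pilots:                # execute on pilots
    print(tm.execute(p))
\end{verbatim}
}
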
Among the core functions that characterize every \pilotjob system, Pilot
Provisioning is essential because it allows for the creation of resource
placeholders, even if its implementation can be comparatively ad hoc, as in the
Coaster System. As seen in~\S\ref{sec:history}, this type of placeholder
enables tasks to utilize resources without directly depending on the
capabilities exposed by the target machines. Resource placeholders are
scheduled onto target machines by means of dedicated capabilities but, once
scheduled and then executed, these placeholders make their resources directly
available for the execution of the tasks of a workload.
% \jhanote{possibly use ``resources'' in lieu of remote machines?} \mtnote{Would
% that overload the term with two separate meanings: resource as what is held
% and resource as DCR?}\jhanote{I removed ``remote'', retained machine. I
% want to avoid implying placeholders have to be distributed from the point of
% submission.} \mtnote{Great.}
% \jhanote{We may want to introduce the vocabulary of resource/DCR/DCI that we
% developed for the proposals here, and make it consistent throughout the
% paper. In this para for example we use the term DCI resource, which is
% inconsistent with developed vocabulary} \mtnote{If we want the definitions
% here, then we should probably move all of them before this subsection. Do
% you want me to do it? Meanwhile, I rephrased the whole subsection avoiding
% DCI altogether and added the definition of DCR to the next subsection.}
% MS: I would move this comment to section 5 I think, as multi-tenant pilot
% systems do have to make these trade-offs, and it would be good to point that
% out. (not doing it now because of MT lock in 5) Furthermore, resource
% placeholders are logical partitions of resources that do not need to leverage
% trade-offs among competing user requirements as needed instead with large
% pools of resources adopting multi-tenancy.
The provisioning of resource placeholders depends on the capabilities exposed
by the middleware of the targeted machine and on the implementation of each
\pilot system. Typically, on middleware where resources are managed through
queues, batch systems, and schedulers, a placeholder is provisioned by
submitting it as a job. For such middleware, a job is a type of logical
container that includes configuration and execution parameters alongside
information on the application to be executed on the machine's compute
resources. Conversely, for machines without a job-based middleware, a resource
placeholder would be executed by means of other types of logical container,
for example, a virtual machine or a Docker
container~\cite{bernstein2014,felter2014}.
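
As a concrete illustration for job-based middleware, the sketch below assumes a
SLURM-managed machine and a pilot agent script (a hypothetical
\texttt{agent.py}) already staged on it; the placeholder is provisioned simply
by submitting a job whose body launches the agent.

{\small
\begin{verbatim}
# Provisioning a placeholder on a SLURM machine by
# wrapping a (hypothetical) agent.py in a batch job.
import subprocess
import tempfile

job = """#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:30:00
python agent.py  # agent starts pulling tasks
"""

with tempfile.NamedTemporaryFile(
        "w", suffix=".sh", delete=False) as f:
    f.write(job)
subprocess.run(["sbatch", f.name], check=True)
\end{verbatim}
}
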
% \mtnote{Too many execut*. Should we use ``code'' instead of `executable'? Any
% better option than ``code'' to replace `executable'?} \jhanote{used
% application, but task might be better? code is acceptable too.}
Once resource placeholders are bound to the resources of a machine, tasks need
to be dispatched to those placeholders for execution. Task dispatching does not
depend on the functionalities of the targeted machine's middleware, so it can
be implemented as part of the \pilotjob system. In this way, the control over
the execution of a workload is shifted from the machine's middleware to the
\pilot system. This shift is a defining characteristic of the \pilot paradigm,
as it decouples the execution of a workload from the need to submit its tasks
via the machine's scheduler. For example, the execution of individual tasks of
a workload will not depend upon the specifics of the targeted machine's state
or availability, but rather on those of the placeholder. More elaborate
execution patterns involving task and data dependencies can thus be implemented
independently of the capabilities and constraints of the target machine's
middleware. Ultimately, this is how \pilotjob systems allow for the direct
control of workload execution and the optimization, for example, of execution
throughput.
% For example, the tasks of a workload will not individually have to wait on the
% targeted machine's queues, but rather on the availability of the placeholder
% before being executed.