background.tex

\tightsection{Motivation and System Overview}
\label{sec:overview}

In this section, we present the motivation and an overview of our
system design.

The tenet of our work is that improving application performance in
today's Internet requires one to predict the outcome of different
decisions that the client or the server can make to optimize video
delivery. 

In Section~\ref{subsec:predictivemodel}, we argue that in order to
predict these decision outcomes, we need a global control architecture
that (1) continuously collects performance information of ongoing
sessions, and (2) use this information to create an accurate
\emph{model} that predicts the performance of a session for a give
configuration setting or protocol decision (see
Figure~\ref{fig:global-control--overview}).  Performance information
can be collected at the clients, at the servers, or at different
network elements such as routers or caches.  The decisions can involve
any element in the control plane. For example, with video delivery,
decisions may include CDN selection, video bitrate adaptation, video
encoding profile, and.  Being able to predict the outcome of
every possible decision enables us to make the best decision.

In Section~\ref{subsec:gooverview}, we present the overview of GO, an 
early instantiation of the global control plane in video delivery.
We present the quality metrics and attributes used in this paper in 
Section~\ref{subsec:dataset}.

\begin{figure}[h!]
\centering
 \includegraphics[width=0.5\textwidth] {figures/global-control-overview.pdf}
\tightcaption{Global control plane.}
\label{fig:global-control--overview}
\end{figure}

\tightsubsection{Predictive Decision Model}
\label{subsec:predictivemodel}
At the high level, any protocol designer aims to implicitly or
explicitly build a model of the environment (or world) in which the
protocol operates. Then the protocol takes the decision that optimizes
the utility of the predicted outcome.  For example, if it were
possible to predict whether packet loss would occur given a particular
TCP initial window size, we can maximize the QoS and user experience
by choosing the highest initial window size that wouldn't cause a
packet loss.  Similarly, if it were possible to predict whether a
client could sustain a particular bitrate, we can choose the highest
sustainable bitrate.

The challenge is to come up with a model that accurately describe the
world. Unfortunately, coming up with an analytical model of a complex
environment such as Internet is infeasible. As a result, we must use
direct measurements or observations of real world to inform our
model. This is the approach we take in this paper. In particular, we
use the performance of existing clients to predict the performance of
another client. For example, when a new client starts streaming a
video, we can use the quality experienced by other clients to predict
the quality that would be experienced by the new client when streaming
at a given bitrate. Based on this prediction, we can then select the
optimal bitrate for the new client.  In TCP protocol design, we might
gather data about many TCP connections to discover the distribution of
packet loss rates for different initial window sizes.  
In this paper, we refer to such model as a {\it predictive decision} model.

% ION: move to discussion section
% We note that predictive decision model which requires sharing the
% information across clients at a global scale is naturally enabled by
% today's service architectures where a very large number of clients
% are typically connected to a backend at any given time. Google,
% Facebook, Microsoft, Yahoo!, and Twitter are just a handful of sites
% that can observe the quality and the performance experienced by a
% huge number users at any give time. Thus, maximizing the QoS of
% existing applications using predictive decision model does not
% require new architectures or building new large scale systems; it
% can be done in the context of the already existing artifacts.
% However, due to the complexity of networks -- in particular their
% temporal and spatial heterogeneity -- we find it is necessary to
% bring a degree of statistical sophistication to network modeling
% algorithms.


\tightsubsection{GO System Overview}
\label{subsec:gooverview}

In this paper, we apply the idea of predictive decision model to video
quality optimization in the context of a large site that manages the
video delivery of many premium content brands such as HBO, MLB, and
ESPN. Video traffic is singularly important as it dominates the
Internet traffic today, and this domination will only intensify in the
forseeable future.

As a first step in realizing the vision of the predictive decision and control 
platform, we have built a wide area distributed system, called video Global 
Optimization (GO). As shown in
Figure~\ref{fig:go-overview}, GO consists of two main components: (i)
a backend that collects and process the information about the video
quality across all clients, and (ii) a client library that collects
quality information at each client and sends it back to the backend. In
particular, the client library monitors the states of player and
network condition, summarizes them in the form of \emph{quality
  samples} (see details in \Section~\ref{subsec:dataset}) and send
these samples back to the GO backend.

\begin{figure}[h!]
\centering
 \includegraphics[width=0.5\textwidth] {figures/go-overview.pdf}
\tightcaption{Architecture of GO.}
\label{fig:go-overview}
\end{figure}
 
The GO backend uses a highly-scalable distributed streaming system to
process the quality samples received from clients in real time, and
predict the quality outcomes of future sessions for each possible
decision.  The existing GO system makes two decisions for each
session: the initial or starting {\it bitrate} (most videos today are
encoded at multiple resolutions), and the initial CDN to stream the
content from (many content providers today are on or moving towards
multi-CDN setup). The initial CDN and bitrate are chosen from a
pre-determined set of options and they can be changed at any point
during a session. Note that to play a video that is encoded at $n$
different bitrates and stored on $m$ CDNs, there are $m\times n$
options to chose from.
%Readers may refer to~\cite{conext12} for more details.

To pick one of these options, GO predicts the quality metric for each
option, and then picks the one that corresponds to the highest
quality. In particular, GO uses real-time quality-related information
from every client currently streaming video to predict the quality of
another user if she were streaming from a particular CDN at a
particular bitrate, and use this prediction to select initial bitrate
and CDN for a new user.

%\xil{quality metric is vague, we need to setup a notion of utility function.}

%Since there are multiple quality metrics, GO combines them
%into a single figure of merit using a \emph{utility function}\jc{We do
%  not really use the notion of utility function now. Remember to echo
%  this point later on.}.  For example, a utility function may return a
%linear combination of several quality metrics.

\tightsubsection{Quality Samples} 
\label{subsec:dataset}

As explained above, the client library collects various quality
information, summarizes this information in quality samples, and send
these samples to the GO backend. More precisely, a quality sample
contains a set of quality measurements and attributes. Based on the
quality measurements, GO computes a set of quality metrics. In this
paper, we focus on four industry-standard video quality metrics that
have been shown to impact user's engagement~\cite{sigcomm11}:
%\xil{naming consistency: use re-buffering ratio}
\begin{packedenumerate}
\item \emph{Buffering ratio:} The percentage of time a session spends
  in buffering state, i.e., waiting for the player's buffer to
  replenish with enough data to continue the playback.
%Prior work has shown that buffering ratio is a key metric that impacts user engagement~\cite{sigcomm11}.
\item \emph{Join time:} The time it takes to start playing the video
  from the time the user clicks the ``play'' button.
% While join time may not directly impact the amount of a specific video viewed,
% it does have long term effects as it reduces the likelihood of repeated
%visits~\cite{sigcomm11,akamai-imc12}. 
\item \emph{Average bitrate:} Many of today's video players support
  adaptive bitrate switching to effectively react to changes in the
  bandwidth availability. The average bitrate of a session is simply
  the time-weighted average of the bitrates used in a given session.
%(Bitrate refers to the video playback rate, rather than throughput or download rate.)
\item \emph{Start failures:} Some sessions fail to start playing due
  to various reasons, including content unavailability, or CDN server
  overload.
% video; either the content is not available on the CDN server or the CDN is
%under overload or other unknown reasons. We mark as a session as a join failure
%if no content was played during this session.\footnote{Start failures are
%reported by the client-side measurement module that sends a ``heartbeat'' on
%the player status.}
\end{packedenumerate}

% Join time and start failures are valid metrics only for those
% sessions that just join the session.

In addition, a quality sample contains a large number of client and
video session attributes. Table~\ref{tab:attributes} summarizes the attributes information.

\comment{
\begin{packedenumerate}
\item \emph{ASN:} The Autonomous System Number (ASN) that the client
  IP belongs to. Note that a single ISP (e.g., Comcast) may own
  different ASNs both for management and business reasons.

% We focus on the ASN as it is more fine-grained than the ISP
% granularity.  We observe in aggregate \fillme unique ASNs spanning
% multiple countries.
%\xil{naming: change to video asset}
\item \emph{Object:} The video asset or object being streamed.
%We observe in aggregate \fillme objects.

\item \emph{Content provider (Site):} The video site providing the
  content. We have 279 content providers that span different video
  genres.  We use the terms site and content provider interchangeably.

\item \emph{Initial CDN:} Thw CDN that the video session starts with.
  We see 19 unique CDNs spanning popular CDN providers as well as
  several in-house and ISP-based CDNs. 
%(Some providers use proprietary
%  CDN switching logic; in this case we pick the segment of the session
%  with the CDN used for the longest duration.)

\item \emph{Initial Bitrate:} The bitrate the video session
  starts at. 730 of the video objects we
  observe have multiple bitrates.

\item \emph{OS:} The OS of the client device, which in many cases
  determines the video player. 
  %For example, when watching the same
  %video, iOS clients use different player than other OSes.

\item \emph{Connection type:} Type of the last mile connectivity,
  e.g., Wireless, DSL, fiber-to-home. This information is obtained
  from a third-party commercial service based on the client's IP
  address.
\end{packedenumerate}
}

  In this paper, we report results from collecting quality samples
  from over 800 million sessions or views (both successful and
  failed) over a duration of a month.


\begin{table}[h!]
\begin{center}
\begin{small}
\begin{tabular}{p{1.7cm}|p{2.8cm}|p{2.8cm}}
    ~                          & Description                                                       & \# of unique values                                                          \\ \hline
    ASN                        & The ASN that the client IP belongs to  & 15K unique ASNs from multiple countries                                  \\ \hline
    Object                     &  The video asset or object being streamed.                  & 80K                                                                           \\ \hline
    Content provider (Site)    & The video site providing the content.                             & 279                                                                          \\ \hline
    Initial CDN                &  	The CDN that the video session starts with.                & 19 unique CDNs, (popular CDNs, in-house and ISP-based CDNs) \\ \hline
     Initial Bitrate     &  		The bitrate the video session starts at          & 15K video objects with multiple bitrates                          \\ \hline
     Connection type     &  	Type of the last mile connectivity                         & 7, including WiFi wireless, DSL, fiber-to-home                                    \\
\end{tabular}
\end{small}
\end{center}
\tightcaption{Client and session attributes included in each session.}
\label{tab:attributes}
\end{table}