intro.tex

\tightsection{Introduction}
\label{sec:intro}

\comment{
\begin{packeditemize}
	\item Increasing demands for high QoS and performance.
	\item Limitation of current tenent of Internet/application protocols.
	\item Quality depends on prediction accuracy.
	\item Accurate prediction for video quality is possible through sharing clients' information.
	\item Sharing clients' information is feasible.
	\item Contribution of this paper.
\end{packeditemize}
}


Today's Internet applications are demanding better-and-better Quality of Service (QoS) and performance. Examples of such applications include video streaming, hosted productivity applications, social networks, and on-line multiplayer games. These requirements will only increase in the future with more and more applications and services being hosted, and an ever increasing demand for higher quality video (e.g.\ 4K video~\cite{4kvideo}).

Most existing protocols today, either at the system level (e.g.\ TCP) or the application level (e.g.\ file transfer and video streaming), assume two fixed end points and varying resource availability between or at the two end points.  The key mechanism to achieve better performance is to rapidly react to congestion and/or changes in the resource availability along the path or at the end-hosts. 

As applications and network services become more sophisticated, new opportunities are emerging to further improve quality of service: 

\myparatight{More choices of application end points and paths}  Most scalable Internet services are implemented using many and usually geographically distributed servers.  For each application session, one or multiple of the many servers can be selected at the beginning of the session.  If the duration of the application session is long, it is possible to switch servers during the life time of the session. The flexibility of selecting servers (and thus Internet paths) at the start and throughout the life time of a session enables new opportunities for performance optimization. 

\myparatight{Many configuration parameters.} As protocols become more sophisticated, there are many parameters that need to be configured either at the start and throughout the life time of the session. Even for a traditional protocol like TCP, the initial window size needs to be configured before the reactive adaptation algorithm kicks in. For an adaptive bit rate video streaming protocol, the initial bitrate needs to be selected before the bitrate adaptation algorithm can start to adjust the bit rate. 

In this paper, we observe that while reactive protocols have served us well for decades, 
they have three inherent limitations that make them ill-suited for the ever increasing demand for better quality of service: 

\myparatight{Suboptimal initial configurations} Existing protocols typically use \emph{static} configuration parameters that are often suboptimal. For example, adaptive streaming protocols usually start with a statically configured bitrate. If this bitrate is too low, the protocol might not be even able to reach the optimal rate by the time the video has ended (e.g., for a 30s or 60s news clip). Similarly, in many cases, the initial window size of TCP is too small, which may cause a transfer to take far more RTTs than necessary. 

\myparatight{Suboptimal decisions} When these protocols react, they don't always make the optimal decisions, which may further impact user experience. For example, in case of congestion, an adaptive bitrate protocol may switch up to a bitrate that cannot be sustained, and, as a result, the user may experience rebuffering.

\myparatight{Large configuration space} As protocols and applications become more sophisticated the number of configuration choices increases dramatically. For instance, with a video application one can select the initial bitrate, the CDN, and at a finer granularity the proxy or the web server from which to stream the content. This makes it hard, or even infeasible, for a reactive protocol to explore the configuration space and select the best configuration.

%\myparatight{Suboptimal decisions:} by the time these protocols react, it might be already too late to mask the quality problems to users. For example, if the TCP loses a packet in the initial window, this will trigger a few seconds retransmission timeout that will impact the user experience. 

%In this paper we make three arguments. First, we argue that to address these challenges one needs to accurately predict the outcome of making a particular choice, e.g., would a stream be able to sustain a particular bitrate? Would a TCP connection experience any loss given a particular initial window size? In theory, this would allow protocols to use ``optimal'' configuration parameters and make ``optimal'' decisions, e.g., pick the largest sustainable bitrate for a video stream,  or pick the largest window size for which a TCP connection won't experience congestion losses.

In this paper we make three arguments. First, given the fundamental limitations of reactive approaches, we argue for the alternative \emph{predictive} approach, where we need to accurately predict the outcome of making a particular choice, e.g., would a stream be able to sustain a particular bitrate? Would a TCP connection experience any loss given a particular initial window size? In theory, perfect prediction would allow protocols to use ``optimal'' configuration parameters and make ``optimal'' decisions. For example, it would be possible to pick the largest sustainable bitrate for a video stream, or the largest window size for which a TCP connection won't experience congestion losses.

Second, in order to accurately predict the outcome of a given choice, one may be tempted to use an analytical approach to model the environment or Internet. However, we believe this is infeasible. Thus we argue for a \emph{data-driven} empirical approach to leverage the information available from other streams or connections, i.e., use the performance experienced by other ``similar'' sessions to predict the performance of another session. For example, if multiple sessions located at some organization can sustain 2Mbps when streaming from a CDN, then it's likely a new session from the same organization will be also able to stream at 2Mbps from the same CDN.

Third, given the large number of attributes that impact a session's performance (protocol, device, OS, firmware, ISP, and geographic location, to name a few), even if we have access to a very large number of sessions, it is hard to find enough sessions that exactly match all the attributes of the session whose performance we wish to predict. To address this challenge we use techniques from statistics that draw information from merely-similar sessions.  This amounts to careful \emph{aggregation} of sessions with similar attributes.  The attributes along which we perform the aggregation are an important determinant of prediction accuracy.

To this end, we propose a global control plane architecture that continuously collects information about the performance of existing sessions and use this information to maximize performance of other sessions. There are several challenges to implement such an architecture. First, we need the ability to collect information from a large number of streams. Second, we need to process this information and make decisions in real-time. Third, to make these decisions, we need to accurately model the session performance. 


In this paper, we present an early instantiation of such a control plane architecture for video streaming, called video Global Optimization (GO), which addresses the above challenges.  Video traffic is singularly important as it dominates the Internet traffic today, and this domination will only intensify in the foreseeable future. With GO, each streaming video client continuously sends quality related information (e.g., current bitrate, re-buffering, start time) to a backend. This backend processes the quality information in real-time and based on this information provides hints to clients about the best bitrate or server to start with or switch to. We present the system architecture and discuss in detail the challenges and the solution for making accurate predictions based on the aggregate quality information. For ease of deployment in today's Internet eco-system, we make two simplifying decisions in the first implementation of GO.  First, GO selects server and Internet path at the coarse granularity of CDN level, but not at the fine granularity of the server level.  Second, GO makes decisions only at the start of a video session, but not in the middle of the lifetime of the session.  Together, these reduce both the frequency of and the number of choices available for GO.  The next version of GO will remove both simplifications. Even with these simplification decisions, our early experience deploying GO to optimize video streaming from one site shows that it is a promising first step towards the full global control plane. 
   
The rest of the paper is organized as follows. Section~\ref{sec:overview} presents the motivation and overview of GO. Section~\ref{sec:challenges} discusses the challenges of designing a prediction algorithm, while Section~\ref{sec:prediction} presents our solution, i.e., the GO algorithm. We evaluate GO's performance in Section~\ref{sec:improvement}. In Section~\ref{sec:impl}, we present the implementation details, and we share some early results from a real world deployment in Section~\ref{sec:eval}. 
%\xil{one high level comment is that we are mentioning TCP a lot, but we have nothing to follow on. Maybe we can change the tune a bit to just have video as the main thread, and explicitly spell out TCP or SDN as analogy}