<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc3550 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml">
<!ENTITY rfc3551 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3551.xml">
<!ENTITY rfc3611 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3611.xml">
<!ENTITY rfc4585 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4585.xml">
<!ENTITY rfc5506 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5506.xml">
<!ENTITY rfc5166 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5166.xml">
<!ENTITY rfc5033 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5033.xml">
<!ENTITY rfc8593 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8593.xml">
<!-- <!ENTITY rfc5681 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5681.xml"> -->
<!ENTITY rfc8083 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8083.xml">
<!ENTITY I-D.ietf-rmcat-cc-requirements PUBLIC ""
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rmcat-cc-requirements.xml">
<!ENTITY I-D.ietf-avtcore-rtp-circuit-breakers PUBLIC ""
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-avtcore-rtp-circuit-breakers.xml">
<!ENTITY I-D.ietf-netvc-testing PUBLIC ""
"https://xml2rfc.tools.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-netvc-testing.xml">
<!ENTITY I-D.ietf-rmcat-eval-test PUBLIC ""
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rmcat-eval-test.xml">
<!ENTITY I-D.ietf-rmcat-wireless-tests PUBLIC ""
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rmcat-wireless-tests.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc compact="yes" ?>
<?rfc symrefs="yes" ?>
<rfc ipr="trust200902" docName="draft-ietf-rmcat-eval-criteria-12" category="info">
    <!-- What is the category field value-->
    <front>
        <title abbrev="Evaluating Congestion Control for RMCAT">
            Evaluating Congestion Control for Interactive Real-time Media
            <!--Evaluation Criteria for RTP Congestion Avoidance Techniques -->
        </title>

        <author initials="V." surname="Singh" fullname="Varun Singh">
          <organization abbrev="callstats.io">
            CALLSTATS I/O Oy
          </organization>
          <address>
            <postal>
              <street>Runeberginkatu 4c A 4</street>
              <code>00100</code> <city>Helsinki</city>
              <country>Finland</country>
            </postal>
            <email>varun.singh@iki.fi</email>
            <uri>
              https://www.callstats.io/about
            </uri>
          </address>
        </author>

        <author initials="J." surname="Ott" fullname="Joerg Ott">
          <organization>Technical University of Munich</organization>
          <address>
            <postal>
              <street>Faculty of Informatics</street>
              <street>Boltzmannstrasse 3</street>
              <city>Garching bei München</city>
              <region>DE</region>
              <code>85748</code>
              <country>Germany</country>
            </postal>
            <email>ott@in.tum.de</email>
          </address>
        </author>

        <author fullname="Stefan Holmer" initials="S." surname="Holmer">
          <organization abbrev="Google">Google</organization>
          <address>
            <postal>
              <street>Kungsbron 2</street>
              <code>11122</code>
              <city>Stockholm</city>
              <country>Sweden</country>
            </postal>
            <email>holmer@google.com</email>
          </address>
        </author>

        <date year="2020" month="2"/>
        <area>TSV</area>
        <workgroup>RMCAT WG</workgroup>
        <keyword>RTP</keyword>
        <keyword>RTCP</keyword>
        <keyword>Congestion Control</keyword>
        <abstract>
            <t>The Real-time Transport Protocol (RTP) is used to transmit
            media in telephony and video conferencing applications. This
            document describes the guidelines to evaluate new congestion
            control algorithms for interactive point-to-point real-time
            media.</t>
        </abstract>
    </front>
    <middle>
        <section title="Introduction">

            <t>This memo describes the guidelines to help with evaluating
            new congestion control algorithms for interactive
            point-to-point real-time media. The requirements for the
            congestion control algorithm are outlined in <xref
            target="I-D.ietf-rmcat-cc-requirements" />. This document
            builds upon previous work at the IETF: <xref
            target="RFC5033">Specifying New Congestion Control
            Algorithms</xref> and <xref target="RFC5166">Metrics for the
            Evaluation of Congestion Control Algorithms</xref>.</t>

            <t>The guidelines proposed in this document are intended to help
            prevent a congestion collapse, promote fair capacity usage and
            optimize the media flow's throughput. Furthermore, the proposed
            algorithms are expected to operate within the envelope of the
            circuit breakers defined in <xref target="RFC8083">RFC8083</xref>.</t>

            <t>This document only provides the broad set of network
            parameters and traffic models for evaluating a new
            congestion control algorithm.  The minimal requirement
            for congestion control proposals is to produce or present
            results for the test scenarios described in <xref
            target="I-D.ietf-rmcat-eval-test" /> (Basic Test Cases),
            which also defines the specifics for the test cases.
            Additionally, proponents may produce evaluation results
            for the <xref target="I-D.ietf-rmcat-wireless-tests">
            wireless test scenarios</xref>.
            </t>

            <t>
	      This document does not cover application-specific
	      implications of congestion control algorithms and how
	      those could be evaluated.  Therefore, no quality metrics
	      are defined for performance evaluation; quality metrics
	      and algorithms to infer those vary between media types.
	      Metrics and algorithms to assess, e.g., quality of
	      experience evolve continuously so that determining
	      suitable choices is left for future work. However, there
	      is consensus that each congestion control algorithm
	      should be able to show that it is useful for interactive
	      video by performing analysis using real codecs and
	      video sequences and state-of-the-art quality metrics.
	    </t>
	    <t>
	      Beyond optimizing individual metrics, real-time
	      applications may have further options to trade off
	      performance, e.g., across multiple media; refer to the
	      <xref target="I-D.ietf-rmcat-cc-requirements">RMCAT
	      requirements</xref> document.  Such trade-offs may be
	      defined in the future.
	    </t>

        </section>

        <section title="Terminology" anchor="sec-terminology">
            <!--<t> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
            "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
            "OPTIONAL" in this document are to be interpreted as described
            in BCP 14, <xref target="RFC2119" /> and indicate requirement
            levels for compliant implementations. </t> -->

            <t> The terminology defined in <xref target="RFC3550">RTP</xref>,
            <xref target="RFC3551">RTP Profile for Audio and Video Conferences
            with Minimal Control</xref>, <xref target="RFC3611">RTCP Extended
            Report (XR)</xref>, <xref target="RFC4585">Extended RTP Profile
            for RTCP-based Feedback (RTP/AVPF)</xref> and <xref
            target="RFC5506">Support for Reduced-Size RTCP</xref> applies.</t>
        </section>

        <section title="Metrics" anchor="cc-metrics">

        <!-- <t><xref target="RFC5166" /> describes the basic metrics for
        congestion control. Metrics that are of interest for interactive
        multimedia are:
        <list style="symbols">
            <t>Throughput.</t>
            <t>Minimizing oscillations in the transmission rate (stability)
            when the end-to-end capacity varies slowly.</t>
            <t>Delay.</t>
            <t>Reactivity to transient events.</t>
            <t>Packet losses and discards.</t>
            <t>Users' quality of experience</t>
            <t>Section 2.1 of <xref target="RFC5166" /> discusses the tradeoff
            between throughput, delay and loss.</t>
        </list></t> -->

	<t> This document specifies testing criteria for evaluating
	congestion control algorithms for RTP media flows.  Proposed
	algorithms are to prove their performance by means of
	simulation and/or emulation experiments for all the cases
	described.</t>
	
         <t>Each experiment is expected to log every incoming and outgoing
         packet (the RTP logging format is described in <xref
         target="rtp-logging" />). The logging can be done inside the
         application or at the endpoints using PCAP (packet capture, e.g.,
         tcpdump, wireshark). The following metrics are calculated based on the
         information in the packet logs:
         <list style="numbers">
            <t>Sending rate, Receiver rate, Goodput (measured at 200ms intervals)</t>
            <t>Packets sent, Packets received</t>
            <t>Bytes sent, bytes received</t>
            <t>Packet delay</t>
            <t>Packets lost, Packets discarded (from the playout or de-jitter buffer)</t>
            <t>If using retransmission or FEC: post-repair loss</t>


        <!-- <t>[Editor's note: How to handle packet re-transmissions? loss before
        retransmission, after retransmission?]</t> -->
            <!-- t>Fairness or Unfairness: Experiments testing the performance
            of an RMCAT proposal against any cross-traffic must define its
            expected criteria for fairness. The "unfairness" test guideline
            (measured at 1s intervals) is:<vspace />
                1. Does not trigger the circuit breaker.<vspace />
                2. No RMCAT stream achieves more than 3 times the average throughput
                of the RMCAT stream with the lowest average throughput, for a case
                when the competing streams have similar RTTs.<vspace />
                3. RTT should not grow by a factor of 3 for the existing flows when a
                new flow is added.
                <vspace />
	    -->
	    <t>Self-Fairness and Fairness with respect to cross
	    traffic: Experiments testing a given congestion control proposal must
	    report on relative ratios of the average throughput
	    (measured at coarser time intervals) obtained by each
	    RTP media stream. In the presence of background cross-traffic
	    such as TCP, the report must also include the relative
	    ratio between average throughput of RTP media streams and
	    cross-traffic streams.
	    <vspace/>
	    During static periods of a test (i.e., when the bottleneck
	    bandwidth is constant and there is no arrival or departure
	    of streams), these reports on relative ratios serve as an
	    indicator of how fairly the RTP streams share bandwidth
	    amongst themselves and against cross-traffic streams. The
	    throughput measurement interval should be set at a few
	    values (for example, at 1s, 5s, and 20s) in order to
	    measure fairness across different time scales.
	    <vspace/>
	    As a general guideline, the relative ratio between congestion controlled RTP
	    flows with the same priority level and similar path RTT
	    should be bounded between 0.333 and 3.  For example, see
	    the test scenarios described in <xref
	    target="I-D.ietf-rmcat-eval-test" />.</t>

            <t>Convergence time: The time taken to reach a stable rate at startup,
            after the available link capacity changes, or when new flows get added
            to the bottleneck link.</t>

            <t>Instability or oscillation in the sending rate: The frequency or
            number of instances when the sending rate oscillates between a
            high watermark level and a low watermark level, or vice versa, in
            a defined time window. For example, the watermarks can be set a
            factor of 4 apart (500 kbps and 2 Mbps), with a time window of 500ms.</t>

        <!--
        <t>[Open issue (2): Convergence time was discussed briefly in the
        design meetings. It is defined as: the time it takes the congestion
        control to reach a stable rate (at startup or after new RMCAT flows
        are added). What is a stable rate?]</t>
                 -->
            <t>Bandwidth Utilization, defined as the ratio of the instantaneous
            sending rate to the instantaneous bottleneck capacity. This metric is
            useful only when a congestion controlled RTP flow is by itself or competing with similar
            cross-traffic.</t>
        </list></t>
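        <t>The rate, fairness and utilization metrics listed above can be
        sketched as follows. This is a minimal illustration, not a normative
        procedure: the function names, the (timestamp, payload size) tuple
        layout and the default bin width argument are assumptions made for
        the example, and timestamps are assumed to be in seconds.</t>

<figure><artwork><![CDATA[
```python
from collections import defaultdict


def rate_per_interval(packets, interval=0.2):
    """Bin packets into fixed intervals and return the rate per bin.

    packets  : iterable of (timestamp_seconds, payload_bytes) tuples
               (hypothetical layout chosen for this sketch)
    interval : bin width in seconds (0.2 matches the 200ms intervals)
    Returns a dict mapping bin index to rate in bits per second.
    """
    bytes_per_bin = defaultdict(int)
    for timestamp, size in packets:
        bytes_per_bin[int(timestamp // interval)] += size
    return {k: v * 8 / interval for k, v in sorted(bytes_per_bin.items())}


def throughput_ratio(avg_a_bps, avg_b_bps):
    """Relative ratio of the average throughput of flow A to flow B."""
    return avg_a_bps / avg_b_bps


def within_fairness_bound(ratio, low=1 / 3, high=3.0):
    """Check the general guideline that the ratio lies between 0.333 and 3."""
    return low <= ratio <= high


def utilization(sending_rate_bps, capacity_bps):
    """Instantaneous sending rate divided by bottleneck capacity."""
    return sending_rate_bps / capacity_bps
```
]]></artwork></figure>

        <t>For instance, applying rate_per_interval() to the sender log and
        to the receiver log yields the sending rate and the receiver rate,
        respectively; averaging the result over 1s, 5s, and 20s windows gives
        the inputs to throughput_ratio().</t>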

	<t>
	  Note that the above metrics are all objective
	  application-independent metrics.  Refer to Section 3 of
	  <xref target="I-D.ietf-netvc-testing" /> for objective
	  metrics for evaluating codecs.
	</t>

        <t>From the logs, the statistical measures (min, max, mean, standard
        deviation and variance) for the whole duration or any specific part of
        the session can be calculated. The metrics (sending rate,
        receiver rate, goodput, latency) can also be visualized in graphs as
        variation over time, with the measurements in the plot taken at
        1 second intervals. Additionally, from the logs it is possible to
        plot the histogram or CDF of packet delay.</t>

        <!-- t>[Open issue (1): Using Jain-fairness index (JFI) for measuring
            self-fairness between RTP flows? measured at what intervals?
            visualized as a CDF or a time series? Additionally: Use JFI
            for comparing fairness between RTP and long TCP flows?
           ]</t -->


         <!-- <t> <list style="empty">
         <t>(i) Bandwidth Utilization: is the
        ratio of the encoding rate to the (available) end-to-end path
        capacity.

        <list style="symbols">

            <t>Under-utilization: is the period of time when the endpoint's
            encoding rate is lower than the end-to-end capacity, i.e., the
            bandwidth utilization is less than 1.</t>

             <t>Overuse: is the period of time when the endpoint's encoding
             rate is higher than the end-to-end capacity, i.e., the bandwidth
             utilization is greater than 1.</t>

             <t>Steady-state: is the period of time when the endpoint's
             encoding rate is relatively stable, i.e., the bandwidth
             utilization is constant.</t>

        </list></t>

        <t></t>

        <t>(ii) Packet Loss and Discard Rate.</t> <t></t>

        <t>(iii) Fair Share. </t> <t></t>

        <t>[Editor's Note: This metric should match the ones defined in the
        <xref target="I-D.ietf-rmcat-cc-requirements">RMCAT requirements</xref>
        document.]</t>
        <t></t>

        <t>(iv) Quality: There are many different types of quality metrics for
        audio and video. Audio quality is often expressed by a MOS ("Mean
        Opinion Score") and can be calculated using an objective algorithm
        (E-model/R-model). Section 4.7 of <xref target="RFC3611" /> can also
        be used for VoIP metrics. Similarly, there exist several metrics to
        measure video quality, for example Peak Signal to Noise Ratio (PSNR).
        </t>

        <t>[Editor's Note: Should the algorithm compare average PSNR of test
        video sequences or what other video quality metric can be used? If
        Quality is used as a metric, it should not be the only metric used to
        compare rate-control schemes. Also, algorithms using different codecs
        cannot be compared]. </t>

            </list>
            </t>
            -->

        <section title="RTP Log Format" anchor="rtp-logging">
	  <t>
	    Having a common log format simplifies running analyses
	    across and comparing different measurements.  The log file
	    should be tab- or comma-separated and contain the following
	    details:
	  </t>
	    
<figure><artwork><![CDATA[
        Send or receive timestamp (unix)
        RTP payload type
        SSRC
        RTP sequence no
        RTP timestamp
        marker bit
        payload size
]]></artwork></figure>
	  
          <t>If the congestion control implements retransmissions or FEC, the
          evaluation should report both packet loss (before applying
          error-resilience) and residual packet loss (after applying
          error-resilience).</t>
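          <t>A log in this format could be read and compared as sketched
          below. The sketch assumes one packet per line with the seven
          fields in the order listed above, all numeric fields written as
          decimal integers, and no sequence number wraparound within the
          log; the names RtpLogEntry, parse_log and loss_fraction are
          hypothetical and chosen only for this illustration.</t>

<figure><artwork><![CDATA[
```python
import csv
import io
from typing import NamedTuple


class RtpLogEntry(NamedTuple):
    timestamp: float      # send or receive timestamp (unix, seconds)
    payload_type: int
    ssrc: int
    seq: int              # RTP sequence number
    rtp_timestamp: int
    marker: int           # marker bit, 0 or 1
    payload_size: int     # bytes


def parse_log(text, delimiter="\t"):
    """Parse tab- or comma-separated log text into RtpLogEntry records."""
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        numeric = [int(field) for field in row[1:7]]
        entries.append(RtpLogEntry(float(row[0]), *numeric))
    return entries


def loss_fraction(sent, received):
    """Fraction of sent packets that never appear in the receiver log.

    Packets are matched on (SSRC, sequence number); wraparound of the
    16-bit sequence number is ignored in this sketch.
    """
    sent_keys = {(e.ssrc, e.seq) for e in sent}
    recv_keys = {(e.ssrc, e.seq) for e in received}
    if not sent_keys:
        return 0.0
    return len(sent_keys - recv_keys) / len(sent_keys)
```
]]></artwork></figure>

          <t>Running loss_fraction() on the media stream alone gives the
          loss before error-resilience; a residual loss figure would
          additionally have to count packets recovered through the repair
          stream as received.</t>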

            <!-- <t>The retransmissions for post-repair loss metric be logged in a
            separate file, as the repair streams have different payload type
            and/or SSRC.</t> -->
        </section>
        </section>

        <!--
        <section title="Congestion control requirements" anchor="cc-require">
            <t> </t>
        </section>
        -->
<!--
        <section title="Guidelines" anchor="cc-guidelines">
            <t>A congestion control algorithm should be tested in
            simulation or a testbed environment, and the experiments should
            be repeated multiple times to infer statistical significance.
            The following guidelines are considered for evaluation:</t>

            <section title="Avoiding Congestion Collapse">
            <t>The congestion control algorithm is expected to take an action,
            such as reducing the sending rate, when it detects congestion.
            Typically, it should intervene before the circuit breaker <xref
            target="I-D.ietf-avtcore-rtp-circuit-breakers" /> is engaged. </t>

            <t>Does the congestion control propose any changes to (or diverge
            from) the circuit breaker conditions defined in <xref
            target="I-D.ietf-avtcore-rtp-circuit-breakers" />.</t> </section>

            <section title="Stability">
            <t>The congestion control should be assessed for its stability
            when the path characteristics do not change over time. Changing
            the media encoding rate estimate too often or by too much may
            adversely affect the application layer performance.</t>
            </section>

            <section title ="Media Traffic">
            <t>The congestion control algorithm should be assessed with
            different types of media behavior, i.e., the media should contain
            idle and data-limited periods. For example, periods of silence for
            audio, varying amount of motion for video, or bursty nature of
            I-frames. </t>

            <t>The evaluation may be done in two stages. In the first stage,
            the endpoint generates traffic at the rate calculated by the
            congestion controller. In the second stage, real codecs or models
            of video codecs are used to mimic application-limited data periods
            and varying video frame sizes.</t>
            </section>

            <section title="Start-up Behavior">
            <t>The congestion control algorithm should be assessed with
            different start-rates. The main reason is to observe the behavior
            of the congestion control in different test scenarios, such
            as when competing with varying amount of cross-traffic or how
            quickly does the congestion control algorithm achieve a stable
            sending rate.</t>
            </section>

            <section title="Diverse Environments">
            <t>The congestion control algorithm should be assessed in
            heterogeneous environments, containing both wired and wireless
            paths. Examples of wireless access technologies are: 802.11, GPRS,
            HSPA, or LTE. One of the main challenges of the wireless
            environments for the congestion control algorithm is to
            distinguish between congestion induced loss and transmission
            (bit-error) loss. Congestion control algorithms may
            incorrectly identify transmission loss as congestion loss and
            reduce the media encoding rate by too much, which may cause
            oscillatory behavior and deteriorate the users' quality of
            experience. Furthermore, packet loss may induce additional delay
            in networks with wireless paths due to link-layer
            retransmissions.</t>
            </section>

            <section title="Varying Path Characteristics">
            <t>The congestion control algorithm should be evaluated for a
            range of path characteristics such as, different end-to-end
            capacity and latency, varying amount of cross traffic on a
            bottleneck link and a router's queue length. For the moment, only
            Drop Tail queues are used. However, if new Active Queue Management
            (AQM) schemes become available, the performance of the congestion
            control algorithm should be again evaluated.</t>

            <t>In an experiment, if the media only flows in a single
            direction, the feedback path should also be tested with varying
            amounts of impairment.</t>

            <t>The main motivation for the previous and current criteria is to
            identify situations in which the proposed congestion control is
            less performant.</t>
            </section>

            <section title="Reacting to Transient Events or Interruptions">
            <t>The congestion control algorithm should be able to handle
            changes in end-to-end capacity and latency. Latency may change
            due to route updates, link failures, hand-overs etc. In mobile
            environment the end-to-end capacity may vary due to the
            interference, fading, hand-overs, etc. In wired networks the
            end-to-end capacity may vary due to changes in resource
            reservation.</t>
            </section>

            <section title="Fairness With Similar Cross-Traffic">
            <t>The congestion control algorithm should be evaluated when
            competing with other RTP flows using the same or another candidate
            congestion control algorithm. The proposal should highlight the
            bottleneck capacity share of each RTP flow.</t>
            </section>

            <section title="Impact on Cross-Traffic">

            <t>The congestion control algorithm should be evaluated when
            competing with standard TCP. Short TCP flows may be considered
            as transient events and the RTP flow may give way to the short
            TCP flow to complete quickly. However, long-lived TCP flows may
            starve out the RTP flow depending on router queue length. </t>

            <t>The proposal should also measure the impact on varied number
            of cross-traffic sources, i.e., few and many competing flows,
            or mixing various amounts of TCP and similar cross-traffic.</t>
            </section>

            <section title="Extensions to RTP/RTCP">
            <t>The congestion control algorithm should indicate if any
            protocol extensions are required to implement it and should
            carefully describe the impact of the extension.</t>
            </section>

        </section> -->


    <section anchor="add-params" title="List of Network Parameters">

      <t>Implementors are initially encouraged to choose evaluation settings
      from the following values:</t>

      <section anchor="scen-delay" title="One-way Propagation Delay">
        <!-- -->

        <t>Experiments are expected to verify that the congestion control is
        able to work across a broad range of path characteristics, also including challenging situations, for example over
        trans-continental and/or satellite links.  Tests thus account for the following different latencies:

	<list style="numbers">
            <t>Very low latency: 0-1ms</t>

            <t>Low latency: 50ms</t>

            <t>High latency: 150ms</t>

            <t>Extreme latency: 300ms</t>
          </list></t>
      </section>

      <section anchor="scen-loss" title="End-to-end Loss">
	<t>Many paths in the Internet today are largely lossless but,
	with wireless networks and interference, towards remote
	regions, or in scenarios featuring high/fast mobility, media
	flows may exhibit substantial packet loss.  This variety needs
	to be reflected appropriately by the tests.</t>
	
        <t>To model a wide range of lossy links, the experiments can choose one of the
        following loss rates. The fractional loss is the ratio of packets lost
        to packets sent. <list style="numbers">
            <t>no loss: 0%</t>

            <t>1%</t>

            <t>5%</t>

            <t>10%</t>

            <t>20%</t>
          </list></t>
      </section>

      <section anchor="scen-queue" title="Drop Tail Router Queue Length">
	<t>Routers should be configured to use Drop Tail queues in
	the experiments due to their (still) prevalent nature.  
	Experimentation with AQM schemes is encouraged but not mandatory.
	</t>
	
        <t>The router queue length is measured as the time taken to drain the
        FIFO queue. It has been noted in various discussions that the queue
        length in the currently deployed Internet varies significantly. While
        the core backbone network has very short queue lengths, home
        gateways usually have larger queue lengths. These queue lengths
        can be categorized in the following way: <list style="numbers">
            <t>QoS-aware (or short): 70ms</t>

            <t>Nominal: 300-500ms</t>

            <t>Buffer-bloated: 1000-2000ms</t>
          </list> The size of the queue can also be expressed in bytes or
        packets. To convert the queue length measured in seconds to a queue
        length in bytes:</t>

        <t>QueueSize (in bytes) = QueueSize (in sec) x Throughput (in
        bps)/8</t>
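        <t>As a worked illustration of this conversion, the sketch below
        takes the queue length in milliseconds to keep the arithmetic exact.
        The function name and the 1 Mbps bottleneck rate in the usage note
        are assumptions made for the example only.</t>

<figure><artwork><![CDATA[
```python
def queue_size_bytes(queue_ms, throughput_bps):
    """QueueSize (bytes) = QueueSize (sec) x Throughput (bps) / 8.

    queue_ms is the queue length expressed as drain time in
    milliseconds, hence the division by 8 * 1000.
    """
    return queue_ms * throughput_bps / 8000
```
]]></artwork></figure>

        <t>With an assumed bottleneck of 1 Mbps, a 70ms queue corresponds to
        queue_size_bytes(70, 1000000) = 8750 bytes, and a 300ms queue to
        37500 bytes.</t>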

        <!-- <t>and 2) queue length in packets:</t>
        <t>QueueSize (in pkts) = QueueSize (in bytes)/MTU,
        MTU=1500</t> -->

        <!-- <t>[Open issue (11): Confirm the above values, do we need to
                        define parameters for other types of queues?]</t> -->
      </section>

      <section title="Loss generation model">
        <t>
	  Many models for generating packet loss are available, some
	  yield correlated, others independent losses; losses can also
	  be extracted from packet traces.  As a (simple) minimum loss
	  model with minimal parameterization (i.e., the loss rate),
	  independent random losses must be used in the evaluation.
	</t>
	<t>
	  It is known that independent loss models may reflect reality
	  poorly and hence more sophisticated loss models could be
	  considered.  Suitable models for correlated losses include
	  the Gilbert-Elliott model and losses generated by modeling a
	  queue including its (different) drop behaviors.
	</t>
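	<t>
	  The two kinds of loss model mentioned above can be sketched as
	  follows. This is an illustration under stated assumptions, not
	  a prescribed generator: the function and parameter names are
	  hypothetical, the Gilbert-Elliott variant shown is the common
	  two-state form with a per-state loss probability, and the state
	  transition is evaluated once per packet before the loss
	  decision.
	</t>

<figure><artwork><![CDATA[
```python
import random


def independent_losses(n, loss_rate, rng):
    """Independent random loss: each packet is lost with loss_rate."""
    return [rng.random() < loss_rate for _ in range(n)]


def gilbert_elliott_losses(n, p_gb, p_bg, loss_good, loss_bad, rng):
    """Correlated loss from a two-state (good/bad) Markov chain.

    p_gb      : probability of moving from the good to the bad state
    p_bg      : probability of moving from the bad to the good state
    loss_good : loss probability while in the good state
    loss_bad  : loss probability while in the bad state
    """
    bad = False  # start in the good state
    losses = []
    for _ in range(n):
        if bad:
            if rng.random() < p_bg:
                bad = False
        elif rng.random() < p_gb:
            bad = True
        losses.append(rng.random() < (loss_bad if bad else loss_good))
    return losses
```
]]></artwork></figure>

	<t>
	  Passing a seeded generator, e.g., random.Random(1), makes a run
	  repeatable. In the long run the chain spends a fraction
	  p_gb / (p_gb + p_bg) of packets in the bad state, which can be
	  used to pick parameters that match one of the loss rates listed
	  earlier.
	</t>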
      </section>

      <section anchor="JM" title="Jitter models">
        <t>This section defines jitter models for the purposes of this
        document. When jitter is to be applied to both the congestion controlled RTP flow and any
        competing flow (such as a TCP competing flow), the competing flow will
        use the jitter definition below that does not allow for re-ordering of
        packets on the competing flow (see NR-RBPDV definition below).</t>

        <t>Jitter is an overloaded term in communications. It
        is typically used to refer to the variation of a metric (e.g.,
        delay) with respect to some reference metric (e.g., average
        delay or minimum delay). For example, RFC 3550 jitter is
        computed as the smoothed difference in packet arrival times
        relative to their respective expected arrival times, which is
        particularly meaningful if the underlying packet delay
        variation was caused by a Gaussian random process.</t>

        <t>Because jitter is an overloaded term, we use the term
        Packet Delay Variation (PDV) instead to describe the variation
        of delay of individual packets in the same sense as the IETF
        IPPM WG has defined PDV in their documents (e.g., RFC 3393)
        and as the ITU-T SG16 has defined IP Packet Delay Variation
        (IPDV) in their documents (e.g., Y.1540).</t>

        <t>Most PDV distributions in packet network systems are
        one-sided distributions, the measurement of which with a
        finite number of measurement samples results in one-sided
        histograms. In the usual packet network transport case, there
        is typically one packet that transited the network with the
        minimum delay; a (large) number of packets transit the network
        within some (smaller) positive variation from this minimum
        delay, and a (small) number of the packets transit the network
        with delays higher than the median or average transit time
        (these are outliers). Although infrequent, outliers can cause
        significant deleterious operation in adaptive systems and
        should be considered in rate adaptation designs for RTP
        congestion control.</t>

        <t>In this section we define two different bounded PDV
        characteristics, 1) Random Bounded PDV and 2) Approximately Random
        Subject to No-Reordering Bounded PDV.</t>

        <t>The former, 1) Random Bounded PDV is presented for
        information only, while the latter, 2) Approximately Random
        Subject to No-Reordering Bounded PDV, must be used in the
        evaluation.</t>

        <section title="Random Bounded PDV (RBPDV)">

        <t>The RBPDV probability distribution function (PDF) is specified to
        be of some mathematically describable function which includes some
        practical minimum and maximum discrete values suitable for testing.
        For example, the minimum value, x_min, might be specified as the
        minimum transit time packet and the maximum value, x_max, might be
        defined to be two standard deviations higher than the mean.</t>

        <t>Since we are typically interested in the distribution relative to
        the mean delay packet, we define the zero mean PDV sample, z(n), to be
        z(n) = x(n) - x_mean, where x(n) is a sample of the RBPDV random
        variable x and x_mean is the mean of x.</t>

        <t>We assume here that s(n) is the original source time of packet n
        and the post-jitter induced emission time, j(n), for packet n is:
	</t>
	<t>j(n) = {[z(n) + x_mean] + s(n)}.</t>
	<t>
	  It follows that the separation in the post-jitter time of
	  packets n and n+1 is {[s(n+1)-s(n)] - [z(n)-z(n+1)]}. Since
	  the first term is always a positive quantity, we note that
	  packet reordering at the receiver is possible whenever the
	  second term is greater than the first. Said another way,
	  whenever the difference in possible zero mean PDV sample
	  delays (i.e., [x_max-x_min]) exceeds the inter-departure
	  time of any two sent packets, we have the possibility of
	  packet re-ordering.</t>
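	<t>
	  The relations above can be sketched as follows. The sketch
	  assumes that the caller has already drawn the bounded delay
	  samples x(n) from whichever distribution is chosen, and it
	  uses the sample mean as x_mean; both function names are
	  hypothetical.
	</t>

<figure><artwork><![CDATA[
```python
def apply_rbpdv(send_times, delays):
    """Return post-jitter emission times j(n) = [z(n) + x_mean] + s(n).

    send_times : source times s(n)
    delays     : bounded delay samples x(n), with x_min <= x(n) <= x_max
    """
    x_mean = sum(delays) / len(delays)   # sample mean (assumption)
    z = [x - x_mean for x in delays]     # zero-mean PDV samples z(n)
    return [(zn + x_mean) + sn for zn, sn in zip(z, send_times)]


def reordered_indices(j):
    """Indices n where packet n would arrive before packet n-1."""
    return [n for n in range(1, len(j)) if j[n] < j[n - 1]]
```
]]></artwork></figure>

	<t>
	  For example, with packets sent 20ms apart and delay samples of
	  50ms, 10ms and 30ms, the second packet overtakes the first
	  because the 40ms difference in delay exceeds the 20ms
	  inter-departure time.
	</t>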

        <t>There are important use cases in real networks where packets can
        become re-ordered such as in load balancing topologies and during
        route changes. However, for the vast majority of cases there is no
        packet re-ordering because most of the time packets follow the same
        path. Due to this, if a packet becomes overly delayed, the packets
        after it on that flow are also delayed. This is especially true for
        mobile wireless links where there are per-flow queues prior to base
        station scheduling. Owing to this important use case, we define
        another PDV profile similar to the above, but one that does not allow
        for re-ordering within a flow.</t>
        </section>

        <section title="Approximately Random Subject to No-Reordering Bounded PDV
        (NR-BPDV)">

          <t>No-Reordering BPDV, NR-BPDV, is defined similarly to the above with
          one important exception. Let serial(n) be defined as the serialization
          delay of packet n at the lowest bottleneck link rate (or other
          appropriate rate) in a given test. Then we produce all the post-jitter
          values for j(n) for n = 1, 2, ..., N, where N is the length of the
          source sequence s to be offset. The exception can be stated as
          follows: We revisit all j(n) beginning from index n=2, and if j(n) is
          determined to be less than [j(n-1)+serial(n-1)], we redefine j(n) to
          be equal to [j(n-1)+serial(n-1)] and continue for all remaining n
          (i.e., n = 3, 4, ..., N). This models the case where packet n is
          sent immediately after packet (n-1) at the bottleneck link rate.
          Although this is generally the theoretical minimum in that it assumes
          that no packets from other flows are in between packets (n-1) and n
          at the bottleneck link, it is a reasonable assumption for per-flow
          queuing.</t>

          <t>We note that this assumption holds for some important exception
          cases, such as packets immediately following outliers. There is a
          multitude of software-controlled elements common on end-to-end
          Internet paths (such as firewalls, ALGs, and other middleboxes) which
          stop processing packets while servicing other functions (e.g., garbage
          collection). Often these devices do not drop packets, but rather queue
          them for later processing, and this causes many of the outliers.
          NR-BPDV therefore models this particular use case (assuming
          serial(n+1) is defined appropriately for the device causing the
          outlier) and is believed to be important for adaptation development
          for congestion-controlled RTP streams.</t>
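
          <t>The no-reordering adjustment described above can be sketched
          informally in Python as follows. This is a minimal illustration
          under stated assumptions, not a normative implementation: the
          function names are hypothetical, times are in milliseconds, and
          the pass is assumed to use the already-adjusted value of j(n-1)
          when it processes packet n.</t>

          <figure><artwork><![CDATA[
```python
def serialization_delay_ms(size_bytes, link_rate_bps):
    """Time in ms to serialize one packet at the given link rate."""
    return size_bytes * 8 * 1000.0 / link_rate_bps


def enforce_no_reordering(j, serial):
    """Ensure j(n) is never earlier than j(n-1) + serial(n-1)."""
    out = list(j)
    # List index 1 corresponds to n=2 in the 1-based text above.
    for n in range(1, len(out)):
        earliest = out[n - 1] + serial[n - 1]
        if out[n] < earliest:
            out[n] = earliest
    return out


# Hypothetical input: 1250-byte packets on a 1 Mbps bottleneck.
j = [20.0, 42.0, 37.0, 51.0]
serial = [serialization_delay_ms(1250, 1_000_000)] * len(j)
print(enforce_no_reordering(j, serial))
```
]]></artwork></figure>

          <t>In this example, the third packet is moved to directly follow
          the second, and the fourth packet is then moved as well because it
          would otherwise overlap the adjusted third packet.</t>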
        </section>
        <section title="Recommended distribution">
          <t>Whether Random Bounded PDV or Approximately Random
          Subject to No-Reordering Bounded PDV, it is recommended that
          z(n) is distributed according to a truncated Gaussian for
          the above jitter models:</t>
            <t>z(n) ~ |max(min(N(0, std^2), N_STD * std), -N_STD * std)|</t>
          <t>where N(0, std^2) is the Gaussian distribution with zero mean and
          standard deviation std. Recommended values:</t>
          <t><list style="symbols">
            <t>std = 5 ms</t>
            <t>N_STD = 3</t>
          </list></t>
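          <t>As a hedged illustration, the recommended distribution can be
          sampled in Python as sketched below. The expression is implemented
          exactly as written above, including the absolute value; the
          function name, the seed, and the use of milliseconds are
          assumptions made for this example.</t>

          <figure><artwork><![CDATA[
```python
import random

STD_MS = 5.0  # recommended std, in milliseconds
N_STD = 3     # recommended truncation, in standard deviations


def sample_z(rng, std=STD_MS, n_std=N_STD):
    """Draw one z(n) as |max(min(N(0, std^2), n_std*std), -n_std*std)|."""
    g = rng.gauss(0.0, std)
    clamped = max(min(g, n_std * std), -n_std * std)
    return abs(clamped)


rng = random.Random(1)  # fixed seed so that a test run is repeatable
samples = [sample_z(rng) for _ in range(1000)]
print(min(samples), max(samples))
```
]]></artwork></figure>

          <t>With the recommended values, every sample lies between 0 ms and
          15 ms.</t>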
        </section>
      </section>
    </section>

    <!--
    <section title="WiFi or Cellular Links">
        <t>
          <xref target="I-D.ietf-rmcat-wireless-tests" /> describes the test
          cases to simulate networks with wireless links. The document
          describes mechanism to simulate both cellular and WiFi networks.
        </t>
	</section>
    -->

    <section anchor="app-additional" title="Traffic Models">

      <section title="TCP traffic model">
        <t>Long-lived TCP flows will download data throughout the
        session and are expected to have an infinite amount of data to
        send or receive.  This roughly applies, for example, when
        downloading software distributions.</t>

        <t>Each short TCP flow is modeled as a sequence of file downloads
        interleaved with idle periods.  Not all short TCP flows start at the same
        time, i.e., some start in the ON state while others start in the OFF
        state.</t>

        <t>The short TCP flows can be modeled as follows: 30
        connections start simultaneously fetching small (30-50 KB)
        amounts of data, evenly distributed.  This covers the case
        where the short TCP flows are fetching web page resources rather
        than video files.</t>

        <t>The idle period between successive bursts, each of which starts a
        group of TCP flows, is typically drawn from an exponential
        distribution with a mean value of 10 seconds.</t>

        <t>[These values were picked based on the data available at
        http://httparchive.org/interesting.php as of October 2015].</t>
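
        <t>The short TCP flow model above can be sketched informally in
        Python as follows. Several details here are assumptions made only
        for illustration: transfer sizes are drawn uniformly from the 30-50
        KB range, the idle period is measured between burst start times
        (transfer durations are ignored), and the function and parameter
        names are hypothetical.</t>

        <figure><artwork><![CDATA[
```python
import random


def short_tcp_bursts(duration_s, rng, flows_per_burst=30,
                     size_kb=(30.0, 50.0), mean_idle_s=10.0):
    """Return a list of (start_time_s, [transfer sizes in KB])."""
    bursts = []
    # Exponentially distributed idle time with the given mean.
    t = rng.expovariate(1.0 / mean_idle_s)
    while t < duration_s:
        low, high = size_kb
        sizes = [rng.uniform(low, high) for _ in range(flows_per_burst)]
        bursts.append((t, sizes))
        t += rng.expovariate(1.0 / mean_idle_s)
    return bursts


rng = random.Random(42)  # fixed seed so that a test run is repeatable
for start, sizes in short_tcp_bursts(60.0, rng):
    print(round(start, 2), len(sizes), round(sum(sizes), 1))
```
]]></artwork></figure>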

	<t>
	  Many different TCP congestion control schemes are deployed
	  today.  Therefore, experimentation with a range of different
	  schemes, especially including CUBIC, is encouraged.
	  Experiments must document in detail which congestion control
	  schemes they tested against and which parameters were used.
	</t>
      </section>

      <section title="RTP Video model">
        <t>
          <xref target="RFC8593"/>
	  describes two
          types of video traffic models for evaluating candidate algorithms for RTP congestion control.
          The first model statistically characterizes the behavior of a video
          encoder, whereas the second model uses video traces.
        </t>
        <t>
	  Sample video test sequences are available at:
          <xref target="xiph-seq"></xref> and <xref target="HEVC-seq"></xref>.
	  The following two video streams are the recommended minimum for testing:
	  Foreman and FourPeople.</t>
      </section>

      <section title="Background UDP">
       <t>A background UDP flow is modeled as a constant
            bit rate (CBR) flow. It will download data at a particular
            rate for the complete session, or will change to a particular
            rate at predefined intervals. The inter-packet interval is
            calculated based on the CBR and the packet size (which is
            typically set to the path MTU size; the default value can be
            1500 bytes).
       </t>
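
       <t>As an informal illustration, the inter-packet interval
       calculation can be sketched in Python as follows. The function names
       and the (start time, rate) schedule format are assumptions made for
       this example, and the sketch treats the packet size as the full
       on-the-wire size.</t>

       <figure><artwork><![CDATA[
```python
def inter_packet_interval_s(cbr_bps, packet_size_bytes=1500):
    """Seconds between packet departures for a CBR flow."""
    if cbr_bps <= 0:
        raise ValueError("cbr_bps must be positive")
    return packet_size_bytes * 8 / cbr_bps


def cbr_send_times(schedule, session_s, packet_size_bytes=1500):
    """Send times for a CBR flow whose rate changes at preset times.

    schedule is a list of (start_time_s, cbr_bps), sorted by start time.
    """
    times = []
    for i, (start, rate) in enumerate(schedule):
        if i + 1 < len(schedule):
            end = schedule[i + 1][0]
        else:
            end = session_s
        interval = inter_packet_interval_s(rate, packet_size_bytes)
        k = 0
        while start + k * interval < end:
            times.append(start + k * interval)
            k += 1
    return times


# Hypothetical schedule: 1 Mbps for the first second, then 2 Mbps.
times = cbr_send_times([(0.0, 1_000_000), (1.0, 2_000_000)], 2.0)
print(inter_packet_interval_s(1_000_000), len(times))
```
]]></artwork></figure>

       <t>For example, 1500-byte packets at 1 Mbps are sent every 12 ms.</t>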

       <t>Note that new transport protocols such as QUIC may use UDP
       but, due to their congestion control algorithms, will exhibit
       behavior conceptually similar in nature to TCP flows above and
       can thus be subsumed by the above, including the division into
       short- and long-lived flows.  As QUIC evolves independently of
       TCP congestion control algorithms, its future congestion
       control should be considered as competing traffic as appropriate.
       </t>
        </section>

    </section>

        <section title="Security Considerations">
          <t>
	    This document specifies evaluation criteria and parameters
	    for assessing and comparing the performance of congestion
	    control protocols and algorithms for real-time
	    communication.  This memo itself is thus not subject to
	    security considerations but the protocols and algorithms
	    evaluated may be.  In particular, successful operation
	    under all tests defined in this document may suffice for a
	    comparative evaluation but must not be interpreted as
	    meaning that the protocol is free of risks when deployed on
	    the Internet, as briefly illustrated by the examples below.
	  </t>
	  <t>
	    Such evaluations are expected to be
	    carried out in controlled environments for limited numbers
	    of parallel flows.  As such, these evaluations are by
	    definition limited and cannot systematically
	    consider possible interactions or very large groups of
	    communicating nodes under all possible circumstances.
	    Careful protocol design is therefore advised to avoid
	    incidentally contributing traffic that could lead to
	    unstable networks, e.g., (local) congestion collapse.
	  </t>
	  <t>
	   This specification focuses on assessing the regular
	   operation of the protocols and algorithms under
	   consideration.  It does not suggest checks against
	   malicious use of the protocols -- by the sender, the
	   receiver, or intermediate parties, e.g., through faked,
	   dropped, replicated, or modified congestion signals.  It is
	   up to the protocol specifications themselves to ensure that
	   authenticity, integrity, and/or plausibility of received
	   signals are checked and the appropriate actions (or
	   non-actions) are taken.
	  </t>
        </section>

        <section title="IANA Considerations">
            <t>There are no IANA impacts in this memo.</t>
        </section>

        <section anchor="contrib" title="Contributors">
            <t>The content and concepts within this document are a product of
            the discussion carried out in the Design Team.</t>

            <t>Michael Ramalho provided the text for the Jitter model.</t>
        </section>

        <section title="Acknowledgments">
          <t> Much of this document is derived from previous work on
          congestion control at the IETF.</t>
          <t> The authors would like to thank
          Harald Alvestrand,
          Anna Brunstrom,
          Luca De Cicco,
          Wesley Eddy,
          Lars Eggert,
          Kevin Gross,
          Vinayak Hegde,
          Randell Jesup,
          Mirja Kuehlewind,
          Karen Nielsen,
          Piers O'Hanlon,
          Colin Perkins,
          Michael Ramalho,
          Zaheduzzaman Sarker,
          Timothy B. Terriberry,
          Michael Welzl,
          Mo Zanaty, and
	  Xiaoqing Zhu
          for providing valuable feedback on earlier versions of this draft.
          Additionally, the authors thank the participants of the design team for
          their comments and discussion related to the evaluation
          criteria.</t>
        </section>
    </middle>
    <back>
        <references title="Normative References">
            <!--&rfc2119;-->
            <!-- RTP related -->
            &rfc3550;
            &rfc3551;
            &rfc3611;
            &rfc4585;
            &rfc5506;
            <!--RMCAT related -->
	    &rfc8083;
            &rfc8593;
            &I-D.ietf-rmcat-cc-requirements;
            </references>

            <references title="Informative References">
            &rfc5033; <!-- CC Evaluation -->
            &rfc5166; <!-- CC Metrics -->
            <!-- &rfc5681; Standard TCP -->
            &I-D.ietf-rmcat-eval-test;
            &I-D.ietf-rmcat-wireless-tests;
            &I-D.ietf-netvc-testing;
            <!-- <?rfc include="reference.3GPP.R1.081955"?>
            <reference anchor="SA4-EVAL">
                <front>
                    <title>LTE Link Level Throughput Data for SA4 Evaluation Framework</title>
                    <author initials="3GPP" surname="R1-081955" fullname="3GPP R1-081955">
                        <organization />
                    </author>
                    <date month="5" year="2008" />
                    <abstract>
                    <t>In R1-081720, 3GPP SA4 has requested RAN1 and RAN2 for link
                    level throughput traces to be used in an evaluation framework
                    they are developing for dynamic video rate adaptation.
                    </t></abstract>
                </front>
                <seriesInfo name="3GPP" value="R1-081955" />
                <format type='ZIP' octets='3459875' target='http://www.3gpp.net/ftp/tsg_ran/WG1_RL1/TSGR1_53/Docs/R1-081955.zip' />
            </reference>
            -->

<!--
            <reference anchor="SA4-LR">
                <front>
                    <title>Error Patterns for MBMS Streaming over UTRAN and GERAN</title>
                    <author initials="3GPP" surname="S4-050560" fullname="3GPP S4-050560">
                        <organization />
                    </author>
                    <date month="5" year="2008" />
                </front>
                <seriesInfo name="3GPP" value="S4-050560" />
                <format type='ZIP' octets='335322' target='http://www.3gpp.org/FTP/tsg_sa/WG4_CODEC/TSGS4_36/Docs/S4-050560.zip' />
            </reference>
-->

<!--
            <reference anchor="TCP-eval-suite">
              <front>
                <title>Towards a Common TCP Evaluation Suite</title>
                <author initials="A." surname="Lachlan"   fullname="Andrew Lachlan"/>
                <author initials="C." surname="Marcondes" fullname="Cesar Marcondes"/>
                <author initials="S." surname="Floyd"  fullname="Sally Floyd"/>
                <author initials="L." surname="Dunn"  fullname="Lawrence Dunn"/>
                <author initials="R." surname="Guillier"  fullname="Romeric Guillier"/>
                <author initials="W." surname="Gang"  fullname="Wang Gang"/>
                <author initials="L." surname="Eggert"  fullname="Lars Eggert"/>
                <author initials="S." surname="Ha"  fullname="Sangtae Ha"/>
                <author initials="I." surname="Rhee"  fullname="Injong Rhee"/>
                <date month="August" year="2008"/>
              </front>
              <seriesInfo name="Proc. PFLDnet." value="2008"/>
            </reference>
-->

            <reference anchor="xiph-seq">
                <front>
                  <title>Video Test Media Set</title>

                  <author fullname="Daede, T." initials="T." surname="Daede"></author>

                  <date month="" year="" />
                </front>
                <seriesInfo name="https://people.xiph.org/~tdaede/sets/" value="" />
            </reference>

            <reference anchor="HEVC-seq">
                <front>
                  <title>Test Sequences</title>

                  <author fullname="" initials="" surname="HEVC"></author>

                  <date month="" year="" />
                </front>
                <seriesInfo name="http://www.netlab.tkk.fi/~varun/test_sequences/"
                        value="" />
            </reference>

        </references>

<!--
        <section anchor="misc"  title="Application Trade-off">
          <t>Application trade-off is yet to be defined. see <xref
          target="I-D.ietf-rmcat-cc-requirements">RMCAT requirements</xref>
          document. Perhaps each experiment should define the application's
          expectation or trade-off.</t>
          <section anchor="misc-2"  title="Measuring Quality">
            <t>No quality metric is defined for performance evaluation, it is
            currently an open issue. However, there is consensus that
            congestion control algorithm should be able to show that it is
            useful for interactive video by performing analysis using a real
            codec and video sequences. </t>
          </section>
        </section>
-->

        <section anchor="App-cl" title="Change Log">
        <t>Note to the RFC-Editor: please remove this section prior to
        publication as an RFC.</t>
        <section title="Changes in draft-ietf-rmcat-eval-criteria-07">
	  <t>Updated the draft according to the discussion at IETF-101.</t>
            <t><list style="symbols">
              <t>Updated the discussion on fairness.  Thanks to Xiaoqing Zhu for providing text.</t>
	      <t>Fixed a simple loss model and provided pointers to more sophisticated ones.</t>
	      <t>Fixed the choice of the jitter model.</t>
              </list></t>
            </section>
            <section title="Changes in draft-ietf-rmcat-eval-criteria-06">
            <t><list style="symbols">
                <t>Updated Jitter.</t>
              </list></t>
            </section>
            <section title="Changes in draft-ietf-rmcat-eval-criteria-05">
            <t><list style="symbols">
                <t>Improved text surrounding wireless tests, video sequences,
                and short-TCP model.</t>
              </list></t>
            </section>
            <section title="Changes in draft-ietf-rmcat-eval-criteria-04">
            <t><list style="symbols">
                <t>Removed the guidelines section, as most of the sections
                  are now covered: wireless tests, video model, etc.</t>
                <t>Improved Short TCP model based on the suggestion to use
                  httparchive.org.</t>
              </list></t>
            </section>
            <section title="Changes in draft-ietf-rmcat-eval-criteria-03">
            <t><list style="symbols">
                <t>Keep-alive version.</t>
                <t>Moved link parameters and traffic models from eval-test</t>
              </list></t>
            </section>
            <section title="Changes in draft-ietf-rmcat-eval-criteria-02">
            <t><list style="symbols">
                <t>Incorporated fairness test as a working test.</t>
                <t>Updated text on minimum evaluation requirements.</t>
            </list></t>
            </section>
            <section title="Changes in draft-ietf-rmcat-eval-criteria-01">
            <t><list style="symbols">
                <t>Removed Appendix B.</t>
                <t>Removed Section on Evaluation Parameters.</t>
            </list></t>
            </section>
            <section title="Changes in draft-ietf-rmcat-eval-criteria-00">
            <t><list style="symbols">
                <t>Updated references.</t>
                <t>Resubmitted as WG draft.</t>
            </list></t>
            </section>
            <section title="Changes in draft-singh-rmcat-cc-eval-04">
            <t><list style="symbols">
                <t>Incorporate feedback from IETF 87, Berlin.</t>
                <t>Clarified metrics: convergence time, bandwidth
                utilization.</t>
                <t>Changed fairness criteria to fairness test.</t>
                <t>Added measuring pre- and post-repair loss.</t>
                <t>Added open issue of measuring video quality to
                appendix.</t>
                <t>Clarified use of DropTail and AQM.</t>
                <t>Updated text in "Minimum Requirements for Evaluation"</t>

            </list></t>
            </section>
            <section title="Changes in draft-singh-rmcat-cc-eval-03">
            <t><list style="symbols">
                <t>Incorporate the discussion within the design team.</t>
                <t>Added a section on evaluation parameters, it describes the
                flow and network characteristics.</t>
                <t>Added Appendix with self-fairness experiment.</t>
                <t>Changed bottleneck parameters from a proposal to an example
                set.</t>
            </list></t>
            </section>

            <section title="Changes in draft-singh-rmcat-cc-eval-02">
            <t><list style="symbols">
                <t>Added scenario descriptions.</t>
            </list></t>
            </section>

            <section title="Changes in draft-singh-rmcat-cc-eval-01">
            <t><list style="symbols">
                <t>Removed QoE metrics.</t>
                <t>Changed stability to steady-state.</t>
                <t>Added measuring impact against few and many
                flows.</t>
                <t>Added guideline for idle and data-limited periods.</t>
                <t>Added reference to TCP evaluation suite in example
                evaluation scenarios.</t>
            </list></t>
            </section>
        </section>
    </back>
</rfc>