<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc docname="yes"?>
<rfc category="std" docName="draft-qian-6man-ipv6-multipath-mtu-detection-00"
     ipr="trust200902">
  <front>
    <title abbrev="6man Multipath MTU Detection">IPv6 Minimum Multipath MTU
    Detection</title>

    <author fullname="Guofeng Qian" initials="G." surname="Qian">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street>Huawei Bld., No.156 Beiqing Rd.</street>

          <city>Beijing</city>

          <code>100095</code>

          <country>China</country>
        </postal>

        <email>Qianguofeng@huawei.com</email>
      </address>
    </author>

    <author fullname="Tianran Zhou" initials="T." surname="Zhou">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street>Huawei Bld., No.156 Beiqing Rd.</street>

          <city>Beijing</city>

          <code>100095</code>

          <country>China</country>
        </postal>

        <email>zhoutianran@huawei.com</email>
      </address>
    </author>

    <date day="1" month="March" year="2022"/>

    <abstract>
      <t>In current multipath load-balancing networks, existing path
      detection mechanisms share a defect: a typical load-balancing route
      selection mechanism places the probe packets of one flow on a single
      path, so the probes cannot cover all forwarding paths and some paths go
      undetected. This document describes a new path detection mechanism that
      instructs intermediate devices to send probe packets to all downstream
      paths. The mechanism is named load-sharing multipath replication
      forwarding (LMRF).</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in BCP 14
      [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as
      shown here.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>In current multipath load-balancing scenarios, path detection
      mechanisms have a defect: a common load-balancing route selection
      solution cannot cover all forwarding paths, so some paths go
      undetected. This document describes a new probe mechanism that
      instructs intermediate forwarding devices to send probe packets to all
      downstream paths.</t>

      <t>A typical example is path MTU detection. The path MTU measured on
      one path cannot be assumed to hold for all load-balancing paths. The
      source therefore needs to use the minimum of the path MTUs of all the
      paths as the path MTU toward the destination, so that packets are
      forwarded normally whichever path the intermediate network
      selects.</t>

      <t>Some solutions already exist in the industry, such as Paris
      traceroute. By constructing a large number of packets at the source and
      varying information such as the transport-layer port number of the
      packets, the source lets the forwarding devices on the network hash the
      packets onto as many forwarding paths as possible during route
      selection. However, this solution cannot ensure that all paths are
      covered. In addition, constructing a large number of packets at the
      source affects network performance and imposes more workload and skill
      requirements on O&amp;M engineers.</t>
    </section>

    <section title="Terminology">
      <t>The following terminology is used in this document.</t>

      <t>MTU: Maximum Transmission Unit</t>

      <t>Path MTU: Path Maximum Transmission Unit, the minimum link MTU of
      all the links on a path between a source node and a destination
      node</t>

      <t>TWAMP: Two-Way Active Measurement Protocol</t>

      <t>BFD: Bidirectional Forwarding Detection</t>

      <t>LMRF: Load-sharing multipath replication forwarding</t>
    </section>

    <section title="Scenario Description">

      <section title="Example">

        <figure>
          <artwork align="center"><![CDATA[  
                                +-----------+                     
                                |           |                              
                                |     B     |                              
                       /------->|  Router   |------\                       
       +-----------+  /         |           |       \        +---------+ 
       |           | /          +-----------+        \       |         | 
       |     A     |/                                 \      |     D   | 
       |  Router   |\                                  \---->|  Router | 
       |           | \                                 /     |         | 
       +-----------+  \         +-----------+         /      +---------+ 
                       \        |           |        /                     
                        \------>|     C     |-------/                      
                                |  Router   |                              
                                |           |                              
                                +-----------+               
               
                         Figure 1: Multipath Network Example
]]></artwork>
        </figure>

        <t>As shown in Figure 1, there are two paths from A to D: A-B-D and
        A-C-D. The two paths are ECMP paths from A to D. For each data packet
        from A to D, the forwarding device selects a path based on the result
        of a hash calculated over the 5-tuple or 3-tuple in the packet
        header. TCP, UDP, and ICMP packets are hashed on the 5-tuple, and raw
        IP packets are hashed on the 3-tuple. Take ping packets as an example.
        The source IP address, destination IP address, protocol number, ICMP
        type, and ICMP code are used for the hash calculation, and the result
        is used for ECMP route selection. Because these fields are identical
        in every ping packet of a session, ping packets from A to D always
        cover only one path. As a result, even if the ping result is normal,
        services may be abnormal. Conversely, when a service fault occurs,
        the ping detection may still be normal.</t>
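
        <t>The single-path behavior described above can be illustrated with
        a minimal Python sketch. It is not taken from any router
        implementation: the function name ecmp_select, the use of SHA-256 as
        the hash, and the addresses are assumptions chosen for illustration,
        and real devices typically use their own hash functions. The point is
        only that identical header fields always produce the same next
        hop.</t>

        <figure>
          <artwork><![CDATA[
```python
import hashlib


def ecmp_select(fields, next_hops):
    """Pick one next hop from a hash over the given header fields."""
    key = "|".join(str(f) for f in fields).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


NEXT_HOPS = ["B", "C"]  # the two ECMP next hops of Router A in Figure 1

# Hashed fields of a ping: source, destination, protocol number
# (58 = ICMPv6), ICMP type (128 = Echo Request), ICMP code.
ping = ("2001:db8::a", "2001:db8::d", 58, 128, 0)

# One hundred pings of the same session all hash to the same next hop.
chosen = {ecmp_select(ping, NEXT_HOPS) for _ in range(100)}
```
]]></artwork>
        </figure>

        <t>In this sketch the set named chosen always holds exactly one
        router, so the other ECMP path is never probed.</t>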

        <t>Similar problems occur in traceroute detection, BFD detection,
        TWAMP detection, and path MTU detection.</t>

        <t>In multipath load-balancing scenarios, incorrect path MTU
        detection may cause service exceptions. To simplify packet processing
        and improve processing efficiency, IPv6 packets are fragmented only at
        the source node. Therefore, IPv6 path MTU discovery needs to be
        implemented. The latest document (draft-ietf-6man-mtu-option-11,
        IPv6 Minimum Path MTU Hop-by-Hop Option) provides a path MTU
        discovery method for a single path, but does not solve the path MTU
        problem in multipath scenarios.</t>

        <figure>
          <artwork><![CDATA[                         +-----------+                             
                         |           |                             
              MTU 1600   |     B     |  MTU 1600                   
                /------->|  Router   |------\                      
+-----------+  /         |           |       \        +-----------+
|           | /          +-----------+        \       |           |
|     A     |/                                 \      |     D     |
|  Router   |\                                  \---->|  Router   |
|           | \                                 /     |           |
+-----------+  \         +-----------+         /      +-----------+
                \        |           |        /                    
                 \------>|     C     |-------/                     
               MTU 1500  |  Router   |   MTU 1500                  
                         |           |                             
                         +-----------+                             
             
                Figure 2: MTU in Multipath Network
]]></artwork>
        </figure>

        <t>As shown in Figure 2, if the path MTU probe packet from A to D
        takes path A-B-D, the detected path MTU is 1600, whereas the path MTU
        of path A-C-D is 1500. Packet loss occurs when data packets larger
        than 1500 bytes are hashed onto path A-C-D.</t>
      </section>

      <section title="Solution">
        <t>A universal replication detection mechanism is required to support
        connectivity detection, path MTU detection, and delay detection. This
        document discusses enhancements to the IP header to support multipath
        detection.</t>

        <t>Path MTU detection affects service availability. Therefore, this
        document focuses on the problem of path MTU detection. Other problems,
        such as connectivity monitoring and delay monitoring, will be
        discussed in the future.</t>

      </section>
    </section>

    <section title="Detailed solution">
      <section title="IPv4 solution">
        <t>This document focuses on the IPv6 network solution. The IPv4
        network solution will be discussed in the future.</t>
      </section>

      <section title="IPv6 solution">
        <section title="Detection Solution">
          <t>For IPv6, the Hop-by-Hop Options header and the Destination
          Options header are extended to carry the multipath replication flag
          and the MTU detection flag. For details, see Section 4.2.2. The
          source node sets the flags, and the intermediate devices and the
          tail device perform the corresponding processing.</t>

          <t>After the replication function is enabled on the source node,
          the source node and each transit node copy probe packets to all
          downstream load-balancing paths. After the MTU detection function is
          enabled on the source node, the source node and each intermediate
          node add the MTU value of the outbound interface to the packet. A
          node can either append its MTU value to the packet hop by hop, or
          compare its MTU value with the value already in the packet and
          write back the smaller one. The end node responds to every received
          detection packet, carrying the MTU recorded along the path, and
          sends the reply to the source node. The end node can also compare
          the received packets and select the smallest MTU as the final path
          MTU. To simplify the packet format, packet size, and data-plane
          processing, it is recommended that only the minimum MTU be carried
          in packets.</t>

          <t>In addition, the path MTU aging mechanism needs to be modified.
          Because the network topology may change, the path MTU may increase.
          If the source always selects the minimum value, the stored path MTU
          can never increase. Therefore, if no path MTU smaller than or equal
          to the current path MTU is received for a long time, the current
          path MTU may be set to an aging state. When the path MTU is in the
          aging state, it may be replaced by a larger path MTU.</t>
        </section>

        <section title="Modifications to existing mechanisms">
          <section title="Modification of the packet structure">
            <t>The probe is an ordinary IPv6 packet that carries the option
            below in its hop-by-hop extension header. The option type value
            (TTTTT) needs to be allocated by IANA.</t>

            <t><figure>
                <artwork><![CDATA[     Option   Option   Option
      Type   Data Len   Data
   +--------+--------+--------+--------+--------+
   |BBCTTTTT|00000011|RRRRRRMD|       MTU       |
   +--------+--------+--------+--------+--------+
BB: Action to take if the option type is not recognized
C: Whether the option data can change en route
TTTTT: Option type value, to be allocated
R: Reserved
M: Path MTU detection flag
D: Load balancing duplicating flag
MTU: Minimum MTU on the path

]]></artwork>
              </figure></t>

            <t>The reply packet carries the option below in the Destination
            Options (DH) extension header. The option type value (TTTTT)
            needs to be allocated by IANA.</t>

            <figure>
              <artwork><![CDATA[     Option   Option   Option
      Type   Data Len   Data
   +--------+--------+--------+--------+
   |BBCTTTTT|00000010|       MTU       |
   +--------+--------+--------+--------+
MTU: Minimum MTU on the path
]]></artwork>
            </figure>
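
            <t>As an illustration only, the two option layouts above could be
            encoded as in the following Python sketch. The option type values
            are hypothetical placeholders, because TTTTT has not been
            allocated. The sketch also assumes that the MTU field is a 16-bit
            unsigned integer in network byte order and that M and D are the
            two low-order bits of the flags octet, as the figures suggest.
            The function names are likewise invented for this example.</t>

            <figure>
              <artwork><![CDATA[
```python
import struct

# Hypothetical placeholders: the real option type values (BBCTTTTT)
# have not been allocated, so these numbers are for illustration only.
PROBE_OPT_TYPE = 0x3E
REPLY_OPT_TYPE = 0x1E

FLAG_M = 0x02  # path MTU detection flag
FLAG_D = 0x01  # load balancing duplicating flag


def pack_probe_option(mtu, detect_mtu=True, duplicate=True):
    """Build the 5-octet probe option: type, length 3, flags, MTU."""
    flags = (FLAG_M if detect_mtu else 0) | (FLAG_D if duplicate else 0)
    return struct.pack("!BBBH", PROBE_OPT_TYPE, 3, flags, mtu)


def unpack_probe_option(data):
    """Return (flags, mtu) after checking the type and length octets."""
    opt_type, length, flags, mtu = struct.unpack("!BBBH", data)
    if opt_type != PROBE_OPT_TYPE or length != 3:
        raise ValueError("not a probe option")
    return flags, mtu


def pack_reply_option(mtu):
    """Build the 4-octet reply option: type, length 2, MTU."""
    return struct.pack("!BBH", REPLY_OPT_TYPE, 2, mtu)
```
]]></artwork>
            </figure>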

          </section>

          <section title="Source node behavior">
            <t>1. Set the load balancing duplicating flag.</t>

            <t>2. Set the MTU detection flag.</t>

            <t>3. Start the detection timer. The source node periodically
            sends detection packets in duplicate mode, carrying the MTU of its
            own outbound interface. It is recommended that the timer interval
            be on the order of minutes. The interval is configurable on the
            command line.</t>

            <t>4. After receiving a response packet from the tail node, the
            source node compares the path MTU value in the response with the
            local path MTU value and selects the smaller value.</t>

            <t>5. Start the path MTU aging timer. The lifetime of the path
            MTU is periodically refreshed: when a path MTU smaller than or
            equal to the current path MTU is received, the timer is reset. It
            is recommended that the aging timer be set to three times the
            detection timer interval.</t>

            <t>6. When the path MTU aging timer expires, the path MTU is set
            to the aging state, and the minimum MTU detected in the next
            detection period overwrites the path MTU.</t>
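
            <t>Steps 4 to 6 can be read as a small state machine. The Python
            sketch below is one possible interpretation, not a normative
            algorithm: the class name PathMtuEntry and its methods are
            invented for this example, the end of each detection period is
            assumed to be signalled by a call to on_period_end, and replies
            carrying a larger MTU are assumed to be ignored unless the entry
            is in the aging state.</t>

            <figure>
              <artwork><![CDATA[
```python
class PathMtuEntry:
    """Path MTU state kept by the source node for one destination."""

    def __init__(self, mtu, detect_interval, now=0.0):
        self.mtu = mtu
        # Step 5: aging timer is three times the detection interval.
        self.aging_timeout = 3 * detect_interval
        self.last_refresh = now
        self.aging = False
        self.candidate = None  # smallest MTU reported while aging

    def on_reply(self, reported_mtu, now):
        """Steps 4 and 5: handle the MTU carried in one reply."""
        if reported_mtu <= self.mtu:
            # A smaller or equal value is adopted and resets the timer.
            self.mtu = reported_mtu
            self.last_refresh = now
            self.aging = False
            self.candidate = None
        elif self.aging:
            # Larger values are only remembered while the entry is aging.
            if self.candidate is None or reported_mtu < self.candidate:
                self.candidate = reported_mtu

    def on_period_end(self, now):
        """Step 6: run once at the end of every detection period."""
        if self.aging and self.candidate is not None:
            self.mtu = self.candidate
            self.last_refresh = now
            self.aging = False
            self.candidate = None
        elif (not self.aging
              and now - self.last_refresh >= self.aging_timeout):
            self.aging = True


entry = PathMtuEntry(mtu=1500, detect_interval=60)
entry.on_reply(1600, now=60)   # larger value, ignored while not aging
entry.on_period_end(now=180)   # no value of 1500 or less for 180 s
entry.on_reply(1700, now=200)
entry.on_reply(1600, now=200)
entry.on_period_end(now=240)   # overwrite with the minimum seen
```
]]></artwork>
            </figure>

            <t>In this run the stored path MTU rises from 1500 to 1600 only
            after the aging timer has expired and a full detection period has
            reported no smaller value.</t>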
          </section>

          <section title="Transit node behavior">
            <t>1. If the load balancing duplicating flag is set, duplicate
            the packet to all load-balancing next hops.</t>

            <t>2. Compare the MTU in the packet with the MTU of the local
            outbound interface, and replace the MTU in the packet with the
            smaller value.</t>
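
            <t>The two transit-node steps might look as follows in Python.
            This is a sketch under stated assumptions: the function name
            transit_forward, the flag constants, and the hash_index parameter
            (a stand-in for the result of the normal ECMP hash) are invented
            for this example, and the MTU comparison is assumed to apply only
            when the M flag is set.</t>

            <figure>
              <artwork><![CDATA[
```python
FLAG_M = 0x02  # path MTU detection flag (assumed bit position)
FLAG_D = 0x01  # load balancing duplicating flag (assumed bit position)


def transit_forward(flags, mtu, ecmp_next_hops, hash_index=0):
    """Return one (next_hop, mtu) pair per packet copy to be sent.

    ecmp_next_hops is a list of (name, outbound_interface_mtu) pairs.
    """
    if flags & FLAG_D:
        # Step 1: replicate to every load-balancing next hop.
        selected = ecmp_next_hops
    else:
        # Without the D flag, ordinary hashing picks a single next hop.
        selected = [ecmp_next_hops[hash_index % len(ecmp_next_hops)]]

    copies = []
    for name, interface_mtu in selected:
        # Step 2: keep the smaller of the carried and local MTU.
        new_mtu = min(mtu, interface_mtu) if flags & FLAG_M else mtu
        copies.append((name, new_mtu))
    return copies
```
]]></artwork>
            </figure>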
          </section>

          <section title="Destination node behavior">
            <t>1. Send a reply to the source node for every received probe
            packet, and copy the MTU value carried in the received packet into
            the reply.</t>
          </section>

          <section title="Process flow">
            <t><figure>
                <artwork><![CDATA[      step 5                                   
       |---------------------------<<---------------------------|      
       |                                                        |      
       |                    +-----------+                       |      
       |                    |           |                       |      
       |                    |     B     |                       |      
       |           /------->|  Router   |------\                |      
   +-----------+  /         |           |step3  \        +--------+ 
   |           | /          +-----------+        \       |        | 
   |     A     |/ step2                           \      |    D   | 
   |  Router   |\                                  \---->| Router | 
   |           | \                                 /     |        | 
   +-----------+  \         +-----------+         /      +--------+ 
       step 1      \        |           |step4   /                    
       step 6       \------>|     C     |-------/                      
                            |  Router   |                              
                            |           |                              
                            +-----------+                          
]]></artwork>
              </figure></t>

            <t>Step 1. Router A starts path MTU discovery toward Router
            D.</t>

            <t>Step 2. Router A sends two packets toward Router D, one through
            Router B and one through Router C. The MTU carried on path A-B-D
            is set to 1600, and the MTU carried on path A-C-D is set to
            1700.</t>

            <t>Step 3. Router B receives the packet, changes the MTU to 1500,
            and forwards the packet to Router D.</t>

            <t>Step 4. Router C receives the packet, changes the MTU to 1600,
            and forwards the packet to Router D.</t>

            <t>Step 5. Router D receives the two packets and replies to
            Router A for each of them with the corresponding path MTU.</t>

            <t>Step 6. Router A updates its local path MTU to 1500, which is
            the smallest value among all reply packets.</t>
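
            <t>The flow above can be reproduced with a short Python
            simulation. It is a sketch only: the table names LINK_MTU and
            NEXT_HOPS and the function send_probe are invented for this
            example, and the link MTU values are read off steps 2 to 4.</t>

            <figure>
              <artwork><![CDATA[
```python
# Outbound interface MTU of each link, taken from steps 2 to 4.
LINK_MTU = {
    ("A", "B"): 1600, ("A", "C"): 1700,
    ("B", "D"): 1500, ("C", "D"): 1600,
}
# Load-balancing next hops of each router toward D.
NEXT_HOPS = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}


def send_probe(node, dest, mtu=None, path=()):
    """Replicate a probe to every next hop.

    Returns a list of (path, mtu) pairs, one per copy reaching dest.
    """
    path = path + (node,)
    if node == dest:
        return [(path, mtu)]
    arrivals = []
    for next_hop in NEXT_HOPS[node]:
        link = LINK_MTU[(node, next_hop)]
        # The source writes its interface MTU; later nodes keep the
        # smaller of the carried value and their own interface MTU.
        carried = link if mtu is None else min(mtu, link)
        arrivals += send_probe(next_hop, dest, carried, path)
    return arrivals


# Step 5: Router D answers every copy with the MTU it carried.
replies = send_probe("A", "D")
# Step 6: Router A keeps the smallest value among all replies.
path_mtu = min(mtu for _, mtu in replies)
```
]]></artwork>
            </figure>

            <t>Here replies holds one entry for path A-B-D with MTU 1500 and
            one for path A-C-D with MTU 1600, so path_mtu is 1500, matching
            step 6.</t>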
          </section>

          <section title="Upper-layer protocol considerations">
            <t>This function does not depend on upper-layer protocols and can
            work with any upper-layer protocol, such as TCP, UDP, ICMP, QUIC,
            and TWAMP.</t>

            <t>Take TWAMP as an example. TWAMP-Test packets carry the
            hop-by-hop extension header with the M and D flags set to detect
            the MTU of multiple paths. Sequence numbers are used to identify
            the multiple copies of a packet. The sender's TWAMP-Test packet
            has the following format:</t>

            <figure>
              <artwork><![CDATA[ 
      0               1               2               3      
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                        Sequence Number                        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          Timestamp                            |
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |        Error Estimate         |                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
     |                                                               |
     .                                                               .
     .                         Packet Padding                        .
     .                                                               .
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+]]></artwork>
            </figure>

            <t>The receiver replies to the source as follows:</t>

            <t><figure>
                <artwork><![CDATA[   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Timestamp                            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Error Estimate        |           MBZ                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Receive Timestamp                    |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sender Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Sender Timestamp                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Sender Error Estimate    |           MBZ                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Sender TTL   |                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   .                                                               .
   .                         Packet Padding                        .
   .                                                               .
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+]]></artwork>
              </figure></t>

            <t>Sender Sequence Number is a copy of the Sequence Number of the
            packet transmitted by the Session-Sender that caused the
            Session-Reflector to generate and send this test packet.</t>
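
            <t>Because every copy of a replicated probe produces its own
            reply with the same Sender Sequence Number, the sender can group
            replies by that number. The Python sketch below shows one way to
            do so. The function name min_mtu_per_probe and the representation
            of a reply as a (sender_sequence_number, mtu) pair are
            assumptions made for this example, with the MTU taken from the
            reply option described earlier.</t>

            <figure>
              <artwork><![CDATA[
```python
def min_mtu_per_probe(replies):
    """Map each Sender Sequence Number to the smallest MTU reported.

    replies is an iterable of (sender_sequence_number, mtu) pairs,
    one pair per reflected copy of a probe.
    """
    result = {}
    for sequence_number, mtu in replies:
        previous = result.get(sequence_number, mtu)
        result[sequence_number] = min(previous, mtu)
    return result


# Probe 7 was replicated over two paths; probe 8 returned one copy.
summary = min_mtu_per_probe([(7, 1500), (7, 1600), (8, 1600)])
```
]]></artwork>
            </figure>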

          </section>
        </section>
      </section>
    </section>

    <section title="Supplementary description of the protocol">
      <t>1. In SDN scenarios, path MTUs can be sent to the controller by
      telemetry, and the controller then passes them to the source node. This
      is not discussed in this document.</t>

      <t>2. The detection protocol can be extended by TWAMP, BFD, or another
      OAM protocol. This document does not provide any analysis of such
      extensions.</t>

      <t>3. This solution assumes that all devices on the network support it.
      If an intermediate device does not support it, the real path MTU will
      not be detected. In that case, Packet Too Big (PTB) messages are used
      to detect the path MTU.</t>

      <t>4. The detection of connectivity faults and of parameters such as
      latency in multipath load-balancing scenarios will be discussed in the
      future.</t>
    </section>

    <section title="Benefits">
      <t>This solution provides accurate path MTU detection in load balancing
      scenarios to prevent packet loss caused by excessively large
      packets.</t>
    </section>

    <section title="Acknowledgements">
      <t>Thank you to Yang Pingan, Zhao Ranxiao, Xia Yang, Wu Qin, Yudan, and
      others for participating in the solution discussion and helping improve
      the solution.</t>

    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>To carry the load balancing duplicating flag and the path MTU
      detection flag, new option types need to be defined for the existing
      Hop-by-Hop Options and Destination Options (DH) headers.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>Because packet replication affects device and network performance,
      packets in replication mode need to be subject to tracing, encryption,
      uRPF checks, security filtering, and rate limiting.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.1191"?>

      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.8174"?>

      <?rfc include="reference.RFC.4821"?>

      <?rfc include="reference.RFC.8899"?>

      <?rfc include="reference.RFC.2460"?>

      <?rfc include="reference.RFC.8200"?>

      <?rfc include="reference.I-D.ietf-6man-mtu-option"?>
    </references>
  </back>
</rfc>
