Why Does Flow Director Cause Packet Reordering?


📝 Abstract

Intel Ethernet Flow Director is an advanced network interface card (NIC) technology. It provides the benefits of parallel receive processing in multiprocessing environments and can automatically steer incoming network data to the same core on which its application process resides. However, our analysis and experiments show that Flow Director cannot guarantee in-order packet delivery in multiprocessing environments. Packet reordering causes various negative impacts. E.g., TCP performs poorly with severe packet reordering. In this paper, we use a simplified model to analyze why Flow Director can cause packet reordering. Our experiments verify our analysis.


📄 Content

Index Terms – Packet Reordering, Flow Director, TCP, High Performance Networking.


1. Introduction

Computing is now shifting towards multiprocessing (e.g., CMP, SMP, and NUMA). The fundamental goal of multiprocessing is improved performance through the introduction of additional hardware threads, CPUs, or cores (all referred to as "cores" for simplicity). The emergence of multiprocessing has brought both opportunities and challenges for TCP/IP performance optimization in such environments. Modern network stacks can exploit parallel cores to allow either message-based parallelism or connection-based parallelism as a means of enhancing performance [1]. While existing OSes exploit parallelism by allowing multiple threads to carry out network operations concurrently in the kernel, supporting this parallelism carries significant costs, particularly contention for shared resources, software synchronization, and poor cache efficiency [1][2]. Investigations of processor affinity [3][4][5] indicate that coordinated affinity scheduling of protocol processing and network applications on the same target cores can significantly reduce contention for shared resources, minimize software synchronization overheads, and enhance cache efficiency.
Coordinated affinity scheduling of protocol processing and network applications on the same target cores has the following goals:

(1) Interrupt affinity: Network interrupts of the same type should be directed to a single core. Redistributing network interrupts in either a random or round-robin fashion to different cores has undesirable side effects [4].

(2) Flow affinity: Packets belonging to a specific flow should be processed by the same core. Flow affinity is especially important for TCP. TCP is a connection-oriented protocol with a large and frequently accessed state that must be shared and protected when packets from the same connection are processed. Ensuring that all packets in a TCP flow are processed by a single core reduces contention for shared resources, minimizes software synchronization, and enhances cache efficiency.

(3) Network data affinity: Incoming network data should be steered to the same core on which its application process resides. This is becoming more important with the advent of Direct Cache Access (DCA) [6]. Network data affinity maximizes cache efficiency and reduces core-to-core synchronization.

In a multicore system, network data steering is performed by directing the corresponding network interrupts to a specific core (or cores). Receive Side Scaling (RSS) [7] is a NIC technology that supports multiple receive queues and integrates a hashing function in the NIC. The NIC computes a hash value for each incoming packet and, based on these hash values, assigns packets of the same data flow to a single queue while evenly distributing traffic flows across queues. With Message Signaled Interrupts (MSI/MSI-X) [8] support, each receive queue is assigned a dedicated interrupt, and RSS steers interrupts on a per-queue basis. RSS thus provides the benefits of parallel receive processing in multiprocessing environments. Operating systems such as Windows, Solaris, Linux, and FreeBSD now support interrupt affinity.
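The per-flow steering that RSS performs can be sketched as a simple hash-based dispatch. The sketch below is illustrative only: real RSS-capable NICs use a Toeplitz hash over the flow's 5-tuple fields in hardware, and the `Packet` type and field names here are assumptions for demonstration.

```python
# Illustrative sketch of RSS-style receive steering (the hash function
# here stands in for the NIC's hardware Toeplitz hash).
from dataclasses import dataclass

NUM_QUEUES = 4  # one receive queue (and dedicated MSI-X interrupt) per core


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def rss_queue(pkt: Packet) -> int:
    # Hash the flow tuple: every packet of the same flow hashes the same
    # way, so the whole flow lands on one queue (flow affinity), while
    # distinct flows spread across queues.
    flow_key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port)
    return hash(flow_key) % NUM_QUEUES


# All packets of one TCP flow are assigned to a single queue.
pkts = [Packet("10.0.0.1", "10.0.0.2", 5000, 80) for _ in range(3)]
queues = {rss_queue(p) for p in pkts}
assert len(queues) == 1
```

Because the mapping depends only on the packet headers, RSS can pin a flow to a queue but has no way of knowing which core the consuming application runs on, which is the limitation discussed next.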
When an RSS receive queue (or interrupt) is tied to a specific core, packets from the same flow are steered to that core (flow pinning [9]). This ensures flow affinity on most OSes, with Linux being the major exception. However, RSS has a limitation: it cannot steer incoming network data to the same core on which its application process resides. The reason is simple: existing RSS-enabled NICs do not maintain the relationship

Traffic Flows → Network Applications → Cores

in the NIC. Since network applications run on cores, the most critical relationship is simply

Traffic Flows → Cores (Applications)

Unfortunately, RSS does not support this capability. This is symptomatic of a broader disconnect between existing software architecture and multicore hardware. With OSes like Windows and Linux, if an application is running on one core while RSS has scheduled its received traffic to be processed on a different core, poor cache efficiency and significant core-to-core synchronization result.
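The Traffic Flows → Cores (Applications) mapping that RSS lacks is what Flow Director adds. A minimal sketch of that idea, assuming a Flow-Director-like design in which the NIC learns the application's core by sampling transmitted packets and then steers the reverse flow to that core (function names and the learning trigger are illustrative, not Intel's actual interface):

```python
# Hypothetical sketch of flow-to-core steering. A real Flow Director
# keeps this table as exact-match filters in NIC hardware.
flow_to_core: dict[tuple, int] = {}


def on_transmit(flow_key: tuple, sender_core: int) -> None:
    # Learn from outgoing traffic: the core that sends on a flow is the
    # core where the application resides, so record it for the flow.
    flow_to_core[flow_key] = sender_core


def steer(flow_key: tuple, default_core: int) -> int:
    # Incoming packets that match a learned entry go to the recorded
    # core; unmatched flows fall back to a default (e.g., RSS hashing).
    return flow_to_core.get(flow_key, default_core)


# The application on core 2 sends on a flow; the reverse flow is then
# steered to core 2 instead of wherever a hash would have placed it.
on_transmit(("10.0.0.2", "10.0.0.1", 80, 5000), sender_core=2)
assert steer(("10.0.0.2", "10.0.0.1", 80, 5000), default_core=0) == 2
assert steer(("unknown-flow",), default_core=0) == 0
```

Note that if the recorded core changes while packets of the flow are still queued at the old core, packets of one flow can be in flight on two cores at once; this race is the kind of mechanism by which such steering can reorder packets, which is the subject of the paper's analysis.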
