STAC report highlights 10GbE benefits

Using the Reuters Market Data System (RMDS) running on
Chelsio's adapters with TCP/IP offload technology,
STAC reported record performance

April 22, 2008

Abstract
Trading systems handle network-I/O-intensive market data and are extremely latency-sensitive. Ethernet has traditionally not been the fabric of choice among IT heads in the financial world, whereas InfiniBand has been accepted for high performance computing (HPC) environments because of its latency leadership. This white paper illustrates how Chelsio's 10G Ethernet solution, with its protocol-offload technology, meets the requirements of the HPC cluster infrastructure. Its low latency, combined with high performance and Ethernet's inherent scalability, flexibility, and cost-effectiveness, makes a strong case for Ethernet as the interconnect technology for the HPC environment.

The Securities Technology Analysis Center (STAC) recently measured the performance of the Chelsio NIC using the Reuters Market Data System (RMDS 6) benchmarking utility. The main factors measured were:

  • Latency
  • Standard deviation of latency
  • Throughput
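
As a rough illustration of how these three quantities relate, the sketch below derives them from per-update send and receive timestamps. The function name, the sample timestamps, and the choice of population standard deviation are assumptions made for this example; they are not taken from the STAC methodology or its data.

```python
import statistics


def summarize(send_times, recv_times):
    """Return (mean latency, std dev of latency, throughput) for one run.

    Times are in seconds; the i-th receive time belongs to the i-th send time.
    """
    latencies = [r - s for s, r in zip(send_times, recv_times)]
    mean_latency = statistics.mean(latencies)
    # Population standard deviation; a sample-based estimate would use stdev().
    stdev_latency = statistics.pstdev(latencies)
    # Throughput: updates delivered divided by the wall-clock span of the run.
    duration = max(recv_times) - min(send_times)
    throughput = len(latencies) / duration
    return mean_latency, stdev_latency, throughput


# Made-up timestamps for illustration only (not STAC measurements).
send_times = [0.000, 0.001, 0.002, 0.003]
recv_times = [0.0008, 0.0019, 0.0027, 0.0040]

mean_latency, stdev_latency, throughput = summarize(send_times, recv_times)
print(f"mean latency : {mean_latency * 1e3:.3f} ms")
print(f"std deviation: {stdev_latency * 1e3:.3f} ms")
print(f"throughput   : {throughput:.0f} updates/s")
```

A low mean with a high standard deviation would indicate occasional slow updates, which is why the two are reported separately.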

Chelsio's TCP/IP offload (TOE) technology
Chelsio's adapters are built around the company's flagship third-generation Terminator engine (T3). This chip implements a TCP/IP stack that is fully compliant with the IETF RFC standards, thereby bypassing all software processing between the network interface and the application layer, including connection setup and teardown, timer management, retransmissions (at microsecond-level resolution) and other exception handling. This TOE technology increases server performance while reducing CPU utilization. The cut-through processing of the Terminator ASIC's VLIW architecture keeps latency low.

In addition, T3 provides unparalleled performance through a specialized pipelined data-flow processor implementation and a host of features designed for high throughput and low latency in demanding conditions and networking environments, while using standard-size 1500B Ethernet frames. T3 can effectively hide packet loss in the network and shield the host systems on both ends from its effects, allowing Ethernet to be used as if it were a virtually lossless fabric.

The T3 ASIC uses Direct Data Placement (DDP), a mechanism that provides a flexible zero-copy receive capability for regular TCP connections. It requires no changes to the sender, the wire protocol, or the socket API on either the sending or the receiving side.
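
To make the "no changes to the socket API" point concrete, the sketch below is an ordinary TCP receiver that reads into a preallocated buffer over the loopback interface. Nothing in it is specific to Chelsio hardware: the function names, buffer size, and payload size are assumptions for this example. Whether the adapter places data directly into the application buffer is decided by the NIC and its driver, and is not visible in application code like this.

```python
import socket
import threading


def receive_all(conn, total, bufsize=65536):
    """Read `total` bytes with plain recv_into(); no offload-specific calls."""
    buf = bytearray(bufsize)
    view = memoryview(buf)
    received = 0
    while received < total:
        n = conn.recv_into(view, min(bufsize, total - received))
        if n == 0:  # peer closed the connection early
            break
        received += n
    return received


def demo(payload_size=1_000_000):
    """Send `payload_size` bytes over loopback and return the bytes received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def send():
        with socket.create_connection(("127.0.0.1", port)) as s:
            s.sendall(b"x" * payload_size)

    sender = threading.Thread(target=send)
    sender.start()
    conn, _ = server.accept()
    with conn:
        got = receive_all(conn, payload_size)
    sender.join()
    server.close()
    return got


if __name__ == "__main__":
    print(demo(), "bytes received")
```

The same receiver would run unmodified on a host with or without an offload-capable adapter, which is the property the paragraph above describes.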

Reuters Market Data System 6 (RMDS 6)
RMDS 6 is the latest version of Reuters' market data platform. It distributes real-time market information from a variety of market data sources to a range of analytical, display and trading applications. It is also popular for benchmarking high-volume applications.

Test Set-up

Server Specification
Vendor/Model: IBM eServer BladeCenter HS21
Processors: 2
Processor type: Dual-Core Intel Xeon 5160 @ 3.00 GHz
Cache: 4 MB integrated L2 cache, shared between 2 cores
Bus speed: 1333 MHz
Memory: 4 GB (2 x 2048 MB) DDR DIMMs
Disk: 73 GB SAS

Networking Equipment Specification
Switch: Blade Network Technologies’ Nortel 10Gb Ethernet Switch Module for IBM BladeCenter H, software version 1.0.3
NIC: Chelsio NIC S320EM-BCH
NIC driver: cxgb3, version 1.0.129a
NIC firmware: T 5.0.0 TP 1.1.0
NIC BIOS: BCE1.08

Operating System Specification
Version: Red Hat Enterprise Linux 5.1 beta, 32-bit, kernel 2.6.18-36.el5
OS services: All OS daemons were stopped except init, udevd, auditd, audispd, syslogd, klogd, sshd, smartd and mingetty



Figure 2: End-to-End Infrastructure

To maximize performance, STAC chose a multiplex topology in which two Point-to-Point Server (P2PS) instances each fed two client apps. Two publishing apps and two source distributors supplied the data. This stacked topology effectively co-locates multiple P2PS instances.

 

Each blade server had two Chelsio 10Gb/s interface ports. The TCP traffic between the publishing apps and the source distributors, and between the P2PS instances and the client apps, used one network (Network A). The UDP multicast traffic between the source distributor box and the P2PS box used a separate interface port and network (Network B), as shown in Figure 3.
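
The topology and traffic split described above can be summarized in a small data model. This is only a restatement of the text as a sketch; the class, field, and function names are hypothetical and do not correspond to any RMDS or STAC configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    destination: str
    transport: str
    network: str


# Traffic classes and the network each one used, as described in the text.
FLOWS = [
    Flow("publishing app", "source distributor", "TCP", "A"),
    Flow("source distributor", "P2PS", "UDP multicast", "B"),
    Flow("P2PS", "client app", "TCP", "A"),
]

# Component counts from the test set-up.
PUBLISHING_APPS = 2
SOURCE_DISTRIBUTORS = 2
P2PS_INSTANCES = 2
CLIENTS_PER_P2PS = 2

total_clients = P2PS_INSTANCES * CLIENTS_PER_P2PS


def flows_on(network):
    """Return the flows carried by the given network ("A" or "B")."""
    return [f for f in FLOWS if f.network == network]


for net in ("A", "B"):
    for f in flows_on(net):
        print(f"Network {net}: {f.source} -> {f.destination} ({f.transport})")
print("client apps in total:", total_clients)
```

Keeping the multicast leg on its own port and network means the TCP flows on Network A do not compete with it for the same 10Gb/s link.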


Performance comparison with InfiniBand
Using RMDS 6, STAC recently tested for latency and bandwidth over the InfiniBand fabric. The results from that test form a good basis for comparison with the results from testing using the 10G Ethernet technology.

The following graphs compare Chelsio's performance with that of InfiniBand, in the following order:

  • Latency
  • Standard deviation of latency
  • Bandwidth

Inference

  • Lowest mean latency ever reported with RMDS
    • Less than 0.9 milliseconds of end-to-end infrastructure latency at up to 600,000 updates per second in the low-latency configuration of RMDS
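
To put these two figures in proportion, the short calculation below derives the mean spacing between updates and, using Little's law (items in flight = arrival rate x time in system), an upper bound on the number of updates in flight at once. It assumes steady-state operation at the quoted rate and treats 0.9 milliseconds as the latency bound; the variable names are illustrative.

```python
RATE = 600_000          # updates per second (reported upper rate)
LATENCY_BOUND = 0.9e-3  # seconds (reported bound on mean end-to-end latency)

# Mean spacing between consecutive updates at this rate.
interval_us = 1e6 / RATE

# Little's law: mean number in flight = arrival rate * mean time in system.
# Because the reported latency is "less than" the bound, this is an upper limit.
max_in_flight = RATE * LATENCY_BOUND

print(f"mean spacing between updates: {interval_us:.3f} microseconds")
print(f"updates in flight (upper bound): {max_in_flight:.0f}")
```

In other words, a new update arrives roughly every 1.7 microseconds, so several hundred updates are in transit through the infrastructure at any moment even at sub-millisecond latency.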

Inference

  • Lowest standard deviation of latency ever reported with RMDS
    • Less than 0.5 milliseconds at rates up to 600,000 updates per second

Inference

  • Very high output rate in the "Producer 50/50" fan-out test of a stacked P2PS
    • 5.8 million updates per second
    • 30% of this due to the TCP/IP Offload Engine (TOE) in the Chelsio NIC
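
The 30% figure can be read in two ways, and the implied rate without offload differs between them. The arithmetic below works through both readings using only the numbers reported here; neither baseline is a measured value, and the variable names are illustrative.

```python
TOTAL_RATE = 5_800_000  # updates per second with the TOE enabled (reported)
TOE_FRACTION = 0.30     # the reported 30% figure

# Reading 1: 30% of the total output is attributable to the TOE.
baseline_if_share = TOTAL_RATE * (1 - TOE_FRACTION)
gain_if_share = TOTAL_RATE / baseline_if_share - 1

# Reading 2: the TOE gives a 30% improvement over a traditional NIC,
# which is how the conclusion of this paper phrases it.
baseline_if_improvement = TOTAL_RATE / (1 + TOE_FRACTION)

print(f"reading 1: baseline about {baseline_if_share:,.0f} updates/s "
      f"(a {gain_if_share:.0%} improvement)")
print(f"reading 2: baseline about {baseline_if_improvement:,.0f} updates/s "
      f"(a {TOE_FRACTION:.0%} improvement)")
```

Under the first reading the implied baseline is about 4.06 million updates per second; under the second it is about 4.46 million.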

Conclusion
The STAC report highlights the clear benefits of 10G Ethernet and is a harbinger of a period in which Ethernet technology will become ubiquitous across today's demanding HPC, storage and server networking environments.

The results collected using Chelsio's low-latency and high performance adapters have brought to light the following important conclusions about 10G Ethernet:

  • Lowest mean latency ever reported with RMDS using 10G Ethernet, 1G Ethernet or InfiniBand
  • Lowest standard deviation of latency ever reported with RMDS using 10G Ethernet, 1G Ethernet or InfiniBand
  • 30% improvement in bandwidth with RMDS over traditional NICs

10G Ethernet is going to be a very strong player in the HPC space and a very viable alternative to InfiniBand. Chelsio's hardware is specifically designed to dramatically improve cluster performance by reducing application latency while keeping CPU utilization to a minimum.

Appendix

References

1) STAC report: RMDS6/RHEL 5.1/Intel Xeon/IBM BladeCenter/BNT/Chelsio
2) STAC report: RMDS6/RHEL4/HPDL380/Xeon(2xDualCore)/VoltaireInfiniBand


© Copyright 2007 Chelsio Communications