A Chelsio Communications White Paper
Introduction
Over the years, computer networks have evolved from centralized mainframes to decentralized clusters, where processing power is spread throughout an organization.
The powerful processing capabilities of these clusters, due largely to the scalability and flexibility of affordable servers, have made clusters extremely popular for a number of applications, particularly in high-performance computing (HPC) environments. As servers have grown cheaper and more powerful, it has become common to cluster large numbers of less powerful servers to build highly available, cost-effective networks that deliver the same high-performance capabilities as HPC networks.
Ethernet (reliable, plentiful and affordable) has been the primary network technology for the vast majority of network connectivity solutions. However, due to a lack of truly high-performance Ethernet products, most high-performance computing solutions could not use Ethernet as their interconnect. To maintain the high performance of clustered networks, servers were forced to use specialized interconnects that added cost and complexity to the overall solution.
That’s all changing. Today, with the availability of high-performance 10 Gigabit Ethernet technology, clustered networks can finally use this common and dependable network access method to interconnect their servers — while maintaining the high-performance capabilities they have grown to expect.
High-performance Computing
High-performance clusters are used for a variety of applications: biosciences, scientific research, mechanical design, defense, geoscience, imaging, financial modeling — in short, any field or discipline that demands intense, powerful compute processing capabilities.
These applications fall into two basic categories. Compute-bound applications depend primarily on the processing capabilities of the nodes to complete their work; I/O-oriented applications depend on a high-throughput interconnect to maintain performance.
From the interconnect perspective, there are three main requirements for clusters to effectively serve these applications:
- High throughput: I/O-oriented applications move large volumes of data between nodes, so sustained interconnect throughput is critical to their performance.
- Low latency: In a typical cluster, many nodes with moderate processing capabilities are joined to achieve supercomputer-like processing capabilities. In this environment, the parallel application splits single, large tasks into multiple smaller tasks and distributes them to all the nodes in the cluster; the original larger task is completed only once the individual nodes are done processing their specific smaller tasks. Latency plays a crucial role in passing messages between the nodes; if latency is not below a certain threshold, the performance of the application degrades as processing time increases.
- Scalability: Although high throughput and low latency help individual nodes achieve high performance, that performance needs to be maintained as the number of nodes in the cluster increases. The true strength of a clustered network is its ability to maintain performance as it grows. If the interconnect doesn’t scale, then overall performance suffers as the cluster expands.
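The effect of latency and scale described above can be illustrated with a deliberately simple model. The Python sketch below assumes that work divides evenly across nodes and that each node waits a fixed time for every message it exchanges; the function names, the message count and the latency figures are illustrative assumptions, not measurements from any cluster.

```python
def job_time_s(total_work_s, nodes, messages_per_node, latency_s):
    """Estimate wall-clock time for a job split evenly across `nodes`.

    Assumed model: every node computes its share of the work, then waits
    `latency_s` for each of the messages it exchanges.
    """
    compute_s = total_work_s / nodes
    communication_s = messages_per_node * latency_s
    return compute_s + communication_s


def speedup(total_work_s, nodes, messages_per_node, latency_s):
    """Speedup relative to running the whole job on one node with no messaging."""
    return total_work_s / job_time_s(total_work_s, nodes, messages_per_node, latency_s)


if __name__ == "__main__":
    # Hypothetical job: 100 s of work, 10,000 messages per node.
    for latency_us in (10, 100):
        for nodes in (16, 64, 256):
            s = speedup(100.0, nodes, 10_000, latency_us * 1e-6)
            print(f"latency={latency_us:>3} us  nodes={nodes:>3}  speedup={s:6.1f}x")
```

Under these assumptions, the compute share shrinks as nodes are added while the communication term does not, so the higher-latency interconnect falls further behind as the cluster grows.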
Interconnects for HPC
To support the throughput and latency requirements of a typical HPC environment, multiple high-performance interconnects are typically used — primarily Myrinet and QsNet. Infiniband has also made some inroads as an interconnect technology.
Myrinet and QsNet support link bandwidths of 2 gigabits-per-second (Gbps) and 10 Gbps, respectively, while Infiniband supports 10 Gbps. All three interconnects deliver the impressive performance required for HPC environments.
However, while these interconnects provide adequate performance and support for HPC environments, they do have some drawbacks:
- Training: Most IT administrators are familiar with Ethernet, which is typically used in their networking environment. In order to deploy a new technology in an HPC environment, the IT staff must spend time and money on training in order to understand and grow familiar enough to support the technology.
- Vendor dependency: Myrinet is developed by Myricom, QsNet is developed by Quadrics, and Infiniband is developed by a mere handful of vendors. End users have little freedom to select best-of-breed solutions; once a technology is selected, the user is essentially locked in with a single vendor. With competition virtually non-existent for these three technologies, there is little incentive for the sole-source providers to reduce their prices.
- Total solution: Switches are used to connect multiple nodes in a cluster using these interconnects. These switches must be purchased from the same interconnect vendors. Although Infiniband is standards-based and a few different vendors sell IB switches, buying the switch from the same vendor that provided the interconnect technology is still preferred. As a result, IT managers cannot buy best-of-breed solutions from different vendors; a complete solution must be purchased from a single vendor.
- Management and diagnostics tools: Vendors typically develop proprietary management tools for their own products. Therefore, one tool may not work with another vendor’s interconnects, reducing choice and once again eliminating the option of selecting best-of-breed solutions from preferred providers.
- Multiple software packages: In order for these interconnects to work with different networks, different software libraries are required. For instance, using Infiniband solutions in an IP network requires modules such as IP over IB (IPoIB) and Sockets Direct Protocol (SDP), adding layers of software that ultimately degrade performance. The same holds true for QsNet and Myrinet.
Despite the numerous drawbacks, the lack of viable alternatives is forcing HPC users to stick with these interconnects for their high-performance needs.
Meanwhile, standards-based Ethernet — which has none of the drawbacks of the other interconnect technologies — has flourished in other segments of the network market. Unfortunately, the lack of a true high-performance solution has hampered Ethernet’s adoption by the HPC market.
That is not to say Ethernet has not made some inroads into the HPC arena. A TOP500 Super Computers survey published in November 2003 reported that a good 20 percent of HPC installations used Ethernet as the interconnect technology, with the top Ethernet supercomputer installation ranking 25th in performance.
The clear message: users want to use a familiar technology like Ethernet. And with the necessary performance boost, Ethernet could easily become the preferred HPC interconnect technology.
Emergence of 10 Gigabit Ethernet
That performance boost has been realized.
Over the years, Ethernet has proved surprisingly resilient, evolving into the default interconnect for networked servers around the globe. Starting as a relatively simple 10 Mbps solution, Ethernet has gained significant market share as successive generations of more powerful 100 Mbps and 1 Gbps solutions were released, gaining favor due to its compatibility with previous generations and its lower costs, especially compared to other interconnect technologies. Users deploying the latest Ethernet products could leverage their expertise — and their previous investments — making Ethernet a highly desirable networking solution.
The emergence of 10 gigabit-per-second Ethernet (10 GbE) carries on the tradition, bringing with it the same compatibility and easy migration expectations as its predecessors. Featuring 10 times the bandwidth of the previous 1 Gbps Ethernet technology, and with the cost dropping rapidly, 10 GbE has emerged as a viable technology that is ready to enter the mainstream.
While some vendors, hoping to capitalize on this new opportunity, have announced the availability of 10 GbE network adapters, and switch vendors have been selling 10 GbE switch ports — used primarily for inter-switch links — for some time, cost has prevented the technology from achieving “prime time” status.
However, as the price of 10 GbE NICs continues to fall, and as the cost of switch ports drops, 10 GbE is rapidly becoming an affordable solution. Switch port prices, for instance, have fallen 87 percent in the last two years, while 10 GbE NIC prices have dropped by as much as 50 percent retail. Given the history and popularity of Ethernet, once the price of a 10 GbE switch port drops to five to 10 times the price of a 1 GbE switch port, 10 GbE port shipments will grow in volume, causing the price to drop even further. This could happen sooner rather than later; currently, 10 GbE solutions for fiber and CX4 for copper have been released into the market, while 10 GbE solutions for unshielded twisted pair (UTP), the most popular medium for Ethernet over copper cable, will be ready in 2006, driving cost down even further.
Processing: The 10 Gigabit Ethernet Bottleneck
Though processing power is doubling every 18 months, Ethernet speeds have outpaced processor speeds. A standard rule of thumb is that every MHz of CPU processes 1 Mbps of data; at this rate, a 10 GHz processor is required to process 10 Gbps of Ethernet traffic, and a 20 GHz processor is required to process full-duplex traffic. Clearly, processing speeds are a serious obstacle preventing 10 GbE from realizing its true potential.
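The rule of thumb above reduces to simple arithmetic, worked through in the short Python sketch below. The constant and function names are illustrative, and the 1 MHz per 1 Mbps ratio is taken from the text as a rough heuristic rather than a measured value.

```python
MHZ_PER_MBPS = 1.0  # rule of thumb from the text: 1 MHz of CPU per 1 Mbps of traffic


def required_cpu_ghz(link_gbps, full_duplex=False):
    """CPU clock (GHz) needed to process traffic on a link, per the rule of thumb."""
    traffic_mbps = link_gbps * 1000
    if full_duplex:
        traffic_mbps *= 2  # traffic flows at line rate in both directions
    return traffic_mbps * MHZ_PER_MBPS / 1000


if __name__ == "__main__":
    print(required_cpu_ghz(10))                    # one direction of 10 GbE
    print(required_cpu_ghz(10, full_duplex=True))  # both directions
```

The two calls reproduce the 10 GHz and 20 GHz figures quoted above.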
But the processor isn’t the only bottleneck. Memory bandwidth and the host bus interface driving the traffic are also hindrances to achieving full 10 GbE throughput. PCI-X 2.0 and PCI Express, slated for release next year, will address the host bus issues, leaving memory bandwidth as the most serious hurdle to be cleared.
In a conventional network stack, data is copied between user space and kernel space; because each copy imposes additional trips to memory, memory bandwidth is not enough to sustain 10 GbE traffic. Remote Direct Memory Access (RDMA), a standard that would reduce these trips across the system bus, has already been ratified by the RDMA Consortium and has been submitted to the IETF for standardization.
With the RDMA standard, data moves directly from the application memory of one host to the application memory of the remote host, enabling the server to use fewer processing cycles to process data while reducing the latency imposed by moving the data. The RDMA mechanism has already been tested in Infiniband, resulting in significantly lower latency for that particular interconnect technology. By using RDMA, 10 GbE can achieve similarly low latency, a significant achievement.
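To see why copies strain memory bandwidth, consider a rough count of how many times each received byte crosses the memory bus. The Python sketch below is only a back-of-the-envelope model: the crossing counts (three for a copy-based receive path, one for direct placement) are simplifying assumptions chosen for illustration, not figures from the RDMA specifications or from any particular system.

```python
# Assumed memory-bus crossings per received byte (illustrative only):
#   copy-based path:  adapter writes kernel buffer, CPU reads it, CPU writes user buffer
#   direct placement: adapter writes straight into the application buffer
CROSSINGS_WITH_COPY = 3
CROSSINGS_DIRECT = 1


def memory_traffic_gbps(line_rate_gbps, crossings_per_byte):
    """Memory bandwidth consumed when every byte crosses the bus `crossings_per_byte` times."""
    return line_rate_gbps * crossings_per_byte


if __name__ == "__main__":
    for label, crossings in (("copy-based", CROSSINGS_WITH_COPY),
                             ("direct placement", CROSSINGS_DIRECT)):
        traffic = memory_traffic_gbps(10, crossings)
        print(f"{label}: {traffic} Gbps of memory traffic at a 10 Gbps line rate")
```

Under this model, removing the intermediate copy cuts the memory traffic generated by a 10 Gbps stream to a third, which is the kind of saving direct data placement is meant to deliver.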
Chelsio’s 10 Gigabit Ethernet solution
Chelsio currently offers the T110 10 GbE TCP/IP Offload Engine (TOE) HBA. The T110 not only offers all the features of a 10 GbE NIC, but also completely offloads the TCP/IP protocol stack from the processor, relieving the CPU of this compute-intensive task and freeing up processing cycles for improving application performance.
The T110 delivers the following advantages for 10 GbE environments:
- Performance: Powered by Chelsio’s unique Terminator architecture, the TOE ASIC achieves near line-rate throughput, gated only by PCI-X 1.0 (133 MHz) bandwidth limitations.
- Latency: Due to the flow-through processing of the Terminator’s VLIW architecture, the T110’s packet processing latency is kept below 10 μs, making it a suitable candidate for HPC applications.
- Scalability: The T110’s scalable architecture maintains flat performance from one connection to thousands of connections, allowing the network to grow without degrading performance.
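The bus limitation mentioned under Performance can be estimated with simple arithmetic. The Python sketch below assumes a 64-bit PCI-X bus clocked at 133 MHz and ignores bus protocol overhead, so the result is a theoretical ceiling rather than a measured T110 figure; the function name is illustrative.

```python
def bus_peak_gbps(width_bits, clock_mhz):
    """Theoretical peak bus bandwidth in Gbps: one transfer of `width_bits` per clock."""
    return width_bits * clock_mhz / 1000.0


if __name__ == "__main__":
    pci_x = bus_peak_gbps(64, 133)  # assumed 64-bit bus at 133 MHz
    print(f"PCI-X peak: {pci_x:.2f} Gbps, {pci_x / 10:.0%} of a 10 Gbps link")
```

Under these assumptions the bus tops out at roughly 8.5 Gbps, below the 10 Gbps line rate, which is consistent with the bus rather than the adapter gating throughput.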
The T110 for HPC Environments
The T110, with its high throughput, low latency and scalability features, delivers a solution that is the equal of, if not superior to, the Myrinet, QsNet and Infiniband interconnects. Offering better performance, backed by familiar Ethernet technology, the T110 eliminates the disadvantages faced by the other interconnect technologies in HPC environments.
- No training required: With the experience gained over the years with 100 Mbps Fast Ethernet and Gigabit Ethernet, the IT team is already familiar with any Ethernet-based technology. Deploying the T110 requires no additional training.
- Vendor independence: Because it is a standards-based product, the T110 will work with any available 10 GbE switch — compatibility is ensured. The T110 has also been tested with various operating systems and by various OEMs to further demonstrate the robustness of the product.
- Management tools: Any standards-based third-party Ethernet management tools work with the T110.
With the abundance of vendors in the Ethernet market, and the number of available products growing, the total cost of ownership for installing and maintaining 10 GbE networks with the T110 is low and getting lower. Compared to other 10 GbE products, the T110 delivers better throughput, lower latency and greater scalability for HPC environments.
In addition, the T110 supports the iSCSI protocol, enabling it to support storage area networks (SAN), as well as HPC, network-attached storage (NAS) and local area network (LAN) environments. Traditionally, each of these networks has had its own interconnect: Fibre Channel for storage, low-latency interconnects for HPC and Ethernet for LAN/NAS. The T110 can be used as a single interconnect in any server to support all of these networks, leading to the convergence of these heterogeneous environments into a single, common network.
Conclusion
While Ethernet is one of the world’s most popular network interconnect technologies, it has struggled to gain a foothold in the HPC market due to perceived performance deficiencies. Meanwhile, the interconnect technologies that have so far been used for HPC all have certain drawbacks.
With the emergence of 10 GbE, Chelsio has taken a leading role in this market segment with the introduction of the T110 TOE HBA. The T110, which has demonstrated line-rate throughput even with relatively low-speed processors, stands out as a strong contender among interconnects for HPC environments due to its performance benefits and Ethernet-based technology. Chelsio’s T110 offers greater throughput than specialized interconnects with comparable latency, and is driving the convergence of LANs, SANs and NAS by supporting multiple protocols.

For more information about Chelsio Communications and the T110 TOE HBA, visit the Chelsio web site at www.chelsio.com or send an e-mail to info@chelsio.com.