Infiniband is rapidly becoming a widely accepted medium for
internode networks. The specification was finished in June 2001, and
from 2002 onwards a number of vendors have offered products based on
the Infiniband standard. A very complete description (1200 pages)
can be found in [32].
Infiniband is employed to connect various components within a
system. Via Host Channel Adapters (HCAs) the Infiniband fabric can
be used for interprocessor networks, for attaching I/O subsystems, or
for connecting to multi-protocol switches such as Gbit Ethernet
switches. Because of this versatility, the market is not limited to
the interprocessor network segment alone, and Infiniband is therefore
expected to become relatively inexpensive because a higher sales
volume can be realised. The characteristics of Infiniband are
attractive: there are product definitions for both copper and glass
fiber connections, switch and router properties are defined, and
multiple connections can be combined for higher bandwidth. The way
messages are broken up into packets and reassembled, as well as
routing, prioritising, and error handling, are all described in the
standard. This makes Infiniband independent of any particular
technology and, because of its completeness, a good basis on which to
implement a communication library (like MPI) on top of it.
Conceptually, Infiniband defines two types of connectors to the
system components: the Host Channel Adapters (HCAs) already
mentioned, and Target Channel Adapters (TCAs). The latter are
typically used to connect to I/O subsystems, while the HCAs concern
us more here, as these are the connectors used in interprocessor
communication. Infiniband defines a basic link speed of 2.5 Gb/s
(312.5 MB/s), but also 4× and 12× speeds of 1.25 GB/s and 3.75 GB/s,
respectively. In addition, HCAs and TCAs can have multiple
independent ports, which allows for higher reliability and speed.
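The quoted figures follow directly from the lane count; the small C program below is a sketch, for illustration only, that reproduces the arithmetic from the raw signalling rate of 2.5 Gb/s per lane.

/* Sketch: raw Infiniband link speeds for the defined link widths.
   One lane signals at 2.5 Gb/s; a link bundles 1, 4, or 12 lanes.
   These are raw signalling rates, as quoted in the text. */
#include <stdio.h>

int main(void)
{
    const double lane_gbit = 2.5;          /* raw rate per lane in Gb/s */
    const int widths[] = { 1, 4, 12 };     /* defined link widths */

    for (int i = 0; i < 3; i++) {
        double gbits  = widths[i] * lane_gbit;   /* aggregate Gb/s  */
        double mbytes = gbits * 1000.0 / 8.0;    /* convert to MB/s */
        printf("%2dx link: %5.1f Gb/s = %7.1f MB/s\n",
               widths[i], gbits, mbytes);
    }
    return 0;
}

For the 1×, 4×, and 12× widths this yields 312.5 MB/s, 1.25 GB/s, and 3.75 GB/s, matching the numbers above.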
Messages can be sent on the basis of Remote Direct Memory Access
(RDMA) from one HCA/TCA to another: an HCA/TCA is permitted to
read and write the memory of another HCA/TCA. This enables very fast
transfers once permission and a read/write location have been given.
A port together with its HCA/TCA provides a message with a 128-bit,
IPv6-compliant header that is used to direct it to its destination
via cut-through wormhole routing: in each switching stage the route
to the next stage is decoded and the message is sent on. Short
messages of 32 B can be embedded in control messages, which cuts down
on the negotiation time for such messages.
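Infiniband's RDMA model maps naturally onto the one-sided communication offered by MPI-2. The C sketch below illustrates the idea of exposing a memory window and writing directly into a peer's memory without a matching receive; it uses MPI one-sided calls rather than the Infiniband verbs interface itself, and assumes it is run with at least two processes.

/* Sketch of the RDMA idea using MPI-2 one-sided communication:
   each process exposes a buffer (the "permission and read/write
   location" mentioned above), after which rank 0 writes directly
   into rank 1's memory.  Run with at least two MPI processes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf[4] = { 0, 0, 0, 0 };
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Expose the local buffer as a window that remote processes
       may access. */
    MPI_Win_create(buf, sizeof(buf), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0) {
        int data[4] = { 1, 2, 3, 4 };
        /* Write into rank 1's window; no receive is posted there. */
        MPI_Put(data, 4, MPI_INT, 1, 0, 4, MPI_INT, win);
    }
    MPI_Win_fence(0, win);

    if (rank == 1)
        printf("rank 1 now holds %d %d %d %d\n",
               buf[0], buf[1], buf[2], buf[3]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}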
Infiniband switches for HPC are offered with 8--128 ports, always at
a speed of 1.25 GB/s. The switches can be configured in any desired
topology, but in practice a fat tree topology is almost always
preferred. How much of the raw speed can be realised obviously
depends on the quality of the MPI implementation put on top of the
Infiniband specifications. A Ping-Pong experiment on an
Infiniband-based cluster has shown a bandwidth of around 850 MB/s and
an MPI latency of < 7 µs for small messages. The in-switch latency is
typically about 200 ns.
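The kind of Ping-Pong test behind such figures can be written in a few lines of MPI. The sketch below bounces messages of increasing size between two processes and derives one-way latency and bandwidth from the round-trip time; the message sizes and repetition count are arbitrary choices for illustration.

/* Ping-Pong sketch: rank 0 sends a message to rank 1, which sends it
   straight back.  Half the averaged round-trip time gives the one-way
   latency; dividing the message size by it gives the bandwidth.
   Run with at least two MPI processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 1000

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (long n = 1; n <= (1L << 20); n *= 16) {
        char *buf = malloc(n);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        double t = (MPI_Wtime() - t0) / (2.0 * REPS);   /* one-way time */
        if (rank == 0)
            printf("%8ld bytes: %8.2f us  %8.2f MB/s\n",
                   n, t * 1e6, n / t / 1e6);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}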
Presently, the price per port is still somewhat higher than that of
Myrinet, the market leader. However, should Infiniband take off, the
price could drop significantly and make it a serious competitor for
Myrinet. Because of the recent availability of PCI Express, 12-wide
Infiniband connections would make technical sense. Whether they would
also be commercially viable remains to be seen.