Machine type: RISC-based ccNUMA system.
Models: HP 9000 SuperDome.
Operating system: HP-UX (HP's usual Unix flavour)
Connection structure: Crossbar
Compilers: Fortran 77, Fortran 90, Parallel Fortran, HPF, C, C++
Vendors information Web page: http://www.hp.com/products1/servers/scalableservers/superdome/
Year of introduction: 2000, 2004 with PA-RISC 8800.
System parameters:
Model: HP 9000 SuperDome
Clock cycle: 1 GHz
Theor. peak performance
  Per proc. (64-bits): 8 Gflop/s
  Maximal (64-bits): 512 Gflop/s
Main memory
  Memory/node: ≤ 64 GB
  Memory/maximal: 1 TB
No. of processors: ≤ 64
Communication bandwidth
  aggregate (global): 64 GB/s
  (cell to backplane): 8 GB/s
  (within cell, see below): 16 GB/s
Remarks:
The Superdome replaced the Exemplar V2600 system, which has been
withdrawn by HP (see section Systems
Disappeared from the List). The connection structure of the
Superdome is significantly improved over that of the former V2600.
The Superdome has a 2-level crossbar: one level within a 4-processor
cell and a second level that connects the cells through the crossbar
backplane. Every cell connects to the backplane at a speed of 8 GB/s
and the global aggregate bandwidth for a fully configured system is
therefore 64 GB/s.
As noted above, the basic building block of the Superdome is the
4-processor cell. All data traffic within a cell is controlled by
the Cell Controller, a 10-port ASIC. It connects to the four local
memory subsystems at 16 GB/s, to the backplane crossbar at 8 GB/s,
and to two ports that each serve two processors at 6.4 GB/s per port.
As each processor houses two CPU cores, the available bandwidth per
CPU core is 1.6 GB/s. As in the SGI Altix
systems, cache coherency in the Superdome is secured by
using directory memory. The NUMA factor for a full 64-processor
system is, by HP's account, very modest: only 1.8.
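The per-core bandwidth figure follows from dividing the port bandwidth over the cores that share a port. A minimal sketch of that arithmetic, using only the figures quoted above (the variable and function names are illustrative, not HP terminology):

```python
# Figures quoted in the text; names are hypothetical and chosen for readability.
PORT_BANDWIDTH_GBS = 6.4    # bandwidth of one processor port on the Cell Controller
PROCESSORS_PER_PORT = 2     # each port serves two processors
CORES_PER_PROCESSOR = 2     # each PA-RISC 8800 houses two CPU cores


def bandwidth_per_core(port_gbs, processors_per_port, cores_per_processor):
    """Share of one port's bandwidth available to a single CPU core, in GB/s."""
    return port_gbs / (processors_per_port * cores_per_processor)


per_core_gbs = bandwidth_per_core(
    PORT_BANDWIDTH_GBS, PROCESSORS_PER_PORT, CORES_PER_PROCESSOR
)
print(f"{per_core_gbs:.1f} GB/s per CPU core")
```

This assumes the port bandwidth is shared evenly by all four cores behind it, which is the simplification the quoted figure implies.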
The PA-RISC 8800
processors run at a clock frequency of 1 GHz. Each processor
contains two processor cores, which in turn each contain 2 floating-point
units that are able to execute a combined floating multiply-add
instruction. In favourable circumstances 8 flops/cycle can therefore be
achieved, giving a Theoretical Peak Performance of 8 Gflop/s per
processor. This amounts to a peak speed of 512
Gflop/s for a full configuration. Because a shared-memory
parallel model is supported over the entire system, OpenMP can be
employed on the total of 64 processors (128 CPU cores). The
Superdome can be partitioned into different complexes that run with
different processors, e.g., the Itanium 2. In that case the same
backplane can be used but the cells are of a different type. In
theory one can therefore have a mixed HP 9000 Superdome and Integrity
Superdome system (see below).
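The peak performance figures can be reproduced from the processor description. The sketch below only restates the arithmetic in the text; it counts a multiply-add as 2 flops, and the names are illustrative:

```python
# Figures quoted in the text; names are hypothetical.
CLOCK_GHZ = 1.0            # PA-RISC 8800 clock frequency
CORES_PER_PROCESSOR = 2    # dual-core processor
FPUS_PER_CORE = 2          # floating-point units per core
FLOPS_PER_FMA = 2          # a multiply-add counts as two floating-point operations
MAX_PROCESSORS = 64        # full configuration


def peak_gflops_per_processor(clock_ghz, cores, fpus_per_core, flops_per_fma):
    """Theoretical peak in Gflop/s: clock rate times flops completed per cycle."""
    flops_per_cycle = cores * fpus_per_core * flops_per_fma
    return clock_ghz * flops_per_cycle


per_processor = peak_gflops_per_processor(
    CLOCK_GHZ, CORES_PER_PROCESSOR, FPUS_PER_CORE, FLOPS_PER_FMA
)
system_peak = per_processor * MAX_PROCESSORS

print(f"Per processor: {per_processor:.0f} Gflop/s")
print(f"Full system:   {system_peak:.0f} Gflop/s")
```

With the values above this gives 8 Gflop/s per processor and 512 Gflop/s for 64 processors, matching the system parameters.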
Measured Performances: For the new model with the
dual-core PA-RISC 8800 processors no performance results (in the HPC
realm) are known to the author; the system has been on the market since
April 2004. In [42]
a speed of 756 Gflop/s is reported for solving a full linear system
of unspecified size. This result was achieved on an older 8-way
coupled system with a total of 512 PA-RISC 8700+ processors at 875
MHz. As the Theoretical Peak Performance of such a cluster is 1792
Gflop/s, the efficiency is 42%.
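The efficiency quoted here is the measured speed divided by the theoretical peak. A short sketch of the calculation follows. The value of 4 flops/cycle per PA-RISC 8700+ processor is an assumption: it is not stated in the text, but it is the value that reproduces the quoted peak of 1792 Gflop/s.

```python
# Measured speed, processor count and clock are quoted in the text.
MEASURED_GFLOPS = 756.0
PROCESSORS = 512
CLOCK_GHZ = 0.875          # 875 MHz

# ASSUMPTION: 4 flops/cycle per PA-RISC 8700+ processor. Not given in the
# text; inferred because it yields the quoted peak of 1792 Gflop/s.
FLOPS_PER_CYCLE = 4

peak_gflops = PROCESSORS * CLOCK_GHZ * FLOPS_PER_CYCLE
efficiency = MEASURED_GFLOPS / peak_gflops

print(f"Theoretical peak: {peak_gflops:.0f} Gflop/s")
print(f"Efficiency:       {efficiency:.0%}")
```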
Aad van der Steen Wed Oct 13 11:33:00 CEST 2004