Dell has launched the first server to use intelligent processing unit (IPU) technology from Graphcore, fitting 16 high-performance machine learning processors into a 4U server and improving performance per watt.
The DSS8440 uses interconnect technology developed by the Bristol, UK-based company to link the 16 processors, which can run over 100,000 completely independent programs, all working in parallel on a machine intelligence knowledge model for data centres.
The new version provides 1.6PetaFLOPs of performance in a power envelope of 2.5kW in the 4U rack. This is 60% more performance for 20% less power than previous designs based on graphics processing units (GPUs). This can be used to provide higher peak performance for training and inference of AI models using Graphcore's Poplar development environment for less power, or to provide 1PFLOP of performance for half the power consumption.
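As a back-of-envelope check, the two quoted figures (60% more performance, 20% less power) combine into a single performance-per-watt gain. This sketch is illustrative only, normalising the previous GPU-based design to 1.0:

```python
# Back-of-envelope check of the performance-per-watt claim.
# Baseline GPU design normalised to 1.0 performance at 1.0 power.
gpu_perf, gpu_power = 1.0, 1.0

ipu_perf = gpu_perf * 1.60    # 60% more performance
ipu_power = gpu_power * 0.80  # 20% less power

improvement = (ipu_perf / ipu_power) / (gpu_perf / gpu_power)
print(f"Performance per watt improvement: {improvement:.2f}x")  # 2.00x
```

The same 2x factor explains the alternative framing in the article: matching 1PFLOP of GPU-class performance at roughly half the power.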
Each of the Graphcore IPUs has 1,216 machine learning processor cores, each running 6 processor threads, to provide 7,296 threads on a chip. Each IPU-Core is tightly coupled to 256kB of very fast local In-Processor-Memory. Overall the IPU has roughly 300MB of on-chip memory with a memory bandwidth of 45Tbit/s; there is no off-chip memory, which avoids memory bandwidth bottlenecks, and the chip delivers 250TFLOPS. Every core connects directly to the IPU-Exchange, a crossbar in the middle of the die which can transfer 62.5Tbit/s of data and connects to the host processor through a 16-lane PCI Express interface.
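The per-chip figures above can be cross-checked with simple arithmetic, using only the numbers quoted in this article:

```python
# Cross-check the quoted per-IPU figures.
cores = 1216            # machine learning processor cores per IPU
threads_per_core = 6    # hardware threads per core
local_mem_kb = 256      # In-Processor-Memory per core, in kB

total_threads = cores * threads_per_core
total_mem_mb = cores * local_mem_kb / 1024

print(total_threads)        # 7296 threads, as quoted
print(round(total_mem_mb))  # 304 MB, i.e. "roughly 300MB"
```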
Two IPU chips are combined in the Graphcore C2 dual-slot PCIe card. This provides 80 IPU-Links, with each link running at 32Gbps, for a total of about 2.5Tbit/s (roughly 320GByte/s) of chip-to-chip bandwidth. On the C2 card, 192GB/s of IPU-Link bandwidth is used to connect the two IPUs on the card itself, while 256GB/s of IPU-Link bandwidth is used to connect C2 cards together. The card consumes 315W of power and is passively cooled.
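Converting the quoted link figures shows how the aggregate number is reached (a sketch using the per-link rate and link count as stated above; note that 80 links at 32Gbps works out to 320GByte/s, not more):

```python
# Aggregate IPU-Link bandwidth from the quoted per-link figures.
links = 80
gbps_per_link = 32            # per-link rate in Gbit/s

total_gbps = links * gbps_per_link   # 2560 Gbit/s, i.e. ~2.5 Tbit/s
total_gbytes = total_gbps / 8        # 320 GByte/s

intra_card_gbs = 192   # GB/s between the two IPUs on one C2 card
inter_card_gbs = 256   # GB/s for connecting C2 cards together

print(total_gbps, total_gbytes)  # 2560 320.0
```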