Cerebras Systems has developed a waferscale accelerator for AI applications using a novel power integrity monitoring infrastructure.
“Our mission is to transform the landscape of compute by accelerating performance for AI by orders of magnitude over today,” said Dhiraj Mallick, VP of engineering at Cerebras Systems. “Both the massive size and shape of the problem make it difficult for today’s infrastructure and preclude the most interesting problems from being tackled. Computer requirements for AI have increased 300,000-fold in the last 8 years; this is a doubling every 3.4 months, compared to 24 months for Moore’s Law. So we need new computer solutions,” he said.
The Cerebras approach is to use a whole wafer for AI processing. The chip measures 46,000 sq mm and has 400,000 cores, each purpose-built for deep learning, along with 18Gbytes of SRAM. The cores are connected directly in the silicon by a 2D mesh that provides a 100Pbit/s fabric.
The power system is vital for a chip of this size, and Cerebras used nearly 1000 instances of a power macro from Analog Bits.
“One of the challenges to overcome is power integrity,” said Mallick. “This includes the ability to monitor power events and take corrective actions at very high speeds. We have hundreds of thousands of cores where dynamic current surges can cause catastrophic failures. So we distributed 840 analog glitch detectors across the waferscale chip to provide real time health data.”
The detectors can pick up anomalies with significantly higher bandwidth than other approaches and so catch short-duration events. This comes from a sensitivity of 5pV for monitoring the power supply in real time. “This provides a wealth of data to optimise instantaneous current spikes,” he said.
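As a rough software analogy for why bandwidth matters here, the sketch below simulates a supply rail with one brief droop and checks it against a reference at two different sampling periods. It is not the Analog Bits circuit, which is an analog macro; the rail voltage, droop size, timing, threshold and function names are all hypothetical values chosen for illustration.

```python
def supply_voltage(t_ns, nominal=0.80, droop=0.10, start_ns=503.0, width_ns=2.0):
    """Hypothetical supply rail: nominal voltage with a single 2 ns droop."""
    if start_ns <= t_ns < start_ns + width_ns:
        return nominal - droop
    return nominal


def detect_glitches(sample_period_ns, duration_ns=1000.0, reference=0.80, threshold=0.05):
    """Return the sample times at which the rail deviates from the
    reference by more than the threshold (all values are assumptions)."""
    events = []
    n_samples = int(duration_ns / sample_period_ns)
    for i in range(n_samples):
        t = i * sample_period_ns
        if abs(supply_voltage(t) - reference) > threshold:
            events.append(t)
    return events


# A slow monitor samples every 10 ns and steps straight over the 2 ns droop.
slow = detect_glitches(sample_period_ns=10.0)
# A fast monitor samples every 0.5 ns and lands inside the droop several times.
fast = detect_glitches(sample_period_ns=0.5)

print("slow monitor events:", slow)
print("fast monitor events:", fast)
```

With these assumed numbers the slow monitor reports nothing while the fast one flags the droop, which is the sense in which a higher-bandwidth detector catches short-duration events.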
“Our power supply glitch detector has an integrated voltage reference and the macro is easy to integrate with no additional components or special power requirements,” said Mahesh Tirupattur, executive vice president at Analog Bits.
The asynchronous macro is