The Embedded Blog: AI chip reaches 1 PetaOp/s

Friday, November 15, 2019

AI chip reaches 1 PetaOp/s

By Nick Flaherty www.flaherty.co.uk

US startup Groq has developed a single chip for machine learning that is capable of 1 PetaOp/s performance.

The architecture is also capable of up to 250 trillion floating-point operations per second (250 TFLOPS). It has been used to create the Tensor Streaming Processor, shown on this PCIe board, which is currently being tested by customers.
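To put the two headline figures side by side, here is a minimal sketch of the unit arithmetic. The only numbers taken from the announcement are 1 PetaOp/s and 250 TFLOPS; the helper `peak_inferences_per_second` and the per-inference operation count in its usage are hypothetical, and the result is a theoretical ceiling rather than a measured throughput.

```python
PETA = 10**15
TERA = 10**12

# Figures quoted in the announcement.
peak_ops = 1 * PETA        # 1 PetaOp/s
peak_flops = 250 * TERA    # 250 trillion floating-point operations per second

# Ratio between the two quoted peak rates.
ratio = peak_ops / peak_flops


def peak_inferences_per_second(ops_per_inference, peak_ops_per_second=peak_ops):
    """Theoretical upper bound, assuming the chip sustains its peak rate.

    ops_per_inference is a hypothetical workload size, not a Groq figure.
    """
    return peak_ops_per_second / ops_per_inference


if __name__ == "__main__":
    print(f"peak ops / peak FLOPS = {ratio:.0f}x")
    # Hypothetical model needing one billion operations per inference.
    print(f"ceiling: {peak_inferences_per_second(10**9):,.0f} inferences/s")
```

The quoted op rate is four times the floating-point rate. Real throughput depends on memory bandwidth, utilisation and the model, none of which the announcement details.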

“We are excited for the industry and our customers,” said Jonathan Ross, Groq’s co-founder and CEO. “Top GPU companies have been telling customers that they’d hoped to be able to deliver one PetaOp/s performance within the next few years; Groq is announcing it today, and in doing so setting a new performance standard. The Groq architecture is many multiples faster than anything else available for inference, in terms of both low latency and inferences per second. Our customer interactions confirm that. We had first silicon back, first-day power-on, programs running in the first week, sampled to partners and customers in under six weeks, with A0 silicon going into production.”

The architecture provides both compute flexibility and massive parallelism without the synchronization overhead of traditional GPU and CPU architectures. Groq’s architecture can support both traditional and new machine learning models, and is currently in operation on customer sites in both x86 and non-x86 systems.

Groq’s new, simpler processing architecture is designed specifically for the performance requirements of computer vision, machine learning and other AI-related workloads. Execution planning happens in software, freeing up valuable silicon real estate otherwise dedicated to dynamic instruction execution. This tight control makes processing deterministic, which is especially valuable for applications where safety and accuracy are paramount.
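The idea of planning execution in software can be illustrated with a toy static scheduler. This is a sketch of the general technique only, not Groq's compiler: the operation names, latencies and the `plan` and `total_cycles` helpers are all hypothetical, and it assumes every operation has a fixed latency and that there are enough functional units to run independent operations in parallel.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical operation graph: name -> (latency in cycles, dependencies).
OPS = {
    "load_weights": (2, []),
    "load_input": (1, []),
    "matmul": (4, ["load_weights", "load_input"]),
    "relu": (1, ["matmul"]),
    "store": (1, ["relu"]),
}


def plan(ops):
    """Assign each op a fixed start cycle before anything runs.

    Each op starts as soon as all of its dependencies have finished.
    """
    graph = {name: deps for name, (_, deps) in ops.items()}
    start = {}
    for name in TopologicalSorter(graph).static_order():
        _, deps = ops[name]
        start[name] = max((start[d] + ops[d][0] for d in deps), default=0)
    return start


def total_cycles(ops, start):
    """Cycle at which the last op finishes, known before execution."""
    return max(start[name] + ops[name][0] for name in ops)


schedule = plan(OPS)

if __name__ == "__main__":
    for name, cycle in sorted(schedule.items(), key=lambda item: item[1]):
        print(f"cycle {cycle}: {name}")
    print("total cycles:", total_cycles(OPS, schedule))
```

Because every start cycle is fixed ahead of time, the total latency is the same on every run, which is the sense in which a software-planned design can be deterministic. Hardware that schedules instructions dynamically decides this ordering at run time instead.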

Compared to complex traditional architectures based on CPUs, GPUs and FPGAs, Groq’s chip also streamlines qualification and deployment, enabling customers to simply and quickly implement scalable, high performance-per-watt systems.

“Groq’s solution is ideal for deep learning inference processing for a wide range of applications,” said Dennis Abts, Chief Architect at Groq, “but even beyond that massive opportunity, the Groq solution is designed for a broad class of workloads. Its performance, coupled with its simplicity, makes it an ideal platform for any high-performance, data- or compute-intensive workload.”

This is similar to the architecture launched by Blaize earlier this week, which also focuses on software planning and a simpler chip. See AI CHIP STARTUP TAPS TWO UK DESIGN TEAMS

www.groq.com
