The Embedded Blog: Achronix turns to network-on-chip for AI accelerators in 7nm FPGA

Tuesday, May 21, 2019

By Nick Flaherty www.flaherty.co.uk

Achronix Semiconductor has launched its latest FPGA family aimed at artificial intelligence, machine learning and high-bandwidth data acceleration applications.

The Achronix Speedster7t family is based on a new architecture optimised for high-bandwidth workloads, built around a two-dimensional network-on-chip (NoC) and a high-density array of new machine learning processor (MLP) blocks optimised for AI/ML workloads. Achronix says this blend of FPGA programmability with ASIC-style routing structures and compute engines boosts performance.

“The growth potential for AI/ML is astounding, and the use cases are rapidly evolving, and we are offering a new solution to address the varying requirements of high performance, flexibility and time to market,” said Robert Blake, president and CEO of Achronix Semiconductor. “Our Speedster7t family breaks new ground as the first solution to deliver FPGA adaptability with ASIC-like performance. We believe our new ‘FPGA+’ class of technology truly pushes the boundaries in the high-performance market.”

Manufactured on TSMC’s 7nm FinFET process, Speedster7t devices are designed to accept massive amounts of data from multiple high-speed sources, distribute that data to programmable on-chip algorithmic and processing units, and then deliver those results with the lowest possible latency. They include high-bandwidth GDDR6 interfaces, 400G Ethernet ports, and PCI Express Gen5 — all interconnected to deliver ASIC-level bandwidth while retaining the full programmability of FPGAs.

“The new Achronix Speedster7t FPGA family is a prime example of the explosion of innovative silicon architectures created to handle massive amounts of data that are aimed directly at AI applications”, said Rich Wawrzyniak, principal market analyst for ASIC and SoC at Semico Research Corp. “Combining math functions, memory and programmability into their machine learning processor, combined with the cross chip, two-dimensional NOC structure, is a brilliant method of eliminating bottlenecks and ensuring the free flow of data throughout the device. In AI/ML applications, memory bandwidth is everything and the Achronix Speedster7t delivers impressive performance metrics in this area.” Semico’s forecast shows the market size for FPGAs in AI applications will grow by 3x in the next four years to over $4.8B.

The new machine learning processors (MLPs) are highly configurable, compute-intensive blocks, each containing a massively parallel array of programmable compute elements. They support integer formats from 4 to 24 bits and efficient floating-point modes, including direct support for TensorFlow’s 16-bit format (bfloat16) as well as a block floating-point format that doubles the number of compute engines per MLP.
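Block floating point is a general technique rather than something specific to Achronix: a group of values shares one exponent, and each value keeps only a small signed integer mantissa, so a dot product reduces to integer multiply-accumulates plus a single final scaling step. That is why such a format can pack more compute engines into the same hardware. The Python sketch below illustrates the idea under stated assumptions; the 8-bit mantissa width and the function names `bfp_quantize`, `bfp_dequantize` and `bfp_dot` are hypothetical, and the announcement does not detail the MLP's actual format.

```python
import math
from typing import List, Tuple

def bfp_quantize(block: List[float], mantissa_bits: int = 8) -> Tuple[int, List[int]]:
    """Encode a block as (shared_shift, signed integer mantissas).

    Each value is approximately mantissa * 2**shared_shift. mantissa_bits
    includes the sign bit; 8 is an illustrative choice, not the MLP's format.
    """
    max_abs = max((abs(x) for x in block), default=0.0)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    _, exp = math.frexp(max_abs)          # max_abs = m * 2**exp, 0.5 <= m < 1
    shift = exp - (mantissa_bits - 1)     # scale so the largest value fits the mantissa
    limit = 2 ** (mantissa_bits - 1) - 1  # e.g. 127 for 8-bit mantissas
    mantissas = [max(-limit, min(limit, round(x / 2.0 ** shift))) for x in block]
    return shift, mantissas

def bfp_dequantize(shift: int, mantissas: List[int]) -> List[float]:
    """Recover approximate floats from a block-floating-point encoding."""
    return [m * 2.0 ** shift for m in mantissas]

def bfp_dot(a: List[float], b: List[float], mantissa_bits: int = 8) -> float:
    """Dot product using only integer multiply-accumulates plus one final scale."""
    sa, ma = bfp_quantize(a, mantissa_bits)
    sb, mb = bfp_quantize(b, mantissa_bits)
    acc = sum(x * y for x, y in zip(ma, mb))  # integer MACs, as in fixed-point hardware
    return acc * 2.0 ** (sa + sb)

if __name__ == "__main__":
    print(bfp_dot([1.0, 2.0, 3.0], [0.5, 0.25, 1.0]))  # prints 4.0
```

The design choice worth noting is that the exponent is computed once per block, so the inner loop never touches floating-point hardware; the cost is reduced precision for small values that share a block with a large one.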

The MLPs are tightly coupled with embedded memory blocks, eliminating the delays traditionally associated with FPGA routing and ensuring that data is delivered to the MLPs at their maximum clock rate of 750 MHz. Achronix says this combination of high-density compute and high-performance data delivery yields a processor fabric with the highest usable FPGA-based tera-operations per second (TOPS).

The family includes high-speed GDDR6 memory controllers, each capable of supporting 512 Gbps of bandwidth. With up to eight GDDR6 controllers, a Speedster7t device can support an aggregate GDDR6 bandwidth of 4 Tbps, delivering memory bandwidth equivalent to an HBM-based FPGA at a fraction of the cost.
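The bandwidth figures can be checked with simple arithmetic. The sketch below assumes, hypothetically, that each controller drives one x32 GDDR6 interface at 16 Gbps per pin, which is one way to arrive at 512 Gbps; the announcement gives only the per-controller total, so treat the pin-level breakdown as an assumption.

```python
# Per-controller total (512 Gbps) and controller count (8) come from the article.
# The pin-level breakdown is an ASSUMPTION: one x32 GDDR6 interface at 16 Gbps/pin.
PINS_PER_CONTROLLER = 32
GBPS_PER_PIN = 16
MAX_CONTROLLERS = 8

def controller_gbps(pins: int = PINS_PER_CONTROLLER, gbps_per_pin: int = GBPS_PER_PIN) -> int:
    """Raw bandwidth of one controller in Gbps: data pins x per-pin rate."""
    return pins * gbps_per_pin

def aggregate_gbps(controllers: int = MAX_CONTROLLERS) -> int:
    """Aggregate raw bandwidth across all controllers in Gbps."""
    return controllers * controller_gbps()

if __name__ == "__main__":
    total = aggregate_gbps()
    print(f"per controller: {controller_gbps()} Gbps")
    print(f"aggregate: {total} Gbps = {total / 1000:.3f} Tbps = {total // 8} GB/s")
```

The aggregate works out to 4,096 Gbps, consistent with the 4 Tbps figure quoted; dividing by 8 gives 512 GB/s, the unit in which HBM bandwidth is usually quoted.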

Alongside this memory bandwidth, Speedster7t devices include high-performance interface ports to support extremely high-bandwidth data streams: up to 72 SerDes lanes, which Achronix describes as the industry’s highest performance, operating from 1 to 112 Gbps; hard 400G Ethernet MACs with forward error correction (FEC), supporting 4x 100G and 8x 50G configurations; and hard PCI Express Gen5 controllers with 8 or 16 lanes per controller.

The 2D NoC spans horizontally and vertically over the FPGA fabric, connecting to all of the FPGA’s high-speed data and memory interfaces. Each row or column in the NoC is implemented as two 256-bit, unidirectional industry-standard AXI channels operating at 2 GHz, delivering 512 Gbps of data traffic in each direction simultaneously.
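The 512 Gbps per-direction figure follows directly from the channel width and clock: each cycle moves 256 bits, and there are two billion cycles per second. The sketch below reproduces that arithmetic as peak raw figures only; it ignores AXI protocol overhead, and the function names are illustrative.

```python
AXI_WIDTH_BITS = 256   # per unidirectional channel, as stated in the article
NOC_CLOCK_GHZ = 2.0    # NoC clock, as stated in the article
CHANNELS_PER_LINK = 2  # one channel in each direction per row or column

def channel_gbps(width_bits: int = AXI_WIDTH_BITS, clock_ghz: float = NOC_CLOCK_GHZ) -> float:
    """Peak bandwidth of one channel: bits per cycle x billions of cycles per second."""
    return width_bits * clock_ghz

def link_gbps() -> float:
    """Peak combined bandwidth of one NoC row or column, both directions together."""
    return CHANNELS_PER_LINK * channel_gbps()

if __name__ == "__main__":
    print(channel_gbps(), link_gbps())  # prints 512.0 1024.0
```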

Most importantly, the NoC eliminates the congestion and performance bottlenecks that occur in traditional FPGAs that use the programmable routing and logic lookup table (LUT) resources to move data streams throughout the FPGA. This high-performance network not only increases the overall bandwidth capacity of Speedster7t FPGAs, but also increases the effective LUT capacity while reducing power.

The FPGAs include bitstream security features with multiple layers of defence for protecting bitstream secrecy and integrity. Keys are encrypted using a tamper-resistant physically unclonable function (PUF), and bitstreams are encrypted and authenticated with 256-bit AES-GCM. To defend against side-channel attacks, bitstreams are segmented, with a separately derived key used for each segment, and the decryption hardware employs differential power analysis (DPA) countermeasures. A 2048-bit RSA public key authentication protocol is used to activate the decryption and authentication hardware.
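The announcement does not say how the per-segment keys are derived. As a hedged illustration of the general idea, the Python sketch below uses HKDF-SHA256 (RFC 5869, built here from the standard library's `hmac` and `hashlib`) to derive an independent 256-bit key for each bitstream segment from one root key. The usual rationale is that each key then processes only a limited amount of data, which limits how many power traces an attacker can collect against any one key. The root key, segment size, label format and function names are hypothetical, and the AES-GCM encryption of each segment is not shown because Python's standard library has no AES implementation.

```python
import hashlib
import hmac
from typing import List, Tuple

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF cannot produce more than 255 blocks of output")
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is supplied
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block = b"", b""
    for counter in range(1, 256):  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        if len(okm) >= length:
            break
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
    return okm[:length]

def derive_segment_keys(root_key: bytes, bitstream: bytes, segment_size: int,
                        salt: bytes = b"") -> List[Tuple[bytes, bytes]]:
    """Split a bitstream into fixed-size segments and pair each with its own 256-bit key.

    The label b"segment" + big-endian index is a hypothetical domain-separation scheme.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    pairs = []
    for index, start in enumerate(range(0, len(bitstream), segment_size)):
        segment = bitstream[start:start + segment_size]
        key = hkdf_sha256(root_key, salt, b"segment" + index.to_bytes(4, "big"))
        pairs.append((segment, key))
    return pairs
```

In a fuller sketch, each (segment, key) pair would be passed to an AES-GCM implementation, for example the `AESGCM` class in the third-party `cryptography` package, with a unique nonce per segment.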

The Speedster7t FPGA devices range from 363K to 2.6M 6-input LUTs. The first devices and development boards for evaluation will be available in Q4 2019.

www.achronix.com
