Field-programmable gate arrays (FPGAs) are well known for their ability to accelerate artificial intelligence and machine learning applications. But how is AI implemented in FPGAs, and how do the available approaches differ? Let’s take a look at the engineering design space.
FPGAs and AI: Implementing AI
Artificial intelligence (AI) is a hot topic for both edge and cloud applications. AI can often make systems safer, more efficient, or more secure. Artificial intelligence has been around for a long time: the term was introduced in 1956 by John McCarthy at the first conference on artificial intelligence. Although significant research has been done over the years, only recently have AI systems been able to move from research and the lab into products and road maps.
Machine learning (ML) is one of the most widely used forms of AI in cloud and edge environments. Machine learning refers to the study of computer algorithms that enable computers to improve their performance through experience. For an image application, this is done by providing the ML network with a dataset of labelled images. From these, the machine learning algorithm learns to recognize elements and features of an image. When a new image is presented, the ML algorithm calculates how likely it is that the image contains any of the learned elements and features. ML algorithms of this kind can detect objects in images, process keywords in speech, and analyze sensor data for anomalies. They are used in vision-guided robotics and the autonomous operation of vehicles, and they also provide prognostics for industrial and safety-critical systems.
ML algorithms can be divided into two parts. The first is the training of the network against a training dataset. The second is the deployment of the trained network in the field. These two parts are known as training and inference. To train accurate models, you need a large labelled dataset. Training is often run on cloud-based GPUs to speed up the process. The trained network can then be deployed by design engineers across a variety of technologies, including MCUs, GPUs, and FPGAs. TensorFlow, PyTorch, and Caffe are among the most popular frameworks used to aid the training and deployment of AI/ML systems. These frameworks can be used for network definition, training, and inference.
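The split between training and inference can be illustrated with a deliberately tiny sketch. It uses a single perceptron and a four-sample labelled dataset (the logical AND function) as stand-ins for a real network and a real image dataset; the function names `train` and `infer` are hypothetical and do not come from any of the frameworks mentioned above.

```python
# Toy illustration of the training / inference split.
# Assumption: a single perceptron stands in for a real neural network,
# and the AND truth table stands in for a large labelled dataset.

def infer(weights, bias, sample):
    """Inference: apply the already-trained parameters to one input."""
    total = sum(w * x for w, x in zip(weights, sample)) + bias
    return 1 if total > 0 else 0

def train(dataset, epochs=20):
    """Training: adjust the parameters until the labels are reproduced."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for sample, label in dataset:
            error = label - infer(weights, bias, sample)
            # Perceptron update rule with a learning rate of 1.
            weights = [w + error * x for w, x in zip(weights, sample)]
            bias += error
    return weights, bias

# Labelled training data: (input, expected output).
dataset = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# Training happens once, typically offline on powerful hardware.
weights, bias = train(dataset)

# Inference happens in the field, using only the trained parameters.
predictions = [infer(weights, bias, sample) for sample, _ in dataset]
print(weights, bias, predictions)
```

Only `infer` and the trained parameters need to be deployed to the target device; `train` and the dataset stay on the development side.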
Many edge-based AI systems must be able to perform inference within a defined time frame. This is one of their key requirements. Autonomous vehicles, for example, must be able to quickly detect pedestrians, vehicles, and obstacles in order to avoid collisions. This requires a system that is both responsive to its inputs and deterministic.
Developers of edge-based solutions often target FPGAs and heterogeneous SoCs because of these responsiveness requirements. These devices provide programmable logic that is well suited to implementing machine learning networks, and its parallel nature allows an application that is both responsive and highly deterministic.
Two approaches are possible when it comes to machine learning inference using programmable logic. No matter which approach you choose, neural networks are typically trained using floating-point mathematics, whereas FPGA and heterogeneous SoC implementations typically use fixed-point arithmetic. Quantization is the process of converting the floating-point values to fixed-point values. This can lead to a slight decrease in inference accuracy. However, in most applications that accuracy can be recovered by additional training using the quantized weights and activations.
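As a rough illustration of what quantization does to individual values, the sketch below converts floating-point weights to signed 8-bit fixed-point codes and back again. The format (8 bits in total, 6 of them fractional) and the function names are assumptions chosen for the example; real tool flows select formats per layer and use more sophisticated schemes.

```python
# Minimal fixed-point quantization sketch.
# Assumption: signed 8-bit values with 6 fractional bits, round-to-nearest,
# saturating at the limits of the representable range.

def quantize(values, frac_bits=6, total_bits=8):
    """Convert floats to signed fixed-point integer codes."""
    scale = 1 << frac_bits                     # 2**frac_bits steps per unit
    lowest = -(1 << (total_bits - 1))          # -128 for 8 bits
    highest = (1 << (total_bits - 1)) - 1      # +127 for 8 bits
    return [max(lowest, min(highest, round(v * scale))) for v in values]

def dequantize(codes, frac_bits=6):
    """Convert fixed-point integer codes back to floats."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]

weights = [0.5, -0.25, 0.8, 1.5, -3.0]
codes = quantize(weights)
restored = dequantize(codes)

print(codes)     # [32, -16, 51, 96, -128]
print(restored)  # [0.5, -0.25, 0.796875, 1.5, -2.0]
```

Values that fall exactly on the fixed-point grid (0.5, -0.25, 1.5) survive unchanged, 0.8 picks up a small rounding error, and -3.0 saturates to -2.0 because it lies outside the representable range. Errors of this kind are the source of the accuracy loss that retraining with quantized values aims to recover.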
The first approach implements the neural network directly within the programmable logic. The trained weights are loaded into the network so that it can perform inference with them. This loading can be done either at run time or during compilation / synthesis of the design.
AMD-Xilinx FINN is an example of this approach. It is a framework for implementing quantized neural networks in FPGAs. These networks can use very low precision, for example binary weights and two-bit activations.
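To see why such low precision suits programmable logic, consider a dot product whose weights are restricted to +1 or -1. The sketch below is an illustration of that idea only, not FINN code; the function names and the one-bit weight encoding (1 for +1, 0 for -1) are assumptions made for the example.

```python
# Dot product with binary weights and two-bit activations.
# Assumption: each weight is stored as one bit, 1 meaning +1 and 0 meaning -1,
# and each activation is an unsigned two-bit value in the range 0..3.

def binarize(weights):
    """Reduce floating-point weights to a single sign bit each."""
    return [1 if w >= 0 else 0 for w in weights]

def binary_dot(activations, weight_bits):
    """Accumulate activations using only additions and subtractions."""
    acc = 0
    for activation, bit in zip(activations, weight_bits):
        # No multiplier is needed: the weight bit only selects add or subtract.
        acc += activation if bit else -activation
    return acc

activations = [3, 0, 1, 2]
weight_bits = binarize([0.7, -0.2, -1.1, 0.4])
result = binary_dot(activations, weight_bits)

print(weight_bits)  # [1, 0, 0, 1]
print(result)       # 4
```

Because every multiplication collapses into an add or a subtract on a two-bit value, each operation maps onto a small amount of logic, and many of them can run in parallel.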
An alternative to directly implementing the neural network in the FPGA logic is to use a highly specialized neural network accelerator. The accelerator is implemented within the programmable logic and is closely coupled, through high-bandwidth links, with the DDR memory and the dedicated processors in the heterogeneous SoC. In applications that use a neural network accelerator, the software application provides the network, weights, activations, and biases to the accelerator. This makes it easier to integrate ML inference into the overall application. The AMD-Xilinx Deep Learning Processor Unit (DPU) is an example of a neural network accelerator. Its tool flow can work with networks defined in PyTorch and TensorFlow, and can also perform quantization, retraining, and program generation, which further eases integration with the application.
Quantized neural networks implemented directly in the FPGA logic can use fewer resources, as there is no need for external DDR memory or SoC-based system support. A specialized neural network accelerator is the better choice for high performance and accuracy. It also makes integration easier and often results in a more efficient overall solution. This is why several vendors use this approach in their AI solutions.
Final Thoughts
As in many cases, the choice of solution depends on the end application. AI may dominate the marketing of a solution, but in the real world it is often only one component of it. Sensor interfacing, preprocessing, actuator drive, and the other components that make up the solution all have their own requirements and constraints.
Programmable logic allows developers to create deterministic and responsive AI/ML solutions for a variety of applications. These solutions integrate with industry-standard frameworks, allowing the developer to concentrate on the value-added activity.