PhD defence of Amir Ardakani – Towards Multiplier-less Implementation of Neural Networks
Abstract
Artificial Intelligence (AI) has become profoundly embedded in contemporary life, with its applications proliferating across a wide array of domains. Central to AI are neural networks, which have markedly enhanced the capabilities of AI in areas such as computer vision and natural language processing. As neural networks scale in both size and computational complexity, the intelligent devices tasked with executing these networks face growing demands for computational and energy resources to ensure efficient and reliable performance. Consequently, resource-limited embedded devices, such as smartphones, encounter significant challenges in deploying state-of-the-art AI models and frequently resort to cloud-based platforms, which require continuous internet connectivity.

This dissertation seeks to address these challenges by reducing the computational complexity of neural networks. Specifically, it targets the primary source of computational burden and a major contributor to energy consumption in neural networks: high-precision multipliers (e.g., 16-bit or 8-bit multipliers). We propose novel implementations of neural networks that either markedly reduce the bit-width of multipliers (to 4 bits or fewer) or replace them entirely with simpler logic operations (e.g., XNOR and shift operations).

In our first implementation, we present a novel approach for training multi-layer networks built from Finite State Machines (FSMs). In this approach, each FSM is interconnected with every FSM in both the preceding and subsequent layers. We demonstrate that the FSM-based network can synthesize complex multi-input functions, such as 2D Gabor filters, and perform non-sequential tasks, such as image classification on stochastic streams, without any multiplications, since the FSMs are implemented solely through look-up tables. Building on the FSMs' capability to handle binary streams, we also propose an FSM-based model designed for time-series data, applicable to temporal tasks such as character-level language modeling.

In our second implementation, we introduce an advanced stochastic computing (SC) representation termed the dynamic sign-magnitude (DSM) stream, designed to enhance the precision of SC-based multiplication on short sequences. The DSM representation allows conventional neural network multiplications to be replaced with more efficient bitwise XNOR operations (a simplified sketch of XNOR-based stream multiplication appears below). By employing DSM, we reduce the sequence length required by SC-based neural networks by a factor of 64 while maintaining accuracy comparable to existing methods.

In our third implementation, we propose a base-2 logarithmic quantization scheme that quantizes weights into discrete power-of-two values by leveraging information about the network's weight distribution. This allows us to replace computationally intensive high-precision multipliers with more efficient shift-add operations. Consequently, our quantized networks require approximately eight times less memory for their parameters than high-precision networks, without compromising classification accuracy.
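Because every weight in the base-2 logarithmic scheme above is a signed power of two, each multiplication reduces to a bit shift. The following is only a minimal sketch of that general idea, using a plain round-in-the-log-domain quantizer and hypothetical helper names; the dissertation's actual scheme additionally exploits the weight distribution, which is not modeled here.

```python
import numpy as np

def quantize_pow2(w, n_bits=4):
    """Map each weight to a signed power of two by rounding its base-2 exponent."""
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.round(np.log2(np.clip(mag, 1e-12, None)))
    # Keep the exponent in a small range so it fits in n_bits.
    exp = np.clip(exp, -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1).astype(int)
    q = sign * np.power(2.0, exp)
    q[mag == 0] = 0.0
    return q, exp

def shift_mul(x_int, exp):
    """Multiply an integer activation by 2**exp with a shift instead of a multiplier."""
    return x_int << exp if exp >= 0 else x_int >> -exp

w = np.array([0.23, -0.61, 0.05])
q, e = quantize_pow2(w)
print(q)                  # [ 0.25   -0.5     0.0625]
print(shift_mul(96, -2))  # 96 * 2**-2 = 24
```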
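The XNOR substitution in the second (stochastic computing) contribution rests on a standard SC fact: when two values in [-1, 1] are encoded as independent bipolar bit streams, a bitwise XNOR of the streams encodes their product. The sketch below shows only this conventional bipolar encoding, with hypothetical function names; the dynamic sign-magnitude stream proposed in the dissertation refines this representation and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bipolar_stream(x, length):
    """Encode x in [-1, 1] as a bit stream with P(bit = 1) = (x + 1) / 2."""
    return (rng.random(length) < (x + 1) / 2).astype(np.uint8)

def from_bipolar_stream(bits):
    """Decode a bipolar bit stream back to a value in [-1, 1]."""
    return 2.0 * bits.mean() - 1.0

def sc_multiply(sa, sb):
    """Bitwise XNOR of two independent bipolar streams encodes the product of their values."""
    return 1 - (sa ^ sb)

a, b, length = 0.5, -0.75, 1024
prod = from_bipolar_stream(sc_multiply(to_bipolar_stream(a, length),
                                       to_bipolar_stream(b, length)))
print(prod)  # close to a * b = -0.375, up to stochastic noise
```

Short streams make this estimate noisy, which is precisely the regime that the DSM representation is designed to improve.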
In our latest implementation, we introduce a novel training framework that utilizes quantization techniques to facilitate the conversion between quantized networks and spiking neural networks (SNNs). SNNs are inherently devoid of multiplications, relying instead on addition and subtraction. This framework offers an alternative approach for training SNNs: we modify the SNN algorithm and mathematically demonstrate that, after T time steps, the modified SNN approximates the behavior of a quantized network with T quantization intervals. Any SNN can therefore be replaced with its corresponding quantized network for training, and the parameters of the trained quantized network can then be transferred to the SNN without additional steps.
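The stated correspondence (an SNN run for T time steps behaving like a network quantized to T intervals) can be illustrated with a toy example: an integrate-and-fire neuron with reset-by-subtraction, driven by a constant input, fires at a rate equal to a clipped ReLU quantized to T levels. This is only an illustrative sketch under simplifying assumptions (constant drive, floor-style quantization, hypothetical function names), not the modified SNN algorithm or the proof from the dissertation.

```python
import numpy as np

def if_spike_count(drive, T, v_th=1.0):
    """Integrate-and-fire neuron with reset-by-subtraction: count spikes over T steps."""
    v, spikes = 0.0, 0
    for _ in range(T):
        v += drive            # accumulate the (constant) input current
        if v >= v_th:
            v -= v_th         # reset by subtraction
            spikes += 1
    return spikes

def quantized_relu(x, T):
    """ReLU clipped to [0, 1] and quantized to T levels."""
    return np.floor(np.clip(x, 0.0, 1.0) * T) / T

T = 8
for x in [0.0, 0.13, 0.34, 0.72, 1.0]:
    rate = if_spike_count(x, T) / T       # firing rate of the spiking neuron
    print(x, rate, quantized_relu(x, T))  # the two columns coincide
```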