
tinyML Talks: Twofold Sparsity: Joint Bit- and Network-level Sparse Deep Neural Network for…



Twofold Sparsity: Joint Bit- and Network-level Sparse Deep Neural Network for Energy-efficient RRAM Based CIM
Foroozan Karimzadeh
Postdoctoral Fellow
Georgia Institute of Technology

The rising popularity of intelligent mobile devices and the computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. AI-powered edge devices require compressed deep learning algorithms and energy-efficient hardware. The traditional Von Neumann architecture suffers from the latency and power dissipation caused by intra-chip data communication. Compute-in-memory (CIM) architectures have therefore emerged to overcome these challenges by performing the computations inside the memory, reducing the need for data movement between memory and processing units. However, current deep learning compression techniques are not designed to take advantage of CIM architectures. In this work, we propose Twofold Sparsity, a joint bit- and network-level sparsity method that highly sparsifies deep learning models while exploiting the CIM architecture for energy-efficient computation. Twofold Sparsity sparsifies the network during training by adding two regularizations: one that sparsifies the weights using a Linear Feedback Shift Register (LFSR) mask, and another that sparsifies the values at the bit level by driving individual bits to zero. During inference, the same LFSR is used to select the correct sparsified weights, and a 2-bit/cell RRAM-based CIM array performs the computation. Twofold Sparsity achieves 2.2x to 14x energy efficiency at overall sparsity rates from 10% to 90% compared to the original 8-bit network, ultimately enabling powerful deep learning models to run on power-constrained edge devices.
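
As a rough illustration of the training recipe described above, the sketch below shows how an LFSR-generated pruning mask and a two-term sparsity regularizer could be attached to an ordinary PyTorch training loss. This is not the authors' implementation: the LFSR width, seed, tap positions, the regularization weights, and the use of an L1 penalty as a stand-in for the bit-level term are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code): an 8-bit Fibonacci LFSR
# generates a reproducible keep/prune mask, and a two-term regularizer adds
# network-level and (proxy) bit-level sparsity pressure to the training loss.
import torch
import torch.nn as nn


def lfsr_mask(num_weights: int, seed: int = 0xAC, taps=(7, 5, 4, 3)) -> torch.Tensor:
    """Pseudo-random 0/1 mask from an 8-bit Fibonacci LFSR.

    The sequence depends only on the seed and taps, so the identical mask can be
    regenerated at inference time to select the kept (non-pruned) weights.
    """
    state, bits = seed & 0xFF, []
    for _ in range(num_weights):
        bits.append(state & 1)                      # output bit = current LSB
        fb = 0
        for t in taps:                              # XOR tap bits for feedback
            fb ^= (state >> t) & 1
        state = ((state >> 1) | (fb << 7)) & 0xFF   # shift right, feed back into MSB
    return torch.tensor(bits, dtype=torch.float32)


def twofold_regularizer(weight: torch.Tensor, mask: torch.Tensor,
                        lambda_net: float = 1e-4, lambda_bit: float = 1e-4) -> torch.Tensor:
    """Network-level term pushes LFSR-masked-out weights toward zero; the bit-level
    term is approximated here by an L1 penalty, since small magnitudes quantize to
    8-bit codes with fewer non-zero bits (the talk's actual bit-level regularizer
    acts on the bit representation directly)."""
    w = weight.flatten()
    net_term = (w * (1.0 - mask)).abs().sum()   # penalize weights the mask prunes
    bit_term = w.abs().sum()                    # L1 proxy for bit-level sparsity
    return lambda_net * net_term + lambda_bit * bit_term


# Usage: add the regularizer to the task loss for each layer during training.
layer = nn.Linear(128, 64)
mask = lfsr_mask(layer.weight.numel())
out = layer(torch.randn(4, 128))
loss = nn.functional.mse_loss(out, torch.randn(4, 64))
loss = loss + twofold_regularizer(layer.weight, mask)
loss.backward()
```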
