tinyML Summit 2021 tiny Talks: CUTIE: Multi-PetaOP/s/W Ternary DNN inference Engine for TinyML

tinyML Summit 2021 https://www.tinyml.org/event/summit-2021
CUTIE: Multi-PetaOP/s/W Ternary DNN inference Engine for TinyML
Moritz SCHERER, PhD Student, ETH Zürich

With the surge in demand for deeply embedded deep learning on increasingly power-constrained devices, neural network inference engines must continue to improve in terms of energy efficiency. In recent years especially, accelerators for networks with binary and ternary weights and activations have been addressing this demand, achieving energy efficiencies that are orders of magnitude higher than those of byte-precision accelerators. We address the main bottlenecks for energy efficiency in binary and ternary neural network accelerators and present CUTIE, the Completely Unrolled Ternary Inference Engine.

The design of CUTIE is focused on minimizing non-computational energy and switching activity so that dynamic power spent on storing intermediate results is minimized. We achieve this by 1) a data path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) exploiting an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity.
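To illustrate why ternary weights enable the "silencing" described above, the following minimal Python sketch shows ternary quantization and a multiplier-free dot product in which zero-valued weights contribute nothing. This is only an illustrative model of the idea, not the CUTIE datapath; the `threshold` parameter and function names are assumptions for the example, not values from the design.

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Quantize real-valued weights to {-1, 0, +1}.

    Weights with magnitude below `threshold` become zero. This is the
    sparsity ternary networks offer over binary ones, where every weight
    is forced to be either -1 or +1. (`threshold` is an illustrative
    choice, not a value from the CUTIE design.)
    """
    q = np.zeros_like(w, dtype=np.int8)
    q[w > threshold] = 1
    q[w < -threshold] = -1
    return q

def ternary_dot(acts, weights):
    """Multiplier-free dot product with ternary weights.

    A weight of +1 adds the activation, -1 subtracts it, and 0 does
    nothing -- in hardware, that zero lane can be gated off entirely,
    so it causes no switching activity and no dynamic power.
    """
    acc = 0
    for a, w in zip(acts, weights):
        if w == 1:
            acc += a
        elif w == -1:
            acc -= a
        # w == 0: lane is silenced, nothing toggles
    return acc

# Sparser weights (as produced by the optimized training method the
# abstract mentions) directly mean more silenced lanes per dot product.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=128)
q = ternarize(w)
sparsity = float(np.mean(q == 0))
```

The fraction of zero weights (`sparsity` above) is exactly the fraction of operand lanes that never switch, which is why the training method targeting higher filter-weight sparsity translates directly into lower dynamic power.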

We demonstrate that our architecture achieves better-than-binary inference accuracy at dramatically higher energy efficiency. We present power simulation data showing an average energy efficiency of 2.1 POp/s/W while achieving 88% inference accuracy on CIFAR-10 at an energy cost of 520 nJ per inference, outperforming the state of the art, including compute-in-memory (CIM) approaches, by a factor of 4.8.
