tinyML Summit 2021https://www.tinyml.org/event/summit-2021
Keynote: miliJoules for 1000 Inferences: Machine Learning Systems “on the Cheap”
Diana MARCULESCU, Professor and Department Chair, The University of Texas at Austin
Machine learning (ML) applications have entered and impacted our lives unlike any other technology advance from the recent past. While the holy grail for judging the quality of a ML model has largely been accuracy and only recently its resource usage, neither of these metrics translate directly to energy efficiency, runtime, or mobile device battery lifetime. This talk uncovers the need for designing efficient convolutional neural networks (CNNs) for deep learning mobile applications that operate under stringent energy and latency constraints. We show that, while CNN model quantization and pruning are effective tools in bringing down the model size and resulting energy cost by up to 1000x while maintaining baseline accuracy, the interplay between bitwidth, channel count, and CNN memory footprint uncovers a non-trivial trade-off. Surprisingly, our results show that when the channel count is allowed to change, a single weight bitwidth can be sufficient for model compression, which greatly reduces the software and hardware optimization costs for CNN-based ML systems.