Design and analysis of hardware-friendly pruning algorithms to accelerate deep neural networks at the edge
Christin BOSE, PhD Candidate, Purdue University
Unstructured pruning is a popular way to achieve state-of-the-art compression of convolutional neural network weights. While unstructured pruning preserves model accuracy well, it is difficult for hardware architectures to exploit unstructured sparsity for speedups and power savings during inference. Structured pruning, by contrast, is harder to apply at a given accuracy target, but it translates readily into inference speedups and power reduction because of its hardware-friendly nature. Model pruning removes network weights according to some criterion. Although several structured and unstructured pruning criteria have been proposed in the literature, it is often unclear which criterion performs best in hardware. In this work, we generate the accuracy-sparsity-latency Pareto curve for several state-of-the-art filter pruning criteria and derive insights based on an edge-based DNN accelerator. We also propose combining multiple pruning granularities and evaluate the benefits. These insights will be useful for pruning deep learning workloads for inference at the edge subject to a given accuracy and compute budget.
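As a concrete illustration of one commonly studied filter pruning criterion (the talk does not specify which criteria it evaluates), below is a minimal PyTorch sketch of structured, filter-level pruning using the L1-norm criterion. The function name `prune_filters_l1`, the `sparsity` parameter, and the single-layer usage are illustrative assumptions, not drawn from the talk.

```python
# Minimal sketch of structured (filter-level) pruning with the L1-norm
# criterion. Zeroing entire output filters is what makes this pruning
# "hardware-friendly": whole channels can be skipped during inference.
import torch
import torch.nn as nn


def prune_filters_l1(conv: nn.Conv2d, sparsity: float) -> nn.Conv2d:
    """Zero out the fraction `sparsity` of output filters with the
    smallest L1 norms (illustrative helper, not from the talk)."""
    weight = conv.weight.data                 # shape: (out_ch, in_ch, kH, kW)
    scores = weight.abs().sum(dim=(1, 2, 3))  # L1 norm per output filter
    n_prune = int(sparsity * weight.size(0))
    if n_prune == 0:
        return conv
    # Indices of the filters with the smallest L1 norms.
    prune_idx = torch.argsort(scores)[:n_prune]
    weight[prune_idx] = 0.0                   # structured: whole filters go
    if conv.bias is not None:
        conv.bias.data[prune_idx] = 0.0
    return conv


# Usage: prune 50% of the filters in a single conv layer.
conv = nn.Conv2d(16, 32, kernel_size=3)
prune_filters_l1(conv, sparsity=0.5)
print((conv.weight.abs().sum(dim=(1, 2, 3)) == 0).sum().item(), "filters zeroed")
```

An unstructured variant of the same idea would instead zero individual weights by magnitude, which typically retains more accuracy at a given sparsity but leaves irregular sparsity patterns that edge accelerators struggle to exploit.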