tinyML EMEA – Jonna Matthiesen: Sensitivity analysis of hyperparameters in deep neural-network…

Sensitivity analysis of hyperparameters in deep neural-network pruning
Deep Learning Researcher / Master thesis student

Deep neural-network pruning plays an integral part in deploying models to resource-constrained devices. By matching the right pruning strategy to the right hardware, it is possible to significantly reduce inference latency, memory footprint, and energy consumption without considerably affecting the network’s performance. Structured, robust pruning methods are particularly useful for dynamically scaling a model’s computational needs when deploying to a plurality of platforms, since manually redesigning the model can be too tedious or even infeasible. Pruning will therefore continue to play an important role in enabling tinyML even as the performance of embedded hardware increases, given the ever-increasing need to deploy bigger and better models while keeping latency, energy, and memory demands within permissible budgets.

Structured pruning, where entire filters/channels or groups of operations are removed, is a proven way to speed up models on hardware. The best methods involve fine-tuning, or pruning the model iteratively during training; pruning is thus tightly coupled with the act of training. A key element in training deep neural networks is the choice of hyperparameters. Since structured pruning modifies the actual model architecture, it is unclear how it affects the choice of hyperparameters. To answer this question, we have investigated the sensitivity of hyperparameters under structured neural-network pruning.
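To make the idea of structured pruning concrete, the following is a minimal sketch of removing entire convolutional filters. The abstract does not specify a pruning criterion, so this example assumes L1-norm filter ranking, one common choice; the function name and shapes are illustrative only.

```python
import numpy as np

def prune_filters_l1(weights, keep_ratio):
    """Structured-pruning sketch: drop whole output filters of a conv
    layer, ranked by their L1 norm (an assumed, commonly used criterion).

    weights: array of shape (out_channels, in_channels, k, k)
    keep_ratio: fraction of filters to keep (0 < keep_ratio <= 1)
    Returns the pruned weight tensor and the indices of kept filters.
    """
    n_keep = max(1, int(round(weights.shape[0] * keep_ratio)))
    # L1 norm of each filter over its input-channel and spatial dims
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    # keep the filters with the largest norms, preserving their order
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weights[keep], keep

# Toy example: prune half the filters of an 8x4x3x3 conv layer
w = np.random.randn(8, 4, 3, 3)
pruned, kept = prune_filters_l1(w, 0.5)
print(pruned.shape)  # (4, 4, 3, 3)
```

Because whole filters are removed, the resulting layer is genuinely smaller and faster on standard hardware, unlike unstructured (element-wise) sparsity, which usually needs specialized kernels to pay off.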

First, we use state-of-the-art hyperparameter optimization (HPO) methods, such as Bayesian optimization and Bayesian optimization with Hyperband (BOHB), to find the best possible set of hyperparameters for training a variety of models on public datasets. We then explore a larger region of the hyperparameter space by performing a grid search in the vicinity of the optimal hyperparameter set found by the Bayesian methods, in order to extract an approximate hyperparameter-performance distribution. We then prune the models to various degrees and perform a new grid search on the compressed models. Finally, the sensitivity is captured and quantified with a distance metric between distributions. By observing the shift in the hyperparameter-performance distribution between the original and pruned models, we can identify how sensitive hyperparameters are to pruning and how aggressively models can be pruned before the hyperparameters must be reconsidered for optimal performance.
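The last step above, quantifying the distribution shift, can be sketched as follows. The abstract does not name the specific distance metric, so this example assumes the empirical 1-D Wasserstein distance (for equal-sized samples, the mean absolute difference of the sorted values), one standard choice for comparing sample distributions; the function name and toy numbers are illustrative.

```python
import numpy as np

def hyperparam_performance_shift(acc_before, acc_after):
    """Quantify the shift between two hyperparameter-performance
    distributions via the empirical 1-D Wasserstein distance.

    acc_before / acc_after: validation accuracies from the same
    hyperparameter grid, evaluated on the original and pruned model.
    For equal-sized samples this distance reduces to the mean
    absolute difference between the sorted accuracy values.
    """
    a = np.sort(np.asarray(acc_before, dtype=float))
    b = np.sort(np.asarray(acc_after, dtype=float))
    assert a.shape == b.shape, "expects one accuracy per grid point"
    return float(np.mean(np.abs(a - b)))

# Toy example: the pruned model's accuracy distribution shifts down
grid_before = [0.91, 0.93, 0.92, 0.94]
grid_after  = [0.88, 0.90, 0.89, 0.91]
print(hyperparam_performance_shift(grid_before, grid_after))  # ≈ 0.03
```

A small distance means the pruned model responds to hyperparameters much like the original did, so the original settings remain near-optimal; a large distance signals that the hyperparameters should be re-tuned after pruning.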

From a practical perspective, understanding how pruning affects the choice of hyperparameters is of crucial importance for maximizing the performance of networks running on resource-limited hardware. However, it is also interesting from a more fundamental perspective, for understanding how neural networks work and why they are able to generalize well.

