EMEA 2021 Student Forum: Processor Architecture Optimization for Spatially Dynamic Neural Networks
Steven COLLEMAN, PhD Student, KU Leuven
Spatially dynamic neural networks (SDNNs) are a new and promising type of neural network for image-related machine learning tasks. SDNNs adjust their execution to the input data, saving computation by skipping unimportant image regions. However, these computational savings are hard to translate into real speedup on hardware platforms such as GPUs, because GPUs do not support these spatially dynamic execution patterns.
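To make the compute-saving idea concrete, here is a minimal pure-Python sketch of a masked convolution. It evaluates only the output pixels selected by a binary mask and counts the multiply-accumulates (MACs) it performs. The function name `masked_conv2d` and the hand-written mask are illustrative assumptions: in an SDNN the mask would be derived from the input, and this sketch says nothing about how the authors generate or execute it.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[bool]]


def masked_conv2d(image: Matrix, kernel: Matrix, mask: Mask) -> Tuple[Matrix, int]:
    """'Valid' (unpadded) 2D convolution that only computes output pixels whose
    mask entry is True; skipped pixels are left at 0.0. As in most NN frameworks,
    this is cross-correlation (the kernel is not flipped).
    Returns the output map and the number of MACs performed."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for y in range(oh):
        for x in range(ow):
            if not mask[y][x]:
                continue  # unimportant region: no work at all for this pixel
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
                    macs += 1
            out[y][x] = acc
    return out, macs


# Usage: a 6x6 input and a 3x3 kernel give a 4x4 output (16 pixels).
image = [[float(6 * r + c) for c in range(6)] for r in range(6)]
kernel = [[1.0, 0.0, -1.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0]]
dense_mask = [[True] * 4 for _ in range(4)]
# Hypothetical hand-written mask keeping only the top-left 2x2 block of outputs.
sparse_mask = [[y < 2 and x < 2 for x in range(4)] for y in range(4)]

out_dense, dense_macs = masked_conv2d(image, kernel, dense_mask)
out_sparse, sparse_macs = masked_conv2d(image, kernel, sparse_mask)
print(f"dense: {dense_macs} MACs, masked: {sparse_macs} MACs")
```

In this example the masked run performs 36 MACs instead of 144 while producing the same values at the kept pixels. In a sequential loop the skip costs nothing; the difficulty described above is that such irregular, input-dependent patterns are not supported by the parallel execution model of platforms like GPUs.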
Our research investigates the hardware constraints that prevent such speedup, and proposes and compares novel processor architectures and dataflows that turn dynamic execution into latency improvements with minimal loss of utilization. We propose two hardware architectures that flexibly support the spatially dynamic execution of a broad range of convolutional layers. The first architecture has a single processing element (PE) array onto which all workloads are mapped. The second has two differently configured PE arrays that can work in parallel. Both architectures handle standard convolutional layers as well as depthwise layers. For each architecture, the spatial unrolling of each layer type is optimized and validated, where appropriate, using the ZigZag design space exploration framework.
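The motivation for a second, differently configured PE array can be illustrated with a back-of-the-envelope utilization estimate. The sketch below assumes a hypothetical 16×16 PE array and two hypothetical spatial unrollings: `K|C` (rows over output channels K, columns over input channels C) and `K|OX` (rows over K, columns over output columns OX). The layer sizes and the helper names `axis_util` and `array_util` are also made up. These are not the configurations the authors derived with ZigZag; the point is only that a depthwise layer, where each output channel reads a single input channel (C = 1), leaves most columns of a `K|C` array idle.

```python
import math
from typing import Dict, Tuple


def axis_util(size: int, pes: int) -> float:
    """Fraction of PEs along one array axis doing useful work when a loop of
    `size` iterations is unrolled onto `pes` PEs (the last pass may be partial)."""
    passes = math.ceil(size / pes)
    return size / (passes * pes)


def array_util(layer: Dict[str, int], unroll: Tuple[str, str], shape: Tuple[int, int]) -> float:
    """Spatial utilization of a 2D PE array whose rows and columns unroll the
    two named loop dimensions of `layer`."""
    return axis_util(layer[unroll[0]], shape[0]) * axis_util(layer[unroll[1]], shape[1])


# Hypothetical layers: a depthwise layer has one input channel per output channel.
layers = {
    "standard": {"K": 64, "C": 64, "OX": 56},
    "depthwise": {"K": 64, "C": 1, "OX": 56},
}
# Hypothetical unrollings (rows, columns) for a 16x16 PE array.
unrollings = {"K|C": ("K", "C"), "K|OX": ("K", "OX")}
SHAPE = (16, 16)

util = {
    name: {label: array_util(layer, unroll, SHAPE) for label, unroll in unrollings.items()}
    for name, layer in layers.items()
}
for name, row in util.items():
    print(name, {label: round(u, 4) for label, u in row.items()})
```

Under these assumptions the standard layer fully utilizes the `K|C` array, while the depthwise layer reaches only 1/16 utilization on it and 0.875 on `K|OX`. Having arrays unrolled in different ways lets each layer type use a better-matched mapping, which is consistent with the two-array design favoring networks with depthwise layers.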
This makes it possible to benchmark and compare the hardware architectures on NNs for classification and human pose estimation: compared to their static execution, throughput increases by up to 1.9× and 2.3×, respectively, outperforming GPU execution. If the hardware is designed carefully, SDNNs can deliver a speedup of the same order of magnitude as other dynamic execution methods, and SDNNs can be combined with other types of dynamic execution for even larger hardware benefits. The architecture with two PE arrays is better suited to networks with depthwise layers, while the single-array architecture is better for networks with only standard convolutional layers.
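Why combining dynamic execution methods can compound the benefit can be seen with an idealized estimate: if several mechanisms each skip an independent fraction of the remaining work and the hardware stays fully utilized, their speedups multiply. The function `ideal_speedup`, the independence assumption, and the skip fractions below (including the second, channel-skipping mechanism) are all hypothetical and are not results from this work; measured gains will be lower whenever utilization drops.

```python
def ideal_speedup(*skip_fractions: float) -> float:
    """Upper-bound speedup when each mechanism independently skips a fraction
    of the work left by the previous ones, assuming perfect hardware
    utilization (an idealization; real PE arrays lose some utilization)."""
    remaining = 1.0
    for s in skip_fractions:
        if not 0.0 <= s < 1.0:
            raise ValueError(f"skip fraction must be in [0, 1), got {s}")
        remaining *= 1.0 - s
    return 1.0 / remaining


spatial_only = ideal_speedup(0.5)    # hypothetical: skip half of the pixels
channel_only = ideal_speedup(0.25)   # hypothetical: skip a quarter of the channels
combined = ideal_speedup(0.5, 0.25)  # both mechanisms applied together
print(f"spatial {spatial_only:.2f}x, channel {channel_only:.2f}x, combined {combined:.2f}x")
```

With these made-up fractions the estimate gives 2.00×, 1.33×, and 2.67×: the combined upper bound is the product of the individual ones, which is the sense in which combining methods can yield a larger benefit than either alone.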