EMEA 2021 https://www.tinyml.org/event/emea-2021
Innovative Minimization of Parameter Memory Space in Small-Silicon, Low-Power Devices
Moshe HAIUT – CTO Staff,
NN models normally require tens, and sometimes hundreds, of megabytes for parameter (weight) storage, which poses a major challenge for tinyML edge-based solutions. In these tinyML chips, the storage space allocated for the weights is below 1 MB, in order to meet the requirements of reasonable silicon area and power consumption below 1 mW.
Fortunately, the memory space for weights storage can be reduced dramatically by using a combination of three techniques: Quantization, Pruning, and Lossless Compression.
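As a rough illustration of how the three techniques stack, the following sketch prunes, quantizes, and then losslessly compresses a synthetic weight vector. It is not the nNetLite implementation: magnitude-based pruning, symmetric 8-bit quantization, and zlib are assumptions standing in for whatever schemes the actual toolchain uses, and all function names are hypothetical.

```python
import random
import zlib

def prune(weights, fraction):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * fraction)
    # Indices of the k smallest-magnitude weights (empty when k == 0).
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize(weights, bits):
    """Symmetric uniform quantization to signed integer codes of `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = peak / qmax
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def compress(codes):
    """Lossless compression of the codes, stored one byte per code (bits <= 8)."""
    raw = bytes(c & 0xFF for c in codes)  # two's-complement byte per code
    return zlib.compress(raw, 9)

# Synthetic weights (assumption: roughly Gaussian, as is common for trained layers).
random.seed(0)
weights = [random.gauss(0.0, 0.1) for _ in range(10_000)]

float_bytes = 4 * len(weights)              # float32 storage
codes, scale = quantize(prune(weights, 0.5), bits=8)
blob = compress(codes)

print("float32 bytes:", float_bytes)
print("compressed bytes:", len(blob))
```

Pruning helps the lossless stage in particular: the long runs and high frequency of zero codes are exactly what an entropy coder exploits.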
This tinyML talk will show how the weight space can be reduced to a minimum when using the DSPG nNetLite hardware engine to run inference on small to medium NN models in the DBM10L chip. The combination of dedicated hardware and an efficient compilation and simulation toolchain results in a high compression ratio while maintaining inference accuracy with minimum latency. The nNetLite compiler lets the user control the number of bits allocated for weight quantization and reduce the number of weights by applying smart post-training pruning. The nNetLite bit-exact simulator lets the user analyze the final inference accuracy under the quantization and pruning constraints selected in the compilation process. This integrated toolchain enables a trial-and-error approach to reach a final optimized solution in which the compressed weights fit into a predetermined memory space with minimum degradation in performance.
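The trial-and-error loop described above can be sketched as a small search over bit widths and pruning ratios. Everything here is hypothetical: `compile_weights` and `simulate_error` are stand-ins for a compiler and a bit-exact simulator, not the nNetLite API, and mean squared weight error is used only as a proxy for the inference accuracy a real simulator would measure on a dataset.

```python
import itertools
import random
import zlib

def compile_weights(weights, bits, prune_fraction):
    """Hypothetical compiler pass: prune, quantize, then compress."""
    k = int(len(weights) * prune_fraction)
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    kept = [0.0 if i in drop else w for i, w in enumerate(weights)]
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in kept) or 1.0) / qmax
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in kept]
    # Simplification: one byte per code even below 8 bits; zlib absorbs the slack.
    blob = zlib.compress(bytes(c & 0xFF for c in codes), 9)
    return codes, scale, blob

def simulate_error(weights, codes, scale):
    """Hypothetical simulator proxy: mean squared error of reconstructed weights."""
    return sum((w - c * scale) ** 2 for w, c in zip(weights, codes)) / len(weights)

def search(weights, budget_bytes, bit_options, prune_options):
    """Return the lowest-error configuration whose compressed size fits the budget."""
    best = None
    for bits, frac in itertools.product(bit_options, prune_options):
        codes, scale, blob = compile_weights(weights, bits, frac)
        if len(blob) > budget_bytes:
            continue  # does not fit the predetermined memory space
        err = simulate_error(weights, codes, scale)
        if best is None or err < best["error"]:
            best = {"bits": bits, "prune": frac, "bytes": len(blob), "error": err}
    return best

random.seed(1)
weights = [random.gauss(0.0, 0.1) for _ in range(5_000)]
best = search(weights, budget_bytes=2_500,
              bit_options=[8, 6, 4], prune_options=[0.0, 0.3, 0.6, 0.8])
print(best)
```

An exhaustive grid is workable here because each trial is cheap. With a real model, each trial would involve a full simulated inference run, so a coarser grid or a per-layer search may be preferable.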
The final piece of the DSPG nNetLite solution is the hardware: this IP incorporates a module called the Weight Extraction Unit (WEU), which performs weight decompression and dequantization in real time to prepare for the math operations. This way, the Multiply & Accumulate (MAC) unit has access to a narrow sliding window of plain parameters from the WEU's zero-wait-state, tightly coupled cache memory.
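A software analogy may help picture the WEU-to-MAC data flow. The sketch below streams a compressed weight blob through a decompressor a few bytes at a time, dequantizes each small window, and feeds it to a multiply-accumulate loop, so the full decompressed weight array never exists in memory. This is only an illustration under stated assumptions: the real WEU is hardware whose compression format is not described here, zlib and signed 8-bit codes are stand-ins, and the function names are hypothetical.

```python
import zlib

def weight_windows(blob, scale, window):
    """Yield dequantized weights at most `window` values at a time."""
    d = zlib.decompressobj()
    data = blob
    while True:
        chunk = d.decompress(data, window)  # emit at most `window` bytes
        if not chunk:
            break
        # Reinterpret each byte as a signed 8-bit code, then dequantize.
        yield [((b ^ 0x80) - 0x80) * scale for b in chunk]
        data = d.unconsumed_tail  # compressed input not yet consumed

def streamed_dot(blob, scale, activations, window=16):
    """Multiply-accumulate over weights that arrive one window at a time."""
    acc = 0.0
    pos = 0
    for weights in weight_windows(blob, scale, window):
        for w in weights:
            acc += w * activations[pos]
            pos += 1
    return acc

# Small worked example with exactly representable values.
codes = [3, -2, 0, 0, 7, -5, 0, 1] * 8
scale = 0.5
activations = [1.0, 2.0] * 32
blob = zlib.compress(bytes(c & 0xFF for c in codes), 9)

result = streamed_dot(blob, scale, activations)
reference = sum(c * scale * a for c, a in zip(codes, activations))
print(result, reference)
```

The window size plays the role of the narrow cache: only that many plain weights are held at any moment, while the bulk of the parameters stays in compressed form.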
Attendees will see the process of shrinking the weight memory space in a specific example of an NN model based on a well-known, popular database. The process will demonstrate the power of the DSPG nNetLite compiler and simulator toolchain.