Exploiting forward-forward based algorithm for training on device
Marco LATTUADA
Senior Software Engineer
STMicroelectronics
In recent years, there has been growing interest in training machine learning models on devices rather than in the cloud or on a centralized server. This approach, known as on-device training, has several advantages, including improved privacy and reduced latency. Tiny Machine Learning (TinyML) is emerging as a way to deliver intelligence to constrained hardware devices, e.g., Microcontroller Units (MCUs), for the realization of low-power, tailored applications. Training deep learning models on embedded systems is a very challenging process, mainly because their limited memory, energy, and computing power significantly restrict the complexity of the tasks that can be executed, making the use of traditional training algorithms such as backpropagation impossible. To overcome this issue, various techniques have been proposed, such as model compression and quantization, which reduce the size and complexity of a model, and transfer learning, which uses pre-trained models as a starting point. However, these solutions only address problems related to the deployment and inference steps, and they become ineffective when new patterns must be learned in real time.
In such a context, the goal should be the realization of an on-device training/inference system able to learn and generate predictions without the need for external components. Forward-Forward (FF) is a novel training algorithm recently proposed as an alternative to backpropagation when computing power is an issue [2]. Unlike backpropagation, this algorithm splits a neural network architecture into multiple layers that are trained individually, without the need to store activities and gradients, thus reducing the amount of computing power, energy, and memory required.
Formally speaking, the FF algorithm is a learning procedure that takes inspiration from Boltzmann machines and noise contrastive estimation. Its basic idea is to replace the forward and backward passes of backpropagation with two forward passes having opposite objectives. To do so, FF introduces a new metric, called goodness, computed as the sum of the squared activities of a given layer:
$$G = \sum_{j} y_j^2$$

where $y_j$ is the activity of the $j$-th hidden unit of the layer.
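To make this concrete, below is a minimal, hypothetical sketch of a single FF-trained layer, written in PyTorch. It is not the speaker's implementation: the class name FFLayer, the threshold value, the optimizer choice, and the softplus loss are our own assumptions. It illustrates the key points above: goodness is the sum of squared activities, each layer is optimized locally to push goodness above a threshold on positive data and below it on negative data, and no activities or gradients cross layer boundaries.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

THRESHOLD = 2.0  # hypothetical goodness threshold separating positive from negative data


class FFLayer(nn.Module):
    """One locally trained Forward-Forward layer (illustrative sketch)."""

    def __init__(self, in_features, out_features, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.act = nn.ReLU()
        # Each layer owns its optimizer: learning is purely local,
        # so no gradients or activities are stored across layers.
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Length-normalize the input so a layer cannot simply inherit
        # the previous layer's goodness and must learn new features.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return self.act(self.linear(x))

    def goodness(self, x):
        # Goodness: sum of the squared activities of this layer.
        return self.forward(x).pow(2).sum(dim=1)

    def train_step(self, x_pos, x_neg):
        # Two forward passes with opposite objectives replace the
        # forward/backward pair of backpropagation.
        g_pos = self.goodness(x_pos)  # should end up above THRESHOLD
        g_neg = self.goodness(x_neg)  # should end up below THRESHOLD
        loss = F.softplus(
            torch.cat([THRESHOLD - g_pos, g_neg - THRESHOLD])
        ).mean()
        self.opt.zero_grad()
        loss.backward()  # gradients stay inside this single layer
        self.opt.step()
        return loss.item()


# Hypothetical greedy training loop over random stand-in data; in practice,
# negative data would be, e.g., corrupted or mislabeled inputs.
if __name__ == "__main__":
    torch.manual_seed(0)
    x_pos, x_neg = torch.rand(32, 784), torch.rand(32, 784)
    for layer in (FFLayer(784, 256), FFLayer(256, 256)):
        for _ in range(10):
            layer.train_step(x_pos, x_neg)
        # Detach so the next layer sees plain activities, not a graph.
        x_pos, x_neg = layer(x_pos).detach(), layer(x_neg).detach()
```

Note how the demo loop trains each layer to completion before feeding its detached activities to the next one: this greedy, layer-by-layer scheme is what removes the need to keep intermediate activities and gradients in memory, which is the property that matters on an MCU.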