tinyML Asia 2023 – Muhammet Yanik: Target Classification on the Edge using mmWave Radar: A Novel…



Target Classification on the Edge using mmWave Radar: A Novel Algorithm and Its Real-Time Implementation on TI’s IWRL6432
Muhammet Emin YANIK
Radar R&D Systems Engineer
Texas Instruments

The past few years have seen a significant increase in sensing applications in the mmWave spectrum, primarily due to advances in mmWave technology that have reduced sensor size, cost, and power consumption. While mmWave radars have traditionally been employed for target detection and tracking, there has been a trend toward using radar signals for target classification. Many of these classification algorithms use machine learning (ML) to leverage the radar’s high sensitivity to motion. Examples include motion classification for reducing false alarms in indoor and outdoor surveillance, fall detection for elderly care, and more. In our study, we developed and implemented a mmWave radar-based multiple-target classification algorithm for cluttered real-world environments, with emphasis on the limited memory and computing budget of edge devices. The following figure depicts the high-level processing flow of the overall algorithmic chain running in real time on TI’s low-cost, ultra-low-power integrated radar-on-chip device, the IWRL6432.

The analog/digital front-end of the IWRL6432 mmWave radar first generates the radar cube (i.e., the raw ADC data after in-place range-FFTs) across the range, chirp, and antenna (two transmitters, three receivers) dimensions. The detection layer then processes this data using traditional FFT processing (Doppler-FFT and angle-FFT), followed by a peak detection step that generates a point cloud. The point cloud is then clustered into objects by a multiple-target tracker (based on an Extended Kalman Filter). Although we develop and implement the algorithm on the IWRL6432, it is generic enough to be deployed on other TI device variants.
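As a rough illustration of the detection layer, the minimal sketch below runs a Doppler-FFT and angle-FFT over a radar cube and picks peaks above the noise floor. The cube layout, the 64-point angle-FFT size, and the simple threshold (a stand-in for the CFAR detector used on the device) are assumptions for illustration, not the TI firmware implementation.

```python
import numpy as np

def detect_point_cloud(radar_cube, threshold_db=12.0):
    """Return (range_bin, doppler_bin, angle_bin) indices of detected peaks.

    Assumes `radar_cube` already holds range-FFT output with shape
    (num_chirps, num_virtual_rx, num_range_bins).
    """
    # Doppler-FFT across chirps (axis 0), with the zero-Doppler bin centered.
    doppler_cube = np.fft.fftshift(np.fft.fft(radar_cube, axis=0), axes=0)
    # Angle-FFT across the virtual antenna array (axis 1), zero-padded for finer angle bins.
    angle_cube = np.fft.fftshift(np.fft.fft(doppler_cube, n=64, axis=1), axes=1)
    power_db = 20.0 * np.log10(np.abs(angle_cube) + 1e-12)
    # Simple peak detection: keep cells exceeding the median noise floor by `threshold_db`.
    noise_floor = np.median(power_db)
    doppler_idx, angle_idx, range_idx = np.where(power_db > noise_floor + threshold_db)
    return np.stack([range_idx, doppler_idx, angle_idx], axis=-1)
```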

The main challenge addressed in our study is generating µ-Doppler spectrograms from multiple targets concurrently for use in the subsequent modules. To achieve this, we developed and implemented a solution that creates µ-Doppler spectrograms by integrating with the multiple-target tracker layer. The centroid of each tracked object is mapped to the corresponding location in the radar cube, and a µ-Doppler spectrum is extracted from the neighborhood of this centroid (repeated for each track). The µ-Doppler spectra are concatenated across consecutive frames to form a 2D µ-Doppler vs. time spectrogram. Compared to prior work, which mostly assumes single-target scenes and lacks the elements needed for a reliable solution to real-world multiple-target problems, our approach classifies multiple target objects concurrently and is implemented efficiently on the edge.
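A simplified sketch of the per-track µ-Doppler extraction is shown below, assuming the Doppler-processed cube has shape (num_doppler_bins, num_virtual_rx, num_range_bins) and that each track reports a centroid range-bin index. The function names and the neighborhood half-width are hypothetical illustrations, not the actual tracker integration.

```python
import numpy as np

def extract_udoppler(doppler_cube, centroid_range_bin, range_halfwidth=2):
    """Return one micro-Doppler column (power vs. Doppler) for a single track."""
    lo = max(centroid_range_bin - range_halfwidth, 0)
    hi = min(centroid_range_bin + range_halfwidth + 1, doppler_cube.shape[2])
    # Sum power over receive antennas and over the range bins around the track centroid.
    neighborhood = np.abs(doppler_cube[:, :, lo:hi]) ** 2
    return neighborhood.sum(axis=(1, 2))  # shape: (num_doppler_bins,)

def update_spectrogram(spectrogram, udoppler_column, window=30):
    """Append this frame's column and keep a sliding window of the last `window` frames."""
    spectrogram.append(udoppler_column)
    # The list of columns forms the 2D micro-Doppler vs. time spectrogram for one track.
    return spectrogram[-window:]
```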

Whereas most prior art feeds the 2D µ-Doppler vs. time spectrogram into a 2D-CNN model for classification, we further process the 2D spectrogram to create hand-crafted features (a small set of parameters that capture the essence of the image), which reduces computational complexity and makes multiple-target classification feasible on the edge. In this approach, a single frame yields a single value for each feature, so a sequence of frames generates a corresponding 1D time series for each feature. In this study, six features are extracted from the µ-Doppler spectrograms for use by the classifier. We then build and train a 1D-CNN model to classify the target objects (human or non-human) given the extracted features as 1D time series. Since a 2D-CNN model is memory- and MIPS-intensive, too complex to support multiple tracks on low-power edge processors, and needs massive amounts of training data, the proposed approach results in much lower complexity.
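The talk does not list the six hand-crafted features, so the sketch below uses illustrative placeholders (Doppler centroid, bandwidth, and total power) only to show how a single frame's µ-Doppler column collapses to a few scalars that, stacked over frames, become the 1D time series fed to the classifier.

```python
import numpy as np

def frame_features(udoppler_column, doppler_bins):
    """Collapse one frame's micro-Doppler column into a small feature vector.

    `doppler_bins` holds the Doppler axis values (e.g., velocities) for each bin.
    The three features below are illustrative, not the six used in the study.
    """
    power = udoppler_column / (udoppler_column.sum() + 1e-12)
    centroid = float(np.sum(doppler_bins * power))                       # mean Doppler of the motion
    bandwidth = float(np.sqrt(np.sum(((doppler_bins - centroid) ** 2) * power)))  # spread around it
    total_power = float(udoppler_column.sum())                           # overall returned energy
    return np.array([centroid, bandwidth, total_power])

# Stacking the per-frame feature vectors over a 30-frame window yields the
# (time, num_features) series consumed by the 1D-CNN classifier.
```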

In our 1D-CNN network architecture, the input size is configured as the total number of features. Three blocks of 1D convolution, ReLU, and layer normalization layers are used. A 1D global average pooling layer reduces the output of the convolutional layers to a single vector, and a fully connected layer with an output size of two (matching the number of classes) maps it to a vector of probabilities, followed by a softmax layer and a classification layer. For model training, a total of 125,816 frames of data were captured at 10 Hz (about 3.5 hours in total) from different human and non-human targets. The data capture campaign covered 310 scenarios in distinct environments with around 20 different people and numerous non-human targets (e.g., fan, tree, dog, plant, drone). The data was captured with synchronized video to assist in labeling. In total, 55,079 observations (25,448 human, 29,631 non-human) were generated with the time window size configured to 30 frames (3 s at 10 Hz).
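A Keras sketch of the described 1D-CNN architecture is given below, assuming six features over a 30-frame window. The filter counts and kernel sizes are not stated in the talk and are placeholders; only the layer sequence (three Conv1D/ReLU/layer-norm blocks, global average pooling, a two-way fully connected layer, and softmax) follows the description.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(window=30, num_features=6, num_classes=2):
    inputs = layers.Input(shape=(window, num_features))
    x = inputs
    # Three blocks of 1D convolution, ReLU, and layer normalization.
    for filters in (16, 32, 32):  # placeholder filter counts
        x = layers.Conv1D(filters, kernel_size=3, padding="same")(x)
        x = layers.ReLU()(x)
        x = layers.LayerNormalization()(x)
    # Collapse the time dimension to a single vector.
    x = layers.GlobalAveragePooling1D()(x)
    # Map to class probabilities (human vs. non-human).
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_classifier()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```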

