Computer vision models enable machines to extract, analyze, and recognize useful information from images. Lightweight computer vision models are small enough to deploy on mobile and edge devices.
Today’s boom in CV started with the adoption of deep learning models and convolutional neural networks (CNNs). The main CV tasks include image classification, object localization, object detection, and segmentation.
This article lists the most significant lightweight computer vision models SMEs can efficiently implement in their daily tasks. We’ve split the lightweight models into four different categories: face recognition, healthcare, traffic, and general-purpose machine learning models.
About us: ProX PC allows enterprise teams to realize value with computer vision in only 3 days. By easily integrating into existing tech stacks, ProX PC makes it easy to automate inefficient and expensive processes. Learn more by booking a demo.
Lightweight Models for Face Recognition
DeepFace – Lightweight Face Recognition and Facial Attribute Analysis
DeepFace is a lightweight face recognition and facial attribute analysis library for Python. The open-source library wraps state-of-the-art face recognition models and handles all facial recognition procedures in the background.
DeepFace is written in Python and licensed under the MIT License. You can install it from the Python Package Index (PyPI) or from the source code in its GitHub repository.
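Face verification pipelines of this kind generally reduce to comparing two embedding vectors against a distance threshold. The sketch below illustrates that idea in plain Python; it is not DeepFace's own code, and the embedding values and the 0.4 threshold are hypothetical numbers chosen only for the example.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def is_same_person(emb1, emb2, threshold=0.4):
    """Declare a match when the embeddings are close enough.

    The threshold is a hypothetical value; real systems tune it
    per model and per distance metric.
    """
    return cosine_distance(emb1, emb2) <= threshold

# Hypothetical 3-dimensional embeddings (real models output hundreds of values).
anchor = [0.9, 0.1, 0.3]
same = [0.8, 0.2, 0.35]
other = [-0.2, 0.9, 0.1]

print(is_same_person(anchor, same))
print(is_same_person(anchor, other))
```

In practice the embeddings would come from a face recognition model rather than being typed in by hand.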
DeepFace for Face Verification
DeepFace features include:
MobileFaceNets (MFN) – CNNs for Real-Time Face Verification
Chen et al. (2018) published their research titled MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices. The models use fewer than 1 million parameters and are tailored for high-accuracy real-time face verification on mobile and embedded devices.
The authors also analyzed the weaknesses of previous mobile networks for face verification. Trained with ArcFace loss on the refined MS-Celeb-1M dataset, a single 4.0 MB MobileFaceNet achieved 99.55% accuracy on LFW.
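The two figures quoted above fit together: a network with just under 1 million parameters stored as 32-bit floats occupies roughly 4 MB. The sketch below shows the arithmetic, assuming 4 bytes per parameter, 1 MB = 10^6 bytes, and no file-format overhead; the int8 line is an illustrative assumption, not a result from the paper.

```python
def model_size_mb(num_params, bytes_per_param=4):
    """Approximate on-disk size of the weights in megabytes (10^6 bytes)."""
    return num_params * bytes_per_param / 1e6

# Upper bound for a model with fewer than 1 million float32 parameters.
print(model_size_mb(1_000_000))

# The same parameter count at 1 byte per weight (hypothetical int8 quantization).
print(model_size_mb(1_000_000, bytes_per_param=1))
```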
Face feature embedding CNN and the receptive field (RF)
Model characteristics:
EdgeFace – Face Recognition Model for Edge Devices
Researchers A. George, C. Ecabert, et al. (2023) published a paper titled EdgeFace: Efficient Face Recognition Model for Edge Devices. The paper introduced EdgeFace, a lightweight and efficient face recognition network inspired by the hybrid architecture of EdgeNeXt and optimized for edge devices.
The proposed EdgeFace network has low computational cost and compact storage requirements while achieving high face recognition accuracy, which makes it suitable for deployment on edge devices. The EdgeFace model was top-ranked among models with fewer than 2M parameters in the IJCB 2023 Face Recognition Competition.
EdgeFace Architecture – a lightweight face recognition
Model Characteristics:
Healthcare CV Models
MedViT: A Robust Vision Transformer for Medical Image Recognition
Manzari et al. (2023) published the research MedViT: A Robust Vision Transformer for Generalized Medical Image Recognition. They proposed a robust yet efficient CNN-Transformer hybrid model that combines the locality of CNNs with the global connectivity of vision Transformers.
The authors augmented the shape information of images by permuting the feature mean and variance within mini-batches. Besides its low complexity, the hybrid model proved highly robust when the researchers compared it with other approaches on the MedMNIST-2D dataset collection.
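The idea of permuting feature statistics within a mini-batch can be sketched in a few lines. The version below is a simplified illustration, not the authors' implementation: it treats each sample as a flat feature vector, takes a fixed permutation as input (a real training loop would draw it at random), and the function names are hypothetical.

```python
import math

def mean_std(vec, eps=1e-6):
    """Mean and standard deviation of one feature vector."""
    mu = sum(vec) / len(vec)
    var = sum((v - mu) ** 2 for v in vec) / len(vec)
    return mu, math.sqrt(var + eps)

def permute_feature_stats(batch, perm):
    """Give sample i the feature mean and std of sample perm[i].

    Each vector is normalized with its own statistics, then rescaled
    and shifted with the statistics of its partner in the batch.
    """
    stats = [mean_std(vec) for vec in batch]
    out = []
    for i, vec in enumerate(batch):
        mu, sigma = stats[i]
        new_mu, new_sigma = stats[perm[i]]
        out.append([(v - mu) / sigma * new_sigma + new_mu for v in vec])
    return out

# Two hypothetical samples with very different feature statistics.
batch = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
mixed = permute_feature_stats(batch, perm=[1, 0])
print([round(mean_std(vec)[0], 3) for vec in mixed])
```

After the swap, each sample keeps its normalized pattern of values but carries the other sample's mean and spread.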
MedViT-T and ResNet-18 Recognition on MedMNIST-2D datasets
Model characteristics:
MaxCerVixT: A Lightweight Transformer-based Cancer Detection
Pacal (2024) introduced MaxCerVixT, an advanced framework built on the Multi-Axis Vision Transformer (MaxViT) architecture, to address the challenges in Pap test accuracy. He conducted a large-scale study with a total of 106 deep learning models, comprising 53 CNN-based and 53 vision transformer-based models, for each dataset.
He replaced the MBConv blocks in the MaxViT architecture with ConvNeXtv2 blocks and the MLP blocks with GRN-based MLPs. These changes reduced the parameter count and enhanced the model’s recognition capabilities. He evaluated the proposed method on the publicly available SIPaKMeD and Mendeley LBC Pap smear datasets.
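GRN here refers to Global Response Normalization, the layer introduced with ConvNeXt V2. The sketch below shows the GRN computation on a small list of tokens in plain Python. How the layer is wired into the MLP blocks of the proposed model is not described here, so the token layout and the function name are assumptions made for illustration.

```python
import math

def grn(tokens, gamma, beta, eps=1e-6):
    """Global Response Normalization over tokens of C channels each.

    tokens: list of tokens, each a list of C channel values.
    gamma, beta: learnable per-channel scale and bias (lists of length C).
    """
    num_channels = len(tokens[0])
    # Global L2 norm of every channel, aggregated across all tokens.
    g = [math.sqrt(sum(tok[c] ** 2 for tok in tokens)) for c in range(num_channels)]
    # Divide each channel norm by the mean norm across channels.
    mean_g = sum(g) / num_channels
    n = [gc / (mean_g + eps) for gc in g]
    # Rescale the input and add the residual connection.
    return [
        [gamma[c] * tok[c] * n[c] + beta[c] + tok[c] for c in range(num_channels)]
        for tok in tokens
    ]

tokens = [[3.0, 0.0], [4.0, 0.0]]
print(grn(tokens, gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```

With gamma and beta at zero, the layer reduces to the identity, so it can be added to a block without changing its initial behavior.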
Cervical Cancer detection on LBC datasets
Model characteristics:
Lightweight CNN Architecture for Anomaly Detection in E-health
Yatbaz et al. (2021) published their research Anomaly Detection in E-Health Applications Using Lightweight CNN Architecture. The authors used ECG data for the prediction of cardiac stress activities. Moreover, they tested the proposed deep learning model on the MHEALTH dataset with two different validation techniques.
The experimental results showed that the model achieved up to 97.06% accuracy for the cardiac stress level. In addition, at 1.97 MB, the ECG prediction model was lighter than existing approaches.
Flow of the entire E-health System
Model characteristics:
Traffic / Vehicles Recognition Models
Lightweight Vehicle Detection Network Based on YOLOv5
Wang et al. (2024) published their research Lightweight Vehicle Detection Based on Improved YOLOv5. Their network applies integrated perceptual attention (IPA) to keep the parameter count low while maintaining high detection accuracy.
They proposed a lightweight IPA module with a Transformer encoder based on integrated perceptual attention. The module reduces the number of parameters while capturing global dependencies for richer contextual information.
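Attention captures global dependencies because every output token is a weighted sum over all input tokens. The sketch below shows plain single-head scaled dot-product self-attention as a general illustration. It is not the IPA module itself, and it uses identity query, key, and value projections as a simplifying assumption.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of all tokens: each output sees the whole input.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

# Three hypothetical 2-dimensional feature tokens.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(features))
```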
YOLOv5 Vehicle Detection
Model characteristics:
A Lightweight Vehicle-Pedestrian Detection Based on Attention
Zhang et al. (2022) published their research Lightweight Vehicle-Pedestrian Detection Algorithm Based on Attention Mechanism in Traffic Scenarios. They proposed an improved lightweight, high-performance vehicle-pedestrian detection algorithm based on YOLOv4.
To reduce parameters and improve feature extraction, they replaced the CSPDarknet53 backbone network with MobileNetv2. They also used multi-scale feature fusion to exchange information among different feature layers.
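Much of the parameter saving from a MobileNet-style backbone comes from depthwise separable convolutions. The sketch below compares the weight count of a standard convolution with that of its depthwise separable counterpart. The layer sizes are hypothetical, biases and normalization layers are ignored, and the full MobileNetv2 block (an inverted residual with channel expansion) is not modeled.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 256 input channels, 256 output channels, 3 x 3 kernel.
std = standard_conv_params(256, 256, 3)
sep = depthwise_separable_params(256, 256, 3)
print(std, sep, round(std / sep, 1))
```

For this layer shape, the separable version needs under one eighth of the weights of the standard convolution.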
ResNet Convolution Network for Vehicles Detection
Model characteristics:
Smart Lightweight Visual Attention Model for Fine-Grained Vehicle Recognition
Boukerche et al. (2023) published “Smart Lightweight Visual Attention Model for Fine-Grained Vehicle Recognition.” Their LRAU (Lightweight Recurrent Attention Unit) extracted the discriminative features to locate the key points of a vehicle.
They generated the attention mask from the feature maps received by the LRAU and its preceding attention state. The multi-scale feature maps were obtained from a standard CNN architecture.
Model characteristics:
General Purpose Lightweight CV Models
MobileViT: Lightweight, General-purpose Vision Transformer
Mehta et al. (2022) published their research, MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer. They combined the strengths of CNNs and ViTs to build a lightweight and low-latency network for mobile vision tasks.
They introduced MobileViT, a lightweight and general-purpose vision transformer for mobile devices. MobileViT provides a different perspective for the global processing of information with transformers, i.e., transformers as convolutions.
MobileViT Training and Validation Error and Accuracy
Model characteristics:
DINOv2: Learning Robust Visual Features without Supervision
In April 2023, Meta released DINOv2, a family of state-of-the-art computer vision models pre-trained with self-supervised learning. DINOv2 produces high-performance features that work well with simple linear classifiers. Users can therefore employ DINOv2 as a multipurpose backbone for many different computer vision tasks.
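Using a backbone with a simple linear classifier is often called linear probing: the features stay frozen and only a small linear layer is trained on top. The sketch below trains a logistic-regression probe in plain Python. The two-dimensional feature vectors are hypothetical stand-ins for frozen embeddings; no DINOv2 model is loaded, and the learning rate and epoch count are arbitrary choices for the toy data.

```python
import math

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a binary logistic-regression classifier on fixed feature vectors."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            err = p - y                      # gradient of the log loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return the predicted class (0 or 1) for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical frozen features for two classes.
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
labels = [1, 1, 0, 0]

w, b = train_linear_probe(features, labels)
print([predict(w, b, x) for x in features])
```

Only the weight vector and bias are learned here, which is why a strong pre-trained backbone makes this kind of classifier cheap to train.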
DINOv2 Accuracy
ProX PC: End-to-End Computer Vision Platform
ProX PC is an end-to-end computer vision platform that businesses use to build, deploy, and monitor real-world computer vision applications. Its infrastructure builds on state-of-the-art CV libraries and frameworks such as OpenCV, TensorFlow, and PyTorch.
What’s Next?
Lightweight computer vision systems are useful on mobile and edge devices because they require little processing power and storage. Hence, they are essential in many business applications. With its proven expertise, ProX PC can help you implement a successful CV model.
Our platform offers comprehensive tools for building, deploying, and managing CV apps on different devices. The lightweight pre-trained models are applicable in multiple industries. We provide computer vision models on the edge – where events and activities happen.