YOLOv9: Advancements in Real-time Object Detection (2024)


June 12, 2024

What is YOLOv9?

YOLOv9 is a version of YOLO released in February 2024 by Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. It is an improved real-time object detection model that aims to surpass all convolution-based and transformer-based methods.

YOLOv9 comes in four models, ordered by parameter count: v9-S, v9-M, v9-C, and v9-E. To improve accuracy, it introduces programmable gradient information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN). PGI counters the loss of information as data passes through the network and ensures reliable gradient updates, while GELAN optimizes lightweight models through gradient path planning.

At this time, the only computer vision task supported by YOLOv9 is object detection.

The YOLOv9 concept as proposed in the paper "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information"

YOLO Version History

Before diving into the YOLOv9 specifics, let’s briefly recap the other YOLO versions available today.

YOLOv1

The YOLOv1 architecture (displayed above) achieved a mean average precision (mAP) of 63.4 at an inference speed of 45 FPS on the Pascal VOC 2007 dataset, running far faster than R-CNN-based detectors. YOLOv1 treats object detection as a regression task, predicting bounding boxes and class probabilities in a single pass over an image.
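As a minimal sketch of the single-pass idea, the snippet below computes the shape of a YOLOv1-style output tensor and converts one cell-relative box into pixel coordinates. It assumes the YOLOv1 paper's Pascal VOC settings (a 7 × 7 grid, 2 boxes per cell, 20 classes, 448 × 448 input); the function names are illustrative and do not come from any YOLO codebase.

```python
from typing import Tuple


def yolov1_output_shape(S: int = 7, B: int = 2, C: int = 20) -> Tuple[int, int, int]:
    """Shape of a YOLOv1-style prediction tensor.

    The image is divided into an S x S grid; each cell predicts B boxes
    (x, y, w, h, confidence) plus C class probabilities shared by the cell.
    """
    return (S, S, B * 5 + C)


def cell_box_to_image(row: int, col: int, x: float, y: float, w: float, h: float,
                      S: int, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert one cell-relative box to (x_min, y_min, x_max, y_max) in pixels.

    x, y: box centre offset inside cell (row, col), each in [0, 1].
    w, h: box size relative to the whole image, each in [0, 1]
          (YOLOv1 regresses their square roots; undo that before calling).
    """
    cx = (col + x) / S * img_w
    cy = (row + y) / S * img_h
    bw, bh = w * img_w, h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


print(yolov1_output_shape())  # (7, 7, 30)
# A box centred in the middle cell of a 448 x 448 image
print(cell_box_to_image(3, 3, 0.5, 0.5, 0.2, 0.4, S=7, img_w=448, img_h=448))
```

Because every cell's predictions come out of one tensor, a single forward pass yields all candidate boxes at once.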

YOLOv2

Released in 2016, YOLOv2 (in its YOLO9000 variant) could detect more than 9000 object categories. YOLOv2 introduced anchor boxes to YOLO: predefined bounding boxes, called priors, that the model uses to pin down the ideal position of an object. YOLOv2 achieved 76.8 mAP at 67 FPS on the VOC 2007 dataset.
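How a prediction is tied to an anchor prior can be sketched with the decoding equations from the YOLOv2 (YOLO9000) paper: the network predicts offsets t_x, t_y, t_w, t_h, and the box is recovered from the cell's top-left corner (c_x, c_y) and the prior's size (p_w, p_h). Everything is in grid-cell units; the prior sizes in the example are made up for illustration.

```python
import math
from typing import Tuple


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_anchor_box(tx: float, ty: float, tw: float, th: float,
                      cx: float, cy: float, pw: float, ph: float) -> Tuple[float, float, float, float]:
    """Return (b_x, b_y, b_w, b_h) in grid units from raw offsets and an anchor prior."""
    bx = sigmoid(tx) + cx      # sigmoid keeps the centre inside the responsible cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)     # exp rescales the prior's width up or down
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# Zero offsets: centre in the middle of cell (4, 6), size equal to the prior.
print(decode_anchor_box(0.0, 0.0, 0.0, 0.0, cx=4, cy=6, pw=1.5, ph=3.0))  # (4.5, 6.5, 1.5, 3.0)
```

The network only has to learn small corrections to a sensible starting shape, which is the main appeal of anchor priors.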

YOLOv3

The authors released YOLOv3 in 2018. It achieved higher accuracy than previous versions, with an mAP of 28.2 at 22 milliseconds per image on COCO (320 × 320 input). YOLOv3 uses Darknet-53 as its backbone, and for class prediction it replaces softmax with independent logistic classifiers trained with binary cross-entropy (BCE) loss.
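To illustrate why independent logistic classifiers help, the sketch below scores hypothetical logits for overlapping labels such as "person" and "woman" (the kind of overlap the YOLOv3 paper cites as motivation). With per-class sigmoids both labels can be close to 1, while softmax forces them to compete. This is plain Python for illustration, not Darknet's implementation.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bce_loss(logits: List[float], targets: List[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over independent per-class logistic outputs."""
    loss = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        loss += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return loss / len(logits)


# Hypothetical logits for the classes ["person", "woman", "car"]
logits = [3.0, 2.5, -4.0]
independent = [sigmoid(z) for z in logits]  # each class scored on its own
exclusive = softmax(logits)                 # classes share one unit of probability

print([round(p, 3) for p in independent])
print([round(p, 3) for p in exclusive])
print(bce_loss(logits, [1, 1, 0]))  # both overlapping labels can be correct at once
```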

YOLOv3 application for a smart refrigerator in gastronomy and restaurants

YOLOv4

In 2020, Alexey Bochkovskiy et al. released YOLOv4, introducing the concepts of a Bag of Freebies (BoF) and a Bag of Specials (BoS). BoF is a set of training-time techniques, such as data augmentation, that increase accuracy at no additional inference cost. BoS is a set of plug-in modules and post-processing methods that significantly enhance accuracy at a slight increase in inference cost. The model achieved 43.5 mAP at 65 FPS on the COCO dataset.
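One example of a freebie in YOLOv4 is the CIoU loss for box regression, which changes only training and leaves inference untouched. CIoU extends IoU with a normalized center-distance term ρ²/c² and an aspect-ratio term αv. A minimal plain-Python sketch, assuming boxes given as (x_min, y_min, x_max, y_max) with positive width and height:

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def ciou_loss(pred: Box, gt: Box) -> float:
    """CIoU loss = 1 - IoU + rho^2 / c^2 + alpha * v."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter)

    # Squared distance between centres, normalised by the squared diagonal
    # of the smallest box enclosing both boxes
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return 1 - iou + rho2 / c2 + alpha * v


print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes: 0.0
print(ciou_loss((0, 0, 10, 10), (20, 20, 30, 30)))  # no overlap, still a useful gradient signal
```

Unlike plain 1 − IoU, the distance term keeps the loss informative even when the boxes do not overlap at all.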

YOLOv5

Ultralytics also released YOLOv5 in 2020, without an official research paper. Because it is implemented in PyTorch, the model is easy to train. Its backbone uses Cross Stage Partial (CSP) blocks to improve gradient flow and reduce computational cost. YOLOv5 uses YAML files instead of Darknet CFG files for model configuration.

Small object detection with YOLOv5 in traffic analysis with computer vision

YOLOv6

YOLOv6 is another version from outside the original YOLO authors, introduced in 2022 by Meituan, a Chinese shopping platform. The company designed the model for industrial applications, with better performance than its predecessor. As a result, YOLOv6n achieves an mAP of 37.5 at 1187 FPS on the COCO dataset, and YOLOv6s achieves 45 mAP at 484 FPS.

YOLOv7

In July 2022, a group of researchers released the open-source model YOLOv7. The authors reported that it surpassed all known object detectors in both speed and accuracy in the range from 5 to 160 FPS, with 56.8% AP, the highest among real-time detectors running at 30 FPS or more on a V100 GPU. YOLOv7 is based on the Extended Efficient Layer Aggregation Network (E-ELAN), which improves training by letting the model learn diverse features with efficient computation.

Applied AI system trained for aircraft detection with YOLOv7

YOLOv8

YOLOv8 has no official paper (as with YOLOv5) but offers higher accuracy and faster speed than its predecessors. For instance, YOLOv8m reaches 50.2 mAP on the MS COCO dataset at 1.83 milliseconds per image on an A100 GPU with TensorRT. YOLOv8 also comes with a Python package and a CLI, making it easy to use and develop with.

Segmentation with YOLOv8 applied in smart cities for pothole detection.

Since YOLOv9’s February 2024 release, another team of researchers has released YOLOv10 (May 2024), which also targets real-time object detection.

YOLOv9 Architecture

To address the information bottleneck (the loss of information as data passes through successive layers in the feed-forward process), the YOLOv9 authors propose a new concept: programmable gradient information (PGI). The model generates reliable gradients through an auxiliary reversible branch, so deep features still retain what they need to execute the target task. In addition, the auxiliary branch avoids the semantic loss that traditional deep supervision can cause when it integrates multi-path features.
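The information bottleneck can be made concrete with a toy example. This illustrates the general principle only and is not code from YOLOv9: a deterministic layer can keep or lose information about its input but never add it, stacking lossy layers loses more, and a reversible layer keeps everything, which is the property PGI's auxiliary reversible branch relies on.

```python
import math
from collections import Counter
from typing import Iterable


def entropy_bits(values: Iterable[int]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def lossy_layer(x: int) -> int:
    """Many-to-one map: four different inputs collapse onto each output."""
    return x // 4


def reversible_layer(x: int) -> int:
    """Bijection on 0..15 (5 is coprime with 16), so it can be inverted exactly."""
    return (5 * x + 3) % 16


X = list(range(16))  # 16 equally likely inputs carry 4 bits

# For a deterministic layer f, I(X; f(X)) = H(f(X)), so entropy shows how much
# information about the input survives each stage.
print(entropy_bits(X))                                       # 4.0
print(entropy_bits(lossy_layer(x) for x in X))               # 2.0
print(entropy_bits(lossy_layer(lossy_layer(x)) for x in X))  # 0.0
print(entropy_bits(reversible_layer(x) for x in X))          # 4.0
```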

The authors achieved the best training results by applying PGI propagation at different semantic levels. The reversible architecture of PGI is built on the auxiliary branch, which is only needed during training, so there is no additional inference cost. Because PGI can freely select a loss function suited to the target task, it also overcomes the problems encountered by masked modeling.
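To make the "no additional inference cost" point concrete, here is a toy sketch with hypothetical names (plain Python, not the YOLOv9 training code): an auxiliary head contributes to the training loss, but prediction only runs the main head. The 0.25 auxiliary weight is an arbitrary illustrative choice.

```python
from typing import Callable, List

Head = Callable[[List[float]], float]


class MainWithAuxiliary:
    """Toy model with a main head and a training-only auxiliary head."""

    def __init__(self, main_head: Head, aux_head: Head, aux_weight: float = 0.25):
        self.main_head = main_head
        self.aux_head = aux_head
        self.aux_weight = aux_weight  # hypothetical weight, not taken from YOLOv9

    def training_loss(self, features: List[float], target: float) -> float:
        # Both heads are supervised during training.
        main_err = (self.main_head(features) - target) ** 2
        aux_err = (self.aux_head(features) - target) ** 2
        return main_err + self.aux_weight * aux_err

    def predict(self, features: List[float]) -> float:
        # Inference never calls the auxiliary head.
        return self.main_head(features)


aux_calls = 0


def counting_aux_head(features: List[float]) -> float:
    """Auxiliary head that records how often it is called."""
    global aux_calls
    aux_calls += 1
    return max(features)


model = MainWithAuxiliary(main_head=lambda f: sum(f) / len(f), aux_head=counting_aux_head)
model.predict([1.0, 2.0, 3.0])
print("aux calls after inference:", aux_calls)      # 0
loss = model.training_loss([1.0, 2.0, 3.0], target=2.0)
print("aux calls after training step:", aux_calls)  # 1
print("loss:", loss)                                # 0.25
```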

The proposed PGI mechanism can be applied to deep neural networks of various sizes. In the paper, the authors also designed a generalized ELAN (GELAN) that simultaneously takes into account the number of parameters, computational complexity, accuracy, and inference speed. This design lets users freely choose appropriate computational blocks for different inference devices.
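The "pluggable computational block" idea can be sketched as a toy aggregation function. This is a simplified assumption-laden illustration, not the YOLOv9 module: feature maps are reduced to one value per channel, the 1 × 1 convolutions before and after aggregation are omitted, and the two blocks are hypothetical stand-ins. It shows the split, stack, and concatenate pattern, and that the surrounding structure stays the same whichever blocks are plugged in.

```python
from typing import Callable, List

Feature = List[float]  # toy stand-in for a feature map: one value per channel
Block = Callable[[Feature], Feature]


def gelan_style_aggregate(x: Feature, blocks: List[Block]) -> Feature:
    """Split the channels in two, run the chosen blocks in sequence on the second
    half, then concatenate both halves with every block's output."""
    half = len(x) // 2
    outputs: List[Feature] = [x[:half], x[half:]]
    for block in blocks:
        outputs.append(block(outputs[-1]))  # each block consumes the previous output
    merged: Feature = []
    for out in outputs:
        merged.extend(out)  # channel-wise concatenation
    return merged


# Two interchangeable hypothetical blocks; a real network would use conv stacks.
def block_a(f: Feature) -> Feature:
    return [2.0 * v for v in f]


def block_b(f: Feature) -> Feature:
    return [v + 0.5 * v for v in f]


x = [1.0] * 8  # 8 input channels
light = gelan_style_aggregate(x, [block_a, block_a])
heavier = gelan_style_aggregate(x, [block_b, block_a, block_b])
print(len(light), len(heavier))  # 16 20
```

Swapping or adding blocks changes only cost and the number of concatenated channels, which is why the design can be tuned per inference device.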

YOLOv9 GELAN Architecture

Combining PGI and GELAN, the authors designed YOLOv9. They ran their experiments on the MS COCO dataset, and the results showed that YOLOv9 achieved the top performance in all compared cases.

Research Contributions

A theoretical analysis of existing deep neural network architectures from the perspective of reversible functions. Based on this analysis, the authors designed PGI and auxiliary reversible branches and achieved excellent results.
PGI solves the problem that deep supervision can only be used with extremely deep neural network architectures, which allows new lightweight architectures to be applied in everyday use.
GELAN uses only conventional convolution, yet achieves higher parameter utilization than designs based on depth-wise convolution, which makes it lightweight, fast, and accurate.
Combining PGI and GELAN, YOLOv9 significantly surpasses existing real-time object detectors on the MS COCO dataset in all aspects.

Performance of YOLOv9 against other object detection models on COCO dataset

YOLOv9 License

YOLOv9 was initially released without an official license. In the following days, however, WongKinYiu added a GPL-3.0 license. Both YOLOv7 and YOLOv9 have been released under WongKinYiu’s GitHub repository.

Advantages of YOLOv9

YOLOv9 emerges as a powerful model whose innovations are likely to play an important role in the further development of object detection, and possibly image segmentation and classification down the road. Its advantages include:

Handling the information bottleneck and adapting deep supervision to lightweight neural network architectures by introducing Programmable Gradient Information (PGI).

Creating GELAN, a practical and effective neural network. GELAN has shown strong and stable performance in object detection with different computational blocks and depth settings, which makes it suitable for a wide range of inference configurations.
By combining PGI and GELAN, YOLOv9 has shown strong competitiveness. Compared with YOLOv8, its design reduces the number of parameters by 49% and the number of calculations by 43%, while still improving Average Precision by 0.6% on the MS COCO dataset.
According to the authors, YOLOv9 outperforms RT-DETR and YOLO-MS in both accuracy and efficiency. It sets new standards in lightweight model performance by using conventional convolution for better parameter utilization.

Model	#Param. (M)	FLOPs (G)	AP50:95 (val)	APS (val)	APM (val)	APL (val)
YOLOv7 [63]	36.9	104.7	51.2%	31.8%	55.5%	65.0%
+ AF [63]	43.6	130.5	53.0%	35.8%	58.7%	68.9%
+ GELAN	41.7	127.9	53.2%	36.2%	58.5%	69.9%
+ DHLC [34]	58.1	192.5	55.0%	38.0%	60.6%	70.9%
+ PGI	58.1	192.5	55.6%	40.2%	61.0%	71.4%

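The table above shows how average precision (AP) on the MS COCO validation set changes as components are added one at a time to a YOLOv7 baseline: AF [63], GELAN, DHLC [34], and finally PGI. Parameters are given in millions and FLOPs in billions; APS, APM, and APL are the AP on small, medium, and large objects. The PGI row has the same parameters and FLOPs as the row before it, consistent with PGI adding no inference cost.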
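The step-by-step effect of each component can be read off the table with a little arithmetic. The sketch below copies the parameter, FLOPs, and AP50:95 values from the table and prints the change introduced by each added component; nothing beyond the table's numbers is assumed.

```python
from typing import List, Tuple

# (step, params in M, FLOPs in G, AP50:95 val in %), copied from the table
rows: List[Tuple[str, float, float, float]] = [
    ("YOLOv7", 36.9, 104.7, 51.2),
    ("+ AF", 43.6, 130.5, 53.0),
    ("+ GELAN", 41.7, 127.9, 53.2),
    ("+ DHLC", 58.1, 192.5, 55.0),
    ("+ PGI", 58.1, 192.5, 55.6),
]


def step_deltas(rows: List[Tuple[str, float, float, float]]) -> List[Tuple[str, float, float, float]]:
    """Change in params, FLOPs, and AP caused by each step relative to the previous row."""
    deltas = []
    for (_, p0, f0, ap0), (name, p1, f1, ap1) in zip(rows, rows[1:]):
        # round() hides floating-point noise such as 0.6000000000000014
        deltas.append((name, round(p1 - p0, 1), round(f1 - f0, 1), round(ap1 - ap0, 1)))
    return deltas


for name, d_params, d_flops, d_ap in step_deltas(rows):
    print(f"{name:8s} params {d_params:+.1f}M  FLOPs {d_flops:+.1f}G  AP {d_ap:+.1f}")
```

Two rows stand out: GELAN lowers parameters and FLOPs relative to the AF row while slightly raising AP, and PGI adds 0.6 AP with no change in parameters or FLOPs.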

YOLOv9 Applications

YOLOv9 is a flexible computer vision model that you can use in different real-world applications. Here we suggest a few popular use cases.

YOLOv9 object detection for detecting customers in check-out queues

Logistics and distribution: Object detection can assist in estimating product inventory levels to ensure sufficient stock levels and provide information regarding consumer behavior.
Autonomous vehicles: Self-driving cars can use YOLOv9 object detection to help them navigate roads safely.
People counting: Retailers and shopping malls can train the model to measure foot traffic in their shops in real time, detect queue lengths, and more.
Sports analytics: Analysts can use the model to track player movements on a sports field and gather insights into team performance.
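For the people-counting and queue-length use cases, the detector's output still has to be turned into a count. A minimal sketch, assuming detections have already been produced by a YOLOv9 model and converted into a simple (label, confidence, box) record; the `Detection` type, the zone, and the sample detections are hypothetical and not the output format of any particular library.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

BoxXYXY = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    label: str
    confidence: float
    box: BoxXYXY


def count_in_zone(detections: Iterable[Detection], zone: BoxXYXY,
                  label: str = "person", min_conf: float = 0.5) -> int:
    """Count detections of `label` whose box centre lies inside `zone`."""
    zx1, zy1, zx2, zy2 = zone
    count = 0
    for d in detections:
        if d.label != label or d.confidence < min_conf:
            continue  # wrong class or too uncertain
        cx = (d.box[0] + d.box[2]) / 2
        cy = (d.box[1] + d.box[3]) / 2
        if zx1 <= cx <= zx2 and zy1 <= cy <= zy2:
            count += 1
    return count


# Hypothetical detections for one frame; the zone marks a checkout queue.
zone = (0.0, 0.0, 200.0, 400.0)
detections = [
    Detection("person", 0.90, (10, 10, 50, 100)),    # in the queue
    Detection("person", 0.80, (100, 50, 150, 200)),  # in the queue
    Detection("person", 0.30, (20, 20, 60, 90)),     # too uncertain
    Detection("person", 0.95, (300, 10, 350, 100)),  # outside the zone
    Detection("cart", 0.90, (50, 300, 120, 380)),    # not a person
]
print(count_in_zone(detections, zone))  # 2
```

Using the box centre rather than full containment is one simple design choice; it counts people who are only partly inside the queue area.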

Street view detection with YOLOv9

YOLOv9: Main Takeaways

YOLO models are a standard in the object detection space thanks to their strong performance and wide applicability. Here are our first conclusions about YOLOv9:

Ease of use: YOLOv9 is already available on GitHub, so users can get started quickly through the command line or from Python.
YOLOv9 tasks: YOLOv9 is efficient for real-time object detection with improved accuracy and speed.
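YOLOv9 improvements: YOLOv9’s main improvements are PGI and GELAN. Beyond these, it builds on practices also used in YOLOv8, such as a decoupled head with anchor-free detection and mosaic data augmentation that is turned off in the last training epochs (the last ten by default in YOLOv8).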
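Turning mosaic augmentation off for the final ten training epochs amounts to a simple epoch check. The helper below is hypothetical; Ultralytics exposes a comparable `close_mosaic` training setting, but this function is not its implementation.

```python
def mosaic_enabled(epoch: int, total_epochs: int, close_last: int = 10) -> bool:
    """Return True if mosaic augmentation should be applied at `epoch` (0-based).

    Mosaic stays on for all but the final `close_last` epochs.
    """
    return epoch < total_epochs - close_last


schedule = [mosaic_enabled(e, total_epochs=100) for e in range(100)]
print(schedule.count(True), schedule.count(False))  # 90 10
print(schedule.index(False))                        # 90, the first epoch without mosaic
```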

In the future, we look forward to seeing whether the creators expand YOLOv9’s capabilities to a wider range of computer vision tasks.

ProX PC is an end-to-end platform for computer vision. ProX PC offers a host of pre-trained models to choose from, as well as the option to import or train your own custom AI models. To learn how you can solve your industry’s challenges with computer vision, book a demo of ProX PC.
