Shape-based Single Stage Deep Neural Network for Traffic Sign Applications
Keywords:
Object detection, Deep learning, YOLO, Artificial intelligence, Traffic sign detection and recognition
Abstract
Traffic sign detection and recognition, which aims to detect and recognize street signs in real time, is a core component of advanced driver assistance systems (ADAS). However, real-world applications demand high precision and high recall at real-time speed, which makes traffic sign identification difficult. These difficulties stem from the small size of the signs and from class imbalance. Recently, researchers have proposed several methods to improve detection quality, including attention mechanisms, spatial enhancement of small objects, and feature enrichment with multiscale networks. Class imbalance has been addressed with different loss functions and cascaded networks. However, these techniques make the resulting models more complicated. Single-stage networks such as YOLO are also affected by the imbalance, which reduces recall for tiny objects. We propose a new training method for a one-stage detection network, called the Real Time-Shape Deep Neural Network (Real Time-Shape DNN). The method extends the YOLO detection head to predict four primary outputs: objectness, regression, class, and shape. We add an additional parameter to the loss and to Non-Maximum Suppression (NMS) to reduce the number of classes, and we train the network jointly on classes and shapes. We validate the proposed method on the German Traffic Sign Detection Benchmark (GTSDB). The results show that, for YOLOv4, our method increases average precision (AP) from 69.49% to 76.12% and recall from 88.07% to 99.30%. For YOLOv4-tiny-3l, it increases AP from 46.45% to 49.44% and recall from 84.95% to 97.73%. These gains are obtained without increasing the complexity of the base network.
On the GTSDB dataset, the proposed method outperforms the baseline in both recall and precision.
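
As a minimal illustrative sketch, and not the authors' implementation, the extended detection head described above can be pictured as a per-anchor prediction vector that carries shape scores next to the usual box, objectness, and class outputs, with a loss that sums a class term and a shape term. The vector layout, the class and shape counts, the function names, the `shape_weight` factor, and the use of softmax cross-entropy are all assumptions made for illustration. The abstract does not specify any of them.

```python
import math

# Hypothetical sizes for illustration only; the abstract does not state them.
NUM_CLASSES = 4
NUM_SHAPES = 3


def split_prediction(vec, num_classes=NUM_CLASSES, num_shapes=NUM_SHAPES):
    """Split one per-anchor prediction into its four groups.

    Assumed layout: [tx, ty, tw, th, objectness, class logits..., shape logits...].
    """
    expected = 4 + 1 + num_classes + num_shapes
    if len(vec) != expected:
        raise ValueError(f"expected {expected} values, got {len(vec)}")
    return {
        "box": vec[0:4],
        "objectness": vec[4],
        "class_logits": vec[5:5 + num_classes],
        "shape_logits": vec[5 + num_classes:],
    }


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def joint_class_shape_loss(pred, class_idx, shape_idx, shape_weight=1.0):
    """Sum of a class term and a weighted shape term (assumed form of joint training)."""
    class_term = cross_entropy(pred["class_logits"], class_idx)
    shape_term = cross_entropy(pred["shape_logits"], shape_idx)
    return class_term + shape_weight * shape_term


if __name__ == "__main__":
    # All-zero logits give uniform predictions over classes and shapes.
    pred = split_prediction([0.0] * (5 + NUM_CLASSES + NUM_SHAPES))
    print(joint_class_shape_loss(pred, class_idx=0, shape_idx=0))
```

In this sketch the shape branch only adds a few extra output channels per anchor and one extra loss term, which is one way to read the claim that the base network's complexity does not increase. Box regression and objectness losses are omitted for brevity.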