Machine Learning for the Visually Impaired: Benchmarking Object Detection Models

Authors

  • Aditya Patra, California High School
  • Mrs. Crandall, California High School

DOI:

https://doi.org/10.47611/jsrhs.v13i2.6630

Keywords:

Machine Learning, Object Detection

Abstract

This paper benchmarks object detection models, a form of machine learning, to determine which algorithm would best help visually impaired people locate everyday objects. Most benchmarking experiments test object detection models on a wide variety of objects, so finding the most suitable algorithm requires testing the models on images of the objects most relevant to this use. The models are tested on still images from the COCO dataset. To simulate real-life scenarios, the objects in these images may be partially hidden or at a distance. Pretrained models employing five of the most popular object detection algorithms process the images to measure each model’s detection accuracy. For each image, a model returns a list of detections giving the name, confidence rating, and location of each detected object. These results are filtered to remove detections with low confidence ratings and detections of irrelevant objects. The remaining detections are compared with the ground-truth object names and locations provided with the COCO dataset to calculate the distance between each predicted location and the true location. The algorithms are ranked by the number of failed detections, the time taken to analyze each image, and the accuracy of each object’s predicted location.
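
The filtering and comparison steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 0.5 confidence threshold, the example label names, the greedy nearest-center matching rule, and every name in the code (Detection, filter_detections, evaluate_image) are assumptions made for illustration. Boxes are assumed to be already converted to the COCO annotation layout of (x, y, width, height) in pixels.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, width, height), COCO-style


@dataclass
class Detection:
    """One model output: hypothetical container, not a library type."""
    label: str
    score: float
    box: Box


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of an (x, y, width, height) box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centers of two boxes."""
    (ax, ay), (bx, by) = center(a), center(b)
    return hypot(ax - bx, ay - by)


def filter_detections(detections: List[Detection],
                      relevant_labels: Set[str],
                      min_score: float = 0.5) -> List[Detection]:
    """Drop low-confidence detections and detections of irrelevant objects.

    The 0.5 default is an assumed threshold, not a value from the paper.
    """
    return [d for d in detections
            if d.score >= min_score and d.label in relevant_labels]


def evaluate_image(detections: List[Detection],
                   ground_truth: List[Tuple[str, Box]],
                   relevant_labels: Set[str],
                   min_score: float = 0.5) -> Tuple[List[float], int]:
    """Return (center distances for matched objects, count of missed objects).

    Assumed matching rule: each relevant ground-truth object is paired with
    the closest unused detection that has the same label.
    """
    unused = filter_detections(detections, relevant_labels, min_score)
    distances: List[float] = []
    missed = 0
    for label, gt_box in ground_truth:
        if label not in relevant_labels:
            continue
        candidates = [d for d in unused if d.label == label]
        if not candidates:
            missed += 1  # a failed detection for this object
            continue
        best = min(candidates, key=lambda d: center_distance(d.box, gt_box))
        distances.append(center_distance(best.box, gt_box))
        unused.remove(best)
    return distances, missed
```

Under these assumptions, the per-image distances and missed counts could be aggregated per model, together with a measured inference time, to produce the kind of ranking the abstract describes.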

Published

05-31-2024

How to Cite

Patra, A., & Crandall, R. (2024). Machine Learning for the Visually Impaired: Benchmarking Object Detection Models. Journal of Student Research, 13(2). https://doi.org/10.47611/jsrhs.v13i2.6630

Section

HS Research Articles