Depth vs. Complexity: A Comparative Study of Neural Network Architectures in Image Classification

Authors

  • Mihir Kulgod, The Shri Ram School Aravali

DOI:

https://doi.org/10.47611/jsrhs.v13i3.7237

Keywords:

Artificial Intelligence, Neural Networks, Machine Learning, Image Classification, Image Recognition, Recognition Algorithm

Abstract

There is a growing need for image classification algorithms across a wide range of fields, including medical imaging, autonomous vehicles, and surveillance. To streamline the design of such algorithms, one must be aware of the strengths and drawbacks of existing models. This paper investigates the performance of various image classification algorithms, focusing on the relationship between model depth and complexity and their effect on accuracy. The study uses three datasets - MNIST, Fashion-MNIST, and CIFAR-10 - to conduct a comprehensive analysis of six distinct image classification architectures. A discernible accuracy gradient emerges as model complexity increases, from standard Multilayer Perceptrons (MLPs) to a Vision Transformer (ViT). Training a ViT requires large amounts of computational resources, yet the investment is justified by the remarkable accuracy it achieves. However, it is always more efficient to use a model that fits the scale of the data: no single model is best for every dataset, and data complexity plays a vital role in determining the optimal architecture.
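
The abstract does not include the paper's architectures or training code. As a rough illustration of the depth-versus-complexity gradient it describes, the following sketch (written in Python and assuming PyTorch/torchvision as tooling, which the paper does not specify) compares parameter counts of a plain MLP, a small CNN, and an off-the-shelf Vision Transformer (ViT-B/16). The model definitions and the count_params helper are illustrative assumptions, not the study's actual models.

# Illustrative sketch only: the paper's exact architectures and training
# setup are not given here, so these models are stand-ins chosen to show
# the scale gradient from MLP to ViT. Requires torch and torchvision.
import torch.nn as nn
from torchvision.models import vit_b_16

def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Baseline MLP for 28x28 grayscale inputs (MNIST / Fashion-MNIST scale).
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Small convolutional network for 32x32 RGB inputs (CIFAR-10 scale).
cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Off-the-shelf Vision Transformer (Dosovitskiy et al., 2020), roughly 86M parameters.
vit = vit_b_16(weights=None)

for name, model in [("MLP", mlp), ("CNN", cnn), ("ViT-B/16", vit)]:
    print(f"{name}: {count_params(model):,} parameters")

Running this prints counts on the order of a few hundred thousand parameters for the MLP, about a million for the small CNN, and tens of millions for ViT-B/16. Parameter count is only one proxy for complexity, but even this crude comparison suggests why a ViT's accuracy gains come at a much larger computational cost than the simpler baselines.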

References or Bibliography

Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. https://doi.org/10.1109/5.726791

Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv. https://arxiv.org/abs/1409.1556

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv. https://doi.org/10.48550/arXiv.1512.03385

Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2016). Densely Connected Convolutional Networks. arXiv. https://doi.org/10.48550/arXiv.1608.06993

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv. https://arxiv.org/abs/2010.11929

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830. https://hal.inria.fr/hal-00650905

Published

08-31-2024

How to Cite

Kulgod, M. (2024). Depth vs. Complexity: A Comparative Study of Neural Network Architectures in Image Classification. Journal of Student Research, 13(3). https://doi.org/10.47611/jsrhs.v13i3.7237

Issue

Vol. 13 No. 3 (2024)

Section

HS Research Projects