Depth vs. Complexity: A Comparative Study of Neural Network Architectures in Image Classification
DOI: https://doi.org/10.47611/jsrhs.v13i3.7237

Keywords: Artificial Intelligence, Neural Networks, Machine Learning, Image Classification, Image Recognition, Recognition Algorithm

Abstract
Image classification algorithms are increasingly required in fields such as medical imaging, autonomous vehicles, and surveillance. To streamline the design of such algorithms, one must be aware of the strengths and drawbacks of existing models. This paper investigates the performance of various image classification algorithms, focusing on the interplay between model depth and complexity and its effect on accuracy. The study uses three datasets (MNIST, Fashion MNIST, and CIFAR-10) to conduct a comprehensive analysis of six distinct image classification architectures. A discernible accuracy gradient emerges as model complexity increases, from standard Multilayer Perceptrons (MLPs) to a Vision Transformer (ViT). Training a ViT requires substantial computational resources, yet the investment is justified by the remarkable accuracy it achieves. Even so, it is always more efficient to use a model that fits the scale of the data: no single model is best for every dataset, and data complexity plays a vital role in determining the optimal architecture.
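The paper's exact training setup is not reproduced on this page, but the kind of baseline comparison the abstract describes, fitting a simple MLP to a small image dataset and measuring test accuracy, can be sketched with scikit-learn (Pedregosa et al., cited below). The `digits` dataset here is an 8x8 stand-in for MNIST, and all hyperparameters are illustrative assumptions, not the study's settings.

```python
# Illustrative sketch only: a small MLP baseline on a toy image dataset,
# of the kind compared against deeper architectures in the study.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Load 8x8 grayscale digit images, flattened to 64-feature vectors.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One hidden layer of 64 units; hyperparameters chosen for illustration.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
mlp.fit(X_train, y_train)
print(f"MLP test accuracy: {mlp.score(X_test, y_test):.3f}")
```

Swapping in a deeper model (or a harder dataset such as CIFAR-10) and repeating the same train/evaluate loop is the essence of the depth-versus-complexity comparison the abstract summarizes.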
References or Bibliography
Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. https://doi.org/10.1109/5.726791
Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1409.1556
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1512.03385
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2016). Densely Connected Convolutional Networks. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1608.06993
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2010.11929
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830. https://hal.inria.fr/hal-00650905
Copyright (c) 2024 Mihir Kulgod

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.