Multi-View Gaze: Leveraging Multi-View Images to Disentangle Features for Accurate Gaze Estimation
DOI: https://doi.org/10.47611/jsrhs.v13i1.6019

Keywords: Gaze Estimation, Convolutional Neural Network, Multi-View Images

Abstract
Gaze estimation is a prominent field within artificial intelligence and machine learning, and it is developing rapidly because of its many practical applications. This development brings challenges, however, such as inaccuracy across different facial features and under external factors such as lighting conditions or camera quality. Prior research introduced the cross-encoder (Sun et al., 2021), a mechanism that disentangles gaze and appearance features from an image and swaps them between paired images. The proposed method takes this one step further by incorporating multi-view images to leverage the disentanglement: multiple simultaneous views of the same subject yield more image pairs whose features can be swapped, producing more accurate, finely tuned gaze estimation. Transfer learning is also incorporated: a pre-trained encoder is carried over to make the training process considerably more efficient. Such a system can be applied in the real world, for example, to control a computer mouse without physical movement or to detect gaze patterns that help diagnose neurodevelopmental disorders such as Attention Deficit Hyperactivity Disorder (ADHD), which can otherwise be difficult to identify in young children. The proposed method produced more accurate results than state-of-the-art mechanisms, achieving an angular error of 7.4 degrees when trained and tested on the EVE dataset (Park et al., 2020).
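To make the core idea concrete, the following is a minimal sketch in PyTorch, not the paper's implementation: the backbone architecture, the latent split sizes (gaze_dim, app_dim), and the swap_and_reconstruct helper are all illustrative assumptions. It shows the cross-encoder-style swap the abstract describes: because two simultaneous camera views capture the same gaze, exchanging the gaze codes between views should still allow each original image to be reconstructed, which pressures the encoder to isolate gaze information in that part of the latent vector.

```python
# Minimal sketch of multi-view gaze/appearance feature swapping.
# Assumptions: 3x32x32 inputs, a toy CNN encoder and linear decoder,
# and a 16/48 split of the latent code. All names are hypothetical.
import torch
import torch.nn as nn

class CrossEncoder(nn.Module):
    def __init__(self, gaze_dim=16, app_dim=48):
        super().__init__()
        self.gaze_dim = gaze_dim
        # Toy backbone: any CNN mapping an eye image to a flat latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, gaze_dim + app_dim),
        )
        # Toy decoder back to a 3x32x32 image.
        self.decoder = nn.Sequential(
            nn.Linear(gaze_dim + app_dim, 3 * 32 * 32),
            nn.Unflatten(1, (3, 32, 32)),
        )

    def split(self, x):
        """Encode an image and split the latent into (gaze, appearance) parts."""
        z = self.encoder(x)
        return z[:, : self.gaze_dim], z[:, self.gaze_dim :]

def swap_and_reconstruct(model, view_a, view_b):
    """Two simultaneous views share the same gaze, so swapping the gaze
    codes between views should still reconstruct each original image."""
    g_a, a_a = model.split(view_a)
    g_b, a_b = model.split(view_b)
    rec_a = model.decoder(torch.cat([g_b, a_a], dim=1))  # gaze from B, appearance from A
    rec_b = model.decoder(torch.cat([g_a, a_b], dim=1))  # gaze from A, appearance from B
    return nn.functional.mse_loss(rec_a, view_a) + nn.functional.mse_loss(rec_b, view_b)

# Toy usage: batches of two 32x32 views captured at the same instant.
model = CrossEncoder()
view_a, view_b = torch.rand(4, 3, 32, 32), torch.rand(4, 3, 32, 32)
loss = swap_and_reconstruct(model, view_a, view_b)
loss.backward()
```

In practice the encoder would be initialized from a network pre-trained on another task rather than trained from scratch, which is the transfer-learning step the abstract mentions.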
References
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
Kellnhofer, P., Recasens, A., Stent, S., Matusik, W., & Torralba, A. (2019). Gaze360: Physically unconstrained gaze estimation in the wild. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6912-6921).
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Krafka, K., Khosla, A., Kellnhofer, P., Kannan, H., Bhandarkar, S., Matusik, W., & Torralba, A. (2016). Eye tracking for everyone. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2176-2184).
Park, S., Aksan, E., Zhang, X., & Hilliges, O. (2020). Towards end-to-end video-based eye-tracking. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII 16 (pp. 747-763). Springer International Publishing.
Park, S., Mello, S. D., Molchanov, P., Iqbal, U., Hilliges, O., & Kautz, J. (2019). Few-shot adaptive gaze estimation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9368-9377).
Sun, Y., Zeng, J., Shan, S., & Chen, X. (2021). Cross-encoder for unsupervised gaze representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3702-3711).
Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2579-2605.
Yu, Y., & Odobez, J. M. (2020). Unsupervised representation learning for gaze estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 7314-7324).
Copyright (c) 2024 Yireh Ban
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.