A Method of Disentanglement of Latent Factor using Geometric Feature for Gaze Estimation Network Training
DOI:
https://doi.org/10.47611/jsrhs.v12i1.4075
Keywords:
Gaze Estimation, Autoencoder, Representation Learning
Abstract
Because the anatomy of the eye differs from person to person, gaze estimation is a challenging task. Although numerous gaze estimation methods have been proposed, their precision must improve before they can be applied to real-world scenarios. To this end, I propose a novel training strategy for gaze representation learning. The strategy has two phases: an autoencoder-based representation learning phase and a gaze estimation network training phase. It forces the model to disentangle the gaze-related latent code, which yields more accurate gaze estimation. I also present a real-world application built on the proposed method to demonstrate its practicality. Experiments on the Gaze360 dataset show that the proposed method performs strongly compared to other methods.
References or Bibliography
Cheng, Y., Bao, Y., & Lu, F. (2022, June). PureGaze: Purifying gaze feature for generalizable gaze estimation. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 1, pp. 436-443).
Sun, Y., Zeng, J., Shan, S., & Chen, X. (2021). Cross-encoder for unsupervised gaze representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3702-3711).
Gideon, J., Su, S., & Stent, S. (2022). Unsupervised multi-view gaze representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5001-5009).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Park, S. C., Park, M. K., & Kang, M. G. (2003). Super-resolution image reconstruction: A technical overview. IEEE Signal Processing Magazine, 20(3), 21-36.
Kellnhofer, P., Recasens, A., Stent, S., Matusik, W., & Torralba, A. (2019). Gaze360: Physically unconstrained gaze estimation in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 6912-6921).
Liu, Y., Liu, R., Wang, H., & Lu, F. (2021). Generalizing gaze estimation with outlier-guided collaborative adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3835-3844).
Copyright (c) 2023 Seung-woo Ko; Bo Kyoung Park
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.