Disentanglement of Latent Factors of Real and Fake Appearance for Deepfake Face Manipulation Detection
DOI: https://doi.org/10.47611/jsrhs.v12i1.4076
Keywords: Deepfake, representation learning, classification
Abstract
A deepfake video is a video in which generative models are used to alter a subject's facial features so that the subject appears to be a different person. Such content has positive uses, including entertainment. However, deepfake videos are also easy to exploit for harmful purposes, such as spreading fake news or creating unwanted content. Numerous attempts have therefore been made to detect whether a video has been manipulated with deepfake technology, so as to prevent further harm. Previous approaches have tried to detect discrepancies between video frames, for example by exploiting temporal consistency across frames with convolutional neural networks. Although these approaches have produced adequate results, their accuracy is insufficient for real-world use. In this paper, we propose a novel method that uses a convolutional neural network based autoencoder to detect whether a video is pristine or a deepfake. Our method successfully disentangles latent factors of real and fake appearance to increase classification accuracy while maintaining relatively low time complexity, which enhances its real-world applicability. Results from extensive experimentation show a significant improvement over state-of-the-art methods, by upwards of 18.51%.
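The abstract does not spell out the architecture, so the following is only a rough sketch of what "disentangling latent factors of real and fake appearance" could look like, not the authors' implementation. It assumes that the latent code is split into two partitions and that a real/fake classifier reads only one of them. For brevity it uses dense layers in NumPy rather than convolutional ones. All names and sizes (`FRAME_DIM`, `LATENT_DIM`, `FAKE_DIM`, `init_params`, `forward`, `loss`, `alpha`) are hypothetical.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
FRAME_DIM = 64 * 64   # one flattened grayscale frame
LATENT_DIM = 32       # total latent size
FAKE_DIM = 8          # latent slots assumed to carry "fake appearance" factors


def init_params(seed=0):
    """Small random weights for a one-layer encoder, decoder, and classifier."""
    rng = np.random.default_rng(seed)
    scale = 0.01
    return {
        "enc": rng.normal(0.0, scale, (FRAME_DIM, LATENT_DIM)),
        "dec": rng.normal(0.0, scale, (LATENT_DIM, FRAME_DIM)),
        "clf": rng.normal(0.0, scale, (FAKE_DIM,)),
    }


def forward(params, frames):
    """Encode frames, reconstruct them, and score the 'fake' latent partition."""
    z = np.tanh(frames @ params["enc"])       # full latent code
    z_fake = z[:, :FAKE_DIM]                  # assumed "fake appearance" partition
    recon = z @ params["dec"]                 # reconstruction uses the whole code
    logits = z_fake @ params["clf"]           # classifier sees only z_fake
    p_fake = 1.0 / (1.0 + np.exp(-logits))    # probability that a frame is fake
    return recon, p_fake


def loss(params, frames, labels, alpha=1.0):
    """Reconstruction error plus binary cross-entropy on the real/fake label."""
    recon, p_fake = forward(params, frames)
    recon_loss = np.mean((recon - frames) ** 2)
    eps = 1e-7  # avoids log(0)
    bce = -np.mean(labels * np.log(p_fake + eps)
                   + (1.0 - labels) * np.log(1.0 - p_fake + eps))
    return recon_loss + alpha * bce
```

In this sketch the reconstruction term pushes the full code to retain frame content, while the classification term pushes the real/fake evidence into the first `FAKE_DIM` slots. Whether the paper uses this particular split, or this loss, is an assumption.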
Copyright (c) 2023 Suh-Yoon Hong, Dayul Park; GeunJung Yi
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.