Cross-Lingual Data Augmentation Techniques: Insights from Multilingual Back Translation
DOI:
https://doi.org/10.47611/jsrhs.v13i3.7305
Keywords:
data augmentation, natural language processing, back translation, training datasets, multilingual back translation
Abstract
This paper investigates the effectiveness of utilizing multiple chains of back translation compared to the traditional method of single-chain back translation for enhancing data diversity in natural language processing (NLP). We explore how multiple rounds of translation and back translation across different languages contribute to enriching the training dataset with diverse linguistic variations. We evaluate the effectiveness of multilingual back translation in achieving better data diversity by reporting the BLEU scores of different back translation techniques. Additionally, we investigate the impact of using languages from different language families and the resulting effect on the diversity of data. Our findings highlight the importance of leveraging multiple chains and multiple language families of back translation for augmenting datasets and provide insights for future research and advancement in data augmentation techniques for NLP.
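The multi-chain procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `back_translate`, `simple_bleu`, and `Translator` are hypothetical, the translation model is left as a caller-supplied function, and the BLEU routine is a simplified, unsmoothed sentence-level variant with whitespace tokenization. It also assumes that a lower BLEU score against the original sentence indicates greater lexical divergence, which is one plausible reading of how BLEU is used to gauge diversity here.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
# Any machine translation system could be wrapped to fit this shape.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, chain: Sequence[str], translate: Translator,
                   source_lang: str = "en") -> str:
    """Translate through each pivot language in `chain`, then back to the source."""
    current, current_lang = text, source_lang
    for pivot in chain:
        current = translate(current, current_lang, pivot)
        current_lang = pivot
    return translate(current, current_lang, source_lang)


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference: str, candidate: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU (simplified; assumes whitespace tokens)."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

A single-chain run would pass one pivot, for example `chain=["fr"]`, while a multi-chain run would pass several, for example `chain=["fr", "ja"]`. Comparing `simple_bleu(original, augmented)` across the two gives a rough sense of how far each variant drifts from the source sentence. The pivot languages named here are placeholders, not the ones used in the paper.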
References or Bibliography
Hayashi, T., Watanabe, S., Zhang, Y., Toda, T., Hori, T., Astudillo, R., & Takeda, K. (2018, December). Back-translation-style data augmentation for end-to-end ASR. In 2018 IEEE Spoken Language Technology Workshop (SLT) (pp. 426-433). IEEE.
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 1631-1642).
Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318).
Copyright (c) 2024 Harini Champooranan; Dr. Solomon Ubani

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.