A Comparative Evaluation of Transformers in Seq2Seq Code Mutation: Non-Pre-trained Vs. Pre-trained Variants

Zheung Yik Loh; Wan Mohd Nasir Wan Kadir; Noraini Ibrahim

doi:10.37934/ard.123.1.4565

Authors

Zheung Yik Loh, Department of Software Engineering, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
Wan Mohd Nasir Wan Kadir, Department of Software Engineering, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
Noraini Ibrahim, Department of Software Engineering, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

DOI:

https://doi.org/10.37934/ard.123.1.4565

Keywords:

Mutation testing, transformer, seq2seq, code mutation, mutant, trivial mutant

Abstract

Mutation testing (MT) is a gold-standard technique for assessing the efficacy of software test suites. However, the accuracy of the mutation score is affected by the presence of trivial mutants, which can be “killed” by even the most basic test suites. Because trivial mutants arise from the fixed set of mutation operators that constrains the complexity of code mutations, a state-of-the-art recurrent neural network (RNN) model has been used for sequence-to-sequence (seq2seq) code mutation without relying on mutation operators. However, the quality of the produced mutants is limited by the RNN's difficulty in interpreting the relationships between far-apart tokens in the code to be mutated. Transformers, which do not have this limitation, have superseded RNNs in seq2seq machine translation domains such as natural language processing (NLP). To the best of our knowledge, however, no research has yet investigated the performance of transformers in seq2seq code mutation. This paper presents a comparative study of different variants of non-pre-trained transformers, transformers pre-trained on source code, transformers pre-trained on natural language, and the state-of-the-art RNN model in seq2seq code mutation. The results show that transformers pre-trained on source code, especially CodeT5, performed best, achieving an average character n-gram F-score (CHRF) of 82.89 and superior code mutation complexity. Since the performance of transformers in seq2seq code mutation has not been previously investigated, the primary contribution of this paper is the identification of the best-performing transformer for seq2seq code mutation. It establishes a foundation for future research on an integrated solution that addresses both the high-cost problem and the inaccurate mutation score problem of MT simultaneously, unlike existing solutions, which tackle only one of these problems and give rise to others.
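The abstract reports results as a character n-gram F-score (CHRF). As a rough illustration of what such a score measures, the following is a minimal sketch, not the authors' evaluation code. It assumes a simplified variant of the metric: whitespace is ignored, precision and recall are averaged over character n-gram orders 1 to 6, and they are combined with beta = 2 on a 0 to 100 scale. The function names and the example Java strings are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring whitespace (a simplifying assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF on a 0-100 scale (hypothetical helper, not a library API)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


# Invented example: a reference mutant and a model-generated mutant.
reference_mutant = "if (a > b) return a;"
generated_mutant = "if (a >= b) return a;"
score = chrf(generated_mutant, reference_mutant)
print(f"chrF = {score:.2f}")
```

Published scores are typically computed with an established implementation, whose handling of whitespace and n-gram orders may differ from this sketch, so values from this code should not be compared directly with the 82.89 reported above.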