April 6, 2021

The growing threat of cross-language plagiarism and the challenges in its detection

What is cross-language plagiarism?

Cross-language plagiarism refers to the kind of plagiarism or cheating where the source content is in one language while the plagiarised content is in another. In other words, it is plagiarism by translation.

In recent years, this kind of plagiarism has been on the rise, driven by free and easy access to online resources and free-to-use translation tools. It has become increasingly simple for students, and people in general, to access content from anywhere in the world, irrespective of language, and reproduce it in a different language using translation tools.

Additionally, the growing number of multilingual people has contributed to the increase in cross-language plagiarism, as many find it easy to read and research in one language while writing in another.

Research by Chris Park (2003), Stevens and Stevens (1987), Davis et al. (1992), Love and Simmons (1998), Silverman (2002) and Straw (2002) points to a variety of factors that can lead to plagiarism, including a lack of understanding of what constitutes plagiarism, growing competition, poor time management, inadequate writing skills and a lack of deterrence.

Can cross-language plagiarism be detected?

Cross-language plagiarism detection has largely been an unexplored domain. One of the main reasons is that such plagiarism is extremely difficult to detect, because the original text is no longer in the same language as the reproduced text. Hence, traditional similarity or plagiarism detection solutions are unable to correctly identify this type of plagiarism.

However, in light of the growing threat to academic integrity from this type of plagiarism, it has become extremely important to find ways to detect and deter plagiarism by translation.

Researchers have now come up with different methods to estimate whether two texts written in different languages are essentially copies of each other. One example is a model proposed by Barron-Cedeno, which is based on statistical machine translation technology. Another is MLPlag (2008), proposed by Ceska, in which translations are compared at document level. While progress has been made for the larger, more commonly used languages, much remains to be done for less-resourced and remote languages.

Solutions that help detect cross-language plagiarism

While it’s reassuring to see that progress is being made in this sphere, for most of us working in the education sector or in professions where we need to be able to detect plagiarism, what is key is access to a simple and efficient solution that is easy to use and works at the click of a button.

Ouriginal, a pioneer in text-matching and similarity-detection solutions, offers exactly this. Its Cross-Language Text Matching (CLTM) feature uses a proprietary algorithm to identify matching content that has been translated from one language to another. The highly specialized algorithm first identifies segments of text in different languages that appear to be similar. It then identifies the sentences that contain these segments and checks whether the sentences themselves are translations of each other, thus detecting potential matches.
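
For readers curious about what sentence-level matching across languages can look like in practice, here is a minimal Python sketch. It is not Ouriginal's proprietary algorithm, whose details are not public. It is a toy illustration that assumes a tiny, hypothetical Spanish-to-English word list (TOY_LEXICON) and collapses the segment and sentence steps into a single word-overlap score. The function names and the 0.6 threshold are likewise made up for illustration.

```python
import re

# Hypothetical toy Spanish-to-English lexicon (an assumption for this sketch).
# A real system would rely on a large bilingual dictionary or a translation model.
TOY_LEXICON = {
    "el": "the", "gato": "cat", "duerme": "sleeps", "en": "on",
    "sofá": "sofa", "la": "the", "lluvia": "rain", "cae": "falls",
    "sobre": "on", "ciudad": "city",
}


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lowercase a sentence and return its word tokens."""
    return re.findall(r"\w+", sentence.lower())


def gloss(sentence):
    """Replace each source word with its lexicon entry; keep unknown words."""
    return [TOY_LEXICON.get(t, t) for t in tokens(sentence)]


def overlap(a, b):
    """Jaccard similarity between the word sets of two token lists."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def candidate_matches(source_text, suspect_text, threshold=0.6):
    """Return (source sentence, suspect sentence, score) for likely translations."""
    matches = []
    for src in split_sentences(source_text):
        src_gloss = gloss(src)
        for sus in split_sentences(suspect_text):
            score = overlap(src_gloss, tokens(sus))
            if score >= threshold:
                matches.append((src, sus, round(score, 2)))
    return matches


if __name__ == "__main__":
    source = "El gato duerme en el sofá. La lluvia cae sobre la ciudad."
    suspect = "The cat sleeps on the sofa. I like tea."
    for src, sus, score in candidate_matches(source, suspect):
        print(f"{score}: {src!r} <-> {sus!r}")
```

Word-for-word glossing like this breaks down quickly on real text, where word order, inflection and paraphrase all change in translation, which is part of why dedicated detection tools are needed.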

See the example below: