مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Information Journal Paper

Title

A Corpus for Evaluation of Cross Language Text Re-use Detection Systems

Pages

  169-179

Abstract

 In recent years, the availability of documents on the Internet, together with automatic translation systems, has increased plagiarism, especially across languages. Cross-lingual plagiarism occurs when the source or original text is in one language and the plagiarized or re-used text is in another. Various methods for automatic text re-use detection across languages have been developed to assist human experts in analyzing documents for plagiarism cases. Evaluating the performance of these systems and algorithms requires standard evaluation resources. Most earlier studies on constructing cross-lingual plagiarism detection corpora have concentrated on English and other European language pairs and have paid less attention to low-resource languages. In this paper, we investigate a method for constructing an English-Persian cross-language plagiarism detection corpus from parallel bilingual sentences, which are used to artificially generate passages with various degrees of paraphrasing. The plagiarized passages are inserted into topically related English and Persian Wikipedia articles to produce more realistic text documents. The proposed approach can be applied to other less-resourced languages. The compiled corpus was evaluated with both intrinsic and extrinsic methods. On that basis, it can be suitably included in an evaluation framework for assessing cross-language plagiarism detection systems. The corpus is free and publicly available for research purposes.
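
 The insertion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Case`, `insert_passage` and `build_pair` are hypothetical, English is assumed to be the source side and Persian the re-used side, the host articles are assumed to be pre-split into paragraphs, and the paraphrasing step is left out entirely. The sketch only shows how a run of aligned sentences might be placed into two host documents while recording the character offsets that an evaluation framework would need.

```python
import random
from dataclasses import dataclass


@dataclass
class Case:
    """Character spans of one simulated re-use case (hypothetical format)."""
    src_offset: int
    src_length: int
    susp_offset: int
    susp_length: int


def insert_passage(paragraphs, passage, rng):
    """Insert `passage` as a new paragraph at a random position.

    Returns the joined document text and the character offset
    at which the inserted passage starts.
    """
    pos = rng.randint(0, len(paragraphs))
    parts = paragraphs[:pos] + [passage] + paragraphs[pos:]
    # Each preceding paragraph contributes its length plus one "\n".
    offset = sum(len(p) + 1 for p in parts[:pos])
    return "\n".join(parts), offset


def build_pair(parallel, en_host, fa_host, n_sentences, seed=0):
    """Build one (source, suspicious) document pair from aligned sentences.

    `parallel` is a list of (english, persian) sentence pairs.
    Assumption: English is the source side, Persian the re-used side.
    """
    rng = random.Random(seed)
    start = rng.randint(0, len(parallel) - n_sentences)
    chunk = parallel[start:start + n_sentences]
    en_passage = " ".join(en for en, _ in chunk)
    fa_passage = " ".join(fa for _, fa in chunk)
    source_doc, src_off = insert_passage(en_host, en_passage, rng)
    susp_doc, susp_off = insert_passage(fa_host, fa_passage, rng)
    case = Case(src_off, len(en_passage), susp_off, len(fa_passage))
    return source_doc, susp_doc, case
```

 Recording offsets at insertion time means the annotated spans can be checked by slicing the generated documents, which is how a detector's output would later be compared against the ground truth.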

Cite

    APA:

    Mohtaj, Salar, & Asghari, Habibollah. (2022). A Corpus for Evaluation of Cross Language Text Re-use Detection Systems. JOURNAL OF INFORMATION SYSTEMS AND TELECOMMUNICATION (JIST), 10(3 (39)), 169-179. SID. https://sid.ir/paper/991929/en

    Vancouver:

    Mohtaj Salar, Asghari Habibollah. A Corpus for Evaluation of Cross Language Text Re-use Detection Systems. JOURNAL OF INFORMATION SYSTEMS AND TELECOMMUNICATION (JIST) [Internet]. 2022;10(3 (39)):169-179. Available from: https://sid.ir/paper/991929/en

    IEEE:

    Salar Mohtaj and Habibollah Asghari, “A Corpus for Evaluation of Cross Language Text Re-use Detection Systems,” JOURNAL OF INFORMATION SYSTEMS AND TELECOMMUNICATION (JIST), vol. 10, no. 3 (39), pp. 169–179, 2022, [Online]. Available: https://sid.ir/paper/991929/en
