Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Journal Paper Information

Title

Multi-type Obfuscation Corpus for Cross-Lingual Plagiarism Detection

Pages

  59-74

Abstract

In recent years, the wide availability of documents on the Internet has made plagiarism a serious issue in many fields of research. Moreover, machine translation systems facilitate the reuse of textual content across languages, so detecting plagiarism in cross-lingual cases, where the source and target languages differ, is now of great importance. Various methods for automatic detection of text reuse have been developed to help human experts investigate suspicious documents for plagiarism cases. Evaluating the performance of these plagiarism detection systems and algorithms requires plagiarism detection corpora. In this paper, we propose an English-Persian plagiarism detection corpus comprising different types of paraphrasing. The goal is to simulate what humans would do to conceal plagiarized passages after translating the text into the target language. The proposed corpus includes seven types of paraphrasing methods that cover (but are not limited to) all of the obfuscation types used in previous works, combined into one integrated cross-lingual plagiarism detection (CLPD) corpus. To evaluate the corpus, an extrinsic evaluation approach was applied by running a wide variety of plagiarism detection algorithms as downstream tasks on the proposed corpus. The results show that the performance of the algorithms decreases as the obfuscation complexity increases.
