مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Seminar Paper

Paper Information

Title

ExaPPC: a Large-Scale Persian Paraphrase Detection Corpus

Abstract

This paper describes the creation of the Exa Persian Paraphrase Corpus (ExaPPC), a large corpus of monolingual sentence-level paraphrases drawn from different sources. To the best of our knowledge, ExaPPC is the first large-scale paraphrase dataset for Persian paraphrase detection. The corpus contains 2.3M labeled sentence pairs: 1M labeled as paraphrases and 1.3M labeled as non-paraphrases. It was constructed manually and semi-automatically, using techniques such as subtitle alignment and translating an existing parallel English-Persian corpus and a similarity corpus of English tweets. In addition, to enrich the corpus, candidate sentence pairs among tweets were extracted with NLP tools and labeled by two native Persian speakers. Compared with existing corpora, its advantages are the number of sentence pairs, the variation in sentence length, and textual diversity, including both formal and dialogue sentences. Results on the provided test corpus show that ExaPPC achieves 94% accuracy on the paraphrase detection task. The corpus is publicly available.
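As a rough illustration of the figures in the abstract, the sketch below shows one way labeled sentence pairs could be represented and how accuracy on a test set is computed. The `SentencePair` type, the `token_overlap_detector` baseline, and the English example sentences are hypothetical placeholders, not part of ExaPPC; the corpus's actual file format and the model behind the reported 94% are not described on this page.

```python
from typing import Callable, Iterable, NamedTuple


class SentencePair(NamedTuple):
    """One labeled example (hypothetical layout, not the ExaPPC file format)."""
    first: str
    second: str
    is_paraphrase: bool


def accuracy(pairs: Iterable[SentencePair],
             predict: Callable[[str, str], bool]) -> float:
    """Fraction of pairs whose predicted label matches the gold label."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("accuracy is undefined for an empty test set")
    correct = sum(predict(p.first, p.second) == p.is_paraphrase for p in pairs)
    return correct / len(pairs)


def token_overlap_detector(a: str, b: str, threshold: float = 0.5) -> bool:
    """Toy baseline: Jaccard overlap of whitespace tokens.

    This is an assumption for illustration only, not the paper's detector.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return False
    jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    return jaccard >= threshold


# Rounded label counts as stated in the abstract.
PARAPHRASE_PAIRS = 1_000_000
NON_PARAPHRASE_PAIRS = 1_300_000
TOTAL_PAIRS = PARAPHRASE_PAIRS + NON_PARAPHRASE_PAIRS
PARAPHRASE_SHARE = PARAPHRASE_PAIRS / TOTAL_PAIRS

# Made-up English placeholder pairs; the real corpus contains Persian sentences.
toy_test_set = [
    SentencePair("the weather is nice today", "the weather is nice today indeed", True),
    SentencePair("he bought a new car", "she sold her old house", False),
    SentencePair("can you help me", "could you give me a hand", True),
    SentencePair("i like tea", "i like coffee", False),
]

if __name__ == "__main__":
    print(f"total pairs: {TOTAL_PAIRS:,}")
    print(f"paraphrase share: {PARAPHRASE_SHARE:.1%}")
    print(f"toy accuracy: {accuracy(toy_test_set, token_overlap_detector):.2f}")
```

The two pairs the toy baseline gets wrong (a paraphrase with little word overlap and a non-paraphrase with high overlap) suggest why surface overlap alone is a weak signal and why labeled non-paraphrase pairs are useful for training.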

Cite

    APA:

    Sadeghi, Reyhaneh, Karbasi, Hamed, & Akbari, Ahmad. (2022). ExaPPC: a Large-Scale Persian Paraphrase Detection Corpus. International Conference on Web Research. SID. https://sid.ir/paper/949631/en

    Vancouver:

    Sadeghi Reyhaneh, Karbasi Hamed, Akbari Ahmad. ExaPPC: a Large-Scale Persian Paraphrase Detection Corpus. 2022. Available from: https://sid.ir/paper/949631/en

    IEEE:

    Reyhaneh Sadeghi, Hamed Karbasi, and Ahmad Akbari, “ExaPPC: a Large-Scale Persian Paraphrase Detection Corpus,” presented at the International Conference on Web Research, 2022. [Online]. Available: https://sid.ir/paper/949631/en
