Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Journal Paper Information

Title

Recognizing Transliterated English Words in Persian Texts

Pages

  84-92

Abstract

 One of the most important problems in text processing systems is word mismatch. In information retrieval, it limits access to the required information; in the analysis of textual data such as news, it lowers the accuracy of text classification and clustering. If the text-processing engine does not treat similar or related words as carrying the same sense, it may fail to lead the user to the appropriate result. Various statistical techniques have been proposed to bridge this vocabulary gap; e.g., if two words frequently occur in similar contexts, they are assumed to have similar or related meanings. Synonyms and similar words, however, are only one category of related words that statistical approaches are expected to capture. Another category is the pair formed by an original word in one language and its transliteration from another language. Such pairs are common in non-English languages: instead of using the original word of the target language, the writer may borrow the English word and simply transliterate it into the target language. Since this writing style appears in a limited number of texts, transliterated words are less frequent than original words. As a result, available corpus-based techniques are not able to capture their meaning. In this article, we propose two approaches to overcome this problem: (1) neural network-based transliteration, and (2) available machine translation/transliteration tools, such as Google Translate and Behnevis. Our experiments on a dataset provided for this purpose show that the combination of the two approaches can detect English words with 89.39% accuracy.
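 The statistical idea mentioned in the abstract (words that occur in similar contexts have related meanings) can be illustrated with a minimal sketch. This is not the authors' method; it is only a toy demonstration of context-based similarity. The function names `context_vectors` and `cosine`, the window size, and the toy corpus are all assumptions made for illustration. The tokens `rayaneh` and `kampiuter` are Latin-script stand-ins for a native Persian word and a transliterated English word.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy corpus (hypothetical): a native word and a transliterated loanword
# appear in the same context, while an unrelated word does not.
toy_corpus = [
    ["the", "rayaneh", "runs", "software"],
    ["the", "kampiuter", "runs", "software"],
    ["the", "cat", "drinks", "milk"],
]

vectors = context_vectors(toy_corpus, window=2)
sim_related = cosine(vectors["rayaneh"], vectors["kampiuter"])
sim_unrelated = cosine(vectors["rayaneh"], vectors["cat"])
print(f"related pair:   {sim_related:.3f}")
print(f"unrelated pair: {sim_unrelated:.3f}")
```

 In this toy setting the related pair scores higher than the unrelated pair because both words share the same neighbours. With only a handful of occurrences, as the abstract notes for transliterated words, such count vectors become too sparse to be reliable, which is the gap the proposed approaches target.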

Cite

    APA:

    Hoseinmardy, Ali, & Momtazi, Saeedeh. (2020). Recognizing Transliterated English Words in Persian Texts. Journal of Information Systems and Telecommunication (JIST), 8(2 (30)), 84-92. SID. https://sid.ir/paper/332826/en

    Vancouver:

    Hoseinmardy Ali, Momtazi Saeedeh. Recognizing Transliterated English Words in Persian Texts. Journal of Information Systems and Telecommunication (JIST) [Internet]. 2020;8(2 (30)):84-92. Available from: https://sid.ir/paper/332826/en

    IEEE:

    Ali Hoseinmardy and Saeedeh Momtazi, “Recognizing Transliterated English Words in Persian Texts,” Journal of Information Systems and Telecommunication (JIST), vol. 8, no. 2 (30), pp. 84–92, 2020, [Online]. Available: https://sid.ir/paper/332826/en
