Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Information Journal Paper

Title

A Distributed Method for Extracting Persian-English Chunks

Pages

  42-48

Abstract

This research addresses the extraction of Persian-English chunks for machine translation. The main challenge is that extraction must run on a large corpus, so it requires distributed computing together with big data analysis techniques and tools. When translating text, we usually encounter many chunks for which we must find the corresponding chunk in the target language and insert it into the translation; this is done by looking each chunk up in a corpus that contains chunks and their translations. Existing methods perform this operation in a non-distributed way, so they run slowly and cannot use a very large corpus. To overcome this shortcoming, this research presents a distributed method that also takes the distance between the parts of a chunk into account and lemmatizes the words in the corpus. The proposed method extracts all possible chunks from the input sentences in the monolingual corpus and uses the correlation coefficient to translate those chunks with the help of the bilingual corpus. We implemented the proposed algorithm in Spark on a computing cluster with 64 GB of memory and a 24-core processor. The experimental data consisted of a Persian monolingual corpus, an English monolingual corpus, and an English-Persian bilingual corpus, each containing 100,000 sentences. Experimental results show that run time is greatly reduced and that translation quality is also significantly improved.
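The pipeline summarized in the abstract (enumerate candidate chunks, allow a bounded distance between the parts of a chunk, then score source-target chunk pairs with a correlation coefficient) can be illustrated with a small single-process sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not say which correlation coefficient is used, so the phi coefficient over sentence-level co-occurrence is swapped in; the function names, the `max_len` and `max_gap` parameters, and the transliterated toy sentences are hypothetical; tokens are assumed to be lemmatized already; and the Spark distribution step is left out.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def extract_chunks(tokens, max_len=2, max_gap=1):
    """Return the set of candidate chunks found in one sentence.

    A chunk is a tuple of tokens kept in sentence order, where at most
    `max_gap` tokens are skipped between two consecutive chosen tokens.
    """
    chunks = set()
    for size in range(1, max_len + 1):
        for idx in combinations(range(len(tokens)), size):
            if all(b - a - 1 <= max_gap for a, b in zip(idx, idx[1:])):
                chunks.add(tuple(tokens[i] for i in idx))
    return chunks


def phi(n11, n_src, n_tgt, n):
    """Phi coefficient of a 2x2 co-occurrence table.

    n11: sentence pairs containing both chunks; n_src / n_tgt: pairs
    containing the source / target chunk; n: total number of pairs.
    """
    denom = sqrt(n_src * (n - n_src) * n_tgt * (n - n_tgt))
    return 0.0 if denom == 0 else (n * n11 - n_src * n_tgt) / denom


def best_translations(pairs, max_len=2, max_gap=1):
    """Map each source chunk to its highest-scoring target chunk."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in pairs:
        src_chunks = extract_chunks(src_tokens, max_len, max_gap)
        tgt_chunks = extract_chunks(tgt_tokens, max_len, max_gap)
        src_count.update(src_chunks)
        tgt_count.update(tgt_chunks)
        joint.update((s, t) for s in src_chunks for t in tgt_chunks)
    best = {}
    for (s, t), n11 in joint.items():
        score = phi(n11, src_count[s], tgt_count[t], len(pairs))
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best


# Hypothetical toy bilingual corpus (English tokens, transliterated Persian).
pairs = [
    (["red", "book"], ["ketab", "ghermez"]),
    (["red", "car"], ["mashin", "ghermez"]),
    (["blue", "book"], ["ketab", "abi"]),
    (["blue", "car"], ["mashin", "abi"]),
]
best = best_translations(pairs)
gapped = extract_chunks(["turn", "the", "light", "off"], max_len=2, max_gap=2)
```

The counting loop in `best_translations` only accumulates per-sentence counts, which is the kind of work a Spark job could plausibly spread across a cluster (for example with `flatMap` followed by `reduceByKey` on an RDD); how the paper actually partitions the computation is not stated in the abstract.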

  • Cite

    APA:

    Mirmobin, S.S., Ghasemzadeh, M., & Nezarat, A. (2020). A Distributed Method for Extracting Persian-English Chunks. NASHRIYYAH -I MUHANDISI -I BARQ VA MUHANDISI -I KAMPYUTAR -I IRAN, B- MUHANDISI -I KAMPYUTAR, 18(1), 42-48. SID. https://sid.ir/paper/228425/en

    Vancouver:

    Mirmobin S.S., Ghasemzadeh M., Nezarat A. A Distributed Method for Extracting Persian-English Chunks. NASHRIYYAH -I MUHANDISI -I BARQ VA MUHANDISI -I KAMPYUTAR -I IRAN, B- MUHANDISI -I KAMPYUTAR [Internet]. 2020;18(1):42-48. Available from: https://sid.ir/paper/228425/en

    IEEE:

    S.S. Mirmobin, M. Ghasemzadeh, and A. Nezarat, “A Distributed Method for Extracting Persian-English Chunks,” NASHRIYYAH -I MUHANDISI -I BARQ VA MUHANDISI -I KAMPYUTAR -I IRAN, B- MUHANDISI -I KAMPYUTAR, vol. 18, no. 1, pp. 42–48, 2020, [Online]. Available: https://sid.ir/paper/228425/en
