Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

View: 736

Download: 182

Information Journal Paper

Title

COMPARING K-MEANS CLUSTERS ON PARALLEL PERSIAN-ENGLISH CORPUS

Pages

  203-208

Keywords

PRINCIPAL COMPONENT ANALYSIS (PCA)

Abstract

 This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in natural language processing, and much research has been done on clustering English documents. The question arises whether those results extend to other languages. Since the goal of document clustering is to group documents by their content, one would expect the answer to be yes. On the other hand, the many differences between languages could make the answer no. This research focuses on k-means, one of the basic and popular document clustering methods. We want to know whether the clusters of aligned Persian and English texts obtained by k-means are similar. To answer this question, the Mizan English-Persian Parallel Corpus was used as the benchmark. After feature extraction using text mining techniques and dimensionality reduction with PCA, k-means clustering was performed. The morphological difference between English and Persian led to a longer feature vector for Persian. As a result, in almost all experiments the English results were slightly richer than the Persian ones. Aside from these differences, the overall behavior of the Persian and English clusters was similar. This similarity suggests that the results of k-means research on English can be extended to Persian. Finally, there is hope that, despite the many differences between languages, clustering methods may be extendable to other languages.
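The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents are invented, transliterated stand-ins (not taken from the Mizan corpus), the PCA step is omitted for brevity, the k-means initialisation is a deterministic farthest-first choice rather than the usual random one, and the Rand index is an assumed agreement measure, since the abstract does not name the one used. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def bag_of_words(docs):
    """Turn whitespace-tokenised documents into term-count vectors."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    return [[float(Counter(doc.split())[t]) for t in vocab] for doc in docs]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def farthest_first(vectors, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first vector, then repeatedly add the vector farthest from all seeds."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, iters=100):
    """Lloyd's k-means; returns one cluster label per input vector."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of document pairs on which two clusterings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy aligned "corpus": document i on one side corresponds to document i
# on the other. The second side adds plural forms as separate tokens to
# mimic a morphologically richer language with a larger vocabulary.
en_docs = ["rain cloud rain", "cloud rain cloud", "book pen book", "pen book pen"]
fa_docs = [
    "baran abr baranha",
    "abr baran abrha",
    "ketab ghalam ketabha",
    "ghalam ketab ghalamha",
]

en_vecs = bag_of_words(en_docs)
fa_vecs = bag_of_words(fa_docs)
en_labels = kmeans(en_vecs, k=2)
fa_labels = kmeans(fa_vecs, k=2)

print("feature vector lengths:", len(en_vecs[0]), len(fa_vecs[0]))
print("cluster agreement (Rand index):", rand_index(en_labels, fa_labels))
```

Because the documents are aligned, the two label lists can be compared position by position. The Rand index is invariant to how clusters are numbered, which matters because k-means assigns cluster ids arbitrarily on each side. On real data one would likely add TF-IDF weighting and the PCA reduction mentioned in the abstract before clustering.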

Cites

  • No record.

References

  • No record.

Cite

    APA:

    KHAZAEI, A., & GHASEMZADEH, M. (2015). COMPARING K-MEANS CLUSTERS ON PARALLEL PERSIAN-ENGLISH CORPUS. JOURNAL OF ARTIFICIAL INTELLIGENCE AND DATA MINING, 3(2), 203-208. SID. https://sid.ir/paper/255378/en

    Vancouver:

    KHAZAEI A, GHASEMZADEH M. COMPARING K-MEANS CLUSTERS ON PARALLEL PERSIAN-ENGLISH CORPUS. JOURNAL OF ARTIFICIAL INTELLIGENCE AND DATA MINING [Internet]. 2015;3(2):203-208. Available from: https://sid.ir/paper/255378/en

    IEEE:

    A. KHAZAEI and M. GHASEMZADEH, “COMPARING K-MEANS CLUSTERS ON PARALLEL PERSIAN-ENGLISH CORPUS,” JOURNAL OF ARTIFICIAL INTELLIGENCE AND DATA MINING, vol. 3, no. 2, pp. 203–208, 2015. [Online]. Available: https://sid.ir/paper/255378/en
