مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Information Journal Paper

Title

AN APPROACH FOR EXTRACTION OF KEYWORDS AND WEIGHTING WORDS FOR IMPROVEMENT FARSI DOCUMENTS CLASSIFICATION

Pages

  55-78

Abstract

Given the ever-growing volume of information and the huge number of unstructured documents, keywords play a very important role in INFORMATION RETRIEVAL. Because manual keyword extraction faces various challenges, automated extraction seems inevitable. This research uses a THESAURUS (a structured word-net) to extract keywords automatically. The authors claim that a THESAURUS makes it possible to extract more meaningful keywords from documents, and that keywords extracted this way can improve document classification. To increase the comprehensiveness of search, the method proceeds in steps. First, the stop words are removed and the remaining words are stemmed. Next, equivalent, hierarchical, and dependent words are found with the help of the THESAURUS. Then, to determine the relative importance of words, a numerical WEIGHT is assigned to each word, representing the effect of that word on the subject matter in comparison with the other words used in the text. Based on these steps and with the help of the THESAURUS, an accurate text classification is performed. The KNN algorithm is used for the classification; owing to its simplicity and effectiveness, KNN is widely used in text classification. The cornerstone of KNN is comparing the test text with the training texts to determine their similarity. The empirical results show that the quality and accuracy of the extracted keywords are satisfactory to users, and they confirm that document classification is improved. Overall, the research aims to extract more meaningful keywords from texts with a THESAURUS than without one.
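
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the toy thesaurus, the English suffix stripping (standing in for a Farsi stemmer), the relative-frequency weighting, and the cosine similarity measure are all hypothetical choices, since the abstract does not specify them. Only equivalent terms are handled here; hierarchical and dependent terms are omitted.

```python
import math
from collections import Counter

# Hypothetical toy resources; none of this data comes from the paper.
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to"}
# Toy thesaurus: maps a stemmed term to its preferred (equivalent) term.
THESAURUS = {"automobile": "car", "vehicle": "car", "physician": "doctor"}
SUFFIXES = ("ing", "s")  # crude English stand-in for a real Farsi stemmer


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop stop words, stem, then map to thesaurus preferred terms."""
    tokens = [w for w in text.lower().split() if w not in STOP_WORDS]
    stems = [stem(w) for w in tokens]
    return [THESAURUS.get(w, w) for w in stems]


def weight(tokens):
    """Weight each term by its relative frequency (an assumed scheme)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def knn_classify(text, training, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = weight(preprocess(text))
    scored = sorted(
        ((cosine(query, weight(preprocess(doc))), label) for doc, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled training texts.
TRAINING = [
    ("the car engine is loud", "transport"),
    ("vehicles and automobiles in traffic", "transport"),
    ("the doctor treats patients", "health"),
    ("physicians working in hospitals", "health"),
]

print(knn_classify("automobiles in the city", TRAINING))
```

In this sketch the query shares no surface word with the training texts, yet the thesaurus maps "automobiles" and "vehicles" to the same preferred term, which is the kind of effect the abstract attributes to using a THESAURUS.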

Cite

    APA:

    Rezaei, Vahideh, Mohammadpour, Majid, Parvin, Hamid, & Nejatian, Samad. (2018). AN APPROACH FOR EXTRACTION OF KEYWORDS AND WEIGHTING WORDS FOR IMPROVEMENT FARSI DOCUMENTS CLASSIFICATION. SIGNAL AND DATA PROCESSING, 14(4 (SERIAL 34)), 55-78. SID. https://sid.ir/paper/160830/en

    Vancouver:

    Rezaei Vahideh, Mohammadpour Majid, Parvin Hamid, Nejatian Samad. AN APPROACH FOR EXTRACTION OF KEYWORDS AND WEIGHTING WORDS FOR IMPROVEMENT FARSI DOCUMENTS CLASSIFICATION. SIGNAL AND DATA PROCESSING [Internet]. 2018;14(4 (SERIAL 34)):55-78. Available from: https://sid.ir/paper/160830/en

    IEEE:

    Vahideh Rezaei, Majid Mohammadpour, Hamid Parvin, and Samad Nejatian, “AN APPROACH FOR EXTRACTION OF KEYWORDS AND WEIGHTING WORDS FOR IMPROVEMENT FARSI DOCUMENTS CLASSIFICATION,” SIGNAL AND DATA PROCESSING, vol. 14, no. 4 (SERIAL 34), pp. 55–78, 2018, [Online]. Available: https://sid.ir/paper/160830/en
