مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Information Journal Paper

Title

Presenting a Topic Classification Model of Health Scientific Productions Using Text-Mining Methods

Pages

  553-574

Abstract

 With the proliferation of the Internet and the rapid growth of electronic articles, text classification has become a key tool for organizing and managing data. In text classification, a set of basic knowledge is first provided to the system through learning, and new input documents are then assigned to one of the subject groups. In the health literature, because of the wide variety of topics, preparing such an initial training set is very time-consuming and costly. The purpose of this article is to present a hybrid learning model (supervised and unsupervised) for the subject classification of health scientific products that performs classification without the need for an initial labeled set. To extract the thematic model of health science texts indexed in the PubMed database from 2009 to 2019, data mining and text mining were performed using machine learning. The data were analyzed with the Latent Dirichlet Allocation (LDA) model, and a Support Vector Machine (SVM) was then used to classify the texts. The findings present the model in three main steps. In the first step, data preprocessing, unnecessary words were eliminated from the data set, which increased the accuracy of the proposed model. In the second step, the themes in the texts were extracted using the Latent Dirichlet Allocation method. In the third step, these topics served as the basic training set: the data were passed to the Support Vector Machine algorithm, and the classifier was trained with their help. Finally, the trained classifier identified the subject of each document. The results showed that the proposed model can build a better classifier by combining the properties of unsupervised clustering with prior knowledge of the samples. Clustering labeled samples under a specific similarity criterion merges related texts with prior knowledge, and the learning algorithm then learns the classification in a supervised manner.
Combining classification and clustering can increase the accuracy of classifying health texts.
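The three steps described in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the toy corpus, the function names, the number of topics, and all hyperparameters are invented for the example. LDA is fitted here with collapsed Gibbs sampling, and the SVM is a one-vs-rest linear SVM without a bias term, trained with Pegasos-style subgradient steps; the paper does not say which solvers were used.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "for", "with", "is", "on"}

def preprocess(text):
    """Step 1: lowercase, tokenize, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def lda_dominant_topics(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Step 2: LDA via collapsed Gibbs sampling; returns each document's dominant topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # token counts per topic
    assign = [[rng.randrange(n_topics) for _ in doc] for doc in docs]

    def update(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    for d, doc in enumerate(docs):
        for w, z in zip(doc, assign[d]):
            update(d, w, z, 1)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assign[d][i], -1)  # remove the token's current assignment
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + len(vocab) * beta)
                    for k in range(n_topics)
                ]
                assign[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, w, assign[d][i], 1)
    labels = [max(range(n_topics), key=row.__getitem__) for row in doc_topic]
    return labels, vocab

def vectorize(doc, vocab_index):
    """Sparse, L2-normalized bag-of-words vector; unseen words are ignored."""
    counts = Counter(vocab_index[w] for w in doc if w in vocab_index)
    norm = sum(c * c for c in counts.values()) ** 0.5 or 1.0
    return {j: c / norm for j, c in counts.items()}

def score(w, x):
    """Dot product between a dense weight vector and a sparse feature vector."""
    return sum(w[j] * v for j, v in x.items())

def train_linear_svm(X, y, n_classes, dim, lam=0.01, epochs=50, seed=0):
    """Step 3: one-vs-rest linear SVM (no bias), Pegasos-style hinge-loss updates."""
    rng = random.Random(seed)
    W = [[0.0] * dim for _ in range(n_classes)]
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            for c in range(n_classes):
                target = 1.0 if y[i] == c else -1.0
                violated = target * score(W[c], X[i]) < 1.0
                W[c] = [wj * (1.0 - eta * lam) for wj in W[c]]  # L2 shrinkage
                if violated:  # hinge loss is active: step toward the example
                    for j, v in X[i].items():
                        W[c][j] += eta * target * v
    return W

def predict(W, x):
    """Return the class whose one-vs-rest SVM gives the highest score."""
    return max(range(len(W)), key=lambda c: score(W[c], x))

# Toy corpus invented for illustration; the study used PubMed records.
corpus = [
    "Insulin therapy and glucose control in diabetes patients",
    "Glucose monitoring for diabetes and insulin dosing",
    "Vaccine trial shows immune response to influenza virus",
    "Influenza virus vaccine and immune protection",
]
tokens = [preprocess(t) for t in corpus]
pseudo_labels, vocab = lda_dominant_topics(tokens, n_topics=2)  # no manual labels
vocab_index = {w: i for i, w in enumerate(vocab)}
X = [vectorize(doc, vocab_index) for doc in tokens]
W = train_linear_svm(X, pseudo_labels, n_classes=2, dim=len(vocab))
new_doc = vectorize(preprocess("Insulin dosing for diabetes"), vocab_index)
print("pseudo-labels:", pseudo_labels, "| predicted topic:", predict(W, new_doc))
```

The point of the design is that the dominant LDA topic of each document stands in for a manually assigned label, so the classifier can be trained without an initial labeled set. Topic indices are arbitrary, so the printed numbers carry no meaning beyond grouping.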

Cite

    APA:

    Shokouhian, Mahboobeh, ASEMI, ASEFEH, SHABANI, AHMAD, & CheshmehSohrabi, Mozaffar. (2020). Presenting a Topic Classification Model of Health Scientific Productions Using Text-Mining Methods. IRANIAN JOURNAL OF INFORMATION PROCESSING & MANAGEMENT (INFORMATION SCIENCES AND TECHNOLOGY), 35(2), 553-574. SID. https://sid.ir/paper/131047/en

    Vancouver:

    Shokouhian Mahboobeh, ASEMI ASEFEH, SHABANI AHMAD, CheshmehSohrabi Mozaffar. Presenting a Topic Classification Model of Health Scientific Productions Using Text-Mining Methods. IRANIAN JOURNAL OF INFORMATION PROCESSING & MANAGEMENT (INFORMATION SCIENCES AND TECHNOLOGY) [Internet]. 2020;35(2):553-574. Available from: https://sid.ir/paper/131047/en

    IEEE:

    Mahboobeh Shokouhian, ASEFEH ASEMI, AHMAD SHABANI, and Mozaffar CheshmehSohrabi, “Presenting a Topic Classification Model of Health Scientific Productions Using Text-Mining Methods,” IRANIAN JOURNAL OF INFORMATION PROCESSING & MANAGEMENT (INFORMATION SCIENCES AND TECHNOLOGY), vol. 35, no. 2, pp. 553–574, 2020, [Online]. Available: https://sid.ir/paper/131047/en
