مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Persian Verion

Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

video

Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

sound

Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Persian Version

Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

View:

128
Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Download:

0
Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Cites:

Information Journal Paper

Title

Optimizing Confusion of Authors’,Names in Persian Articles Using Random Forest Algorithm

Pages

  203-220

Abstract

 Purpose: Name is a key factor for distinguishing authors. In the academic databases that store information on papers, searching for the name of the article author is one of the most important elements in increasing visibility and the quantitative studies in the field of Scientology including the amount of citing works. The diversity of writings is one of the issues that lead to challenges in various scientific fields. In addition, the lack of writing standards in the Persian language and the lack of keyboards and standard codes, the habit of simply writing are among the factors that lead to the author's name disambiguation. Also, the spelling mistakes that occur by the writers in writing the name lead to the creation of different forms of writing for a single name. Considering the importance of solving the confusion of authors’,names in Persian articles, this paper aims to propose a framework to solve the problem of confusion and dispersion of authors' names in Persian articles, which has led to a rupture and lack of comprehensiveness in information retrieval. Methodology: The present research is an applied scientometrics method carried out by documentary procedure, and the required data is collected from the ISC database. The initial statistical population is 913 records during the period 2015 to 2017. The proposed framework consists of three stages: searching, matching, and grouping. In this regard, after initial pre-processing and feature extraction, the search operation is performed to find records that are potentially likely to be identical. Our method extracts two types of features including internal and external. The internal feature has been extracted from the author’, s information like first name, last name, affiliation, email, and co-authors. In addition, the external feature uses the scientific history of authors like articles and research interests. Next, in the search phase, the records that are potentially the same are identified. We propose a new method called Farsi-Soundex, which has been inspired by the well-known Soundex to categorize potential unique names. The same records are then found through further investigation in the adaptation phase, which is based on random forests. Therefore, the input of the matching stage is a group of records that have been detected the same based on the Farsi-Soundex algorithm. To specify whether these records are the same or not, a Random forest algorithm has been applied to them. Finally, in the grouping stage, all the records that have been identified as the same using random forest are placed in one group by a hash-based algorithm. Findings: The internal features of Email address, last name, and first name are the most significant features to optimize name-writing confusion. Also, the obtained results show the external features of the main subject and sub-subject provide the least effective features for solving the author name disambiguation problem in the academic database. In addition, using a random forest as a classifier in the matching phase, with an accuracy of over 99%, can solve the problem of confusion in writing the authors' names. Conclustion: Results show the high efficiency of our framework in uniformity of names according to the criteria of accuracy, recall, and F value compared to the support vector machine, the nearest neighbor, and genetics. Our proposed method can be applied to scientific databases to standardize the names of the authors. In the future, we are investigating the efficiency of our proposed framework in a non-stationary environment in which the distribution of data may be changed over time.

Cites

  • No record.
  • References

  • No record.
  • Cite

    APA: Copy

    MOZAFARI, NILOOFAR, & Vara, Narjes. (2023). Optimizing Confusion of Authors’,Names in Persian Articles Using Random Forest Algorithm. SCIENTOMETRICS RESEARCH JOURNAL, 8(2 (16) ), 203-220. SID. https://sid.ir/paper/1020957/en

    Vancouver: Copy

    MOZAFARI NILOOFAR, Vara Narjes. Optimizing Confusion of Authors’,Names in Persian Articles Using Random Forest Algorithm. SCIENTOMETRICS RESEARCH JOURNAL[Internet]. 2023;8(2 (16) ):203-220. Available from: https://sid.ir/paper/1020957/en

    IEEE: Copy

    NILOOFAR MOZAFARI, and Narjes Vara, “Optimizing Confusion of Authors’,Names in Persian Articles Using Random Forest Algorithm,” SCIENTOMETRICS RESEARCH JOURNAL, vol. 8, no. 2 (16) , pp. 203–220, 2023, [Online]. Available: https://sid.ir/paper/1020957/en

    Related Journal Papers

  • No record.
  • Related Seminar Papers

  • No record.
  • Related Plans

  • No record.
  • Recommended Workshops






    Move to top