Optimizing Confusion of Authors’ Names in Persian Articles Using Random Forest Algorithm

MOZAFARI NILOOFAR; Vara Narjes

Journal Paper

Paper Information

Journal: Scientometrics Research Journal | Year: 2023 | Volume: 8 | Issue: 2 (16) | Page(s): 203-220

Title

Optimizing Confusion of Authors’ Names in Persian Articles Using Random Forest Algorithm

Author(s)

MOZAFARI NILOOFAR | Vara Narjes

Keywords

Name ambiguity

Article authors

Persian articles

Random forest algorithm

Name Authority

Farsi-Soundex algorithm

Abstract

Purpose: An author's name is a key factor in distinguishing authors. In academic databases that store information about papers, searching by author name is one of the most important elements for increasing visibility and for quantitative scientometric studies, such as counting citing works. Variation in how names are written creates challenges across scientific fields. The lack of writing standards for Persian, the lack of standard keyboards and character codes, and the habit of simplified writing are among the factors that lead to author name ambiguity. Spelling mistakes made when writing a name also produce different written forms of a single name. Given the importance of resolving this confusion, this paper proposes a framework to address the confusion and dispersion of authors' names in Persian articles, which has led to fragmented and incomplete information retrieval.

Methodology: This is an applied scientometric study carried out with a documentary method, using data collected from the ISC database. The initial statistical population is 913 records from the period 2015 to 2017. The proposed framework consists of three stages: searching, matching, and grouping. The method extracts two types of features, internal and external. Internal features come from the author's own information, such as first name, last name, affiliation, email, and co-authors. External features use the author's scientific history, such as articles and research interests. After initial pre-processing and feature extraction, the search stage identifies records that are potentially identical. For this stage we propose a new method called Farsi-Soundex, inspired by the well-known Soundex algorithm, to categorize names that potentially refer to the same person. The input of the matching stage is therefore a group of records that Farsi-Soundex has flagged as potentially the same, and a random forest classifier decides whether these records are in fact the same. Finally, in the grouping stage, all records that the random forest has identified as the same are placed in one group by a hash-based algorithm.

Findings: The internal features of email address, last name, and first name are the most significant for resolving name-writing confusion. The external features of main subject and sub-subject are the least effective for solving the author name disambiguation problem in the academic database. Using a random forest as the classifier in the matching stage, with an accuracy of over 99%, can solve the problem of confusion in writing authors' names.

Conclusion: The results show the high efficiency of our framework in unifying names, measured by accuracy, recall, and F-measure, compared with the support vector machine, nearest neighbor, and genetic algorithm methods. The proposed method can be applied to scientific databases to standardize authors' names. In future work, we will investigate the efficiency of the framework in a non-stationary environment, in which the data distribution may change over time.
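
The three stages described in the abstract (searching, matching, grouping) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function `phonetic_key` is a simplified English Soundex applied to Latin transliterations, standing in for Farsi-Soundex, whose rules the abstract does not give. The predicate `same_email` is a hypothetical placeholder for the trained random forest classifier. The dictionary-backed union-find is only one possible reading of "hash-based" grouping. All record fields and sample values are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Simplified Soundex letter codes (assumption: Latin transliterations only;
# the paper's Farsi-Soundex rules are not specified in the abstract).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def phonetic_key(name):
    """Return a 4-character Soundex-style key for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:  # skip uncoded letters and adjacent repeats
            key += code
        prev = code
    return (key + "000")[:4]


def same_email(a, b):
    """Hypothetical stand-in for the random forest matching decision."""
    return a["email"].lower() == b["email"].lower()


def group_records(records, same_author):
    # Search stage: bucket records by the phonetic key of the last name.
    blocks = defaultdict(list)
    for rec in records:
        blocks[phonetic_key(rec["last_name"])].append(rec)

    # Union-find over record ids, stored in a plain dictionary.
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Matching stage: compare pairs only inside each block.
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if same_author(a, b):
                parent[find(a["id"])] = find(b["id"])

    # Grouping stage: collect record ids that share a root.
    groups = defaultdict(list)
    for rec in records:
        groups[find(rec["id"])].append(rec["id"])
    return sorted(sorted(g) for g in groups.values())


# Invented sample records, for illustration only.
RECORDS = [
    {"id": 1, "last_name": "Mozafari", "email": "n.mozafari@example.org"},
    {"id": 2, "last_name": "Mozaffari", "email": "N.Mozafari@example.org"},
    {"id": 3, "last_name": "Mozafary", "email": "other@example.org"},
    {"id": 4, "last_name": "Vara", "email": "n.vara@example.org"},
]

print(group_records(RECORDS, same_email))  # [[1, 2], [3], [4]]
```

Blocking by phonetic key means the classifier only sees pairs that already sound alike, which keeps the number of comparisons far below all-pairs. In a fuller version, `same_author` would wrap a classifier trained on the internal and external features listed in the abstract.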

Cite

APA:

MOZAFARI, NILOOFAR, & Vara, Narjes. (2023). Optimizing Confusion of Authors’ Names in Persian Articles Using Random Forest Algorithm. SCIENTOMETRICS RESEARCH JOURNAL, 8(2 (16)), 203-220. SID. https://sid.ir/paper/1020957/en

