مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Information Journal Paper

Title

Extraction of Core Medical Terms Using Frequency Approach

Pages

  227-244

Abstract

 Over the past few decades, advances in technology have dramatically increased the use of corpora in linguistic studies. By providing large collections of text, linguistic corpora allow researchers to apply a variety of methods of linguistic analysis. Most studies so far have dealt with English, French, and Japanese; research on Farsi has been limited, and the gap is especially tangible in specialized fields such as medical sciences, mathematics, science, and tourism. Most term or vocabulary extraction in Farsi has so far been done non-automatically, with researchers reading texts and collecting data by hand. Because of the technical properties of Farsi, term extractors that have been quite successful in other languages such as English, French, and Japanese have not been usable for Farsi: each extractor is defined in terms of the features and properties of the language it was built for. To improve teaching materials in Farsi, this problem deserved close attention, so we decided to apply some of these extraction methods and to devise an extraction method that works properly for Farsi. Iran's universities annually admit many international students who are non-native speakers of Farsi and who aim to study in fields such as medicine, engineering, and the humanities, so preparing standard, modern Farsi teaching materials based on up-to-date technologies is important. The purpose of this study was to improve the resources used in teaching Farsi at the university level, especially for non-native speakers, to explore the feasibility of using frequency-based methods for the automatic extraction of core medical terms, and to compare the capabilities of each method.
The findings reveal the strengths and weaknesses of these methods for Farsi, show how far each can be used in Farsi, and provide technical solutions for improving the results.

Research Methodology: The frequency-counting approaches used in this study relied on a general corpus and a specialized corpus created by the researcher. The general corpus was the Hamshahri corpus. The specialized, researcher-made corpus comprised texts from the science books of grades 1-4 of senior high school and grades 1-3 of junior high school in Iran, science courses at the Imam Khomeini Farsi language center, and general medicine texts from journals and the internet. After corpus formation, preparation, and tokenization, the study introduced two categories of frequency methods, classical and modern, and then compared the capabilities of each. The classical frequency methods were frequency in the general corpus, frequency in the specialized corpus, and improved versions of both. The modern methods were pointwise mutual information (PMI) and chi-square. Pearson correlation analysis and trend analysis were also used to compare the methods.

Research Findings: The results showed that classical methods in their general form have little accuracy in identifying specialized vocabulary; however, applying some techniques improved the selection of specialized vocabulary. The best performance came from the improved numerical method on the specialized corpus, which extracted 60% of the specialized vocabulary in the first 50 high-frequency words. This result improved when the scope of the study was widened to the first 100, 150, and 200 extracted words: the percentage of specialized vocabulary identified increased by about 75%.
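The abstract names PMI and chi-square but does not give the exact formulation used. As a minimal sketch, assuming the common word-versus-corpus setup in which each word's count in the specialized corpus is compared with its count in the general corpus, the two scores could be computed as follows. The function names `tokenize` and `score_terms` are hypothetical, and whitespace splitting only stands in for real Farsi tokenization.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real Farsi tokenizer.
    return text.split()


def score_terms(special_tokens, general_tokens):
    """Return {word: (pmi, chi2)} for every word in the specialized corpus."""
    spec = Counter(special_tokens)
    gen = Counter(general_tokens)
    n_spec = sum(spec.values())
    n_gen = sum(gen.values())
    n = n_spec + n_gen
    scores = {}
    for word, a in spec.items():
        b = gen.get(word, 0)  # occurrences of the word in the general corpus
        c = n_spec - a        # all other tokens in the specialized corpus
        d = n_gen - b         # all other tokens in the general corpus
        # PMI between the word and the specialized corpus:
        # log2( P(word, special) / (P(word) * P(special)) )
        pmi = math.log2(a * n / ((a + b) * n_spec))
        # Pearson chi-square for the 2x2 table [[a, b], [c, d]].
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        scores[word] = (pmi, chi2)
    return scores


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    special = tokenize("cell cell blood the the")
    general = tokenize("the the the car road the")
    ranked = sorted(score_terms(special, general).items(),
                    key=lambda item: item[1][0], reverse=True)
    for word, (pmi, chi2) in ranked:
        print(word, round(pmi, 3), round(chi2, 3))
```

Chi-square is unsigned, so a word that is over-represented in the general corpus also receives a high score; one option is to keep only words for which `a * d > b * c` before ranking.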
Moreover, the results for the modern methods indicated that they can be used for Farsi. The chi-square method, with 32%, and the PMI method, with 52% extraction of specialized vocabulary in the first 50 high-frequency words, performed well in automatic term extraction in Farsi. They detected specialized vocabulary automatically, and these percentages improved when the scope of the study was widened to the first 200 words.

Conclusion: The results showed that frequency-based methods are applicable to Farsi. Classical frequency methods need to be used in their improved forms in order to increase the accuracy of the extracted words. To achieve reliable results with modern frequency approaches, it is necessary to choose a large enough scope for the extracted vocabulary.
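If the reported percentages are read as the share of specialized terms among the first k extracted words (k = 50, 100, 150, 200), they correspond to what is usually called precision at k. The sketch below shows that calculation under this reading; the function name `precision_at_k`, the ranked list, and the gold term set are hypothetical illustrations, not data from the study.

```python
def precision_at_k(ranked_words, gold_terms, k):
    """Share of the first k ranked words that appear in the gold term set."""
    top = ranked_words[:k]
    if not top:
        return 0.0
    return sum(1 for word in top if word in gold_terms) / len(top)


if __name__ == "__main__":
    # Hypothetical ranking produced by some scoring method, best first.
    ranked = ["cell", "the", "blood", "of", "artery", "and"]
    # Hypothetical expert-judged list of specialized terms.
    gold = {"cell", "blood", "artery"}
    for k in (2, 4, 6):
        print(k, precision_at_k(ranked, gold, k))
```

Evaluating the same ranking at several cutoffs, as the study does with 50 to 200 words, shows whether a method's specialized terms are concentrated at the top of the list or spread further down.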

Cite

    APA:

    ZOLFAGHAR KONDORI, ZOHREH, MOSAVI MIANGAH, TAYEBEH, & Rowshan, Belgheys. (2019). Extraction of Core Medical Terms Using Frequency Approach. JOURNAL OF PERSIAN LANGUAGE TEACHING TO NON-PERSIAN SPEAKERS, 8(1 (17)), 227-244. SID. https://sid.ir/paper/398008/en

    Vancouver:

    ZOLFAGHAR KONDORI ZOHREH, MOSAVI MIANGAH TAYEBEH, Rowshan Belgheys. Extraction of Core Medical Terms Using Frequency Approach. JOURNAL OF PERSIAN LANGUAGE TEACHING TO NON-PERSIAN SPEAKERS [Internet]. 2019;8(1 (17)):227-244. Available from: https://sid.ir/paper/398008/en

    IEEE:

    ZOHREH ZOLFAGHAR KONDORI, TAYEBEH MOSAVI MIANGAH, and Belgheys Rowshan, “Extraction of Core Medical Terms Using Frequency Approach,” JOURNAL OF PERSIAN LANGUAGE TEACHING TO NON-PERSIAN SPEAKERS, vol. 8, no. 1 (17), pp. 227–244, 2019, [Online]. Available: https://sid.ir/paper/398008/en
