مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Information Journal Paper

Title

Automatic Keyword Extraction from Persian short Text Using word2vec

Pages

  105-114

Abstract

With the growing number of Persian electronic documents and texts, fast and inexpensive methods for retrieving desired texts from these large collections are becoming more important. One effective technique is to extract the keywords that represent the main concept of a text. For this purpose, the frequency of a word in the text cannot by itself be a proper indication of its significance. Moreover, most keyword extraction methods ignore the concepts and semantics of the text. In addition, the unstructured nature of new texts in news and electronic documents makes these words difficult to extract. This paper proposes an automated, unsupervised method for keyword extraction from Persian texts that lack a proper structure. The method not only takes into account the probability of occurrence of a word and its frequency in the text, but also captures the concepts and semantics of the text by training a Word2vec model on it. In the proposed method, which combines statistical and machine learning techniques, a Word2vec model is first trained on the text, and the words that have the smallest distance to other words are extracted. Then, a statistical equation is proposed to calculate the score of each extracted word using co-occurrence and frequency. Finally, the words with the highest scores are selected as the keywords. The evaluations indicate that the method achieves an F-measure of 53.92%, which is 11% higher than other methods.
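The pipeline outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes word vectors have already been learned (for example by a Word2vec model) and are passed in as a plain dictionary, it uses cosine distance as the distance measure, and the scoring formula (relative frequency times one plus a windowed co-occurrence count) is a hypothetical stand-in, since the abstract does not give the paper's statistical equation. All function names, parameters, and the toy data are invented for illustration.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def central_words(vectors, n_candidates):
    """Pick the words whose mean distance to all other words is smallest."""
    mean_dist = {}
    for word, vec in vectors.items():
        dists = [cosine_distance(vec, other)
                 for w, other in vectors.items() if w != word]
        mean_dist[word] = sum(dists) / len(dists) if dists else 0.0
    return sorted(mean_dist, key=lambda w: (mean_dist[w], w))[:n_candidates]


def cooccurrence_counts(tokens, candidates, window):
    """Count how often each candidate appears near a different candidate."""
    counts = Counter()
    for i, word in enumerate(tokens):
        if word not in candidates:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in candidates and tokens[j] != word:
                counts[word] += 1
    return counts


def extract_keywords(tokens, vectors, n_candidates, top_k, window):
    """Select candidates by embedding centrality, then rank them by score."""
    # Only words that occur in the text and have a vector are considered.
    vocab = {w: vectors[w] for w in set(tokens) if w in vectors}
    candidates = set(central_words(vocab, n_candidates))
    freq = Counter(tokens)
    co = cooccurrence_counts(tokens, candidates, window)
    # Hypothetical score (an assumption, not the paper's equation).
    score = {w: (freq[w] / len(tokens)) * (1 + co[w]) for w in candidates}
    return sorted(score, key=lambda w: (-score[w], w))[:top_k]


# Toy data: hand-made 2-d vectors standing in for learned embeddings.
tokens = ["cyber", "defence", "network", "cyber",
          "attack", "network", "cyber", "weather"]
vectors = {
    "cyber": [1.0, 0.1],
    "network": [0.9, 0.2],
    "attack": [0.8, 0.3],
    "defence": [0.9, 0.1],
    "weather": [0.0, 1.0],
}
print(extract_keywords(tokens, vectors, n_candidates=4, top_k=2, window=2))
```

In this toy example the off-topic word "weather" is filtered out at the centrality step because its vector is far from all the others, and the remaining candidates are ranked by how often they occur and co-occur.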

Cite

    APA:

    Hajipoor, O., & SADIDPOUR, S.S. (2020). Automatic Keyword Extraction from Persian short Text Using word2vec. JOURNAL OF ELECTRONIC AND CYBER DEFENCE, 8(2 (30)), 105-114. SID. https://sid.ir/paper/387111/en

    Vancouver:

    Hajipoor O., SADIDPOUR S.S. Automatic Keyword Extraction from Persian short Text Using word2vec. JOURNAL OF ELECTRONIC AND CYBER DEFENCE [Internet]. 2020;8(2 (30)):105-114. Available from: https://sid.ir/paper/387111/en

    IEEE:

    O. Hajipoor and S.S. SADIDPOUR, “Automatic Keyword Extraction from Persian short Text Using word2vec,” JOURNAL OF ELECTRONIC AND CYBER DEFENCE, vol. 8, no. 2 (30), pp. 105–114, 2020. [Online]. Available: https://sid.ir/paper/387111/en
