مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Information Journal Paper

Title

A Structure-Based Method for Building a Database of Extracted Figures from Scientific Documents: A Case Study of Iran Scientific Information Database (GANJ)

Pages

  729-754

Abstract

Figures in scientific documents are rich sources of information. The first step in retrieving information from such figures is to build a valid figure database. To this end, we developed a system for generating a figure database from scholarly Persian documents at large scale. The first step is to parse the files and extract the figures and their corresponding descriptions. There are two general approaches to extracting figures from documents: one is based on image processing methods, and the other on processing the file primitives. This paper focuses on the latter, which has been shown to be the better choice for search engines because of its speed and scalability. We propose a structure-based method that extracts figures and their descriptions by analyzing the file layout. This information is saved in a database with a specific structure and indexed for retrieval in the search engine. The proposed algorithm was implemented in Python. As a benchmark, we used the basic method from the literature, which is based on processing the PDF file. We applied the proposed method in a case study on the Iran Scientific Information Database (Ganj): 150 scientific documents were randomly chosen from the Ganj database and analyzed using both methods. According to our experimental results, the proposed method is more efficient than the basic method, especially for Persian documents. The basic method leaves many challenges unresolved for Persian documents: it produces a large number of noise images, and the Persian text it extracts is not well organized. Our proposed method overcomes some of these drawbacks and is recommended for generating figure databases from scientific Persian documents. It correctly extracts about 40% of the images together with their corresponding descriptions, which is 10% better than the basic method.
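The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of layout-based figure and caption pairing, not the authors' implementation. It assumes the layout has already been parsed into blocks with bounding boxes; the names `Block`, `pair_figures`, `build_database`, the `max_gap` threshold, and the caption prefixes are all hypothetical, and an ordinary SQLite index stands in for the search-engine index.

```python
import re
import sqlite3
from dataclasses import dataclass

# Hypothetical caption prefixes: English "Figure"/"Fig." and Persian "شکل".
CAPTION_RE = re.compile(r"^\s*(figure|fig\.|شکل)\s*\d*", re.IGNORECASE)


@dataclass
class Block:
    kind: str      # "image" or "text"
    page: int
    x0: float
    y0: float      # top edge; y is assumed to grow downward
    x1: float
    y1: float      # bottom edge
    text: str = ""


def horizontal_overlap(a, b):
    """Width of the horizontal span shared by two blocks (0 if none)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def pair_figures(blocks, max_gap=40.0):
    """Pair each image with the closest caption-like text block below it.

    Images with no nearby caption are dropped, on the assumption that
    they are noise such as logos or decorations.
    """
    captions = [b for b in blocks
                if b.kind == "text" and CAPTION_RE.match(b.text)]
    pairs = []
    for img in (b for b in blocks if b.kind == "image"):
        candidates = [
            c for c in captions
            if c.page == img.page
            and 0 <= c.y0 - img.y1 <= max_gap
            and horizontal_overlap(img, c) > 0
        ]
        if candidates:
            best = min(candidates, key=lambda c: c.y0 - img.y1)
            pairs.append((img, best.text.strip()))
    return pairs


def build_database(pairs, path=":memory:"):
    """Store figure positions and captions in a small SQLite table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS figures ("
        "id INTEGER PRIMARY KEY, page INTEGER, "
        "x0 REAL, y0 REAL, x1 REAL, y1 REAL, caption TEXT)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_figures_caption ON figures(caption)"
    )
    conn.executemany(
        "INSERT INTO figures (page, x0, y0, x1, y1, caption) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(img.page, img.x0, img.y0, img.x1, img.y1, cap)
         for img, cap in pairs],
    )
    conn.commit()
    return conn
```

In this sketch, an image counts as a figure only if a caption-like text block sits directly below it on the same page, which is one simple way to filter out noise images; a real system would need rules for captions above figures, multi-column layouts, and right-to-left text ordering.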

Cite

    APA:

    Fakhrzadeh, Azadeh, & Seddighi, Amir Hossein. (2020). A Structure-Based Method for Building a Database of Extracted Figures from Scientific Documents: A Case Study of Iran Scientific Information Database (GANJ). IRANIAN JOURNAL OF INFORMATION PROCESSING & MANAGEMENT (INFORMATION SCIENCES AND TECHNOLOGY), 35(3), 729-754. SID. https://sid.ir/paper/131117/en

    Vancouver:

    Fakhrzadeh Azadeh, Seddighi Amir Hossein. A Structure-Based Method for Building a Database of Extracted Figures from Scientific Documents: A Case Study of Iran Scientific Information Database (GANJ). IRANIAN JOURNAL OF INFORMATION PROCESSING & MANAGEMENT (INFORMATION SCIENCES AND TECHNOLOGY)[Internet]. 2020;35(3):729-754. Available from: https://sid.ir/paper/131117/en

    IEEE:

    Azadeh Fakhrzadeh and Amir Hossein Seddighi, “A Structure-Based Method for Building a Database of Extracted Figures from Scientific Documents: A Case Study of Iran Scientific Information Database (GANJ),” IRANIAN JOURNAL OF INFORMATION PROCESSING & MANAGEMENT (INFORMATION SCIENCES AND TECHNOLOGY), vol. 35, no. 3, pp. 729–754, 2020, [Online]. Available: https://sid.ir/paper/131117/en
