Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Information Journal Paper

Title

BUILDING AN EFFICIENT INDEXING FOR CRAWLING THE WEBSITE WITH AN EFFICIENT SPIDER

Pages

  1-21

Abstract

 In this work we investigate the Right-Truncated Index-Based WEB SEARCH ENGINE to determine how useful it is for storing and retrieving Arabic documents. The engine is a program that reads any set of Arabic documents, accepts a query, and processes both the documents and the query; it then selects (predicts) the documents most relevant to that query. The program has a morphological component and a mathematical one. The morphological component lets the researcher run either a stemming algorithm or a right-truncation algorithm. The chief advantage of the stemming algorithm is that it uses the least storage for indexing, because it maps inflected and derived terms to a single indexed stem. The right-truncation algorithm reduces storage to a lesser degree but, compared with stemming, increases the probability of retrieving relevant (user-favorable) documents. One purpose of our investigation is to compare the efficiency of these two indexing mechanisms. The mathematical component accepts the output of the right-truncation algorithm and applies term frequency and inverse document frequency (TF-IDF) to establish the relative importance of each document with respect to the query terms. This paper also describes building a simple search engine based on a crawler, or spider. The crawler, which indexes different types of documents, is an algorithm that crawls the file system from a specified folder. A basic design and object model was developed to support both single-word and multiple-word search results. The crawler can find data to index by following (tracing) web links rather than searching directory listings in the file system. In this process, files are downloaded through HTTP and HTML pages are parsed to obtain more links without getting into a recursive loop. Finally, the paper discusses how a right-truncated stemmer improves the efficiency of the indexing mechanism for Arabic document processing.
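
The pipeline outlined in the abstract (follow links without looping, right-truncate terms, rank by TF-IDF) can be illustrated with a minimal Python sketch. This is not the author's implementation. The in-memory `PAGES` table stands in for pages fetched over HTTP, the English sample text stands in for Arabic documents, and the truncation length `keep=5` and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter, deque

# Hypothetical in-memory "site": URL -> (text, outgoing links). A real spider
# would download each page over HTTP and parse its HTML for links instead.
PAGES = {
    "/a": ("searching searches indexed", ["/b", "/c"]),
    "/b": ("indexing documents", ["/a"]),  # links back to /a, forming a cycle
    "/c": ("retrieval of documents", ["/b"]),
}

def crawl(start):
    """Breadth-first link traversal; the `seen` set prevents endless loops."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in PAGES[url][1]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def truncate(word, keep=5):
    """Right truncation: keep only the first `keep` characters (assumed length)."""
    return word[:keep]

def build_index(urls):
    """Map each crawled URL to a Counter of its truncated terms."""
    return {u: Counter(truncate(w) for w in PAGES[u][0].split()) for u in urls}

def rank(query, index):
    """Order documents by summed TF-IDF over the truncated query terms."""
    n = len(index)
    scores = {}
    for url, tf in index.items():
        score = 0.0
        for term in {truncate(w) for w in query.split()}:
            df = sum(1 for counts in index.values() if term in counts)
            if df and tf[term]:
                score += tf[term] * math.log(n / df)
        if score > 0:
            scores[url] = score
    return sorted(scores, key=scores.get, reverse=True)

index = build_index(crawl("/a"))
print(rank("searched", index))
```

Because "searching", "searches", and the query word "searched" all truncate to the same five-character prefix, the query matches page `/a` even though the exact word never appears there. Python string slicing operates on Unicode code points, so the same `truncate` function would accept Arabic strings, although choosing a suitable truncation length for Arabic is outside the scope of this sketch.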

Cite

    APA:

    AL GAPHARI, G. (2008). BUILDING AN EFFICIENT INDEXING FOR CRAWLING THE WEBSITE WITH AN EFFICIENT SPIDER. INTERNATIONAL JOURNAL OF INFORMATION SCIENCE AND MANAGEMENT, 6(2), 1-21. SID. https://sid.ir/paper/544846/en

    Vancouver:

    AL GAPHARI G. BUILDING AN EFFICIENT INDEXING FOR CRAWLING THE WEBSITE WITH AN EFFICIENT SPIDER. INTERNATIONAL JOURNAL OF INFORMATION SCIENCE AND MANAGEMENT [Internet]. 2008;6(2):1-21. Available from: https://sid.ir/paper/544846/en

    IEEE:

    G. AL GAPHARI, “BUILDING AN EFFICIENT INDEXING FOR CRAWLING THE WEBSITE WITH AN EFFICIENT SPIDER,” INTERNATIONAL JOURNAL OF INFORMATION SCIENCE AND MANAGEMENT, vol. 6, no. 2, pp. 1–21, 2008, [Online]. Available: https://sid.ir/paper/544846/en
