Author(s): 

HESABI AKBAR

Issue Info: 
  • Year: 

    2016
  • Volume: 

    7
  • Issue: 

    2 (17)
  • Pages: 

    101-114
Measures: 
  • Citations: 

    0
  • Views: 

    806
  • Downloads: 

    0
Abstract: 

This research investigated the difficulties in mapping FarsNet synsets to Princeton WordNet synsets. The synsets and their mappings were examined with regard to three kinds of difficulty in mapping synsets between wordnets: (1) difficulties related to meaning distinctions in the source wordnet, (2) difficulties related to the principles underlying the source wordnet and the target-language resources, and (3) difficulties related to intrinsic differences between the source and target languages. The research tried to answer three questions: What are the difficulties in mapping FarsNet synsets to Princeton WordNet synsets? Which difficulties were more frequent? Was there any difference between the difficulties in mapping FarsNet synsets to Princeton WordNet and those in mapping the synsets of other wordnets? Given the large amount of data, a sample of 1552 synsets was chosen randomly. Because of the overlap of words between synsets, only the first member of each synset was taken into account. The cases were divided into eight types. To address the observed difficulties, some suggestions were proposed that can be used for enriching FarsNet and for designing and developing other wordnets for special disciplines.

Yearly Impact: مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Issue Info: 
  • Year: 

    2021
  • Volume: 

    7
Measures: 
  • Views: 

    132
  • Downloads: 

    0
Abstract: 

Finding semantic similarity and relatedness between the words and concepts of a language is very important in natural language processing and can help improve the performance of various systems such as plagiarism detection, summarization, machine translation evaluation, transliteration detection, implication detection, and intelligent conversation. Depending on the type of meaning representation, methods for finding semantic similarity and relatedness can be graph-based or vector-based. Graph-based methods determine the degree of semantic similarity of two concepts from the information in the hierarchy, while semantic relatedness is calculated using more information, such as other non-hierarchical relations and the glosses or examples given for each concept in the wordnet. In this paper, we first explain how six existing measures of semantic similarity and three measures of semantic relatedness work on a pair of Persian concepts or words. Besides using these measures, we introduce a new FarsNet-based method and measure the semantic similarity and relatedness of Persian words with all of these measures. We also provide a baseline service for calculating word similarities and for testing, evaluating, or comparing similarity measures.
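
As a minimal sketch of the graph-based idea, the following Python code computes a path-based similarity and the Wu-Palmer measure on a tiny hand-made hypernym hierarchy. The hierarchy, the concept names, and the function names are illustrative assumptions. They are not FarsNet data, and the abstract does not say which six similarity measures the paper uses.

```python
# Toy hypernym hierarchy (child -> parent); hypothetical, not FarsNet data.
HYPERNYM = {
    "animal": "entity",
    "plant": "entity",
    "mammal": "animal",
    "bird": "animal",
    "cat": "mammal",
    "dog": "mammal",
}

def path_to_root(concept):
    """Return the list [concept, parent, ..., root]."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def lowest_common_subsumer(a, b):
    """Return the deepest ancestor shared by a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b
        if node in ancestors_a:
            return node
    raise ValueError("concepts share no ancestor")

def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))

def path_similarity(a, b):
    """1 / (1 + number of edges on the path from a to b through the LCS)."""
    lcs = lowest_common_subsumer(a, b)
    edges = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1.0 / (1 + edges)

def wu_palmer(a, b):
    """2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))
```

Both functions use only the hierarchy, which matches the abstract's description of similarity. A relatedness measure would additionally need non-hierarchical relations or gloss text.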

Author(s): 

Shadanpour Farzaneh

Issue Info: 
  • Year: 

    2025
  • Volume: 

    35
  • Issue: 

    4
  • Pages: 

    7-38
Measures: 
  • Citations: 

    0
  • Views: 

    14
  • Downloads: 

    0
Abstract: 

Purpose: Synonymy is one of the important features of natural languages. Since a single concept may be expressed by two or more lexical forms, and it cannot be predicted which of them will be searched for, a retrieval system must be able to refer from all synonyms of the same idea to the document in which the concept is discussed. This research aimed to investigate the use of synonyms in non-preferred headings/terms in Persian Subject Headings and the Asfa Thesaurus, using FarsNet as a comprehensive lexical source of the Persian language.

Method: This was applied research in terms of its goals. It used content analysis as a general methodology, and specifically natural language processing techniques and tools, to measure the extent to which synonyms are used to build non-preferred headings/terms in both controlled vocabularies by measuring the similarity of the two groups of data. In a purposive sampling procedure, 3270 main subject headings and 2020 main thesaurus terms were selected from Persian Subject Headings and the Asfa Thesaurus, two controlled vocabularies used in compiling the Iran National Bibliography. The non-preferred headings/terms related to each main heading/term were also extracted, along with the synonyms of each main heading/term from FarsNet. Reliability was assessed by having a second researcher repeat the extraction of part of the headings/terms, with scores of 0.618 and 0.706 (on a scale from 0 to 1), respectively. The similarity between the non-preferred headings/terms and the FarsNet synonyms of the corresponding main headings/terms was measured using cosine similarity.

Findings: In the sample taken from Persian Subject Headings, 2561 main subject headings (78.3%) have non-preferred headings that refer to them, and 2316 main subject headings (70.8%) have synonyms in FarsNet. The similarity score between non-preferred headings and the synonyms of the corresponding main headings was 0.125, which is very low. In the sample taken from Asfa, 545 main terms (about 27%) have non-preferred terms, so 1475 main terms (73%) have no non-preferred terms referring to them; 1376 main terms (68%) have synonyms in FarsNet. The similarity score between non-preferred terms in the Asfa Thesaurus and the synonyms of the corresponding main terms was 0.131, which is also very low.

Conclusion: Persian Subject Headings shows more commitment to constructing and using subject references in the form of non-preferred headings, but only a small number of referential (non-preferred) headings and terms have been selected from among the Persian synonyms of the main subjects/terms. This research recommends introducing synonyms of terms to all users, including catalogers and those involved in creating controlled vocabularies, both when searching for concepts and when creating terms, because this can be a step toward improving subject authority databases and, ultimately, toward a more exhaustive subject search and retrieval experience for users.
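
The abstract names cosine similarity but does not say how the two term lists were turned into vectors. The following Python sketch assumes simple bag-of-terms count vectors. The function name and the example terms are hypothetical and do not come from the study's data.

```python
import math
from collections import Counter

def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two bag-of-terms count vectors."""
    vec_a, vec_b = Counter(terms_a), Counter(terms_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention: an empty list is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical example: non-preferred terms of one heading versus the
# wordnet synonyms of the same heading.
non_preferred = ["automobile", "motor car"]
synonyms = ["automobile", "car", "machine", "auto"]
score = cosine_similarity(non_preferred, synonyms)
```

Under this representation a score near 0 means that the non-preferred terms and the synonyms share almost no entries, which is the kind of result the findings report.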

Author(s): 

Sharifi Atieh | Mahdavi Amin

Issue Info: 
  • Year: 

    2019
  • Volume: 

    15
  • Issue: 

    4 (38)
  • Pages: 

    95-109
Measures: 
  • Citations: 

    0
  • Views: 

    562
  • Downloads: 

    0
Abstract: 

Keywords are the main focal points of interest within a text and are intended to represent the principal concepts outlined in the document. Determining keywords with traditional methods is a time-consuming process that requires specialized knowledge of the subject. To index the vast expanse of electronic documents, it is important to automate the keyword extraction task. Since the structure of keywords is coherent, we focus on the relations between words. Most previous methods for Persian are based on statistical relations between words and do not consider sense relations. However, because of ambiguity in meaning, these statistical methods cannot help in determining the relations between words. Our keyword extraction method is a supervised method in which new features are extracted for each word from its lexical chain. Using these features alongside statistical features could be more effective in a supervised system. We have tried to map the relations among word senses by using lexical chains; in the proposed model, FarsNet therefore plays a key role in constructing the lexical chains. The lexical chains are created with Galley and McKeown's algorithm, with some changes. We used the Java version of the hazm library to determine candidate words in the text. These words were identified using POS tagging and noun phrase chunking. Ten features are considered for each candidate word: four are related to the frequency and position of the word in the text, and the rest are related to the lexical chain of the word. After the classifier extracts the keywords, a post-processing step determines two-word key phrases that were not obtained in the previous step. The dataset used in this research was chosen from among Persian scientific papers; we used only the title and abstract of these papers.
The results showed that using semantic relations in addition to statistical features improves the overall performance of keyword extraction for papers. The Naive Bayes classifier gave the best result among the investigated classifiers, and eliminating some of the lexical chain features improved its performance.
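
To illustrate how lexical chains can yield per-word features, here is a deliberately simplified Python sketch. It uses greedy chaining over a hand-made relation table, not Galley and McKeown's algorithm and not the paper's modified version. The relation table, the function names, and the three features shown are assumptions made for illustration only.

```python
# Hypothetical semantic relations between words (stand-in for wordnet links).
RELATED = {
    ("car", "vehicle"), ("vehicle", "truck"), ("engine", "car"),
    ("bank", "money"), ("money", "loan"),
}

def are_related(a, b):
    """Two words are related if identical or linked in either direction."""
    return a == b or (a, b) in RELATED or (b, a) in RELATED

def build_chains(words):
    """Greedily attach each word to the first chain containing a related word."""
    chains = []
    for word in words:
        for chain in chains:
            if any(are_related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])  # no related chain found: start a new one
    return chains

def chain_features(words):
    """Per-word features: relative frequency, first position, chain length."""
    chains = build_chains(words)
    chain_len = {w: len(c) for c in chains for w in c}
    features = {}
    for word in set(words):
        features[word] = {
            "frequency": words.count(word) / len(words),
            "first_position": words.index(word) / len(words),
            "chain_length": chain_len[word],
        }
    return features

words = ["car", "bank", "engine", "money", "truck", "car", "loan"]
features = chain_features(words)
```

A feature dictionary of this kind could be fed to any supervised classifier. The greedy strategy never merges chains and ignores word senses, which is the ambiguity problem the published algorithm is designed to handle.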

Issue Info: 
  • Year: 

    2014
  • Volume: 

    2
  • Issue: 

    3 (7)
  • Pages: 

    51-63
Measures: 
  • Citations: 

    0
  • Views: 

    1661
  • Downloads: 

    0
Abstract: 

Automatic text summarization systems are one kind of system for managing huge amounts of information. This paper discusses query-based extractive summarization of Persian text, which is very useful for leaders who need to review information about particular topics. The most important phase in this type of summarization is calculating the similarity between the query phrase and the components of the original text. For this purpose, after the preprocessing phase, converting the query to a sentence, and disambiguating word senses, the similarity between the query phrase and the sentences can be calculated using FarsNet. The sentences most similar to the query are then selected for the summary. The results show that the proposed method achieves quite acceptable success. Since natural language processing for Persian is still very young, this paper and others like it can be a great help in improving its results.
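
The selection step described above can be sketched in Python as follows. The abstract does not specify the similarity measure, so this sketch substitutes Jaccard overlap after synonym expansion. The synonym table stands in for a FarsNet lookup, and all names and example sentences are hypothetical.

```python
# Hypothetical synonym table standing in for a wordnet lookup.
SYNONYMS = {"car": {"automobile"}, "price": {"cost"}}

def expand(tokens):
    """Return the token set enlarged with the synonyms of each token."""
    expanded = set(tokens)
    for tok in tokens:
        expanded |= SYNONYMS.get(tok, set())
    return expanded

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0

def summarize(sentences, query, k=2):
    """Return the k sentences most similar to the query, in document order."""
    q = expand(query.lower().split())
    scored = []
    for idx, sent in enumerate(sentences):
        s = expand(sent.lower().rstrip(".").split())
        scored.append((jaccard(q, s), idx))
    # Highest score first; ties are broken by earlier position.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    return [sentences[idx] for _, idx in sorted(top, key=lambda p: p[1])]

sentences = [
    "The automobile cost rose sharply.",
    "Rain is expected tomorrow.",
    "Car sales fell last year.",
]
summary = summarize(sentences, "car price", k=2)
```

Returning the selected sentences in their original order keeps the summary readable. A real system would also need tokenization, stemming, and sense disambiguation suited to Persian.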

Issue Info: 
  • Year: 

    2022
  • Volume: 

    5
  • Issue: 

    2
  • Pages: 

    77-85
Measures: 
  • Citations: 

    0
  • Views: 

    59
  • Downloads: 

    31
Abstract: 

Word Sense Disambiguation (WSD) is a long-standing task in Natural Language Processing (NLP) that aims to automatically identify the most relevant meaning of words in a given context. Developing standard WSD test collections is an important prerequisite for developing and evaluating WSD systems in the language of interest. Although many WSD test collections have been developed for a variety of languages, no standard all-words WSD benchmark is available for Persian. In this paper, we address this shortage by introducing SBU-WSD-Corpus, the first standard test set for the Persian all-words WSD task. SBU-WSD-Corpus is manually annotated with senses from the Persian WordNet (FarsNet) sense inventory. To this end, three annotators used SAMP (a tool for sense annotation based on the FarsNet lexical graph) to perform the annotation task. SBU-WSD-Corpus consists of 19 Persian documents in different domains such as Sports, Science, and Arts. It includes 5892 content words of Persian running text and 3371 manually sense-annotated words (2073 nouns, 566 verbs, 610 adjectives, and 122 adverbs). To provide baselines for future studies on the Persian all-words WSD task, we evaluate several WSD models on SBU-WSD-Corpus.
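
The abstract does not name the WSD models that were evaluated. As one example of how a knowledge-based baseline could be scored against a sense-annotated test set, the Python sketch below implements simplified Lesk (gloss and context overlap) and an accuracy function. The sense inventory, the sense identifiers, and the glosses are hypothetical and are not taken from FarsNet or SBU-WSD-Corpus.

```python
# Hypothetical sense inventory: word -> {sense_id: gloss}.
INVENTORY = {
    "bank": {
        "bank#1": "financial institution that accepts deposits and lends money",
        "bank#2": "sloping land beside a river or lake",
    }
}

def simplified_lesk(word, context_tokens):
    """Pick the sense whose gloss shares the most words with the context."""
    context = set(context_tokens)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in INVENTORY[word].items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:  # ties keep the earlier-listed sense
            best_sense, best_overlap = sense_id, overlap
    return best_sense

def accuracy(predictions, gold):
    """Fraction of predictions that equal the gold annotation."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

predicted = [
    simplified_lesk("bank", "she sat on the bank of the river".split()),
    simplified_lesk("bank", "he deposited money at the bank".split()),
]
```

Because ties fall back to the first-listed sense, ordering each word's senses by frequency would make this behave like a most-frequent-sense baseline whenever the context gives no evidence.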

Author(s): 

DAMI SINA | SHIRAZI HOSSEIN | ABDOLLAHZADEH BARFOROUSH AHMAD

Issue Info: 
  • Year: 

    2017
  • Volume: 

    5
  • Issue: 

    4
  • Pages: 

    11-25
Measures: 
  • Citations: 

    0
  • Views: 

    1034
  • Downloads: 

    0
Abstract: 

A novel method for predicting future events in a textual environment is proposed. The proposed method produces an event prediction model by generalizing cause events and then predicts effect events by using causal rules. First, the events of interest are extracted from domain-specific texts via an event representation model at the semantic level and are stored in the form of a graphical model in an ontology as a posteriori (dynamic) knowledge. Then, a set of domain-specific causal rules in first-order logic (FOL) is fed into the machine as a priori (common-sense) knowledge. In addition to this common-sense knowledge, several large-scale ontologies, including DBpedia, VerbNet, and WordNet, are used for modeling contextual (static) knowledge and generalizing events. Finally, all these types of knowledge are integrated in the standard Web Ontology Language (OWL) to perform causal inference. Empirical evaluation on real news articles showed that our method performed better than the baselines.
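
The idea of generalizing a cause event before applying a causal rule can be illustrated with a small Python sketch. This is a heavily simplified stand-in using plain dictionaries, not the paper's OWL and first-order-logic machinery. The hypernym links, the single rule, and the function names are hypothetical.

```python
# Hypothetical hypernym links used to generalize event arguments.
HYPERNYM = {"earthquake": "natural_disaster", "flood": "natural_disaster"}

# Hypothetical causal rules: (cause predicate, argument class) -> effect.
RULES = {("occurs", "natural_disaster"): "evacuation"}

def generalizations(term):
    """Yield the term itself, then each of its ancestors in turn."""
    while term is not None:
        yield term
        term = HYPERNYM.get(term)

def predict_effects(event):
    """Apply every rule whose cause matches the event or a generalization."""
    predicate, argument = event
    return [
        RULES[(predicate, g)]
        for g in generalizations(argument)
        if (predicate, g) in RULES
    ]

effects = predict_effects(("occurs", "flood"))
```

The rule is written once for the general class, and the hypernym walk lets it fire for specific events that never appear in the rule set.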

Issue Info: 
  • Year: 

    2017
  • Volume: 

    9
  • Issue: 

    2
  • Pages: 

    35-44
Measures: 
  • Citations: 

    0
  • Views: 

    199
  • Downloads: 

    82
Abstract: 

This paper presents an automated supervised method for Persian wordnet construction. Using a Persian corpus and a bilingual dictionary, initial links between Persian words and Princeton WordNet synsets are generated. These links are then classified as correct or incorrect by a trained classification system that employs seven features. The whole method is a classification system trained on a training set that uses a pre-existing Persian wordnet, FarsNet, as the set of correct instances. A set of sophisticated distributional and semantic features is proposed for the classification system. Furthermore, a set of randomly selected links is added to the training data as incorrect instances. The links classified as correct are collected for inclusion in the final wordnet. State-of-the-art results are achieved for automatically derived Persian wordnets. The resulting wordnet has a precision of 91.18% and includes more than 16,000 words and 22,000 synsets.
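
The first and last steps of this pipeline, generating candidate links through a bilingual dictionary and scoring the accepted links by precision, can be sketched in Python as below. The dictionary entries, the synset identifiers, and the function names are hypothetical, and the seven-feature classifier in between is not reproduced because the abstract does not describe its features.

```python
# Hypothetical bilingual dictionary (transliterated Persian -> English).
BILINGUAL = {"ketab": ["book"], "shir": ["milk", "lion", "tap"]}

# Hypothetical index from English lemmas to synset identifiers.
SYNSETS_BY_LEMMA = {
    "book": ["book.n.01"],
    "milk": ["milk.n.01"],
    "lion": ["lion.n.01"],
    "tap": ["tap.n.01"],
}

def candidate_links(word):
    """Every (word, synset) pair reachable through a dictionary translation."""
    return [
        (word, synset)
        for english in BILINGUAL.get(word, [])
        for synset in SYNSETS_BY_LEMMA.get(english, [])
    ]

def precision(accepted, gold):
    """Fraction of accepted links that also appear in the gold set."""
    accepted = set(accepted)
    return len(accepted & set(gold)) / len(accepted) if accepted else 0.0

links = candidate_links("shir")
```

An ambiguous word such as the one in the example produces several candidate links, which is why a classification step is needed before links are admitted to the final wordnet.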

Issue Info: 
  • Year: 

    2023
  • Volume: 

    9
Measures: 
  • Views: 

    110
  • Downloads: 

    0
Abstract: 

In this article, we introduce solutions for solving crossword puzzles by machine using natural language processing techniques. The task is divided into two subtasks: finding possible answers for each table description, and then selecting the target word and placing it in the table. The first subtask, finding a word from its description, has many other uses, for example in text generation and paraphrasing. For this purpose, we used a combination of methods, including searching and finding semantic similarities in the data of previously solved tables, searching dictionaries and Wikipedia articles, using a masked language model, and finding related words in FarsNet and the Farsiyar tool. The results show that the combination of these methods performs better (82% recall) than any of them individually. In the second subtask, we give the list of possible answers to a constraint-satisfaction search algorithm, which chooses the answers that can be placed in the table under its constraints, fills the empty cells in the best way, and solves the crossword. The overall evaluation shows 80.22% precision and 68.86% recall in solving the crossword puzzle.
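
The second subtask can be illustrated with a small backtracking search in Python. The abstract does not describe the actual search algorithm, so this is only a generic constraint-satisfaction sketch. The slot names, the crossing format, and the candidate lists are hypothetical, and every candidate is assumed to already fit the length of its slot.

```python
def consistent(assignment, crossings):
    """Check that every crossing between two filled slots shares a letter."""
    for slot_a, i, slot_b, j in crossings:
        if slot_a in assignment and slot_b in assignment:
            if assignment[slot_a][i] != assignment[slot_b][j]:
                return False
    return True

def solve(slots, candidates, crossings, assignment=None):
    """Backtracking search: pick one candidate per slot so crossings agree.

    slots: list of slot names
    candidates: slot -> list of candidate words, best first
    crossings: list of (slot_a, index_a, slot_b, index_b) tuples
    """
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(slots):
        return assignment
    slot = next(s for s in slots if s not in assignment)
    for word in candidates[slot]:
        if word in assignment.values():
            continue  # assumption: a word is used at most once per puzzle
        trial = {**assignment, slot: word}
        if consistent(trial, crossings):
            result = solve(slots, candidates, crossings, trial)
            if result is not None:
                return result
    return None  # no candidate works for this slot: backtrack

# Letter 2 of slot "1A" must equal letter 0 of slot "1D".
solution = solve(
    ["1A", "1D"],
    {"1A": ["cat", "car"], "1D": ["rat", "tar"]},
    [("1A", 2, "1D", 0)],
)
```

Trying candidates in ranked order means the search prefers the answers that the first subtask scored highest, and it falls back to lower-ranked ones only when the crossings rule them out.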

Author(s): 

HESABI AKBAR

Issue Info: 
  • Year: 

    2017
  • Volume: 

    9
  • Issue: 

    24
  • Pages: 

    87-109
Measures: 
  • Citations: 

    0
  • Views: 

    655
  • Downloads: 

    0
Abstract: 

This descriptive-analytic study explores the radial categories of polysemous words in the head (sar) domain in Persian. For this purpose, the senses of these polysemous words were collected from Farhange Ruze Soxan, FarsNet, various digital books, weblogs, and observation of people using these words. The senses were then categorized and examined using the relations among radial sets described in Lewandowska-Tomaszczyk (2007). These relations, including conceptual metaphor, metonymy, synecdoche, and image schemas, were used to develop the radial categories shown in diagrams. In addition, the study addressed the following questions: Is it possible to develop radial categories for polysemous head-related body-part words using the relations between the prototype and the extended senses? Which types of expressions containing these words cannot be explained using these relations? Can these explanations be employed in lexicography and language teaching? The data analysis indicated that radial categories of the polysemous words can be formed using these relations, but the relations could not be applied to the categorization of idioms. It seems that although idioms can be categorized by the meaning of the whole idiom, the relations mentioned for radial categories were not useful for this purpose. The results indicated that the proposed radial categories can be used in vocabulary teaching and modern lexicography.
