Search Results
Issue Info:
  • Year: 2019
  • Volume: 10
  • Issue: 2
  • Pages: 87-96
Measures:
  • Citations: 0
  • Views: 141
  • Downloads: 81
Abstract: 

Word embeddings (WE) have received much attention recently as a word-to-numeric-vector architecture for text processing and have been a great asset for a wide variety of NLP tasks. Most text processing tasks try to convert text components such as sentences into numeric matrices before applying their processing algorithms. The most important problem in all word-vector-based text processing approaches, however, is that sentences differ in size and their matrices therefore differ in dimension. In this paper, we suggest an efficient yet simple statistical method to convert text sentences into normalized matrices of equal dimension. The proposed method combines three of the most efficient methods (averaging-based, most-likely n-grams, and word mover's distance) to exploit their advantages and reduce their constraints. The size of the resulting matrix does not depend on the language, subject, or scope of the text, nor on the semantic concepts of its words. Our results demonstrate that the normalized matrices capture complementary aspects of most text processing tasks such as coherence evaluation, text summarization, text classification, automatic essay scoring, and question answering.
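The averaging component of such a combination can be sketched as follows; the toy embeddings and the name `sentence_vector` are illustrative assumptions, not taken from the paper:

```python
def sentence_vector(tokens, embeddings, dim=4):
    # Average the word vectors of a sentence into one fixed-size vector,
    # so sentences of any length map to the same dimensionality.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 4-dimensional embeddings (hypothetical values).
emb = {
    "cats":  [1.0, 0.0, 0.0, 0.0],
    "sleep": [0.0, 1.0, 0.0, 0.0],
    "dogs":  [1.0, 0.0, 0.2, 0.0],
}

v1 = sentence_vector(["cats", "sleep"], emb)
v2 = sentence_vector(["dogs", "sleep", "cats"], emb)
assert len(v1) == len(v2) == 4  # equal dimension regardless of sentence length
```

Whatever the sentence length, the output dimension is fixed, which is the property the paper's normalized matrices provide.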

Source: Scientific Information Database (SID) - Trusted Source for Research and Academic Resources
Author(s): 

Hajipoor O. | SADIDPOUR S.S.

Issue Info:
  • Year: 2020
  • Volume: 8
  • Issue: 2 (30)
  • Pages: 105-114
Measures:
  • Citations: 0
  • Views: 1073
  • Downloads: 0
Abstract: 

With the growing number of Persian electronic documents and texts, quick and inexpensive methods for accessing desired texts from the extensive collection of these documents become more important. One effective technique for achieving this goal is the extraction of keywords that represent the main concepts of the text. For this purpose, the frequency of a word in the text is not by itself a proper indication of its significance. Moreover, most keyword extraction methods ignore the concepts and semantics of the text, and the unstructured nature of new texts in news and electronic documents makes it difficult to extract these words. In this paper, an automated, unsupervised method for keyword extraction in the Persian language, which lacks a rigid structure, is proposed. This method not only takes into account the probability of occurrence of a word and its frequency in the text, but also captures the concepts and semantics of the text by training a Word2vec model on it. In the proposed method, which combines statistical and machine learning techniques, after training Word2vec on the text, the words that have the smallest distance to the other words are extracted. Then, a statistical equation is proposed to calculate the score of each extracted word using co-occurrence and frequency. Finally, the words with the highest scores are selected as the keywords. The evaluations indicate that the F-measure of the method is 53.92%, which is 11% higher than that of other methods.
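The scoring step described above can be sketched with a simple stand-in formula; the exact equation of the paper is not given in the abstract, so the product of frequency and co-occurrence breadth below is only an assumed illustration:

```python
from collections import Counter
from itertools import combinations

def keyword_scores(sentences):
    # Score each word by its frequency combined with how many distinct
    # words it co-occurs with (a stand-in for the paper's statistical score).
    freq = Counter(w for s in sentences for w in s)
    cooc = {}
    for s in sentences:
        for a, b in combinations(set(s), 2):
            cooc.setdefault(a, set()).add(b)
            cooc.setdefault(b, set()).add(a)
    return {w: freq[w] * (1 + len(cooc.get(w, ()))) for w in freq}

sents = [["news", "keyword", "text"], ["text", "keyword"], ["text", "news"]]
scores = keyword_scores(sents)
top = max(scores, key=scores.get)  # "text": most frequent and best connected
```

Words that are both frequent and widely co-occurring rise to the top, which matches the intuition the abstract describes.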

Issue Info:
  • Year: 2021
  • Volume: 18
  • Issue: 1 (47)
  • Pages: 51-60
Measures:
  • Citations: 0
  • Views: 245
  • Downloads: 0
Abstract: 

According to the model, keywords can present the main concepts of a text without human intervention. Keywords are important vocabulary items that describe the text and play a very important role in accurate and fast understanding of its content. The purpose of keyword extraction is to identify the subject and main content of the text in the shortest time. Keyword extraction plays an important role in text summarization, document labeling, information retrieval, and topic extraction. For example, summarizing large texts into smaller ones is difficult, but the keywords of a text can make the reader aware of its topics. Identifying keywords with conventional manual methods is time-consuming and costly. Keyword extraction methods can be classified into two types: supervised (with observer) and unsupervised (without observer). In general, the process of keyword extraction can be described as follows: first the text is split into smaller units, the words; then redundant words are removed and the remaining words are weighted; finally, the keywords are selected from among these words. Our proposed method in this paper is a supervised one. We first compute a word correlation matrix per document using a feed-forward neural network and the Word2vec algorithm. Then, using the correlation matrix and a limited initial list of keywords, we extract the words closest in similarity as a list of nearest neighbors. Next, we sort this list in descending order, select different percentages of words from the beginning of the list, and for each percentage repeat the process of training the neural network 10 times, building a correlation matrix, and extracting the nearest-neighbor list. Finally, we calculate the average precision, recall, and F-measure.
We continue until we obtain the best evaluation results; these show that selecting at most 40% of the words from the beginning of the nearest-neighbor list yields acceptable results. The algorithm has been tested on a corpus of 800 news items whose keywords were extracted manually, and the experimental results show that the accuracy of the suggested method is 78%.
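The expansion step (seed keywords plus nearest neighbors from the correlation matrix, keeping the top 40% of the ranked list) can be sketched like this; the toy correlation values and the function name `expand_keywords` are illustrative assumptions:

```python
def expand_keywords(seeds, corr, top_fraction=0.4):
    # Rank candidate words by their highest correlation with any seed
    # keyword, then keep the top fraction of the descending list (40%
    # gave the best results in the paper's experiments).
    candidates = [w for w in corr if w not in seeds]
    ranked = sorted(candidates,
                    key=lambda w: max(corr[w][s] for s in seeds),
                    reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return list(seeds) + ranked[:k]

# Toy correlation values standing in for the learned matrix.
corr = {
    "economy": {"bank": 0.9},
    "loan":    {"bank": 0.8},
    "weather": {"bank": 0.1},
}
expanded = expand_keywords(["bank"], corr)  # keeps the closest neighbor only
```

In the paper this ranking is rebuilt after each of the 10 training runs and the retained percentage is tuned; the sketch shows a single pass.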

Issue Info:
  • Year: 2021
  • Volume: 53
  • Issue: 5
  • Pages: 2214-2225
Measures:
  • Citations: 1
  • Views: 18
  • Downloads: 0

Issue Info:
  • Year: 2022
  • Volume: 19
  • Issue: 1
  • Pages: 115-124
Measures:
  • Citations: 0
  • Views: 66
  • Downloads: 8
Abstract: 

In data mining studies, because performing the feature selection process by hand is complex, some labeling tasks are sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems without sufficient knowledge of the users' age or place of residence. We use a convolutional neural network to classify relations into six classes: USAGE, TOPIC, COMPARE, MODEL-FEATURE, RESULT, and PART-WHOLE. This article extracts the data from the abstracts of 450 scientific articles, a total of 835 relations. One hundred of these abstracts were labeled via crowdsourcing. The classification in this article achieves a slight improvement in accuracy. In this study, we computed the classification results on a combination of vocabulary vectors using the relation data of all 450 abstracts (100 crowdsourced with 350 standard ones). The implementation of the classification algorithm shows a performance improvement. This paper uses the power of the crowd to prepare data mining work. By adding crowdsourced data to the previous data, the proposed method obtained better results than the top five methods.

Author(s): 

Hosseini Moghadam Emami Zahra Sadat | Tabatabayi seifi Shohreh | IZADI MOHAMMAD | Tavakoli Mohammad

Issue Info:
  • Year: 2021
  • Volume: 7
Measures:
  • Views: 294
  • Downloads: 0
Abstract: 

Text processing, as one of the main issues in the field of artificial intelligence, has received a lot of attention in recent decades. Numerous methods and algorithms have been proposed for semantic textual similarity, one of the sub-branches of text processing. Due to the special features of the Persian language and its non-standard writing system, finding semantic similarity is an even more challenging task in Persian. On the other hand, producing a proper corpus for training a semantic similarity model is of great importance. The main purpose of this study is to propose a method for measuring the semantic similarity between short Persian texts. To do so, we first build an appropriate corpus and then propose an efficient approach based on neural networks. The proposed method involves three steps. The first step is data collection and building a parallel corpus. In the next step, the pre-processing step, the data is normalized. Finally, semantic similarity recognition is done by the neural network using vector representations of the words. The suggested model is built upon the produced corpus of movie and TV show subtitles containing 35,266 sentence pairs. The F-measure of the proposed approach on PAN2016 is 75.98% with 4 tags and 98.87% with 2 tags. We also achieved an F-measure of 98.86% for our model tested on the parallel corpus with 2 tags.
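A minimal sketch of scoring similarity between two short texts from word vectors: average the vectors, then take the cosine of the two averages. The 3-dimensional toy embeddings are assumed values, and the paper's actual neural model is more elaborate than this baseline:

```python
import math

def avg_vec(tokens, emb):
    # Represent a short text as the average of its word vectors.
    vecs = [emb[t] for t in tokens if t in emb]
    dim = len(next(iter(emb.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy 3-dimensional embeddings (hypothetical values).
emb = {"film": [1.0, 0.2, 0.0], "movie": [0.9, 0.3, 0.1], "rain": [0.0, 0.1, 1.0]}
sim_close = cosine(avg_vec(["film"], emb), avg_vec(["movie"], emb))
sim_far = cosine(avg_vec(["film"], emb), avg_vec(["rain"], emb))
```

Semantically related texts score near 1, unrelated ones near 0; a similarity model then maps such scores to the 2-tag or 4-tag labels the paper evaluates.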

Issue Info:
  • Year: 2022
  • Volume: 11
  • Issue: 3
  • Pages: 25-33
Measures:
  • Citations: 0
  • Views: 48
  • Downloads: 233
Abstract: 

Heusler alloys are intermetallics that offer a unique and broad array of properties, both scientifically intriguing and valuable for a variety of practical applications. One of these applications is magnetic cooling, which takes advantage of the giant magnetocaloric effect (GMCE) in some Heusler alloys. Since the late 1990s, numerous scientific papers have been published attempting to harness Heusler alloys for green refrigeration. Additive manufacturing further offers control over the alloys and enables tuning of their properties through their microstructure. Although the scientific literature contains extensive information on these alloys' chemistry and performance, the massive volume of scientific papers makes it difficult, if not impossible, to keep up to date with relevant discoveries. To predict the composition of well-performing giant magnetocaloric Heusler alloys manufactured by laser powder bed fusion (LPBF), we employed artificial intelligence, specifically unsupervised learning, in the current work. We trained an unsupervised learning model using word embedding and the Word2vec algorithm on different data sets in the literature to extract hidden knowledge, relations, and interactions, based on the observation that words appearing in similar contexts often have similar meanings. Properties inherent to giant magnetocaloric materials were addressed in the model. The outcome was the prediction of Heusler alloys, manufactured by LPBF, with an excellent giant magnetocaloric effect.

Issue Info:
  • Year: 2019
  • Volume: 49
  • Issue: 3 (89)
  • Pages: 1345-1357
Measures:
  • Citations: 0
  • Views: 527
  • Downloads: 0
Abstract: 

Content-based image retrieval (CBIR) applies machine vision techniques to extract images similar to a given query image. The main challenge of CBIR is the semantic gap between low-level pixel- and segment-based features and high-level concepts in the image. One approach to reducing this gap is to use high-level region- and object-based features. However, the low-level features describe image details and enforce between-image discrimination. Accordingly, it is expected that using both feature types will lead to better results. This paper tries to reduce the mentioned gap by combining decision results at four granularities, namely the pixel, region, object, and concept levels. Pixel-level retrieval adopts SIFT features and local binary patterns. The region-level subsystem partitions the image into a set of segments and extracts their color and texture features using the hue descriptor and Gabor filters for subsequent processing. The AlexNet convolutional neural network is employed for object-based retrieval. Word2vec embedding is used for concept-level retrieval, which exploits conceptual relations between objects to enhance the retrieval results. Experiments on the Wang and GHIM datasets confirm the feasibility of the proposed combination and show that it improves the overall performance of the retrieval system.
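Combining decision results from the four levels can be sketched as a score fusion; the weighted-sum rule, the per-image scores, and the name `fuse_scores` below are hypothetical stand-ins, as the abstract does not specify the fusion rule:

```python
def fuse_scores(scores_by_level, weights=None):
    # Combine per-image retrieval scores from the pixel, region, object,
    # and concept levels with a weighted sum (equal weights by default).
    levels = list(scores_by_level)
    if weights is None:
        weights = {lvl: 1.0 / len(levels) for lvl in levels}
    images = next(iter(scores_by_level.values())).keys()
    return {img: sum(weights[lvl] * scores_by_level[lvl][img] for lvl in levels)
            for img in images}

# Toy scores from two of the four granularities.
scores = {
    "pixel":   {"img1": 0.9, "img2": 0.2},
    "concept": {"img1": 0.4, "img2": 0.8},
}
fused = fuse_scores(scores)
best = max(fused, key=fused.get)
```

An image that scores well at several granularities outranks one that excels at only a single level, which is the motivation for combining low-level and high-level features.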

Issue Info:
  • Year: 2017
  • Volume: 13
Measures:
  • Views: 179
  • Downloads: 107
Abstract: 

Social media websites have captured the web space, and their membership is increasing daily. With the data shared by people, researchers try to use it properly to help recommender systems. One of the hot research areas is user interest detection. Intelligent web systems try to extract users' primitive interests from the content they share. While most works concentrate on extracting users' initial interests, less effort has been dedicated to understanding latent ones. In this paper, we demonstrate how word embedding methods can help enrich a user's interest profile. We build a state-of-the-art user interest model that deploys the Word2vec method to enrich initial interests extracted from the user's Twitter account. Our experimental results demonstrate that semantic similarity measures, especially word embedding methods, outperform traditional ones. Empirical results show that enriching the user interest profile leads to better personalized content-based recommendation.
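The enrichment step can be sketched as expanding each initial interest with its nearest embedding-space neighbors; the toy neighbor table below stands in for a Word2vec most-similar lookup, and `enrich_profile` is an illustrative name, not from the paper:

```python
def enrich_profile(initial_interests, neighbors, top_n=2):
    # Expand a user's initial interests with their nearest neighbors in
    # embedding space. `neighbors` is a precomputed toy table standing in
    # for a trained Word2vec model's similarity ranking.
    enriched = list(initial_interests)
    for term in initial_interests:
        for cand in neighbors.get(term, [])[:top_n]:
            if cand not in enriched:
                enriched.append(cand)
    return enriched

# Toy neighbor lists, ordered by assumed similarity.
neighbors = {"football": ["soccer", "league", "goal"]}
profile = enrich_profile(["football"], neighbors)
```

The enriched profile then covers latent interests the user never stated explicitly, which is what improves the content-based recommendations.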

Author(s): 

Chavosh Narjes | Emadi Sima

Issue Info:
  • Year: 2023
  • Volume: 1
  • Issue: 1
  • Pages: 108-115
Measures:
  • Citations: 0
  • Views: 40
  • Downloads: 0
Abstract: 

Today, due to the large volume of opinions published by people in cyberspace, sentiment analysis plays a key role in extracting information. One of the new techniques for determining the exact polarity of a sentence in sentiment analysis is deep learning. In this research, two deep learning algorithms, namely RNN and LSTM, have been used to determine sentence polarity in order to achieve more accurate results. Moreover, in the proposed technique, a pre-trained word embedding algorithm, namely Word2vec, was used to capture the semantic relationships between words and increase the accuracy of the proposed method. The proposed method was evaluated on two datasets, airline-tweet and IMDB. The evaluation results show an accuracy of 0.78 on the airline-tweet dataset and 0.84 on the IMDB dataset.
