Issue Info: 
  • Year: 2023
  • Volume: 12
  • Issue: 1
  • Pages: 48-59
Measures: 
  • Citations: 0
  • Views: 21
  • Downloads: 0
Abstract: 

The lack of explicit support for inter-block synchronization in the CUDA programming model degrades performance in some applications, which must therefore implement inter-block synchronization in software. Both lock-based and lock-free methods have been implemented for this problem. In lock-based synchronization, execution time increases significantly as the number of blocks grows, while lock-free methods limit the number of blocks that can be synchronized. In this paper, two inter-block synchronization methods are proposed. The first is lock-based and reduces the impact of the number of blocks on execution time by grouping the blocks. The second is lock-free and removes the limit on the number of blocks by arranging the blocks in a tree hierarchy. Both methods were used for inter-block synchronization in the Smith-Waterman and bitonic sort algorithms. Experimental results show that the proposed lock-based method improves synchronization execution time, with a speedup of 1.84 in the Smith-Waterman algorithm and 2.24 in the bitonic sort algorithm. The results also show that the proposed lock-free method can synchronize any number of blocks, provided the number of levels in the tree hierarchy is chosen correctly, so the limit on the number of blocks is removed.
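As a rough illustration of how the number of tree levels relates to the number of blocks, the sketch below assumes that a single lock-free barrier can synchronize at most `fan_in` blocks, so a tree with `levels` levels covers up to `fan_in ** levels` blocks. The function name `tree_levels` and the `fan_in` parameter are hypothetical and not taken from the paper, whose actual hierarchy may be organized differently.

```python
def tree_levels(num_blocks: int, fan_in: int) -> int:
    """Smallest number of levels such that fan_in ** levels >= num_blocks.

    Assumption (not from the paper): one barrier node can synchronize at
    most `fan_in` blocks, and each extra level multiplies capacity by fan_in.
    """
    if num_blocks < 1 or fan_in < 2:
        raise ValueError("need num_blocks >= 1 and fan_in >= 2")
    levels = 0
    capacity = 1  # number of blocks a tree with `levels` levels can cover
    while capacity < num_blocks:
        capacity *= fan_in
        levels += 1
    return levels


if __name__ == "__main__":
    for blocks in (32, 33, 1024, 1025):
        print(blocks, "blocks ->", tree_levels(blocks, fan_in=32), "level(s)")
```

Under this assumption, adding one level multiplies the number of blocks that can be synchronized by `fan_in`, which is why a suitable choice of depth can cover any block count.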

Yearly Impact: مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Issue Info: 
  • Year: 1395
  • Volume: 3
Measures: 
  • Views: 291
  • Downloads: 0
Abstract: 

Please refer to the full text (PDF) to view the abstract.

Issue Info: 
  • Year: 2016
  • Volume: 6
Measures: 
  • Views: 156
  • Downloads: 117
Abstract: 

Image watermarking in the DCT domain has high computational complexity, especially for color and high-resolution images, whose use has grown significantly. To address this issue, this article proposes a data-parallel color DCT watermarking approach and implements it on a GPU using CUDA. In addition, before embedding, the color watermark is compressed using a modified method to reduce distortion. The CUDA implementation of the 8×8 DCT offers a 12x-43x speedup on the GT 540M and a 94x-105x speedup on the GTX 580 for different image sizes. For the embedding procedure, the speedup is between 7x and 26x on the GT 540M and between 46x and 73x on the GTX 580 across the case studies. For the extraction procedure, the speedup is between 10x and 29x on the GT 540M and between 75x and 80x on the GTX 580 across the case studies.
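For reference, the 8×8 transform named above can be written as the standard orthonormal two-dimensional DCT-II. The sketch below is plain sequential Python, not the authors' CUDA kernel, and the function name `dct_8x8` is an assumption. Because each 8×8 block is transformed independently of the others, blocks can in principle be processed in parallel.

```python
import math

N = 8  # block size of the 8x8 DCT


def dct_8x8(block):
    """Return the orthonormal 2D DCT-II of an 8x8 block.

    `block` is a list of 8 rows, each a list of 8 numbers.
    """
    def scale(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


if __name__ == "__main__":
    flat = [[1.0] * N for _ in range(N)]
    coeffs = dct_8x8(flat)
    # A constant block has all of its energy in the DC coefficient.
    print(round(coeffs[0][0], 6))
```

An image would be tiled into such blocks and `dct_8x8` applied to each tile; the direct four-loop form is chosen here for readability rather than speed.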

Issue Info: 
  • Year: 2017
  • Volume: 3
  • Issue: 2
  • Pages: 81-88
Measures: 
  • Citations: 0
  • Views: 220
  • Downloads: 196
Abstract: 

There are several methods for building an efficient steganalysis strategy for digital images. A very powerful one is the rich model, which combines a large number of diverse sub-models in both the spatial and transform domains. However, extracting these varied feature types from an image is very time consuming in some steps, especially in the training phase of the two-step train-and-test process, which involves a large number of high-resolution images. Multithreaded programming is a straightforward way to reduce the required time, but it is limited and does not scale well. In this paper, we present a CUDA-based approach for data-parallelization and optimization of the sub-model extraction process. The construction of the rich model is also analyzed in detail, leading to a more efficient solution. Further, some optimization techniques are employed to reduce the total number of GPU memory accesses. Compared to single-threaded and multi-threaded CPU processing, our CUDA-based parallel program achieves 10x-12x and 3x-4x speedups, respectively, on a GT 540M, and it can be scaled across several CUDA cards to achieve better speedups…

Issue Info: 
  • Year: 2016
  • Volume: 2
  • Issue: 2
  • Pages: 185-202
Measures: 
  • Citations: 0
  • Views: 204
  • Downloads: 80
Abstract: 

Among different discretization approaches, the Finite Difference Method (FDM) is widely used for acoustic and elastic full-waveform modeling. An inevitable drawback of the technique, however, is its severe demand for computational resources. A promising solution is parallelization, where the problem is broken into several segments and the calculations are distributed over different processors. For existing FD routines, however, such parallelization inevitably requires domain decomposition and inter-core data exchange because the governing equations are coupled. In this study, a new FD-based procedure for seismic wave modeling, named the "Modal Finite Difference Method (MFDM)", is introduced. It performs the simulation in the decoupled modal space, so neither domain decomposition nor inter-core data exchange is required, which greatly simplifies parallelization for both MPI and CUDA implementations on CPUs and GPUs. With MFDM, it is also possible to simply cut off less significant modes and run the routine for only the important ones, which effectively reduces computation and storage costs. The efficiency of the proposed MFDM is shown by some numerical examples.

Author(s): Mahmoodi Darian Hossein

Issue Info: 
  • Year: 2019
  • Volume: 16
  • Issue: 55
  • Pages: 113-131
Measures: 
  • Citations: 0
  • Views: 354
  • Downloads: 0
Abstract: 

In this paper, an efficient method is introduced for defining multi-variable functions using expression templates for array computations in computational fluid dynamics simulations in C++. The method is implemented using variadic templates, a new feature in C++. One of its advantages is its ease of use for users in computational fields: the user can define and use their own function with any number of input arguments without knowing template programming concepts. The present method may replace conventional expression templates in developing numerical libraries. For three different functions, including arithmetic operations and trigonometric functions, the efficiency of the proposed method for arrays of different sizes is compared with that of conventional expression templates, two different C++ syntaxes, and Fortran. Furthermore, the performance of the method in terms of compilation time and executable file size is demonstrated. A similar comparison is made on Graphics Processing Units (GPUs) using CUDA, and the efficiency of the method is shown. The results indicate that, for any array size, the present method performs very well in terms of computational time, compilation time, and executable file size. Finally, as an application of the proposed method, a numerical simulation is carried out.

Issue Info: 
  • Year: 2008
  • Volume: 34
  • Issue: 3
  • Pages: 67-73
Measures: 
  • Citations: 0
  • Views: 1985
  • Downloads: 0
Abstract: 

We present an algorithm for the NVIDIA CUDA platform, based on the Smith-Waterman algorithm, for the sequence alignment problem. CUDA is a new programming language that is very similar to standard C, with some extensions. The Smith-Waterman algorithm finds the similarity between two sequences by building a scoring matrix; this application uses it to find the similarity between one sequence, called the query sequence, and the sequences in a database file. In this program, each thread calculates one scoring matrix between the query sequence and one of the sequences in the database. The algorithm uses the small but fast shared memory inside the GPU to hold four intermediate columns in each cycle for each thread. If the database is large enough, it can keep the GPU fully occupied, which results in a better speedup over the CPU. To demonstrate the performance, we used a high-end CPU, an Intel Core 2 Duo E6600, and a mid-range GPU, a GeForce 8600GT, and observed a speedup of about twenty times over the CPU.
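To make the scoring step concrete, here is a minimal sequential sketch of Smith-Waterman scoring with a linear gap penalty. The scoring values (`match`, `mismatch`, `gap`) and the function names are assumptions, not taken from the paper. The loop over database sequences in `search` stands in for the one-thread-per-sequence mapping described above, and only the previous row of the scoring matrix is kept rather than the full matrix.

```python
def sw_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two strings (linear gap penalty).

    The scoring values are illustrative assumptions, not the paper's.
    """
    prev = [0] * (len(target) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Score the query against every database sequence.

    Each iteration is independent, which is what allows one GPU thread
    per database sequence in a parallel version.
    """
    return [sw_score(query, seq) for seq in database]


if __name__ == "__main__":
    print(search("ACGT", ["ACGT", "AGT", "TTTT"]))
```

Since every database sequence is scored independently, a larger database exposes more parallel work, which matches the abstract's remark about keeping the GPU busy.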

Issue Info: 
  • Year: 2015
  • Volume: 16
Measures: 
  • Views: 139
  • Downloads: 97
Abstract: 

A double-GPU code is developed to simulate compressible viscous equations. The code, written in the CUDA programming language, is developed by modifying a single-GPU code. The OpenMP library is used for parallel execution of the code on both GPUs. Data transfer between the GPUs, which is the main issue in developing the code, is carried out by defining halo points for the numerical grids and by using CUDA built-in functions.

Author(s): Mahmoodi Darian Hossein

Issue Info: 
  • Year: 2017
  • Volume: 48
  • Issue: 2
  • Pages: 161-170
Measures: 
  • Citations: 0
  • Views: 295
  • Downloads: 81
Abstract: 

A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes, and the viscous terms are discretized using the standard fourth-order central scheme. The code, written in the CUDA programming language, is developed by modifying a single-GPU code. The OpenMP library is used for parallel execution of the code on both GPUs. Data transfer between the GPUs, which is the main issue in developing the code, is carried out by defining halo points for the numerical grids and by using a CUDA built-in function. The code is executed on a PC equipped with two heterogeneous GPUs. The computational times of the different schemes are obtained, and the speedups with respect to the single-GPU code are reported for different numbers of grid points. Furthermore, the developed code is analyzed with CUDA profiling tools. The analysis helps to further increase the code's performance.
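The halo-point idea can be illustrated on a one-dimensional grid split into two subdomains, standing in for the two GPUs. This is a simplified sketch, not the paper's code: the 3-point averaging stencil, the halo width of one point, and all function names are assumptions chosen for illustration. Each subdomain carries a copy of its neighbor's edge point so that the stencil can be evaluated at the subdomain boundary.

```python
HALO = 1  # halo width; one point is enough for a 3-point stencil (assumption)


def smooth(u):
    """3-point average on interior points; the two end points are copied."""
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def step_two_domains(u):
    """Apply `smooth` by splitting the grid into two subdomains with halos.

    Requires len(u) >= 4 so that each subdomain owns at least two points.
    """
    mid = len(u) // 2
    # Each subdomain holds its own points plus a halo copied from the
    # neighbor (this copy plays the role of the GPU-to-GPU transfer).
    left = u[:mid + HALO]
    right = u[mid - HALO:]
    new_left = smooth(left)
    new_right = smooth(right)
    # Drop the halo points and join the owned points back together.
    return new_left[:mid] + new_right[HALO:]


if __name__ == "__main__":
    grid = [float(i * i) for i in range(10)]
    print(step_two_domains(grid) == smooth(grid))
```

In a time-stepping code the halo copy would be refreshed before every step, since the neighbor's edge values change after each update.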

Issue Info: 
  • Year: 2022
  • Volume: 10
  • Issue: 1
  • Pages: 75-88
Measures: 
  • Citations: 0
  • Views: 125
  • Downloads: 86
Abstract: 

Background and Objectives: Louvain is a time-consuming community detection algorithm, especially on large-scale networks. Using a Graphics Processing Unit (GPU) to calculate the modularity sigma, a major processing step in the Louvain algorithm, can reduce execution time and make the algorithm practical for large-scale networks. Methods: The proposed algorithm, Dynamic CUDA Louvain Method (DCLM), groups hardware threads into blocks dynamically on the cores inside the GPU. Taking the properties of the GPU into account, the algorithm sets the number of threads in a block to the maximum number of processing cores in each Streaming Multiprocessor (SM). If the number of nodes in the graph is smaller than the total number of physical cores on the GPU, the number of threads per block equals the ratio of the number of graph nodes to the number of SMs. Results: The implementation results demonstrate that the proposed algorithm decreases the run time by 15% compared with the best previous method on a large-scale graph. Conclusion: We have introduced the GPU-based DCLM algorithm, which accelerates the Louvain community detection algorithm. Dynamic allocation of threads to each block has a significant effect on reducing execution time. However, increasing the number of threads per block alone does not speed up the calculations.
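The thread-allocation rule described in the Methods can be sketched as follows. The function and parameter names are hypothetical, and rounding the ratio up to a whole number of threads is an assumption, since the abstract does not say how a fractional ratio is handled.

```python
def threads_per_block(num_nodes, num_sms, cores_per_sm):
    """Pick a block size following the rule stated in the abstract.

    - Small graph (fewer nodes than physical cores): nodes / SMs,
      rounded up here (the rounding is an assumption).
    - Otherwise: the number of cores in one SM.
    """
    total_cores = num_sms * cores_per_sm
    if num_nodes < total_cores:
        return -(-num_nodes // num_sms)  # ceiling division
    return cores_per_sm


if __name__ == "__main__":
    # Hypothetical GPU with 4 SMs of 128 cores each.
    print(threads_per_block(100, 4, 128))
    print(threads_per_block(10_000, 4, 128))
```

For a small graph this spreads the nodes evenly over the SMs instead of filling a few large blocks, which is consistent with the conclusion that larger blocks alone do not speed up the calculation.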
