The two-dimensional Discrete Wavelet Transform (2D-DWT) is widely used in multimedia data processing, including image and video compression standards. However, it is more computationally intensive than conventional transforms such as the discrete cosine transform. In this paper, to improve the performance of the 2D-DWT, we use Single Instruction, Multiple Data (SIMD) instruction set extensions, namely Advanced Vector Extensions (AVX), Fused Multiply-Add (FMA), and AVX2, which are supported by most General-Purpose Processors (GPPs). These extensions operate on 256-bit data held in SIMD registers. AVX can process eight 32-bit floating-point numbers at a time, while AVX2 can process sixteen 16-bit fixed-point numbers. In other words, it is possible to exploit 8- and 16-way data-level parallelism. In addition, we apply two different traversal approaches: the Row-Column Wavelet Transform (RCWT), which processes rows and columns in separate passes, and the Line-Based Wavelet Transform (LBWT), which processes both rows and columns in a single loop. Experimental results for different wavelet transforms and image sizes on a GPP show speedups of up to 28.8x. Furthermore, the LBWT approach improves performance more than RCWT, because it uses the memory hierarchy more efficiently.
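The difference between the two traversal approaches can be illustrated with a minimal scalar sketch. It is not the SIMD implementation evaluated in the paper. It assumes a one-level Haar filter (the abstract does not name the wavelets used), and the function names `haar_1d`, `rcwt`, and `lbwt` are hypothetical. The row-column version makes two full passes over the image, whereas the line-based version filters a pair of rows horizontally and combines them vertically inside the same loop, so each row is still recently touched when the vertical step uses it.

```python
def haar_1d(x):
    """One-level Haar step (assumed filter): averages first, then differences."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def rcwt(image):
    """Row-column: one full pass over the rows, then one over the columns."""
    rows = [haar_1d(r) for r in image]             # horizontal pass
    cols = [haar_1d(list(c)) for c in zip(*rows)]  # vertical pass on columns
    return [list(r) for r in zip(*cols)]           # transpose back


def lbwt(image):
    """Line-based: filter two rows horizontally, then combine them
    vertically right away, all inside one loop over row pairs."""
    h = len(image)
    out = [None] * h
    for k in range(h // 2):
        top = haar_1d(image[2 * k])
        bottom = haar_1d(image[2 * k + 1])
        # Vertical low-pass goes to the upper half, high-pass to the lower half.
        out[k] = [(a + b) / 2 for a, b in zip(top, bottom)]
        out[h // 2 + k] = [(a - b) / 2 for a, b in zip(top, bottom)]
    return out


# Small 4x4 test image with even dimensions (an assumption of this sketch).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
assert rcwt(image) == lbwt(image)
```

Both functions produce the same coefficients; only the order in which memory is visited differs. In a SIMD version, the inner list comprehensions would correspond to operations on 8 floating-point or 16 fixed-point lanes of a 256-bit register.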