In this paper three algorithms of Cyclic-Reduction, Parallel-Cyclic-Reduction and Parallel-Thomas are introduced to solve the Tridiagonal system of equations using GPUs and the effect of coalesced-memory-access and uncoalesced-memory-access to global memory are studied. To assess the ability of these algorithms, as a case-study the simulation of the lid-driven cavity flow have been compared to the results of Runtimes and physical parameters of the classical Thomas algorithm, executed on CPU. The maximum speed-up of these algorithms against CPU runtime is about 4.4x, 5.2x and 38.5x, respectively. Also, approximately a 2x speed-up achieved in coalesced-memory access on GPU.