Automatic capture and analysis of human motion from images or video is an important problem in computer vision, with applications in animation, surveillance, biomechanics, human-computer interaction, entertainment, and the game industry. 3D human pose estimation is an essential component of these applications, so its accuracy strongly affects their performance. Estimating 3D human pose from image observations is challenging because of variation in human appearance and articulation, self-occlusion, and the high-dimensional state space of human pose. This paper introduces a new method for 3D human pose estimation from multi-view video sequences. Instead of searching the high-dimensional state space of human pose directly with complex inference algorithms, the proposed method employs a hierarchical search with a distinct objective function for each body part, solved with direct optimization methods. Its advantages are automatic initialization, labeling of the parts of the body contour, and the use of separate objective functions for different body parts. Experimental results demonstrate that the proposed method can be used effectively as a marker-less system for estimating 3D human pose from multi-view sequences.