One of the challenges of isolated Persian digit recognition is the similar pronunciation of some digits, such as "zero and three", "nine and two", and "five, seven and eight". This similarity leads to a high rate of substitution errors and reduces recognition accuracy. In this paper, a combined solution based on long short-term memory (LSTM) and the hidden Markov model (HMM) is proposed to address this challenge. The proposed approach increases the recognition rate of Persian digits by 2 percent on average and by 8 percent in the best case compared to the HMM-based approach. Because this challenge intensifies in noisy conditions, the remainder of this work considers the robust recognition of Persian digits with similar pronunciation. To increase the robustness of the LSTM-based recognizer, robust features extracted from the speech spectrum were used, such as spectral entropy, burst degree, bisector frequency, spectral flatness, first formant, and autocorrelation-based zero-crossing rate. These features reduce the number of features needed to recognize similar Persian digits from 39 coefficients to between 1 and 4 coefficients. At the same time, they improve the robustness of the isolated digit recognizer across 30 noisy conditions (five noise types: white, pink, babble, factory, and car; and six signal-to-noise ratios: -5, 0, 5, 10, 15, and 20 decibels) by 10%, 13%, 15%, and 13% on average compared to, respectively, the HMM-based, LSTM-based, and deep belief network-based recognizers with Mel-cepstrum coefficients and a convolutional neural network-based recognizer with Mel-spectrogram features.
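
As an illustration of two of the spectral features named above, the following is a minimal sketch of spectral entropy and spectral flatness computed on a single speech frame. It uses common textbook definitions (normalized Shannon entropy of the power spectrum, and the ratio of geometric to arithmetic mean of the power spectrum); the exact definitions, frame length, window, and FFT size used in this work are not stated here, so the function names and the parameters `n_fft` and `eps` are assumptions made for the example.

```python
import numpy as np


def power_spectrum(frame, n_fft=512):
    """Power spectrum of one Hamming-windowed frame (n_fft is an assumed size)."""
    windowed = frame * np.hamming(len(frame))
    return np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2


def spectral_entropy(power, eps=1e-12):
    """Shannon entropy of the normalized power spectrum, scaled to about [0, 1]."""
    p = power / (np.sum(power) + eps)      # treat the spectrum as a distribution
    entropy = -np.sum(p * np.log2(p + eps))
    return entropy / np.log2(len(p))       # divide by the maximum possible entropy


def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    geometric_mean = np.exp(np.mean(np.log(power + eps)))
    arithmetic_mean = np.mean(power) + eps
    return geometric_mean / arithmetic_mean


# Hypothetical usage: a tonal frame versus a white-noise frame.
fs = 8000                                   # assumed sampling rate in Hz
t = np.arange(256) / fs                     # assumed 32 ms frame
tone = np.sin(2 * np.pi * 1000 * t)
noise = np.random.default_rng(0).standard_normal(256)

for name, frame in (("tone", tone), ("noise", noise)):
    spec = power_spectrum(frame)
    print(name, spectral_entropy(spec), spectral_flatness(spec))
```

Both measures are low for a frame whose energy is concentrated in a few frequency bins and high for a noise-like frame, which is why features of this kind can separate sounds with a single coefficient per frame.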