Introduction
In developed societies, residential customers use advanced appliances. Progress in smart grids and the Internet of Things has paved the way for home energy management systems that schedule controllable appliances. Given the growth in demand, demand response strategies for energy management have received considerable attention. These strategies aim at goals such as demand reduction and improved reliability. A thorough review of the existing literature shows notable efforts to optimize the home energy management problem with classic and meta-heuristic approaches such as game theory, genetic algorithms, and particle swarm optimization (PSO). However, these algorithms are not practical given the inherent nature of the home energy management problem. Because the environment of the problem changes continuously, they fail to solve it. Previous works therefore adopt simplifying assumptions, such as fixed scenarios, so that conventional algorithms can solve the problem. Machine learning, in contrast, addresses this issue by extracting the main features from input data and constructing a general description of the environment. Applying machine learning-based algorithms to home energy management requires smart appliances. In a smart home, therefore, using artificial intelligence for energy management is both feasible and useful. Reducing electricity costs can make a demand response program attractive to customers, provided that their satisfaction is also taken into account. Accordingly, customer satisfaction should be included in the problem formulation. Recently, the remarkable progress in machine learning has produced novel algorithms for optimal decision-making problems such as demand response.
Machine learning is commonly divided into three main categories: supervised learning, unsupervised learning, and reinforcement learning (RL). Among them, RL has shown notable performance in decision-making problems. Q-Learning is a model-free RL algorithm that can handle nonlinear problems by estimating and maximizing the cumulative reward that results from the chosen actions. Its fundamental idea is to identify the best action in each state. This paper provides a day-ahead demand response program for a smart home. It schedules the energy consumption of each appliance so as to reduce both the electricity cost and user dissatisfaction. The smart home is assumed to be equipped with smart appliances. Smart meters installed on the appliances monitor their status and exchange command signals with them every hour. The appliances fall into three categories: non-responsive, time-shiftable, and controllable loads. The dishwasher and washing machine are modeled as time-shiftable loads. The electric vehicle (EV), air conditioner, and lighting system are modeled as controllable loads. The TV and refrigerator are modeled as non-responsive loads. In summary, we propose an advanced home energy management system with the following contributions:
i) A day-ahead multi-agent Q-Learning method to minimize the electricity cost.
ii) A satisfaction-based framework that employs precise models of the customer dissatisfaction functions (i.e., thermal comfort, battery degradation, and desirable operation period).
Materials and methods
In this paper, a multi-agent Q-Learning approach is used to solve the home energy management problem for a smart home. Q-Learning is a popular model-free RL algorithm because its convergence is proven and it is straightforward to implement.
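The Q-Learning approach above can be illustrated with a minimal tabular sketch for a single controllable appliance, the air conditioner. The sketch assumes that the multi-agent scheme gives each appliance its own agent. The hourly prices, AC power levels, thermal model, and comfort band are all hypothetical and are not taken from the paper. So are the names `train_ac_agent` and `greedy_schedule` and the weight `beta`, which stands in for the thermal comfort coefficient. The state is the pair (hour, discretised indoor temperature). The reward is the negative sum of the hourly energy cost and a `beta`-weighted quadratic discomfort term.

```python
import numpy as np

# All numbers below are illustrative assumptions, not values from the paper.
PRICES = np.array([0.08] * 7 + [0.15] * 10 + [0.25] * 4 + [0.10] * 3)  # $/kWh, 24 hours
ACTIONS = np.array([0.0, 1.0, 2.0])  # hypothetical AC power levels in kW
T_OUT, T_INIT, T_SET, BAND = 32.0, 28.0, 24.0, 1.0  # temperatures in degC
T_MIN, N_BINS = 16, 21  # indoor temperature discretised into 1 degC bins (16..36 degC)


def temp_bin(temp):
    """Map a continuous indoor temperature to a discrete state index (clipped)."""
    return min(max(round(float(temp)) - T_MIN, 0), N_BINS - 1)


def next_temp(temp, power):
    """Hypothetical first-order thermal model: drift toward T_OUT, cooled by AC power."""
    return temp + 0.2 * (T_OUT - temp) - 2.0 * power


def reward(hour, temp_next, power, beta):
    """Negative of (electricity cost + beta * thermal discomfort).

    Discomfort is quadratic (non-linear) in the deviation beyond the comfort band.
    """
    discomfort = max(0.0, abs(temp_next - T_SET) - BAND) ** 2
    return -(PRICES[hour] * power + beta * discomfort)


def train_ac_agent(beta, episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-Learning over one day; the state is (hour, temperature bin)."""
    rng = np.random.default_rng(seed)
    q = np.zeros((24, N_BINS, len(ACTIONS)))
    for _ in range(episodes):
        temp = T_INIT
        for hour in range(24):
            s = temp_bin(temp)
            # Epsilon-greedy exploration over the discrete power levels.
            if rng.random() < epsilon:
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax(q[hour, s]))
            temp_next = next_temp(temp, ACTIONS[a])
            r = reward(hour, temp_next, ACTIONS[a], beta)
            # Bootstrap from the next hour's best action; hour 23 is terminal.
            if hour == 23:
                target = r
            else:
                target = r + gamma * np.max(q[hour + 1, temp_bin(temp_next)])
            q[hour, s, a] += alpha * (target - q[hour, s, a])
            temp = temp_next
    return q


def greedy_schedule(q):
    """Roll out the learned greedy policy; return (hourly AC power, energy cost)."""
    temp, schedule, cost = T_INIT, [], 0.0
    for hour in range(24):
        a = int(np.argmax(q[hour, temp_bin(temp)]))
        schedule.append(float(ACTIONS[a]))
        cost += float(PRICES[hour] * ACTIONS[a])
        temp = next_temp(temp, ACTIONS[a])
    return schedule, cost


schedule, cost = greedy_schedule(train_ac_agent(beta=1.0))
print(schedule, round(cost, 2))
```

The transition probability matrix never appears explicitly, because Q-Learning is model-free. The agent only samples transitions from the hypothetical thermal model. With `beta=0.0` the learned schedule keeps the AC off all day at zero cost, because cooling then only adds cost. Larger `beta` values penalize temperature deviation more heavily and push the agent toward cooling at the expense of cost. This mirrors the role of the thermal comfort coefficient in the paper's sensitivity study.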
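The customer dissatisfaction functions listed in contribution ii) can be sketched as follows. The paper characterizes the desirable operation period model as non-linear and the battery degradation model as linear, but the exact formulas are not given in this section. The quadratic penalty, the feasibility window, the per-kWh wear cost, and the function names below are therefore hypothetical choices for illustration only.

```python
import math


def operation_period_dissatisfaction(start_hour, preferred_start, earliest, latest, weight=1.0):
    """Hypothetical non-linear (quadratic) dissatisfaction for a time-shiftable load.

    Zero when the appliance starts at the user's preferred hour and growing with the
    square of the shift; starts outside [earliest, latest] are treated as infeasible.
    """
    if not earliest <= start_hour <= latest:
        return math.inf
    return weight * (start_hour - preferred_start) ** 2


def battery_degradation_cost(hourly_energy_kwh, cost_per_kwh=0.05):
    """Hypothetical linear degradation model: cost proportional to energy throughput.

    hourly_energy_kwh holds the EV battery's charged (+) or discharged (-) energy per
    hour; cost_per_kwh is an assumed wear cost, not a value from the paper.
    """
    return cost_per_kwh * sum(abs(e) for e in hourly_energy_kwh)


# Usage: a dishwasher preferred at 20:00 and allowed to start between 18:00 and 23:00.
print(operation_period_dissatisfaction(22, 20, 18, 23))  # 4.0
print(battery_degradation_cost([3.0, 0.0, -1.0]))  # 0.2
```

The quadratic form penalizes a large shift disproportionately more than a small one. This is one simple way to make a dissatisfaction term non-linear, while the degradation term stays linear in the energy cycled through the battery.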
To deploy Q-Learning in a home energy management system, the smart home must first be formulated as a Markov decision process (MDP). An MDP consists of four fundamental elements: states, actions, rewards, and a transition probability matrix. An agent is then trained by observing a state, taking an action, transitioning to a new state, and updating its estimate of the cumulative reward. After visiting a large number of states and trying diverse actions, the agent gradually learns to select the optimal action in any state. Another fundamental aspect of this paper is the way customer satisfaction is taken into account. A non-linear thermal comfort model, a non-linear desirable operation period model, and a linear battery degradation model are used to represent customer dissatisfaction precisely. All simulations were implemented in Python 3.6 without any commercial solver.
Results
Several case studies were designed to verify the effectiveness of the proposed method. Scenario 1 simulates a smart home with a random pattern of energy usage. Scenario 2 applies the proposed Q-Learning-based home energy management system but neglects battery degradation. Scenario 3 is identical to Scenario 2 except that battery degradation is also taken into account. Comparing the results shows that the proposed algorithm reduced the electricity bill by 31.3% and 24.8% in Scenarios 2 and 3, respectively. Customer satisfaction was not violated in either scenario. Furthermore, an additional case study evaluates the effect of thermal comfort on the electricity bill by setting the thermal comfort coefficient to progressively smaller values.
As expected, the smaller the thermal comfort coefficient, the lower the electricity bill. A smaller coefficient gives temperature control less weight relative to the electricity bill.
Conclusion
This paper proposed a home energy management method that minimizes both the electricity bill and user discomfort. A multi-agent reinforcement learning approach based on Q-Learning makes optimal decisions for home appliances. These appliances are categorized into non-responsive (non-shiftable) loads, time-shiftable loads, and controllable loads. Compared with classic optimization methods, the proposed approach can model more appliances and solve more complex problems, owing to the model-free nature of the Q-Learning algorithm. In the numerical study, the proposed method reduced the electricity bill by 24.8% when battery degradation was taken into account (Scenario 3). The numerical results demonstrate the effectiveness of the proposed approach.