With the growing demand for network-based services, wireless networks that do not require pre-existing infrastructure, such as wireless ad hoc networks (WANETs), have attracted considerable attention. Requiring minimal configuration and allowing rapid deployment, these networks are well suited to emergencies such as disasters, crises, and military operations. In this paper, optimal radio resource allocation in WANETs with dynamic topology is investigated from the perspectives of optimization frameworks, cross-layer design, and routing. Cross-layer research aims to optimize energy consumption and overall network performance, and the key steps in optimizing the system model are discussed. Formulations of single-objective and multi-objective optimization problems in WANETs are studied by examining the achievements of recent research using each approach, together with the algorithms, metrics, and tasks commonly applied to WANETs. By reviewing work based on deep reinforcement learning, we show that feeding decision feedback from the system back into the learning agent can significantly improve cost control and management. Deep reinforcement learning is therefore a viable approach to resource allocation in the complex environment of next-generation networks: a feedback loop between decisions and observed system performance continually refines and optimizes those decisions.
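The decision-feedback loop described above can be illustrated with a deliberately simplified sketch (not taken from any surveyed work): an agent repeatedly allocates one of several radio channels to a link, observes the resulting throughput as a reward, and refines its allocation policy. Tabular Q-learning stands in here for the deep reinforcement learning used in practice, and all names, channel qualities, and parameters are hypothetical.

```python
import random

N_CHANNELS = 4
# Hidden per-channel success probability (assumed for illustration only).
TRUE_QUALITY = [0.2, 0.5, 0.9, 0.4]

def step(channel, rng):
    """System feedback: stochastic throughput reward for a channel choice."""
    return 1.0 if rng.random() < TRUE_QUALITY[channel] else 0.0

def train(episodes=5000, alpha=0.1, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [0.0] * N_CHANNELS                 # value estimate per channel
    for _ in range(episodes):
        # Decision: epsilon-greedy allocation of a channel.
        if rng.random() < eps:
            a = rng.randrange(N_CHANNELS)
        else:
            a = max(range(N_CHANNELS), key=q.__getitem__)
        r = step(a, rng)                   # feedback from the system
        q[a] += alpha * (r - q[a])         # refine the decision policy
    return q

q = train()
best = max(range(N_CHANNELS), key=q.__getitem__)
```

Even this toy loop exhibits the behavior the abstract points to: as feedback accumulates, the value estimates converge toward the true channel qualities and the agent's allocations concentrate on the best channel, without any prior model of the environment.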