BULLETIN OF THE POLISH ACADEMY OF SCIENCES. TECHNICAL SCIENCES, vol. 74, no. 4, pp. 1-12, 2026 (Peer-Reviewed Journal)
State-of-the-art deep reinforcement learning (DRL) techniques such as Soft Actor-Critic (SAC), Twin Delayed Deep Deterministic
Policy Gradient (TD3), and Deep Deterministic Policy Gradient (DDPG) demonstrate promising results in developing control strategies. In this
study, we propose ES-SAC, a hybrid learning framework that integrates Evolutionary Strategy (ES) with the SAC algorithm to enhance humanoid
robot locomotion control. ES-SAC leverages the global search capabilities of evolutionary algorithms and the sample efficiency and convergence
properties of DRL. The performance of the ES-SAC agent was evaluated on a bipedal robot simulation and compared to other hybrid methods
employing deterministic agents, including ES-TD3 and ES-DDPG. The ES-SAC agent exhibited superior average reward performance and a
more stable learning process. In contrast, the ES-TD3 agent achieved faster course completion but exhibited control instabilities. This study also
highlights the importance of physical and behavioral metrics – such as torque efficiency, horizontal and vertical deflection, and Q0 values – in
assessing the reliability of DRL-based locomotion control. Our findings suggest that relying solely on cumulative reward for evaluation can be
misleading, underscoring the need for a more comprehensive analysis in future research.
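
To illustrate the kind of population-based global search that the ES component contributes, the following is a minimal sketch of a generic evolution-strategy update on a toy objective. It is not the ES-SAC procedure from the paper: the function names (`es_step`, `toy_fitness`), the hyperparameters, and the quadratic objective are all assumptions chosen for illustration. In a locomotion setting the fitness would instead be an episode return from the simulator.

```python
import numpy as np

def es_step(theta, fitness_fn, rng, pop_size=50, sigma=0.1, lr=0.01):
    """One generic ES update (illustrative; not the paper's algorithm)."""
    # Antithetic sampling: each perturbation is paired with its negation.
    half = rng.standard_normal((pop_size // 2, theta.size))
    eps = np.concatenate([half, -half])
    # Evaluate every perturbed parameter vector.
    rewards = np.array([fitness_fn(theta + sigma * e) for e in eps])
    # Standardize rewards so the update is insensitive to reward scale.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Search-gradient estimate from the weighted perturbations.
    grad = eps.T @ adv / (len(eps) * sigma)
    return theta + lr * grad

def toy_fitness(theta):
    # Hypothetical stand-in for an episode return; maximized at theta = 1.
    return -np.sum((theta - 1.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(5)
for _ in range(200):
    theta = es_step(theta, toy_fitness, rng)
print(toy_fitness(theta))
```

Because the update uses only fitness evaluations and no backpropagated gradients, it can be combined with a gradient-based learner such as SAC in several ways; how ES-SAC couples the two is described in the paper itself, not in this sketch.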