Automotive Innovation, 2024, Vol. 7, Issue 3: 403-417. DOI: 10.1007/s42154-023-00260-1
Jingliang Duan 1,2, Yiting Kong 2, Chunxuan Jiao 1, Yang Guan 2, Shengbo Eben Li 2, Chen Chen 2, Bingbing Nie 2 & Keqiang Li 2
Jingliang Duan and Yiting Kong have contributed equally to this work.
Abstract: Merging onto the highway from an on-ramp is an essential scenario for automated driving. Decision-making in this scenario must balance safety and efficiency to optimize a long-term objective, which is challenging due to the dynamic, stochastic, and adversarial characteristics of highway traffic. Existing learning-based methods struggle to meet safety requirements. This paper proposes a reinforcement-learning-based decision-making method under a framework of offline training and online correction, called the Shielded Distributional Soft Actor-Critic (Shielded DSAC). The Shielded DSAC adopts policy evaluation with safety considerations during offline training and a safety shield parameterized by a barrier function during online correction. These two measures complement each other to achieve better safety without sacrificing efficiency. The Shielded DSAC was verified in a simulated on-ramp merging scenario. The results indicate that it achieves the best safety performance among the compared baseline algorithms while simultaneously driving efficiently.
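The abstract only outlines the online-correction step, so the following is a minimal Python sketch of how a barrier-function-based safety shield might override a learned policy's action. The barrier function, kinematic model, decay rate alpha, acceleration bounds, and candidate-search fallback are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only: the exact barrier function and dynamics used in
# Shielded DSAC are not given in the abstract; everything below is assumed.

def barrier(state):
    """Assumed barrier h(s): positive when the gap to the lead vehicle
    exceeds a speed-dependent safe distance, negative otherwise."""
    gap, ego_speed, lead_speed = state
    safe_gap = 2.0 + 1.5 * ego_speed          # headway-style safe distance (assumed)
    return gap - safe_gap

def barrier_after(state, accel, dt=0.1):
    """Barrier value at the next step under a simple kinematic model (assumed)."""
    gap, ego_speed, lead_speed = state
    next_speed = ego_speed + accel * dt
    next_gap = gap + (lead_speed - ego_speed) * dt
    return barrier((next_gap, next_speed, lead_speed))

def shield(state, rl_accel, alpha=0.5, accel_bounds=(-4.0, 2.0), dt=0.1):
    """Online correction: keep the RL action if it satisfies the discrete
    barrier condition h(s') >= (1 - alpha) * h(s); otherwise pick the closest
    admissible action from a candidate set, braking hard if none exists."""
    threshold = (1.0 - alpha) * barrier(state)
    if barrier_after(state, rl_accel, dt) >= threshold:
        return rl_accel                        # RL action is already safe
    candidates = np.linspace(accel_bounds[0], accel_bounds[1], 61)
    feasible = [a for a in candidates if barrier_after(state, a, dt) >= threshold]
    if feasible:
        return min(feasible, key=lambda a: abs(a - rl_accel))
    return accel_bounds[0]                     # emergency braking fallback

# Example: an aggressive RL acceleration is reduced when the gap is marginal.
state = (32.5, 20.0, 18.0)                     # (gap [m], ego speed, lead speed [m/s])
print(shield(state, rl_accel=1.5))             # corrected to a smaller acceleration
```

The sketch mirrors the paper's framework only in structure: the learned policy proposes an action offline-trained for long-term return, and the shield intervenes online only when the barrier condition would be violated.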