Autonomous Coordinated Saturation Attack Method for Loitering Munitions in Urban Scenarios Based on Reinforcement Learning

张婷婷; 杨学军

doi:10.3969/j.issn.2096-0204.2023.04.0457


Abstract

Abstract: This paper addresses autonomous coordinated saturation attack by loitering munitions in urban scenarios. The problem is modeled as a decentralized partially observable Markov decision process (Dec-POMDP). A dedicated reward function is designed to ensure that the loitering munitions arrive within a very small time interval of one another, and it is combined with further reward functions that use joint weight parameters. The recurrent multi-agent deep deterministic policy gradient algorithm (R-MADDPG) is used to train the autonomous coordinated saturation attack policy, and the success rates of the evaluation metrics are analyzed with the Monte Carlo method. Simulation results show that, under the guidance of the trained decision model, the loitering munitions achieve a mission success rate of 93.2% for autonomous coordinated saturation attack. Within this, the collision avoidance rate between munitions is 94.4%, the air defense penetration success rate is 99.5%, and in 95.3% of episodes the maximum arrival time interval is less than 0.4 s.
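The arrival-interval figure in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that the "maximum arrival time interval" of an episode means the spread between the first and last munition to arrive, and that the reported percentage is the fraction of Monte Carlo episodes whose spread is under the 0.4 s threshold. The function names and the sample arrival times are hypothetical.

```python
from typing import Sequence


def max_arrival_gap(arrival_times: Sequence[float]) -> float:
    """Spread between first and last arrival in one episode, in seconds.

    Assumption: this is how the 'maximum arrival time interval' is measured.
    """
    return max(arrival_times) - min(arrival_times)


def fraction_within_gap(
    episodes: Sequence[Sequence[float]], threshold_s: float = 0.4
) -> float:
    """Fraction of simulated episodes whose arrival spread is below the threshold."""
    hits = sum(1 for times in episodes if max_arrival_gap(times) < threshold_s)
    return hits / len(episodes)


# Hypothetical arrival times (seconds) for three munitions over four episodes.
episodes = [
    [30.0, 30.1, 30.3],
    [28.5, 28.6, 28.7],
    [31.0, 31.5, 31.2],   # spread of 0.5 s, exceeds the threshold
    [29.9, 30.0, 30.25],
]
rate = fraction_within_gap(episodes)
```

In a real study the episode list would come from many randomized simulation runs of the trained policy, and the same counting approach would apply to the other metrics, such as collision avoidance and penetration success.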
