Abstract: To meet the demanding command-and-control requirements of a typical air-sea battle, a two-layer reinforcement learning framework guided by prior knowledge is proposed. A reward shaping method inspired by prior knowledge is studied: a state aggregation scheme is designed from the combat subtasks, mapping specific states to abstract states. The abstract states are then modeled using Markov decision process (MDP) theory, and a reinforcement learning algorithm is used to solve this model. Finally, the abstract-state value function serves as the potential function for potential-based reward shaping. This upper-layer process is solved in parallel with the lower-layer concrete MDP, yielding the two-layer reinforcement learning algorithm framework. The research is carried out on the wargame deduction platform of the national wargame deduction competition, and the algorithm is refined in terms of state space, action space, and reward function. It is pointed out that prior knowledge embodies the top-down, task-based command mode, while multi-agent reinforcement learning structurally matches the bottom-up, event-based command mode. Combining the two enables the combat units controlled by the algorithm to learn cooperative tactics and to remain robust in complex environments. Simulation results show that the red-side agent controlled by the algorithm achieves a 70% win rate against a blue side controlled by a rule-based agent.
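The core coupling between the two layers is potential-based reward shaping: the value function learned over abstract states acts as the potential Φ, and the lower-layer reward is augmented with γΦ(s') − Φ(s). The following is a minimal illustrative sketch of that idea only; the names `aggregate`, `V_abstract`, and `shaped_reward`, the toy aggregation rule, and all numeric values are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of potential-based reward shaping where the
# potential function Phi is a value function over abstract states.
# All names and values here are illustrative assumptions.

GAMMA = 0.99  # discount factor of the lower-layer MDP

def aggregate(state):
    """Map a specific (low-level) state to an abstract state.
    Toy rule: bucket a scalar state into coarse bins."""
    return state // 10

# Abstract-state value function, e.g. learned by the upper-layer
# reinforcement learning process over the abstract MDP.
V_abstract = {0: 0.0, 1: 0.5, 2: 1.0}

def shaped_reward(r_env, state, next_state):
    """Augment the environment reward with the potential-based term
    F(s, s') = gamma * Phi(s') - Phi(s), which is known to preserve
    the optimal policy of the underlying MDP."""
    phi_s = V_abstract[aggregate(state)]
    phi_next = V_abstract[aggregate(next_state)]
    return r_env + GAMMA * phi_next - phi_s
```

In this sketch, moving from a low-value abstract region to a higher-value one yields a positive shaping bonus, which is how the upper layer's prior-knowledge-derived assessment can guide lower-layer exploration without changing the optimal policy.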