Abstract:
Object grasping pose estimation is an important visual task for embodied robots. Most 6-degree-of-freedom (6-DoF) grasping pose estimation methods rely mainly on point clouds, neglecting the texture information of the object, and their computation is time-consuming. To address these issues, a fast and efficient grasping pose prediction method is proposed. A grasping region prediction module is built to predict the grasping regions and some of the 6-DoF grasp parameters from RGB-D images. Then, a non-uniform grasping pose search algorithm is established to improve the accuracy and diversity of the predicted grasping poses. To achieve robot embodiment, a large vision-language model is integrated to build an embodied robot grasping pose prediction system, enabling the robot to accurately understand human instructions and perform object grasping tasks. The experimental results show that the proposed method balances real-time performance and accuracy and can be effectively applied to robot grasping tasks in human-robot interaction scenarios.