Abstract:
Reasoning ability is one of the most crucial capabilities of a foundation model, reflecting its capacity to handle complex reasoning tasks. Chain-of-Thought (CoT) is an effective technique for enhancing the reasoning ability of foundation models. Its reasoning process is linear and step-by-step, similar to an individual's logical reasoning, and it is suited to general and moderately complicated problems. In contrast, expert thinking has two prominent characteristics that CoT cannot handle appropriately: high-order multi-hop reasoning and multimodal comparative judgment. To go beyond CoT and build a reasoning paradigm that thinks like an expert, we exploit the fact that a hyperedge in a hypergraph can connect multiple vertices, which makes it well suited to modeling high-order relationships. On this basis, a multimodal Hypergraph-of-Thought (HoT) reasoning paradigm is proposed, which gives foundation models the expert-level abilities of high-order multi-hop reasoning and multimodal comparative judgment. Specifically, a textual hypergraph-of-thought is constructed to model higher-order relationships, and a hyperedge-of-thought is generated through multi-hop walks to achieve multi-hop reasoning. Furthermore, a visual hypergraph-of-thought is designed to interact with the textual hypergraph-of-thought via cross-modal collaborative graph learning and to perform multimodal comparative verification. Experiments on the ScienceQA benchmark show that the proposed HoT-based T5 outperforms CoT-based GPT-3.5 and ChatGPT and performs on par with CoT-based GPT-4, despite having a smaller model size.