Hypergraph-of-Thought Reasoning Enhanced Multimodal Foundation Models

姚方龙; 田昌元; 刘金涛; 张泽群; 孙显

doi:10.20278/j.jc2.2096-0204.2024.0239

Abstract

Reasoning ability is one of the most crucial capabilities of a foundation model, indicating how well the model can handle complex reasoning tasks. Chain-of-Thought (CoT) is one of the effective techniques for improving the reasoning ability of foundation models. Its reasoning process is linear and step-by-step, similar to an individual's logical reasoning, and is suited to general, moderately complex problems. By contrast, expert thinking has two prominent characteristics that CoT cannot handle appropriately: high-order multi-hop reasoning and multimodal comparative judgment. To go beyond CoT and build a reasoning paradigm that thinks like an expert, this work draws on the fact that a hyperedge of a hypergraph can connect multiple vertices, which makes it well suited to modeling high-order relationships, and proposes a multimodal Hypergraph-of-Thought (HoT) reasoning paradigm that gives foundation models expert-level high-order multi-hop reasoning and multimodal comparative judgment. Specifically, a textual hypergraph-of-thought is constructed to model high-order relationships, and hyperedges-of-thought are generated through multi-hop walks to achieve multi-hop reasoning. Furthermore, a visual hypergraph-of-thought is designed to interact with the textual hypergraph-of-thought via cross-modal collaborative graph learning, enabling multimodal comparative verification. Experiments on the ScienceQA benchmark show that the HoT-based T5 outperforms CoT-based GPT3.5 and ChatGPT, and performs on par with CoT-based GPT4 while using a smaller model.
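The idea that one hyperedge can join more than two vertices, and that a multi-hop walk can gather vertices into a new hyperedge, can be illustrated with a toy sketch. This is not the construction used in the paper, which the abstract does not specify. The vertex names, the `HYPEREDGES` dictionary, and the functions `incident_edges` and `multi_hop_walk` are all hypothetical, and a seeded random walk is assumed purely for illustration.

```python
import random

# Hypothetical toy hypergraph: each hyperedge groups several concept
# vertices, so a single edge can relate more than two concepts at once.
HYPEREDGES = {
    "e1": {"magnet", "iron", "attraction"},
    "e2": {"iron", "metal", "conductor"},
    "e3": {"conductor", "electricity", "circuit"},
}


def incident_edges(vertex, hyperedges):
    """Return the sorted names of all hyperedges that contain `vertex`."""
    return sorted(name for name, members in hyperedges.items() if vertex in members)


def multi_hop_walk(start, hyperedges, hops, seed=0):
    """Walk vertex -> hyperedge -> vertex for at most `hops` steps.

    The set of visited vertices is returned as one candidate hyperedge.
    This is an assumed illustration, not the paper's algorithm.
    """
    rng = random.Random(seed)
    current = start
    visited = [start]
    for _ in range(hops):
        edges = incident_edges(current, hyperedges)
        if not edges:
            break  # isolated vertex: the walk cannot continue
        edge = rng.choice(edges)
        # Step to a different member of the chosen hyperedge.
        candidates = sorted(hyperedges[edge] - {current})
        if not candidates:
            break
        current = rng.choice(candidates)
        if current not in visited:
            visited.append(current)
    return set(visited)


thought_edge = multi_hop_walk("magnet", HYPEREDGES, hops=3)
print(thought_edge)
```

Because the walk moves through shared hyperedges, the returned set can link concepts that never appear together in any single original hyperedge, which is the intuition behind multi-hop reasoning over high-order relations.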
