o1 models


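Introduction:

Before o1, you could think of most models as children who blurt out the first thing that comes to mind; they needed to learn the valuable lesson of thinking before speaking. o1 thinks carefully every time before it speaks, which helps it reach a new level on complex tasks.

It does this by using a chain of thought (CoT) to explore multiple possible paths and to verify its answer as it produces it.

This means it can produce useful results with less context.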

tokens

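o1 models introduce reasoning tokens. The models use these reasoning tokens to "think": they break down their understanding of the prompt and consider several approaches to generating a response.

After generating the reasoning tokens, the model produces an answer as visible completion tokens and discards the reasoning tokens from its context.

Below is an example of a multi-step conversation between a user and an assistant. The input and output tokens from each step are carried over, while the reasoning tokens are discarded.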

[Figure: multi-step conversation in which input and output tokens are carried over and reasoning tokens are discarded]
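The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the `Turn` class, the `simulate` function, and all token counts are made up for the example, and the sketch assumes that reasoning tokens take up context space during the turn in which they are generated.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Token counts for one user/assistant exchange (hypothetical numbers)."""
    input_tokens: int      # tokens in the user's new message
    reasoning_tokens: int  # hidden "thinking" tokens, discarded after the turn
    output_tokens: int     # visible answer tokens, kept in the context


def simulate(turns: list[Turn]) -> list[dict]:
    """Track what each turn consumes and what is carried into the next one."""
    carried = 0  # input + output tokens retained from earlier turns
    report = []
    for number, turn in enumerate(turns, start=1):
        context_in = carried + turn.input_tokens
        # Assumption: reasoning tokens count toward the context while the
        # turn is being generated, even though they are dropped afterwards.
        used = context_in + turn.reasoning_tokens + turn.output_tokens
        # Only input and output tokens survive into the next turn.
        carried = context_in + turn.output_tokens
        report.append({
            "turn": number,
            "context_in": context_in,
            "used_this_turn": used,
            "carried_forward": carried,
            "discarded": turn.reasoning_tokens,
        })
    return report


report = simulate([Turn(100, 500, 200), Turn(50, 800, 150), Turn(30, 300, 100)])
for row in report:
    print(row)
```

In this made-up conversation the second turn spends 800 tokens on reasoning, yet the third turn starts from only 530 tokens of context, because none of the earlier reasoning is carried forward.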

Performance comparison

[Figures: performance comparison]

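How do o1 models work?

They are trained with large-scale reinforcement learning to generate a chain of thought (CoT).

The chain of thought generated before answering is longer and of higher quality, and it usually gives better results than prompting alone.

It also includes behaviors such as correcting errors, trying several strategies and settling on one in the end, and breaking the problem down into smaller steps.

For more on chain of thought, see: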
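As a toy illustration of "try a strategy, check it, and move on if it fails", here is a small Python sketch. It is not how o1 is implemented: in the model these behaviors are learned rather than written as an explicit loop. The puzzle (find a non-negative integer x with x² + 3x = 40) and every function name are hypothetical.

```python
import math
from typing import Callable, Optional


def verify(x: int) -> bool:
    """Check a candidate against the toy puzzle: x*x + 3*x == 40."""
    return x * x + 3 * x == 40


def rough_guess() -> Optional[int]:
    """Fast but flawed strategy: ignore the 3x term and take sqrt(40)."""
    return round(math.sqrt(40))


def bounded_search() -> Optional[int]:
    """Slower strategy: test every integer from 0 to 10."""
    for x in range(0, 11):
        if verify(x):
            return x
    return None


def reason(strategies: list[Callable[[], Optional[int]]],
           check: Callable[[int], bool]) -> tuple[Optional[int], list[str]]:
    """Try each strategy in order, keeping a trace of rejected attempts."""
    trace = []
    for strategy in strategies:
        candidate = strategy()
        accepted = candidate is not None and check(candidate)
        verdict = "accepted" if accepted else "rejected"
        trace.append(f"{strategy.__name__}: candidate={candidate}, {verdict}")
        if accepted:
            return candidate, trace
    return None, trace


answer, trace = reason([rough_guess, bounded_search], verify)
print(answer)
for step in trace:
    print(step)
```

The first strategy proposes 6, which fails the check, so the loop falls back to the search and returns 5. The trace plays the role of the discarded reasoning: it records the failed attempt, but only the final answer is returned to the caller.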

Learning to Reason with LLMs

We are introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding to the user.


https://openai.com/index/learning-to-reason-with-llms/


Abstract reasoning

[Figures: abstract reasoning]

Prompts

[Figures: prompts]

© Baocheng Huang 2023 - 2026