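The agentic loop: how _stream keeps itself running

This is the heart of avante. This chapter explains how "the model thinks while calling tools" is actually implemented in code. The answer is recursion: when a response ends, _stream looks at the stop reason and decides whether to call itself again.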

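1. The small problem it solves

At its core, agentic coding is a loop:

The model says "I want to read file X" → the system reads X and feeds the content back → the model says "I want to change Y" → the system changes Y and feeds the result back → the model says "done"

The hard part is not calling the model once. It is wiring this open-ended back-and-forth into streaming responses, asynchronous tools, cancellation, and compactable history. avante's answer is restrained: there is no explicit while loop. Instead, _stream decides at the end of each round whether to continue, and if so, calls itself.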

2. Intuition: recursion = loop

 M._stream(opts)
   │  send the request, parse the stream
   ▼
 handler_opts.on_stop(stop_opts)
   │
   ├─ reason == "tool_use"  ── run all pending tools serially
   │                            each tool produces a tool_result (appended to history)
   │                            is the last tool attempt_completion?
   │                              ├─ yes → on_stop({reason="complete"}) to wrap up
   │                              └─ no  → M._stream(new_opts)  ← recursion: call the model again
   │
   ├─ reason == "complete" (agentic) ── check whether the model "forgot to wrap up"
   │                                     yes → add a system-reminder → M._stream  ← recursion
   │                                     no  → really finished
   │
   ├─ reason == "rate_limit" ── after a countdown, M._stream(opts)  ← recursive retry
   └─ reason == "cancelled"/"error" ── terminate

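Every "→ M._stream" is the next lap of the loop. The new lap's history_messages includes the tool results appended during the previous lap, so the model "sees" what the tools produced and keeps going.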
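The recursive shape can be sketched in a few lines of plain Lua. This is a toy under stated assumptions, not avante's code: `make_fake_model`, `stream`, `run_tool`, and the response shape are hypothetical stand-ins, and the fake model simply replays a scripted list of responses.

```lua
-- Hypothetical stand-in for a model: replays a scripted list of responses.
local function make_fake_model(script)
  local turn = 0
  return function(_history)
    turn = turn + 1
    return script[turn]
  end
end

-- One call to stream() is one lap of the loop; there is no explicit `while`.
local function stream(opts)
  local response = opts.model(opts.history)
  if response.reason == "tool_use" then
    -- Run the tool and append its result so the next lap can see it.
    table.insert(opts.history, { role = "user", content = opts.run_tool(response.tool) })
    return stream(opts) -- the next lap
  end
  return opts.on_stop(response.reason, #opts.history)
end

local final_reason, history_len
stream({
  model = make_fake_model({
    { reason = "tool_use", tool = "read_file" },
    { reason = "tool_use", tool = "edit_file" },
    { reason = "complete" },
  }),
  history = {},
  run_tool = function(name) return "result of " .. name end,
  on_stop = function(reason, n) final_reason, history_len = reason, n end,
})
```

The recursion depth here is two tool rounds plus the final response, and the only state carried between laps is the history table.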

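3. Pending tools: which tools to run this round

A single model response may request several tool calls. _stream uses History.get_pending_tools(history_messages) (history/init.lua:325) to find the tool calls that have a tool_use but no matching tool_result yet: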

-- From history/init.lua:333 (excerpt): scan backwards from the end, within a single turn;
--   collect each tool_use, skipping those whose tool_result was already seen (already run)
for idx = #messages, 1, -1 do
  local message = messages[idx]
  if last_turn_id and message.turn_id ~= last_turn_id then break end
  local use = Helpers.get_tool_use_data(message)
  if use then
    if not tool_result_seen[use.id] then
      table.insert(pending_tool_uses, 1, partial_tool_use)  -- no result yet → pending
    end
  else
    local result = Helpers.get_tool_result_data(message)
    if result then tool_result_seen[result.tool_use_id] = true end
  end
end

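Key design: only the current turn is considered (last_turn_id, history/init.lua:336), and tool_use_id pairing decides whether a given tool has already run. This makes the loop naturally idempotent: a tool whose result is already in the history is never executed again.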
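To see the pairing rule on concrete data, here is a self-contained sketch. The flat message shape (`tool_use` and `tool_result` fields directly on the message) is an assumption made for illustration; the real code goes through Helpers.get_tool_use_data and Helpers.get_tool_result_data instead.

```lua
-- Assumed message shape (illustrative only):
--   { turn_id = ..., tool_use = { id = ..., name = ... } }
--   { turn_id = ..., tool_result = { tool_use_id = ... } }
local function get_pending_tools(messages)
  local last_turn_id = #messages > 0 and messages[#messages].turn_id or nil
  local seen, pending = {}, {}
  for idx = #messages, 1, -1 do
    local message = messages[idx]
    -- Stop as soon as we leave the current turn.
    if last_turn_id and message.turn_id ~= last_turn_id then break end
    if message.tool_use then
      -- Prepend so the pending list keeps the original call order.
      if not seen[message.tool_use.id] then table.insert(pending, 1, message.tool_use) end
    elseif message.tool_result then
      seen[message.tool_result.tool_use_id] = true
    end
  end
  return pending
end

local messages = {
  { turn_id = "t0", tool_use = { id = "old", name = "ls" } },        -- earlier turn: ignored
  { turn_id = "t1", tool_use = { id = "a", name = "read_file" } },   -- already has a result
  { turn_id = "t1", tool_use = { id = "b", name = "edit_file" } },   -- still pending
  { turn_id = "t1", tool_result = { tool_use_id = "a" } },
}
local pending = get_pending_tools(messages)

-- Once the result for "b" is appended, a second scan finds nothing left to run.
table.insert(messages, { turn_id = "t1", tool_result = { tool_use_id = "b" } })
local pending_after = get_pending_tools(messages)
```

Because results are collected before the matching tool_use is reached in the backward scan, one pass is enough to decide which calls still need to run.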

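4. Serial execution + feeding results back

handle_next_tool_use (llm.lua:1825) is a small state machine that recurses on an index. It runs one tool at a time and starts the next only after the current one finishes:

When tool_use_index > #tool_uses (everything has run): wrap all tool_results into tool_result messages with the user role and append them to the history (llm.lua:1832-1844), then decide whether to recurse.
Otherwise: take the tool at that index and call LLMTools.process_tool_use (llm.lua:1917); its callback handle_tool_result stores the result in tool_results and then calls handle_next_tool_use(..., index + 1) (llm.lua:1893) to move on to the next tool.

Why serial rather than parallel? File-editing tools touch the same buffer and pop up confirmation dialogs, so they have to run one at a time.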
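The index-recursive pattern can be illustrated without any Neovim dependency. In this sketch `run_tools_serially`, `fake_process`, and the deferred `queue` are hypothetical; the queue only imitates asynchronous completion so that the ordering guarantee is visible.

```lua
-- Run tools one at a time: the next tool starts only from the previous callback.
local function run_tools_serially(tool_uses, process_tool_use, on_all_done)
  local tool_results = {}
  local function handle_next_tool_use(index)
    if index > #tool_uses then return on_all_done(tool_results) end
    local tool_use = tool_uses[index]
    process_tool_use(tool_use, function(result)
      table.insert(tool_results, { tool_use_id = tool_use.id, content = result })
      handle_next_tool_use(index + 1)
    end)
  end
  handle_next_tool_use(1)
end

-- Fake async tool runner: work is deferred onto a queue and executed later.
local queue, order = {}, {}
local function fake_process(tool_use, callback)
  table.insert(queue, function()
    table.insert(order, tool_use.name)
    callback("ok:" .. tool_use.name)
  end)
end

local results
run_tools_serially(
  { { id = "1", name = "read" }, { id = "2", name = "edit" } },
  fake_process,
  function(r) results = r end
)

-- Drain the queue, standing in for the event loop.
while #queue > 0 do
  local task = table.remove(queue, 1)
  task()
end
```

At no point does the queue hold more than one task, which is exactly the property that keeps two file-editing tools from touching the same buffer at once.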

The wrap-up decision is made at the moment everything has run (llm.lua:1845):

-- From llm.lua:1845 (excerpt): if the last tool is attempt_completion, really finish
local the_last_tool_use = tool_uses[#tool_uses]
if the_last_tool_use and the_last_tool_use.name == "attempt_completion" then
  opts.on_stop({ reason = "complete" })
  return
end
-- Otherwise: recurse into the model again with the latest history
local new_opts = vim.tbl_deep_extend("force", opts, {
  history_messages = opts.get_history_messages and opts.get_history_messages() or {},
})
if not streaming_tool_use then M._stream(new_opts) end

attempt_completion is a special tool: the model calls it to say "I am done", and the loop stops there (it is defined in llm_tools/attempt_completion.lua and registered in the tool table at init.lua:1197).

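5. Guarding against a model that skips the wrap-up: the system-reminder fallback

Sometimes, in agentic mode, the model replies with text but calls no tool at all, not even attempt_completion (the reason is complete straight away). avante does not accept this kind of giving up halfway: in the complete branch it checks for this case, adds a hidden reminder, and recurses to push the model once more (llm.lua:1929-1988):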

-- From llm.lua:1957 (excerpt): with unfinished todos, remind the model to use write_todos; otherwise remind it to use attempt_completion
if #unfinished_todos > 0 then
  message = History.Message:new("user",
    "<system-reminder>You should use tool calls to answer the question, "
    .. "for example, use write_todos if the task step is done or cancelled.</system-reminder>",
    { visible = false })
else
  message = History.Message:new("user",
    "<system-reminder>You should use tool calls to answer the question, "
    .. "for example, use attempt_completion if the job is done.</system-reminder>",
    { visible = false })
end
opts.on_messages_add({ message })
M._stream(new_opts)  -- one more lap

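An interesting detail: the reminder is added at most 3 times (user_reminder_count < 3, llm.lua:1952) unless there are still unfinished todos, which avoids an endless conversation with a model that is determined not to wrap up. The counter is reset to zero each time the tool_use branch is actually entered (llm.lua:1991).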
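The cap and the reset can be modelled with a small counter. This is a reading of the description above rather than a copy of the source: the exact way the todo check and the counter are combined is an assumption, and `on_complete` and `on_tool_use` are hypothetical names.

```lua
local state = { user_reminder_count = 0 }

-- Called when the model stops with "complete" without having wrapped up.
-- Assumed rule: remind if todos remain, or if fewer than 3 reminders were sent.
local function on_complete(unfinished_todos)
  if #unfinished_todos > 0 or state.user_reminder_count < 3 then
    state.user_reminder_count = state.user_reminder_count + 1
    return "remind"
  end
  return "stop"
end

-- Called when the model actually uses a tool: the counter starts over.
local function on_tool_use() state.user_reminder_count = 0 end

-- A model that never calls a tool gets three reminders, then the loop stops.
local outcomes = {}
for _ = 1, 4 do table.insert(outcomes, on_complete({})) end

-- A real tool call resets the budget.
on_tool_use()
local after_reset = on_complete({})
```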

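6. History compaction (memory compaction): keeping the context from blowing up

Long tasks make the history grow without bound. After assembling the prompt, _stream checks whether any history messages are pending compaction (llm.lua:1789-1796):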

if prompt_opts.pending_compaction_history_messages
   and #prompt_opts.pending_compaction_history_messages > 0
   and opts.on_memory_summarize then
  opts.on_memory_summarize(prompt_opts.pending_compaction_history_messages)
  return
end

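on_memory_summarize (defined in agent_loop, llm.lua:237) calls M.summarize_memory (llm.lua:32), which uses a dedicated summary model to compress the old messages into a piece of memory text. It then removes the compacted messages from the history and calls _stream again to continue with the compacted history. The memory text is later injected into the prompt (the _memory.avanterules template) so that the model "remembers" what was compacted away.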
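The compaction step amounts to "summarize, drop, continue". A minimal sketch of the first two parts follows; `compact` and the string-joining `summarize` are hypothetical stand-ins for the summary model, and the message shape is assumed.

```lua
-- Remove the pending messages from the history and fold them into the memory text.
-- `summarize(previous_memory, messages)` stands in for the call to the summary model.
local function compact(history, pending, memory, summarize)
  local drop = {}
  for _, message in ipairs(pending) do drop[message] = true end
  local kept = {}
  for _, message in ipairs(history) do
    if not drop[message] then table.insert(kept, message) end
  end
  return kept, summarize(memory, pending)
end

-- Toy summarizer: joins message contents instead of calling a model.
local function summarize(previous_memory, messages)
  local parts = {}
  for _, message in ipairs(messages) do table.insert(parts, message.content) end
  local prefix = previous_memory ~= "" and (previous_memory .. " | ") or ""
  return prefix .. table.concat(parts, "; ")
end

local history = {
  { content = "read X" },
  { content = "edit Y" },
  { content = "run tests" },
  { content = "fix failure" },
}
local pending = { history[1], history[2] }
local kept, memory = compact(history, pending, "", summarize)
```

After this step the next call into the model would use `kept` as its history, with `memory` placed in the prompt.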

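7. Rate limiting: recursive retry after a countdown

When the provider reports reason == "rate_limit", _stream does not fail outright. It starts a uv timer that ticks every second, refreshes "Retrying in N seconds" in the chat stream while counting down, and calls M._stream(opts) for another lap when the time is up (llm.lua:1994-2041). If the user cancels during the countdown, dispatch_cancel_message is triggered and the loop exits gracefully (llm.lua:2008-2012).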
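The countdown logic can be separated from the timer itself. In the sketch below, `retry_after` and the `deps` table are hypothetical; `deps.schedule` stands in for the 1-second uv timer, and the test harness replaces it with a plain queue so the example runs outside Neovim.

```lua
-- Count down once per tick; retry at zero, or bail out if the user cancelled.
local function retry_after(seconds, deps)
  local remaining = seconds
  local function tick()
    if deps.is_cancelled() then return deps.on_cancel() end
    if remaining <= 0 then return deps.retry() end
    deps.on_status("Retrying in " .. remaining .. " seconds")
    remaining = remaining - 1
    deps.schedule(tick) -- in Neovim this would be the next timer tick
  end
  tick()
end

-- Harness: a queue plays the role of the timer.
local queue, statuses = {}, {}
local retried, cancelled = false, false
retry_after(3, {
  is_cancelled = function() return false end,
  on_cancel = function() cancelled = true end,
  retry = function() retried = true end,
  on_status = function(text) table.insert(statuses, text) end,
  schedule = function(fn) table.insert(queue, fn) end,
})
while #queue > 0 do
  local task = table.remove(queue, 1)
  task()
end
```

Checking for cancellation at the top of every tick is what lets a cancel during the wait take effect within one tick rather than after the full delay.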

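8. Two entry points: agent_loop vs stream

M.agent_loop(opts) (llm.lua:185): a lightweight entry point for sub-agents and one-off tasks. It maintains its own local history_messages, wraps the task in a <task>...</task> user message, sets up on_memory_summarize, and then calls _stream. The dispatch_agent tool relies on it to run subtasks.
M.stream(opts) (llm.lua:2143): the entry point for the main sidebar conversation. It first wraps callbacks such as on_chunk/on_stop with vim.schedule_wrap (so they run back on the main thread and no longer fire once is_completed is set), then uses the dual_boost configuration to choose between a single stream and a "dual-model comparison" (_dual_boost_stream, llm.lua:2088), and finally lands in _stream.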
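The "no callbacks after completion" guard can be shown in isolation. This sketch leaves out vim.schedule_wrap so that it runs in plain Lua, and both `wrap_callbacks` and the set of reasons treated as final are assumptions made for illustration.

```lua
-- Assumption: these stop reasons end the conversation for good.
local FINAL_REASONS = { complete = true, cancelled = true, error = true }

-- Wrap user callbacks so nothing fires once a final stop has been delivered.
local function wrap_callbacks(opts)
  local is_completed = false
  return {
    on_chunk = function(chunk)
      if is_completed then return end
      opts.on_chunk(chunk)
    end,
    on_stop = function(stop_opts)
      if is_completed then return end
      if FINAL_REASONS[stop_opts.reason] then is_completed = true end
      opts.on_stop(stop_opts)
    end,
  }
end

local chunks, stops = {}, {}
local wrapped = wrap_callbacks({
  on_chunk = function(chunk) table.insert(chunks, chunk) end,
  on_stop = function(stop_opts) table.insert(stops, stop_opts.reason) end,
})

wrapped.on_chunk("hello")
wrapped.on_stop({ reason = "tool_use" }) -- not final: later callbacks still fire
wrapped.on_chunk("world")
wrapped.on_stop({ reason = "complete" }) -- final: the guard closes
wrapped.on_chunk("late chunk")           -- dropped
wrapped.on_stop({ reason = "complete" }) -- dropped
```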

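9. What is clever about it

Recursion expresses an open-ended loop: there is no explicit loop variable, and the loop depth equals the number of rounds in which the model calls tools. All state travels in history_messages and session_ctx, which the recursive call naturally carries into the next lap.
tool_use_id pairing gives idempotence: get_pending_tools only returns tools without a result, so even on re-entry a completed tool is not executed again.
Refusing to wrap up has a ceiling too: the reminder counter plus the todo check push the model to finish without falling into an infinite loop.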

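10. Limits

Serial execution means that several independent tools are not sped up by running in parallel (file-editing tools do have to be serial, but read-only tools could in theory run in parallel; avante chose simplicity and safety).
Wrap-up depends entirely on the model calling attempt_completion; if the model never calls it and the 3 reminders are used up, the loop stops at the last response.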

11. Code map

Topic	File	Symbol
Loop body + on_stop branches	`lua/avante/llm.lua`	`M._stream`
Recursive state machine that runs tools serially	`lua/avante/llm.lua`	`handle_next_tool_use` (closure inside `_stream`)
Finding the tools pending in this round	`lua/avante/history/init.lua`	`M.get_pending_tools`
Lightweight subtask entry point	`lua/avante/llm.lua`	`M.agent_loop`
Main conversation entry point + callback wrapping	`lua/avante/llm.lua`	`M.stream`
History compaction / memory summary	`lua/avante/llm.lua`	`M.summarize_memory`, `on_memory_summarize`
Wrap-up tool	`lua/avante/llm_tools/attempt_completion.lua`	`M` (name="attempt_completion")
