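Chapter 1: How a single LlmAgent works inside

This chapter covers a single LLM agent only and does not touch graphs. After reading it you should be able to answer: when I hand an Agent an instruction and a few Python functions, how does it actually "call the tools and feed the results back to the model"?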

1.1 What LlmAgent is

Agent is just an alias for LlmAgent (agents/__init__.py has from .llm_agent import Agent). It is a pydantic model whose fields are everything you can configure. The most commonly used ones:

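Field	Purpose	Defined at
`name`	Agent name; must be a valid Python identifier, unique within the tree, and must not be `user`	`agents/base_agent.py:121`; validated in `validate_name` (`base_agent.py:582`)
`model`	Model name (`"gemini-2.5-flash"`) or a `BaseLlm` instance; if left empty, inherited from an ancestor	`agents/llm_agent.py:213`
`instruction`	System instruction; supports `{var}` placeholders, filled from state at runtime	`agents/llm_agent.py:228`
`tools`	Tool list: entries can be bare functions, `BaseTool`, or `BaseToolset`	`agents/llm_agent.py:306`
`output_schema`	Makes the model produce structured output (a pydantic model)	Inherited from `BaseNode.output_schema` (`workflow/_base_node.py:102`)
`sub_agents`	Sub-agents; determines which agents this one can transfer to	`agents/base_agent.py:146`
`mode`	`chat`/`task`/`single_turn`; determines delegation semantics (see Chapter 4)	`agents/llm_agent.py:319`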

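1.2 "Canonical": lazily resolving configuration into its runtime form

The fields you set are declarations; at runtime the framework needs actual objects. ADK performs this conversion through a set of canonical_* properties, which are computed only when accessed.

The intuition behind model resolution: you write model="gemini-2.5-flash" (a string), but the actual call needs a BaseLlm object. canonical_model handles this: if the value is already an object, it is used as is; if it is a non-empty string, it is looked up in the registry; if it is empty, the parent chain is walked upward, and if no ancestor supplies a model, the default model is used.

Side note: the "gemini-2.5-flash" that appears in the examples is the README's demo string; in this commit the actual DEFAULT_MODEL constant is 'gemini-3.5-flash' (llm_agent.py:201), so do not infer the default model version from the examples.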

Actual implementation (agents/llm_agent.py:584, canonical_model):

# Excerpt from the actual source, agents/llm_agent.py:589-599
if isinstance(self.model, BaseLlm):
  return self.model
elif self.model:                       # non-empty string
  return LLMRegistry.new_llm(self.model)
else:                                  # inherit from an ancestor
  ancestor_agent = self.parent_agent
  while ancestor_agent is not None:
    if isinstance(ancestor_agent, LlmAgent):
      return ancestor_agent.canonical_model
    ancestor_agent = ancestor_agent.parent_agent
  return self._resolve_default_model()

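What this does: it resolves the model field into a callable BaseLlm and implements the inheritance rule that a sub-agent with no model of its own uses the model of its nearest LlmAgent ancestor.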
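The resolution order can be tried out in isolation. The following is a minimal sketch, not ADK code: ToyLlm, ToyAgent, the is_llm flag, and the DEFAULT_MODEL value are hypothetical stand-ins for BaseLlm, LlmAgent, the isinstance check, and the real constant, and constructing ToyLlm directly replaces the registry lookup.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union

DEFAULT_MODEL = "toy-default"  # hypothetical value, not the real ADK constant


@dataclass
class ToyLlm:
    """Stand-in for BaseLlm: just remembers which model name it wraps."""
    name: str


@dataclass
class ToyAgent:
    """Stand-in for an agent node in a parent/child tree."""
    name: str
    model: Union[str, ToyLlm] = ""
    parent: Optional[ToyAgent] = None
    is_llm: bool = True  # stand-in for isinstance(ancestor, LlmAgent)

    @property
    def canonical_model(self) -> ToyLlm:
        if isinstance(self.model, ToyLlm):   # already an object: use it
            return self.model
        if self.model:                       # non-empty string: "registry" lookup
            return ToyLlm(self.model)
        ancestor = self.parent               # empty: walk up the parent chain
        while ancestor is not None:
            if ancestor.is_llm:
                return ancestor.canonical_model
            ancestor = ancestor.parent
        return ToyLlm(DEFAULT_MODEL)         # nobody configured a model


root = ToyAgent("root", model="gemini-2.5-flash")
middle = ToyAgent("middle", parent=root, is_llm=False)  # skipped during the walk
leaf = ToyAgent("leaf", parent=middle)
print(leaf.canonical_model.name)
print(ToyAgent("solo").canonical_model.name)
```

Because the property is evaluated on access, changing root.model later changes what leaf resolves to, which is the practical meaning of "lazy" here.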

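The intuition behind tool resolution: you can put a plain function straight into tools, and the framework wraps it as a tool automatically. See _convert_tool_union_to_tools (agents/llm_agent.py:140):

# Excerpt from the actual source, agents/llm_agent.py:179-186
if isinstance(tool_union, BaseTool):
  return [tool_union]
if callable(tool_union):
  return [FunctionTool(func=tool_union)]   # bare function → FunctionTool
# otherwise it is a BaseToolset; expand it into multiple tools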
return await tool_union.get_tools_with_prefix(ctx)

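Key point: a bare Python function is automatically wrapped in a FunctionTool (tools/function_tool.py:42), which derives the JSON tool declaration shown to the model from the function's signature and docstring (_get_declaration, function_tool.py:93, which calls build_function_declaration internally). That is why writing a function with type annotations and a docstring is enough to use it as a tool.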
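To make "signature plus docstring becomes a declaration" concrete, here is a rough sketch built only on the standard inspect module. It is an illustration of the idea, not what build_function_declaration does: toy_declaration, the _JSON_TYPES mapping, and the output layout are all assumptions made for this example.

```python
import inspect
from typing import Any, Callable

# Hypothetical, deliberately tiny mapping from Python annotations to JSON types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def toy_declaration(func: Callable[..., Any]) -> dict:
    """Build a JSON-style tool declaration from a function's signature and docstring."""
    properties = {}
    required = []
    for param_name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[param_name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(param_name)  # no default value means the model must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str, days: int = 1) -> str:
    """Return a short weather forecast for a city."""
    return f"{city}: sunny for {days} day(s)"


declaration = toy_declaration(get_weather)
print(declaration)
```

The practical takeaway is the same as in the text: the annotations and the docstring are the only "schema" the developer writes, so vague names or a missing docstring translate directly into a vaguer declaration for the model.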

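1.3 Core mechanism: the agentic loop (call model → call tools → call model again)

This is the most important part of a single agent. LlmAgent._run_async_impl (agents/llm_agent.py:504) is really just a shell; the actual loop lives in the flow it holds (agents/llm_agent.py:524, self._llm_flow.run_async(ctx)).

The small problem it solves. A single model reply may say "I want to call get_weather(city='Tokyo')". The framework must: detect the tool call → actually execute get_weather → put the return value back into the conversation as a "function response" → let the model continue based on the result. That continuation may trigger another tool call, which is why this is a loop.

Two flows. SingleFlow handles "the agent itself plus tools"; AutoFlow is SingleFlow plus the ability to transfer to another agent (flows/llm_flows/auto_flow.py:23, class AutoFlow). AutoFlow is used when the agent has sub-agents or transfer is allowed.

Loop skeleton. See BaseLlmFlow.run_async (flows/llm_flows/base_llm_flow.py:909):

# Excerpt from the actual source, flows/llm_flows/base_llm_flow.py:913-922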
while True:
  last_event = None
  async with Aclosing(self._run_one_step_async(invocation_context)) as agen:
    async for event in agen:
      last_event = event
      yield event
  if not last_event or last_event.is_final_response() or last_event.partial:
    ...
    break

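Key point: while True, with a break only once the step's last event satisfies is_final_response() (or the step produced no event, or the last event is partial). Each iteration = one LLM call + possibly tool execution.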
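The loop shape is easier to see with the model replaced by a script. The sketch below is a toy under stated assumptions, not the ADK implementation: ToyEvent, fake_model, TOOLS, run_one_step, and run are hypothetical names, and the "final response" test is simplified to "the event carries neither a tool call nor a tool result".

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEvent:
    text: str = ""
    function_call: Optional[tuple] = None    # (tool_name, kwargs)
    function_response: Optional[dict] = None

    def is_final_response(self) -> bool:
        # Simplified rule for this sketch only.
        return self.function_call is None and self.function_response is None


def get_weather(city: str) -> str:
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}


def fake_model(history: list) -> ToyEvent:
    """Scripted model: requests the tool first, then answers from its result."""
    for event in reversed(history):
        if event.function_response is not None:
            return ToyEvent(text=f"Forecast: {event.function_response['result']}")
    return ToyEvent(function_call=("get_weather", {"city": "Tokyo"}))


async def run_one_step(history: list):
    """One iteration: one model call, plus tool execution if the model asked for it."""
    event = fake_model(history)
    history.append(event)
    yield event
    if event.function_call is not None:
        name, kwargs = event.function_call
        response = ToyEvent(function_response={"name": name, "result": TOOLS[name](**kwargs)})
        history.append(response)  # feed the tool result back into the conversation
        yield response


async def run(history: list):
    while True:
        last_event = None
        async for event in run_one_step(history):
            last_event = event
            yield event
        if last_event is None or last_event.is_final_response():
            break


async def collect() -> list:
    return [event async for event in run([])]


events = asyncio.run(collect())
for event in events:
    print(event)
```

The first iteration ends on a function response, so the loop goes around again; the second iteration ends on plain text, so it stops. Three events come out in total: the tool call, the tool result, and the final answer.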

What happens in one iteration (_run_one_step_async, base_llm_flow.py:924):

┌─ _preprocess_async ──────────────────────────────────────────┐
│  Runs the chain of request_processors; each one adds to the  │
│  LlmRequest (instruction, identity, history contents, tool   │
│  declarations, cache config, ...)                            │
└──────────────────────────────────────────────────────────────┘
        │
        ▼
   _call_llm_async   ← the actual model request; the LlmResponse is streamed back
        │
        ▼
┌─ _postprocess_async ─────────────────────────────────────────┐
│  Runs response_processors; if the reply has a function_call: │
│  → handle_function_calls_async executes the tools            │
│  → yields the function_response as an Event (fed back into   │
│    the history)                                              │
└──────────────────────────────────────────────────────────────┘

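The request is assembled by a "processor pipeline". SingleFlow chains a series of processors together at construction time (flows/llm_flows/single_flow.py:_create_request_processors). The order matters; the key ones are:

Processor	What it adds to the request	File
`basic`	Basic generation config (temperature, etc.)	`flows/llm_flows/basic.py`
`instructions`	Fills `{var}` in `instruction` from state and uses the result as the system instruction	`flows/llm_flows/instructions.py:136`
`identity`	Agent identity information	`flows/llm_flows/identity.py`
`contents`	Assembles the session history (past Events) into the conversation context the model expects	`flows/llm_flows/contents.py`
`_nl_planning`	Natural-language planning (planner support)	`flows/llm_flows/_nl_planning.py`
`_code_execution`	Code execution tool support	`flows/llm_flows/_code_execution.py`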

In its __init__, AutoFlow appends agent_transfer.request_processor to the end of the processor chain (auto_flow.py:44), which gives the model the ability to call transfer_to_agent(...).
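The pipeline idea, including how a subclass extends the chain, fits in a few lines. This is a sketch with hypothetical names (ToyRequest, ToySingleFlow, ToyAutoFlow, and the four processor functions); real ADK processors are asynchronous objects and carry far more state than a dict.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyRequest:
    system_instructions: List[str] = field(default_factory=list)
    contents: List[str] = field(default_factory=list)
    tool_names: List[str] = field(default_factory=list)


# Each processor mutates the request in place; ctx is a plain dict in this sketch.
def instructions_processor(request: ToyRequest, ctx: dict) -> None:
    request.system_instructions.append(ctx["instruction"])


def identity_processor(request: ToyRequest, ctx: dict) -> None:
    request.system_instructions.append(f'You are agent "{ctx["name"]}".')


def contents_processor(request: ToyRequest, ctx: dict) -> None:
    request.contents.extend(ctx["history"])


def transfer_processor(request: ToyRequest, ctx: dict) -> None:
    request.tool_names.append("transfer_to_agent")


class ToySingleFlow:
    def __init__(self) -> None:
        # Order matters: processors run in exactly this sequence.
        self.request_processors = [
            instructions_processor,
            identity_processor,
            contents_processor,
        ]

    def build_request(self, ctx: dict) -> ToyRequest:
        request = ToyRequest()
        for processor in self.request_processors:
            processor(request, ctx)
        return request


class ToyAutoFlow(ToySingleFlow):
    def __init__(self) -> None:
        super().__init__()
        # Same chain as the parent, with transfer support appended at the end.
        self.request_processors.append(transfer_processor)


ctx = {"name": "helper", "instruction": "Be concise.", "history": ["user: hi"]}
single_request = ToySingleFlow().build_request(ctx)
auto_request = ToyAutoFlow().build_request(ctx)
print(single_request)
print(auto_request)
```

The design choice worth noticing is that the flow itself knows nothing about instructions, history, or transfer; it only iterates a list, so a new capability is one more list entry.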

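1.4 How tool calls are actually executed

Once the model asks to call a tool, handle_function_calls_async (flows/llm_flows/functions.py:404) does the actual execution. Key points:

Long-running tools are detected and recorded in long_running_tool_ids. get_long_running_function_calls (functions.py:274) scans the called tools for those with is_long_running set. Such tools (and those that need authorization or confirmation) do not block in place waiting for a result; they trigger a pause instead, which is the basis for HITL and asynchronous tools (covered in detail in Chapter 3).
Authorization / confirmation: there is a dedicated pseudo-function name, adk_request_credential (functions.py:59, REQUEST_EUC_FUNCTION_CALL_NAME), together with confirmation-request events, used to express "needs OAuth / needs the user to click confirm" as one tool call plus one interruption.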

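The logic that decides what counts as a "final response" (and therefore whether the loop stops) is in Event.is_final_response (events/event.py:275): if the event carries skip_summarization or has long_running_tool_ids (event.py:284), it is treated as a final response immediately, so the loop breaks and the run pauses instead of calling the model again.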
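The long-running scan mentioned in this section amounts to a lookup over the requested calls. The sketch below only illustrates that idea: ToyTool, ToyFunctionCall, and long_running_call_ids are hypothetical names, and the real get_long_running_function_calls works on ADK event and tool objects rather than these dataclasses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class ToyTool:
    name: str
    func: Callable[..., object]
    is_long_running: bool = False


@dataclass
class ToyFunctionCall:
    id: str    # identifier of this particular call, as issued by the model
    name: str  # name of the tool being called


def long_running_call_ids(calls: List[ToyFunctionCall], tools: Dict[str, ToyTool]) -> Set[str]:
    """Collect the ids of calls whose target tool is marked long-running."""
    return {
        call.id
        for call in calls
        if call.name in tools and tools[call.name].is_long_running
    }


tools = {
    "get_weather": ToyTool("get_weather", lambda city: f"Sunny in {city}"),
    "request_approval": ToyTool("request_approval", lambda amount: None, is_long_running=True),
}
calls = [
    ToyFunctionCall(id="call-1", name="get_weather"),
    ToyFunctionCall(id="call-2", name="request_approval"),
]
pending_ids = long_running_call_ids(calls, tools)
print(pending_ids)
```

A non-empty result is what marks the event as carrying pending long-running work, which is the signal the surrounding text ties to pausing.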

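1.5 Clever bits

"Function as tool" with zero boilerplate. Put a bare function into tools and FunctionTool generates the model-facing tool declaration from its signature and docstring (tools/function_tool.py:93). The developer writes no JSON schema at all.
Model inheritance along the parent chain. A sub-agent with no model configured automatically uses the model of its nearest LlmAgent ancestor (llm_agent.py:593-598), so a multi-agent tree needs the model configured only once, at the root.
Request assembly is an ordered processor chain, not a monolith. Adding a capability (caching, planning, code execution) means inserting one processor into the chain; the ordering constraints are written in the comments of single_flow.py (for example, "compaction must come before contents").
Static and dynamic instructions are separated for caching. static_instruction (llm_agent.py:255) holds content that never changes and is placed at the very front of the request, which helps model-side context caching hit; the dynamic instruction goes after it.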

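1.6 Limits and pitfalls

The "final response" check for a single agent looks only at the trailing event; the comment on the is_final_response check states explicitly that when a long-running tool call is followed by several text events, the 2-event-window pause check may be inaccurate (the NOTE at base_llm_flow.py:955-957).
The instruction placeholder {var} requires that key to actually exist in state; optional keys must be written as {var?} (see {feedback?} in the sample workflows/loop/agent.py).
SingleFlow explicitly "does not allow sub-agents" (single_flow.py docstring); transfer requires AutoFlow.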
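The difference between {var} and {var?} can be reproduced with a small regex substitution. This is a sketch of the behavior as described above, not the ADK implementation: fill_instruction is a hypothetical helper, it handles only simple word-like keys, and raising KeyError for a missing required key is an assumption made for the example.

```python
import re

# Matches {name} and {name?}; group 2 is "?" when the placeholder is optional.
_PLACEHOLDER = re.compile(r"\{(\w+)(\?)?\}")


def fill_instruction(template: str, state: dict) -> str:
    """Replace {key} / {key?} placeholders in an instruction with values from state."""

    def replace(match: re.Match) -> str:
        key, optional = match.group(1), match.group(2)
        if key in state:
            return str(state[key])
        if optional:
            return ""  # optional key that is absent: substitute nothing
        raise KeyError(f"state has no key {key!r} for placeholder {match.group(0)}")

    return _PLACEHOLDER.sub(replace, template)


template = "Write about {topic}. Previous feedback: {feedback?}"
first_round = fill_instruction(template, {"topic": "cats"})
second_round = fill_instruction(template, {"topic": "cats", "feedback": "too long"})
print(first_round)
print(second_round)
```

In a loop where feedback only exists from the second round onward, the optional form lets the same instruction template serve both rounds, while a required {feedback} would fail on the first one.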

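1.7 What to read next

With a single agent understood, the next step is wiring multiple nodes into a graph: see 02-workflow-runtime.md, which explains how _run_impl goes from a "shell" to a "graph scheduling loop".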
