
Spring AI Agent in Practice: Building a Tool-Calling Assistant from Scratch

Matuto

2026-01-07


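While building a video generation platform recently, I needed the AI to do real work in addition to chatting, such as calling an image generation API or uploading files to COS. I spent a while wrestling with Spring AI's Agent features and hit plenty of pitfalls, so here are my notes.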

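First, what is an Agent?

Put simply, an ordinary chatbot can only talk, while an Agent can act.

Ask a plain chat model (ChatGPT without tools, say) to "generate a picture of a cat" and it can only return a description or tell you where to go to generate one. An Agent is different. It will:

Understand that you want an image generated
Call the image generation API itself
Return the generated result to you

This is what is known as Tool Calling, also called Function Calling.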

Setup

Maven dependencies

<!-- Spring AI BOM for unified version management -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.1.2</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- Spring AI OpenAI Starter; works with OpenAI-compatible APIs -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
</dependencies>

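The starter used here is spring-ai-starter-model-openai. Don't let the name fool you: it works with any API that speaks the OpenAI protocol, including the OpenAI-compatible endpoints of Gemini, DeepSeek, and Tongyi Qianwen (Qwen).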

Core concept: defining a Tool

Spring AI provides the @Tool annotation for defining tools, which is much more concise than hand-writing JSON Schema.

A simple example

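@Component
public class ImageGenerationTools {

    // The project's own image service (type name assumed; it was not shown in the original)
    @Resource
    private ImageService imageService;

    @Tool(description = "Text-to-image: generate an image from a text description")
    public String textToImage(
            @ToolParam(description = "Image description; describe the desired image content in detail") String prompt,
            @ToolParam(description = "Model name", required = false) String model,
            @ToolParam(description = "Aspect ratio, e.g. 16:9, 4:3, 1:1", required = false) String aspectRatio
    ) {
        // Call the actual image generation service
        Long taskId = imageService.createTask(prompt, model, aspectRatio);
        return "Image generation task created, task ID: " + taskId;
    }

    @Tool(description = "Query the status of an image generation task")
    public String queryImageTask(
            @ToolParam(description = "Task ID") Long taskId,
            @ToolParam(description = "Query mode: poll (wait by polling) or once (single query)", required = false) String mode
    ) {
        // Look up the task status
        ImageTask task = imageService.getTask(taskId);
        if (task.isCompleted()) {
            return "Generation complete, image URL: " + task.getImageUrl();
        }
        return "Task in progress, current status: " + task.getStatus();
    }
}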

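A few key points:

The description in @Tool matters a lot: the model relies on it to decide when to call the tool
@ToolParam describes a parameter; required = false marks it as optional
The return value here is a String, which is sent back to the model as the tool result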

Slightly more complex: file upload support

@Component
public class FileUploadTools {

    @Resource
    private CosService cosService;

    @Tool(description = "Download a file from a URL and upload it to cloud storage; returns the accessible file URL")
    public String uploadFileFromUrl(
            @ToolParam(description = "URL of the file to download") String fileUrl,
            @ToolParam(description = "File name to save as", required = false) String fileName
    ) {
        try {
            // Download the file
            byte[] fileData = HttpUtil.downloadBytes(fileUrl);

            // Upload to COS
            String objectKey = generateObjectKey(fileName, fileUrl);
            String uploadedUrl = cosService.upload(objectKey, fileData);

            return "File uploaded successfully: " + uploadedUrl;
        } catch (Exception e) {
            return "Upload failed: " + e.getMessage();
        }
    }
}
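
The generateObjectKey helper is called above but not shown. As a rough idea of what such a helper might do, here is a minimal sketch. The key layout (uploads/<date>/<uuid>-<name>), the fallback name "file", and the extra today parameter (added only so the result can be checked deterministically) are all my assumptions for illustration, not the original implementation.

```java
import java.net.URI;
import java.time.LocalDate;
import java.util.UUID;

/** Hypothetical object-key helper; the layout is an assumption. */
final class ObjectKeySketch {

    /** Builds a key shaped like "uploads/2026-01-07/<uuid>-cat.png". */
    static String generateObjectKey(String fileName, String fileUrl, LocalDate today) {
        // Prefer the caller-supplied name, otherwise fall back to the URL's last segment.
        String name = (fileName == null || fileName.isBlank())
                ? lastPathSegment(fileUrl)
                : fileName;
        // Replace path separators so the name cannot escape the uploads/ prefix.
        String safeName = name.replaceAll("[\\\\/]+", "_");
        // A random UUID keeps two uploads of the same file name from colliding.
        return "uploads/" + today + "/" + UUID.randomUUID() + "-" + safeName;
    }

    /** Returns the last path segment of the URL, or "file" when there is none. */
    static String lastPathSegment(String fileUrl) {
        String path = URI.create(fileUrl).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return "file";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

Since the model supplies fileName, it is worth treating it as untrusted input, which is why the sketch strips path separators before building the key.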

Tool registration: turning @Tool methods into something the Agent can use

Once the tool classes are defined, register them with Spring AI's tool system:

@Configuration
public class AgentToolsConfig {

    @Bean
    public ToolCallbackProvider imageGenerationToolProvider(ImageGenerationTools tools) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(tools)
                .build();
    }

    @Bean
    public ToolCallbackProvider fileUploadToolProvider(FileUploadTools tools) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(tools)
                .build();
    }

    /**
     * Collects all tools into one list for easy injection
     */
    @Bean
    public List<ToolCallback> allToolCallbacks(List<ToolCallbackProvider> providers) {
        return providers.stream()
                .flatMap(provider -> Arrays.stream(provider.getToolCallbacks()))
                .collect(Collectors.toList());
    }
}

MethodToolCallbackProvider scans the given objects for @Tool-annotated methods and converts each one into a Spring AI ToolCallback.
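
To make "scanning for annotated methods" less magical, here is a toy version of the idea using plain reflection. It is only an illustration of the general technique, not how Spring AI implements it: @DemoTool is a hypothetical stand-in for @Tool, and DemoTools, scan, and invoke are names I made up for this sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

final class ToolScanSketch {

    /** Hypothetical stand-in for Spring AI's @Tool; not the real annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface DemoTool {
        String description();
    }

    /** Example tool holder with one annotated method and one plain method. */
    static class DemoTools {
        @DemoTool(description = "Generate an image from a text prompt")
        public String textToImage(String prompt) {
            return "task for " + prompt;
        }

        public String helper() {
            return "not a tool";
        }
    }

    /** Collects method name -> description for every @DemoTool method. */
    static Map<String, String> scan(Object toolObject) {
        Map<String, String> found = new TreeMap<>();
        for (Method m : toolObject.getClass().getDeclaredMethods()) {
            DemoTool tool = m.getAnnotation(DemoTool.class);
            if (tool != null) {
                found.put(m.getName(), tool.description());
            }
        }
        return found;
    }

    /** Invokes a discovered tool by name with a single String argument. */
    static String invoke(Object toolObject, String name, String arg) {
        try {
            Method m = toolObject.getClass().getDeclaredMethod(name, String.class);
            return (String) m.invoke(toolObject, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot invoke tool " + name, e);
        }
    }
}
```

The name-to-description map is roughly the information the model sees when choosing a tool, which is why the description text carries so much weight.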

Agent Service: putting it all together

@Service
@Slf4j
public class AgentService {

    private final OpenAiApi openAiApi;
    private final List<ToolCallback> toolCallbacks;

    // Cache ChatClient instances to avoid rebuilding them
    private final ConcurrentHashMap<Long, ChatClient> clientCache = new ConcurrentHashMap<>();

    public AgentService(List<ToolCallback> toolCallbacks) {
        this.toolCallbacks = toolCallbacks;

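        // Gemini's API is used here; it is compatible with the OpenAI protocol.
        // Spring AI appends /v1/chat/completions by default, so the path is
        // overridden to match Gemini's /v1beta/openai/chat/completions endpoint.
        this.openAiApi = OpenAiApi.builder()
                .baseUrl("https://generativelanguage.googleapis.com/v1beta/openai")
                .completionsPath("/chat/completions")
                .apiKey("your-api-key")
                .build();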
    }

    public ChatClient getChatClient(Long siteId) {
        return clientCache.computeIfAbsent(siteId, id -> {
            OpenAiChatModel chatModel = OpenAiChatModel.builder()
                    .openAiApi(openAiApi)
                    .defaultOptions(OpenAiChatOptions.builder()
                            .model("gemini-2.5-flash")
                            .temperature(0.7)
                            .build())
                    .build();

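            return ChatClient.builder(chatModel)
                    // defaultTools(Object...) expects @Tool-annotated objects;
                    // ready-made ToolCallback instances go through defaultToolCallbacks
                    .defaultToolCallbacks(toolCallbacks)
                    .build();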
        });
    }

    public String chat(Long siteId, String userMessage) {
        ChatClient client = getChatClient(siteId);

        return client.prompt()
                .user(userMessage)
                .call()
                .content();
    }

    public String chatWithSystem(Long siteId, String systemPrompt, String userMessage) {
        ChatClient client = getChatClient(siteId);

        return client.prompt()
                .system(systemPrompt)
                .user(userMessage)
                .call()
                .content();
    }
}

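Key points:

defaultToolCallbacks() registers ToolCallback instances with the ChatClient (defaultTools() takes the raw @Tool objects instead)
When the model decides a tool is needed, Spring AI executes the corresponding method automatically
The tool result is sent back to the model automatically, and the model produces its final reply based on it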
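
The loop described in these points can be sketched in a few lines of plain Java. This is a toy model of the general tool-calling loop, not Spring AI's actual code: Model, ModelReply, and run are hypothetical names, and a scripted lambda stands in for the real LLM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy tool-calling loop; a scripted Model stands in for a real LLM. */
final class ToolLoopSketch {

    /** Either a request to run a tool (toolName != null) or a final answer. */
    record ModelReply(String toolName, String toolArg, String finalText) {
        static ModelReply callTool(String name, String arg) {
            return new ModelReply(name, arg, null);
        }

        static ModelReply answer(String text) {
            return new ModelReply(null, null, text);
        }

        boolean wantsTool() {
            return toolName != null;
        }
    }

    /** The "model": given the conversation so far, decides the next step. */
    interface Model {
        ModelReply next(List<String> conversation);
    }

    static String run(Model model, Map<String, Function<String, String>> tools, String userMessage) {
        List<String> conversation = new ArrayList<>();
        conversation.add("user: " + userMessage);
        // Cap the number of rounds so a confused model cannot loop forever.
        for (int round = 0; round < 10; round++) {
            ModelReply reply = model.next(conversation);
            if (!reply.wantsTool()) {
                return reply.finalText();
            }
            Function<String, String> tool = tools.get(reply.toolName());
            String result = (tool == null)
                    ? "unknown tool: " + reply.toolName()
                    : tool.apply(reply.toolArg());
            // Feed the tool result back so the model can use it in the next round.
            conversation.add("tool[" + reply.toolName() + "]: " + result);
        }
        return "gave up after too many tool rounds";
    }
}
```

The point of the sketch is that the model never runs anything itself. It only asks for a tool by name, and the framework does the calling and reports back.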

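A practical problem: how does a Tool get the current user?

When a tool method is invoked, it has no direct access to the HttpServletRequest from the Controller layer. My approach is a ThreadLocal, which works here because call() runs the tool methods synchronously on the request thread: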

public class AgentContext {

    private static final ThreadLocal<Long> USER_ID = new ThreadLocal<>();
    private static final ThreadLocal<Long> SITE_ID = new ThreadLocal<>();

    public static void setContext(Long userId, Long siteId) {
        USER_ID.set(userId);
        SITE_ID.set(siteId);
    }

    public static Long getUserId() {
        return USER_ID.get();
    }

    public static Long getSiteId() {
        return SITE_ID.get();
    }

    public static void clear() {
        USER_ID.remove();
        SITE_ID.remove();
    }
}

Set it in the Controller:

@RestController
@RequestMapping("/agent")
public class AgentController {

    @Resource
    private AgentService agentService;

    @PostMapping("/chat")
    public Result<String> chat(@RequestBody ChatRequest request) {
        Long userId = StpUtil.getLoginIdAsLong();
        Long siteId = getCurrentSiteId();

        try {
            // Set the context so the Tool can read it
            AgentContext.setContext(userId, siteId);

            String response = agentService.chatWithSystem(
                    siteId,
                    request.getSystemPrompt(),
                    request.getMessage()
            );

            return Result.success(response);
        } finally {
            // Always clean up, otherwise stale values leak when the thread pool reuses this thread
            AgentContext.clear();
        }
    }
}
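
To see why the cleanup in finally matters, here is a small self-contained demonstration of a ThreadLocal value leaking between two "requests" that share one pooled thread. The class and method names are hypothetical, and a single-thread executor stands in for the web server's thread pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalLeakSketch {

    private static final ThreadLocal<Long> USER_ID = new ThreadLocal<>();

    /** Runs two "requests" on one pooled thread; returns what the second one sees. */
    static Long secondRequestSees(boolean clearAfterFirst) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Request 1 sets the user id and optionally cleans up in finally.
            Runnable firstRequest = () -> {
                try {
                    USER_ID.set(42L);          // plays the role of AgentContext.setContext(...)
                } finally {
                    if (clearAfterFirst) {
                        USER_ID.remove();      // plays the role of AgentContext.clear()
                    }
                }
            };
            // Request 2 never sets a user id; it only reads whatever the thread holds.
            Callable<Long> secondRequest = USER_ID::get;

            pool.submit(firstRequest).get();
            return pool.submit(secondRequest).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the cleanup the second request sees the first request's user id, which in a multi-tenant system means one user acting with another user's identity. The same thread affinity cuts the other way too: code that runs on a different thread does not see the values at all.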

Use it in a Tool like this:

@Tool(description = "Upload a file to the current user's space")
public String uploadFile(@ToolParam(description = "File URL") String fileUrl) {
    Long siteId = AgentContext.getSiteId();
    Long userId = AgentContext.getUserId();

    // Look up the COS configuration for this siteId
    CosConfig config = siteConfigProvider.getCosConfig(siteId);
    // ... upload logic
}

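Multi-tenant support: separate API configuration per site

In a real project, different tenants may use different API keys or even different models. My approach is to cache the chat model per siteId: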

@Service
public class GeminiChatClientFactory {

    @Resource
    private SiteConfigProvider configProvider;

    private final ConcurrentHashMap<Long, OpenAiChatModel> modelCache = new ConcurrentHashMap<>();

    public OpenAiChatModel getChatModel(Long siteId) {
        return modelCache.computeIfAbsent(siteId, id -> {
            // Load this site's API configuration from the database
            GeminiApiConfig config = configProvider.getGeminiApiConfig(id);

            OpenAiApi api = OpenAiApi.builder()
                    .baseUrl(config.getBaseUrl())
                    .apiKey(config.getApiKey())
                    .build();

            return OpenAiChatModel.builder()
                    .openAiApi(api)
                    .defaultOptions(OpenAiChatOptions.builder()
                            .model(config.getModel())
                            .temperature(config.getTemperature())
                            .build())
                    .build();
        });
    }

    /**
     * Evicts the cached entry when the configuration changes
     */
    public void refreshClient(Long siteId) {
        modelCache.remove(siteId);
    }
}
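
The cache-then-evict pattern used by the factory can be isolated into a tiny generic class, which makes its behavior easy to check: the loader runs once per site, and again only after a refresh. TenantCacheSketch and its load counter are hypothetical names added for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Per-tenant cache: load on first access, reload only after refresh(). */
final class TenantCacheSketch<V> {

    private final ConcurrentHashMap<Long, V> cache = new ConcurrentHashMap<>();
    private final Function<Long, V> loader;
    private final AtomicInteger loads = new AtomicInteger(); // only for observing behavior

    TenantCacheSketch(Function<Long, V> loader) {
        this.loader = loader;
    }

    V get(Long siteId) {
        // computeIfAbsent runs the loader at most once per key, even under contention.
        return cache.computeIfAbsent(siteId, id -> {
            loads.incrementAndGet();
            return loader.apply(id);
        });
    }

    /** Drops the cached value so the next get() reloads it. */
    void refresh(Long siteId) {
        cache.remove(siteId);
    }

    int loadCount() {
        return loads.get();
    }
}
```

One thing to keep in mind: computeIfAbsent holds a lock on the affected part of the map while the loader runs, so a slow configuration lookup delays other callers asking for the same site.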

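Practical tips

1. A well-written Tool description makes a big difference

// Weak
@Tool(description = "Generate image")

// Better
@Tool(description = "Text-to-image: generates an AI image from a text description. Use it when a new image needs to be created. Takes a detailed description of the scene and returns the generation task ID")

The model decides when to call a tool based on its description, so the clearer it is, the more accurate the calls.

2. Make tool return values model-friendly

// Weak
return taskId.toString();

// Better
return "Image generation task created successfully, task ID " + taskId + ". You can use the queryImageTask tool to check progress.";

The return value goes to the model, so phrasing it in natural language makes it easier for the model to work out the next step.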

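3. Handle tool failures gracefully

@Tool(description = "Query the status of an image task")
public String queryImageTask(@ToolParam(description = "Task ID") Long taskId) {
    try {
        ImageTask task = imageService.getTask(taskId);
        if (task == null) {
            return "Task not found, please check that the task ID is correct";
        }
        // ... normal logic
    } catch (Exception e) {
        log.error("Failed to query task", e);
        return "Query failed: " + e.getMessage() + ", please try again later";
    }
}

Don't let exceptions escape. Return a friendly error message, and the model will use it to decide whether to retry or tell the user.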
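
If many tools repeat the same try/catch, the pattern can be pulled into a small helper. This is only a sketch of one way to do it; SafeToolSketch and safely are hypothetical names, and the message wording is arbitrary.

```java
import java.util.function.Supplier;

/** Turns exceptions from a tool body into a message the model can read. */
final class SafeToolSketch {

    static String safely(String action, Supplier<String> body) {
        try {
            String result = body.get();
            // A null result would give the model nothing to work with, so say so explicitly.
            return result != null ? result : action + " returned no result";
        } catch (RuntimeException e) {
            return action + " failed: " + e.getMessage() + ". Please try again later.";
        }
    }
}
```

A tool method would then wrap its body, for example safely("Query", () -> describe(imageService.getTask(taskId))), where describe is whatever formats the task for the model. Logging the exception before returning, as in the snippet above, is still a good idea.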

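4. Handling polled tasks

For asynchronous jobs such as image generation, the Tool itself can poll:

@Tool(description = "Query an image task; supports polling until it completes")
public String queryImageTask(
        @ToolParam(description = "Task ID") Long taskId,
        @ToolParam(description = "Query mode: poll (wait by polling) or once (single query)", required = false) String mode
) {
    ImageTask task = imageService.getTask(taskId);
    if (task == null) {
        return "Task not found, please check that the task ID is correct";
    }

    if ("poll".equals(mode) && !task.isCompleted()) {
        // Poll every 5 seconds, waiting at most 30 seconds
        for (int i = 0; i < 6; i++) {
            ThreadUtil.sleep(5000);
            task = imageService.getTask(taskId);
            if (task.isCompleted()) break;
        }
    }

    if (task.isCompleted()) {
        return "Generation complete! Image URL: " + task.getImageUrl();
    }
    return "Task in progress, status: " + task.getStatus() + ", you can query again later";
}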
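
The polling loop above can also be written as a reusable helper with the sleep injected, which makes the timing logic testable without actually waiting. This is a sketch under my own assumptions: PollSketch, poll, and the sleeper parameter are hypothetical names, and unlike the loop above it counts attempts (including the first check) and skips the sleep after the last one.

```java
import java.util.Optional;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

final class PollSketch {

    /**
     * Calls check up to maxAttempts times, sleeping intervalMillis between attempts.
     * Returns the first non-empty result, or empty if none arrived in time.
     */
    static <T> Optional<T> poll(Supplier<Optional<T>> check, int maxAttempts,
                                long intervalMillis, LongConsumer sleeper) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = check.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                sleeper.accept(intervalMillis); // no point sleeping after the last attempt
            }
        }
        return Optional.empty();
    }

    /** Real sleeper for production use; restores the interrupt flag if interrupted. */
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a tool method this would be called with PollSketch::sleep as the sleeper. Keep the total wait well under the HTTP and model-call timeouts, since the whole request blocks while the tool polls.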

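The complete call flow

User: "Generate a cyberpunk-style city night scene for me"
    ↓
ChatClient sends the request to Gemini
    ↓
Gemini analyzes it and decides to call the textToImage tool
    ↓
Spring AI automatically executes textToImage("cyberpunk-style city night scene", null, null)
    ↓
Returns: "Image generation task created, task ID: 12345"
    ↓
Gemini receives the tool result and decides to call queryImageTask to check the status
    ↓
Spring AI executes queryImageTask(12345, "poll")
    ↓
Returns: "Generation complete! Image URL: https://xxx.cos.xxx/image.png"
    ↓
Gemini composes the final reply to the user:
"I've generated a cyberpunk-style city night scene for you. You can view it at: https://xxx..."

For the user the whole thing is a single sentence, but behind the scenes the Agent may have called tools several times.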

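Wrapping up

Spring AI's Agent support is still evolving quickly, but so far it has been enough for my needs. It boils down to three steps:

Define tools with @Tool
Register them with MethodToolCallbackProvider
Enable them on the ChatClient with defaultTools() or defaultToolCallbacks()

Spring AI takes care of the rest.

Questions and discussion are welcome.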

Copyright: 上线我的 2.0 - 马图图的学习笔记, 马景振个人Blog

Article link: www.majingzhen.com/article/springaiagentsz

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)