Chat Endpoint (POST /api/chat)

The chat endpoint is designed specifically for conversational use. It supports multi-turn conversations and role separation, which makes it better suited than the generate endpoint for building chat applications.

Basic Usage

The simplest chat request:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "你好"}
  ]
}'

Response:

{
  "model": "llama3.2",
  "created_at": "2024-01-15T10:00:00Z",
  "message": {
    "role": "assistant",
    "content": "你好!有什么可以帮你的吗?"
  },
  "done": true
}
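The same request can be issued programmatically. A minimal Python sketch: it only builds the request body; actually sending it assumes a local Ollama server listening on localhost:11434.

```python
import json

def build_chat_payload(model, user_content):
    """Build the minimal request body for POST /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }

payload = build_chat_payload("llama3.2", "你好")
print(json.dumps(payload, ensure_ascii=False, indent=2))

# To send it against a running server:
#   import requests
#   requests.post("http://localhost:11434/api/chat", json=payload)
```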

Message Format

Each message contains a role and a content field:

{
  "role": "user",
  "content": "消息内容"
}

Role Types

Role        Description
system      System prompt; defines the AI's behavior
user        User message
assistant   AI reply

system Messages

Place a system message at the start of the message list to define the AI's behavior:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "你是一个专业的 Python 开发者,回答简洁准确"},
    {"role": "user", "content": "什么是装饰器?"}
  ]
}'

Unlike the generate endpoint (/api/generate), the chat endpoint has no separate top-level system parameter; the system prompt must be sent as a message with role "system", as in the example above.

Multi-turn Conversations

The chat endpoint supports multi-turn conversations natively; just pass in the conversation history:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "我叫小明"},
    {"role": "assistant", "content": "你好小明!很高兴认识你。"},
    {"role": "user", "content": "我叫什么名字?"}
  ]
}'

Response:

{
  "message": {
    "role": "assistant",
    "content": "你叫小明。"
  }
}
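The only client-side state is the message list itself: each turn appends to it, and the whole list is resent on every request. A sketch of that bookkeeping (the HTTP call is omitted):

```python
def add_exchange(history, user_content, assistant_content):
    """Record one completed user/assistant exchange in the history."""
    history.append({"role": "user", "content": user_content})
    history.append({"role": "assistant", "content": assistant_content})
    return history

history = []
add_exchange(history, "我叫小明", "你好小明!很高兴认识你。")
history.append({"role": "user", "content": "我叫什么名字?"})
# `history` now matches the messages array in the curl example above.
print(len(history))  # 3
```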

Request Parameters

Parameter   Type    Required  Description
model       string  yes       Model name
messages    array   yes       List of messages
stream      bool    no        Stream the response (default: true)
format      string  no        Output format (e.g. "json")
options     object  no        Model parameters (temperature, num_ctx, ...)
keep_alive  string  no        How long to keep the model loaded after the request
tools       array   no        Tool definitions for function calling

The options Parameter

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "写一首诗"}],
  "options": {
    "temperature": 0.8,
    "num_ctx": 4096,
    "top_p": 0.9
  }
}'

The format Parameter

Force JSON output:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "生成一个包含姓名和年龄的用户信息"}
  ],
  "format": "json",
  "stream": false
}'
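With "format": "json", the assistant's content is itself a JSON string that the client parses. A minimal sketch; the sample string below stands in for a live model reply, since actual output varies:

```python
import json

# Hypothetical content returned in message.content for the request above.
sample_content = '{"name": "小明", "age": 18}'
user = json.loads(sample_content)
print(user["name"], user["age"])  # 小明 18
```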

Streaming Responses

Streaming output is the default:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "写一首诗"}]
}'

The server returns a sequence of JSON objects, one per line:

{"model":"llama3.2","created_at":"2024-01-15T10:00:00Z","message":{"role":"assistant","content":"春"},"done":false}
{"model":"llama3.2","created_at":"2024-01-15T10:00:00Z","message":{"role":"assistant","content":"风"},"done":false}
...
{"model":"llama3.2","created_at":"2024-01-15T10:00:00Z","message":{"role":"assistant","content":""},"done":true,"total_duration":5000000000}
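Reassembling the reply means concatenating message.content from each line until done is true. A sketch, fed with the sample chunks above in place of a live response.iter_lines():

```python
import json

def collect_stream(lines):
    """Concatenate message.content from NDJSON chunks until done is true."""
    full = ""
    for line in lines:
        data = json.loads(line)
        full += data.get("message", {}).get("content", "")
        if data.get("done"):
            break
    return full

sample = [
    '{"message":{"role":"assistant","content":"春"},"done":false}',
    '{"message":{"role":"assistant","content":"风"},"done":false}',
    '{"message":{"role":"assistant","content":""},"done":true}',
]
print(collect_stream(sample))  # 春风
```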

Response Fields

Field            Description
model            Model name
created_at       Timestamp of the response
message          Message object
message.role     Role (always assistant)
message.content  Message content
done             Whether generation has finished
total_duration   Total time taken, in nanoseconds
eval_count       Number of tokens generated

Image Messages

Multimodal models (such as llava) are supported:

curl http://localhost:11434/api/chat -d '{
  "model": "llava",
  "messages": [
    {
      "role": "user",
      "content": "这张图片里有什么?",
      "images": ["iVBORw0KGgoAAAANSUhEUgAA..."]
    }
  ]
}'

Images must be Base64-encoded.

Python Example

import base64
import requests

def chat_with_image(image_path, question, model="llava"):
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode()
    
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": model,
            "messages": [
                {
                    "role": "user",
                    "content": question,
                    "images": [image_data]
                }
            ],
            "stream": False
        }
    )
    
    return response.json()["message"]["content"]

result = chat_with_image("photo.jpg", "描述这张图片")
print(result)

Tool Calling

The chat endpoint supports function calling:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "北京今天天气怎么样?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "获取城市天气",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "城市名称"}
          },
          "required": ["city"]
        }
      }
    }
  ]
}'

Response:

{
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_weather",
          "arguments": {"city": "北京"}
        }
      }
    ]
  }
}
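The reply only carries the call request; the client executes the function itself and sends the result back as a role-"tool" message in the next request. A sketch with a hypothetical local get_weather implementation (name and return format are illustrative):

```python
def get_weather(city):
    """Hypothetical local implementation of the declared tool."""
    return f"{city}: 晴, 25°C"

TOOLS = {"get_weather": get_weather}

def handle_tool_calls(messages, assistant_message):
    """Run each requested tool and append results as role-"tool" messages."""
    messages.append(assistant_message)
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        result = TOOLS[fn["name"]](**fn["arguments"])
        messages.append({"role": "tool", "content": result})
    return messages

reply = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{"function": {"name": "get_weather",
                                 "arguments": {"city": "北京"}}}],
}
messages = handle_tool_calls([], reply)
print(messages[-1]["content"])  # 北京: 晴, 25°C
# Send `messages` back to /api/chat so the model can phrase the final answer.
```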

Code Examples

Python Chat Class

import json
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

class ChatSession:
    def __init__(self, model="llama3.2", system=None):
        self.model = model
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def send(self, content):
        """Send a message and return the full reply (non-streaming)."""
        self.messages.append({"role": "user", "content": content})
        response = requests.post(
            OLLAMA_URL,
            json={"model": self.model, "messages": self.messages, "stream": False},
        )
        reply = response.json()["message"]["content"]
        self.messages.append({"role": "assistant", "content": reply})
        return reply

    def send_stream(self, content):
        """Send a message and yield the reply chunk by chunk (streaming).

        Streaming and non-streaming must be separate methods: a function
        body containing `yield` is always a generator, so a combined
        send(stream=...) would return a generator even for stream=False.
        """
        self.messages.append({"role": "user", "content": content})
        response = requests.post(
            OLLAMA_URL,
            json={"model": self.model, "messages": self.messages, "stream": True},
            stream=True,
        )
        full_response = ""
        for line in response.iter_lines():
            if line:
                data = json.loads(line)
                text = data.get("message", {}).get("content", "")
                if text:
                    full_response += text
                    yield text
        self.messages.append({"role": "assistant", "content": full_response})

# Usage
chat = ChatSession(system="你是一个友好的助手")

# Non-streaming
reply = chat.send("你好")
print(reply)

# Streaming
for text in chat.send_stream("写一首诗"):
    print(text, end="", flush=True)

JavaScript Chat Class

class ChatSession {
    constructor(model = 'llama3.2', system = null) {
        this.model = model;
        this.messages = [];
        if (system) {
            this.messages.push({ role: 'system', content: system });
        }
    }
    
    async send(content) {
        this.messages.push({ role: 'user', content });
        
        const response = await fetch('http://localhost:11434/api/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
                model: this.model,
                messages: this.messages,
                stream: false
            })
        });
        
        const data = await response.json();
        this.messages.push({ role: 'assistant', content: data.message.content });
        return data.message.content;
    }
    
    async *sendStream(content) {
        this.messages.push({ role: 'user', content });
        
        const response = await fetch('http://localhost:11434/api/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
                model: this.model,
                messages: this.messages,
                stream: true
            })
        });
        
        const reader = response.body.getReader();
        const decoder = new TextDecoder();
        let fullResponse = '';
        let buffer = '';
        
        while (true) {
            const { done, value } = await reader.read();
            if (done) break;
            
            // A chunk can end mid-line, so buffer until a full NDJSON line arrives.
            buffer += decoder.decode(value, { stream: true });
            const lines = buffer.split('\n');
            buffer = lines.pop();  // keep the trailing partial line
            for (const line of lines) {
                if (!line) continue;
                const data = JSON.parse(line);
                if (data.message?.content) {
                    fullResponse += data.message.content;
                    yield data.message.content;
                }
            }
        }
        
        this.messages.push({ role: 'assistant', content: fullResponse });
    }
}

// Usage
const chat = new ChatSession('llama3.2', '你是一个友好的助手');
const reply = await chat.send('你好');
console.log(reply);

Go Chat Implementation

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

type Message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

type ChatRequest struct {
    Model    string    `json:"model"`
    Messages []Message `json:"messages"`
    Stream   bool      `json:"stream"`
}

type ChatResponse struct {
    Message Message `json:"message"`
    Done    bool    `json:"done"`
}

type ChatSession struct {
    Model    string
    Messages []Message
}

func NewChatSession(model string, system string) *ChatSession {
    session := &ChatSession{
        Model:    model,
        Messages: []Message{},
    }
    if system != "" {
        session.Messages = append(session.Messages, Message{
            Role: "system", Content: system,
        })
    }
    return session
}

func (s *ChatSession) Send(content string) (string, error) {
    s.Messages = append(s.Messages, Message{
        Role: "user", Content: content,
    })
    
    req := ChatRequest{
        Model:    s.Model,
        Messages: s.Messages,
        Stream:   false,
    }
    
    body, err := json.Marshal(req)
    if err != nil {
        return "", err
    }
    resp, err := http.Post(
        "http://localhost:11434/api/chat",
        "application/json",
        bytes.NewReader(body),
    )
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    
    data, err := io.ReadAll(resp.Body)
    if err != nil {
        return "", err
    }
    var result ChatResponse
    if err := json.Unmarshal(data, &result); err != nil {
        return "", err
    }
    
    s.Messages = append(s.Messages, result.Message)
    return result.Message.Content, nil
}

func main() {
    chat := NewChatSession("llama3.2", "你是一个友好的助手")
    reply, err := chat.Send("你好")
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    fmt.Println(reply)
}

Differences from the Generate Endpoint

Feature          Chat endpoint (/api/chat)  Generate endpoint (/api/generate)
Input format     messages array             prompt string
Multi-turn       native support             requires passing context back
Role separation  supported                  not supported
Tool calling     supported                  not supported
Image input      supported                  supported (images parameter)
Typical use      chat applications          one-shot text generation