Pull a Model (POST /api/pull)

The pull endpoint downloads a model from the Ollama model library to the local machine.

Basic Usage

curl http://localhost:11434/api/pull -d '{
  "name": "llama3.2"
}'

Response (streamed):

{"status":"pulling manifest"}
{"status":"downloading digest","digest":"abc123...","total":4661224676,"completed":0}
{"status":"downloading digest","digest":"abc123...","total":4661224676,"completed":1000000000}
...
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"removing any unused layers"}
{"status":"success"}

Request Parameters

Parameter   Type     Required   Description
name        string   yes        name of the model to pull
insecure    bool     no         allow insecure connections to the registry
stream      bool     no         stream the response; defaults to true
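The parameters above go into a single JSON request body. As a sketch (assuming the default local endpoint), setting "stream": false makes the server reply with one final status object instead of a stream:

```python
import requests

def build_pull_payload(model_name, insecure=False, stream=True):
    """Build the JSON body for POST /api/pull."""
    payload = {"name": model_name, "stream": stream}
    if insecure:
        payload["insecure"] = True  # allow insecure registry connections
    return payload

def pull_model_blocking(model_name):
    """Pull without streaming: the server replies with one final status object."""
    response = requests.post(
        "http://localhost:11434/api/pull",
        json=build_pull_payload(model_name, stream=False),
    )
    response.raise_for_status()
    return response.json()

# pull_model_blocking("llama3.2")  # blocks until the pull finishes
```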

Specifying a Tag

curl http://localhost:11434/api/pull -d '{
  "name": "llama3.2:3b"
}'

If no tag is specified, the latest tag is used by default.

Response Statuses

Status                       Description
pulling manifest             fetching the model manifest
downloading digest           downloading model layers
verifying sha256 digest      verifying file integrity
writing manifest             writing the model manifest
removing any unused layers   cleaning up unused layers
success                      pull completed successfully
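These statuses can be handled uniformly when consuming the stream. A small helper (a sketch, matching the sample response shown earlier) that turns one streamed line into a display string:

```python
import json

def format_status(line):
    """Turn one streamed JSON line from /api/pull into a display string."""
    data = json.loads(line)
    status = data.get("status", "")
    total = data.get("total", 0)
    if status == "downloading digest" and total > 0:
        percent = data.get("completed", 0) / total * 100
        return f"downloading {percent:.1f}%"
    return status

print(format_status('{"status":"pulling manifest"}'))
# → pulling manifest
print(format_status('{"status":"downloading digest","total":200,"completed":50}'))
# → downloading 25.0%
```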

Code Examples

Python

import json
import requests

def pull_model(model_name):
    """Pull a model, printing streamed status updates."""
    response = requests.post(
        "http://localhost:11434/api/pull",
        json={"name": model_name},
        stream=True
    )

    for line in response.iter_lines():
        if not line:
            continue
        data = json.loads(line)
        status = data.get("status", "")

        if "downloading" in status:
            total = data.get("total", 0)
            completed = data.get("completed", 0)
            if total > 0:
                percent = (completed / total) * 100
                print(f"\rProgress: {percent:.1f}%", end="", flush=True)
        else:
            print(status)

    print("\nDownload complete")

pull_model("llama3.2")

With a Progress Bar

import json
import requests

def pull_with_progress(model_name):
    """Pull a model, rendering a progress bar during the download phase."""
    response = requests.post(
        "http://localhost:11434/api/pull",
        json={"name": model_name},
        stream=True
    )

    for line in response.iter_lines():
        if not line:
            continue
        data = json.loads(line)
        status = data.get("status", "")

        if status == "downloading digest":
            total = data.get("total", 0)
            completed = data.get("completed", 0)
            if total > 0:
                bar_length = 40
                filled = int(bar_length * completed / total)
                bar = "█" * filled + "░" * (bar_length - filled)
                percent = (completed / total) * 100
                print(f"\r[{bar}] {percent:.1f}%", end="", flush=True)
        elif status == "success":
            print("\n✓ Download complete")
        else:
            print(status)

pull_with_progress("mistral:7b")

JavaScript

async function pullModel(modelName) {
    const response = await fetch('http://localhost:11434/api/pull', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ name: modelName })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // A chunk may end mid-line, so buffer partial lines between reads
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop();  // keep the trailing partial line

        for (const line of lines.filter(Boolean)) {
            const data = JSON.parse(line);

            if (data.status === 'downloading digest') {
                const total = data.total || 0;
                const completed = data.completed || 0;
                if (total > 0) {
                    const percent = ((completed / total) * 100).toFixed(1);
                    process.stdout.write(`\rProgress: ${percent}%`);
                }
            } else {
                console.log(data.status);
            }
        }
    }

    console.log('\nDownload complete');
}

await pullModel('llama3.2');

Go

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "strings"
)

type PullRequest struct {
    Name string `json:"name"`
}

func pullModel(modelName string) error {
    req := PullRequest{Name: modelName}
    body, _ := json.Marshal(req)
    
    resp, err := http.Post(
        "http://localhost:11434/api/pull",
        "application/json",
        bytes.NewReader(body),
    )
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    
    decoder := json.NewDecoder(resp.Body)
    for {
        var data map[string]interface{}
        if err := decoder.Decode(&data); err != nil {
            if err == io.EOF {
                break
            }
            return err
        }
        
        status, _ := data["status"].(string)
        if strings.Contains(status, "downloading") {
            total, _ := data["total"].(float64)
            completed, _ := data["completed"].(float64)
            if total > 0 {
                percent := (completed / total) * 100
                fmt.Printf("\rProgress: %.1f%%", percent)
            }
        } else {
            fmt.Println(status)
        }
    }
    
    fmt.Println("\nDownload complete")
    return nil
}

func main() {
    if err := pullModel("llama3.2"); err != nil {
        fmt.Println("error:", err)
    }
}

Practical Applications

Pulling Models in Batch

def pull_models(model_list):
    for model in model_list:
        print(f"\nPulling: {model}")
        pull_model(model)

pull_models(["llama3.2", "mistral:7b", "codellama"])

Check, Then Pull

def ensure_model(model_name):
    models = requests.get("http://localhost:11434/api/tags").json()["models"]
    
    for model in models:
        if model["name"].startswith(model_name):
            print(f"Model {model_name} is already installed")
            return
    
    print(f"Model {model_name} not found, pulling...")
    pull_model(model_name)

ensure_model("llama3.2")

Common Models

Model              Size      Description
llama3.2           2-4 GB    latest small Llama model
llama3.1:8b        4.7 GB    Llama 3.1 8B
mistral:7b         4.1 GB    Mistral 7B
codellama          4-7 GB    code-specialized model
llava              4.5 GB    multimodal model
nomic-embed-text   274 MB    text embedding model
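Given the sizes above, it can be worth checking free disk space before pulling a large model. A sketch using only the standard library (the 1.2 safety margin is an arbitrary assumption):

```python
import shutil

def has_space_for(size_bytes, path=".", margin=1.2):
    """Return True if free disk space at path covers size_bytes plus a margin."""
    free = shutil.disk_usage(path).free
    return free >= size_bytes * margin

# e.g. llama3.1:8b is listed at roughly 4.7 GB:
# if has_space_for(int(4.7 * 1024**3)):
#     pull_model("llama3.1:8b")
```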

Notes

  1. Network: requires access to ollama.ai
  2. Disk space: make sure there is enough free space for the model
  3. Download time: large models can take a long time to download
  4. Resuming: an interrupted pull picks up where it left off when re-run
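Because an interrupted pull resumes from already-downloaded layers (note 4), wrapping the pull in a simple retry loop is a reasonable pattern. A sketch, with pull_model standing in for any of the helpers above:

```python
import time

def with_retry(fn, attempts=3, delay=5):
    """Call fn(), retrying on exceptions; re-running a pull reuses layers
    that already finished downloading."""
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:  # e.g. a dropped connection mid-download
            last_exc = exc
            time.sleep(delay)
    raise last_exc

# with_retry(lambda: pull_model("llama3.2"))
```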