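Parameter	Type	Description
model	string	Model name (required)
stream	bool	Whether to stream the response; defaults to true
format	string	Output format; optional, for example "json"
options	object	Model runtime parameters
keep_alive	string or number	How long the model stays loaded in memory after the request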

The model parameter

Specifies the model to use:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2"
}'

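The model must already be available locally. Unlike `ollama run` on the command line, the REST API does not download a missing model on its own; pull it first with `ollama pull llama3.2` or the /api/pull endpoint. A request that names a model but carries no prompt, like the one above, simply loads that model into memory.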
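
To check whether a model is present before sending a request, the list returned by GET /api/tags can be inspected. The sketch below assumes the response has the shape `{"models": [{"name": "llama3.2:latest", ...}]}`; the helper names `installed_models`, `is_installed`, and `fetch_tags` are illustrative, not part of Ollama.

```python
import json
from urllib.request import urlopen


def installed_models(tags_response: dict) -> set:
    """Collect the model names from a parsed /api/tags response."""
    return {entry["name"] for entry in tags_response.get("models", [])}


def is_installed(name: str, tags_response: dict) -> bool:
    """Return True if `name` is installed; a bare name also matches its ':latest' tag."""
    names = installed_models(tags_response)
    return name in names or f"{name}:latest" in names


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the local model list (needs a running Ollama server)."""
    with urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)
```

With a server running, `is_installed("llama3.2", fetch_tags())` would report whether the model still needs to be pulled.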

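The format parameter

Ask the model to produce JSON output:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "format": "json",
  "prompt": "Generate a user profile with a name, age, and email. Respond in JSON."
}'

The generated text, carried in the response field of the reply, is then valid JSON. It also helps to ask for JSON in the prompt itself, as above; Ollama's documentation warns that the model may otherwise emit large amounts of whitespace. Example output:

{
  "name": "Zhang San",
  "age": 25,
  "email": "zhangsan@example.com"
}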
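
On the client side, the JSON arrives as a string inside the reply's response field, so it needs a second decoding step. A minimal sketch, assuming a non-streaming reply already parsed into a dict; `parse_json_response` is an illustrative helper name.

```python
import json
from typing import Any


def parse_json_response(api_result: dict) -> Any:
    """Decode the JSON document stored as a string in the `response` field."""
    text = api_result.get("response", "")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Models can still produce truncated output, e.g. when num_predict is small.
        raise ValueError(f"model output was not valid JSON: {text[:80]!r}") from exc
```

Raising a `ValueError` keeps the failure explicit, so the caller can decide whether to retry the request.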

The options parameter

Passes runtime parameters to the model:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a poem",
  "options": {
    "temperature": 0.7,
    "num_ctx": 4096,
    "top_p": 0.9
  }
}'

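Commonly used options (defaults can be overridden by a model's Modelfile and may differ between Ollama versions):

Option	Description	Default
temperature	Sampling randomness; higher values give more varied output	0.8
num_ctx	Context window size, in tokens	2048
num_predict	Maximum number of tokens to generate (-1 means no limit)	-1
top_p	Nucleus sampling probability threshold	0.9
top_k	Number of highest-probability candidate tokens considered	40
stop	Stop sequences; generation ends when one is produced	none
repeat_penalty	Penalty applied to repeated tokens	1.1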
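
A misspelled option name is easy to miss, so it can help to validate options before sending. The sketch below checks against the names in the table above only (Ollama accepts more options than these); `build_generate_payload` and `KNOWN_OPTIONS` are illustrative names.

```python
# Option names from the table above; this is not the full set Ollama supports.
KNOWN_OPTIONS = {
    "temperature", "num_ctx", "num_predict",
    "top_p", "top_k", "stop", "repeat_penalty",
}


def build_generate_payload(model: str, prompt: str, **options) -> dict:
    """Build a non-streaming /api/generate request body, rejecting unknown option names."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        # Runtime parameters go in the nested "options" object, not at the top level.
        payload["options"] = options
    return payload
```

For example, `build_generate_payload("llama3.2", "Write a poem", temperature=0.7, num_ctx=4096)` yields a body equivalent to the curl request shown earlier, plus `"stream": false`.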

The keep_alive parameter

Controls how long the model stays loaded in memory after the request completes:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello",
  "keep_alive": "10m"
}'

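Accepted values (a duration string, or a number of seconds):

Value	Meaning
"5m"	Keep loaded for 5 minutes (the default)
"10m"	Keep loaded for 10 minutes
"1h"	Keep loaded for 1 hour
-1	Keep loaded indefinitely (any negative number, or a negative duration such as "-1m")
0	Unload immediately after the request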

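When the retention time is computed in code, a numeric keep_alive can be simpler than formatting a duration string. The sketch below assumes, per Ollama's FAQ, that keep_alive also accepts a number of seconds and that a negative number keeps the model loaded; `keep_alive_seconds` is an illustrative helper name.

```python
from datetime import timedelta
from typing import Optional


def keep_alive_seconds(duration: Optional[timedelta]) -> int:
    """Convert a timedelta into a numeric keep_alive value in whole seconds.

    None is mapped to -1 (keep the model loaded); negative durations are
    clamped to 0 (unload right after the request).
    """
    if duration is None:
        return -1
    return max(0, int(duration.total_seconds()))
```

A request body would then carry, for example, `"keep_alive": keep_alive_seconds(timedelta(minutes=10))`.
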
Generate endpoint parameters

Basic request

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'

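Full parameter example

The values below are placeholders that show each field's syntax rather than a request to copy as-is; in particular, context must be an array returned by an earlier response.

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write quicksort in Python",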
  "stream": false,
  "format": "json",
  "options": {
    "temperature": 0.3,
    "num_ctx": 4096,
    "stop": ["```", "---"]
  },
  "system": "You are a Python programming expert",
  "template": "Question: {{ .Prompt }}\n\nAnswer:",
  "context": [1, 2, 3],
  "raw": false,
  "keep_alive": "5m"
}'

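The context parameter

Used to continue a previous conversation. A response includes a context array that encodes the exchange so far; sending it back with the next request gives the model short-term memory. Ollama's API documentation marks context as deprecated, so prefer the chat endpoint's messages array for new code.

# First request: the response returns a context
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "My name is Xiao Ming"
}'

# The response (its final chunk, when streaming) includes context
# {"context": [1, 2, 3, ...], ...}

# Second request: pass the context back
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What is my name?",
  "context": [1, 2, 3, ...]
}'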
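
Copying the context array by hand between requests is error-prone, so a small wrapper can carry it forward. This is a sketch under the assumption that each non-streaming reply has `response` and (usually) `context` fields; `GenerateSession`, `http_send`, and the injected `send` callable are illustrative names, not Ollama APIs.

```python
import json
from typing import Callable, Optional
from urllib.request import Request, urlopen


def http_send(payload: dict, url: str = "http://localhost:11434/api/generate") -> dict:
    """POST a JSON payload and return the parsed JSON reply (needs a running server)."""
    req = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=120) as resp:
        return json.load(resp)


class GenerateSession:
    """Remembers the latest context and attaches it to the next request."""

    def __init__(self, send: Callable[[dict], dict] = http_send, model: str = "llama3.2"):
        self._send = send
        self._model = model
        self._context: Optional[list] = None

    def ask(self, prompt: str) -> str:
        payload = {"model": self._model, "prompt": prompt, "stream": False}
        if self._context is not None:
            payload["context"] = self._context
        result = self._send(payload)
        # Keep the previous context if the reply did not include a new one.
        self._context = result.get("context", self._context)
        return result["response"]
```

Passing `send` in as a parameter keeps the class testable without a server; by default it talks to the local endpoint.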

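The raw parameter

Skips template processing and passes the prompt to the model exactly as written. Use it when the prompt already contains the complete model-specific formatting; Ollama's documentation notes that raw mode does not return a context.

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "A prompt already written in your own format",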
  "raw": true
}'
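
With raw set to true, the caller is responsible for any wrapping the model expects around the user text. The sketch below applies a caller-supplied wrapper string; `build_raw_payload` is an illustrative name, and the `[INST] ... [/INST]` wrapper in the usage note is only an example format that may not suit the model you run.

```python
def build_raw_payload(model: str, user_text: str, wrapper: str) -> dict:
    """Wrap `user_text` in a caller-supplied prompt format and mark the request as raw.

    `wrapper` must contain the literal placeholder {prompt}.
    """
    if "{prompt}" not in wrapper:
        raise ValueError("wrapper must contain a {prompt} placeholder")
    return {
        "model": model,
        # str.replace avoids str.format tripping over other braces in the wrapper.
        "prompt": wrapper.replace("{prompt}", user_text),
        "raw": True,
        "stream": False,
    }
```

For instance, `build_raw_payload("mistral", "Why is the sky blue?", "[INST] {prompt} [/INST]")` produces a prompt of `[INST] Why is the sky blue? [/INST]`.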

Chat endpoint parameters

Basic request

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "Hello"}
  ]
}'

Multi-turn conversation

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a friendly assistant"},
    {"role": "user", "content": "My name is Xiao Ming"},
    {"role": "assistant", "content": "Hello, Xiao Ming!"},
    {"role": "user", "content": "What is my name?"}
  ]
}'

Message roles

Role	Description
system	System prompt; defines the model's behavior
user	A message from the user
assistant	A reply from the model
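
Because every chat request carries the whole messages array, the client has to append each assistant reply itself before sending the next turn. A minimal sketch of that bookkeeping, assuming a non-streaming reply of the form `{"message": {"role": ..., "content": ...}}`; `ChatSession` and the injected `send` callable are illustrative names.

```python
from typing import Callable, Optional


class ChatSession:
    """Accumulates system, user, and assistant messages across turns."""

    def __init__(self, send: Callable[[dict], dict], model: str = "llama3.2",
                 system: Optional[str] = None):
        self._send = send
        self._model = model
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def say(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        result = self._send({
            "model": self._model,
            "messages": list(self.messages),  # send a copy of the full history
            "stream": False,
        })
        reply = result["message"]
        # Store the reply so the next turn includes it as context.
        self.messages.append({
            "role": reply.get("role", "assistant"),
            "content": reply["content"],
        })
        return reply["content"]
```

Here `send` can be any function that POSTs the payload to http://localhost:11434/api/chat and returns the parsed JSON reply.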

Messages with images

curl http://localhost:11434/api/chat -d '{
  "model": "llava",
  "messages": [
    {
      "role": "user",
      "content": "What is in this image?",
      "images": ["iVBORw0KGgoAAAANSUhEUgAA..."]
    }
  ]
}'

Images must be Base64-encoded, and the model must be multimodal (llava in this example).
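
The encoding step can be done with Python's standard `base64` module. A short sketch; `image_message` is an illustrative helper name, and the file path in the usage note is hypothetical.

```python
import base64


def image_message(text: str, image_bytes: bytes) -> dict:
    """Build a user message whose `images` list holds the Base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"role": "user", "content": text, "images": [encoded]}
```

Typical use would read the file in binary mode first, for example `image_message("What is in this image?", open("photo.png", "rb").read())`.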

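Embeddings endpoint parameters

curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3.2",
  "prompt": "This is a piece of text to turn into a vector"
}'

The response contains the embedding vector:

{
  "embedding": [0.1, 0.2, -0.3, ...]
}

Newer Ollama releases also document an /api/embed endpoint that takes an input field in place of prompt; check the API reference for the version you run.

Embedding vectors can be used for:

Text similarity calculation
Semantic search
Cluster analysis
RAG (retrieval-augmented generation) applications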
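
Text similarity and semantic search both come down to comparing vectors, most commonly with cosine similarity. A minimal pure-Python sketch; `cosine_similarity` and `rank_by_similarity` are illustrative names, and the vectors are assumed to be the embedding lists returned by the endpoint.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       documents: Dict[str, Sequence[float]]) -> List[str]:
    """Return document keys ordered from most to least similar to the query vector."""
    return sorted(
        documents,
        key=lambda key: cosine_similarity(query, documents[key]),
        reverse=True,
    )
```

For large collections a vector database or a NumPy-based implementation would be faster; the plain version is meant only to show the calculation.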

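Raw API call examples

Python

import requests

BASE_URL = "http://localhost:11434"

def generate(prompt, model="llama3.2"):
    """Send one prompt to /api/generate and return the generated text."""
    response = requests.post(
        f"{BASE_URL}/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False  # return a single JSON object instead of a stream
        },
        timeout=120,
    )
    response.raise_for_status()  # surface HTTP errors such as an unknown model
    return response.json()["response"]

def chat(messages, model="llama3.2"):
    """Send a message history to /api/chat and return the assistant's reply."""
    response = requests.post(
        f"{BASE_URL}/api/chat",
        json={
            "model": model,
            "messages": messages,
            "stream": False
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["message"]["content"]

# Usage
print(generate("Write a poem"))
print(chat([{"role": "user", "content": "Hello"}]))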

JavaScript

async function generate(prompt, model = 'llama3.2') {
    const response = await fetch('http://localhost:11434/api/generate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            model,
            prompt,
            stream: false  // return a single JSON object instead of a stream
        })
    });
    // fetch only rejects on network failure, so check the HTTP status explicitly
    if (!response.ok) {
        throw new Error(`Ollama request failed: ${response.status}`);
    }
    const data = await response.json();
    return data.response;
}

async function chat(messages, model = 'llama3.2') {
    const response = await fetch('http://localhost:11434/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            model,
            messages,
            stream: false
        })
    });
    if (!response.ok) {
        throw new Error(`Ollama request failed: ${response.status}`);
    }
    const data = await response.json();
    return data.message.content;
}

Go

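package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

// GenerateRequest is the subset of /api/generate request fields used here.
type GenerateRequest struct {
    Model  string `json:"model"`
    Prompt string `json:"prompt"`
    Stream bool   `json:"stream"`
}

// GenerateResponse holds the generated text from a non-streaming reply.
type GenerateResponse struct {
    Response string `json:"response"`
}

func generate(prompt string) (string, error) {
    req := GenerateRequest{
        Model:  "llama3.2",
        Prompt: prompt,
        Stream: false,
    }

    body, err := json.Marshal(req)
    if err != nil {
        return "", fmt.Errorf("encode request: %w", err)
    }
    resp, err := http.Post(
        "http://localhost:11434/api/generate",
        "application/json",
        bytes.NewReader(body),
    )
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    // A non-200 status usually carries an error message instead of a result.
    if resp.StatusCode != http.StatusOK {
        return "", fmt.Errorf("unexpected status: %s", resp.Status)
    }

    var result GenerateResponse
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        return "", fmt.Errorf("decode response: %w", err)
    }
    return result.Response, nil
}

func main() {
    result, err := generate("Write a poem")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(result)
}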
