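极客智坊 (GeekAI) supports real-time voice conversation. To use it, set the model in your chat request to one that supports voice conversation. You can filter for models that support real-time voice conversation in the model marketplace (模型广场).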

OpenAI

The following is sample code for a real-time voice conversation on the OpenAI platform:

curl "https://geekai.co/api/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $GEEKAI_API_KEY" \
-d '{
    "model": "gpt-4o-mini-audio-preview",
    "modalities": ["text", "audio"],
    "audio": { "voice": "alloy", "format": "wav" },
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "What is in this recording?" },
                { 
                    "type": "input_audio", 
                    "input_audio": { 
                        "data": "<url or base64 bytes here>", 
                        "format": "wav" 
                    }
                }
            ]
        }
    ]
}'

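Setting modalities to ["text", "audio"] returns both text and audio output; if you only need text output, pass ["text"]. The audio field sets the voice and format of the audio output. For the OpenAI platform, these settings are the same as the text-to-speech parameters.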
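As a rough illustration of how the modalities and audio fields fit together, the sketch below builds the same request body in Python. The helper name build_voice_request and its parameters are hypothetical and not part of the GeekAI API. It assumes the input audio is sent as base64 text (the placeholder in the example also allows a URL) and that the audio field can be left out when only text output is requested. It only builds the JSON body and sends nothing.

```python
import base64
import json


def build_voice_request(model, prompt, audio_bytes, voice="alloy",
                        input_format="wav", output_format="wav",
                        want_audio=True):
    """Build a chat-completions body with one audio input (hypothetical helper)."""
    body = {
        "model": model,
        # ["text", "audio"] asks for text plus speech; ["text"] asks for text only.
        "modalities": ["text", "audio"] if want_audio else ["text"],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "input_audio",
                        "input_audio": {
                            # Assumption: raw audio bytes are sent as base64 text.
                            "data": base64.b64encode(audio_bytes).decode("ascii"),
                            "format": input_format,
                        },
                    },
                ],
            }
        ],
    }
    if want_audio:
        # Assumption: voice and format are only needed when speech is requested.
        body["audio"] = {"voice": voice, "format": output_format}
    return body


if __name__ == "__main__":
    # Placeholder bytes stand in for real WAV data here.
    demo = build_voice_request("gpt-4o-mini-audio-preview",
                               "What is in this recording?", b"placeholder")
    print(json.dumps(demo, indent=2))
```

The resulting dictionary, serialized with json.dumps, could be used as the -d payload of the curl request shown above.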

Chinese platforms

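OpenAI's output voices are non-Chinese voices, which does not suit Chinese-language scenarios (this does not matter if the output is text only). In that case you can choose a real-time voice conversation model from a Chinese platform, such as Zhipu Qingyan (智谱清言) or Tongyi Qianwen (通义千问). Zhipu Qingyan needs no modalities or audio configuration, which makes it simpler to use. Tongyi Qianwen uses the same modalities configuration as OpenAI, and its voice setting (the audio.voice field) supports Chinese voices: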

  • Cherry (not supported by open-source models)
  • Serena (not supported by open-source models)
  • Ethan
  • Chelsie

The audio output format (the audio.format field) supports only wav.

The following is a Tongyi Qianwen voice conversation example:

curl "https://geekai.co/api/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $GEEKAI_API_KEY" \
-d '{
    "model": "qwen-omni-turbo",
    "modalities": ["text", "audio"],
    "audio": { "voice": "Cherry", "format": "wav" },
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "音频中包含什么内容?" },
                { 
                    "type": "input_audio", 
                    "input_audio": { 
                        "data": "<url or base64 bytes here>", 
                        "format": "wav" 
                    }
                }
            ]
        }
    ]
}'
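The voice and format constraints listed in this section for Tongyi Qianwen can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: the function check_qwen_audio and the open_source_model flag are hypothetical, the caller has to know whether the chosen model is open source, and voice names are assumed to be case-sensitive.

```python
# Voices listed in this section for Tongyi Qianwen.
QWEN_VOICES = ("Cherry", "Serena", "Ethan", "Chelsie")
# Per the list in this section, these two are not supported by open-source models.
CLOSED_ONLY_VOICES = ("Cherry", "Serena")
# The section states that only wav output is supported.
QWEN_AUDIO_FORMATS = ("wav",)


def check_qwen_audio(voice, audio_format, open_source_model=False):
    """Return a list of problems with an audio config; an empty list means none found."""
    problems = []
    if voice not in QWEN_VOICES:
        problems.append(f"unknown voice: {voice}")
    elif open_source_model and voice in CLOSED_ONLY_VOICES:
        problems.append(f"voice {voice} is not supported by open-source models")
    if audio_format not in QWEN_AUDIO_FORMATS:
        problems.append(f"unsupported output format: {audio_format}")
    return problems


if __name__ == "__main__":
    print(check_qwen_audio("Cherry", "wav"))
    print(check_qwen_audio("Cherry", "mp3", open_source_model=True))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with the audio setting at once.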