This endpoint supports citations (citations), search billing units (billed_units), video in message content, image/video input tokens, and inference mode settings (thinking). The response structure differs depending on whether streaming output is enabled; you can refer to the request examples below:
cURL Request Example
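A minimal sketch of such a request, assembled from the body parameters documented below. The endpoint URL and the API_TOKEN variable are placeholders (the actual base URL and authentication scheme depend on your deployment):

```bash
# Hypothetical base URL; replace with your actual endpoint.
curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      { "content": "you are a helpful assistant", "role": "system" },
      { "content": "hi", "role": "user" }
    ],
    "stream": true,
    "temperature": 1.3,
    "max_tokens": 1024,
    "top_p": 1,
    "n": 1
  }'
```

With "stream": true the server returns incremental chunks; set it to false to receive a single complete response object.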
Authorizations
token
Body
chat model
"gpt-4o-mini"
input messages
[
{
"content": "you are a helpful assistant",
"role": "system"
},
{ "content": "hi", "role": "user" }
]
reasoning parameters; supported only by o1/o3-mini/claude-3.7-sonnet
enable stream output
true
enable web search
true
automatic retry count; the default 0 means no retry
0
sampling temperature, controls the randomness of the output
1.3
maximum completion token length
1024
enable JSON output mode
true
tool functions
tool choice
"auto"
enable parallel tool calls
true
stop sequences
enable logprobs output
false
top logprobs
2
frequency penalty coefficient
0
presence penalty coefficient
0
top-p sampling
1
random seed
number of completions
1
additional metadata
session ID
"123e4567-e89b-12d3-a456-426614174000"
Response
Successful response
Request ID
Unix timestamp of the request creation
Model used for this conversation turn
Response object type
List of generated completion choices
Token usage statistics
List of cited documents/links
System fingerprint
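Putting the response fields above together, a non-streaming response might look like the following sketch (all field values are illustrative placeholders, not real output):

```json
{
  "id": "chatcmpl-123",
  "created": 1700000000,
  "model": "gpt-4o-mini",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 18, "completion_tokens": 9, "total_tokens": 27 },
  "citations": [],
  "system_fingerprint": "fp_abc123"
}
```

In streaming mode, assuming the API follows the OpenAI-compatible convention, the server instead emits a sequence of chunk objects whose choices carry incremental delta content rather than a complete message.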