Chats
2025-04-25
Chat with Large Language Models
Request method: POST
Request URL: /chat/completions
Request parameters:
model
: Required. Model name, corresponding to an id returned by the /v1/models endpoint, e.g. "model": "deepseek-r1:32b-16k".

messages
: Required. Inference request messages, list type. The message content must be larger than 0 KB and smaller than 4 MB; Chinese and English are supported. After tokenization, the token count must be less than or equal to the minimum of maxInputTokenLen, maxSeqLen-1, max_position_embeddings, and 1MB, where max_position_embeddings is read from the weight file config.json and the other parameters come from the service configuration file.

role
: Required. Message role; one of "system", "user", or "assistant".

提示
deepseek-r1 models only support messages with the "user" role.

content
: Required. Message content, string type.

stream
: Optional. Whether to use streaming inference. Default false. [true: streaming inference; false: plain text inference]

presence_penalty
: Optional. Presence penalty, between -2.0 and 2.0; it penalizes new tokens based on whether they have already appeared in the text so far. Positive values penalize tokens that have already been used, making the model more likely to move on to new topics; negative values have the opposite effect. Default 0.0; range [-2.0, 2.0].

frequency_penalty
: Optional. Frequency penalty, between -2.0 and 2.0; it penalizes new tokens based on how frequently they have appeared so far. Positive values penalize frequent tokens, making the model more likely to move on to new topics; negative values have the opposite effect. Default 0.0; range [-2.0, 2.0].

repetition_penalty
: Optional. Repetition penalty reduces the probability of repeated fragments during text generation: text that has already been generated is penalized, biasing the model toward new, non-repetitive content. Default 1.0; range (0.0, 2.0].

temperature
: Optional. Controls how creative the generated text is: lower temperatures produce more predictable text, higher temperatures more random text; 0.0 means fully deterministic. Default 1.0; range [0.0, 2.0]. For the deepseek-r1 models the officially recommended value is 0.6.

top_p
: Optional. Controls the vocabulary range considered during generation, and with it the diversity of the output: candidate tokens are selected by cumulative probability until the given threshold is exceeded. Default 1.0; range (0.0, 1.0].

top_k
: Optional. Controls the vocabulary range considered during generation: tokens are sampled only from the k highest-probability candidates. int32 type; range [0, 2147483647]. If unset, the default is determined by the backend model.

seed
: Optional. Random seed for the inference process. The same seed value makes inference results reproducible; different seed values increase randomness. uint64 type; range [0, 18446744073709551615]. If not provided, the system generates a random seed.

stop
: Optional. List of token ids that stop inference; by default the output does not include the token ids in this list. List[int32] type; elements outside the int32 range are ignored. Default null.

include_stop_str_in_output
: Optional. Whether to include the stop strings in the generated text. bool type; default false. true: include the stop strings; false: do not. This field is ignored if neither stop nor stop_token_ids is provided.

skip_special_tokens
: Optional. Whether to skip special tokens in the generated text. bool type; default true. true: skip special tokens; false: keep them.

ignore_eos
: Optional. Whether to ignore the eos_token end-of-sequence marker during generation. bool type; default false. true: ignore the EOS token; false: respect it.

max_length
: Optional. Maximum number of tokens to generate. The actual number of generated tokens is also limited by the maxIterTimes parameter in the configuration file, so the generated token count is less than or equal to min(maxIterTimes, max_tokens). int32 type; range [0, 2147483647]. The default is taken from model settings such as maxIterTimes.
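As an illustration, here is a minimal Python sketch (using the requests library) of a request that combines several of the optional sampling parameters above; the token is a placeholder, and which parameters actually take effect may depend on the backend model:

import requests

# Illustrative payload exercising several optional sampling parameters.
payload = {
    "model": "deepseek-r1:32b-16k",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.6,      # value recommended for deepseek-r1
    "top_p": 0.95,
    "presence_penalty": 0.5,
    "seed": 42,              # fixed seed for reproducible results
    "max_length": 1024,      # generated tokens <= min(maxIterTimes, max_tokens)
}

resp = requests.post(
    "https://uni-api.cstcloud.cn/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer {Token}",  # replace with your token
    },
    json=payload,
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])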
When the selected model is qwen3:235b, deep thinking can be disabled by adding the following parameter:
"chat_template_kwargs": {"enable_thinking": false}
With this parameter set, the model skips deep thinking and answers directly. To enable deep thinking, set enable_thinking to true or simply omit the parameter.
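A hedged Python sketch of such a request (same placeholder token as above):

import requests

# Minimal sketch: ask qwen3:235b to answer directly, without deep thinking.
payload = {
    "model": "qwen3:235b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "chat_template_kwargs": {"enable_thinking": False},  # disable deep thinking
}
resp = requests.post(
    "https://uni-api.cstcloud.cn/v1/chat/completions",
    headers={"Authorization": "Bearer {Token}"},  # replace with your token
    json=payload,
)
print(resp.json()["choices"][0]["message"]["content"])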
Request and response example - non-streaming
curl -X POST "https://uni-api.cstcloud.cn/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {Token}" \
-d '{
"model": "deepseek-r1:32b-16k",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
Response
{
"id": "chatcmpl-702",
"object": "chat.completion",
"created": 1739622167,
"model": "deepseek-r1:32b-16k",
"system_fingerprint": "fp_ollama",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<think>\n\n\</think>\n\nGreetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 6,
"completion_tokens": 44,
"total_tokens": 50
}
}
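As the sample response shows, deepseek-r1 wraps its chain-of-thought in a leading <think>...</think> block inside content. A small helper for separating the reasoning from the final answer, assuming that layout:

import re

def split_think(content: str) -> tuple[str, str]:
    # Split a deepseek-r1 reply into (reasoning, answer), assuming the
    # reasoning, if any, sits in a leading <think>...</think> block.
    m = re.match(r"\s*<think>(.*?)</think>\s*", content, flags=re.S)
    if m:
        return m.group(1).strip(), content[m.end():].strip()
    return "", content.strip()

reasoning, answer = split_think("<think>\n\n</think>\n\nGreetings! I'm DeepSeek-R1...")
print(answer)  # Greetings! I'm DeepSeek-R1...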
Request and response example - streaming
curl -X POST "https://uni-api.cstcloud.cn/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {Token}" \
-d '{
"model": "deepseek-r1:32b-16k",
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"stream": true
}'
Response (server-sent events)
data: {"id": "chatcmpl-949","object": "chat.completion.chunk","created": 1739622469,"model": "deepseek-r1:32b-16k","system_fingerprint": "fp_ollama","choices": [{"index": 0,"delta": {"role": "assistant","content": "<think>"},"finish_reason": null}]}
data: {"id": "chatcmpl-949","object": "chat.completion.chunk","created": 1739622469,"model": "deepseek-r1:32b-16k","system_fingerprint": "fp_ollama","choices": [{"index": 0,"delta": {"role": "assistant","content": "\n\n"},"finish_reason": null}]}
...
data: {"id": "chatcmpl-949","object": "chat.completion.chunk","created": 1739622470,"model": "deepseek-r1:32b-16k","system_fingerprint": "fp_ollama","choices": [],"usage": {"prompt_tokens": 6,"completion_tokens": 44,"total_tokens": 50}}
data: [DONE]
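A minimal Python sketch for consuming this stream (requests with stream=True; the token is a placeholder). Each event line begins with "data: ", and the stream terminates with "data: [DONE]":

import json
import requests

resp = requests.post(
    "https://uni-api.cstcloud.cn/v1/chat/completions",
    headers={"Authorization": "Bearer {Token}"},  # replace with your token
    json={
        "model": "deepseek-r1:32b-16k",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,  # yield the response body incrementally
)
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data:"):
        continue
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(data)
    for choice in chunk.get("choices", []):
        print(choice.get("delta", {}).get("content", ""), end="", flush=True)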
Special note
When the selected model is spark-70b-x1, the streaming response format is as follows:
data:{"choices":[{"delta":{"reasoning_content":"\n\n\n","role":"assistant"},"index":1}],"created":1745398469729,"id":"gux5i4h3@spark-gpt-v30@turing-spark@16","model":"xf_IFLYLLM-Spark-X1-ProX@V4.4.2R2.4.0.1_kvxiezai","object":"chat.completion.chunk"}
data:{"choices":[{"delta":{"reasoning_content":"用户发送","role":"assistant"},"index":2}],"created":1745398469775,"id":"gux5i4h3@spark-gpt-v30@turing-spark@16","model":"xf_IFLYLLM-Spark-X1-ProX@V4.4.2R2.4.0.1_kvxiezai","object":"chat.completion.chunk"}
......
data:{"choices":[{"delta":{"reasoning_content":"连贯。\n","role":"assistant"},"index":74}],"created":1745398473762,"id":"gux5i4h3@spark-gpt-v30@turing-spark@16","model":"xf_IFLYLLM-Spark-X1-ProX@V4.4.2R2.4.0.1_kvxiezai","object":"chat.completion.chunk"}
data:{"choices":[{"delta":{"content":"\n\nHello","role":"assistant"},"index":75}],"created":1745398473822,"id":"gux5i4h3@spark-gpt-v30@turing-spark@16","model":"xf_IFLYLLM-Spark-X1-ProX@V4.4.2R2.4.0.1_kvxiezai","object":"chat.completion.chunk"}
data:{"choices":[{"delta":{"content":"! How","role":"assistant"},"index":76}],"created":1745398473878,"id":"gux5i4h3@spark-gpt-v30@turing-spark@16","model":"xf_IFLYLLM-Spark-X1-ProX@V4.4.2R2.4.0.1_kvxiezai","object":"chat.completion.chunk"}
......
data:{"choices":[{"delta":{"content":" here!","role":"assistant"},"index":91}],"created":1745398474747,"id":"gux5i4h3@spark-gpt-v30@turing-spark@16","model":"xf_IFLYLLM-Spark-X1-ProX@V4.4.2R2.4.0.1_kvxiezai","object":"chat.completion.chunk"}
data:{"choices":[{"delta":{"role":"assistant"},"index":92}],"created":1745398474776,"id":"gux5i4h3@spark-gpt-v30@turing-spark@16","model":"xf_IFLYLLM-Spark-X1-ProX@V4.4.2R2.4.0.1_kvxiezai","object":"chat.completion.chunk"}
data:{"choices":[{"delta":{},"finish_reason":"stop","index":92}],"created":1745398474,"id":"gux5i4h3@spark-gpt-v30@turing-spark@16","object":"chat.completion.chunk","usage":{"completion_tokens":35,"prompt_tokens":64,"total_tokens":248}}