Today we continue testing the commonly used parameters of the OpenAI API. The previous two test articles in this series:
Index of all articles in this series:
This post mainly tests the OpenAI API's stream parameter.
9. The stream parameter:
This parameter controls whether streaming response mode is enabled; its value is a boolean (True/False):
- False (default):
Streaming is off. The API waits until the entire response has been generated and then returns it all at once. Suitable for short replies or non-real-time applications.
- True:
Streaming is on. The generated content is returned piece by piece. Suitable for chatbots and other real-time applications.
Before testing, first import the OpenAI class and create an OpenAI object:
>>> from openai import OpenAI
>>> api_key='your API key here'
>>> client=OpenAI(api_key=api_key)
>>> type(client)
<class 'openai.OpenAI'>
For example, with streaming off under the default stream=False:
>>> chat_completion=client.chat.completions.create(
messages=[{'role': 'user', 'content': '嗨'}],
model='gpt-3.5-turbo'
)
>>> print(chat_completion.choices[0].message.content)
您好!有什么可以帮助您的吗?
If stream=True is passed instead, the API returns an iterator object of type Stream; looping over it yields, one after another, ChatCompletionChunk objects carrying the tokens the model generates.
To keep the generated response short enough to inspect, the call below uses max_tokens to limit generation to 2 tokens, so the model returns ChatCompletionChunk objects for just the first two characters of the reply:
>>> chunks=client.chat.completions.create(
messages=[{'role': 'user', 'content': '嗨?'}],
model='gpt-3.5-turbo',
max_tokens=2,
stream=True
)
>>> type(chunks)
<class 'openai.Stream'>
To display the objects in a nicely formatted way, the following uses the print() function of the third-party rich module, imported under the name pprint so as not to shadow Python's built-in print():
>>> from rich import print as pprint
For the usage of the rich module, see:
Next, iterate over the Stream object and display each generated ChatCompletionChunk object with pprint():
>>> for chunk in chunks:
print(type(chunk))
pprint(chunk)
<class 'openai.types.chat.chat_completion_chunk.ChatCompletionChunk'>
ChatCompletionChunk(
id='chatcmpl-B9TA5q5SiomPcrOddT9nd5QG1JE9i',
choices=[
Choice(
delta=ChoiceDelta(
content='', => the opening chunk's content is an empty string
function_call=None,
refusal=None,
role='assistant', => marks the role as the AI assistant
tool_calls=None
),
finish_reason=None,
index=0, => index identifies which response this chunk belongs to
logprobs=None
)
],
created=1741596749,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
<class 'openai.types.chat.chat_completion_chunk.ChatCompletionChunk'>
ChatCompletionChunk(
id='chatcmpl-B9TA5q5SiomPcrOddT9nd5QG1JE9i',
choices=[
Choice(
delta=ChoiceDelta(
content='你',
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason=None,
index=0,
logprobs=None
)
],
created=1741596749,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
<class 'openai.types.chat.chat_completion_chunk.ChatCompletionChunk'>
ChatCompletionChunk(
id='chatcmpl-B9TA5q5SiomPcrOddT9nd5QG1JE9i',
choices=[
Choice(
delta=ChoiceDelta(
content='好',
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason=None,
index=0,
logprobs=None
)
],
created=1741596749,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
<class 'openai.types.chat.chat_completion_chunk.ChatCompletionChunk'>
ChatCompletionChunk(
id='chatcmpl-B9TA5q5SiomPcrOddT9nd5QG1JE9i',
choices=[
Choice(
delta=ChoiceDelta(
content=None, => the closing chunk has no content
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason='length', => generation stopped due to the length limit
index=0,
logprobs=None
)
],
created=1741596749,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
As shown, the stream returned 4 ChatCompletionChunk objects in total. The generated tokens are carried in the content attribute of the ChoiceDelta object; the opening and closing chunks have content of '' (empty string) and None respectively. The opening chunk marks the role as the AI assistant, the closing chunk carries the finish reason, and the generated reply sits in the chunks between them.
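Given this structure, the three kinds of chunks can be told apart while consuming the stream. Below is a minimal sketch (it reuses the client object created above; the exact reply will vary from run to run):
>>> chunks=client.chat.completions.create(
        messages=[{'role': 'user', 'content': '嗨'}],
        model='gpt-3.5-turbo',
        stream=True
        )
>>> for chunk in chunks:
        choice=chunk.choices[0]
        if choice.delta.role:        # opening chunk: carries the role
            print(f'[role={choice.delta.role}]')
        if choice.delta.content:     # middle chunks: carry the generated tokens
            print(choice.delta.content, end='')
        if choice.finish_reason:     # closing chunk: carries the finish reason
            print(f'\n[finish_reason={choice.finish_reason}]')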
Comparing this with the default stream=False response, we can see that a streamed reply lives in the delta attribute of the Choice objects (a non-streamed reply lives in the message attribute). The full reply can therefore be obtained by concatenating chunk.choices[0].delta.content while iterating over the Stream object, for example:
>>> chunks=client.chat.completions.create(
messages=[{'role': 'user', 'content': '嗨'}],
model='gpt-3.5-turbo',
stream=True
)
>>> for chunk in chunks:
print(chunk.choices[0].delta.content, end='')
你好!有什么可以帮助你的吗?None
Note that print() here is given end='' so that the tokens are joined without line breaks, but this also prints the closing chunk's None. The fix is simply to or the streamed content with an empty string:
>>> chunks=client.chat.completions.create(
messages=[{'role': 'user', 'content': '嗨'}],
model='gpt-3.5-turbo',
stream=True
)
>>> for chunk in chunks:
print(chunk.choices[0].delta.content or '', end='')
你好!有什么可以帮助你的吗?
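The same effect can be had with an explicit None check instead of or. Note that a new request is needed here, since a Stream can only be iterated over once:
>>> chunks=client.chat.completions.create(
        messages=[{'role': 'user', 'content': '嗨'}],
        model='gpt-3.5-turbo',
        stream=True
        )
>>> for chunk in chunks:
        content=chunk.choices[0].delta.content
        if content is not None:   # skip the closing chunk, whose content is None
            print(content, end='')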
The or-based version above can also be written as a function, using the yield statement to turn the streamed results into a generator:
>>> def ask_gpt_s(prompt, model='gpt-4o-mini'):
replies=client.chat.completions.create(
messages=[{"role": "user", "content": prompt}],
model=model,
stream=True
)
for reply in replies:
yield reply.choices[0].delta.content or ''
Just iterate over the generator and join the streamed fragments with print() to obtain the full reply:
>>> for reply in ask_gpt_s('嗨', 'gpt-3.5-turbo'):
print(reply, end='')
你好!有什么可以帮助你的吗?
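Since ask_gpt_s() returns a generator, str.join() can also collect the whole reply into a single string instead of printing it token by token, e.g.:
>>> full_reply=''.join(ask_gpt_s('嗨', 'gpt-3.5-turbo'))
This form is handy when the reply is needed as a value rather than displayed incrementally.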
When the n parameter is passed to request multiple streamed responses, index must be used to tell the responses apart, for example:
>>> chunks=client.chat.completions.create(
messages=[{'role': 'user', 'content': '嗨?'}],
model='gpt-3.5-turbo',
max_tokens=4,
stream=True,
n=2
)
This example passes n=2 to request two responses, and also sets max_tokens=4 to shorten the output and make the result easier to inspect:
>>> for chunk in chunks:
pprint(chunk)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content='',
function_call=None,
refusal=None,
role='assistant',
tool_calls=None
),
finish_reason=None,
index=0,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content='你',
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason=None,
index=0,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content='',
function_call=None,
refusal=None,
role='assistant',
tool_calls=None
),
finish_reason=None,
index=1,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content='您',
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason=None,
index=1,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content='好',
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason=None,
index=0,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content='好',
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason=None,
index=1,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content='!',
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason=None,
index=0,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content='!',
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason=None,
index=1,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content='有',
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason=None,
index=0,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content='有',
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason=None,
index=1,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content=None,
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason='length',
index=0,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(
id='chatcmpl-B9sD8eQhh6vVWt84oYryrR26Z23vC',
choices=[
Choice(
delta=ChoiceDelta(
content=None,
function_call=None,
refusal=None,
role=None,
tool_calls=None
),
finish_reason='length',
index=1,
logprobs=None
)
],
created=1741693038,
model='gpt-3.5-turbo-0125',
object='chat.completion.chunk',
service_tier='default',
system_fingerprint=None,
usage=None
)
As can be seen, the stream interleaves two responses, index=0 and index=1, namely '你好! ...' and '您好! ...'.
To assemble the streamed fragments into complete replies, first create an empty dict to hold each response, keyed by response number (i.e. index), with a list for the generated tokens as each value:
>>> responses={i: [] for i in range(2)}
>>> responses
{0: [], 1: []}
Then issue the request again with n=2:
>>> chunks=client.chat.completions.create(
messages=[{'role': 'user', 'content': '嗨?'}],
model='gpt-3.5-turbo',
stream=True,
n=2
)
Then, while iterating over the stream object, append each fragment to its response's list according to index:
>>> for chunk in chunks:
for choice in chunk.choices:
index=choice.index
content=choice.delta.content or ''   # or '' filters out the None
responses[index].append(content)
Inspect the contents of the responses dict:
>>> responses
{0: ['', '你', '好', '!', '有', '什', '么', '可以', '帮', '助', '你', '的', '吗', '?', ''], 1: ['', '您', '好', '!', ' ', '有', '什', '么', '我', '可以', '帮', '助', '您', '的', '吗', '?', '']}
As shown, the fragments of both responses were appended to the list under their index key. A dict comprehension then joins the fragments into complete sentences:
>>> responses={i: ''.join(responses[i]) for i in range(2)}
>>> responses
{0: '你好!有什么可以帮助你的吗?', 1: '您好! 有什么我可以帮助您的吗?'}
Print the contents of the responses dict with a loop:
>>> for i, response in responses.items():
print(f"回應 {i+1}:{response}")
回應 1:你好!有什么可以帮助你的吗?
回應 2:您好! 有什么我可以帮助您的吗?
With that, both responses have been retrieved.
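The per-index bookkeeping above can also be folded into one helper function. The sketch below simply rearranges the calls used in this post (the name ask_gpt_n is made up for illustration, and it reuses the client object created earlier):
>>> def ask_gpt_n(prompt, n=2, model='gpt-3.5-turbo'):
        chunks=client.chat.completions.create(
            messages=[{'role': 'user', 'content': prompt}],
            model=model,
            stream=True,
            n=n
            )
        responses={i: [] for i in range(n)}   # one token list per choice index
        for chunk in chunks:
            for choice in chunk.choices:      # each chunk may carry any index
                responses[choice.index].append(choice.delta.content or '')
        return {i: ''.join(tokens) for i, tokens in responses.items()}
Calling ask_gpt_n('嗨') then returns a dict that maps each index to its complete reply.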