Tuesday, September 10, 2024

Groq Python API Image Test

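Besides text chat, Groq's llava-v1.5-7b-4096-preview model can also recognize and describe images. You can pass the URL of an image on the web and ask the model to describe it, or encode a local image with the base64 module, upload it, and ask for a description.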

For earlier articles in this series, see:


The tests in this article follow the official documentation:



1. Specifying an image URL:

I picked a license plate image more or less at random from the web:




First, read the Groq API key from the hidden .env file:

>>> import os   
>>> from dotenv import load_dotenv   
>>> load_dotenv()   
True
>>> groq_api_key=os.environ.get('GROQ_API_KEY')    
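Note that os.environ.get() returns None when the variable is missing, so a typo in .env only shows up later as an authentication error. As a small sketch (require_env is a hypothetical helper, not part of python-dotenv or groq), the lookup could fail early instead:

```python
import os

def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f'{name} is not set; check the .env file')
    return value

# Hypothetical usage, after load_dotenv() has run:
# groq_api_key = require_env('GROQ_API_KEY')
```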

Then import the Client class from groq and call its constructor Client(), passing in the api_key argument:

>>> from groq import Client  
>>> client=Client(api_key=groq_api_key)   

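Use the llava-v1.5-7b-4096-preview model and ask what the image contains (What is in this image?). As with text chat, this calls client.chat.completions.create(), but the content field in messages is a list of dictionaries rather than a string. The prompt goes in the entry whose type is 'text', and the image URL goes in the entry whose type is 'image_url'. The general form is: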

prompt='What is in this image?'
image_url='image URL'
content=[{'type': 'text', 'text': prompt}, 
               {'type': 'image_url', 'image_url': {'url': image_url}}]
chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )

For example:

>>> prompt='What is in this image?'    
>>> image_url='https://maxaiot.com/image/alpr/alprBgImage.png'      
>>> content=[{'type': 'text', 'text': prompt},  
               {'type': 'image_url', 'image_url': {'url': image_url}}]   
>>> content    
[{'type': 'text', 'text': 'What is in this image?'}, {'type': 'image_url', 'image_url': {'url': 'https://maxaiot.com/image/alpr/alprBgImage.png'}}]   
>>> chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )    

Inspecting the response shows that the model correctly identified a white license plate but misread the plate number (ABC-5678) as BC-55:

>>> chat_completion.choices[0].message.content   
'The image features an old looking white license plate that starts with the initials BC. The plate number, BC-55, is displayed on the plate. It creates an atmosphere reminiscent of a vintage photo.'

Because the output is sampled randomly, running the request again gives a different result:

>>> chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     ) 
>>> chat_completion.choices[0].message.content    
'The image features a closeup view of a license plate, with each letter and number contrasting to the background. The license plate comprises the letters "BBC" and the digits "C1 S-5657". The plate has a black-and-white appearance and frames the entire image.'
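The difference between the two runs comes from sampling. The following is a minimal sketch, assuming the temperature parameter of the chat completions endpoint, of a request with the sampling randomness turned down. build_request is a hypothetical helper, and whether a lower temperature makes the plate reading more accurate was not tested in this article; it should only make repeated runs more alike.

```python
def build_request(content, model='llava-v1.5-7b-4096-preview', temperature=0.0):
    """Return keyword arguments for client.chat.completions.create().

    temperature=0.0 asks for the least random sampling; this is an
    assumption about what helps here, not a measured improvement.
    """
    return {
        'messages': [{'role': 'user', 'content': content}],
        'model': model,
        'temperature': temperature,
    }

# Hypothetical usage with the client and content defined above:
# chat_completion = client.chat.completions.create(**build_request(content))
```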

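This result is not entirely correct either, so the model does not perform very well at present. Switching to another image gives the same kind of result: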





>>> image_url='https://img.ltn.com.tw/Upload/auto/page/2019/11/03/191103-13921-1-apGNK.png'   
>>> content=[{'type': 'text', 'text': prompt},  
               {'type': 'image_url', 'image_url': {'url': image_url}}]   
>>> chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )   
>>> chat_completion.choices[0].message.content         
'The image features a white license plate with bright blue stars and the numbers 338 underneath. The license plate is displayed sideways and is very big, covering a large portion of the image. It appears to be sitting on a small side on the bottom left corner as well. The rest of the plate is prominent, giving a clear view of the number plate.'

Of the digits 3388, only 338 was recognized, and the letters NBX were missed entirely. A second attempt gives a different result:

>>> chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )   
>>> chat_completion.choices[0].message.content   
'The image features a vehicle license plate, specifically an Illinois license plate from the state of Illinois. The license plate reads "NBOX-38338388." The license plate is round, placed below vents, and has a blue star in the upper-middle section, representing a city within the state. The license plate is attached to the front of a vehicle, indicating its registration in Illinois.'

This time the letters came out as NBOX, with an extra O, the digits 3388 were read as 38338388, and it is supposedly an Illinois plate?


2. Uploading a local image:

To upload a local image for a Groq model to recognize, it must be encoded with the base64 module from the standard library. Import it first:

>>> import base64   

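Save a screenshot of the ABC-5678 plate above as ABC-5678.png in the current working directory, open it in 'rb' mode, pass the bytes to base64.b64encode() to encode them, and then decode the result as UTF-8 to get a string: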

>>> with open('ABC-5678.png', "rb") as image_file:  
  base64_image=base64.b64encode(image_file.read()).decode('utf-8')   

Inspect the base64-encoded image:

>>> base64_image      
'iVBORw0KGgoAAAANSUhEUgAACPkAAAP2CAIAAAAQUE3KAAD//0lEQVR42uzd6VsbV5r38TIISUgIse+IHTuxE++Jk87TfV3zYv7puWZ6piduO46XxNgxNptYxb5IgASS6Oe07ubM4ZQky9gsQt/PC11SURS12EdV9av7nGt7e3tVVVXXrl1Tr//4xz+u5TiOc+2Ymqg+/iNHT5Qp2Wz26OhIT6+urtY/lRkcF/1TAAAAAAAAAAAAlCMdFX10Hrd/HHOM+Kkqx5zH/VuSSWUyGXkvKdXh4eG1ZDKpYypZnHMcdMkvy2+qVz1dsjH9IzMDKx50Fd82AAAAAAAAAAAAXHKlhEFOCVmXjrt0UZY5T95f1MmUBF2ZTEa9uXZwcCBL0YmZ9fv6N9Wr/DFdv2XNaRaBmcv51HCLMAwAAAAAAAAAAOByKpR1WcFQobhHyqt01mVVYZkT8yZN2WxW12hlcv6ZdclMknVJXqWrtcwKMsfo2NAs4bKYQZw1j/WR2i8AAAAAAAAAAICy4+7D0F0f5RSIe4oESeZCzB+Z6Zr569ls9p9Z1+HhoTXgljlrobCqeBZlZXF5V8txlX8BAAAAAAAAAADgkvvUuq5CiZf10cyxnHz1V9INoXMcksnMauI1KfWSn0ldl1XalTc3Ky5vTuacS9Zl/olCSy5SkfYF1wQAAAAAAAAAAOCsFcluzij4+NTxuj6aLrmDLudk34Y66zJ/KkHXP7MunXrpZKuqqspMoYp0V/ip++6ss64S+0gk6wIAAAAAAAAAAFdAKTHSmf7RUsarKnEl82ZdjhF6ubsVlPTrXyNy6bIvMx8ru+yn9PHAiuxKAAAAAAAAAACAy69QhlRi3VX50uVb/+poUGdl/8q+rlDWZW229cZxDXH2qeVrAAAAAAAAAAAAF8tKcz46wlTe3yoXZu1XVVXVvzawQrIuJ18FnFnspscqu+iNAAAAAAAAAAAA+LhCY0VZYceVGdrJHNZLD8vlmFmX+bMyzbq04hGljrj0q7lHAAAAAAAAAAAAykVFZV16TC7nuJDpn7VM7mqnK5B1OfmOnLv3QivrckoYHg0AAAAAAAAAAKCMXJk+DCXMkn4KHXfWpecw+/Eru+38pPG38o7gBQAAAAAAAAAAUBZKLOO5MjmI2YehUyjrMndNOW583uo0a6Py/tZFrzgAAAAAAAAAAMAnKyXoKpSDlGNfd1bcZfdh6N62sguBrKzLKXCcrkzHlAAAAAAAAAAAoGK5R6SyEhArA/qk7vEus6Mc57i060qN11XiCrsPXvkeTgAAAAAAAAAAUJmKlDO5Z3Mru3BEr3A2m9Ujc/1zumzh0dGR7tywHDevkEJZZaG9AwAAAAAAAAAAUC6sLgqtvOOKxR96cyTV+r8iLinyKjLSVRkpVJpnBpt5w7wy3V4AAAAAAAAAAFCxzJGrdNJhRh5XLP4wB+E6se3Sp6HeETroUq9l14eh+5i5y/cKzXnFjjcAAAAAAAAAALjarKIuUaTap+xyn0JbbQ3FdS2TyVgVXeU7XpdTOO761N8CAAAAAAAAAAC45IqHIIX6w8v707KQd+WvpdNpnYBVVVWZHRiWY9ZlKseDBAAAAAAAAAAAcBasPg/LLkbRWZ1ZuOVI1uW4KrrM3ykvRQblKrRH/rUjyu2IAgAAAAAAAACASmZmP06BIbus+a246KK34DTb67hioP/LuqSoyynPiKv4RuadzR64rNyOKAAAAAAAAAAAqGRWkZYORwrlWDpGKdNkJG8M9M/cLpPJmBtvjmNWjhvprlxzOzo6UrMdHSvTIwoAAAAAAAAAACqWVaGlh6nSb0oJPsq0/Mms3frnHshmszrrkqlHR0fOZeqosVAW5e5u0UrpJMeSWEttprwqmUxGv8p0h6wLAAAAAAAAAACUFR2gSL5Vdczj8VQfUx/lVRSKwT419CpSR3QOJUY66/pXEdTR0ZH+e5dwUDJz51rrY2VdZi+TTi7o0slW+tjhMT1F5il
lfK/S1xMAAAAAAAAAAKAUXySLqTJUV1fXHPPmyHvPMSv3kiXkjTkKpVZFspvziZnMfgr/lXUVWsVTr4S1R6xuE90b+X/J2zFrSK3iXUzq4i3dM2Em5/Dw8CAnmUymUil5FTrxMuu6Tr3fCboAAAAAAAAAAMDpfH4co4u6pJZLUi6fz+fPqa2tlTe+HMm9JACTxMu9Ju6AxjESJp31mIGZ9SP3T89i1/0rciqUdX1m2ubO9KysSzbS2mX6eJjxlWMEUVYcpSMxqd/SZVsSce0bdMSVTqel90JJxcyV/Mx/TJehDA4AAAAAAAAAAJSLz8wXrCG7NLPAy+v16sQrEAhI7iWhl3ovhV/S56G7zCtv/Za7j0D9I4ld3MmOc8ZVQ3myri97hJySsy7pTVFnXZJF6WDQKqZzTpZzpdPpg4OD3d3dvRzJt9THRCKhXpPJpKRc8is62NScL5FUkXUBAAAAAAAAAIDSfU4CZBVR6YlZg5MLLzwej8RdwWAwkCO5V11dnZqiXtV76erQcQ0g5ZyMe9xTCm2OO+j6/G4FCzmTrKv41n4069JFcJJj6QTSMQIz3UuhhFipVGpvby8ejycSCfVGeiyUTgsPDg704dQBphTuSaWeHDzntP+k9KaRdQEAAAAAAAAAgNLpYOJ0CYUVd0l6ItVBEpTo7u6qqqqk30Ip5NLRVygUqs+R3Mus8TJXzKo/s9KvvGFQ3qGsnLMZx+ussq5SVt3aSJ11mRPlvS7qktnUUZEuCnUVl3qVKi7prlDNI4mlLEEfv1qDZF1y2Kxjc+pNBgAAAAAAAAAAKN1nlnY5Rp4iAYqM9JRMJvf396UoSNGjO8kvSveG0o1hXV2dJF7hcFi9CQaDfr/fSrwK/em8/RNKmlMo63LOIO46w7quvD02Flp79+hcelHmMtWRkIO0u7sbj8c3c7a2tqSvQjVdNkeGXJODJIV4uihPkYhLxl6zOqD8HGfa1yQAAAAAAAAAALhiPjOesEqPdCGQTrx06KXLhyT9UjNIAZKM+uT3++vr6xsaGlpbW5uamhobGwOBgJqofmSNMGWla3kTFvlp3j4Mz6i06wzH69LZlWMUZpmb6hQYyszcTnmVw6MOjNRv7eRsb28nctTEdDqt5jFzyOBJUsglEZccGz0GmPOFqrLIugAAAAAAAAAAQOm+YORjJl5SOyRlXnocKEUiFRkK6uDgQHo4VNT8Ho/H5/OFw+GGhgaJu9Sburo6qSAyUyv5E85xlGOOV+Wc7LTv3IbsOqusq3hRlzU6mbnxkvU5xqBfag2lc8nd3d2tra3Nzc2NjY3t7e14PK4OgzpOajaJuHSyFQgEzKBLD82lx/2ykjYAAAAAAAAAAICrR0IWqe7a3d2V0i5d5iUDRcnIXplMpqamJhAINDQ0tLS0tLW1qVdJvDwej5QPWVGWrikSeeuaig/99UWcSdZlbYy7wC1vnZrVt6OkjlJnp3Z0IpHY3Nxcz1Fv1N5X030+n9rpUlinqDdSwuU9VpNjFtm5B0kj8QIAAAAAAAAAAFeAu3DKClx0pZeOvqTeKx6P7+zsqPdqTo/H09jY2NLS0tHRoV6bmprMLg3N0EcnL8XXwd3t35d15lmXVdRlFtC5f6TfS2GdVNWpXby9vb21tWWWczm5QbkaGhrU7m5ublZvwuGw7Gs1vTrH6qjQWrcv3h0kAAAAAAAAAADAudGVUtKnnRU7ucORo2OSeyWP7e3txePx7e3tzc3NRCJxcHCgFuj1esPhcFOO7tJQUhiPx2N20ae7NHSK1hfl7fzvizjDrEt/NHexuSslhdL9CjpGUZfs5Z2dHbVb19bWVldXNzY2tra21P7NZrMySFpzc3Nra6skitJlpC7hsvZm3vo488995s4lMwMAAAAAAAAAAKdz6nhCIh7167p/u0JZV97fTR87ODiQDvbW19c3cra3t3d3d9USpEvDjo6O9vb2trY2Sby8Xq/ZpWE2m5W/ZfXz57j6LbRymS/lArI
uGQ/NOe7J0dpyiRPVHtzZ2Vk9puvm/H5/MBhsamqSoEv6LdQpV6F9ZHUZaWZd5l7+zKzrLMruAAAAAAAAAADAVfWZ+YKZdbmTpI8W6pjlSel0+vDwUHraW19fX1lZWV1djcfjmUxGLVx3adjc3Kzeh0IhGU9K99UnCzQTH3OjrBG8yiDrsoIuq5dC2WU6wTP7GJQYTO3Kvb09KeeKxWLqdWtrS+1KNZukXC056k1DQ0MgEPD5fGbKZf11cx109Zh7hR2SKgAAAAAAAAAAUD6KlExZg0y5f0vo3ESXKh0cHOzt7VnFSMlksqamJhgMNjQ0SDGSZDR1dXVqusfjcf+tIuVl5ZR1mfvR3FkSdMlP3UFXKpXa3d3d2NhYXl6OxWLqVUbnUvursbGxM6e9vb2+vj4QCKg9aFbkmQOA5d/UEgLML7srAAAAAAAAAAAAzoFOZySI0T3emf3eyY9033u6lEjXKanXbDabTqetxGt3d1dNVIuqq6sLh8PtOa2trcFgUPreM+MeWZr8RTMJcs4siDmPrMvcBqtOTW+2dAcp5VwrKyvqzfb2djKZVD/1er3SY2FXV1dLS0tjY6Oaog6AuV+unVRorcz5zXVwPq+6i/G6AAAAAAAAAADA6XzBBEhnNHlzEx2GOcd97zkn65Rkiow2lUqltre3NzY21tbWZASvvb09NY/H42lqampra+vs7GxsbJQCLzVRghtJyxRZWnXOlcq69O7TUZNs8+HhoQx6tpSzsrKiPqrpXq9X7SC1m9Qua21tVa+hUCgQCOhiOn3AzGCwUNZllpRZx/iLjNcFAAAAAAAAAADwqU4dT+QdmstddGRlXWZQYqZfijnul5qYTCZ3dnakTml1dVW9JhKJw8NDv9/f1NTU3t4u8Y16r4fvkpxMEhy1NJ2BfebIZMWdSdZ14g+c3IPWnpVgUO2ara2tjY2NhYWF5eVltePUxgeDQbWDWlpa1GtjY2M4HJaeH6uqqtQ+kpDMMQb…

It looks like a very long string of gibberish.
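The string is not random, though. Base64 maps every 3 bytes to 4 characters, so the fixed 8-byte PNG file signature always encodes to the same leading characters, iVBORw0KGgo, which is how the output above begins. The sketch below uses that fact as a quick sanity check on the encoded data; looks_like_png is a hypothetical helper name.

```python
import base64

# The 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def looks_like_png(b64_text):
    """Return True if the base64 text decodes to data starting with the PNG signature."""
    # 12 base64 characters decode to 9 bytes, enough to cover the signature.
    head = base64.b64decode(b64_text[:12])
    return head.startswith(PNG_SIGNATURE)

print(base64.b64encode(PNG_SIGNATURE).decode('utf-8'))  # iVBORw0KGgo=
print(looks_like_png('iVBORw0KGgoAAAANSUhEUg'))         # True
```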

There are also online services that convert an image to base64; see:





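The result is the same as above. When calling the chat API, simply replace the web address in image_url with a data URL that embeds the base64 string through an f-string, as follows: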

f'data:image/jpeg;base64,{base64_image}'    
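The prefix above declares image/jpeg, while the file used in this article is a PNG, whose matching MIME type is image/png. Whether the Groq endpoint is sensitive to that mismatch is not established here. As a sketch under that caveat, the type can be derived from the file name with the standard-library mimetypes module; file_to_data_url is a hypothetical helper name.

```python
import base64
import mimetypes

def file_to_data_url(path):
    """Encode an image file as a data URL whose MIME type matches the file extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith('image/'):
        raise ValueError(f'cannot determine an image MIME type for {path!r}')
    with open(path, 'rb') as image_file:
        b64 = base64.b64encode(image_file.read()).decode('utf-8')
    return f'data:{mime};base64,{b64}'

# Hypothetical usage with the file saved earlier:
# image_url = file_to_data_url('ABC-5678.png')
```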

The general form is:

prompt='What is in this image?'
image_url=f'data:image/jpeg;base64,{base64_image}'
content=[{'type': 'text', 'text': prompt}, 
               {'type': 'image_url', 'image_url': {'url': image_url}}]
chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )

For example:

>>> image_url=f'data:image/jpeg;base64,{base64_image}'     
>>> content=[{'type': 'text', 'text': prompt}, 
               {'type': 'image_url', 'image_url': {'url': image_url}}]    
>>> chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )   
>>> chat_completion.choices[0].message.content     
"The image features a vehicle's licence expired license plate and the numbers CD-536 from Quebec. This provides some information about the vehicle's ownership. The white license plate is a primary component of the photo. As a notable detail, the white numbers are slightly obscured or partially blacked out, giving the appearance of possibly expired or invalid."     

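These tests show that Groq's llava-v1.5-7b-4096-preview model still needs to improve its accuracy at reading text in images. Finally, let's try describing an ordinary picture. I found a picture of dogs on Pixabay: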


Source : Pixabay


The URL of this image is:


Start with an English prompt:

>>> image_url='https://cdn.pixabay.com/photo/2024/03/15/17/50/dogs-8635461_1280.jpg'   
>>> prompt='What is in this image?'  
>>> content=[{'type': 'text', 'text': prompt},  
               {'type': 'image_url', 'image_url': {'url': image_url}}]         
>>> chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )      
>>> chat_completion.choices[0].message.content    
'In this image, there are a group of young white dogs, possibly a litter, sleeping together on a bed. They are either laying or sleeping near each other, making it a heartwarming scene of puppies cuddling in close.'

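The result is quite good. Now try asking in Chinese (the prompt means "What kind of picture is this? Please answer in Traditional Chinese (zh_Hant)"):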

>>> prompt='這是一張甚麼樣的圖? 請用繁體中文(zh_Hant)回答'    
>>> content=[{'type': 'text', 'text': prompt},  
               {'type': 'image_url', 'image_url': {'url': image_url}}]         
>>> chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )      
>>> chat_completion.choices[0].message.content     
'這是一張多片白小狗在毛中睡覺和朋友們相靠垂。他們的相似和和親密的聚結一起產生一張吸引人心的照片。'

Some of the wording is a bit odd, but the answer is basically acceptable.

However, asking how many dogs are in the picture is as unreliable as reading the plate number. First ask in English:

>>> image_url='https://cdn.pixabay.com/photo/2024/03/15/17/50/dogs-8635461_1280.jpg'       
>>> prompt='How many dogs are there in the picture?'       
>>> content=[{'type': 'text', 'text': prompt},  
               {'type': 'image_url', 'image_url': {'url': image_url}}]        
>>> chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )      
>>> chat_completion.choices[0].message.content     
'Since the image features a line of dogs sleeping together on the same bed, it is likely that there are several dogs present in the picture. However, without a specific count or visible number of dogs, it is not possible to provide an exact number of dogs.'

It only says there are several dogs and will not commit to a number. Ha. Asking again gets an answer of six:

>>> chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )    
>>> chat_completion.choices[0].message.content    
'There are six Poodle puppies in the picture.'

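Asking in Chinese (the prompt means "How many dogs are in this picture? Please answer in Traditional Chinese (zh_Hant)"):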

>>> prompt='這張圖中有幾隻狗狗? 請用繁體中文(zh_Hant)回答'   
>>> content=[{'type': 'text', 'text': prompt},   
               {'type': 'image_url', 'image_url': {'url': image_url}}]    
>>> chat_completion=client.chat.completions.create(
     messages=[{'role': 'user', 'content': content}],
     model='llava-v1.5-7b-4096-preview'
     )     
>>> chat_completion.choices[0].message.content   
'在這張圖中有十幾隻小狗.'   
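Since the same question produces different answers from run to run, one way to get a feel for the spread is to repeat the call and tally the replies. The sketch below is only an illustration: tally_answers is a hypothetical helper, and it takes any zero-argument callable so it can be tried here with canned replies instead of real API calls.

```python
from collections import Counter

def tally_answers(ask, n=5):
    """Call ask() n times and count how often each distinct answer appears."""
    return Counter(ask() for _ in range(n))

# Canned replies stand in for repeated model calls in this demonstration.
replies = iter(['six', 'several', 'six'])
counts = tally_answers(lambda: next(replies), n=3)
print(counts.most_common(1))  # [('six', 2)]
```

With the real API, the callable would wrap the client.chat.completions.create() request shown above and return the message content.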

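This time the model answers that there are a dozen or so puppies. In short, this preview vision model is acceptable at describing images but not yet good at recognition.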


3. Chat functions:

I wrapped the two kinds of image chat above, along with the earlier text chat code, into an ask_llm module for easier calling:

# ask_llm.py 
import base64
from groq import Client

def ask_groq_text(prompt, api_key, model='llama3-70b-8192'):
    client=Client(api_key=api_key)
    chat_completion=client.chat.completions.create(
        messages=[
            {"role": "user",
             "content": prompt,
             }],
        model=model,   
        )
    return chat_completion.choices[0].message.content

def ask_groq_vision_file(prompt, file, api_key):
    client=Client(api_key=api_key)
    with open(file, 'rb') as image_file:  
        base64_image=base64.b64encode(image_file.read()).decode('utf-8')
    image_url=f'data:image/jpeg;base64,{base64_image}' 
    content=[{'type': 'text', 'text': prompt}, 
             {'type': 'image_url', 'image_url': {'url': image_url}}]
    chat_completion=client.chat.completions.create(
         messages=[{'role': 'user', 'content': content}],
         model='llava-v1.5-7b-4096-preview'
         )
    return chat_completion.choices[0].message.content

def ask_groq_vision_url(prompt, url, api_key):
    client=Client(api_key=api_key)
    content=[{'type': 'text', 'text': prompt},  
             {'type': 'image_url', 'image_url': {'url': url}}] 
    chat_completion=client.chat.completions.create(
         messages=[{'role': 'user', 'content': content}],
         model='llava-v1.5-7b-4096-preview'
         )
    return chat_completion.choices[0].message.content

To use the module, import all of its functions from ask_llm. Then call ask_groq_vision_url() or ask_groq_vision_file() for image chat by passing the prompt, the url or file, and the api_key arguments:

>>> from ask_llm import *    

First test image chat by URL. Read the GROQ_API_KEY variable from the hidden .env file:

>>> import os      
>>> from dotenv import load_dotenv   
>>> load_dotenv()   
True
>>> groq_api_key=os.environ.get('GROQ_API_KEY')   

Set the prompt and image URL variables:

>>> prompt='What is in this image?'   
>>> url='https://maxaiot.com/image/alpr/alprBgImage.png'     

Call the ask_groq_vision_url() function:

>>> ask_groq_vision_url(prompt, url, groq_api_key)    
"In this image, there is a black and white picture of a car tag or vehicle assignment on a license plate. The tag is accompanied by numbers, such as `55-BC' and `57-C<55' depending on which description refers to the specific numbers. The tag has a black background with white lettering and possibly includes information related to the vehicle assigned to the tag."

Next, upload a local image. Set the prompt and the image file path:

>>> prompt='What is in this image?'   
>>> file='ABC-5678.png'    

Call the ask_groq_vision_file() function:

>>> ask_groq_vision_file(prompt, file, groq_api_key)
'The image is a license plate, showcasing the licence plate designated C-566.'   
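The functions in ask_llm each build their own Client and differ mainly in how the content is assembled. As a sketch rather than the author's code, the request logic could be factored into one function that takes the client as a parameter, which also makes it possible to check the payload offline with a stand-in client. The names ask_vision, FakeCompletions, and make_fake_client are hypothetical and are not part of the groq package.

```python
from types import SimpleNamespace

def ask_vision(client, prompt, image_url, model='llava-v1.5-7b-4096-preview'):
    """Send a text prompt plus an image URL (web or data URL) and return the reply text."""
    content = [
        {'type': 'text', 'text': prompt},
        {'type': 'image_url', 'image_url': {'url': image_url}},
    ]
    chat_completion = client.chat.completions.create(
        messages=[{'role': 'user', 'content': content}],
        model=model,
    )
    return chat_completion.choices[0].message.content

class FakeCompletions:
    """Stand-in for client.chat.completions that records the request and returns a canned reply."""
    def __init__(self):
        self.last_request = None

    def create(self, **kwargs):
        self.last_request = kwargs
        message = SimpleNamespace(content='canned reply')
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

def make_fake_client():
    """Build an object with the same attribute path as a real client: client.chat.completions."""
    return SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))

# Offline check of the request shape, with no API key or network needed.
fake = make_fake_client()
print(ask_vision(fake, 'What is in this image?', 'https://example.com/plate.png'))  # canned reply
```

With a real key, the same function would be called as ask_vision(Client(api_key=groq_api_key), prompt, url).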
