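This post tests the first way of using the Ollama REST API: sending requests with curl.exe.
Once the Ollama application is running, it automatically starts a REST API server at http://localhost:11434 (the address used by every command in this post).
Paste this URL into a browser; if the page shows "Ollama is running", the Ollama server has started.
The tests below succeed only while the Ollama server is running.
Most endpoints of this REST API server are called with the POST method (mainly those that ask a model to do something or change system state), because these operations take complex parameters packed as JSON in the request body. A few use the GET method (read-only operations that query data without changing anything, such as checking status or listing items).
The commonly used POST endpoints are: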
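| POST endpoint | Description |
|---|---|
| /api/generate | Text generation (single-prompt input and basic text completion). |
| /api/chat | Multi-turn conversation (for cases that need context; roles are distinguished as user/assistant/system). |
| /api/embeddings | Uses an embedding model to convert text into embedding vectors usable for RAG semantic search. |
| /api/pull | Downloads the specified LLM from the official Ollama model library to the local machine. |
| /api/push | Pushes a model to the official Ollama model library (requires login and a key). |
| /api/show | Shows detailed information about a model. |
| /api/create | Creates or customizes a new model from local files by passing in the content of a Modelfile. |
| /api/copy | Copies an existing local model under another name. |
Commonly used GET endpoints:
| GET endpoint | Description |
|---|---|
| /api/tags | Lists all local models. |
| /api/ps | Shows which models are currently loaded in memory (DRAM/VRAM). |
| /api/version | Returns the Ollama version. |
Deleting a model uses the DELETE method:
| DELETE endpoint | Description |
|---|---|
| /api/delete | Deletes the specified local model (given by the name key). |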
Requests to the REST API server can be sent from the command line with the curl command, or from a Python program with the post(), get(), and delete() functions of the requests package.
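The endpoint tables above can be condensed into a small lookup that picks the HTTP method for each path. The sketch below is only an illustration: it uses the standard-library urllib instead of the requests package so that it has no dependencies, and the names BASE_URL, ENDPOINT_METHODS, build_request, and call are hypothetical helpers, not part of Ollama.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # address used by the commands in this post

# HTTP method for each endpoint, copied from the tables above.
ENDPOINT_METHODS = {
    "/api/generate": "POST",
    "/api/chat": "POST",
    "/api/embeddings": "POST",
    "/api/pull": "POST",
    "/api/push": "POST",
    "/api/show": "POST",
    "/api/create": "POST",
    "/api/copy": "POST",
    "/api/tags": "GET",
    "/api/ps": "GET",
    "/api/version": "GET",
    "/api/delete": "DELETE",
}


def build_request(endpoint, payload=None):
    """Return an unsent Request that uses the method listed for `endpoint`."""
    method = ENDPOINT_METHODS[endpoint]
    data = None
    headers = {}
    if payload is not None:
        # Parameters travel as JSON in the request body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + endpoint, data=data, headers=headers, method=method
    )


def call(endpoint, payload=None):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(endpoint, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, call("/api/version") would return the same JSON that the curl.exe examples print; build_request alone can be inspected without any server.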
1. Sending requests with cURL:
In a CMD or PowerShell (PS) window, call a REST API endpoint with curl.exe, for example:
curl.exe http://localhost:11434/api/tags
This command lists all local models:
PS C:\Users\USER> curl.exe http://localhost:11434/api/tags
{"models":[{"name":"llama3.2-vision:11b","model":"llama3.2-vision:11b","modified_at":"2026-05-25T11:30:36.0322613+08:00","size":7816589186,"digest":"6f2f9757ae97e8a3f8ea33d6adb2b11d93d9a35bef277cd2c0b1b5af8e8d0b1e","details":{"parent_model":"","format":"gguf","family":"mllama","families":["mllama"],"parameter_size":"10.7B","quantization_level":"Q4_K_M"}},{"name":"phi4:14b","model":"phi4:14b","modified_at":"2026-05-24T17:23:24.1907029+08:00","size":9053116391,"digest":"ac896e5b8b34a1f4efa7b14d7520725140d5512484457fab45d2a4ea14c69dba","details":{"parent_model":"","format":"gguf","family":"phi3","families":["phi3"],"parameter_size":"14.7B","quantization_level":"Q4_K_M"}},{"name":"dagbs/deepseek-coder-v2-lite-instruct:q4_k_m","model":"dagbs/deepseek-coder-v2-lite-instruct:q4_k_m","modified_at":"2026-05-24T00:17:21.5900783+08:00","size":10364417401,"digest":"a6f5c73087ad25fc8666929492449eb0dc694326e4ca5b2313fef75b66645583","details":{"parent_model":"","format":"gguf","family":"deepseek2","families":["deepseek2"],"parameter_size":"15.7B","quantization_level":"Q4_K_M"}},{"name":"mannix/deepseek-coder-v2-lite-instruct:q4_k_m","model":"mannix/deepseek-coder-v2-lite-instruct:q4_k_m","modified_at":"2026-05-23T21:15:32.5340567+08:00","size":10364432240,"digest":"6171206208d0529a47806ebcf8ed37a88fe322859e269396dd16fdd98a56a102","details":{"parent_model":"","format":"gguf","family":"deepseek2","families":["deepseek2"],"parameter_size":"15.7B","quantization_level":"Q4_K_M"}},{"name":"deepseek-r1:14b","model":"deepseek-r1:14b","modified_at":"2026-05-23T16:43:35.5527992+08:00","size":8988112209,"digest":"c333b7232bdb521236694ffbb5f5a6b11cc45d98e9142c73123b670fca400b09","details":{"parent_model":"","format":"gguf","family":"qwen2","families":["qwen2"],"parameter_size":"14.8B","quantization_level":"Q4_K_M"}},{"name":"qwen3:14b","model":"qwen3:14b","modified_at":"2026-05-22T00:30:36.3395998+08:00","size":9276198565,"digest":"bdbd181c33f2ed1b31c972991882db3cf4d192569092138a7d29e973cd9debe8","details":{"parent_model":"","format":"gguf","family":"qwen3","families":["qwen3"],"parameter_size":"14.8B","quantization_level":"Q4_K_M"}},{"name":"gemma4:latest","model":"gemma4:latest","modified_at":"2026-05-20T11:45:08.2471048+08:00","size":9608350718,"digest":"c6eb396dbd5992bbe3f5cdb947e8bbc0ee413d7c17e2beaae69f5d569cf982eb","details":{"parent_model":"","format":"gguf","family":"gemma4","families":["gemma4"],"parameter_size":"8.0B","quantization_level":"Q4_K_M"}}]}
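As shown, all models are returned in JSON format as a list under the models key. Note: in Windows PowerShell, use curl.exe rather than curl. The former runs the real cURL tool (the one familiar from the Linux ecosystem), whereas plain curl is only an alias that Windows PowerShell defines for its built-in Invoke-WebRequest cmdlet. Invoke-WebRequest returns a response object rather than the raw JSON text, and its default display truncates the long content, so the full model list is not shown.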
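Because the reply is one long JSON line, it can be easier to read after a little post-processing. The following is a minimal sketch that assumes the reply has already been captured as a string; the sample is an abbreviated stand-in with the same shape as the output above, and summarize_models is a hypothetical helper name.

```python
import json

# Abbreviated sample in the shape of the /api/tags reply shown above
# (only the fields used below are kept).
sample = """{"models":[
  {"name":"phi4:14b","size":9053116391,
   "details":{"parameter_size":"14.7B","quantization_level":"Q4_K_M"}},
  {"name":"qwen3:14b","size":9276198565,
   "details":{"parameter_size":"14.8B","quantization_level":"Q4_K_M"}}
]}"""


def summarize_models(tags_json):
    """Return (name, size in GB, parameter size, quantization) for each model."""
    rows = []
    for m in json.loads(tags_json)["models"]:
        details = m.get("details", {})
        rows.append((
            m["name"],
            round(m["size"] / 1e9, 2),  # bytes to decimal gigabytes
            details.get("parameter_size", "?"),
            details.get("quantization_level", "?"),
        ))
    return rows


for name, gb, params, quant in summarize_models(sample):
    print(f"{name:12} {gb:6.2f} GB  {params:6} {quant}")
```

The same function works on the full reply, since it only reads the name, size, and details fields of each entry in the models list.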
Next, call /api/ps to check whether any model is currently loaded in memory:
PS C:\Users\USER> curl.exe http://localhost:11434/api/ps
{"models":[]}
The value of the models key is an empty list, which means no model is loaded in memory.
Open a new PS window and load a model with ollama run, for example Gemma 4:
PS C:\Users\USER> ollama run gemma4:e4b
>>>
Then return to the original PS window and call /api/ps again. This time it shows that the gemma4:e4b model is loaded:
PS C:\Users\USER> curl.exe http://localhost:11434/api/ps
{"models":[{"name":"gemma4:e4b","model":"gemma4:e4b","size":10579079040,"digest":"c6eb396dbd5992bbe3f5cdb947e8bbc0ee413d7c17e2beaae69f5d569cf982eb","details":{"parent_model":"","format":"gguf","family":"gemma4","families":["gemma4"],"parameter_size":"8.0B","quantization_level":"Q4_K_M"},"expires_at":"2026-05-28T22:41:52.1342692+08:00","size_vram":10579079040,"context_length":4096}]}
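The /api/ps reply can be reduced to a quick "what is loaded, and where" summary. This sketch assumes that size_vram is the portion of size that sits in GPU memory, so that equal values mean the model is fully in VRAM; that reading is an assumption based on the field names, and loaded_models is a hypothetical helper.

```python
import json

# Trimmed copy of the /api/ps reply shown above.
sample = (
    '{"models":[{"name":"gemma4:e4b","size":10579079040,'
    '"size_vram":10579079040,'
    '"expires_at":"2026-05-28T22:41:52.1342692+08:00",'
    '"context_length":4096}]}'
)


def loaded_models(ps_json):
    """Map each loaded model name to the percentage of it held in VRAM."""
    result = {}
    for m in json.loads(ps_json)["models"]:
        size = m.get("size", 0)
        vram = m.get("size_vram", 0)
        # Assumption: size_vram / size is the share of the model on the GPU.
        result[m["name"]] = round(100 * vram / size) if size else 0
    return result


print(loaded_models(sample))
print(loaded_models('{"models":[]}'))  # nothing loaded
```

An empty dictionary corresponds to the {"models":[]} reply seen before the model was started.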
Calling /api/version returns the Ollama version:
PS C:\Users\USER> curl.exe http://localhost:11434/api/version
{"version":"0.24.0"}
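To call a POST endpoint, pass the data for the request body after the -d option (-d is short for --data, and curl switches to POST when it is given). For example, /api/show returns detailed information about the specified model. Note that when sending JSON with curl.exe in Windows PowerShell, the outer quotes must be single quotes ' and each inner double quote must be escaped with a backslash as \", for example: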
curl.exe http://localhost:11434/api/show -d '{\"name\": \"llama3.2-vision:11b\"}'
PS C:\Users\USER> curl.exe http://localhost:11434/api/show -d '{\"name\": \"gemma4:e4b\"}'
{"license":" Apache License\n Version 2.0, January 2004\n http://www.apache.org/licenses/\n\n TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n 1. Definitions.\n\n \"License\" shall mean the terms and conditions for use, reproduction,\n and distribution as defined by Sections 1 through 9 of this document.\n\n \"Licensor\" shall mean the copyright owner or entity authorized by\n the copyright owner that is granting the License.\n\n \"Legal Entity\" shall mean the union of the acting entity and all\n other entities that control, are controlled by, or are under common\n control with that entity. For the purposes of this definition,\n \"control\" means (i) the power, direct or indirect, to cause the\n ... (very long) ...
... (omitted) ...
3072]},{"name":"v.blk.9.ln1.weight","type":"F32","shape":[768]},{"name":"v.blk.9.ln2.weight","type":"F32","shape":[768]},{"name":"v.patch_embd.weight","type":"F16","shape":[16,16,3,768]},{"name":"v.position_embd.weight","type":"F32","shape":[768,10240,2]}],"capabilities":["completion","vision","audio","tools","thinking"],"modified_at":"2026-05-28T22:36:46.3502486+08:00","requires":"0.20.0"}
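The quoting rule used for the -d argument (single quotes outside, \" for every inner double quote) is easy to get wrong by hand. The sketch below generates such a command line from a Python dictionary. It only mirrors the rule described in this post for Windows PowerShell and is not a general-purpose shell escaper; powershell_curl is a hypothetical helper name.

```python
import json


def powershell_curl(url, payload):
    """Build a curl.exe command line using the quoting described above."""
    body = json.dumps(payload, ensure_ascii=False)
    # Inner double quotes get a backslash; a literal single quote is doubled
    # because the whole body sits inside a PowerShell single-quoted string.
    escaped = body.replace('"', '\\"').replace("'", "''")
    return f"curl.exe {url} -d '{escaped}'"


print(powershell_curl("http://localhost:11434/api/show", {"name": "gemma4:e4b"}))
```

For the /api/show request above, the printed command should match the one typed at the prompt, which gives a quick way to check the escaping before pasting a longer payload.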
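The most commonly used endpoints are /api/generate (text generation) and /api/chat (conversation). For example, /api/generate is called as follows (note that the double quotes inside the argument must be escaped with \):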
curl.exe http://localhost:11434/api/generate -d '{
\"model\": \"gemma4:e4b\",
\"prompt\": \"你是誰?\",
\"stream\": false
}'
PS C:\Users\USER> curl.exe http://localhost:11434/api/generate -d '{
>> \"model\": \"gemma4:e4b\",
>> \"prompt\": \"你是誰?\",
>> \"stream\": false
>> }'
{"model":"gemma4:e4b","created_at":"2026-05-28T15:05:57.5062972Z","response":"我叫 **Gemma 4**。\n\n我是一個由 Google DeepMind 開發的大型語言模型 (Large Language Model)。\n\n我的作用是處理和生成人類語言,我可以幫助您回答問題、寫作文本、翻譯語言,或是進行創意寫作。\n\n請問您需要我幫您做些什麼呢?","done":true,"done_reason":"stop","context":[2,105,9731,107,98,107,106,107,105,2364,107,95841,240560,236881,106,107,105,4368,107,100,45518,107,120474,12364,236787,108,236770,236761,138,1018,115863,506,16499,53121,669,2430,4733,623,95841,240560,7462,568,236797,240622,127880,704,155268,78546,837,55544,531,623,15938,659,611,7462,107,236778,236761,138,1018,102752,17354,46320,53121,564,1202,531,2847,496,63510,532,11459,3890,2721,580,1041,19080,11808,236761,107,140,236829,139,1567,236787,147224,236743,236812,236761,107,140,236829,139,96089,236787,6475,22267,65153,236761,107,140,236829,139,46797,236787,25093,22160,9483,568,2182,236792,769,107,140,236829,139,2328,236787,7607,18710,2028,236761,107,236800,236761,138,1018,102752,22160,53121,669,2744,5192,691,43899,8555,568,504,164557,236764,840,506,7609,3904,563,4077,8555,779,834,506,3072,1921,577,528,8555,236761,107,236812,236761,138,1018,88293,506,14503,568,495,8555,1473,1018,107,140,236829,139,6302,684,29354,506,1463,236787,26911,237026,147224,236743,236812,568,236777,1006,147224,236743,236812,769,107,140,236829,139,3112,506,9059,236786,39822,236787,26911,90432,56762,146569,26609,568,236777,1006,496,2455,5192,2028,769,107,140,236829,139,3112,506,19788,236787,108965,6475,22267,65153,65706,238623,568,165684,684,6475,22267,65153,769,107,140,236829,139,236769,43983,840,11045,1473,99082,9779,1292,568,236744,236761,236759,1126,9795,3890,4137,236764,5712,1816,769,107,236810,236761,138,1018,30852,532,10867,688,568,25864,236772,135778,236786,87228,1473,1018,41152,506,3072,563,54651,236764,11459,236764,532,5467,19246,506,11172,1651,98469,531,506,5221,19080,236761,108,236825,236761,138,1018,17667,16887,32955,99382,568,2094,9025,531,506,3847,8555,3072,2907,101,237169,239138,5213,236823,12367,236743,236812,1018,236924,108,237169,90432,237852,6475,22267,65153,65706,238623,29854,237731,146569,26609,568,31534,22160,9483,45511,108,21480,24654,237026,93521,237206,25352,126592,146569,236900,183868,116904,238602,49695,18053,236951,240564,237284,57489,236951,205963,146569,236900,67375,43682,215960,240564,237284,236924,108,130557,238602,10042,237169,240975,238602,237893,237709,26549,238463,237536],"total_duration":9540299900,"load_duration":4356764800,"prompt_eval_count":19,"prompt_eval_duration":28490800,"eval_count":366,"eval_duration":4822382400}
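The long run of integers under the context key holds the token IDs of the prompt plus the model's answer. It acts as a memory card for the conversation history: to continue the conversation, send this array back unchanged in the "context" field of the next request's payload, and Ollama reads the IDs to recover what was just said, which gives it memory of the preceding context.
The /api/chat endpoint handles conversations. Its biggest difference from /api/generate is that you do not manage the integer context (token IDs) yourself; instead you send the chat history directly as a messages list, with three roles (user, system, assistant) marking who said what. The message format follows the same role/content convention as the OpenAI chat API.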
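The round trip of the context array can be sketched in a few lines. The example assumes the previous /api/generate reply has already been parsed into a dictionary; the sample reply is a shortened stand-in (real context arrays are far longer), the follow-up prompt is made up for illustration, and follow_up_payload is a hypothetical helper.

```python
import json


def follow_up_payload(previous_reply, prompt):
    """Build the next /api/generate body, echoing back the previous context."""
    return {
        "model": previous_reply["model"],
        "prompt": prompt,
        # Sent back unchanged so the server can recover the earlier exchange.
        "context": previous_reply.get("context", []),
        "stream": False,
    }


# Shortened stand-in for a real reply.
previous = json.loads(
    '{"model":"gemma4:e4b","response":"...","context":[2,105,9731,107],"done":true}'
)
payload = follow_up_payload(previous, "請再說一次你的名字")
print(json.dumps(payload, ensure_ascii=False))
```

The printed JSON is what would go after -d in the next curl.exe call to /api/generate.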
The opening turn of the conversation:
curl.exe http://localhost:11434/api/chat -d '{
\"model\": \"gemma4:e4b\",
\"messages\": [{\"role\":
\"user\",
\"content\": \"我有兩隻貓咪, 咪咪與萬萬。\"}],
\"stream\": false
}'
PS C:\Users\USER> curl.exe http://localhost:11434/api/chat -d '{
>> \"model\": \"gemma4:e4b\",
>> \"messages\": [{\"role\":
>> \"user\",
>> \"content\": \"我有兩隻貓咪, 咪咪與萬萬。\"}],
>> \"stream\": false
>> }'
{"model":"gemma4:e4b","created_at":"2026-05-28T15:56:58.0837637Z","message":{"role":"assistant","content":"哇,兩隻貓咪!🐈🐈 🐱\n\n咪咪和萬萬,名字很可愛耶!牠們的個性怎麼樣呢?是活潑好動的類型,還是比較溫柔黏人的?\n\n歡迎多向我分享關於牠們的故事,我很喜歡聽貓咪的故事!🐈❤️","thinking":"Thinking Process:\n\n1. **Analyze the user's input:** The user provided a statement: \"我有兩隻貓咪, 咪咪與萬萬。\" (I have two cats, Mimi and Wanwan.)\n2. **Determine the goal:** The user is sharing personal information. The appropriate response should be warm, engaging, and encourage further interaction, showing interest in the topic.\n3. **Identify key elements for the response:**\n * Acknowledge the information (two cats).\n * Use warm, positive language.\n * Ask open-ended questions related to cats (e.g., their personalities, breeds, feeding habits, or recent activities).\n\n4. **Drafting the response (in Chinese, matching the input language):**\n\n * *Greeting/Acknowledgement:* Oh, you have two cats! (哇,有兩隻貓咪!)\n * *Express interest:* That must be fun/lovely. (一定很可愛/很高興!)\n * *Engage (Question 1 - Personality):* What are they like? Are they playful? (牠們個性怎麼樣?是活潑的嗎?)\n * *Engage (Question 2 - Interaction/Detail):* Do they get along well? (牠們相處融洽嗎?)\n\n5. **Review and refine (Tone check):** The tone is friendly, affectionate, and curious. (This is a good response.)"},"done":true,"done_reason":"stop","total_duration":11303125100,"load_duration":5890298300,"prompt_eval_count":29,"prompt_eval_duration":30451400,"eval_count":383,"eval_duration":5043021600}
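Next comes the second turn. Put the first turn above into the messages list (the question with role user, the reply with role assistant), then append the second question (role=user):
我家的兩隻貓咪叫甚麼名字? ("What are the names of my two cats?")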
curl.exe http://localhost:11434/api/chat -d '{
\"model\": \"gemma4:e4b\",
\"messages\": [{\"role\":
\"user\",
\"content\": \"我有兩隻貓咪, 咪咪與萬萬。\"},
{\"role\": \"assistant\",
\"content\": \"哇,兩隻貓咪!🐈🐈 🐱\n\n咪咪和萬萬,名字很可愛耶!牠們的個性怎麼樣呢?是活潑好動的類型,還是比較溫柔黏人的?\n\n歡迎多向我分享關於牠們的故事,我很喜歡聽貓咪的故事!🐈❤️\"},
{\"role\": \"user\",
\"content\": \"我家的兩隻貓咪叫甚麼名字?\"}],
\"stream\": false}'
PS C:\Users\USER> curl.exe http://localhost:11434/api/chat -d '{
>> \"model\": \"gemma4:e4b\",
>> \"messages\": [{\"role\":
>> \"user\",
>> \"content\": \"我有兩隻貓咪, 咪咪與萬萬。\"},
>> {\"role\": \"assistant\",
>> \"content\": \"哇,兩隻貓咪!🐈🐈 🐱\n\n咪咪和萬萬,名字很可愛耶!牠們的個性怎麼樣呢?是活潑好動的類型,還是比較溫柔黏人的?\n\n歡迎多向我分享關於牠們的故事,我很喜歡聽貓咪的故事!🐈❤️\"},
>> {\"role\": \"user\",
>> \"content\": \"我家的兩隻貓咪叫甚麼名字?\"}],
>> \"stream\": false}'
{"model":"gemma4:e4b","created_at":"2026-05-28T16:03:10.0080872Z","message":{"role":"assistant","content":"您在最一開始就告訴我了呀!🐱🐱\n\n牠們叫做:\n\n**咪咪 (Mimi)** 和 **萬萬 (Wanwan)** 💖\n\n這麼可愛的名字!😊","thinking":"Thinking Process:\n\n1. **Analyze the user's input:** The user asked: \"我家的兩隻貓咪叫甚麼名字?\" (What are the names of my two cats?).\n2. **Review the previous context:** The user stated earlier: \"我有兩隻貓咪, 咪咪與萬萬。\" (I have two cats, Mimi and Wanwan).\n3. **Identify the contradiction/question:** The user is asking for the names, but they already provided them in the very first turn of the conversation.\n4. **Determine the appropriate response:**\n * Politely point out that the names were mentioned before.\n * Restate the names to confirm the information.\n * Maintain a friendly, conversational, and engaging tone.\n5. **Draft the response (Internal Monologue Check):**\n * *Option A (Too blunt):* You already told me. They are Mimi and Wanwan.\n * *Option B (Ideal - Friendly and Confirming):* 您剛才告訴我了!牠們叫做咪咪和萬萬。\n6. **Final Polish (Chinese):** (Drafting the final empathetic and helpful response.)"},"done":true,"done_reason":"stop","total_duration":8877803600,"load_duration":4535185600,"prompt_eval_count":115,"prompt_eval_duration":57920400,"eval_count":305,"eval_duration":4005207200}
The model gave the correct answer.
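Assembling the messages list by hand becomes tedious as the conversation grows. The bookkeeping done manually in the second turn can be sketched as follows; the reply dictionary is a shortened stand-in for a parsed /api/chat response, and next_turn is a hypothetical helper.

```python
def next_turn(messages, reply, user_text):
    """Return a new list: history + assistant reply + the next user question."""
    assistant = {
        "role": "assistant",
        # Only the content is carried forward; the "thinking" field is
        # left out, as in the hand-written second request above.
        "content": reply["message"]["content"],
    }
    return messages + [assistant, {"role": "user", "content": user_text}]


history = [{"role": "user", "content": "我有兩隻貓咪, 咪咪與萬萬。"}]
# Shortened stand-in for the first /api/chat reply.
reply = {"message": {"role": "assistant", "content": "哇,兩隻貓咪!", "thinking": "..."}}

history = next_turn(history, reply, "我家的兩隻貓咪叫甚麼名字?")
payload = {"model": "gemma4:e4b", "messages": history, "stream": False}
print([m["role"] for m in payload["messages"]])
```

Each later turn repeats the same step, so the history alternates between user and assistant entries and always ends with the newest user question.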