Ollama (Ollama v0.5.0)
Ollama is a nifty little tool for running large language models locally, and this is a nifty little library for working with Ollama in Elixir.
- API client fully implementing the Ollama API.
- Stream API responses to any Elixir process.
Installation
The package can be installed by adding ollama to your list of dependencies in mix.exs:
def deps do
[
{:ollama, "0.5.0"}
]
end
Quickstart
API change
The last two minor versions have introduced breaking API changes. We'll stop doing this at version 1.0.0 - promise 🙏🏻.
0.5.0
- Streaming requests no longer return a Task.t/0; they return an Ollama.Streaming.t/0 struct. Refer to the section on Streaming.
0.4.0
- The Ollama.API module has been deprecated in favour of the top-level Ollama module. Ollama.API will be removed in version 1.
Assuming you have Ollama running on localhost and have installed a model, use completion/2 or chat/2 to interact with the model.
1. Generate a completion
iex> client = Ollama.init()
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> ])
{:ok, %{"response" => "The sky is blue because it is the color of the sky.", ...}}
2. Generate the next message in a chat
iex> client = Ollama.init()
iex> messages = [
...> %{role: "system", content: "You are a helpful assistant."},
...> %{role: "user", content: "Why is the sky blue?"},
...> %{role: "assistant", content: "Due to rayleigh scattering."},
...> %{role: "user", content: "How is that different than mie scattering?"},
...> ]
iex> Ollama.chat(client, [
...> model: "llama2",
...> messages: messages,
...> ])
{:ok, %{"message" => %{
"role" => "assistant",
"content" => "Mie scattering affects all wavelengths similarly, while Rayleigh favors shorter ones."
}, ...}}
Streaming
By default, all endpoints are called with streaming disabled, blocking until the HTTP request completes and the response body is returned. For endpoints where streaming is supported, the :stream option can be set to true, and the function returns an Ollama.Streaming.t/0 struct.
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
iex> Ollama.chat(client, [
...> model: "llama2",
...> messages: messages,
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
Ollama.Streaming implements the Enumerable protocol, so it can be used directly with Stream functions. Most of the time, you'll just want to asynchronously call Ollama.Streaming.send_to/2, which will run the stream and send each message to a process of your choosing.
Messages are sent in the following format, allowing the receiving process to pattern match against the reference/0 of the streaming request:
{request_ref, {:data, data}}
Each data chunk is a map. For its schema, refer to the Ollama API docs.
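For example, a minimal sketch of consuming a streaming completion directly with Stream functions, assuming enumerating the struct yields the decoded data chunks and that completion chunks carry a "response" key (per the Ollama API docs):
client = Ollama.init()

{:ok, stream} = Ollama.completion(client, [
  model: "llama2",
  prompt: "Why is the sky blue?",
  stream: true
])

# Print each partial response as it arrives, then block until the stream finishes.
stream
|> Stream.each(fn chunk -> IO.write(chunk["response"]) end)
|> Stream.run()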
A typical example is to make a streaming request as part of a LiveView event, and send each of the streaming messages back to the same LiveView process.
defmodule MyApp.ChatLive do
  use Phoenix.LiveView
  alias Ollama.Streaming

  # When the client invokes the "prompt" event, create a streaming request and
  # asynchronously send messages back to self.
  def handle_event("prompt", %{"message" => prompt}, socket) do
    {:ok, streamer} = Ollama.completion(Ollama.init(), [
      model: "llama2",
      prompt: prompt,
      stream: true
    ])
    pid = self()

    {:noreply,
      socket
      |> assign(current_request: streamer.ref)
      |> start_async(:streaming, fn -> Streaming.send_to(streamer, pid) end)
    }
  end

  # The streaming request sends messages back to the LiveView process.
  def handle_info({_request_ref, {:data, _data}} = message, socket) do
    ref = socket.assigns.current_request

    case message do
      {^ref, {:data, %{"done" => false} = _data}} ->
        # handle each streaming chunk
        {:noreply, socket}

      {^ref, {:data, %{"done" => true} = _data}} ->
        # handle the final streaming chunk
        {:noreply, socket}

      {_ref, _data} ->
        # this message was not expected!
        {:noreply, socket}
    end
  end

  # When the streaming request is finished, remove the current reference.
  def handle_async(:streaming, {:ok, _result}, socket) do
    {:noreply, assign(socket, current_request: nil)}
  end
end
Summary
Functions
Generates the next message in a chat using the specified model. Optionally streamable.
Checks whether a blob exists in Ollama by its digest or binary data.
Generates a completion for the given prompt using the specified model. Optionally streamable.
Creates a model with another name from an existing model.
Creates a blob from its binary data.
Creates a model using the given name and model file. Optionally streamable.
Deletes a model and its data.
Generate embeddings from a model for the given prompt.
Creates a new Ollama API client. Accepts either a base URL for the Ollama API, a keyword list of options passed to Req.new/1, or an existing Req.Request.t/0 struct.
Lists all models that Ollama has available.
Downloads a model from the ollama library. Optionally streamable.
Uploads a model to a model library. Requires registering for ollama.ai and adding a public key first. Optionally streamable.
Shows all information for a specific model.
Types
@type client() :: %Ollama{req: Req.Request.t()}
Client struct
@type message() :: map()
Chat message
A chat message is a map/0 with the following fields:
- :role (String.t/0) - Required. The role of the message, either system, user or assistant.
- :content (String.t/0) - Required. The content of the message.
- :images (list of String.t/0) - (optional) List of Base64 encoded images (for multimodal models only).
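For illustration, a minimal sketch of building a multimodal chat message; the image path is hypothetical, and the model must support images (e.g. a llava model):
# Base64-encode a local image (hypothetical path) to attach to the message.
image = Base.encode64(File.read!("path/to/photo.jpg"))

message = %{
  role: "user",
  content: "What is in this picture?",
  images: [image]
}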
@type response() :: {:ok, map() | boolean() | Ollama.Streaming.t()} | {:error, term()}
Client response
Functions
Generates the next message in a chat using the specified model. Optionally streamable.
Options
- :model (String.t/0) - Required. The ollama model name.
- :messages (list of map/0) - Required. List of messages - used to keep a chat memory.
- :template (String.t/0) - Prompt template, overriding the model default.
- :format (String.t/0) - Set the expected format of the response (json).
- :stream (boolean/0) - See section on streaming. The default value is false.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Message structure
Each message is a map with the following fields:
- :role (String.t/0) - Required. The role of the message, either system, user or assistant.
- :content (String.t/0) - Required. The content of the message.
- :images (list of String.t/0) - (optional) List of Base64 encoded images (for multimodal models only).
Examples
iex> messages = [
...> %{role: "system", content: "You are a helpful assistant."},
...> %{role: "user", content: "Why is the sky blue?"},
...> %{role: "assistant", content: "Due to rayleigh scattering."},
...> %{role: "user", content: "How is that different than mie scattering?"},
...> ]
iex> Ollama.chat(client, [
...> model: "llama2",
...> messages: messages,
...> ])
{:ok, %{"message" => %{
"role" => "assistant",
"content" => "Mie scattering affects all wavelengths similarly, while Rayleigh favors shorter ones."
}, ...}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.chat(client, [
...> model: "llama2",
...> messages: messages,
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
@spec check_blob(client(), Ollama.Blob.digest() | binary()) :: response()
Checks whether a blob exists in Ollama by its digest or binary data.
Examples
iex> Ollama.check_blob(client, "sha256:fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e")
{:ok, true}
iex> Ollama.check_blob(client, "this should not exist")
{:ok, false}
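As a sketch, a digest string in the format shown above can be derived from the raw blob data using Erlang's :crypto module (the file path is hypothetical):
data = File.read!("path/to/model.gguf")

# Build a digest in the "sha256:<hex>" format shown in the example above.
digest = "sha256:" <> Base.encode16(:crypto.hash(:sha256, data), case: :lower)

{:ok, _exists?} = Ollama.check_blob(client, digest)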
Generates a completion for the given prompt using the specified model. Optionally streamable.
Options
- :model (String.t/0) - Required. The ollama model name.
- :prompt (String.t/0) - Required. Prompt to generate a response for.
- :images (list of String.t/0) - A list of Base64 encoded images to be included with the prompt (for multimodal models only).
- :system (String.t/0) - System prompt, overriding the model default.
- :template (String.t/0) - Prompt template, overriding the model default.
- :context - The context parameter returned from a previous completion/2 call (enabling short conversational memory; see the sketch at the end of the Examples below).
- :format (String.t/0) - Set the expected format of the response (json).
- :stream (boolean/0) - See section on streaming. The default value is false.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Examples
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> ])
{:ok, %{"response": "The sky is blue because it is the color of the sky.", ...}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
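The sketch below chains two completions through the :context option, assuming the non-streaming response map includes a "context" key as described in the Ollama API docs:
{:ok, first} = Ollama.completion(client, [
  model: "llama2",
  prompt: "Why is the sky blue?"
])

# Feed the returned context back in to keep a short conversational memory.
{:ok, _followup} = Ollama.completion(client, [
  model: "llama2",
  prompt: "And why does it turn red at sunset?",
  context: first["context"]
])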
Creates a model with another name from an existing model.
Options
- :source (String.t/0) - Required. Name of the model to copy from.
- :destination (String.t/0) - Required. Name of the model to copy to.
Example
iex> Ollama.copy_model(client, [
...> source: "llama2",
...> destination: "llama2-backup"
...> ])
{:ok, true}
Creates a blob from its binary data.
Example
iex> Ollama.create_blob(client, data)
{:ok, true}
Creates a model using the given name and model file. Optionally streamable.
Any dependent blobs referenced in the modelfile, such as FROM and ADAPTER instructions, must exist first. See check_blob/2 and create_blob/2.
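As a sketch, a dependent blob could be ensured to exist before creating the model; the adapter path is hypothetical, and both functions accept the raw binary data:
blob = File.read!("path/to/adapter.bin")

# Upload the blob only if Ollama doesn't already have it.
with {:ok, false} <- Ollama.check_blob(client, blob) do
  {:ok, true} = Ollama.create_blob(client, blob)
end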
Options
- :name (String.t/0) - Required. Name of the model to create.
- :modelfile (String.t/0) - Required. Contents of the Modelfile.
- :stream (boolean/0) - See section on streaming. The default value is false.
Example
iex> modelfile = "FROM llama2\nSYSTEM \"You are mario from Super Mario Bros.\""
iex> Ollama.create_model(client, [
...> name: "mario",
...> modelfile: modelfile,
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
Deletes a model and its data.
Options
- :name (String.t/0) - Required. Name of the model to delete.
Example
iex> Ollama.delete_model(client, name: "llama2")
{:ok, true}
Generate embeddings from a model for the given prompt.
Options
- :model (String.t/0) - Required. The name of the model used to generate the embeddings.
- :prompt (String.t/0) - Required. The prompt used to generate the embedding.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Example
iex> Ollama.embeddings(client, [
...> model: "llama2",
...> prompt: "Here is an article about llamas..."
...> ])
{:ok, %{"embedding" => [
0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
]}}
@spec init(Req.url() | keyword() | Req.Request.t()) :: client()
Creates a new Ollama API client. Accepts either a base URL for the Ollama API, a keyword list of options passed to Req.new/1, or an existing Req.Request.t/0 struct.
If no arguments are given, the client is initiated with the default options:
@default_req_opts [
base_url: "http://localhost:11434/api",
receive_timeout: 60_000,
]
Examples
iex> client = Ollama.init("https://ollama.service.ai:11434/api")
%Ollama{}
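The other accepted argument forms, sketched below; the keyword list is handed to Req.new/1, so standard Req options such as :receive_timeout should work:
# Keyword list of options passed to Req.new/1.
client = Ollama.init(base_url: "http://localhost:11434/api", receive_timeout: 120_000)

# Or wrap an existing Req.Request.t/0 struct.
req = Req.new(base_url: "http://localhost:11434/api")
client = Ollama.init(req)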
Lists all models that Ollama has available.
Example
iex> Ollama.list_models(client)
{:ok, %{"models" => [
%{"name" => "codellama:13b", ...},
%{"name" => "llama2:latest", ...},
]}}
Downloads a model from the ollama library. Optionally streamable.
Options
- :name (String.t/0) - Required. Name of the model to pull.
- :stream (boolean/0) - See section on streaming. The default value is false.
Example
iex> Ollama.pull_model(client, name: "llama2")
{:ok, %{"status" => "success"}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.pull_model(client, name: "llama2", stream: true)
{:ok, %Ollama.Streaming{}}
Uploads a model to a model library. Requires registering for ollama.ai and adding a public key first. Optionally streamable.
Options
- :name (String.t/0) - Required. Name of the model to push.
- :stream (boolean/0) - See section on streaming. The default value is false.
Example
iex> Ollama.push_model(client, name: "mattw/pygmalion:latest")
{:ok, %{"status" => "success"}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.push_model(client, name: "mattw/pygmalion:latest", stream: true)
{:ok, %Ollama.Streaming{}}
Shows all information for a specific model.
Options
- :name (String.t/0) - Required. Name of the model to show.
Example
iex> Ollama.show_model(client, name: "llama2")
{:ok, %{
"details" => %{
"families" => ["llama", "clip"],
"family" => "llama",
"format" => "gguf",
"parameter_size" => "7B",
"quantization_level" => "Q4_0"
},
"modelfile" => "...",
"parameters" => "...",
"template" => "..."
}}