Tutorial
Mix.install(
[
# {:instructor, ">= 0.0.2"}
{:instructor, path: "/Users/thomas/code/instructor_ex"}
],
config: [
instructor: [
adapter: Instructor.Adapters.OpenAI
],
openai: [
api_key: System.fetch_env!("LB_OPENAI_API_KEY"),
http_options: [recv_timeout: 10 * 60 * 1000]
]
]
)
Introduction
Instructor is a library for structured prompting with OpenAI and open source LLMs. While the idea is pretty simple, through this and the other examples you'll realize how powerful a concept this is.
So first off, what is structured prompting?
What if the LLM returned data conforming to a complicated nested schema that your code knows how to work with? Well, that's structured prompting. It's a way of coercing the LLM into producing its response in a known format that your downstream code can handle. In the case of Instructor, we use Ecto to provide those schemas. Good old Ecto, something you're already familiar with.
So, without further ado, let's define a schema and take it for a spin!
defmodule Politician do
use Ecto.Schema
use Instructor.Validator
@doc """
A description of United States Politicians and the offices that they held.
## Fields:
- first_name: Their first name
- last_name: Their last name
- offices_held:
- office: The branch and position in government they served in
- from_date: When they entered office or null
- to_date: The date they left office or null
"""
@primary_key false
embedded_schema do
field(:first_name, :string)
field(:last_name, :string)
embeds_many :offices_held, Office, primary_key: false do
field(:office, Ecto.Enum,
values: [:president, :vice_president, :governor, :congress, :senate]
)
field(:from_date, :date)
field(:to_date, :date)
end
end
end
{:module, Politician, <<70, 79, 82, 49, 0, 0, 17, ...>>,
[__schema__: 1, __schema__: 1, __schema__: 1, __schema__: 1, __schema__: 2, __schema__: 2, ...]}
Great, we have our schema describing politicians and the offices they held. Let's notice a few things that may stand out from regular Ecto usage. First, since there is no database backing the schema, it doesn't make sense to give it a primary_key. This also makes sense because there is no sensible value for the LLM to respond with.
Also, we use a @doc on the schema. This isn't just for documentation purposes of the tutorial. Instructor will take any @doc tag and provide it to the LLM. Generally you'll want to use this to provide semantic descriptions of the fields and general context to the LLM to ensure you get the outputs you want. In our case we want to push the LLM to understand that we are only considering American politicians.
So, let's try asking the LLM to give us some politicians.
Instructor.chat_completion(
model: "gpt-3.5-turbo",
response_model: Politician,
messages: [
%{
role: "user",
content:
"Who won the American 2020 election and what offices have they held over their career?"
}
]
)
{:ok,
%Politician{
first_name: "Joe",
last_name: "Biden",
offices_held: [%Politician.Office{office: :president, from_date: ~D[2021-01-20], to_date: nil}]
}}
Amazing, right? Using nothing more than one of the top libraries in Elixir, Ecto, we were able to get structured output from our LLM. The data returned is ready to be processed by our regular Elixir code. Instructor supports all field types that you can express in Ecto, including embedded and associated schemas.
It's almost as if the LLM had submitted the data through a Phoenix form. All the utilities you use to process that kind of data can also be used to process the outputs of Instructor.
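For example, the struct that comes back can be pattern matched and passed around like any other Elixir data. Here's a quick sketch of what that might look like; the prompt and variable names are purely illustrative and assume the call succeeds.
{:ok, %Politician{} = politician} =
  Instructor.chat_completion(
    model: "gpt-3.5-turbo",
    response_model: Politician,
    messages: [
      %{role: "user", content: "Who was the 16th president of the United States?"}
    ]
  )

# Plain Enum calls and string interpolation over the nested structs.
politician.offices_held
|> Enum.map(fn office ->
  "#{politician.first_name} #{politician.last_name} served as #{office.office}"
end)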
One of the superpowers of this is that since we're just using changesets under the hood, you can use the same validations that you would use elsewhere in your app. Let's look at that in the next section.
Validations
Instructor provides a lightweight behavior where you can define a callback function that we will call to validate the data returned by the LLM using Ecto changesets. There is nothing fancy to this API. It's just a changeset in and a changeset out.
defmodule NumberSeries do
use Ecto.Schema
use Instructor.Validator
@primary_key false
embedded_schema do
field(:series, {:array, :integer})
end
@impl true
def validate_changeset(changeset) do
changeset
|> Ecto.Changeset.validate_length(:series, min: 10)
|> Ecto.Changeset.validate_change(:series, fn
field, values ->
if Enum.sum(values) |> rem(2) == 0 do
[]
else
[{field, "The sum of the series must be even"}]
end
end)
end
end
{:module, NumberSeries, <<70, 79, 82, 49, 0, 0, 18, ...>>, {:validate_changeset, 1}}
In this admittedly contrived example, we're going to get the LLM to return a series of numbers and validate that the series has at least 10 numbers and that its sum is even.
When we ask for fewer than ten numbers, Instructor will return an error tuple with a changeset that is invalid.
{:error, changeset} =
Instructor.chat_completion(
model: "gpt-3.5-turbo",
response_model: NumberSeries,
messages: [
%{role: "user", content: "Give me the first 5 integers"}
]
)
# Render the errors down to strings.
errors =
Ecto.Changeset.traverse_errors(changeset, fn {msg, opts} ->
Regex.replace(~r"%{(\w+)}", msg, fn _, key ->
opts |> Keyword.get(String.to_existing_atom(key), key) |> to_string()
end)
end)
{changeset.changes, errors}
{%{series: [1, 2, 3, 4, 5]},
%{series: ["The sum of the series must be even", "should have at least 10 item(s)"]}}
Now the beauty of this is that since we have human-readable errors from our validations, we can just turn around and pass those back into the LLM to get it to fix its own errors.
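To make that concrete, here's a rough sketch of what that feedback loop could look like if you wired it up by hand, reusing the changeset and errors from the previous cell. The prompt wording is just an illustration of the idea, not what Instructor sends internally.
# Hand-rolled feedback: show the LLM its previous answer and the
# validation errors, then ask it to try again. (Illustrative only.)
feedback = """
Your previous answer #{inspect(changeset.changes)} failed validation:
#{inspect(errors)}
Please try again and fix these errors.
"""

Instructor.chat_completion(
  model: "gpt-3.5-turbo",
  response_model: NumberSeries,
  messages: [
    %{role: "user", content: "Give me the first 5 integers"},
    %{role: "user", content: feedback}
  ]
)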
In practice you don't have to write that loop yourself. Instructor provides a convenience parameter, max_retries, in the initial call, which will retry against the validations up to n times.
Instructor.chat_completion(
model: "gpt-3.5-turbo",
response_model: NumberSeries,
max_retries: 10,
messages: [
%{role: "user", content: "Give some random integers"}
]
)
23:59:13.641 [debug] Retrying LLM call for Elixir.NumberSeries...
{:ok, %NumberSeries{series: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]}}