Telemetry Data
Mix.install([
  {:guesswork, "~> 0.7"},
  {:kino, "~> 0.13"},
  {:kino_explorer, "~> 0.1.23"}
])
Collecting Telemetry Data
import Guesswork.Ast
alias Guesswork.Ast.And
alias Guesswork.Ast.Assign
alias Guesswork.Ast.OneOf
alias Guesswork.Answer.Result
alias Guesswork.Query
alias Guesswork.Telemetry
alias Guesswork.Telemetry.QuerySpan
alias Guesswork.Telemetry.QueryRun
require Explorer.DataFrame, as: DF
Guesswork provides data about your queries via :telemetry events and spans so that you can figure out what is slowing a query down. If you would like to hook into the events yourself, take a look at the Guesswork.Telemetry.EventHandler module, but a good place to start is just by looking at the data collected by Guesswork.Telemetry.
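If you do want to consume the raw events yourself, the usual :telemetry.attach/4 pattern applies. Below is a minimal sketch; the event name is only a placeholder, so check Guesswork.Telemetry.EventHandler for the events Guesswork actually emits.
# Minimal sketch of a hand-rolled handler. The event name below is a
# placeholder, not necessarily one Guesswork emits; see
# Guesswork.Telemetry.EventHandler for the real event names.
:telemetry.attach(
  "inspect-guesswork-events",
  [:guesswork, :query, :stop],
  fn _event, measurements, metadata, _config ->
    IO.inspect({measurements, metadata}, label: "guesswork event")
  end,
  nil
)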
Start by starting the telemetry GenServer.
{:ok, pid} = Telemetry.start_link()
Then, we'll rerun the Pythagorean triples queries and see what data we've collected.
You'll notice that these queries take noticeably longer. Running queries with telemetry is expensive (you'll see why below), and it should mostly be used for debugging.
query =
  Query.new(
    term(
      And.new([
        OneOf.new(a, Stream.iterate(1, &(&1 + 1))),
        OneOf.new(b, Stream.iterate(1, &(&1 + 1))),
        OneOf.new(c, Stream.iterate(1, &(&1 + 1))),
        is(true, fn a, b -> a < b end),
        is(0, fn a, b, c -> Integer.pow(c, 2) - Integer.pow(a, 2) - Integer.pow(b, 2) end)
      ])
    )
  )
Result.run(query, 10)
Result.run(query, 100)
To pull the data we use the list_query_runs/3 function.
The first things to note are the ids: query_id and run_id. query_id is assigned when the query is created by Guesswork.Query.new/2 and is propagated to all the statements within it. As a result of this implementation, all metrics and spans are for the query, not the run; each run gets a unique run_id per entire result set.
Note here that we're using an Explorer.DataFrame for all of our analysis (it has great integration with Livebook).
runs =
  Telemetry.list_query_runs(pid).values
  |> Enum.map(&Map.from_struct/1)
  |> DF.new()
Once we have the query_id (this also could have been pulled from the query struct), we pull the metrics for both runs. The metrics show about 600, and over 6.2 million tests run. This makes a lot of sense given that the above query is so brute force and must run so many tests to remove possible combinations of a, b, and c.
query_id = runs["query_id"][0]
Telemetry.get_query_metrics(pid, query_id)
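As an aside, since the id is stamped onto the query struct at creation time, the same value could have been pattern matched straight off of query, which is the same trick the later cells in this notebook use.
# The id on the struct matches the query_id column in the runs table.
%Query{id: id_from_struct} = query
id_from_struct == query_id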
The last kind of telemetry data Guesswork tracks is spans; currently just spans on calculated unions (Guesswork.Ast.And). This is the primary action taken, because it combines possible values.
You should note that most of these spans report no results (unions_calculated). This is in line with the number of tests we have to run to get those 110 results.
Telemetry.list_query_spans(pid, query_id, page_size: 1000).values
|> Enum.map(&Map.from_struct/1)
|> Enum.map(&Map.delete(&1, :ctx))
|> Enum.map(fn row -> Map.update(row, :type, "N/A", &Atom.to_string/1) end)
|> DF.new()
|> DF.unnest(:metadata)
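To put a rough number on how many of those spans came up empty, we can count the spans whose metadata reports zero unions. This is only a sketch: it assumes each span's metadata map carries a numeric unions_calculated key, per the field named above.
# Rough count of spans that produced no unions at all (assumes the span
# metadata includes a numeric :unions_calculated key).
Telemetry.list_query_spans(pid, query_id, page_size: 1000).values
|> Enum.count(fn span -> Map.get(span.metadata, :unions_calculated) == 0 end)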
Improving Queries
Now that we're collecting this data, let's see if it's possible to improve the performance of our query.
You'll note that the first test (is(true, fn a, b -> a < b end)) exists just to remove duplicate entries and doesn't remove all the possible options we could get with simple comparisons.
So, let's try a query that does.
Note that we'll continue to run the query twice to keep some parity.
%Query{id: query_id} = query =
  Query.new(
    term(
      And.new([
        OneOf.new(a, Stream.iterate(1, &(&1 + 1))),
        OneOf.new(b, Stream.iterate(1, &(&1 + 1))),
        OneOf.new(c, Stream.iterate(1, &(&1 + 1))),
        is(true, fn a, b, c -> a < b and c > a and c > b end),
        is(0, fn a, b, c -> Integer.pow(c, 2) - Integer.pow(a, 2) - Integer.pow(b, 2) end)
      ])
    )
  )
Result.run(query, 10)
Result.run(query, 100)
This is worse, much worse. All the times are below.
Telemetry.list_query_runs(pid).values
|> Enum.map(&Map.from_struct/1)
|> DF.new()
To try to understand why, let's take a look at the metrics for our new query. You should notice that it took the exact same number of assignments (we just have to try that many options to find our answer set), but way more tests!
This is a side effect of how Guesswork.Ast.And operates. As it calculates unions (combining possible answer sets), it removes invalid answer sets at each iteration.
You should also note that unions grow exponentially, and that our new test requires three variables to be assigned instead of two. This means the test must be run later, and by the time it can run, there are just more answer sets to filter out!
(It is worth pointing out that the tests themselves are run less often, but that more computation is still happening here.)
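As a rough back-of-the-envelope illustration in plain Elixir (not Guesswork's actual algorithm): with n candidate values per variable, a two-variable test can prune pairs of a and b before c is combined in, while a three-variable test can only run once the full a/b/c product has been built.
# Illustrative sketch only. n = 30 is an arbitrary candidate count per variable.
n = 30

# With a two-variable test, invalid {a, b} pairs are pruned early...
ab_pairs_after_two_var_test =
  for a <- 1..n, b <- 1..n, a < b, do: {a, b}

# ...so the later union with `c` only has this many answer sets to build and test.
with_two_var_test = length(ab_pairs_after_two_var_test) * n

# With a three-variable test, nothing is pruned until a, b, and c are all
# assigned, so every combination has to be built first.
with_three_var_test = n * n * n

{with_two_var_test, with_three_var_test}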
Telemetry.get_query_metrics(pid, query_id)
Using this information, we can guess that it's better to split that is statement up.
%Query{id: query_id} = query =
  Query.new(
    term(
      And.new([
        OneOf.new(a, Stream.iterate(1, &(&1 + 1))),
        OneOf.new(b, Stream.iterate(1, &(&1 + 1))),
        OneOf.new(c, Stream.iterate(1, &(&1 + 1))),
        is(true, fn a, b -> a < b end),
        is(true, fn b, c -> c > b end),
        is(0, fn a, b, c -> Integer.pow(c, 2) - Integer.pow(a, 2) - Integer.pow(b, 2) end)
      ])
    )
  )
Result.run(query, 10)
Result.run(query, 100)
This did much better. We can see that it takes a little under 70% of the time of our first query, and it is almost twice as fast as our second query!
Telemetry.list_query_runs(pid).values
|> Enum.map(&Map.from_struct/1)
|> DF.new()
Finally, we can confirm that, indeed, this query ran many fewer tests.
Telemetry.get_query_metrics(pid, query_id)