Version: 1.1.3
Authors: Maxim Fedorov (maximfca@gmail.com).
Erlang Performance & Benchmarking Suite: a simple way to say "this code is faster than that one".
Build:

```
$ rebar3 as prod escriptize
```
Find out how many times per second a function can be run (beware of shell escaping your code!):
```
$ ./erlperf 'timer:sleep(1).'
Code               ||        QPS     Rel
timer:sleep(1).     1        500    100%
```
Run erlperf with two concurrent samples of code:

```
$ ./erlperf 'rand:uniform().' 'crypto:strong_rand_bytes(2).' --samples 10 --warmup 1
Code                             Concurrency   Throughput   Relative
rand:uniform().                            1      4303 Ki       100%
crypto:strong_rand_bytes(2).               1      1485 Ki        35%
```
Or just measure how concurrent your code is (the example below shows saturation with only 1 process):

```
$ ./erlperf 'code:is_loaded(local_udp).' --init 'code:ensure_loaded(local_udp).' --squeeze
Code                            ||        QPS
code:is_loaded(local_udp).       1     614 Ki
```
If you need some initialisation done before running the test, and cleanup afterwards:
```
$ ./erlperf 'pg:join(scope, self()), pg:leave(scope, self()).' --init 'pg:start_link(scope).' --done 'gen_server:stop(scope).'
Code                                                ||        QPS     Rel
pg:join(scope, self()), pg:leave(scope, self()).     1     287 Ki    100%
```
Run two versions of code against each other, with some init prior:
```
$ ./erlperf 'runner(X) -> timer:sleep(X).' --init '1.' 'runner(Y) -> timer:sleep(Y).' --init '2.'
```
Determine how well pg2 (removed in OTP 24) handles concurrent group modifications when there are no nodes in the cluster:

```
$ ./erlperf 'runner(Arg) -> ok = pg2:join(Arg, self()), ok = pg2:leave(Arg, self()).' --init_runner 'G = {foo, rand:uniform(10000)}, pg2:create(G), G.' -q
Code                                                              ||      QPS
runner(Arg) -> ok = pg2:join(Arg, self()), ok = pg2:leave(Arg,    13    76501
```
Watch the progress of your test run (with the -v option):

```
$ ./erlperf 'rand:uniform().' -q -v
YYYY-MM-DDTHH:MM:SS-oo:oo  Sched   DCPU   DIO  Procs  Ports  ETS  Mem Total   Mem Proc  Mem Bin  Mem ETS  <0.92.0>
2019-06-21T13:25:21-07:00  11.98   0.00  0.47     52      3   20   32451 Kb    4673 Kb   179 Kb   458 Kb   3761 Ki
2019-06-21T13:25:22-07:00  12.57   0.00  0.00     52      3   20   32702 Kb    4898 Kb   201 Kb   460 Kb   4343 Ki
<...>
2019-06-21T13:25:54-07:00 100.00   0.00  0.00     63      3   20   32839 Kb    4899 Kb   203 Kb   469 Kb  14296 Ki
2019-06-21T13:25:55-07:00 100.00   0.00  0.00     63      3   20   32814 Kb    4969 Kb   201 Kb   469 Kb  13810 Ki
2019-06-21T13:25:56-07:00 100.00   0.00  0.00     63      3   20   32809 Kb    4964 Kb   201 Kb   469 Kb  13074 Ki
Code               ||        QPS
rand:uniform().     8   14918 Ki
```
Command-line benchmarking does not save results anywhere. It is designed to provide a quick answer to the question "is this piece of code faster than that one?".
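The same measurement can also be taken programmatically; erlperf:run returns the computed QPS as a plain value, so results can be stored or compared in code. A minimal sketch, mirroring the first CLI example above:

```erlang
%% programmatic equivalent of `./erlperf 'timer:sleep(1).'`:
%% the QPS is returned as a number instead of being printed
QPS = erlperf:run(fun() -> timer:sleep(1) end).
```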
# Continuous benchmarking

Starting erlperf as an Erlang application enables continuous benchmarking. This is designed to help during development and testing, allowing you to notice performance regressions quickly.
Usage example (assuming you're running an OTP release, or rebar3 shell for your application):
```
$ rebar3 shell --sname mynode
```
```erlang
> application:start(erlperf).
ok.

> {ok, Logger} = ep_file_log:start(group_leader()).
{ok,<0.235.0>}

> {ok, JobPid} = ep_job:start(#{
    name => myname,
    init_runner => "myapp:generate_seed().",
    runner => "runner(Arg) -> Var = mymodule:call(Arg), mymodule:done(Var).",
    initial_concurrency => 1}).
{ok,<0.291.0>}

> ep_job:set_concurrency(JobPid, 2).
ok.

% watch your job performance (local node)

% modify your application code, do hot code reload
> c:l(mymodule).
{module, mymodule}.

% stop logging for the local node
> ep_file_log:stop(Logger).
ok

% stop the job
> ep_job:stop(JobPid).
ok.

% restart the job (code is saved for your convenience, and can be accessed by name),
% which makes it easy to restart the job if it crashed due to bad code being loaded
> ep_job:start(erlperf:load(myname)).
{ok,<0.1121.0>}

% when finished, shut everything down with a one-liner:
> application:stop(erlperf).
ok.
```
Benchmarking is done either with already compiled code or with free-form source code. The latter involves an extra step: the source is formed into an Erlang module, compiled, and loaded into the VM.
```erlang
%% MFA
erlperf:run({rand, uniform, [1000]}).

%% call chain: a list of complete MFA tuples, executed in sequence
erlperf:run([{rand, uniform, [10]}, {erlang, node, []}]).

%% fun
erlperf:run(fun() -> rand:uniform(100) end).

%% fun accepting the init_runner result as an argument
erlperf:run(fun(Init) -> io_lib:format("~tp", [Init]) end).

%% source code
erlperf:run("runner() -> rand:uniform(20).").
```

A call chain may contain recorded calls, see [recording call chain](#recording-call-chain).
Source code cannot be mixed with MFA tuples. A call chain may contain only complete MFA tuples and cannot be mixed with funs.
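To illustrate the rule, a short sketch; the commented-out call shows a mix that is not accepted:

```erlang
%% valid: a call chain made of complete MFA tuples only
erlperf:run([{rand, seed, [exsss]}, {rand, uniform, [100]}]).

%% not allowed: a fun mixed into a call chain
%% erlperf:run([{rand, uniform, [100]}, fun() -> node() end]).
```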
Startup and teardown:

```erlang
erlperf:run(#{
    runner => fun(Arg) -> rand:uniform(Arg) end,
    init => {pg, start_link, []},
    init_runner =>
        fun({ok, Pid}) ->
            {total_heap_size, THS} = erlang:process_info(Pid, total_heap_size),
            THS
        end,
    done => fun({ok, Pid}) -> gen_server:stop(Pid) end
}).
```

Same example with source code:
```erlang
erlperf:run(#{
    runner => "runner(Max) -> rand:uniform(Max).",
    init => "init() -> pg:start_link().",
    init_runner => "init_runner({ok, Pid}) ->
        {total_heap_size, THS} = erlang:process_info(Pid, total_heap_size),
        THS.",
    done => "done({ok, Pid}) -> gen_server:stop(Pid)."
}).
```
```erlang
erlperf:run({erlang, node, []}, #{concurrency => 2, samples => 10, warmup => 1}).
```
For the full list of available options, refer to the benchmark reference.
Sometimes it's necessary to measure code running in multiple concurrent processes, and find out when it saturates the node. This can be used to detect bottlenecks, e.g. lock contention or a single dispatcher process. Example (with maximum concurrency limited to 50):
```erlang
erlperf:run(Code, #{warmup => 1}, #{max => 50}).
```
All currently running benchmarking jobs can be accessed via the programmatic interface. It is possible to run a job continuously, to examine performance gains or losses while doing hot code reload.
A benchmark code database can be maintained for this purpose. Persisted code samples may not constitute repeatable tests, but they do allow re-running measurements in the future.
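As a sketch of that workflow, reusing the calls from the continuous benchmarking example above (and assuming a job was previously saved under the name myname):

```erlang
%% reload the persisted code sample and restart the job
{ok, Job} = ep_job:start(erlperf:load(myname)),
%% raise concurrency while observing the effect of hot code reloads
ok = ep_job:set_concurrency(Job, 4),
%% ... recompile and reload application modules here ...
ok = ep_job:stop(Job).
```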
Benchmarking overhead has not been measured yet. There may be a better way (using call time tracing instead of counter injections), to be determined later.
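Until then, a rough upper bound can be sketched by benchmarking a no-op runner: the inverse of its throughput approximates the per-call cost of the harness itself. This is an illustration, not a documented procedure:

```erlang
%% the QPS of an empty runner bounds the per-call overhead of the
%% benchmarking loop; the real cost of counter injections may differ
MaxQPS = erlperf:run("runner() -> ok.").
```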
Benchmarking only works with exported functions (unlike the ep_prof module, which can also record arguments of local function calls).
```erlang
> f(Trace), Trace = erlperf:record(pg2, '_', '_', 1000).
...

% for things working with ETS, isolation is recommended
> erlperf:run(Trace, #{isolation => #{}}).
...

% Trace can be saved to file before executing:
> erlperf:run(#{runner => Trace, name => "pg2"}).

% run the trace with
> erlperf:run(erlperf:load("pg2")).
```

It's possible to create a Common Test testcase using recorded samples. Just put the recorded file into xxx_SUITE_data:
```erlang
QPS = erlperf:run(erlperf:load(?config(data_dir, "pg2"))),
?assert(QPS > 500). % catches regression for QPS falling below 500
```
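For instance, a minimal suite wrapping the assertion above might look like this (the module name and data-file layout are illustrative, assuming erlperf:load/1 accepts a file path):

```erlang
-module(pg2_perf_SUITE).
-include_lib("common_test/include/ct.hrl").
-include_lib("stdlib/include/assert.hrl").
-export([all/0, pg2_qps/1]).

all() -> [pg2_qps].

pg2_qps(Config) ->
    %% the recorded sample is stored as pg2_perf_SUITE_data/pg2
    QPS = erlperf:run(erlperf:load(filename:join(?config(data_dir, Config), "pg2"))),
    ?assert(QPS > 500). %% catches regression: QPS falling below 500
```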
It's possible to run a job on a separate node in the cluster.
```erlang
% watch the entire cluster (printed to console)
> {ok, ClusterLogger} = ep_cluster_monitor:start().
{ok, <0.211.0>}

% also log cluster-wide reports (jobs & sched_util)
> {ok, ClusterLogger} = ep_cluster_monitor:start("/tmp/cluster", [time, sched_util, jobs]).
{ok, <0.223.0>}

% start the benchmarking process in a separate beam
> {ok, Node, Job} = erlperf:start(#{
    runner => {timer, sleep, [2]},
    initial_concurrency => 4}, #{}).
{ok,job_53480@centos,<16735.104.0>}

% tail the file with "tail -f /tmp/cluster"

% change concurrency:
> rpc:call(Node, ep_job, set_concurrency, [Job, 8]).
ok

% watch the log file tail
```

Cluster-wide monitoring will reflect changes accordingly.