
erlperf

Simple way to say "this code is faster than that one".

Build:

$ rebar3 escriptize
$ cp _build/default/bin/erlperf ./

TL;DR

Find out how many times per second a function can be run (beware of shell escaping your code!):

$ ./erlperf 'timer:sleep(1).'
Code                    ||        QPS     Rel
timer:sleep(1).          1        500    100%

Run erlperf with two concurrently running samples of code:

$ ./erlperf 'rand:uniform().' 'crypto:strong_rand_bytes(2).' --samples 10 --warmup 1
Code                         Concurrency   Throughput   Relative
rand:uniform().                        1      4303 Ki       100%
crypto:strong_rand_bytes(2).           1      1485 Ki        35%

Or just measure how concurrent your code is (example below shows saturation with only 1 process):

$ ./erlperf 'pg2:join(foo, self()), pg2:leave(foo, self()).' --init 1 'pg2:create(foo).' --squeeze
Code                                               ||        QPS
pg2:join(foo, self()), pg2:leave(foo, self()).      1      29836

If you need some initialisation done before running the test:

$ ./erlperf 'pg2:join(foo, self()), pg2:leave(foo, self()).' --init 1 'pg2:create(foo).' --done 1 'pg2:delete(foo).'
Code                                                   ||        QPS     Rel
pg2:join(foo, self()), pg2:leave(foo, self()).          1      30265    100%

Determine how well pg2 handles concurrent group modifications when there are no nodes in the cluster:

$ ./erlperf 'runner(Arg) -> ok = pg2:join(Arg, self()), ok = pg2:leave(Arg, self()).' --init_runner 1 'G = {foo, rand:uniform(10000)}, pg2:create(G), G.' -q
Code                                                               ||        QPS
runner(Arg) -> ok = pg2:join(Arg, self()), ok = pg2:leave(Arg,     13      76501

Watch the progress of your test while it is running (use the -v option):

$ ./erlperf 'rand:uniform().' -q -v

YYYY-MM-DDTHH:MM:SS-oo:oo  Sched   DCPU    DIO    Procs    Ports     ETS Mem Total  Mem Proc   Mem Bin   Mem ETS  <0.92.0>
2019-06-21T13:25:21-07:00  11.98   0.00   0.47       52        3      20  32451 Kb   4673 Kb    179 Kb    458 Kb    3761 Ki
2019-06-21T13:25:22-07:00  12.57   0.00   0.00       52        3      20  32702 Kb   4898 Kb    201 Kb    460 Kb    4343 Ki
<...>
2019-06-21T13:25:54-07:00 100.00   0.00   0.00       63        3      20  32839 Kb   4899 Kb    203 Kb    469 Kb   14296 Ki
2019-06-21T13:25:55-07:00 100.00   0.00   0.00       63        3      20  32814 Kb   4969 Kb    201 Kb    469 Kb   13810 Ki
2019-06-21T13:25:56-07:00 100.00   0.00   0.00       63        3      20  32809 Kb   4964 Kb    201 Kb    469 Kb   13074 Ki
Code                ||        QPS
rand:uniform().      8   14918 Ki

Command-line benchmarking does not save results anywhere. It is designed to provide a quick answer to the question "is this piece of code faster than that one?".

Continuous benchmarking

Starting erlperf as an Erlang application enables continuous benchmarking. This process is designed to help during development and testing, allowing you to notice performance regressions quickly.

Usage example (assuming you're running an OTP release, or rebar3 shell for your application):

$ rebar3 shell --sname mynode
> application:start(erlperf).
ok.

> {ok, Logger} = ep_file_log:start(group_leader()).
{ok,<0.235.0>}

> {ok, JobPid} = ep_job:start(#{name => myname, 
    init_runner => "myapp:generate_seed().", 
    runner => "runner(Arg) -> Var = mymodule:call(Arg), mymodule:done(Var).",
    initial_concurrency => 1}).
{ok,<0.291.0>}

> ep_job:set_concurrency(JobPid, 2).
ok.

% watch your job performance (local node)

% modify your application code, do hot code reload
> c:l(mymodule).
{module, mymodule}.

% stop logging for local node
> ep_file_log:stop(Logger).
ok

% stop the job
> ep_job:stop(JobPid).
ok.

% restart the job (code is saved for your convenience, and can be accessed by name)
% makes it easy to restart the job if it crashed due to bad code loaded
> ep_job:start(erlperf:load(myname)).
{ok,<0.1121.0>}

% when finished, just shut everything down with one liner:
> application:stop(erlperf).
ok.

Reference Guide

Terms

  • runner: process repeatedly executing the same set of calls
  • hooks: functions that run when a new job starts (init), when a new runner process starts (init_runner), or when the job is stopped (done)
  • job: single instance of the running benchmark (multiple processes)
  • concurrency: how many processes are running concurrently, executing runner code
  • throughput: total number of calls per sampling interval (for all concurrent processes)
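
To see how these terms map onto a job description, here is a minimal sketch (the myserver module and its ping call are placeholders; fuller examples appear in the sections below):

    erlperf:benchmark(
        #{
            init => "init() -> myserver:start_link().",          %% init: runs once when the job starts
            init_runner => "init_runner({ok, Pid}) -> Pid.",     %% init_runner: runs once per runner process
            runner => "runner(Pid) -> myserver:ping(Pid).",      %% runner: executed repeatedly
            done => "done({ok, Pid}) -> gen_server:stop(Pid)."   %% done: runs when the job is stopped
        },
        #{concurrency => 2}                                      %% two runner processes execute concurrently
    ).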

Build

$ rebar3 do compile, escriptize

Usage

Benchmarking is done either with already compiled code or with free-form source code. The latter case involves an extra step: forming an Erlang module, compiling it, and loading it into the VM.
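
The following is an illustrative sketch only (erlperf's actual implementation may differ): one way to turn a free-form source string into a module loaded into the VM.

    -module(source_loader).
    -export([load/1]).

    %% Source must contain complete forms, e.g.
    %% "-module(bench). -export([runner/0]). runner() -> rand:uniform(20)."
    load(Source) ->
        {ok, Tokens, _} = erl_scan:string(Source),
        Forms = [begin {ok, F} = erl_parse:parse_form(Ts), F end
                 || Ts <- split_forms(Tokens, [], [])],
        {ok, Module, Binary} = compile:forms(Forms),
        {module, Module} = code:load_binary(Module, "nofile", Binary),
        Module.

    %% Split the token stream into forms, each terminated by a 'dot' token.
    split_forms([], [], Acc) ->
        lists:reverse(Acc);
    split_forms([{dot, _} = Dot | Rest], Current, Acc) ->
        split_forms(Rest, [], [lists:reverse([Dot | Current]) | Acc]);
    split_forms([Token | Rest], Current, Acc) ->
        split_forms(Rest, [Token | Current], Acc).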

One-time throughput run

Supported use-cases:

  • single run for MFA: erlperf:benchmark(rand, uniform, [1000]).
  • call chain: erlperf:benchmark([{rand, uniform, [10]}, {erlang, node, []}]). (see recording call chain, and the sketch after this list)
  • function object: erlperf:benchmark(fun() -> rand:uniform(100) end).
  • function object with an argument: erlperf:benchmark(fun(Init) -> io_lib:format("~tp", [Init]) end).
  • source code: erlperf:benchmark("runner() -> rand:uniform(20).").

Source code cannot be mixed with MFA. Call chain may contain only complete MFA tuples and cannot be mixed with functions.
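
For example, a call chain of complete MFA tuples can be combined with the run options described under Measurement options below (a sketch, not an exhaustive reference):

    %% benchmark a call chain of two complete MFA tuples, 5 samples after a warmup
    erlperf:benchmark(
        [{rand, uniform, [10]}, {erlang, node, []}],
        #{samples => 5, warmup => 1}
    ).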

Startup and teardown

  • init, done and init_runner hooks are available (there is no done_runner, because a runner is never stopped gracefully)
  • init_runner and done may be defined with arity 0 or 1 (in the latter case, the result of init/0 is passed as an argument)
  • runner can be of any arity; erlang:function_exported/3 is used to determine whether it takes one extra argument (the result of init_runner)

Example with mixed MFA:

   erlperf:benchmark(
       #{
           runner => fun(Arg) -> rand:uniform(Arg) end,
           init =>
               {myserver, start_link, []},
           init_runner =>
               fun ({ok, Pid}) ->
                   {total_heap_size, THS} = erlang:process_info(Pid, total_heap_size),
                   THS
               end,
           done => fun ({ok, Pid}) -> gen_server:stop(Pid) end
       }
   ).

Same example with source code:

erlperf:benchmark(
    #{
        runner => "runner(Max) -> rand:uniform(Max).",
        init => "init() -> myserver:start_link().",
        init_runner => "init_runner({ok, Pid}) ->
            {total_heap_size, THS} = erlang:process_info(Pid, total_heap_size),
            THS.",
        done => "done({ok, Pid}) -> gen_server:stop(Pid)."
    }
).

Measurement options

Benchmarking is done by counting the number of erlang:apply() calls made during a specified period of time (sample_duration). By default, erlperf performs no warmup cycle, then takes 3 consecutive samples, using a concurrency of 1 (single runner). It is possible to tune this behaviour by specifying run_options:

    erlperf:benchmark(Code, #{concurrency => 2, samples => 10, warmup => 1}).

For list of options available, refer to benchmark reference.
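
For example, the sampling interval itself can be tuned via sample_duration (assumed here to be given in milliseconds):

    %% sketch: five 2-second samples, taken after a one-sample warmup
    erlperf:benchmark(Code, #{sample_duration => 2000, samples => 5, warmup => 1}).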

Concurrency test

Sometimes it's necessary to measure code running in multiple concurrent processes and find out when it saturates the node. This can be used to detect bottlenecks, e.g. lock contention or a single dispatcher process. Example (with maximum concurrency limited to 50):

    erlperf:benchmark(Code, #{warmup => 1}, #{max => 50}).

Continuous benchmarking

All currently running benchmarking jobs can be accessed via the programmatic interface. It is possible to run a job continuously, to examine performance gains or losses while doing hot code reload.
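
A minimal sketch, reusing the calls from the shell session above (mymodule and myapp are placeholder names):

    %% start a continuous job, raise its concurrency, then hot-reload the code under test
    {ok, Job} = ep_job:start(#{
        runner => "runner(Arg) -> mymodule:call(Arg).",
        init_runner => "myapp:generate_seed().",
        initial_concurrency => 1}),
    ok = ep_job:set_concurrency(Job, 2),
    %% the job keeps running and reflects the newly loaded code
    c:l(mymodule).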

Saving benchmarks for future use

A benchmark code database can be maintained for this purpose. Persisted code samples may not constitute repeatable tests, but they can be re-run in the future.
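
A sketch based on the calls shown elsewhere in this document (the job name "uniform" is arbitrary): giving a job a name persists its code, so it can be loaded and re-run later.

    % run once, saving the code under a name
    > erlperf:run(#{runner => "runner() -> rand:uniform(20).", name => "uniform"}).

    % later, re-run the saved benchmark by name
    > erlperf:run(erlperf:load("uniform")).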

Benchmarking overhead

Not yet measured. There may be a better way (using call time tracing instead of counter injections), to be determined later.

Experimental: recording call chain

Benchmarking can only work with exported functions (unlike the ep_prof module, which can also record arguments of local function calls).

    > f(Trace), Trace = erlperf:record(pg2, '_', '_', 1000).
    ...
    
    % for things working with ETS, isolation is recommended
    > erlperf:run(Trace, #{isolation => #{}}).
    ...
    
    % Trace can be saved to file before executing:
    > erlperf:run(#{runner => Trace, name => "pg2"}).
    
    % run the trace with
    > erlperf:run(erlperf:load("pg2")).

It's possible to create a Common Test testcase using recorded samples. Just put the recorded file into xxx_SUITE_data:

    QPS = erlperf:run(erlperf:load(?config(data_dir, "pg2"))),
    ?assert(QPS > 500). % catches regression for QPS falling below 500

Experimental: starting jobs in a cluster

It's possible to run a job on a separate node in the cluster.

    % watch the entire cluster (printed to console)
    > {ok, ClusterLogger} = ep_cluster_monitor:start().
    {ok, <0.211.0>}
    
    % also log cluster-wide reports (jobs & sched_util)
    > {ok, ClusterLogger} = ep_cluster_monitor:start("/tmp/cluster", [time, sched_util, jobs]).
    {ok, <0.223.0>}

    % start the benchmarking process in a separate beam
    > {ok, Node, Job} = erlperf:start(#{
        runner => {timer, sleep, [2]}, initial_concurrency => 4}, 
        #{}).
    {ok,job_53480@centos,<16735.104.0>}

    % tail the file with "tail -f /tmp/cluster"

    % change concurrency:
    > rpc:call(Node, ep_job, set_concurrency, [Job, 8]).
    ok
    
    % watch the log file tail

Cluster-wide monitoring will reflect changes accordingly.
