erlarg - v1.0.0

An Erlang lib that parses a list of arguments into structured data.
Useful for handling the options/parameters of an escript.

Installation

Add erlarg to the deps of your rebar.config:

{deps, [{erlarg, "1.0.0"}]}.
% or
{deps, [{erlarg, {git, "https://github.com/Eptwalabha/erlarg.git", {tag, "v1.0.0"}}}]}.

If you're building an escript, add erlarg to the list of apps to include in the binary:

{escript_incl_apps, [erlarg, …]}.

Fetch and compile the dependencies of your project:

rebar3 compile --deps_only

That's it, you're good to go.

How does it work?

Imagine this command:

./my-script --limit=20 -m 0.25 --format "%s%t" -o output.tsv -

The main/1 function of my-script will receive this list of arguments:

["--limit=20", "-m", "0.25", "--format", "%s%t", "-o", "output.tsv", "-"]

The function erlarg:parse will help you convert them into structured data:

main(Args) ->
    Syntax = {any, [erlarg:opt({"-l", "--limit"}, limit, int),
                    erlarg:opt({"-f", "--format"}, format, binary),
                    erlarg:opt("-o", file, string),
                    erlarg:opt("-", stdin),
                    erlarg:opt({"-m", "--max"}, max, float)
                   ]},
    {ok, {Result, RemainingArgs}} = erlarg:parse(Args, Syntax),
    ...

For this example, parse will return this proplist:

 % Result
[{limit, 20},
 {max, 0.25},
 {format, <<"%s%t">>},
 {file, "output.tsv"},
 stdin].
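
Since the result is a regular proplist, the standard proplists module is enough to read it. The sketch below is only an illustration: the module name my_script_opts, the function summary/1 and the default values are assumptions made for this example, not part of erlarg.

```erlang
-module(my_script_opts).
-export([summary/1]).

%% Collect the parsed options into a map.
%% The defaults (10 and 1.0) are hypothetical; pick your own.
-spec summary(proplists:proplist()) -> map().
summary(Result) ->
    #{limit => proplists:get_value(limit, Result, 10),
      max => proplists:get_value(max, Result, 1.0),
      %% 'undefined' when the option was not given
      file => proplists:get_value(file, Result),
      %% a bare atom such as 'stdin' is read as {stdin, true}
      stdin => proplists:get_bool(stdin, Result)}.
```

Options that were not given on the command line are simply absent from the proplist, so the defaults apply.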

The functions erlarg:parse/2 & erlarg:parse/3 will transform a list of arguments into structured data.

  • Args: A list of arguments (generally what's given to main/1)
  • Syntax: The syntax (or the specification) that describes how the arguments should be parsed
  • Aliases: [optional] the list of options, types or sub-syntax that are referenced in Syntax

Syntax

The syntax describes to the parser how to handle each argument (Args). The parser consumes the arguments one by one while building the structured data.

A syntax can be any of the following:

  • a type
  • a named type
  • a custom type
  • an option
  • an alias
  • a sub-syntax (which is a syntax itself)
  • a syntax operator
  • a list of all the above

It can be pretty complex, but for now, let's go simple.

Imagine this fictional script print_n_time that takes a string and an integer as arguments:

# this will print the string "hello" 3 times
$ print_n_time hello 3

Here's the simplest spec needed to handle the arguments:

Syntax = [string, int].
erlarg:parse(Args, Syntax). % here Args = ["hello", "3"]
{ok, {["hello", 3], []}} % erlarg:parse/2 result

We explicitly asked the parser to handle two arguments: the first must be a string, the second must be an int.
If the parsing is successful, it will return the following tuple:

{ok, {Data, RemainingArgs}}.

Where Data is the structured data generated by the parser (["hello", 3]) and RemainingArgs is the list of arguments not consumed by the parser ([]).

Parsing failure

If the parser encounters a problem with an argument, it will fail and return the nature of the problem:

> erlarg:parse(["world"], [int]).
{error, {not_int, "world"}} % it failed to convert the word "world" into an int

or

> erlarg:parse(["one"], [string, string]). % expects two strings but only got one
{error, {missing, arg}}

[!TIP] These errors can be used to explain to the user what's wrong with the command they typed
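
As a minimal sketch of that idea, the function below turns a failure into a message for the user. The module name cli_errors and the wording of the messages are assumptions; only the two error shapes shown above are handled explicitly, and anything else goes through a generic fallback.

```erlang
-module(cli_errors).
-export([describe/1]).

%% Turn a parser failure into a human-readable message.
-spec describe({error, term()}) -> string().
describe({error, {not_int, Arg}}) ->
    lists:flatten(io_lib:format("expected an integer, got \"~ts\"", [Arg]));
describe({error, {missing, arg}}) ->
    "an argument is missing";
describe({error, Other}) ->
    %% Fallback for error shapes not covered above
    lists:flatten(io_lib:format("invalid arguments: ~p", [Other])).
```

In main/1, the message could be printed with io:format/2 before halting with a non-zero exit code.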

Remaining Args

Remaining args are the arguments not consumed by the parser when it terminates successfully.
If we add some extra arguments at the end of our command:

$ print_n_time hello 3 some extra arguments

this time, calling erlarg:parse/2 with the same syntax as before will give this result:

Syntax = [string, int].
{ok, {_, RemainingArgs}} = erlarg:parse(Args, Syntax).
["some", "extra", "arguments"] % RemainingArgs

The parser will consume the first two arguments; the other arguments will be returned in RemainingArgs.

[!NOTE] Having unconsumed arguments does not generate an error
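
If your script should reject extra arguments, you can check RemainingArgs yourself. The following is a small sketch of such a check; the module name strict_args and the unexpected_args error are hypothetical and are not produced by erlarg.

```erlang
-module(strict_args).
-export([check/1]).

%% Accept the result of erlarg:parse/2,3 only when every argument was consumed.
check({ok, {Data, []}}) ->
    {ok, Data};
check({ok, {_Data, Extra}}) ->
    %% 'unexpected_args' is our own error, not one returned by the parser
    {error, {unexpected_args, Extra}};
check({error, _} = Error) ->
    Error.
```

It is meant to wrap the call to the parser, for example strict_args:check(erlarg:parse(Args, Syntax)).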

Types

The parser can convert the argument to more types than just string and int.
Here are all the types currently available:

  • int: cast the argument into an int
  • float: cast the argument into a float (will cast int into float)
  • number: cast the argument into an int. If it fails it will cast the argument into a float
  • string: returns the given argument
  • binary: cast the argument into a binary
  • atom: cast the arg to an atom
  • bool: return the boolean value of the arg

| syntax | arg | result | note |
|--------|-----|--------|------|
| int | "1" | 1 | - |
| int | "1.2" | error | not an int |
| float | "1.2" | 1.2 | - |
| float | "1" | 1.0 | cast int into float |
| float | "1.234e2" | 123.4 | - |
| number | "1" | 1 | - |
| number | "1.2" | 1.2 | - |
| string | "abc" | "abc" | - |
| binary | "äbc" | <<"äbc"/utf8>> | use unicode:characters_to_binary |
| atom | "super-top" | 'super-top' | - |

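The bool conversion:

| arg | bool | note |
|-----|------|------|
| "true" | true | case insensitive |
| "yes" | true | |
| "abcd" | true | any non-empty string |
| "1" | true | |
| "0.00001" | true | |
| "false" | false | case insensitive |
| "no" | false | |
| "" | false | empty string |
| "0" | false | |
| "0.0" | false | |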

[!TIP] Converting an argument into string, binary, bool or atom will always succeed.
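
The bool rules from the table can be restated as a few lines of plain Erlang. This is a sketch written for this document, not the library's implementation; the module name bool_sketch is hypothetical, and the treatment of "yes" / "no" as case insensitive is an assumption (the table only states it for "true" / "false").

```erlang
-module(bool_sketch).
-export([to_bool/1]).

%% Mirror the bool table: "", "false", "no" and zero are false,
%% every other string is true.
-spec to_bool(string()) -> boolean().
to_bool(Arg) ->
    case string:lowercase(Arg) of
        "" -> false;
        "false" -> false;
        "no" -> false;
        Lower -> not is_zero(Lower)
    end.

%% True when the whole string is the number zero ("0", "0.0", ...).
is_zero(Str) ->
    case string:to_float(Str) of
        {Float, []} -> Float == 0.0;
        _ ->
            case string:to_integer(Str) of
                {Int, []} -> Int =:= 0;
                _ -> false
            end
    end.
```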

If you need a more complicated "type", see the chapter on Custom types.

Naming parameters

Converting an argument into a specific type is important, but it doesn't really help us understand what these values are for:

> Syntax = [string, int].
> {ok, {Result, _}} = erlarg:parse(["hello", "3"], Syntax).
["hello", 3]. % Result

To avoid this issue, you can give a name to the parsed parameters with the following syntax:

{Name :: atom(), Type :: base_type()}

If we rewrite the syntax as such:

Syntax = [{text, string}, {nbr, int}].
{ok, {Result, _}} = erlarg:parse(["hello", "3"], Syntax).
[{text, "hello"}, {nbr, 3}] % Result

You can even name a list of parameters if you want:

Syntax = [{a, [string, {a2, float}]}, {b, binary}],
{ok, {Result, _}} = erlarg:parse(["abc", "2.3", "bin"], Syntax).
[{a, ["abc", {a2, 2.3}]}, {b, <<"bin">>}] % Result

Options

Naming and casting parameters into types is neat, but most programs use options. An option is an argument that usually (not always…) starts with a dash and has zero or more parameters.

$ date -d --utc --date=STRING

An option can have several formats: a short one (a dash followed by a letter, e.g. -v) and/or a long one (a double dash followed by a word, e.g. --version).

This table summarizes the formats handled/recognized by the parser:

| format | note |
|--------|------|
| -s | |
| -s VALUE | |
| -sVALUE | same as -s VALUE |
| -abc VALUE | same as -a -b -c VALUE |
| -abcVALUE | same as -a -b -c VALUE |
| --long | |
| --long VALUE | |
| --long=VALUE | |

In this chapter, we'll see how to tell the parser how to recognise three kinds of options:

  • option without parameter
  • option with parameters
  • option with sub options

option without parameter

$ grep -v "bad"
$ grep --invert-match "bad"

We can define this option with erlarg:opt like so:

> Syntax = [erlarg:opt({"-v", "--invert-match"}, invert_match)].
> {ok, {Result, _}} = erlarg:parse(["-v"], Syntax),
[invert_match] % Result

The first parameter of erlarg:opt is the option:

{"-s", "--long"} % short and long options
"-s" % only short option
{"-s", undefined} % same as above
{undefined, "--long"} % only long option

The second parameter is the name of the option, in this case invert_match

option with parameter(s)

Options can have parameters:

$ date --date 'now -3 days'
$ date --date='now -3 days'
$ date -d'now -3 days'

> Syntax = [erlarg:opt({"-d", "--date"}, date, string)].
> {ok, {Result, _}} = erlarg:parse(["--date", "now -3 days"], Syntax).
[{date, "now -3 days"}] % Result

The third parameter is the syntax of the parameters expected by the option. In this case, after matching the argument --date, the option expects a string ("now -3 days").

Maybe one of the options of your program expects two parameters? No problem:

erlarg:opt({"-d", "--dimension"}, dimension, [int, string]).
[{dimension, [3, "inch"]}] % Result for "-d 3 inch"

You can even name them:

erlarg:opt({"-d", "--dimension"}, dimension, [{size, int}, {unit, string}]).
[{dimension, [{size, 3}, {unit, "inch"}]}] % Result for "-d 3 inch"

option with sub-option(s):

Because the third parameter is a syntax, and because an option is a syntax itself, you can nest options inside an option:

$ my-script --opt1 -a "param of a" -b "param of opt1" --opt2 …

In this fictional program, the option --opt1 has two sub-options (-a that expects a parameter and -b that doesn't). We can define opt1 this way:

Opt1 = erlarg:opt({"-o", "--opt1"}, % option
                  opt1, % option's name 
                  [erlarg:opt("-a", a, string), % sub-option 1
                   erlarg:opt("-b", b),  % sub-option 2
                   {value, string} % the param under the name 'value'
                  ]).
{ok, {Result, _}} = erlarg:parse(["--opt1", "-a", "abc", "-b", "def"], Opt1).
[{opt1, [{a, "abc"}, b, {value, "def"}]}] % Result

Well… that's quite unreadable… fortunately, you can use Aliases to avoid this mess.

Aliases

Aliases let you define all your options, sub-syntaxes and custom types in a map. This helps keep the Syntax clear and readable.

Aliases = #{
    option1 => erlarg:opt({"-o", "--opt1"}, opt1, [opt_a, opt_b, {value, string}]),
    option2 => erlarg:opt({undefined, "--opt2"}, opt2),
    opt_a => erlarg:opt("-a", a, string),
    opt_b => erlarg:opt("-b", b)
},
Syntax = [option1, option2],
{ok, {Result, _}} = erlarg:parse(["--opt1", "-a", "abc", "-b", "def", "--opt2"],
                                 Syntax, Aliases).
[{opt1, [{a, "abc"}, b, {value, "def"}]}, opt2] % Result

Here Syntax is a list of two aliases, option1 and option2.

Syntax operators

Operators tell the parser how to handle a list of syntaxes.

sequence operator

Take the following syntax:

[opt({"-d", "--date"}, date, string), opt({"-u", "--utc"}, utc)]

It would parse this command without problem:

$ date -d "now -3 days" --utc # yay!

But it will fail with this one:

$ date --utc --date="now -3 days" # boom!

Why? Aren't these two commands identical?
That's because the parser treats a list of syntaxes as a sequence operator:

[syntax1, syntax2, …]

A sequence expects the arguments to match in the same order as the elements of the list: the first argument must match syntax1, the second syntax2, and so on. If any of them fails, the whole sequence fails.

All elements of the list must succeed in order for the operator to succeed.

| syntax | args | result | note |
|--------|------|--------|------|
| [int, string] | ["1", "a"] | [1, "a"] | |
| [int] | ["1", "a"] | [1] | remaining: ["a"] |
| [int, int] | ["1", "a"] | error | "a" isn't an int |
| [int, string, int] | ["1", "a"] | error | missing a third argument |
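
To make the sequence rule concrete, here is a toy model of it in plain Erlang. It is not erlarg's implementation: the module seq_sketch and its functions are hypothetical, and each "type" is simplified to a fun that looks at exactly one argument. The error tuples reuse the shapes shown in the Parsing failure chapter.

```erlang
-module(seq_sketch).
-export([sequence/2, int/1, str/1]).

%% Toy "types": each one converts a single argument.
int(Arg) ->
    case string:to_integer(Arg) of
        {Int, []} -> {ok, Int};
        _ -> {error, {not_int, Arg}}
    end.

str(Arg) -> {ok, Arg}.

%% Apply each type, in order, to the next argument.
sequence(Types, Args) -> sequence(Types, Args, []).

sequence([], Remaining, Acc) ->
    %% Every element matched: leftover arguments are returned untouched
    {ok, {lists:reverse(Acc), Remaining}};
sequence([_ | _], [], _Acc) ->
    %% Still expecting something, but no argument is left
    {error, {missing, arg}};
sequence([Type | Types], [Arg | Rest], Acc) ->
    case Type(Arg) of
        {ok, Value} -> sequence(Types, Rest, [Value | Acc]);
        {error, _} = Error -> Error
    end.
```

Calling seq_sketch:sequence([fun seq_sketch:int/1, fun seq_sketch:str/1], ["1", "a"]) walks through the first row of the table above.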

So how do we parse arguments when we're not sure of their order? Moreover, some options are… optional! That's where the any operator comes into play.

any operator

format:

{any, [syntax1, syntax2, …]}

The parser keeps consuming arguments as long as one of the syntaxes in the list matches. It stops at the first argument that none of them can handle and leaves that argument, and everything after it, in the remaining arguments.

| syntax | args | result | note |
|--------|------|--------|------|
| {any, [int]} | ["1", "2", "abc"] | [1, 2] | remaining: ["abc"] |
| {any, [{key, int}]} | ["1", "2"] | [{key, 1}, {key, 2}] | |
| {any, [int, {s, string}]} | ["1", "2", "abc", "3"] | [1, 2, {s, "abc"}, 3] | |
| {any, [string]} | ["1", "-o", "abc", "3"] | ["1", "-o", "abc", "3"] | even if "-o" is an option |

No matter the number of matching elements, any will always succeed. If nothing matches, no arguments will be consumed.

[!NOTE] Keep in mind that if the list given to any contains types like string or binary, it will consume all the remaining arguments.
For example, in {any, [string, custom_type]}, custom_type will never be executed because the type string always consumes the argument first.
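
The behaviour described above can be modelled in a few lines of plain Erlang. As before, this is a toy sketch and not how erlarg is implemented: the module any_sketch and its functions are hypothetical, and each "type" is a fun that handles a single argument and returns {ok, Value} or nomatch.

```erlang
-module(any_sketch).
-export([parse_any/2, int/1, str/1]).

%% Toy "types": each one looks at a single argument.
int(Arg) ->
    case string:to_integer(Arg) of
        {Int, []} -> {ok, Int};
        _ -> nomatch
    end.

str(Arg) -> {ok, Arg}.

%% Consume arguments while at least one type accepts the next one.
%% Always succeeds; it may consume nothing at all.
parse_any(Types, Args) -> parse_any(Types, Args, []).

parse_any(_Types, [], Acc) ->
    {ok, {lists:reverse(Acc), []}};
parse_any(Types, [Arg | Rest] = Args, Acc) ->
    case first_match(Types, Arg) of
        {ok, Value} -> parse_any(Types, Rest, [Value | Acc]);
        nomatch -> {ok, {lists:reverse(Acc), Args}}
    end.

%% Try the types from left to right and keep the first success.
first_match([], _Arg) -> nomatch;
first_match([Type | Types], Arg) ->
    case Type(Arg) of
        {ok, Value} -> {ok, Value};
        nomatch -> first_match(Types, Arg)
    end.
```

Because str/1 accepts everything, placing it first in the list means the types after it are never tried, which is the situation the note above warns about.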

first operator

format:

{first, [syntax1, syntax2, …]}

The parser will return the first element of the syntax to succeed. It'll fail if no element matches.
The following table uses Args = ["1", "2", "a", "3", "b"]

| syntax | result | remaining |
|--------|--------|-----------|
| {first, [int]} | [1] | ["2", "a", "3", "b"] |
| {first, [{opt, int}]} | [{opt, 1}] | ["2", "a", "3", "b"] |
| {any, [int, {b, binary}]} | [1, 2, {b, <<"a">>}, 3, {b, <<"b">>}] | [] |
| {any, [string]} | ["1", "2", "a", "3", "b"] | [] |

Custom types

Sometimes you need to perform some operations on an argument or run more complex checks. This is what custom types are for.
A custom type is a function that takes a list of arguments and returns the formatted / checked value to the parser:

-spec custom_type(Args) -> {ok, Value, RemainingArgs} | Failure when
    Args :: args(),
    Value :: any(),
    RemainingArgs :: args(),
    Failure :: any().

  • Args: The list of arguments not yet consumed by the parser
  • Value: The value you want to return to the parser
  • RemainingArgs: The list of arguments your function didn't consume
  • Failure: some explanation on why the function didn't accept the argument

Example 1:
Let's say your script has an option -f FILE where FILE must be an existing file. In this case the type string won't be enough. You could write your own function to perform this check:

existing_file([File | RemainingArgs]) ->
    case filelib:is_regular(File) of
        true -> {ok, File, RemainingArgs};
        _ -> {not_a_file, File}
    end.

To use your custom type, reference it from the Aliases map:

Aliases = #{
    file => erlarg:opt({"-f", "--file"}, file, existing_file),
    existing_file => fun existing_file/1
},
Syntax = {any, [file]},
erlarg:parse(Args, Syntax, Aliases).

or directly as a syntax:

Syntax = {any, [erlarg:opt({"-f", "--file"}, file, fun existing_file/1)]}.
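
Any function that follows the contract above can serve as a custom type. As a further sketch, here is a type that only accepts a TCP port number; the module name custom_types, the function port/1 and the failure terms are made up for this illustration.

```erlang
-module(custom_types).
-export([port/1]).

%% Custom type: consume one argument and accept it only if it is
%% an integer between 1 and 65535.
port([Arg | RemainingArgs]) ->
    case string:to_integer(Arg) of
        {Port, []} when Port >= 1, Port =< 65535 ->
            {ok, Port, RemainingArgs};
        _ ->
            %% Hypothetical failure term, returned as-is to the parser
            {not_a_port, Arg}
    end;
port([]) ->
    {missing, port}.
```

The function only consumes the first argument and hands the rest back untouched, so the parser can carry on with whatever follows.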

Example 2:
In this case, your script needs to fetch the information of a particular user from a config file with the option --consult USERS_FILE USER_ID, where USERS_FILE is the file containing the users' data and USER_ID is the id of the user:

get_user_config([DatabaseFile, UserID | RemainingArgs]) ->
    case file:consult(DatabaseFile) of
        {ok, Users} ->
            case proplists:get_value(UserID, Users, not_found) of
                not_found -> {user_not_found, UserID};
                UserData -> {ok, UserData, RemainingArgs}
            end;
        Error -> {cannot_consult, DatabaseFile, Error}
    end;
get_user_config(_) ->
    {badarg, missing_arguments}.