erlarg - v1.0.0
An Erlang library that parses a list of arguments into structured data.
Useful for handling the options and parameters of an escript.
Installation
Add `erlarg` to the `deps` of your `rebar.config`:
{deps, [{erlarg, "1.0.0"}]}.
% or
{deps, [{erlarg, {git, "https://github.com/Eptwalabha/erlarg.git", {tag, "v1.0.0"}}}]}.
If you're building an escript, add `erlarg` to the list of apps to include in the binary:
{escript_incl_apps, [erlarg, …]}.
Fetch and compile the dependencies of your project:
rebar3 compile --deps_only
That's it, you're good to go.
How does it work?
Imagine this command:
./my-script --limit=20 -m 0.25 --format "%s%t" -o output.tsv -
The `main/1` function of `my-script` will receive this list of arguments:
["--limit=20", "-m", "0.25", "--format", "%s%t", "-o", "output.tsv", "-"]
The function `erlarg:parse` will help you convert them into structured data:
main(Args) ->
    Syntax = {any, [erlarg:opt({"-l", "--limit"}, limit, int),
                    erlarg:opt({"-f", "--format"}, format, binary),
                    erlarg:opt("-o", file, string),
                    erlarg:opt("-", stdin),
                    erlarg:opt({"-m", "--max"}, max, float)
                   ]},
    {ok, {Result, RemainingArgs}} = erlarg:parse(Args, Syntax),
    ...
For this example, `parse` will return this proplist:
% Result
[{limit, 20},
{max, 0.25},
{format, <<"%s%t">>},
{file, "output.tsv"},
stdin].
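Since the result is a regular proplist, the standard `proplists` module can be used to read it. The following is only a sketch: the module name `read_result`, the function `summary/1` and the default value `10` are made up for this illustration and are not part of erlarg.

```erlang
-module(read_result).
-export([summary/1]).

%% Extract a few values from the proplist returned by erlarg:parse.
%% Options without parameter (like stdin) appear as bare atoms, so
%% proplists:get_bool/2 is used for them.
summary(Result) ->
    #{limit => proplists:get_value(limit, Result, 10), % 10: arbitrary default (assumption)
      file  => proplists:get_value(file, Result),      % undefined when -o is absent
      stdin => proplists:get_bool(stdin, Result)}.     % true only if "-" was given
```

Calling `read_result:summary(Result)` from `main/1` keeps the rest of the script independent of the order in which the options were typed.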
The functions `erlarg:parse/2` & `erlarg:parse/3` will transform a list of arguments into structured data.
- `Args`: A list of arguments (generally what's given to `main/1`)
- `Syntax`: The syntax (or the specification) that describes how the arguments should be parsed
- `Aliases`: [optional] the list of options, types or sub-syntaxes that are referenced in `Syntax`
Syntax
The syntax describes to the parser how to handle each argument (`Args`).
It will consume each argument one by one while building the structured data.
A syntax can be any of the following:
- a type
- a named type
- a custom type
- an option
- an alias
- a sub-syntax (which is a syntax itself)
- a syntax operator
- a list of all the above
It can be pretty complex, but for now, let's go simple.
Imagine this fictional script `print_n_time` that takes a string and an integer as arguments:
# this will print the string "hello" 3 times
$ print_n_time hello 3
Here's the simplest spec needed to handle the arguments:
Syntax = [string, int].
erlarg:parse(Args, Syntax). % here Args = ["hello", "3"]
{ok, {["hello", 3], []}} % erlarg:parse/2 result
We explicitly asked the parser to handle two arguments: the first <u>must</u> be a `string`, the second <u>must</u> be an `int`.
If the parsing is successful, it will return the following tuple:
{ok, {Data, RemainingArgs}}.
Where `Data` is the structured data generated by the parser (`["hello", 3]`) and `RemainingArgs` is the list of arguments not consumed by the parser (`[]`).
Parsing failure
If the parser encounters a problem with an argument, it will fail and return the nature of the problem:
> erlarg:parse(["world"], [int]).
{error, {not_int, "world"}} % it failed to convert the word "world" into an int
or
> erlarg:parse(["one"], [string, string]). % expects two strings but only got one
{error, {missing, arg}}
[!TIP] These errors can be used to explain to the user what's wrong with the command they typed
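As an illustration of that tip, here is a minimal sketch that turns the two errors shown above into readable messages. The module name `arg_errors`, the function `describe/1` and the wording of the messages are assumptions made for this example; erlarg may return other error shapes, which fall through to the generic clause.

```erlang
-module(arg_errors).
-export([describe/1]).

%% Turn the failure returned by erlarg:parse into a message for the user.
%% Only the two errors shown above are handled explicitly; anything else
%% falls through to a generic message.
describe({error, {not_int, Arg}}) ->
    lists:flatten(io_lib:format("'~ts' is not a valid integer", [Arg]));
describe({error, {missing, arg}}) ->
    "an argument is missing";
describe({error, Other}) ->
    lists:flatten(io_lib:format("invalid arguments: ~tp", [Other])).
```

In `main/1`, the message could then be printed with `io:format(standard_error, "~ts~n", [arg_errors:describe(Error)])` before calling `halt(1)`.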
Remaining Args
Remaining args are the arguments left unconsumed when the parser terminates successfully.
If we add some extra arguments at the end of our command:
$ print_n_time hello 3 some extra arguments
This time, calling `erlarg:parse/2` with the same syntax as before will give this result:
Syntax = [string, int].
{ok, {_, RemainingArgs}} = erlarg:parse(Args, Syntax).
["some", "extra", "arguments"] % RemainingArgs
The parser will consume the first two arguments; the remaining arguments will be returned in `RemainingArgs`.
[!NOTE] Having unconsumed arguments does not generate an error
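If a script should instead reject leftover arguments, the result of `erlarg:parse` can be post-processed. This is a minimal sketch: the module name `strict_args`, the function `strict/1` and the `{unexpected, Args}` error shape are assumptions made for this illustration, not part of erlarg.

```erlang
-module(strict_args).
-export([strict/1]).

%% Wrap the result of erlarg:parse so that leftover arguments become an error.
%% The {unexpected, Args} error shape is an assumption made for this sketch.
strict({ok, {Data, []}}) ->
    {ok, Data};
strict({ok, {_Data, Remaining}}) ->
    {error, {unexpected, Remaining}};
strict({error, _} = Error) ->
    Error.
```

Usage would look like `strict_args:strict(erlarg:parse(Args, Syntax))`.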
Types
The parser can convert the argument to more types than just `string` and `int`.
Here are all the types currently available:
- `int`: cast the argument into an int
- `float`: cast the argument into a float (will cast int into float)
- `number`: cast the argument into an int. If it fails it will cast the argument into a float
- `string`: returns the given argument
- `binary`: cast the argument into a binary
- `atom`: cast the arg to an atom
- `bool`: return the boolean value of the arg
syntax | arg | result | note |
---|---|---|---|
int | "1" | 1 | - |
int | "1.2" | error | not an int |
float | "1.2" | 1.2 | - |
float | "1" | 1.0 | cast int into float |
float | "1.234e2" | 123.4 | - |
number | "1" | 1 | - |
number | "1.2" | 1.2 | - |
string | "abc" | "abc" | - |
binary | "äbc" | <<"äbc"/utf8>> | use unicode:characters_to_binary |
atom | "super-top" | 'super-top' | - |
The `bool` conversion:
arg | bool | note |
---|---|---|
"true" | true | case insensitive |
"yes" | true | |
"abcd" | true | any non-empty string |
"1" | true | |
"0.00001" | true | |
"false" | false | case insensitive |
"no" | false | |
"" | false | empty-string |
"0" | false | |
"0.0" | false |
[!TIP] Converting an argument into `string`, `binary`, `bool` or `atom` will always succeed.
If you need a more complicated "type", see the chapter on Custom types.
Naming parameters
Converting an argument into a specific type is important, but it doesn't really help us understand what these values are for:
> Syntax = [string, int].
> {ok, {Result, _}} = erlarg:parse(["hello", "3"], Syntax).
["hello", 3]. % Result
To avoid this issue, you can give a "name" to the parsed parameters with the following syntax:
{Name :: atom(), Type :: base_type()}
If we rewrite the syntax as such:
Syntax = [{text, string}, {nbr, int}].
{ok, {Result, _}} = erlarg:parse(["hello", "3"], Syntax).
[{text, "hello"}, {nbr, 3}] % Result
You can even name a list of parameters if you want:
Syntax = [{a, [string, {a2, float}]}, {b, binary}],
{ok, {Result, _}} = erlarg:parse(["abc", "2.3", "bin"], Syntax).
[{a, ["abc", {a2, 2.3}]}, {b, <<"bin">>}] % Result
Options
Naming and casting parameters into types is neat, but most programs use options. An option is an argument that usually (not always…) starts with a dash and has zero or more parameters.
$ date -d --utc --date=STRING
An option can have several formats: a short one (a dash followed by a letter, e.g. `-v`) and/or a long one (a double dash and a word, e.g. `--version`).
This table summarizes the formats handled/recognized by the parser:
format | note |
---|---|
-s | |
-s <u>VALUE</u> | |
-s<u>VALUE</u> | same as -s VALUE |
-abc <u>VALUE</u> | same as -a -b -c VALUE |
-abc<u>VALUE</u> | same as -a -b -c VALUE |
--long | |
--long <u>VALUE</u> | |
--long=<u>VALUE</u> |
In this chapter, we'll see how to tell the parser how to recognise three kinds of options:
- option without parameter
- option with parameters
- option with sub options
option without parameter
$ grep -v "bad"
$ grep --invert-match "bad"
We can define this option with `erlarg:opt` like so:
> Syntax = [erlarg:opt({"-v", "--invert-match"}, invert_match)].
> {ok, {Result, _}} = erlarg:parse(["-v"], Syntax).
[invert_match] % Result
The first parameter of `erlarg:opt` is the option:
{"-s", "--long"} % short and long options
"-s" % only short option
{"-s", undefined} % same as above
{undefined, "--long"} % only long option
The second parameter is the name of the option, in this case `invert_match`.
option with parameter(s)
An option can have parameters:
$ date --date 'now -3 days'
$ date --date='now -3 days'
$ date -d'now -3 days'
> Syntax = [erlarg:opt({"-d", "--date"}, date, string)].
> {ok, {Result, _}} = erlarg:parse(["--date", "now -3 days"], Syntax).
[{date, "now -3 days"}] % Result
The third parameter is the syntax of the parameters expected by the option. In this case, after matching the argument `--date`, the option expects a string (`"now -3 days"`).
Maybe one of the options of your program expects two parameters? No problem:
erlarg:opt({"-d", "--dimension"}, dimension, [int, string]).
[{dimension, [3, "inch"]}] % Result for "-d 3 inch"
You can even name them:
erlarg:opt({"-d", "--dimension"}, dimension, [{size, int}, {unit, string}]).
[{dimension, [{size, 3}, {unit, "inch"}]}] % Result for "-d 3 inch"
option with sub-option(s)
Because the third parameter is a syntax, and because an option is a syntax itself, you can nest options inside an option:
$ my-script --opt1 -a "param of a" -b "param of opt1" --opt2 …
In this fictional program, the option `--opt1` has two sub-options (`-a`, which expects a parameter, and `-b`, which doesn't). We can define `opt1` this way:
Opt1 = erlarg:opt({"-o", "--opt1"}, % option
opt1, % option's name
[erlarg:opt("-a", a, string), % sub-option 1
erlarg:opt("-b", b), % sub-option 2
{value, string} % the param under the name 'value'
]).
{ok, {Result, _}} = erlarg:parse(["--opt1", "-a", "abc", "-b", "def"], Opt1).
[{opt1, [{a, "abc"}, b, {value, "def"}]}] % Result
Well… that's quite unreadable… fortunately, you can use `Aliases` to avoid this mess.
Aliases
Aliases let you define all your options, sub-syntaxes and custom types in a map. This helps keep the Syntax clear and readable.
Aliases = #{
option1 => erlarg:opt({"-o", "--opt1"}, opt1, [opt_a, opt_b, {value, string}]),
option2 => erlarg:opt({undefined, "--opt2"}, opt2),
opt_a => erlarg:opt("-a", a, string),
opt_b => erlarg:opt("-b", b)
},
Syntax = [option1, option2],
{ok, {Result, _}} = erlarg:parse(["--opt1", "-a", "abc", "-b", "def", "--opt2"],
Syntax, Aliases).
[{opt1, [{a, "abc"}, b, {value, "def"}]}, opt2] % Result
Here `Syntax` is a list of two aliases, `option1` and `option2`.
Syntax operators
Operators tell the parser how to handle a list of syntaxes.
`sequence` operator
Take the following syntax:
[opt({"-d", "--date"}, date, string), opt({"-u", "--utc"}, utc)]
It would parse this command without problem:
$ date -d "now -3 days" --utc # yay!
But it will fail with this one:
$ date --utc --date="now -3 days" # boom !
Why? Aren't these two commands identical?
That's because the parser treats a list of syntaxes as a `sequence` operator:
[syntax1, syntax2, …]
A `sequence` expects the arguments to match in the same order as the elements of the list (the first argument must match `syntax1`, the second `syntax2`, …). If any of them fails, the whole sequence fails.
All elements of the list must succeed in order for the operator to succeed.
syntax | args | result | note |
---|---|---|---|
[int, string] | ["1", "a"] | [1, "a"] | |
[int] | ["1", "a"] | [1] | remaining: ["a"] |
[int, int] | ["1", "a"] | error | "a" isn't an int |
[int, string, int] | ["1", "a"] | error | missing a third argument |
So how do we parse arguments when we're not sure of their order? Moreover, some options are… optional! What do we do?
That's where the `any` operator comes into play.
`any` operator
format:
{any, [syntax1, syntax2, …]}
The parser will keep consuming arguments as long as one of the syntaxes matches. If an element of the syntax fails, the operator fails.
syntax | args | result | note |
---|---|---|---|
{any, [int]} | ["1", "2", "abc"] | [1, 2] | remaining: ["abc"] |
{any, [{key, int}]} | ["1", "2"] | [{key, 1}, {key, 2}] | |
{any, [int, {s, string}]} | ["1", "2", "abc", "3"] | [1, 2, {s, "abc"}, 3] | |
{any, [string]} | ["1", "-o", "abc", "3"] | ["1", "-o", "abc", "3"] | even if "-o" is an option |
No matter the number of matching elements, `any` will always succeed. If nothing matches, no arguments will be consumed.
[!NOTE] Keep in mind that if the list given to `any` contains types like `string` or `binary`, it will consume all the remaining arguments. In `{any, [string, custom_type]}`, `custom_type` will never be executed because the type `string` will always consume the argument.
first
format:
{first, [syntax1, syntax2, …]}
The parser will return the first element of the syntax to succeed.
It'll fail if no element matches.
The following table uses Args = ["1", "2", "a", "3", "b"]
syntax | result | remaining |
---|---|---|
{first, [int]} | [1] | ["2", "a", "3", "b"] |
{first, [{opt, int}]} | [{opt, 1}] | ["2", "a", "3", "b"] |
{any, [int, {b, binary}]} | [1, 2, {b, <<"a">>}, 3, {b, <<"b">>}] | [] |
{any, [string]} | ["1", "2", "a", "3", "b"] | [] |
Custom types
Sometimes, you need to perform some operations on an argument or do more complex verifications. This is what custom types are for.
A custom type is a function that takes a list of arguments and returns the formatted/checked value to the parser:
-spec custom_type(Args) -> {ok, Value, RemainingArgs} | Failure when
      Args :: args(),
      Value :: any(),
      RemainingArgs :: args(),
      Failure :: any().
- `Args`: The list of arguments not yet consumed by the parser
- `Value`: The value you want to return to the parser
- `RemainingArgs`: The list of arguments your function didn't consume
- `Failure`: some explanation of why the function didn't accept the argument
Example 1:
Let's say your script has an option `-f FILE` where `FILE` must be an existing file. In this case the type `string` won't be enough. You could write your own function to perform this check:
existing_file([File | RemainingArgs]) ->
    case filelib:is_regular(File) of
        true -> {ok, File, RemainingArgs};
        _ -> {not_a_file, File}
    end.
To use your custom type:
Spec = #{
syntax => {any, [file]},
definitions => #{
file => erlarg:opt({"-f", "--file"}, existing_file),
existing_file => fun existing_file/1
}
}.
or directly as a syntax:
Spec = {any, [{file, erlarg:opt({"-f", "--file"}, fun existing_file/1)}]}.
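Another sketch following the same contract: a hypothetical `port/1` custom type that only accepts an integer between 1 and 65535. The module name `port_type`, the function name and the failure terms `{not_a_port, Arg}` and `{missing, port}` are assumptions chosen for this example; only the `{ok, Value, RemainingArgs}` success shape comes from the description above.

```erlang
-module(port_type).
-export([port/1]).

%% Custom type: consumes one argument and checks that it is a valid TCP port.
port([Arg | RemainingArgs]) ->
    case string:to_integer(Arg) of
        %% The whole argument must be an integer (no trailing characters)
        %% and it must lie in the valid port range.
        {Port, []} when Port >= 1, Port =< 65535 ->
            {ok, Port, RemainingArgs};
        _ ->
            {not_a_port, Arg}
    end;
port([]) ->
    %% No argument left to consume (failure term chosen for this sketch).
    {missing, port}.
```

Like `existing_file/1`, it could be referenced as `fun port_type:port/1` wherever a custom type is expected.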
Example 2:
In this case, your script needs to fetch the information of a particular user from a config file with the option `--consult USERS_FILE USER_ID`, where `USERS_FILE` is the file containing the users' data and `USER_ID` is the id of the user:
get_user_config([DatabaseFile, UserID | RemainingArgs]) ->
    case file:consult(DatabaseFile) of
        {ok, Users} ->
            case proplists:get_value(UserID, Users, not_found) of
                not_found -> {user_not_found, UserID};
                UserData -> {ok, UserData, RemainingArgs}
            end;
        Error -> {cannot_consult, DatabaseFile, Error}
    end;
get_user_config(_) ->
    {badarg, missing_arguments}.