Cldr Messages
Introduction and Getting Started
Implements the ICU Message Format for Elixir.
In any application that addresses audiences from different cultures, the need arises to support the presentation of user interfaces, messages, alerts and other content in the appropriate language for a user.
For nearly 30 years the go-to solution for this requirement in many computer languages has been gettext. There is a full-featured implementation for Elixir, installed by default with Phoenix, with over 10,000,000 downloads.
Given the maturity and widespread adoption of Gettext, why implement another format? Leveraging the content from the Unicode CLDR project we can address some of the shortcomings of Gettext. A good description of motivations and differences can be found in this presentation by Mark Davis from Google in 2012.
Two specific shortcomings that the ICU message format addresses:
Grammatical Gender
Many languages inflect in a gender-specific way. One example in French might be:
# "You are the only participant", addressed to a male and to a female
Vous êtes le seul participant
Vous êtes la seule participante
# Married for a male and a female
Marié
Mariée
In Gettext this requires individual messages and conditional code in the application in order to present the correct message to an audience. This is compounded by the fact that some languages have more than two grammatical genders (most have between two and four, but some are attested with up to 20).
The ICU message format provides a mechanism (the select format) that helps translators and UX designers implement a single message that encapsulates the variants conditional on grammatical gender (or any other selector).
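To make the idea concrete, here is a minimal sketch in plain Elixir of what a select-style message does: it picks the clause that matches a selector and falls back to the `other` clause. The module `SelectSketch` and its function are hypothetical illustrations and not part of ex_cldr_messages; the ICU message shown in the comment simply reuses the French example above.

```elixir
defmodule SelectSketch do
  # Hypothetical helper, not part of ex_cldr_messages.
  # A comparable ICU message might look like:
  #   {gender, select,
  #     female {Vous êtes la seule participante}
  #     other {Vous êtes le seul participant}}
  @clauses %{
    "female" => "Vous êtes la seule participante",
    "other" => "Vous êtes le seul participant"
  }

  # Returns the clause matching the selector, or the "other" clause
  # when no specific clause exists for it.
  def only_participant(gender) do
    Map.get(@clauses, to_string(gender), Map.fetch!(@clauses, "other"))
  end
end
```

The point of the design is that the translator owns all the variants inside one message, so the application code only supplies the selector value.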
Standardised plural rules
Although Gettext supports pluralisation for messages through the Gettext.Plural module in Elixir and the Gettext functions like Gettext.ngettext/4, the plural rules for a language have to be implemented for each message. Given the wide differences in how plural forms are structured in different languages this can be a material challenge. For example:
- English has two plural forms: singular and plural
- French applies the singular rule to two values (0 and 1) and the plural form to larger quantities
- Japanese does not differentiate
- Russian has 4 categories
- Arabic has 6 categories
Since CLDR has a strong set of pluralization rules defined for ~500 locales, each of which is supported by ex_cldr for Elixir, the ICU message format can reuse these pluralization rules in a simple and consistent fashion using the plural format.
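As a rough illustration of how much the rules differ between languages, the sketch below hand-codes plural categories for a few of the languages listed above. It is a simplification that only handles non-negative integers, and the module `PluralSketch` is hypothetical: it is not how ex_cldr implements the CLDR rules, which also cover fractions and further categories.

```elixir
defmodule PluralSketch do
  # Hypothetical, simplified plural rules for non-negative integers only.
  # This is not the ex_cldr implementation of the CLDR rules.

  # English: singular for exactly 1, plural otherwise.
  def category(:en, 1), do: :one
  def category(:en, n) when is_integer(n), do: :other

  # French: both 0 and 1 take the singular form.
  def category(:fr, n) when n in [0, 1], do: :one
  def category(:fr, n) when is_integer(n), do: :other

  # Japanese: no distinction at all.
  def category(:ja, n) when is_integer(n), do: :other

  # Russian: integers fall into :one, :few or :many
  # (the fourth category, :other, applies to fractions).
  def category(:ru, n) when is_integer(n) do
    cond do
      rem(n, 10) == 1 and rem(n, 100) != 11 -> :one
      rem(n, 10) in 2..4 and rem(n, 100) not in 12..14 -> :few
      true -> :many
    end
  end
end
```

With this sketch, `PluralSketch.category(:ru, 21)` returns `:one` while `PluralSketch.category(:ru, 11)` returns `:many`, which is the kind of rule a translator should not have to re-implement per message.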
Message format overview
ICU message formats are Elixir strings with embedded formatting directives inserted between braces ({}). Some examples:
# Insert the binding `name` into the string
"My name is {name}"
# Insert a date, formatting in a localized `short` format plus a localized plural form
# for the binding `num_photos`
"On {taken_date, date, short} {name} took {num_photos, plural,
=0 {no photos.}
=1 {one photo.}
other {# photos.}}"
# Insert localized messages based upon the gender of the audience with
# appropriate localized plural forms
"{gender_of_host, select,
female {
{num_guests, plural, offset: 1
=0 {{host} does not give a party.}
=1 {{host} invites {guest} to her party.}
=2 {{host} invites {guest} and one other person to her party.}
other {{host} invites {guest} and # other people to her party.}}}
male {
{num_guests, plural, offset: 1
=0 {{host} does not give a party.}
=1 {{host} invites {guest} to his party.}
=2 {{host} invites {guest} and one other person to his party.}
other {{host} invites {guest} and # other people to his party.}}}
other {
{num_guests, plural, offset: 1
=0 {{host} does not give a party.}
=1 {{host} invites {guest} to their party.}
=2 {{host} invites {guest} and one other person to their party.}
other {{host} invites {guest} and # other people to their party.}}}
}"
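The `offset` in the last example can be confusing at first. In ICU message format, explicit `=N` clauses are compared against the raw value, while `#` is replaced by the value minus the offset. The plain Elixir sketch below mirrors the `other` gender branch to show the arithmetic; the module `OffsetSketch` is a hypothetical illustration, not part of ex_cldr_messages.

```elixir
defmodule OffsetSketch do
  # Hypothetical helper mirroring the `other` gender branch above.
  # The named guest is already mentioned, so one is subtracted
  # from the count that is displayed in place of `#`.
  @offset 1

  # Explicit =0, =1 and =2 clauses match the raw number of guests.
  def invite(host, _guest, 0), do: "#{host} does not give a party."
  def invite(host, guest, 1), do: "#{host} invites #{guest} to their party."

  def invite(host, guest, 2),
    do: "#{host} invites #{guest} and one other person to their party."

  # The fallback clause displays the number of guests minus the offset.
  def invite(host, guest, n) when is_integer(n) and n > 2 do
    "#{host} invites #{guest} and #{n - @offset} other people to their party."
  end
end
```

For example, `OffsetSketch.invite("Alice", "Bob", 4)` returns "Alice invites Bob and 3 other people to their party."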
Message formatting
Using the above messages as examples:
iex> Cldr.Message.format! "My name is {name}", name: "Kip"
"My name is Kip"
iex> Cldr.Message.to_string! "On {taken_date, date, short} {name} took {num_photos, plural,
=0 {no photos.}
=1 {one photo.}
other {# photos.}}", taken_date: Date.utc_today, name: "Kip", num_photos: 10
"On 8/26/19 Kip took 10 photos."
As of ex_cldr_messages version 0.3.0, a macro form is available that parses the message at compile time in order to optimize performance at run time. To use the macro, a backend module must be imported (or required) into the module that uses formatting. For example:
defmodule SomeModule do
  # Import a <backend>.Cldr.Message module
  import MyApp.Cldr.Message

  def my_function do
    format("this is a string with a param {param}", param: 3)
  end
end
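The `MyApp.Cldr` backend referenced above has to be defined somewhere in the application. The fragment below is a sketch of what such a definition might look like, assuming the usual ex_cldr `use Cldr` backend configuration and assuming that `Cldr.Message` is listed as a provider alongside the formatters it relies on. The locale list is purely illustrative; check the ex_cldr and ex_cldr_messages documentation for the options supported by the version you install.

```elixir
defmodule MyApp.Cldr do
  # Assumed configuration; adjust locales and providers to your application.
  use Cldr,
    locales: ["en", "fr"],
    default_locale: "en",
    providers: [
      Cldr.Number,
      Cldr.DateTime,
      Cldr.Unit,
      Cldr.List,
      Cldr.Calendar,
      Cldr.Message
    ]
end
```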
Installation
def deps do
  [
    {:ex_cldr_messages, "~> 0.3.0"}
  ]
end
Documentation is at https://hexdocs.pm/cldr_messages.
To Do
The initial release is a simple function interface to message formatting. Before 1.0 it also needs a means, like Gettext, of managing messages in multiple locales for the same message content.
[X] Ignore whitespace between nested complex arguments at the top level. Example:
{:select, {:named_arg, "gender_of_host"}, %{
  "female" => [
    {:literal, "\n "},  <---- Ignore this when it's whitespace only
    {:plural, {:named_arg, "num_guests"},
[X] Support decimal numbers for selectors
[X] Won't do. Support spellout format for Money.t types? (Maybe can't because of floating point RBNF rule limitations)
[X] Check for all occurrences in READMEs for Cldr.get_current_locale/0 and change it to Cldr.get_locale/0
[X] Implement explicit =0 argument selection for plurals
[X] Add remaining formatters for dates, times, datetimes
[X] In ex_money, if no configured default_cldr_backend, delegate to Cldr.default_backend/0
[X] Implement a {arg, list, format} formatter that uses ex_cldr_lists
[X] Implement a {arg, unit, format} formatter that uses ex_cldr_units
[X] Implement offset
[X] Implement custom formats in backend config provider; probably requires updating the struct in ex_cldr
[X] Implement selectordinal
[X] Assert that plural, select and selectordinal all have an other clause
[X] Tests
[X] @specs
[ ] Dialyzer. Ask José to push nimble_parsec 0.5.2 to remove combinator errors
[ ] Documentation
[ ] Implement to_message for parse trees. This will define a canonical form which we can use to compare messages and create keys.