BeamlabLanguages (beamlab_languages v0.1.0)
Linguistic metadata for human languages.
Answers questions like:
- Does this language use grammatical gender? Which genders?
- Is it written right-to-left?
- What's the canonical English name? The endonym?
- Can I collapse a BCP 47 tag like "en-US" to a base code?
All data is curated and embedded at compile time. No runtime file I/O, no GenServer, no ETS, no runtime dependencies.
Gender codes
Genders are returned as strings. Consumers commonly see "m" (masculine),
"f" (feminine), and "n" (neuter), but also "c" (common) for the
Continental Scandinavian and Dutch systems where masculine and feminine
have merged: Danish, Dutch, Norwegian Bokmål (via "no"), and Swedish all
use ["c", "n"]. Pattern-match on all four. A case g do "m" -> ...; "f" -> ...; "n" -> ... end
has no clause for "c", so it raises CaseClauseError for those languages, or
silently falls into the catch-all clause if one is present.
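As a minimal sketch of matching all four codes, the module below maps each code to a display label. GenderLabel and label/1 are hypothetical names used only for illustration and are not part of BeamlabLanguages; the fallback clause is an assumption about how a caller might want to handle a code introduced by a later release.

```elixir
defmodule GenderLabel do
  @moduledoc "Hypothetical helper, not part of BeamlabLanguages."

  # One clause per gender code documented above.
  def label("m"), do: "masculine"
  def label("f"), do: "feminine"
  def label("n"), do: "neuter"
  def label("c"), do: "common"

  # Fallback (an assumption): keeps the caller alive if a new code ever appears.
  def label(other) when is_binary(other), do: "other (" <> other <> ")"
end

# ["c", "n"] is the value documented for Danish, Dutch, Norwegian and Swedish.
Enum.map(["c", "n"], &GenderLabel.label/1)
# => ["common", "neuter"]
```

In an application the list would come from BeamlabLanguages.genders/1 rather than a literal.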
Quick start
iex> BeamlabLanguages.has_gender?("fr")
true
iex> BeamlabLanguages.genders("de")
["m", "f", "n"]
iex> BeamlabLanguages.direction("ar")
:rtl
iex> BeamlabLanguages.normalize("en-US")
"en"
Every function that takes a language code runs normalize/1 on it
internally: pass "en-US", "FR", or " fr " and lookups still work.
Roadmap
Planned for future versions and intentionally not in v1: localized language names, plural rules, articles, case marking, noun classes, scripts, IPA inventory, honorific levels.
Summary
Functions
- direction/1: Returns the writing direction.
- genders/1: Returns the list of gender codes a language uses.
- get/1: Returns the language struct for a code, or nil if unknown.
- has_gender?/1: Returns true iff the language uses grammatical gender.
- known?/1: Returns true iff the code maps to a known language.
- list/0: Lists every known language struct, sorted by code.
- list_codes/0: Lists every known 2-letter base code, sorted.
- name/1: Canonical English name of the language. Returns nil for unknown codes.
- native_name/1: Native (endonym) name of the language, i.e. what speakers call it themselves.
- normalize/1: Normalizes a language input string to a 2-letter base code.
Types
Functions
direction/1
Returns the writing direction.
Returns :ltr for unknown codes — the safe default for most rendering
contexts where an unknown language shouldn't flip the page layout.
Examples
iex> BeamlabLanguages.direction("ar")
:rtl
iex> BeamlabLanguages.direction("en")
:ltr
iex> BeamlabLanguages.direction("xx")
:ltr
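Because unknown codes fall back to :ltr, a caller only has two atoms to handle. The sketch below turns the returned atom into an HTML dir attribute; DirAttr and html_dir/1 are hypothetical names, not part of the library.

```elixir
defmodule DirAttr do
  # Hypothetical helper. Takes the atom returned by BeamlabLanguages.direction/1.
  # Only :rtl and :ltr are documented return values, so two clauses suffice.
  def html_dir(:rtl), do: ~s(dir="rtl")
  def html_dir(:ltr), do: ~s(dir="ltr")
end

# In an application: DirAttr.html_dir(BeamlabLanguages.direction(locale))
DirAttr.html_dir(:rtl)
# => "dir=\"rtl\""
```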
genders/1
Returns the list of gender codes a language uses.
Returns [] both for languages without grammatical gender and for
unknown codes; use known?/1 to tell the two cases apart.
Examples
iex> BeamlabLanguages.genders("fr")
["m", "f"]
iex> BeamlabLanguages.genders("de")
["m", "f", "n"]
iex> BeamlabLanguages.genders("en")
[]
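Since [] is returned both for a genderless language and for an unknown code, a caller that cares about the difference could combine the result with known?/1. The following is a rough sketch; GenderInfo and classify/2 are hypothetical names, and the arguments are assumed to be the values returned by known?/1 and genders/1 for the same code.

```elixir
defmodule GenderInfo do
  # Hypothetical helper. First argument: result of BeamlabLanguages.known?/1.
  # Second argument: result of BeamlabLanguages.genders/1 for the same code.
  def classify(false, _genders), do: :unknown_language
  def classify(true, []), do: :no_grammatical_gender
  def classify(true, genders) when is_list(genders), do: {:gendered, genders}
end

# In an application:
#   GenderInfo.classify(BeamlabLanguages.known?(code), BeamlabLanguages.genders(code))
GenderInfo.classify(true, ["m", "f"])
# => {:gendered, ["m", "f"]}
GenderInfo.classify(false, [])
# => :unknown_language
```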
get/1
@spec get(any()) :: BeamlabLanguages.Language.t() | nil
Returns the language struct for a code, or nil if unknown.
Accepts BCP 47 input — "en-US", "zh-Hans-CN" — and sloppy casing.
Lookups are normalized internally via normalize/1.
Examples
iex> BeamlabLanguages.get("fr").name
"French"
iex> BeamlabLanguages.get("en-US").code
"en"
iex> BeamlabLanguages.get("xx")
nil
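Because get/1 returns nil rather than raising, callers usually need a nil branch. A small sketch, assuming only the name and code fields shown in the examples above; LangLabel and label/1 are hypothetical names, and a plain map stands in for the struct so the snippet runs without the library.

```elixir
defmodule LangLabel do
  # Hypothetical helper. `lang` is whatever BeamlabLanguages.get/1 returned.
  def label(nil), do: "Unknown language"
  # A map pattern also matches structs, so this clause accepts the real struct too.
  def label(%{name: name, code: code}), do: "#{name} (#{code})"
end

# A plain map stands in for %BeamlabLanguages.Language{} here.
LangLabel.label(%{name: "French", code: "fr"})
# => "French (fr)"
LangLabel.label(nil)
# => "Unknown language"
```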
has_gender?/1
Returns true iff the language uses grammatical gender.
Returns false for unknown / nil / non-string input rather than
raising — callers (form validation, template rendering) often pass
whatever they received from the user.
Examples
iex> BeamlabLanguages.has_gender?("fr")
true
iex> BeamlabLanguages.has_gender?("en")
false
iex> BeamlabLanguages.has_gender?("xx")
false
known?/1
Returns true iff the code maps to a known language.
Sugar over get/1. Returns false for unknown / nil / non-string input.
Examples
iex> BeamlabLanguages.known?("fr")
true
iex> BeamlabLanguages.known?("en-US")
true
iex> BeamlabLanguages.known?("xx")
false
list/0
@spec list() :: [BeamlabLanguages.Language.t()]
Lists every known language struct, sorted by code.
Sort order is stable so the result can drive UI dropdowns without flicker.
Examples
iex> langs = BeamlabLanguages.list()
iex> hd(langs).__struct__
BeamlabLanguages.Language
iex> length(langs) > 0
true
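To illustrate the dropdown use case, the sketch below maps language structs to {label, value} tuples, a shape many select helpers accept. LangOptions and options/1 are hypothetical names; the sample data is a stand-in for the result of list/0 and relies only on the name and code fields shown elsewhere in these docs.

```elixir
defmodule LangOptions do
  # Hypothetical helper: turns language structs into {label, value} tuples.
  # Input order is preserved, so the sorted order of list/0 carries through.
  def options(langs) do
    Enum.map(langs, fn lang -> {lang.name, lang.code} end)
  end
end

# Stand-in for BeamlabLanguages.list(); plain maps instead of structs.
sample = [%{code: "de", name: "German"}, %{code: "fr", name: "French"}]
LangOptions.options(sample)
# => [{"German", "de"}, {"French", "fr"}]
```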
list_codes/0
@spec list_codes() :: [code()]
Lists every known 2-letter base code, sorted.
Examples
iex> "en" in BeamlabLanguages.list_codes()
true
iex> codes = BeamlabLanguages.list_codes()
iex> codes == Enum.sort(codes)
true
name/1
Canonical English name of the language. Returns nil for unknown codes.
Examples
iex> BeamlabLanguages.name("fr")
"French"
iex> BeamlabLanguages.name("ja")
"Japanese"
iex> BeamlabLanguages.name("xx")
nil
native_name/1
Native (endonym) name of the language, i.e. what speakers call it themselves.
Returns nil for unknown codes.
Examples
iex> BeamlabLanguages.native_name("fr")
"Français"
iex> BeamlabLanguages.native_name("ja")
"日本語"
iex> BeamlabLanguages.native_name("xx")
nil
normalize/1
Normalizes a language input string to a 2-letter base code.
- Strips dialect tags ("en-US" → "en", "zh-Hans-CN" → "zh")
- Accepts _ as a separator too ("en_US" → "en")
- Lowercases ("FR" → "fr")
- Trims whitespace
- Maps deprecated / regional bases to their canonical entry: "nb" (Bokmål) and "nn" (Nynorsk) collapse to "no" (Norwegian)
- Returns nil if no plausible 2-letter base can be extracted
This is what every other function calls internally before looking up
a code, so consumers never need to normalize before calling get/1,
name/1, etc. — but it's exposed because consumers sometimes need
the bare base code for their own purposes.
Examples
iex> BeamlabLanguages.normalize("en-US")
"en"
iex> BeamlabLanguages.normalize("FR")
"fr"
iex> BeamlabLanguages.normalize("zh-Hans-CN")
"zh"
iex> BeamlabLanguages.normalize("nb-NO")
"no"
iex> BeamlabLanguages.normalize("")
nil
iex> BeamlabLanguages.normalize(nil)
nil
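The rules listed above can be approximated in a few lines. This is an illustrative sketch, not the library's implementation: NormalizeSketch is a hypothetical module, the alias table contains only the two entries the docs mention, and treating anything other than exactly two ASCII letters as "no plausible base" is an assumption.

```elixir
defmodule NormalizeSketch do
  # Only the aliases named in the docs; the real table may be larger.
  @aliases %{"nb" => "no", "nn" => "no"}

  def normalize(input) when is_binary(input) do
    base =
      input
      |> String.trim()
      |> String.downcase()
      # Keep everything before the first "-" or "_" separator.
      |> String.split(["-", "_"], parts: 2)
      |> hd()

    # Assumption: a plausible base is exactly two lowercase ASCII letters.
    if base =~ ~r/\A[a-z]{2}\z/ do
      Map.get(@aliases, base, base)
    else
      nil
    end
  end

  # nil and other non-string input yield nil instead of raising.
  def normalize(_other), do: nil
end

NormalizeSketch.normalize("nb-NO")
# => "no"
NormalizeSketch.normalize(" FR ")
# => "fr"
```

Trimming and lowercasing before splitting keeps the separator handling simple, since the base subtag always comes first in a BCP 47 tag.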