BeamlabLanguages (beamlab_languages v0.7.0)
Linguistic metadata for human languages.
Answers questions like:
- Does this language use grammatical gender? Which genders?
- Is it written right-to-left?
- What's the canonical English name? The endonym?
- Can I collapse a BCP 47 tag like "en-US" to a base code?
All data is curated and embedded at compile time. No runtime file I/O, no GenServer, no ETS, no runtime dependencies.
Gender codes
Genders are returned as strings. Consumers commonly see "m" (masculine),
"f" (feminine), and "n" (neuter), but also "c" (common) for the
Continental Scandinavian and Dutch systems where masculine and feminine
have merged: Danish, Dutch, Norwegian Bokmål via no, and Swedish all
use ["c", "n"]. Pattern-match on all four — a case g do "m" -> ...; "f" -> ...; "n" -> ... end will silently miss those languages.
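A minimal sketch of handling all four codes, assuming a hypothetical GenderLabel helper in the consuming application (it is not part of BeamlabLanguages). The input list stands in for what genders/1 is documented to return for the common/neuter languages:

```elixir
defmodule GenderLabel do
  # Hypothetical consumer-side helper, not part of BeamlabLanguages.
  # Covers "m", "f", "n" and "c", plus a catch-all so an unexpected code
  # produces a visible placeholder instead of a CaseClauseError.
  def label(code) do
    case code do
      "m" -> "masculine"
      "f" -> "feminine"
      "n" -> "neuter"
      "c" -> "common"
      other -> "unknown (#{inspect(other)})"
    end
  end

  def labels(codes), do: Enum.map(codes, &label/1)
end

# Stand-in for the documented result of genders/1 for Danish, Dutch,
# Norwegian, and Swedish.
IO.inspect(GenderLabel.labels(["c", "n"]))
```

The catch-all clause is a design choice for this sketch: it keeps a template rendering if a new code ever appears, at the cost of hiding the gap until someone notices the placeholder.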
Verb conjugation
has_verb_conjugation?/1, verb_groups/1, persons/1, persons/2,
conjugation_paradigm/1, tense_level/3, and reflexive?/2 expose
pedagogical conjugation metadata for language-learning UIs: the
modes/tenses a learner is taught, the proficiency level each tense is
taught at, the group system (e.g. French -er/-ir/-re), the pronoun list
(each tagged with grammatical :number), and whether a given lemma is a
reflexive / pronominal verb (reflexive?("fr", "se laver"),
reflexive?("it", "chiamarsi")).
The contract is "true iff we've curated a paradigm", not "true iff
the language inflects verbs". So has_verb_conjugation?("fr") is true,
has_verb_conjugation?("zh") is false, and has_verb_conjugation?("en")
is also false until an English paradigm is curated. v0.2 ships French
only — more languages will be added as consumers need them.
Every label entry carries both :label_native (the term in the target
language, e.g. "Indicatif") and :label_en (the canonical English
rendering, e.g. "Indicative"). Tenses additionally carry a :level
(CEFR / JLPT / HSK key) marking where they sit in the curriculum. Order
in every list is the teaching order — opinionated and stable across
versions.
Quick start
iex> BeamlabLanguages.has_gender?("fr")
true
iex> BeamlabLanguages.genders("de")
["m", "f", "n"]
iex> BeamlabLanguages.direction("ar")
:rtl
iex> BeamlabLanguages.normalize("en-US")
"en"
iex> BeamlabLanguages.has_verb_conjugation?("fr")
true
Every function that takes a language code runs normalize/1 on it
internally — pass "en-US", "FR", or " fr " and lookups still work.
Proficiency levels
level_systems/0, levels/1, level_system_label/1, and
level_info/2 expose curated proficiency level systems (CEFR,
JLPT, HSK) for language-learning UIs. Order is pedagogical
(A1→C2, N5→N1, HSK1→HSK6), not alphabetical.
To go the other way — from a language to its system — use
level_system/1 ("fr" → "cefr", "zh" → "hsk", "ja" →
"jlpt") and language_levels/1 (the level keys for a language in
one call). CEFR is the default for any known language without a more
specific system; Korean ("ko") returns nil because TOPIK isn't modeled.
Roadmap
Planned for future versions and intentionally not in v1: localized language names, plural rules, articles, case marking, noun classes, scripts, IPA inventory, honorific levels. Verb conjugation paradigms ship per-language as consumers need them (French only as of v0.2).
Functions
conjugation_paradigm/1
Returns the conjugation paradigm (modes and their tenses), or nil.
Shape: %{modes: [%{key, label_native, label_en, tenses: [%{key, label_native, label_en, level}, ...]}, ...]}. Order of modes and
tenses is the teaching order — opinionated and stable across
versions.
Each tense carries a :level, the proficiency level (a CEFR key like
"B1" for European languages, or the relevant HSK / JLPT key for
zh / ja) at which the tense/mood is typically taught. It's a property
of the tense in the language's curriculum, independent of any specific
verb. nil when the level is genuinely unknown; every French tense has
a value. Use tense_level/3 to read one without walking the tree.
Persons live separately under persons/1, not inside the paradigm,
so the same paradigm can be paired with the language's pronoun list
in the consumer UI.
Returns nil for languages without a curated paradigm.
Examples
iex> paradigm = BeamlabLanguages.conjugation_paradigm("fr")
iex> length(paradigm.modes)
4
iex> [first | _] = paradigm.modes
iex> first.key
"indicatif"
iex> first.label_native
"Indicatif"
iex> first.label_en
"Indicative"
iex> length(first.tenses)
8
iex> hd(first.tenses)
%{key: "present", label_native: "Présent", label_en: "Present", level: "A1"}
iex> BeamlabLanguages.conjugation_paradigm("zh")
nil
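As an illustration of consuming the documented shape, the sketch below flattens a paradigm into {mode_key, tense_key, level} rows. ParadigmRows is a hypothetical helper, and the sample map is a trimmed stand-in for the real return value: the two tense levels match the documented examples, while the subjonctif labels are assumed for illustration.

```elixir
defmodule ParadigmRows do
  # Hypothetical helper, not part of BeamlabLanguages.
  # A nil paradigm (no curated data) yields no rows.
  def rows(nil), do: []

  # Walks modes, then tenses, keeping the order of both lists.
  def rows(%{modes: modes}) do
    for mode <- modes, tense <- mode.tenses do
      {mode.key, tense.key, tense.level}
    end
  end
end

# Trimmed sample in the documented shape; the real French paradigm has
# more modes and tenses than shown here.
paradigm = %{
  modes: [
    %{
      key: "indicatif",
      label_native: "Indicatif",
      label_en: "Indicative",
      tenses: [%{key: "present", label_native: "Présent", label_en: "Present", level: "A1"}]
    },
    %{
      key: "subjonctif",
      label_native: "Subjonctif",
      label_en: "Subjunctive",
      tenses: [%{key: "present", label_native: "Présent", label_en: "Present", level: "B1"}]
    }
  ]
}

IO.inspect(ParadigmRows.rows(paradigm))
```

With the library installed, the sample map would be replaced by the result of BeamlabLanguages.conjugation_paradigm("fr").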
direction/1
Returns the writing direction.
Returns :ltr for unknown codes — the safe default for most rendering
contexts where an unknown language shouldn't flip the page layout.
Examples
iex> BeamlabLanguages.direction("ar")
:rtl
iex> BeamlabLanguages.direction("en")
:ltr
iex> BeamlabLanguages.direction("xx")
:ltr
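One way a rendering layer might use the result, sketched with a hypothetical DirAttr helper (not part of the library). It takes the atom that direction/1 returns and builds an HTML dir attribute:

```elixir
defmodule DirAttr do
  # Hypothetical helper, not part of BeamlabLanguages.
  # Accepts only the two documented direction atoms; anything else raises
  # a FunctionClauseError, which surfaces bad input early.
  def attr(:rtl), do: ~s(dir="rtl")
  def attr(:ltr), do: ~s(dir="ltr")
end

# :rtl is the documented result of direction/1 for "ar".
IO.puts(DirAttr.attr(:rtl))
```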
genders/1
Returns the list of gender codes a language uses.
Returns [] both for languages without grammatical gender and for
unknown codes.
Examples
iex> BeamlabLanguages.genders("fr")
["m", "f"]
iex> BeamlabLanguages.genders("de")
["m", "f", "n"]
iex> BeamlabLanguages.genders("en")
[]
get/1
@spec get(any()) :: BeamlabLanguages.Language.t() | nil
Returns the language struct for a code, or nil if unknown.
Accepts BCP 47 input — "en-US", "zh-Hans-CN" — and sloppy casing.
Lookups are normalized internally via normalize/1.
Examples
iex> BeamlabLanguages.get("fr").name
"French"
iex> BeamlabLanguages.get("en-US").code
"en"
iex> BeamlabLanguages.get("xx")
nil
has_gender?/1
Returns true iff the language uses grammatical gender.
Returns false for unknown / nil / non-string input rather than
raising — callers (form validation, template rendering) often pass
whatever they received from the user.
Examples
iex> BeamlabLanguages.has_gender?("fr")
true
iex> BeamlabLanguages.has_gender?("en")
false
iex> BeamlabLanguages.has_gender?("xx")
false
has_verb_conjugation?/1
Returns true iff a verb conjugation paradigm is curated for the language.
The contract is data-driven: returns true exactly when
conjugation_paradigm/1 would return non-nil for the same code.
English and Swedish technically inflect verbs but currently return
false — they have no curated paradigm yet.
Returns false for unknown / nil / non-string input.
Examples
iex> BeamlabLanguages.has_verb_conjugation?("fr")
true
iex> BeamlabLanguages.has_verb_conjugation?("zh")
false
iex> BeamlabLanguages.has_verb_conjugation?("xx")
false
known?/1
Returns true iff the code maps to a known language.
Sugar over get/1. Returns false for unknown / nil / non-string input.
Examples
iex> BeamlabLanguages.known?("fr")
true
iex> BeamlabLanguages.known?("en-US")
true
iex> BeamlabLanguages.known?("xx")
false
language_levels/1
Returns the proficiency level keys for a language, in pedagogical order.
Convenience for levels(level_system(code)): resolves the language's
level system and lists its keys. Returns [] when the language has no
curated system (e.g. Korean) and for unknown / nil codes — the same
empty result levels/1 gives for an unknown system.
Examples
iex> BeamlabLanguages.language_levels("fr")
["A1", "A2", "B1", "B2", "C1", "C2"]
iex> BeamlabLanguages.language_levels("zh")
["HSK1", "HSK2", "HSK3", "HSK4", "HSK5", "HSK6"]
iex> BeamlabLanguages.language_levels("ja")
["N5", "N4", "N3", "N2", "N1"]
iex> BeamlabLanguages.language_levels("ko")
[]
iex> BeamlabLanguages.language_levels("xx")
[]
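Because the order is pedagogical rather than alphabetical, comparing two level keys means comparing their positions in the list, not the strings. A minimal sketch, assuming a hypothetical LevelOrder helper and using the documented JLPT ordering as input:

```elixir
defmodule LevelOrder do
  # Hypothetical helper, not part of BeamlabLanguages.
  # True when `level` sits at or before `max` in the given ordering.
  # Returns false when either key is missing from the list.
  def at_or_below?(levels, level, max) do
    level_idx = Enum.find_index(levels, &(&1 == level))
    max_idx = Enum.find_index(levels, &(&1 == max))
    is_integer(level_idx) and is_integer(max_idx) and level_idx <= max_idx
  end
end

# Documented result of language_levels("ja").
jlpt = ["N5", "N4", "N3", "N2", "N1"]

# N4 is taught before N2, even though "N4" sorts after "N2" as a string.
IO.inspect(LevelOrder.at_or_below?(jlpt, "N4", "N2"))
```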
level_info/2
Returns metadata for a single level within a system.
Returns nil for unknown systems or unknown levels.
Examples
iex> BeamlabLanguages.level_info("cefr", "A1")
%{key: "A1", label: "A1", description: "Beginner"}
iex> BeamlabLanguages.level_info("cefr", "Z9")
nil
iex> BeamlabLanguages.level_info("unknown", "A1")
nil
level_system/1
Returns the proficiency level system for a language, or nil.
CEFR is the default for any known language without a more specific
system; Chinese ("zh") maps to HSK and Japanese ("ja") to JLPT.
Korean ("ko") has no curated system — TOPIK isn't among the three we
model — so it returns nil rather than a wrong default. The result is a
system key suitable for levels/1, level_system_label/1, etc.
Accepts BCP 47 input and sloppy casing like every other code-taking
function — lookups are normalized via normalize/1. Returns nil for
unknown / nil / non-string input.
Examples
iex> BeamlabLanguages.level_system("fr")
"cefr"
iex> BeamlabLanguages.level_system("fr-FR")
"cefr"
iex> BeamlabLanguages.level_system("zh")
"hsk"
iex> BeamlabLanguages.level_system("ja")
"jlpt"
iex> BeamlabLanguages.level_system("ko")
nil
iex> BeamlabLanguages.level_system("xx")
nil
level_system_label/1
Returns the human-readable label for a proficiency system.
Returns nil for unknown systems.
Examples
iex> BeamlabLanguages.level_system_label("cefr")
"CEFR"
iex> BeamlabLanguages.level_system_label("hsk")
"HSK"
iex> BeamlabLanguages.level_system_label("unknown")
nil
level_systems/0
@spec level_systems() :: [String.t()]
Lists every known proficiency level system key, sorted.
Examples
iex> "cefr" in BeamlabLanguages.level_systems()
true
iex> BeamlabLanguages.level_systems() == Enum.sort(BeamlabLanguages.level_systems())
true
levels/1
Lists the levels for a proficiency system, in pedagogical order.
Returns [] for unknown systems.
Examples
iex> BeamlabLanguages.levels("cefr")
["A1", "A2", "B1", "B2", "C1", "C2"]
iex> BeamlabLanguages.levels("jlpt")
["N5", "N4", "N3", "N2", "N1"]
iex> BeamlabLanguages.levels("unknown")
[]
list/0
@spec list() :: [BeamlabLanguages.Language.t()]
Lists every known language struct, sorted by code.
Sort order is stable so the result can drive UI dropdowns without flicker.
Examples
iex> langs = BeamlabLanguages.list()
iex> hd(langs).__struct__
BeamlabLanguages.Language
iex> length(langs) > 0
true
list_codes/0
@spec list_codes() :: [code()]
Lists every known 2-letter base code, sorted.
Examples
iex> "en" in BeamlabLanguages.list_codes()
true
iex> codes = BeamlabLanguages.list_codes()
iex> codes == Enum.sort(codes)
true
name/1
Canonical English name of the language. Returns nil for unknown codes.
Examples
iex> BeamlabLanguages.name("fr")
"French"
iex> BeamlabLanguages.name("ja")
"Japanese"
iex> BeamlabLanguages.name("xx")
nil
native_name/1
Native (endonym) name of the language: what speakers call it themselves.
Returns nil for unknown codes.
Examples
iex> BeamlabLanguages.native_name("fr")
"Français"
iex> BeamlabLanguages.native_name("ja")
"日本語"
iex> BeamlabLanguages.native_name("xx")
nil
normalize/1
Normalizes a language input string to a 2-letter base code.
- Strips dialect tags ("en-US" → "en", "zh-Hans-CN" → "zh")
- Accepts _ as a separator too ("en_US" → "en")
- Lowercases ("FR" → "fr")
- Trims whitespace
- Maps deprecated / regional bases to their canonical entry: "nb" (Bokmål) and "nn" (Nynorsk) collapse to "no" (Norwegian)
- Returns nil if no plausible 2-letter base can be extracted
This is what every other function calls internally before looking up
a code, so consumers never need to normalize before calling get/1,
name/1, etc. — but it's exposed because consumers sometimes need
the bare base code for their own purposes.
Examples
iex> BeamlabLanguages.normalize("en-US")
"en"
iex> BeamlabLanguages.normalize("FR")
"fr"
iex> BeamlabLanguages.normalize("zh-Hans-CN")
"zh"
iex> BeamlabLanguages.normalize("nb-NO")
"no"
iex> BeamlabLanguages.normalize("")
nil
iex> BeamlabLanguages.normalize(nil)
nil
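The rules listed above can be pictured with a standalone sketch. This is an illustration only, not the library's implementation: NormalizeSketch and its alias table are hypothetical, and the treatment of inputs the documentation does not cover (for example 3-letter bases, which this sketch rejects) is an assumption.

```elixir
defmodule NormalizeSketch do
  # Illustration of the documented rules; NOT BeamlabLanguages' own code.
  # Alias table limited to the two mappings the documentation names.
  @aliases %{"nb" => "no", "nn" => "no"}

  def base_code(input) when is_binary(input) do
    base =
      input
      |> String.trim()
      |> String.downcase()
      # Keep only the part before the first "-" or "_".
      |> String.split(["-", "_"], parts: 2)
      |> hd()

    # Assumption: only exactly two ASCII letters count as a plausible base.
    if base =~ ~r/\A[a-z]{2}\z/ do
      Map.get(@aliases, base, base)
    else
      nil
    end
  end

  # nil and other non-string input yield nil rather than raising.
  def base_code(_other), do: nil
end

IO.inspect(NormalizeSketch.base_code(" zh-Hans-CN "))
```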
persons/1
Returns the person/pronoun list for a language's conjugation, or nil.
Each entry is a map with :key (a stable identifier like "1sg" or
"3pl"), :label_native (the pronoun in the target language),
:label_en (the English gloss, useful for learner UIs), and :number
(:singular, :plural, or :dual for a future dual-marking language;
nil if the key carries no recognisable number). Order is the teaching
order — singular persons first, then plural — so the list can drive a
conjugation grid without a separate person-ordering table.
The set of person keys may vary by language — a future Slovenian entry
would add a dual, Arabic would split 2nd person by gender, etc. Don't
assume a fixed six-person shape; filter on :number (or use
persons/2) rather than slicing the list by position.
Returns nil for languages without a curated paradigm.
Examples
iex> persons = BeamlabLanguages.persons("fr")
iex> length(persons)
6
iex> hd(persons)
%{key: "1sg", label_native: "je", label_en: "I", number: :singular}
iex> BeamlabLanguages.persons("zh")
nil
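To make the "filter on :number, don't slice by position" advice concrete, here is a minimal sketch with a hypothetical PersonBlocks helper. The sample list is a shortened stand-in for the real return value: the "1sg" entry is the documented one, and the "1pl" labels are assumed for illustration.

```elixir
defmodule PersonBlocks do
  # Hypothetical helper, not part of BeamlabLanguages.
  # Groups persons by their :number tag. Enum.group_by/2 keeps the original
  # (teaching) order inside each group. nil (no curated paradigm) becomes
  # an empty map.
  def by_number(nil), do: %{}
  def by_number(persons), do: Enum.group_by(persons, & &1.number)
end

# Shortened sample in the documented shape.
persons = [
  %{key: "1sg", label_native: "je", label_en: "I", number: :singular},
  %{key: "1pl", label_native: "nous", label_en: "we", number: :plural}
]

blocks = PersonBlocks.by_number(persons)
IO.inspect(Map.keys(blocks))
```

A language with a dual would simply gain a :dual key in the result, with no change to the calling code.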
persons/2
Like persons/1, but filters by grammatical number.
Pass number: :singular, number: :plural, or number: :dual to get only
the persons in that number, in teaching order. With no :number option this
is identical to persons/1. Lets a UI render singular and plural blocks
without hardcoding which person keys belong to each.
Returns nil for languages without a curated paradigm (same as persons/1),
and [] when the language has persons but none in the requested number.
Examples
iex> BeamlabLanguages.persons("fr", number: :singular) |> Enum.map(& &1.key)
["1sg", "2sg", "3sg"]
iex> BeamlabLanguages.persons("fr", number: :plural) |> Enum.map(& &1.key)
["1pl", "2pl", "3pl"]
iex> BeamlabLanguages.persons("fr", number: :dual)
[]
iex> BeamlabLanguages.persons("zh", number: :singular)
nil
reflexive?/2
Returns true iff the lemma is a reflexive / pronominal verb in the language.
Recognises the language's reflexive marker on a dictionary-form lemma:
French's leading pronoun ("se laver", "s'appeler") and Italian's enclitic
-rsi ending ("chiamarsi", "mettersi"). The lemma is lowercased and
trimmed internally, so values straight from user input or a database column
work as-is.
The contract is "true iff we recognise a reflexive marker for a language we
have a rule for". Returns false for languages without a curated reflexive
rule, for non-reflexive lemmas, and for unknown / nil codes or non-string
lemmas — callers routinely pass whatever they have.
Examples
iex> BeamlabLanguages.reflexive?("fr", "se laver")
true
iex> BeamlabLanguages.reflexive?("fr", "s'appeler")
true
iex> BeamlabLanguages.reflexive?("fr", "manger")
false
iex> BeamlabLanguages.reflexive?("fr", "semer")
false
iex> BeamlabLanguages.reflexive?("it", "chiamarsi")
true
iex> BeamlabLanguages.reflexive?("it", "parlare")
false
iex> BeamlabLanguages.reflexive?("en", "wash oneself")
false
iex> BeamlabLanguages.reflexive?("xx", "se laver")
false
tense_level/3
Returns the proficiency level for a single tense, or nil.
Convenience reader over conjugation_paradigm/1 so consumers don't have
to walk the modes/tenses tree. The level is a CEFR key ("A1"…"C2")
for European languages, or the relevant HSK / JLPT key for zh / ja.
Returns nil for unknown / nil codes, unknown mode or tense keys, and
for tenses whose level is genuinely unknown.
Examples
iex> BeamlabLanguages.tense_level("fr", "subjonctif", "present")
"B1"
iex> BeamlabLanguages.tense_level("fr", "indicatif", "present")
"A1"
iex> BeamlabLanguages.tense_level("fr", "indicatif", "nonexistent")
nil
iex> BeamlabLanguages.tense_level("zh", "indicatif", "present")
nil
verb_groups/1
Returns the pedagogical verb groups for a language, or nil.
Verb groups are the curriculum buckets used to teach conjugation
(French's -er / -ir / -re, Spanish's -ar / -er / -ir, etc.). Each
entry is a map with :key, :label_native (target language), and
:label_en (English).
Returns nil when no paradigm is curated, and also when the
language has a paradigm but no meaningful pedagogical group system.
Examples
iex> groups = BeamlabLanguages.verb_groups("fr")
iex> length(groups)
3
iex> hd(groups)
%{key: "1", label_native: "1er groupe (verbes en -er)", label_en: "1st group (-er verbs)"}
iex> BeamlabLanguages.verb_groups("zh")
nil
iex> BeamlabLanguages.verb_groups("xx")
nil
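Since nil covers both "no paradigm" and "no meaningful group system", callers will usually want to collapse it before rendering. A minimal sketch, assuming a hypothetical GroupOptions helper that builds {label, value} pairs for a select input; the sample entry is the documented first French group:

```elixir
defmodule GroupOptions do
  # Hypothetical helper, not part of BeamlabLanguages.
  # nil (no curated groups) becomes an empty option list.
  def options(nil), do: []

  # Each group becomes a {label, value} pair, in the order given.
  def options(groups), do: Enum.map(groups, &{&1.label_en, &1.key})
end

# Documented first entry of verb_groups("fr").
groups = [
  %{key: "1", label_native: "1er groupe (verbes en -er)", label_en: "1st group (-er verbs)"}
]

IO.inspect(GroupOptions.options(groups))
IO.inspect(GroupOptions.options(nil))
```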