BeamlabLanguages (beamlab_languages v0.2.0)

Linguistic metadata for human languages.

Answers questions like:

  • Does this language use grammatical gender? Which genders?
  • Is it written right-to-left?
  • What's the canonical English name? The endonym?
  • Can I collapse a BCP 47 tag like "en-US" to a base code?

All data is curated and embedded at compile time. No runtime file I/O, no GenServer, no ETS, no runtime dependencies.

Gender codes

Genders are returned as strings. Consumers commonly see "m" (masculine), "f" (feminine), and "n" (neuter), but also "c" (common) for the Continental Scandinavian and Dutch systems where masculine and feminine have merged: Danish, Dutch, Norwegian (Bokmål, via "no"), and Swedish all use ["c", "n"]. Pattern-match on all four codes: a case g do "m" -> ...; "f" -> ...; "n" -> ... end has no clause for "c", so for those languages it raises CaseClauseError, or falls through to a catch-all clause if one is present.
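A minimal sketch of a match that covers all four codes, plus a catch-all for anything a later version might add. The GenderLabel module and its label/1 function are hypothetical names used for illustration; they are not part of BeamlabLanguages.

```elixir
defmodule GenderLabel do
  # Hypothetical helper: maps a gender code string to an English label.
  def label("m"), do: "masculine"
  def label("f"), do: "feminine"
  def label("n"), do: "neuter"
  # "c" is the merged masculine/feminine class (Danish, Dutch, Norwegian, Swedish).
  def label("c"), do: "common"
  # Catch-all so an unexpected code degrades gracefully instead of raising.
  def label(other) when is_binary(other), do: "unknown (" <> other <> ")"
end

# The codes documented for Swedish are ["c", "n"].
IO.inspect(Enum.map(["c", "n"], &GenderLabel.label/1))
```

In an application the list would come from BeamlabLanguages.genders/1 rather than a literal.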

Verb conjugation

has_verb_conjugation?/1, verb_groups/1, persons/1, and conjugation_paradigm/1 expose pedagogical conjugation metadata for language-learning UIs: the modes/tenses a learner is taught, the group system (e.g. French -er/-ir/-re), and the pronoun list.

The contract is "true iff we've curated a paradigm", not "true iff the language inflects verbs". So has_verb_conjugation?("fr") is true, has_verb_conjugation?("zh") is false, and has_verb_conjugation?("en") is also false until an English paradigm is curated. v0.2 ships French only — more languages will be added as consumers need them.

Every label entry carries both :label_native (the term in the target language, e.g. "Indicatif") and :label_en (the canonical English rendering, e.g. "Indicative"). Order in every list is the teaching order — opinionated and stable across versions.

Quick start

iex> BeamlabLanguages.has_gender?("fr")
true

iex> BeamlabLanguages.genders("de")
["m", "f", "n"]

iex> BeamlabLanguages.direction("ar")
:rtl

iex> BeamlabLanguages.normalize("en-US")
"en"

iex> BeamlabLanguages.has_verb_conjugation?("fr")
true

Every function that takes a language code runs normalize/1 on it internally — pass "en-US", "FR", or " fr " and lookups still work.

Roadmap

Planned for future versions and intentionally not in v1: localized language names, plural rules, articles, case marking, noun classes, scripts, IPA inventory, honorific levels. Verb conjugation paradigms ship per-language as consumers need them (French only as of v0.2).

Summary

Functions

conjugation_paradigm(code)
Returns the conjugation paradigm (modes and their tenses), or nil.

direction(code)
Returns the writing direction.

genders(code)
Returns the list of gender codes a language uses.

get(code)
Returns the language struct for a code, or nil if unknown.

has_gender?(code)
Returns true iff the language uses grammatical gender.

has_verb_conjugation?(code)
Returns true iff a verb conjugation paradigm is curated for the language.

known?(code)
Returns true iff the code maps to a known language.

list()
Lists every known language struct, sorted by code.

list_codes()
Lists every known 2-letter base code, sorted.

name(code)
Canonical English name of the language. Returns nil for unknown codes.

native_name(code)
Native (endonym) name of the language: what speakers call it themselves.

normalize(input)
Normalizes a language input string to a 2-letter base code.

persons(code)
Returns the person/pronoun list for a language's conjugation, or nil.

verb_groups(code)
Returns the pedagogical verb groups for a language, or nil.

Types

code()

@type code() :: String.t()

direction()

@type direction() :: :ltr | :rtl

gender()

@type gender() :: String.t()

Functions

conjugation_paradigm(code)

@spec conjugation_paradigm(any()) :: map() | nil

Returns the conjugation paradigm — modes and their tenses — or nil.

Shape: %{modes: [%{key, label_native, label_en, tenses: [%{key, label_native, label_en}, ...]}, ...]}. Order of modes and tenses is the teaching order — opinionated and stable across versions.

Persons live separately under persons/1, not inside the paradigm, so the same paradigm can be paired with the language's pronoun list in the consumer UI.

Returns nil for languages without a curated paradigm.

Examples

iex> paradigm = BeamlabLanguages.conjugation_paradigm("fr")
iex> length(paradigm.modes)
4
iex> [first | _] = paradigm.modes
iex> first.key
"indicatif"
iex> first.label_native
"Indicatif"
iex> first.label_en
"Indicative"
iex> length(first.tenses)
8

iex> BeamlabLanguages.conjugation_paradigm("zh")
nil
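The documented shape can be walked with a nested comprehension. The sketch below flattens a paradigm into {mode, tense} label pairs while keeping the teaching order, and treats nil as an empty table. ParadigmRows is a hypothetical helper, and the sample map is made-up data that only follows the documented shape; it is not the real French paradigm.

```elixir
defmodule ParadigmRows do
  # Hypothetical helper: flattens a paradigm map into {mode_label, tense_label}
  # pairs, preserving the order of modes and of tenses within each mode.
  def rows(nil), do: []

  def rows(%{modes: modes}) do
    for mode <- modes, tense <- mode.tenses do
      {mode.label_en, tense.label_en}
    end
  end
end

# Made-up sample following the documented shape (tense keys are assumptions).
sample = %{
  modes: [
    %{
      key: "indicatif",
      label_native: "Indicatif",
      label_en: "Indicative",
      tenses: [
        %{key: "present", label_native: "Présent", label_en: "Present"},
        %{key: "imparfait", label_native: "Imparfait", label_en: "Imperfect"}
      ]
    }
  ]
}

IO.inspect(ParadigmRows.rows(sample))
```

Passing the result of BeamlabLanguages.conjugation_paradigm/1 straight into rows/1 would cover both the curated and the nil case.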

direction(code)

@spec direction(any()) :: direction()

Returns the writing direction.

Returns :ltr for unknown codes — the safe default for most rendering contexts where an unknown language shouldn't flip the page layout.

Examples

iex> BeamlabLanguages.direction("ar")
:rtl

iex> BeamlabLanguages.direction("en")
:ltr

iex> BeamlabLanguages.direction("xx")
:ltr
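One way the direction atom might be used when rendering is sketched below. DirAttr and its functions are hypothetical names; the sketch takes the atom as an argument, so the :ltr default for unknown codes carries through unchanged.

```elixir
defmodule DirAttr do
  # Hypothetical helper: turns a direction atom into an HTML dir attribute value.
  def html_dir(:rtl), do: "rtl"
  def html_dir(:ltr), do: "ltr"

  # Builds an opening <html> tag from a base code and a direction atom.
  def html_open(code, direction) when is_binary(code) do
    "<html lang=\"#{code}\" dir=\"#{html_dir(direction)}\">"
  end
end

IO.puts(DirAttr.html_open("ar", :rtl))
```

A caller would typically pass BeamlabLanguages.direction(code) as the second argument.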

genders(code)

@spec genders(any()) :: [gender()]

Returns the list of gender codes a language uses.

Returns [] both for languages without grammatical gender and for unknown codes, so an empty list alone does not say which case applies. Use known?/1 to tell them apart.

Examples

iex> BeamlabLanguages.genders("fr")
["m", "f"]

iex> BeamlabLanguages.genders("de")
["m", "f", "n"]

iex> BeamlabLanguages.genders("en")
[]
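Because an empty list covers two different situations, a caller that cares about the difference can combine the results of known?/1 and genders/1. The sketch below does that with a hypothetical GenderStatus.classify/2; the module and the returned atoms are assumptions for illustration.

```elixir
defmodule GenderStatus do
  # Hypothetical helper: takes the results of known?/1 and genders/1 and
  # separates "unknown code" from "known language without gender".
  def classify(false, _genders), do: :unknown_language
  def classify(true, []), do: :no_grammatical_gender
  def classify(true, genders) when is_list(genders), do: {:gendered, genders}
end

# Literal inputs stand in for the library calls:
#   GenderStatus.classify(BeamlabLanguages.known?(code), BeamlabLanguages.genders(code))
IO.inspect(GenderStatus.classify(true, ["m", "f"]))
IO.inspect(GenderStatus.classify(true, []))
IO.inspect(GenderStatus.classify(false, []))
```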

get(code)

@spec get(any()) :: BeamlabLanguages.Language.t() | nil

Returns the language struct for a code, or nil if unknown.

Accepts BCP 47 input — "en-US", "zh-Hans-CN" — and sloppy casing. Lookups are normalized internally via normalize/1.

Examples

iex> BeamlabLanguages.get("fr").name
"French"

iex> BeamlabLanguages.get("en-US").code
"en"

iex> BeamlabLanguages.get("xx")
nil

has_gender?(code)

@spec has_gender?(any()) :: boolean()

Returns true iff the language uses grammatical gender.

Returns false for unknown / nil / non-string input rather than raising — callers (form validation, template rendering) often pass whatever they received from the user.

Examples

iex> BeamlabLanguages.has_gender?("fr")
true

iex> BeamlabLanguages.has_gender?("en")
false

iex> BeamlabLanguages.has_gender?("xx")
false

has_verb_conjugation?(code)

@spec has_verb_conjugation?(any()) :: boolean()

Returns true iff a verb conjugation paradigm is curated for the language.

The contract is data-driven: returns true exactly when conjugation_paradigm/1 would return non-nil for the same code. English and Swedish technically inflect verbs but currently return false — they have no curated paradigm yet.

Returns false for unknown / nil / non-string input.

Examples

iex> BeamlabLanguages.has_verb_conjugation?("fr")
true

iex> BeamlabLanguages.has_verb_conjugation?("zh")
false

iex> BeamlabLanguages.has_verb_conjugation?("xx")
false

known?(code)

@spec known?(any()) :: boolean()

Returns true iff the code maps to a known language.

Sugar over get/1. Returns false for unknown / nil / non-string input.

Examples

iex> BeamlabLanguages.known?("fr")
true

iex> BeamlabLanguages.known?("en-US")
true

iex> BeamlabLanguages.known?("xx")
false

list()

@spec list() :: [BeamlabLanguages.Language.t()]

Lists every known language struct, sorted by code.

Sort order is stable so the result can drive UI dropdowns without flicker.

Examples

iex> langs = BeamlabLanguages.list()
iex> hd(langs).__struct__
BeamlabLanguages.Language
iex> length(langs) > 0
true

list_codes()

@spec list_codes() :: [code()]

Lists every known 2-letter base code, sorted.

Examples

iex> "en" in BeamlabLanguages.list_codes()
true

iex> codes = BeamlabLanguages.list_codes()
iex> codes == Enum.sort(codes)
true

name(code)

@spec name(any()) :: String.t() | nil

Canonical English name of the language. Returns nil for unknown codes.

Examples

iex> BeamlabLanguages.name("fr")
"French"

iex> BeamlabLanguages.name("ja")
"Japanese"

iex> BeamlabLanguages.name("xx")
nil

native_name(code)

@spec native_name(any()) :: String.t() | nil

Native (endonym) name of the language — what speakers call it themselves.

Returns nil for unknown codes.

Examples

iex> BeamlabLanguages.native_name("fr")
"Français"

iex> BeamlabLanguages.native_name("ja")
"日本語"

iex> BeamlabLanguages.native_name("xx")
nil

normalize(input)

@spec normalize(any()) :: code() | nil

Normalizes a language input string to a 2-letter base code.

  • Strips dialect tags ("en-US" → "en", "zh-Hans-CN" → "zh")
  • Accepts _ as a separator too ("en_US" → "en")
  • Lowercases ("FR" → "fr")
  • Trims whitespace
  • Maps deprecated / regional bases to their canonical entry: "nb" (Bokmål) and "nn" (Nynorsk) collapse to "no" (Norwegian)
  • Returns nil if no plausible 2-letter base can be extracted

This is what every other function calls internally before looking up a code, so consumers never need to normalize before calling get/1, name/1, etc. — but it's exposed because consumers sometimes need the bare base code for their own purposes.

Examples

iex> BeamlabLanguages.normalize("en-US")
"en"

iex> BeamlabLanguages.normalize("FR")
"fr"

iex> BeamlabLanguages.normalize("zh-Hans-CN")
"zh"

iex> BeamlabLanguages.normalize("nb-NO")
"no"

iex> BeamlabLanguages.normalize("")
nil

iex> BeamlabLanguages.normalize(nil)
nil
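The rules listed above can be sketched in a few lines. This is an illustrative re-implementation under stated assumptions, not the library's actual code: NormalizeSketch is a hypothetical module, and "plausible 2-letter base" is assumed to mean exactly two ASCII letters before the first separator.

```elixir
defmodule NormalizeSketch do
  # Regional bases that collapse to a canonical entry, per the list above.
  @aliases %{"nb" => "no", "nn" => "no"}

  def normalize(input) when is_binary(input) do
    base =
      input
      |> String.trim()
      |> String.downcase()
      # Keep only what precedes the first "-" or "_".
      |> String.split(["-", "_"], parts: 2)
      |> hd()

    # Assumption: a plausible base is exactly two ASCII letters.
    if base =~ ~r/\A[a-z]{2}\z/ do
      Map.get(@aliases, base, base)
    else
      nil
    end
  end

  # nil, atoms, numbers, and other non-strings normalize to nil.
  def normalize(_other), do: nil
end

IO.inspect(NormalizeSketch.normalize("zh-Hans-CN"))
IO.inspect(NormalizeSketch.normalize("nb-NO"))
```

How the real normalize/1 treats inputs such as 3-letter codes is not stated in these docs, so the sketch simply returns nil for them.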

persons(code)

@spec persons(any()) :: [map()] | nil

Returns the person/pronoun list for a language's conjugation, or nil.

Each entry is a map with :key (a stable identifier like "1sg" or "3pl"), :label_native (the pronoun in the target language), and :label_en (the English gloss, useful for learner UIs).

The set of person keys may vary by language — a future Slovenian entry would add a dual, Arabic would split 2nd person by gender, etc. Don't assume a fixed six-person shape.

Returns nil for languages without a curated paradigm.

Examples

iex> persons = BeamlabLanguages.persons("fr")
iex> length(persons)
6
iex> hd(persons)
%{key: "1sg", label_native: "je", label_en: "I"}

iex> BeamlabLanguages.persons("zh")
nil
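To avoid baking in a six-person layout, a table can be driven by whatever list persons/1 returns. The sketch below pairs each person with a form looked up by :key. ConjugationTable is a hypothetical helper, and the forms map is consumer-supplied data (the docs do not describe the library providing conjugated forms). In the sample, the "1sg" entry matches the documented French value; the "3pl" labels are made up.

```elixir
defmodule ConjugationTable do
  # Hypothetical helper: pairs each person's native label with its form.
  # Persons without an entry in `forms` get nil instead of breaking the table.
  def rows(nil, _forms), do: []

  def rows(persons, forms) when is_list(persons) and is_map(forms) do
    Enum.map(persons, fn person ->
      {person.label_native, Map.get(forms, person.key)}
    end)
  end
end

persons = [
  %{key: "1sg", label_native: "je", label_en: "I"},
  # Made-up labels for illustration only.
  %{key: "3pl", label_native: "ils/elles", label_en: "they"}
]

# Consumer-supplied forms keyed by person key.
forms = %{"1sg" => "parle", "3pl" => "parlent"}

IO.inspect(ConjugationTable.rows(persons, forms))
```

A language with a dual or gender-split second person would simply yield more rows, with no change to the helper.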

verb_groups(code)

@spec verb_groups(any()) :: [map()] | nil

Returns the pedagogical verb groups for a language, or nil.

Verb groups are the curriculum buckets used to teach conjugation (French's -er / -ir / -re, Spanish's -ar / -er / -ir, etc.). Each entry is a map with :key, :label_native (target language), and :label_en (English).

Returns nil when no paradigm is curated, and also when the language has a paradigm but no meaningful pedagogical group system.

Examples

iex> groups = BeamlabLanguages.verb_groups("fr")
iex> length(groups)
3
iex> hd(groups)
%{key: "1", label_native: "1er groupe (verbes en -er)", label_en: "1st group (-er verbs)"}

iex> BeamlabLanguages.verb_groups("zh")
nil

iex> BeamlabLanguages.verb_groups("xx")
nil
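Since nil can mean either "no paradigm" or "no meaningful group system", rendering code may prefer to collapse both into an empty list. A minimal sketch, assuming a hypothetical VerbGroupFilter module that builds {label, key} options for a filter control:

```elixir
defmodule VerbGroupFilter do
  # Hypothetical helper: builds {label, key} options from a verb group list.
  # nil (no paradigm, or no group system) becomes an empty option list.
  def options(nil), do: []

  def options(groups) when is_list(groups) do
    Enum.map(groups, fn group -> {group.label_en, group.key} end)
  end
end

# First French entry as shown in the example above.
groups = [
  %{key: "1", label_native: "1er groupe (verbes en -er)", label_en: "1st group (-er verbs)"}
]

IO.inspect(VerbGroupFilter.options(groups))
IO.inspect(VerbGroupFilter.options(nil))
```

The same function can take the return value of BeamlabLanguages.verb_groups/1 directly, so the caller never branches on nil.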