mix text.gen_afinn_lexicons (Text v0.6.0)

Converts the AFINN data vendored under data/affin/ into per-language TSV files under priv/sentiment/, ready for compile-time loading by Text.Sentiment.Lexicons.AFINN.

Three sources are processed:

  • data/affin/languages/AFINN-<tag>.json — per-language word→score maps. Each becomes priv/sentiment/afinn-<tag>.tsv.

    By default, hand-curated TSVs already present in priv/sentiment/ are preserved; only languages that don't yet have a curated TSV are written from the vendored JSON. (For example, the existing Polish TSV has more entries than the upstream JSON, so it is kept.) Pass --overwrite to regenerate every TSV from the JSON instead.

  • data/affin/emojis/Emoji_Sentiment_Data_v1.0.csv — Emoji Sentiment Ranking 1.0 frequency data. Mapped onto AFINN's −5..+5 scale via round((Positive − Negative) / Occurrences × 5) and written to priv/sentiment/afinn-emoji.tsv.

  • data/affin/languages/negators/all.json — per-language lists of negation phrases. Written to priv/sentiment/negators.tsv as <lang>\t<phrase> rows for compile-time loading.
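
The emoji mapping and the negator row format described above can be sketched as follows. This is a minimal illustration, not the task's actual code: the module LexiconSketch and its functions are hypothetical names, the counts in the usage lines are made up rather than taken from the dataset, and it assumes Kernel.round/1 (which rounds halves away from zero) is the rounding in use.

```elixir
defmodule LexiconSketch do
  # Hypothetical helper, not part of Text.
  # Maps Emoji Sentiment Ranking counts onto AFINN's -5..+5 integer scale
  # using round((Positive - Negative) / Occurrences * 5).
  def emoji_score(positive, negative, occurrences)
      when is_integer(occurrences) and occurrences > 0 do
    # `/` is always float division in Elixir; round/1 returns an integer.
    round((positive - negative) / occurrences * 5)
  end

  # Hypothetical helper: formats one negators.tsv row as <lang>\t<phrase>.
  def negator_row(lang, phrase) do
    "#{lang}\t#{phrase}"
  end
end

# Illustrative counts only (not real dataset values).
IO.inspect(LexiconSketch.emoji_score(70, 10, 100))   # mostly positive usage
IO.inspect(LexiconSketch.emoji_score(10, 70, 100))   # mostly negative usage
IO.inspect(LexiconSketch.negator_row("en", "not"))
```

Because Negative can never exceed Occurrences and Positive can never exceed it either, the ratio stays within −1..+1, so the result always lands inside AFINN's −5..+5 range.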

Usage

mix text.gen_afinn_lexicons               # additive: keep curated TSVs
mix text.gen_afinn_lexicons --overwrite   # regenerate every TSV from JSON
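
The difference between the two invocations comes down to one decision per language file: write it, or leave the curated TSV alone. A rough sketch of that decision, assuming the flag is read with OptionParser (the task's real option handling is not shown here, and WritePolicySketch is a hypothetical module name):

```elixir
defmodule WritePolicySketch do
  # Hypothetical helper, not part of Text.
  # Reads the --overwrite switch from the task's argv; defaults to false.
  def overwrite?(argv) do
    {opts, _args, _invalid} =
      OptionParser.parse(argv, strict: [overwrite: :boolean])

    Keyword.get(opts, :overwrite, false)
  end

  # Decides whether a TSV should be written.
  # `exists` is whether a TSV is already present in priv/sentiment/.
  # Additive mode writes only missing files; --overwrite writes all of them.
  def write?(exists, overwrite) when is_boolean(exists) and is_boolean(overwrite) do
    overwrite or not exists
  end
end

# Additive run: an existing curated TSV is kept, a missing one is written.
IO.inspect(WritePolicySketch.write?(true, WritePolicySketch.overwrite?([])))
IO.inspect(WritePolicySketch.write?(false, WritePolicySketch.overwrite?([])))

# --overwrite run: every TSV is regenerated.
IO.inspect(WritePolicySketch.write?(true, WritePolicySketch.overwrite?(["--overwrite"])))
```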