BubbleMatch.Token (bubble_match v0.7.3)

A token is a single word or another part of a sentence. A sentence is a sequence of tokens.

Each token carries the information and metadata used both to match sentences and to extract information from them.

Types

t()

@type t() :: %BubbleMatch.Token{
  end: term(),
  index: term(),
  raw: term(),
  start: term(),
  type: term(),
  value: term()
}

Tokens contain the following fields:

  • raw - the raw text value of the token, including any surrounding whitespace.

  • value - the normalized value of the token. In the case of word tokens, this is usually the normalized, lowercased version of the word. In the case of entities, this value holds a map with keys kind, provider and value.

  • start - the start index; where in the original sentence the token starts.

  • end - the end index; where in the original sentence the token ends.

  • index - the (zero-based) token index number; 0 if it's the first token, 1 if it's the second, etc.

  • type - the type of the token; an atom, one of :entity, :spacy, or :naive, depending on how the token was originally created.
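To make the field list concrete, here is a minimal sketch that builds naive tokens for a sentence. `TokenSketch` and `naive_tokens/1` are hypothetical stand-ins, not part of bubble_match; the sketch assumes `start` and `end` are byte offsets with an exclusive end, and it leaves surrounding whitespace out of `raw` for simplicity. The library itself may differ on all of these points.

```elixir
defmodule TokenSketch do
  # Hypothetical stand-in for %BubbleMatch.Token{}, using the documented field names.
  defstruct [:raw, :value, :start, :end, :index, :type]

  # Split a sentence on whitespace and build one :naive token per word.
  def naive_tokens(sentence) do
    ~r/\S+/u
    |> Regex.scan(sentence, return: :index)
    |> Enum.with_index()
    |> Enum.map(fn {[{start, len}], index} ->
      raw = binary_part(sentence, start, len)

      %__MODULE__{
        raw: raw,
        # Assumption: the normalized value is simply the lowercased word.
        value: String.downcase(raw),
        start: start,
        # Assumption: exclusive end offset, measured in bytes.
        end: start + len,
        index: index,
        type: :naive
      }
    end)
  end
end
```

Calling `TokenSketch.naive_tokens("Hello world")` yields two tokens; the second has `index: 1` and `start: 6`.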

Functions

base_form(str)

Returns the base form of the given string: the downcased, ASCII-only version.
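One way to obtain a downcased, ASCII-only string is sketched below. `BaseFormSketch` is a hypothetical module and this is not necessarily how the library implements `base_form/1`; it assumes that accented letters should lose their accents and that any remaining non-ASCII characters are simply dropped.

```elixir
defmodule BaseFormSketch do
  # Hypothetical helper illustrating "downcased, ASCII version".
  def base_form(str) do
    str
    |> String.downcase()
    # Decompose accented letters, e.g. "é" becomes "e" plus a combining accent.
    |> String.normalize(:nfd)
    |> String.to_charlist()
    # Keep ASCII code points only; this drops the combining accents.
    |> Enum.filter(&(&1 < 128))
    |> List.to_string()
  end
end

BaseFormSketch.base_form("Café Olé")
# => "cafe ole"
```

Characters without an ASCII decomposition (for example "ß") are removed entirely by this sketch; a transliteration library would handle them differently.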

entity?(t, kind)

Test whether a token is an entity of the given kind.
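A check like this can be expressed with pattern matching on the documented fields. The sketch below uses a hypothetical `EntitySketch` module operating on plain maps; it assumes the entity's `value` map uses atom keys and stores `kind` as a string, neither of which is confirmed by the documentation above.

```elixir
defmodule EntitySketch do
  # True only when the token's type is :entity and its value map has the given kind.
  # Reusing `kind` in both arguments makes the pattern require equality.
  def entity?(%{type: :entity, value: %{kind: kind}}, kind), do: true
  def entity?(_token, _kind), do: false
end

# Illustrative token; the kind and provider names are made up for this example.
token = %{type: :entity, value: %{kind: "number", provider: "example", value: 10}}

EntitySketch.entity?(token, "number")
# => true
EntitySketch.entity?(token, "time")
# => false
```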

from_duckling_entity(duckling_entity)

Constructs a token from a Duckling entity definition.

from_spacy(t)

@spec from_spacy(spacy_json_token :: map()) :: t()

Given a single token in Spacy's JSON format, convert it into a token.

from_spacy_entity(spacy_entity_json, sentence_text)

Constructs a token from a Spacy entity definition.

pos?(arg1, tag)

Test whether a token matches the given POS (part-of-speech) tag.

punct?(token)

Test whether a token is punctuation.

word?(t, word)

Test whether a token matches the given (optionally normalized) word.