Lemma v0.1.1

A morphological parser (analyzer) and lemmatizer implemented with the standard textbook method, using an abstraction called a finite-state transducer (FST).

The FST is implemented in the gen_fst package.
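To make the FST idea concrete, here is a minimal sketch of a deterministic transducer that reads a surface form one character at a time and emits its lemma. It is an illustration only: the `ToyFST` module, its transition table, and `transduce/1` are hypothetical and are not the gen_fst API or the way Lemma builds its transducers.

```elixir
defmodule ToyFST do
  # Hypothetical toy transducer for illustration; NOT the gen_fst API.
  # Each transition maps {state, input_char} to {next_state, output_string}.
  # The "s" transition emits nothing, which strips the plural suffix.
  @transitions %{
    {0, "c"} => {1, "c"},
    {1, "a"} => {2, "a"},
    {2, "r"} => {3, "r"},
    {3, "s"} => {4, ""}
  }

  # States in which the input is accepted: after "car" and after "cars".
  @final_states [3, 4]

  # Walks the input from state 0, accumulating output along the way.
  # Returns {:ok, output} if the walk ends in a final state, else :error.
  def transduce(word) do
    word
    |> String.graphemes()
    |> Enum.reduce_while({0, ""}, fn char, {state, out} ->
      case Map.fetch(@transitions, {state, char}) do
        {:ok, {next, emitted}} -> {:cont, {next, out <> emitted}}
        :error -> {:halt, :reject}
      end
    end)
    |> case do
      :reject -> :error
      {state, out} -> if state in @final_states, do: {:ok, out}, else: :error
    end
  end
end
```

In this sketch both "car" and "cars" are accepted and mapped to "car", while any input that leaves the transition table or stops in a non-final state is rejected. A real morphological FST covers a whole lexicon and many affix rules, but the principle of pairing each consumed input symbol with an emitted output is the same.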

A parser can be initialized for the desired language using Lemma.new/1. The initialized parser can then be used to parse words with Lemma.parse/2.

Examples

en_parser = Lemma.new :en
#=> nil
en_parser |> Lemma.parse("plays")
#=> "play"

About morphological parsing / lemmatization

For grammatical reasons, documents are going to use different forms of a word, such as organize, organizes, and organizing. Additionally, there are families of derivationally related words with similar meanings, such as democracy, democratic, and democratization. In many situations, it seems as if it would be useful for a search for one of these words to return documents that contain another word in the set.
The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. For instance:

am, are, is ⇒ be
car, cars, car’s, cars’ ⇒ car

The result of this mapping of text will be something like:
the boy’s cars are different colors ⇒ the boy car be differ color.
Stanford NLP Group

Summary

Functions

new/1: Initialize a morphological parser for the given language

parse/2: Use the given parser to parse a word or a list of words

Functions

new/1

Initialize a morphological parser for the given language.

Only English (:en) is supported currently.

parse/2

Use the given parser to parse a word or a list of words.