FuzzyCompare.ChunkSet (fuzzy_compare v1.1.0)

ChunkSet is well suited to strings that share some words but also contain many dissimilar words.

It works in the following way:

Our input strings are

  • "oscar claude monet"
  • "alice hoschedé was the wife of claude monet"

From the two input strings, three new strings are created:

  • common_words = "claude monet"
  • common_words_plus_remaining_words_left = "claude monet oscar"
  • common_words_plus_remaining_words_right = "claude monet alice hoschedé was the wife of"
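
The construction of these three strings can be sketched as follows. This is a minimal illustration, not the library's implementation: the module `ChunkSetSketch` and its `chunks/2` function are hypothetical names, and the sketch assumes that words are split on whitespace and keep their original order.

```elixir
defmodule ChunkSetSketch do
  # Hypothetical helper, not part of fuzzy_compare.
  # Assumes whitespace-separated words that keep their original order.
  def chunks(left, right) do
    left_words = String.split(left)
    right_words = String.split(right)

    # Words that occur in both inputs, in the order they appear on the left.
    common = left_words |> Enum.filter(&(&1 in right_words)) |> Enum.uniq()

    # Words that occur on one side only.
    remaining_left = left_words |> Enum.reject(&(&1 in common)) |> Enum.uniq()
    remaining_right = right_words |> Enum.reject(&(&1 in common)) |> Enum.uniq()

    {
      Enum.join(common, " "),
      Enum.join(common ++ remaining_left, " "),
      Enum.join(common ++ remaining_right, " ")
    }
  end
end

ChunkSetSketch.chunks(
  "oscar claude monet",
  "alice hoschedé was the wife of claude monet"
)
# => {"claude monet", "claude monet oscar",
#     "claude monet alice hoschedé was the wife of"}
```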

These three strings are then compared pairwise, and the maximum ratio is returned.
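
The pairwise step can be sketched like this. The module `PairwiseMaxSketch` is a hypothetical name, and `String.jaro_distance/2` from the Elixir standard library is used only as a stand-in metric; it is an assumption for illustration and will not reproduce the ratios shown in the examples on this page.

```elixir
defmodule PairwiseMaxSketch do
  # Hypothetical helper, not part of fuzzy_compare.
  # String.jaro_distance/2 is a stand-in for the library's ratio function.
  def max_ratio(common, left_chunk, right_chunk) do
    [
      {common, left_chunk},
      {common, right_chunk},
      {left_chunk, right_chunk}
    ]
    # Score each of the three pairs, then keep the best score.
    |> Enum.map(fn {a, b} -> String.jaro_distance(a, b) end)
    |> Enum.max()
  end
end

PairwiseMaxSketch.max_ratio(
  "claude monet",
  "claude monet oscar",
  "claude monet alice hoschedé was the wife of"
)
```

Taking the maximum means a strong match between the shared words and either side is enough to yield a high score, even when the remaining words differ widely.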

Examples

iex> FuzzyCompare.ChunkSet.standard_similarity("oscar claude monet", "alice hoschedé was the wife of claude monet")
0.8958333333333334

iex> FuzzyCompare.ChunkSet.substring_similarity("oscar claude monet", "alice hoschedé was the wife of claude monet")
1.0

Summary

Functions

standard_similarity(left, right)

@spec standard_similarity(
  binary() | FuzzyCompare.Preprocessed.t(),
  binary() | FuzzyCompare.Preprocessed.t()
) :: float()

substring_similarity(left, right)

@spec substring_similarity(
  binary() | FuzzyCompare.Preprocessed.t(),
  binary() | FuzzyCompare.Preprocessed.t()
) :: float()