EntityFingerprint.Fingerprint (entityfingerprint v0.3.0)

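Functions

create/1

Creates a fingerprint for the given entity name. It handles special characters, emoji (because we all know that emoji in company names are coming), and entity types written in non-Latin scripts.

Returns {:ok, map}, where original is the input name, script is the detected script, fingerprint_str is the normalized name, and fingerprint is a 40-character hexadecimal digest. As the examples show, normalization transliterates the name into Latin letters, lowercases it, spells out emoji as words, abbreviates legal forms such as "Limited Liability Company" (llc) and "Aktiengesellschaft" (ag), and deduplicates and alphabetically sorts the words.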
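
Judging from the examples, normalization of Latin-script names lowercases the text, drops punctuation, abbreviates legal forms, and then deduplicates and sorts the words. The following is a minimal sketch of those steps for Latin text only, not the library's implementation: it does no transliteration, it simply deletes emoji instead of naming them, and both the module name FingerprintSketch and its two-entry legal-form table are hypothetical.

```elixir
defmodule FingerprintSketch do
  # Hypothetical legal-form table; the library's real list is not shown in these docs.
  @legal_forms [
    {"limited liability company", "llc"},
    {"aktiengesellschaft", "ag"}
  ]

  @doc "Builds a normalized, word-order-independent key for a Latin-script entity name."
  def normalize(name) do
    name
    |> String.downcase()
    # Delete everything that is not a letter, digit, or whitespace (punctuation, emoji).
    |> String.replace(~r/[^\p{L}\p{N}\s]/u, "")
    |> abbreviate_legal_forms()
    # Split on whitespace, then deduplicate and sort so word order does not matter.
    |> String.split()
    |> Enum.uniq()
    |> Enum.sort()
    |> Enum.join(" ")
  end

  # Replaces each whole-word legal form with its abbreviation.
  defp abbreviate_legal_forms(text) do
    Enum.reduce(@legal_forms, text, fn {long, short}, acc ->
      pattern = Regex.compile!("\\b" <> Regex.escape(long) <> "\\b", "u")
      Regex.replace(pattern, acc, short)
    end)
  end
end
```

With this sketch, FingerprintSketch.normalize("New York, New York") returns "new york", the same fingerprint_str the library reports for that input.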

Examples

  iex(1)> {:ok, fp} = EntityFingerprint.Fingerprint.create("ФИЛИАЛ КОМПАНИИ С ОГРАНИЧЕННОЙ"); {fp.fingerprint_str, fp.script}
  {"filial kompanii ogranichennoy s", "cyrillic"}

  iex(2)> {:ok, fp} = EntityFingerprint.Fingerprint.create("ООО КУРЬЕР-РЕГИОН СТОЛИЦА"); {fp.fingerprint_str, fp.script}
  {"kurerregion ooo stolitsa", "cyrillic"}

  iex(3)> EntityFingerprint.Fingerprint.create("Google Limited Liability Company")
  {:ok,
   %{
     script: "latin",
     original: "Google Limited Liability Company",
     fingerprint: "382621CA5922751BB77F398DD0B3CB1B4EACE596",
     fingerprint_str: "google llc"
   }}

  iex(4)> EntityFingerprint.Fingerprint.create("현대해상화재보험")
  {:ok,
   %{
     script: "hangul",
     original: "현대해상화재보험",
     fingerprint: "00044613027E2BF63B36225EAFF0EB48352A68E4",
     fingerprint_str: "hyeondaehaesanghwajaeboheom"
   }}

  iex(5)> EntityFingerprint.Fingerprint.create(" 💩 Limited Liability Company")
  {:ok,
   %{
     script: "common",
     original: " 💩 Limited Liability Company",
     fingerprint: "2881722FEEB9C5AB87B7519C8FB711455690C330",
     fingerprint_str: "llc poop"
   }}

  iex(6)> EntityFingerprint.Fingerprint.create("佐贤鸣智(上海)企业管理咨询有限公司")
  {:ok,
   %{
     script: "han",
     original: "佐贤鸣智(上海)企业管理咨询有限公司",
     fingerprint: "51DDB4F4AA0F7484E7D9AD5CA2A81C4CAFAB5A4C",
     fingerprint_str: "guanlizixun shanghai zuoxianmingzhi"
   }}

  iex(7)> EntityFingerprint.Fingerprint.create("Siemens Aktiengesellschaft")
  {:ok,
   %{
     script: "latin",
     original: "Siemens Aktiengesellschaft",
     fingerprint: "069BCC150A2D09F1968E220F48B2362A655A7685",
     fingerprint_str: "ag siemens"
   }}

  iex(8)> EntityFingerprint.Fingerprint.create("New York, New York")
  {:ok,
   %{
     script: "latin",
     original: "New York, New York",
     fingerprint: "DDDD9606DD438582C9642AA4DEB3C013CBC89148",
     fingerprint_str: "new york"
   }}
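
Each fingerprint above is 40 uppercase hexadecimal characters, which is the length of a hex-encoded SHA-1 digest (160 bits). These docs do not say which hash or which input the library uses, so the following sketch is an assumption: it hashes an already-normalized string with SHA-1 through Erlang's :crypto module and hex-encodes it with Base.encode16/1. The module name DigestSketch and its functions are hypothetical, and its output is not guaranteed to match the library's values.

```elixir
defmodule DigestSketch do
  # Assumption: SHA-1 over the normalized string; the library's actual scheme is not documented here.
  @spec digest(String.t()) :: String.t()
  def digest(fingerprint_str) when is_binary(fingerprint_str) do
    :sha
    |> :crypto.hash(fingerprint_str)
    # Base.encode16/1 emits uppercase hex by default: 20 bytes become 40 characters.
    |> Base.encode16()
  end

  # Two normalized names collide exactly when their digests are equal.
  def same_key?(a, b), do: digest(a) == digest(b)
end
```

Hashing the normalized string rather than the original is what lets differently written names collide: under this scheme, any two names that normalize to "google llc" share one digest.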

Thanks

This library was heavily inspired by the Python tool alephdata/fingerprints.

See also

  • Clustering in Depth, part of the OpenRefine documentation, which explains how key-collision methods such as fingerprinting group similar values into clusters.
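
OpenRefine's fingerprint keying method trims and lowercases a value, removes punctuation and control characters, folds accented characters to ASCII, splits on whitespace, and joins the sorted, deduplicated tokens; values whose keys collide fall into the same cluster. A minimal Elixir sketch of that idea, assuming Latin-script input (the module ClusterSketch is hypothetical and independent of this library):

```elixir
defmodule ClusterSketch do
  @doc "OpenRefine-style fingerprint key for a Latin-script value."
  def key(value) do
    value
    |> String.trim()
    |> String.downcase()
    # Decompose accented letters (é becomes e plus a combining accent), then drop the accents.
    |> :unicode.characters_to_nfd_binary()
    |> String.replace(~r/\p{Mn}/u, "")
    # Remove punctuation and control characters; keep letters, digits, and whitespace.
    |> String.replace(~r/[^\p{L}\p{N}\s]/u, "")
    |> String.split()
    |> Enum.uniq()
    |> Enum.sort()
    |> Enum.join(" ")
  end

  @doc "Groups values whose keys collide; values with no collision are dropped."
  def cluster(values) do
    values
    |> Enum.group_by(&key/1)
    |> Map.values()
    |> Enum.filter(&(length(&1) > 1))
  end
end
```

Because the key ignores word order and punctuation, "Inc. Acme" and "Acme, Inc." collide; the same property is why "Siemens Aktiengesellschaft" appears as "ag siemens" in the examples.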