Type-III (near-miss) clone detection.
Uses an inverted index on structural sub-hashes for candidate-pair generation, with MinHash acceleration for large posting lists.
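Candidate-pair generation through an inverted index can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation. The module name CandidateSketch and the fragment fields :id and :sub_hashes (a MapSet) are hypothetical, since the real fragment shape is only typed as map().

```elixir
defmodule CandidateSketch do
  # Hypothetical sketch: fragments are assumed to be maps with an :id and a
  # :sub_hashes MapSet; the real fragment shape is not documented here.

  # Inverted index: sub-hash => list of fragment ids containing it.
  @spec build_index([map()]) :: %{optional(term()) => [term()]}
  def build_index(fragments) do
    Enum.reduce(fragments, %{}, fn %{id: id, sub_hashes: hashes}, index ->
      Enum.reduce(hashes, index, fn h, acc ->
        Map.update(acc, h, [id], &[id | &1])
      end)
    end)
  end

  # Every unordered pair of ids that co-occurs in at least one posting list,
  # normalized as {smaller, larger} so each pair appears once.
  @spec candidate_pairs([map()]) :: MapSet.t()
  def candidate_pairs(fragments) do
    fragments
    |> build_index()
    |> Map.values()
    |> Enum.reduce(MapSet.new(), fn postings, acc ->
      postings |> Enum.uniq() |> pairs() |> Enum.into(acc)
    end)
  end

  # All unordered pairs within one posting list (quadratic in its length).
  defp pairs([]), do: []
  defp pairs([h | t]), do: Enum.map(t, &{min(h, &1), max(h, &1)}) ++ pairs(t)
end
```

For example, with fragment 1 holding sub-hashes :a and :b, fragment 2 holding :b and :c, and fragment 3 holding only :d, the only candidate pair is {1, 2}, because :b is the only sub-hash whose posting list has more than one entry. The pairs/1 helper shows why long posting lists are the expensive case: a list of n ids yields n(n-1)/2 pairs.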
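Each fragment carries a set of lightweight sub-hashes derived from its child
subtrees (computed during fingerprinting). The inverted index maps each
sub-hash to the fragments that contain it, so every pair of fragments sharing
at least one sub-hash becomes a candidate. For small posting lists, candidates
are pre-filtered by the exact Jaccard similarity of their sub-hash sets; for
large ones (above @lsh_cutover), the pre-filter switches to a MinHash estimate
that costs O(k) per pair for a k-hash signature. This avoids quadratic blowup
without the recall loss that a hard cap on posting-list length would cause.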
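The switch between exact Jaccard and a MinHash estimate can be sketched as below. This is a hedged illustration, not the module's code: the values 64 for @lsh_cutover and 128 for the signature length @k are assumptions, the seeded :erlang.phash2/1 hash family is a stand-in for whatever hashing the module uses, and the module name SimilaritySketch and the fragment fields :sub_hashes and :signature are hypothetical.

```elixir
defmodule SimilaritySketch do
  # Assumed values: the real @lsh_cutover and signature length are not given.
  @lsh_cutover 64
  @k 128

  # Exact Jaccard: size of intersection / size of union; cost grows with set size.
  @spec jaccard(MapSet.t(), MapSet.t()) :: float()
  def jaccard(a, b) do
    union = MapSet.size(MapSet.union(a, b))
    if union == 0, do: 0.0, else: MapSet.size(MapSet.intersection(a, b)) / union
  end

  # MinHash signature of a non-empty set: for each of @k seeded hash
  # functions, keep the minimum hash over the set's elements.
  @spec signature(MapSet.t()) :: [non_neg_integer()]
  def signature(set) do
    for seed <- 1..@k do
      set |> Enum.map(&:erlang.phash2({seed, &1})) |> Enum.min()
    end
  end

  # Fraction of agreeing signature slots: an O(k) estimate of Jaccard.
  @spec estimate([non_neg_integer()], [non_neg_integer()]) :: float()
  def estimate(sig_a, sig_b) do
    matches = Enum.zip(sig_a, sig_b) |> Enum.count(fn {x, y} -> x == y end)
    matches / length(sig_a)
  end

  # Pre-filter a candidate pair; fragments are assumed to carry precomputed
  # :sub_hashes and :signature fields.
  @spec passes_prefilter?(map(), map(), non_neg_integer(), float()) :: boolean()
  def passes_prefilter?(a, b, posting_len, threshold) do
    score =
      if posting_len > @lsh_cutover,
        do: estimate(a.signature, b.signature),
        else: jaccard(a.sub_hashes, b.sub_hashes)

    score >= threshold
  end
end
```

Signatures are computed once per fragment, after which each comparison costs O(k) regardless of how many sub-hashes the fragments carry. Any fragment that appears in a posting list has at least one sub-hash, so the non-empty requirement of signature/1 holds there. This sketch only changes the per-pair cost; it does not model any LSH banding the real implementation may also use.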
Summary
Functions
detect/4: Find Type-III clones from a list of fragments at the given similarity threshold.
Types
@type fuzzy_opts() :: [{:mass_tolerance, float()}]
Functions
@spec detect([map()], float(), MapSet.t(), fuzzy_opts()) :: [ExDNA.Detection.Clone.t()]
Find Type-III clones from a list of fragments at the given similarity threshold.