Fellegi
In the context of record linkage (also known as data linkage or entity resolution), Fellegi most commonly refers to the Fellegi-Sunter model, a mathematical framework for probabilistic record linkage. This model, developed by Ivan P. Fellegi and Alan B. Sunter, is a cornerstone of modern data linkage techniques.
The Fellegi-Sunter model aims to determine whether two records, one from each of two data sets, represent the same real-world entity. It does so by comparing the records field by field and assigning a weight to each agreement or disagreement. For each field, the weights are derived from two conditional probabilities: the m-probability, the probability that the field agrees given that the records represent the same entity, and the u-probability, the probability that it agrees given that the records represent different entities. The corresponding disagreement probabilities are 1 - m and 1 - u.
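To make the weights concrete, a common convention (assumed here, since the text above does not fix one) expresses them as base-2 logarithms. Agreement on a field contributes log2(m/u), and disagreement contributes log2((1 - m)/(1 - u)), so agreement weights are positive and disagreement weights negative whenever m > u. The minimal sketch below computes both. The field names and probability values are hypothetical.

```python
import math

# Hypothetical m- and u-probabilities for three comparison fields.
# m = P(field agrees | records match), u = P(field agrees | records do not match).
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "postcode": (0.80, 0.02),
}


def field_weights(m: float, u: float) -> tuple[float, float]:
    """Return (agreement weight, disagreement weight) in bits (log base 2)."""
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree


for name, (m, u) in FIELD_PROBS.items():
    a, d = field_weights(m, u)
    print(f"{name:>10}: agree {a:+.2f} bits, disagree {d:+.2f} bits")
```

Rare values make agreement more informative. A low u-probability, as for surname here, yields a large agreement weight.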
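Specifically, the model uses the likelihood ratio: the probability of observing a pair's agreement pattern (the set of agreements and disagreements across all compared fields) given that the records match, divided by the probability of the same pattern given that they do not match. The higher the ratio, the stronger the evidence that the records represent the same entity. In practice, the ratio is usually computed under the simplifying assumption that fields agree or disagree independently of one another given match status. The ratio then factorizes into a product of per-field terms, and its logarithm, often called the match weight, is the sum of the field weights.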
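Under the conditional-independence assumption (a standard simplification, not the only possible one), the likelihood ratio for a whole agreement pattern can be computed directly as a product, or equivalently as a sum of per-field log2 weights. The sketch below computes both and checks that they agree. The m/u values and the field order (surname, birth year, postcode) are hypothetical.

```python
import math

# Hypothetical per-field probabilities, in a fixed field order:
# surname, birth year, postcode.
M = [0.95, 0.90, 0.80]  # P(field agrees | match)
U = [0.01, 0.05, 0.02]  # P(field agrees | non-match)


def likelihood_ratio(pattern: list[bool], m: list[float], u: list[float]) -> float:
    """P(pattern | match) / P(pattern | non-match), assuming fields are
    conditionally independent given match status."""
    p_match = 1.0
    p_nonmatch = 1.0
    for agrees, mi, ui in zip(pattern, m, u):
        p_match *= mi if agrees else 1 - mi
        p_nonmatch *= ui if agrees else 1 - ui
    return p_match / p_nonmatch


def match_weight(pattern: list[bool], m: list[float], u: list[float]) -> float:
    """Total match weight in bits: the sum of per-field log2 weights."""
    total = 0.0
    for agrees, mi, ui in zip(pattern, m, u):
        total += math.log2(mi / ui) if agrees else math.log2((1 - mi) / (1 - ui))
    return total


pattern = [True, True, False]  # surname and birth year agree, postcode disagrees
r = likelihood_ratio(pattern, M, U)
w = match_weight(pattern, M, U)
print(f"R = {r:.1f}, log2(R) = {math.log2(r):.3f}, summed weight = {w:.3f}")
```

Working on the log scale avoids multiplying many small probabilities and lets each field's contribution be read off separately.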
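The Fellegi-Sunter model typically involves setting two thresholds on this ratio (or on its logarithm). Record pairs above the upper threshold are automatically linked, and pairs below the lower threshold are automatically considered non-links. Pairs falling between the thresholds are considered possible matches and may require manual (clerical) review or further analysis. In the original formulation, the thresholds are chosen so that the false-match and missed-match rates stay within specified bounds while the number of pairs left for review is as small as possible.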
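The three-way decision rule can be sketched as a small function over a total match weight, such as the summed log2 field weights. The threshold values below are hypothetical. The strict inequalities send a score that lands exactly on a threshold to the review band, which is one reasonable convention among several.

```python
from enum import Enum


class Decision(Enum):
    LINK = "link"
    POSSIBLE = "possible match (review)"
    NON_LINK = "non-link"


def classify(weight: float, lower: float, upper: float) -> Decision:
    """Three-way decision on a total match weight using two thresholds."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if weight > upper:
        return Decision.LINK
    if weight < lower:
        return Decision.NON_LINK
    return Decision.POSSIBLE


# Hypothetical thresholds in bits; in practice they are tuned to acceptable error rates.
LOWER, UPPER = 0.0, 10.0
for w in (14.2, 5.1, -7.3):
    print(w, classify(w, LOWER, UPPER).value)
```

Widening the gap between the thresholds lowers both error rates at the cost of more pairs sent to review.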
The strength of the Fellegi-Sunter model lies in its ability to handle imperfect data, including errors, missing values, and variations in data entry, while providing a statistically rigorous basis for record linkage decisions. Its results, however, depend on good estimates of the m- and u-probabilities, which are often derived from the data itself using iterative techniques such as the expectation-maximization (EM) algorithm.
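As one example of such an iterative technique, the sketch below uses the expectation-maximization (EM) algorithm to fit the m-probabilities, the u-probabilities, and the overall proportion p of matching pairs. It treats match status as a latent variable and assumes conditional independence of fields. This is a minimal, unoptimized illustration rather than a production estimator. The starting values, iteration count, clamping constant, and the synthetic-data helper `simulate_pairs` are all assumptions made for the example.

```python
import random

EPS = 1e-6  # keeps every probability strictly inside (0, 1) so no likelihood collapses to zero


def _clamp(x: float) -> float:
    """Restrict a probability to [EPS, 1 - EPS]."""
    return min(max(x, EPS), 1 - EPS)


def em_estimate(patterns, n_iter=100, p=0.1, m0=0.9, u0=0.1):
    """Estimate (p, m, u) from binary agreement vectors with EM.

    p is the proportion of record pairs that are matches; m[j] and u[j] are the
    probabilities that field j agrees among matches and non-matches.
    Assumes fields are conditionally independent given match status.
    """
    n, k = len(patterns), len(patterns[0])
    m, u = [m0] * k, [u0] * k
    for _ in range(n_iter):
        # E-step: posterior probability g[i] that pair i is a match.
        g = []
        for gamma in patterns:
            pm, pu = p, 1 - p
            for j, agrees in enumerate(gamma):
                pm *= m[j] if agrees else 1 - m[j]
                pu *= u[j] if agrees else 1 - u[j]
            g.append(pm / (pm + pu))
        # M-step: weighted agreement proportions within each latent class.
        s = sum(g)
        p = _clamp(s / n)
        for j in range(k):
            agree_m = sum(gi for gi, gamma in zip(g, patterns) if gamma[j])
            agree_u = sum(1 - gi for gi, gamma in zip(g, patterns) if gamma[j])
            m[j] = _clamp(agree_m / s)
            u[j] = _clamp(agree_u / (n - s))
    return p, m, u


def simulate_pairs(n, p, m, u, seed=0):
    """Hypothetical helper: draw n agreement vectors from known parameters."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        probs = m if rng.random() < p else u
        pairs.append(tuple(int(rng.random() < q) for q in probs))
    return pairs


if __name__ == "__main__":
    data = simulate_pairs(2000, 0.1, [0.95, 0.9, 0.85, 0.8], [0.05, 0.1, 0.02, 0.1])
    p_hat, m_hat, u_hat = em_estimate(data)
    print(f"p = {p_hat:.3f}")
    print("m =", [round(x, 3) for x in m_hat])
    print("u =", [round(x, 3) for x in u_hat])
```

Starting with m above u matters here. EM for this two-class model is symmetric under swapping the classes, so poor initial values can converge to a solution with the "match" and "non-match" labels reversed.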