Effective Use of Matching Algorithms
Identity Fusion NG uses similarity scoring to detect potential matching identities. This guide helps you choose, configure, and tune the matching algorithms used in Attribute Matching Settings → Matching Settings.
Overview: Matching in Identity Fusion
Matching algorithms calculate similarity scores (0–100) between attribute values from different identities. These scores determine whether two identities are potential matches.
| Component | Purpose | Configuration location |
|---|---|---|
| Fusion attribute matches | Define which attributes to compare | Attribute Matching Settings → Matching Settings |
| Matching algorithm | How to calculate similarity | Per attribute (Enhanced Name Matcher, Jaro-Winkler, Dice, Double Metaphone, LIG3, Custom) |
| Minimum similarity (per rule) | Threshold for that rule; also its weight in the combined score | Per attribute (0–100) |
| Minimum combined match score | Global floor for the weighted combined score | Matching Settings (0–100) |
| Mandatory match | Rule must pass its minimum for a potential match | Per attribute (Yes/No) |
Screenshot placeholder: Fusion attribute matches configuration.

Algorithm selection guide
Algorithm comparison matrix
| Algorithm | Best for | Strengths | Weaknesses | Computational cost |
|---|---|---|---|---|
| Enhanced Name Matcher | Person names (full, first, last) | Handles order variations, titles, suffixes, cultural naming, nicknames | May be overly permissive for non-name fields | Medium |
| Jaro-Winkler | Short strings, codes, emails, usernames | Emphasizes prefix matching; good for typos at start; fast | Less effective for long text; suffix typos score lower | Low |
| Dice | Longer text (addresses, job titles, descriptions) | Robust for substring matching; handles reordering well | Can miss phonetic variations; requires adequate text length | Medium |
| Double Metaphone | Names with spelling variations, phonetic matching | Catches "Catherine"/"Katherine", "John"/"Jon", "Smith"/"Smyth" | May generate false positives for short names; language-dependent | Low |
| LIG3 | Compound identifiers, names with missing parts | Excellent with international accents and compound gap handling | Heavily punishes transpositions (e.g. inverted dates/names) | High |
| Custom | Domain-specific requirements | Your own logic via SaaS customizer | Requires development and testing | Variable |
Decision tree: Which algorithm to use?
What type of attribute are you comparing?
├─ Person name (full, first, last)
│ ├─ Standard spellings expected → Enhanced Name Matcher
│ └─ Phonetic variations expected → Double Metaphone or Enhanced Name Matcher
│
├─ Email address
│ ├─ Domain matters → Jaro-Winkler (emphasizes prefix before @)
│ └─ Typo tolerance → Jaro-Winkler
│
├─ Username / employee ID / short code
│ └─ High precision needed → Jaro-Winkler (high threshold: 95–100)
│
├─ Address / job title / longer text
│ └─ Substring/phrase matching → Dice
│
├─ Phone number
│ └─ After normalization → Jaro-Winkler
│
└─ Custom business logic
  └─ Custom (from SaaS customizer)
Algorithm deep dive
Enhanced Name Matcher
Purpose: Specialized algorithm for person names with cultural awareness and variation handling.
How it works:
- Tokenizes names into components (first, middle, last, titles, suffixes)
- Normalizes order (handles "Smith, John" vs "John Smith")
- Recognizes titles (Dr., Mr., Mrs., Prof.) and suffixes (Jr., Sr., III)
- Handles cultural naming patterns (e.g., Asian name order, Hispanic compound surnames)
- Matches nicknames (e.g., "William" matches "Bill", "Robert" matches "Bob")
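
The normalization steps listed above can be illustrated with a toy sketch. This is not the product's implementation: the `TITLES`, `SUFFIXES`, and `NICKNAMES` tables and the `normalize_name` function are hypothetical and deliberately tiny, and cultural naming patterns are not covered.

```python
# Hypothetical lookup tables for illustration only; a real matcher would
# use far larger, locale-aware dictionaries.
TITLES = {"dr", "mr", "mrs", "ms", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
NICKNAMES = {"bill": "william", "bob": "robert"}


def normalize_name(raw: str) -> list[str]:
    """Return lowercase name tokens with order, titles, suffixes and nicknames normalized."""
    # Reorder "Last, First" into "First Last".
    if "," in raw:
        last, _, first = raw.partition(",")
        raw = f"{first} {last}"
    # Tokenize, drop trailing dots ("Dr." -> "dr") and lowercase.
    tokens = [t.strip(".").lower() for t in raw.split()]
    # Remove titles and suffixes.
    tokens = [t for t in tokens if t and t not in TITLES and t not in SUFFIXES]
    # Map known nicknames to their formal form.
    return [NICKNAMES.get(t, t) for t in tokens]


print(normalize_name("Smith, John"))         # ['john', 'smith']
print(normalize_name("Dr. John Smith Jr."))  # ['john', 'smith']
print(normalize_name("Bill Johnson"))        # ['william', 'johnson']
```

After this kind of normalization, "Smith, John" and "John Smith" compare as identical token lists, which is why order and title differences cost little in the example scores below.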
Recommended thresholds:
| Use case | Threshold | Rationale |
|---|---|---|
| Full name (e.g. "John A. Smith") | 75–85 | Allows middle initial variation, title differences |
| First name only | 80–90 | Less context; require closer match |
| Last name only | 85–92 | Critical identifier; be stricter |
| Display name (formatted) | 75–85 | May include titles, formatting differences |
Examples:
| String 1 | String 2 | Score | Match? (threshold 80) |
|---|---|---|---|
| John Smith | John Smith | 100 | Yes |
| John Smith | J. Smith | 85 | Yes |
| John Smith | Smith, John | 95 | Yes |
| Dr. John Smith | John Smith Jr. | 88 | Yes |
| John Smith | Jane Smith | 50 | No |
| John A. Smith | John B. Smith | 92 | Yes |
| William Johnson | Bill Johnson | 90 | Yes (nickname match) |
When to use:
- Comparing `name`, `displayName`, `firstname`, `lastname` attributes
- You expect name variations (order, titles, middle initials)
- Cultural diversity in names
When NOT to use:
- Non-name fields (email, address, etc.) → use other algorithms
- You need exact or near-exact matches → use Jaro-Winkler with high threshold
Jaro-Winkler
Purpose: General-purpose string similarity with emphasis on prefix matching.
How it works:
- Calculates Jaro similarity (matching characters and transpositions)
- Applies a prefix bonus (up to the first 4 matching characters are heavily weighted)
- Results in score 0–100 (higher = more similar)
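
For reference, the steps above correspond to the textbook Jaro-Winkler definition, sketched below. The function names and the `prefix_scale` default of 0.1 are assumptions taken from the standard algorithm; the connector may normalize input (case, whitespace) or scale results differently, so its scores will not necessarily equal these.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler_score(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro-Winkler similarity on a 0-100 scale."""
    base = jaro(s1, s2)
    # Count the common prefix, capped at 4 characters.
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return 100 * (base + prefix * prefix_scale * (1 - base))


print(round(jaro_winkler_score("MARTHA", "MARHTA"), 2))  # 96.11
```

The prefix bonus only raises scores for strings that already share their first characters, which is why a typo near the end of a value costs less than one at the start.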
Recommended thresholds:
| Use case | Threshold | Rationale |
|---|---|---|
| Email address | 90–95 | Should be nearly exact; prefix (before @) important |
| Username | 92–98 | Critical identifier; little tolerance for variation |
| Employee ID / badge number | 95–100 | Must be nearly exact |
| Phone number (normalized) | 85–92 | Some tolerance for formatting |
| Short text fields (5–15 chars) | 85–90 | Suitable for short strings |
Prefix weighting example:
| String 1 | String 2 | Score | Note |
|---|---|---|---|
| john.smith@company.com | john.smyth@company.com | 95 | High due to strong prefix match |
| john.smith@company.com | jane.smith@company.com | 82 | Lower due to prefix mismatch (john vs jane) |
| smithj@company.com | smithjo@company.com | 97 | Very close; prefix nearly identical |
When to use:
- Email addresses (prefix before @ is critical)
- Usernames, employee IDs (should be nearly exact)
- Short text with potential typos
- When beginning of string is more important than end
When NOT to use:
- Long text (addresses, descriptions) → use Dice
- Phonetic matching needed → use Double Metaphone
- Name variations (order, titles) → use Enhanced Name Matcher
Dice (Sørensen-Dice coefficient)
Purpose: Bigram-based similarity for longer text strings.
How it works:
- Breaks each string into bigrams (2-character sequences)
- Example: "hello" → ["he", "el", "ll", "lo"]
- Calculates: `2 * (shared bigrams) / (total bigrams in both strings)`
- Converts to 0–100 scale
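
The calculation above can be sketched in a few lines. This is a minimal illustration of the Sørensen-Dice coefficient over character bigrams; the function names are hypothetical, and whether the connector lowercases or strips whitespace before comparing is not specified here, so this sketch compares the strings exactly as given.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count the 2-character sequences in a string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def dice_score(a: str, b: str) -> float:
    """Sørensen-Dice coefficient over bigrams, on a 0-100 scale."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Strings too short to form any bigram: fall back to equality.
        return 100.0 if a == b else 0.0
    # Counter intersection keeps the minimum count of each shared bigram.
    shared = sum((grams_a & grams_b).values())
    return 100 * 2 * shared / total


print(list(bigrams("hello")))        # ['he', 'el', 'll', 'lo']
print(dice_score("night", "nacht"))  # 25.0 (only "ht" is shared)
```

Because the score depends on which bigrams occur rather than where they occur, reordered words still share most of their bigrams.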
Recommended thresholds:
| Use case | Threshold | Rationale |
|---|---|---|
| Address (street, city, full) | 70–80 | Allows reordering, abbreviations |
| Job title | 72–82 | Tolerates slight wording differences |
| Department name | 75–85 | Moderate strictness |
| Longer text fields (>20 chars) | 70–80 | Good for substring/phrase matching |
Examples:
| String 1 | String 2 | Score | Match? (threshold 75) |
|---|---|---|---|
| 123 Main Street | 123 Main St | 88 | Yes |
| Senior Software Engineer | Software Engineer | 78 | Yes |
| Engineering Department | Engineering Dept | 85 | Yes |
| 123 Main Street Apt 4B | 123 Main St Unit 4B | 82 | Yes |
| New York | Los Angeles | 42 | No |
When to use:
- Addresses (street, city, full address)
- Job titles
- Department names
- Any text field >15–20 characters
- When substring/phrase matching is important
When NOT to use:
- Names (cultural variations) → use Enhanced Name Matcher
- Short strings (<10 chars) → use Jaro-Winkler
- Phonetic matching → use Double Metaphone
Double Metaphone
Purpose: Phonetic algorithm that generates pronunciation codes for strings.
How it works:
- Generates one or two phonetic codes for each string
- Codes represent pronunciation (not spelling)
- Compares codes for similarity
- Language rules: English-centric (handles some European languages)
Recommended thresholds:
| Use case | Threshold | Rationale |
|---|---|---|
| First name (phonetic) | 75–85 | Allow phonetic variations |
| Last name (phonetic) | 80–88 | More critical; be slightly stricter |
| Full name (phonetic) | 75–85 | Combined phonetic matching |
Examples:
| String 1 | String 2 | Phonetic match? | Score (approx) |
|---|---|---|---|
| Catherine | Katherine | Yes (both → "K0RN") | 90 |
| John | Jon | Yes (both → "JN") | 95 |
| Smith | Smyth | Yes (both → "SM0") | 92 |
| Stephen | Steven | Yes (both → "STFN") | 88 |
| Philip | Phillip | Yes (both → "FLP") | 90 |
| Garcia | Garsia | Yes | 85 |
| McDonald | MacDonald | Yes | 88 |
When to use:
- Names with known spelling variations
- International names with multiple spellings
- When pronunciation matters more than spelling
- Complementary to Enhanced Name Matcher for difficult cases
When NOT to use:
- Email addresses, IDs (spelling is exact)
- Non-name fields
- Very short strings (<4 characters) → less reliable
- Non-English names (algorithm is English-centric)
LIG3
Purpose: Advanced hybrid algorithm combining token handling with Levenshtein-style penalties.
How it works:
- Evaluates character variations and normalizes accents (e.g., highly accurate for "José" vs "Jose").
- Considers gaps and missing elements conservatively across compound identifiers.
- Positional weighting prevents over-penalizing missing middle names.
Recommended thresholds:
| Use case | Threshold | Rationale |
|---|---|---|
| Compound identifier / Full name | 70–80 | Allows for missing words/tokens |
| Short identifier | 85–95 | Be stricter with short strings |
Examples:
| String 1 | String 2 | Score | Match? (threshold 75) |
|---|---|---|---|
| José Garcia | Jose Garcia | 100 | Yes |
| John Robert Doe | John Doe | 64 | No |
| 05-10-1990 | 10-05-1990 | 46 | No |
| Christopher | Christoper | 74 | No (borderline typo) |
When to use:
- Full names or compound identifiers where structural layout matters.
- You have international characters that need to be evaluated gracefully.
When NOT to use:
- You expect transpositions (e.g. swapped DOBs, or swapped first/last names). LIG3 heavily penalizes misordered data.
- Short substrings or pure typographical error matching—Jaro-Winkler handles typos better.
Custom (from SaaS customizer)
Purpose: Domain-specific matching logic implemented in a SailPoint SaaS Connectivity Customizer.
When to use:
- None of the built-in algorithms fit your needs
- You have proprietary matching logic (e.g., industry-specific identifiers)
- You need to call external APIs for matching (e.g., third-party identity resolution service)
- Complex business rules (e.g., "match if first 3 chars + last 2 chars identical")
Implementation:
- Develop custom algorithm in a Connectivity Customizer
- Return similarity score 0–100
- Configure as "Custom" in Fusion attribute match
Examples:
- Parse and compare structured employee IDs (e.g., "EMP-2024-001234")
- Call external identity verification service
- Apply industry-specific matching rules (healthcare NPI, financial institution codes)
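
As an illustration of the first example, the comparison logic for structured employee IDs might look like the sketch below. It shows only the scoring logic, not the Connectivity Customizer API (which is not covered here). The ID layout `EMP-<year>-<serial>`, the function name, and the rule "same serial but different year scores 80" are all hypothetical.

```python
import re

# Hypothetical layout: EMP-<4-digit year>-<6-digit serial>
EMP_ID = re.compile(r"^EMP-(\d{4})-(\d{6})$")


def employee_id_score(a: str, b: str) -> int:
    """Return a 0-100 similarity for two structured employee IDs."""
    match_a = EMP_ID.match(a.strip().upper())
    match_b = EMP_ID.match(b.strip().upper())
    if not match_a or not match_b:
        return 0    # malformed ID: no evidence of a match
    if match_a.group(2) != match_b.group(2):
        return 0    # serial numbers differ
    # Same serial; an assumed business rule tolerates a re-issued year.
    return 100 if match_a.group(1) == match_b.group(1) else 80


print(employee_id_score("EMP-2024-001234", "emp-2024-001234"))  # 100
print(employee_id_score("EMP-2023-001234", "EMP-2024-001234"))  # 80
```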
Configuring attribute matches
Configuration fields
For each Fusion attribute match, configure:
| Field | Purpose | Options / Notes |
|---|---|---|
| Attribute | Identity attribute name to compare | Must exist on identities in scope; examples: name, email, firstname, lastname, displayName |
| Matching algorithm | Algorithm to calculate similarity | Enhanced Name Matcher, Jaro-Winkler, Dice, Double Metaphone, LIG3, Custom |
| Minimum similarity [0-100] | Threshold and blend weight for this rule | Higher values are stricter and count more in the combined match score |
| Mandatory match? | Must pass this rule for a potential match | Passing mandatories contribute to the weighted combined score like other rules |
Single attribute vs multi-attribute matching
| Strategy | Configuration | Use when |
|---|---|---|
| Single attribute | One Fusion attribute match (e.g., name only) | Simple matching; one strong identifier |
| Multi-attribute (combined) | Several attribute matches + minimum combined match score | Weighted blend of similarities; tune global floor and per-rule minima/weights |
| Multi-attribute (strict) | Several mandatories with high minima | All critical attributes must pass; combined score must still meet global floor |
| Hybrid | Some mandatory, some optional | Critical attribute (email) must match; others (name, phone) support decision |
Example configurations:
Configuration 1: Name-only matching (simple)
- Attribute: name
- Algorithm: Enhanced Name Matcher
- Score: 85
→ Only name used; must score ≥85
Configuration 2: Name + email (balanced)
- Attribute: name, Algorithm: Enhanced Name Matcher, Score: 80
- Attribute: email, Algorithm: Jaro-Winkler, Score: 90
- Minimum combined score tuned with both rules contributing weighted similarity
→ Both contribute to combined score; mandatory rules must pass
Configuration 3: Strict email + supporting name
- Attribute: email, Algorithm: Jaro-Winkler, Score: 95, Mandatory: Yes
- Attribute: name, Algorithm: Enhanced Name Matcher, Score: 75, Mandatory: No
→ Email must match; name optional but helps
Configuration 4: Comprehensive combined score
- Attribute: firstname, Algorithm: Enhanced Name Matcher, Minimum similarity: 80
- Attribute: lastname, Algorithm: Enhanced Name Matcher, Minimum similarity: 80
- Attribute: email, Algorithm: Jaro-Winkler, Minimum similarity: 90
- Minimum combined match score: 80
→ Weighted combined score must be ≥80; evaluated mandatory rules must pass
Combined match score
Matching always uses a weighted combined score: for each evaluated (non-skipped) rule, multiply its similarity by its minimum similarity (weight; 0 → treated as 1), sum, and divide by the sum of weights. That value must be ≥ minimum combined match score. Evaluated mandatory rules must also meet their own minimums. Non-mandatory rules can be below their minimum while still contributing their raw similarity to the blend.
Example: three rules with minimums (weights) 80, 90, 75 — similarities 85, 90, 70:
Combined = (85×80 + 90×90 + 70×75) / (80+90+75) = 20150 / 245 ≈ 82.2
With minimum combined match score 80 → potential match if all mandatory rules pass.
Tuning weights: Raise a rule’s minimum similarity to make that attribute stricter and give it more influence on the combined score.
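
The combined score rules described above can be expressed as a short sketch. The `RuleResult` class and function names are hypothetical, and returning "no match" when every rule was skipped is an assumption of this sketch, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleResult:
    similarity: Optional[float]  # None means the rule was skipped
    minimum: float               # per-rule minimum, also used as the weight
    mandatory: bool = False


def combined_score(results: list[RuleResult]) -> Optional[float]:
    """Weighted average of similarities over evaluated (non-skipped) rules."""
    evaluated = [r for r in results if r.similarity is not None]
    if not evaluated:
        return None
    # A minimum of 0 is treated as weight 1.
    weights = [r.minimum if r.minimum > 0 else 1 for r in evaluated]
    total = sum(r.similarity * w for r, w in zip(evaluated, weights))
    return total / sum(weights)


def is_potential_match(results: list[RuleResult], min_combined: float) -> bool:
    score = combined_score(results)
    if score is None:
        return False  # assumption: nothing evaluated means no match
    # Evaluated mandatory rules must meet their own minimum.
    mandatory_ok = all(
        r.similarity >= r.minimum
        for r in results
        if r.mandatory and r.similarity is not None
    )
    return mandatory_ok and score >= min_combined


rules = [RuleResult(85, 80), RuleResult(90, 90), RuleResult(70, 75)]
print(round(combined_score(rules), 2))  # 82.24
print(is_potential_match(rules, 80))    # True
```

In this example the third rule scores 70 against a minimum of 75 yet still contributes to the blend. Marking it mandatory would make the same input fail.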
Tuning tips
| Goal | Approach |
|---|---|
| Stricter on one attribute | Raise its minimum (stronger weight + harder to pass if mandatory) |
| Softer global bar | Lower minimum combined match score |
| Stricter overall | Raise minimum combined match score or add mandatory rules |
Tuning thresholds
Initial thresholds (starting points)
| Attribute type | Algorithm | Starting threshold | Adjust if... |
|---|---|---|---|
| Full name | Enhanced Name Matcher | 80 | Too many false positives → 85; missing matches → 75 |
| First name | Enhanced Name Matcher | 85 | Too strict → 80; too loose → 90 |
| Last name | Enhanced Name Matcher | 88 | Missing matches → 85; false positives → 92 |
| Email | Jaro-Winkler | 92 | Very strict domain → 95; relaxed → 88 |
| Username | Jaro-Winkler | 95 | Nearly exact needed → 98 |
| Phone | Jaro-Winkler | 88 | After normalization |
| Address | Dice | 75 | Strict → 80; relaxed → 70 |
| Job title | Dice | 78 | Strict → 82; relaxed → 73 |
Tuning workflow
| Phase | Action | Goal | Metrics |
|---|---|---|---|
| 1. Baseline | Use starting thresholds from table above | Conservative; low false positive rate | Review 10–20 initial matches manually |
| 2. Test with sample | Run on 100–500 accounts (recommended via custom:dryrun) | Assess match quality | False positive rate, false negative rate |
| 3. Analyze results | Review all generated forms | Identify patterns | Are false positives due to one attribute? |
| 4. Adjust thresholds | Increase (stricter) or decrease (looser) | Balance precision vs recall | Target: <10% false positive rate |
| 5. Retest | Run on same or different sample | Validate improvements | Compare metrics to phase 2 |
| 6. Production | Remove sample limits | Full deployment | Monitor ongoing |
Balancing precision and recall
| Scenario | Symptom | Adjustment |
|---|---|---|
| High false positives | Many forms for obvious non-duplicates | Raise thresholds; add mandatory matches for critical attributes |
| High false negatives | Missing obvious matches | Lower thresholds; add more attributes; try different algorithms |
| Borderline cases | Many ambiguous matches | Enable Automatically assign on exact match? for obvious ones; manual review for borderline |
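
To track these rates across tuning rounds, a small helper such as the one below may be useful. It is a sketch under stated assumptions: "false positive rate" is taken to mean the share of generated forms that reviewers reject, and the count of missed matches comes from your own manual audit of the sample. The function name and inputs are hypothetical.

```python
def review_metrics(confirmed: int, rejected: int, missed: int) -> dict[str, float]:
    """Summarize a reviewed sample.

    confirmed: forms reviewers accepted as true matches
    rejected:  forms reviewers rejected (false positives)
    missed:    known duplicates that produced no form (false negatives)
    """
    flagged = confirmed + rejected
    actual = confirmed + missed
    return {
        "false_positive_rate": rejected / flagged if flagged else 0.0,
        "precision": confirmed / flagged if flagged else 0.0,
        "recall": confirmed / actual if actual else 0.0,
    }


# Example: 50 forms generated, 5 rejected, 5 known duplicates missed.
print(review_metrics(confirmed=45, rejected=5, missed=5))
```

Raising thresholds usually lowers the false positive rate at the cost of recall, so compare both numbers between runs rather than optimizing one alone.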
Screenshot placeholder: Review form showing per-attribute similarity scores.

Automatic assignment (exact scores)
When to use
Automatically assign on exact match? = Yes
Effect: Candidates that are an exact attribute match (every real rule scored 100 with none skipped) are assigned to that identity without manual review.
| Enable when... | Keep disabled when... |
|---|---|
| Thresholds are well-tuned | Initial setup / testing |
| False positive rate is <5% | High-risk merges (finance, healthcare) |
| Review burden is high (>50 forms/week) | You want manual approval for all merges |
| Obvious matches are common | Data quality is poor |
When it runs: When Automatically assign on exact match? is enabled, the connector skips the review form when every real rule was evaluated (none skipped for missing values) and all attribute similarity scores are 100.
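
The condition described above reduces to a simple check, sketched here. The function name is hypothetical, a skipped rule is represented as `None`, and treating an empty rule list as "not eligible" is an assumption of this sketch.

```python
from typing import Optional, Sequence


def qualifies_for_auto_assign(
    scores: Sequence[Optional[float]], enabled: bool = True
) -> bool:
    """True only if the setting is on, no rule was skipped, and every score is 100."""
    if not enabled or not scores:
        return False
    return all(score is not None and score == 100 for score in scores)


print(qualifies_for_auto_assign([100, 100, 100]))   # True
print(qualifies_for_auto_assign([100, None, 100]))  # False (one rule skipped)
print(qualifies_for_auto_assign([100, 99.5]))       # False (not exact)
```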
Common matching patterns
Pattern 1: Conservative (high confidence only)
Goal: Only flag very obvious matches; minimize false positives.
- Attribute: email, Algorithm: Jaro-Winkler, Score: 95, Mandatory: Yes
- Attribute: name, Algorithm: Enhanced Name Matcher, Score: 88
→ Email must nearly match; name must also be very close
Use case: High-risk environments (financial, healthcare); initial rollout.
Pattern 2: Balanced (moderate confidence)
Goal: Balance between catching matches and avoiding false positives.
- Attribute: name, Algorithm: Enhanced Name Matcher, Minimum similarity: 80
- Attribute: email, Algorithm: Jaro-Winkler, Minimum similarity: 85
- Minimum combined match score: e.g. 80
→ Weighted combined score must meet global floor; mandatories must pass
Use case: General corporate environments; standard data quality.
Pattern 3: Aggressive (catch more matches)
Goal: Flag potential matches even with lower confidence; accept some false positives.
- Attribute: firstname, Algorithm: Enhanced Name Matcher, Minimum similarity: 75
- Attribute: lastname, Algorithm: Enhanced Name Matcher, Minimum similarity: 78
- Attribute: email, Algorithm: Jaro-Winkler, Minimum similarity: 70
- Minimum combined match score: 75
→ Relaxed per-rule minima; combined score must still reach global floor
Use case: Poor data quality; many known matches; strong review team.
Pattern 4: Phonetic (spelling variations)
Goal: Catch names with different spellings but same pronunciation.
- Attribute: name, Algorithm: Double Metaphone, Score: 80
- Attribute: email, Algorithm: Jaro-Winkler, Score: 85, Mandatory: Yes
→ Phonetic name match + email confirmation
Use case: International names; known spelling variations; diverse workforce.
Pattern 5: Hybrid (critical + supporting)
Goal: One critical mandatory attribute plus supporting optional attributes.
- Attribute: employeeId, Algorithm: Jaro-Winkler, Score: 98, Mandatory: Yes
- Attribute: name, Algorithm: Enhanced Name Matcher, Score: 75, Mandatory: No
- Attribute: email, Algorithm: Jaro-Winkler, Score: 80, Mandatory: No
→ Employee ID must match; name and email provide additional confidence
Use case: Strong business key exists; other attributes support verification.
Real-world matching examples (anonymized)
The rows below are fictional composites. Source A and Source B stand in for any two authoritative feeds from your own topology; do not treat the labels as product-specific. Use them to reason about algorithms, Map/Define normalization, and reviewer context.
Transposed date of birth
- Source A: Daniel Kim, `1999-03-08`, M
- Source B: Daniel Kim, `1999-08-03`, M
Why it is ambiguous: The calendar values differ only by digit order (month/day swap), not by obvious typo in a name string.
What to do: Normalize both sides to the same canonical form (for example ISO YYYY-MM-DD or a comparable epoch) in Map or Define before matching—or exclude raw DOB from string similarity rules. Pure string algorithms on date text often mis-score transpositions; see Dates.
Last name change (marriage or legal change)
- Source A: Olivia Nguyen, `1997-06-21`, F, `olivia.nguyen@example.com`
- Source B: Olivia Patel, `1997-06-21`, F, `olivia.nguyen@example.com`
Why it is ambiguous: Same person indicators (DOB, email, gender) align while surname differs after a legal change.
What to do: Enhanced Name Matcher on full name or separate firstname / lastname rules with sensible minima; treat email as a strong corroborating rule (Jaro-Winkler, often mandatory). Reviewers should see email + DOB on the form.
Preferred or nickname vs legal name
- Source A: Chris Johnson, `2000-09-15`, M
- Source B: Christopher Johnson, `2000-09-15`, M
Why it is ambiguous: One source stores a preferred short form (Chris) and the other the legal name (Christopher). Enhanced Name Matcher is intended to relate common nickname ↔ legal pairs when comparing person-name attributes.
What to do: Prefer Enhanced Name Matcher on name or firstname; add a second signal (DOB, email, employee ID) if automatic assignment must stay conservative.
Multipart or cultural last-name variation
- Source A: Maria De La Cruz, `1996-11-02`, F
- Source B: Maria Cruz, `1996-11-02`, F
Why it is ambiguous: One source keeps a compound surname; another collapses or splits tokens differently.
What to do: Enhanced Name Matcher on full name; optional LIG3 if you compare a single compound lastname field and need token-gap tolerance—tune thresholds and validate on your data. Ensure review attributes include the full name from both sides.
Phone formatting differences
- Source A: James Miller, `1995-02-10`, M, `(402) 555-2222`
- Source B: James Miller, `1995-02-10`, M, `4025552222`
Why it is ambiguous: Same digits, different punctuation and grouping.
What to do: Strip non-digits (and optionally normalize country code) in Define, then Jaro-Winkler on phone with thresholds in the Phone number (normalized) range described under Jaro-Winkler above. Do not match raw formatted strings without normalization.
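
The normalization step can be illustrated as follows. In the connector this logic would live in Define; the Python below only demonstrates the transformation. The function name is hypothetical, and the country-code handling assumes 10-digit North American numbers with an optional leading `1`.

```python
import re


def normalize_phone(raw: str) -> str:
    """Keep digits only; drop a leading '1' country code from 11-digit numbers."""
    digits = re.sub(r"\D", "", raw)
    # Assumption: North American numbering (10 digits plus optional "1").
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


print(normalize_phone("(402) 555-2222"))   # 4025552222
print(normalize_phone("+1 402-555-2222"))  # 4025552222
```

Once both sources are reduced to the same digit string, identical numbers score 100 and any remaining difference reflects real digit changes rather than formatting.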
Legal sex or gender marker difference
- Source A: Taylor Morgan, `1998-07-30`, M
- Source B: Taylor Morgan, `1998-07-30`, F
Why it is ambiguous: Other attributes match exactly, but a policy-sensitive field disagrees—may be data error, timing, or identity semantics.
What to do: Decide by governance policy: either omit this attribute from automated matching, use Mandatory match? only when sources are contractually aligned, or always send to manual review with clear form copy. Do not rely on similarity alone for high-stakes demographic fields.
Partial data (missing attributes on one side)
- Source A: Aisha Khan, `2001-05-18`, F (no email, no phone on record)
- Source B: Aisha Khan, `2001-05-18`, F, `aisha.khan@example.com`, `402-555-3333`
Why it is ambiguous: With Skip match if missing = Yes (default), rules on email/phone are skipped when Source A is empty, so the combined score rests on fewer signals—higher false positive or false negative risk depending on thresholds.
What to do: Keep strong non-skipped rules (name + DOB) where populated; document reviewer expectations; consider Skip match if missing = No only for attributes you intentionally want to penalize when absent, understanding side effects on combined score and automatic assignment.
Typographical error
- Source A: Michael Anderson, `1994-12-05`, M
- Source B: Michael Andersn, `1994-12-05`, M
Why it is ambiguous: Single-character suffix typo in last name.
What to do: Jaro-Winkler tolerates some end typos on short strings; Enhanced Name Matcher on full name often still scores well. If typos dominate, slightly lower last-name minimum similarity or add a phonetic rule (Double Metaphone) as a secondary signal, not the only gate.
International character variation
- Source A: José Garcia, `1993-08-14`, M
- Source B: Jose Garcia, `1993-08-14`, M
Why it is ambiguous: Accent present in one system, ASCII in the other.
What to do: Enhanced Name Matcher handles accents; alternatively enable Normalize special characters? in Define before Jaro-Winkler / Dice on affected fields. LIG3 can score accented vs ASCII highly when configured appropriately; validate on samples.
Data Preprocessing and Edge Cases
The Normalizer Tool
Before relying entirely on matching algorithms, consider enabling the Normalize special characters? transformation during the Define phase. Normalization transliterates international accents and strips stray punctuation (such as the apostrophe in "O'Conner", or hyphens).
- Why it matters: Algorithms like `Jaro-Winkler` and `Dice` compare strings purely character by character. "Renée" vs "Renee" scores poorly under Dice (50%) but scores 100% when normalized. `LIG3` penalizes punctuation as unmapped insertions (dropping scores to ~64%), which the normalizer resolves.
- Exception: The `Enhanced Name Matcher` natively handles accents and Unicode transliteration, so it is less reliant on upstream normalization.
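
To see what this kind of normalization does to a value, consider the sketch below. It approximates the idea with Python's standard `unicodedata` module and is not the connector's implementation: the function name is hypothetical, and removing punctuation outright (rather than replacing it with a space) is an assumption.

```python
import unicodedata


def normalize_special_characters(text: str) -> str:
    """Strip accents and punctuation, keeping letters, digits and spaces."""
    # Decompose accented characters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (accents).
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Drop punctuation such as apostrophes and hyphens.
    return "".join(c for c in without_marks if c.isalnum() or c.isspace())


print(normalize_special_characters("Renée"))     # Renee
print(normalize_special_characters("O'Conner"))  # OConner
```

Letters that have no Unicode decomposition (for example "ø" or "ß") pass through this sketch unchanged, so validate the real normalizer against samples from your own data.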
Dates
Dates are notoriously poor candidates for pure string-matching algorithms due to format variance (e.g. 10/05/1990 vs 1990-10-05 vs Oct 5th 1990).
- String matching models (like `LIG3` or `Dice`) treat dates purely as structural tokens, which often drops similarity below 50% when formats are mixed.
- Best Practice: Do not match raw dates using these algorithms. Standardize the date formats (either into epoch values or ISO standard strings) upstream using Velocity templates or the Map engine.
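
The effect of standardizing dates can be illustrated with the sketch below. In the connector this would be done with Velocity templates or the Map engine; the Python here only demonstrates the idea. The function name and the list of accepted formats are assumptions, `10/05/1990` is read month-first, and month abbreviations are parsed in an English locale.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed input formats; configure the real list per source system.
FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%b %d %Y")


def to_iso_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format fits."""
    # Remove ordinal suffixes: "5th" -> "5".
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", raw.strip())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(to_iso_date("10/05/1990"))    # 1990-10-05
print(to_iso_date("Oct 5th 1990"))  # 1990-10-05
print(to_iso_date("1990-10-05"))    # 1990-10-05
```

A value such as `10/05/1990` cannot be disambiguated from the text alone, so the day/month order has to be fixed per source before comparison.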
Long Addresses
- When addresses use standardized structural variations (e.g. `1234 Elm Street Suite 500` vs `1234 Elm St Ste 500`), Jaro-Winkler is the most robust (90%), followed closely by LIG3 (82%).
- When addresses are structurally reordered (e.g. `Apt 12 400 Broad St` vs `400 Broad St Apt 12`), prefix-based algorithms like `Jaro-Winkler` and `LIG3` break down rapidly. In this case, Dice is the better choice because its bigram comparison does not depend on token order (76%).
Troubleshooting matching issues
| Issue | Possible cause | Solution |
|---|---|---|
| No matches found | Thresholds too high | Lower by 5–10 points; check if attributes exist on identities |
| Too many false positives | Thresholds too low; wrong algorithm | Raise thresholds; add mandatory match for critical attribute; switch algorithm |
| Name matches fail | Title/order differences; wrong algorithm | Use Enhanced Name Matcher (not Jaro-Winkler) for names |
| Email matches fail | Case sensitivity; domain differences | Normalize email to lowercase; check domain importance |
| Inconsistent results | Missing or null attribute values | Verify attributes exist and are populated on all identities |
| Algorithm seems wrong | Mismatched algorithm for attribute type | Review algorithm selection guide above |
Summary and decision guide
Quick algorithm selection
| Attribute | Recommended algorithm | Threshold range |
|---|---|---|
| Full name, display name | Enhanced Name Matcher | 75–85 |
| First name, last name | Enhanced Name Matcher | 80–92 |
| Missing middle names | LIG3 | 60–70 |
| International names | Enhanced Name Matcher / LIG3 | 80–92 |
| Email | Jaro-Winkler | 90–95 |
| Username, employee ID | Jaro-Winkler | 95–100 |
| Phone (normalized) | Jaro-Winkler | 85–92 |
| Address | Dice | 70–80 |
| Transposed identifiers | Dice | 85–95 |
| Job title, department | Dice | 72–85 |
| Name (phonetic) | Double Metaphone | 75–85 |
Key principles
- Start conservative — High thresholds initially; lower as you gain confidence
- Use appropriate algorithms — Names (Enhanced Name Matcher), short text (Jaro-Winkler), long text (Dice), phonetic (Double Metaphone)
- Test with samples — Don't run on full dataset until thresholds are tuned
- Monitor and adjust — Track false positive/negative rates; iterate
- Balance precision and recall — Lower thresholds catch more matches but increase false positives
- Consider automatic assignment — Enable after tuning to reduce manual review burden
Next steps:
- For full Match setup, see Identity Fusion for Match.
- For attribute merging and mapping, see Map.