Calculate Edit Distance Between Domain Names

Measure string similarity between domains and a reference value using the text:editDistance() function

Query

[Flowchart: Events → Function → Result Set]

text:editDistance(target=forwarded, reference="crowdstrike.com", maxDistance=5)

Introduction

The text:editDistance() function can be used to calculate the Levenshtein edit distance between a target string and a reference string. The edit distance represents the minimum number of single-character edits required to change one string into another.
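
To make the definition concrete, here is a minimal Python sketch of the standard dynamic-programming calculation of Levenshtein distance. It illustrates the metric only and is not LogScale's implementation; the function name `levenshtein` is a hypothetical helper chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# One substitution (n -> d) separates these two domains.
print(levenshtein("crownstrike.com", "crowdstrike.com"))  # prints 1
```

Only two rows of the table are kept at a time, so memory use grows with the length of one string rather than with the product of both lengths.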

In this example, the text:editDistance() function is used to compare domain names against a reference domain to identify potential typosquatting attempts.

Example incoming data might look like this:

| @timestamp | forwarded | source_ip | request_type |
| --- | --- | --- | --- |
| 2025-10-15T10:00:00Z | crowdstrike.com | 192.168.1.100 | DNS |
| 2025-10-15T10:01:00Z | crowdstreak.com | 192.168.1.101 | DNS |
| 2025-10-15T10:02:00Z | crownstrike.com | 192.168.1.102 | DNS |
| 2025-10-15T10:03:00Z | logscale.com | 192.168.1.103 | DNS |
| 2025-10-15T10:04:00Z | crwd.com | 192.168.1.104 | DNS |

Step-by-Step

  1. Starting with the source repository events.

  2. [Flowchart: Events → Function (highlighted) → Result Set]

    text:editDistance(target=forwarded, reference="crowdstrike.com", maxDistance=5)

    Calculates the edit distance between the value in the forwarded field and the reference domain crowdstrike.com.

    The maxDistance parameter is set to 5, meaning that if the calculated distance is less than 5, the function returns that distance; otherwise it returns 5 (the defined maxDistance).

    The function returns the edit distance value in a field named _distance.

  3. Event Result set.
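
The capping behavior described in step 2 can be sketched in Python as follows. This is an illustration under assumptions, not LogScale's implementation: `capped_edit_distance` and its parameter names are hypothetical, and the early exits are one common way to avoid work once the cap is certain to be reached.

```python
def capped_edit_distance(target: str, reference: str, max_distance: int) -> int:
    """Levenshtein distance between target and reference, capped at max_distance."""
    # The distance is never smaller than the length difference,
    # so a large enough gap reaches the cap immediately.
    if abs(len(target) - len(reference)) >= max_distance:
        return max_distance
    previous = list(range(len(reference) + 1))
    for i, t_char in enumerate(target, start=1):
        current = [i]
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if t_char == r_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        # The final distance cannot be lower than the smallest entry of any row,
        # so once a whole row reaches the cap the result is the cap.
        if min(current) >= max_distance:
            return max_distance
        previous = current
    return min(previous[-1], max_distance)


# Values of the forwarded field from the example data.
domains = ["crowdstrike.com", "crowdstreak.com", "crownstrike.com",
           "logscale.com", "crwd.com"]
distances = [capped_edit_distance(d, "crowdstrike.com", 5) for d in domains]
print(distances)  # prints [0, 3, 1, 5, 5]
```

A result equal to the cap therefore means "at least this far away", not an exact distance.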

Summary and Results

The query measures how different a domain name is from a known legitimate domain by counting the minimum number of single-character edits needed to turn one into the other.

This query is useful, for example, to detect potential phishing domains that are slight misspellings of legitimate domain names, or to identify typosquatting attempts in DNS queries.

Both the text:editDistance() and text:editDistanceAsArray() functions can be used to calculate Levenshtein edit distances between strings. While they serve similar purposes, they differ in their ability to handle reference values and in their output format. See Compare Domain Names Using Text Edit Distance Array.

Sample output from the incoming example data:

_distance
0
3
1
5
5

Note that a distance of 0 indicates an exact match with the reference domain. The higher the number, the more different the domain is from the reference.

For domains with an edit distance greater than the specified maxDistance, the function returns the defined maxDistance as the result.

This data would be effective in a table widget showing the domain names alongside their edit distances. For security monitoring, you could create a line chart showing edit distances over time to identify patterns of similar domain variations. Consider setting up alerts for when domains with small edit distances (1-3) are detected, as these are common in phishing attempts.
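
As a rough illustration of the alert condition, the following Python sketch keeps only the domains whose distance falls in the 1-3 range. The domain and distance pairs are taken from the example data and sample output; the function name `suspicious_domains` and its `low` and `high` parameters are hypothetical, and a real alert would be defined in LogScale rather than in Python.

```python
# (domain, _distance) pairs from the example data and sample output.
observed = [
    ("crowdstrike.com", 0),
    ("crowdstreak.com", 3),
    ("crownstrike.com", 1),
    ("logscale.com", 5),
    ("crwd.com", 5),
]


def suspicious_domains(pairs, low=1, high=3):
    """Return domains that are close to the reference but not an exact match."""
    return [domain for domain, distance in pairs if low <= distance <= high]


flagged = suspicious_domains(observed)
print(flagged)  # prints ['crowdstreak.com', 'crownstrike.com']
```

The lower bound of 1 excludes the legitimate domain itself, and the upper bound stays below the maxDistance of 5 so that capped results are never flagged.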