Compare Domain Names Using Text Edit Distance Array

Calculate edit distance between domain names and reference values using the text:editDistanceAsArray() function

Query

text:editDistanceAsArray(target=forwarded, references=["crowdstrike.com","crwd.com"], maxDistance=5)

Introduction

The text:editDistanceAsArray() function can be used to calculate the Levenshtein edit distance between a target string and multiple reference strings, returning the results as an array. This is particularly useful for detecting typosquatting or similar domain names that might be used in phishing attempts.
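As a rough illustration of what a Levenshtein edit distance measures, the following Python sketch counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. It is a textbook dynamic-programming version written for this example, not LogScale's implementation, and the function name levenshtein is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein edit distance between a and b."""
    # previous[j] is the distance between the prefix of a processed so far
    # and the first j characters of b. Initially that prefix is empty.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(levenshtein("crownstrike.com", "crowdstrike.com"))  # 1
print(levenshtein("crowdstreak.com", "crowdstrike.com"))  # 3
```

A single swapped letter gives a distance of 1, which is why small non-zero distances are a useful signal for look-alike domains.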

In this example, the text:editDistanceAsArray() function is used to compare forwarded domains against known legitimate domains to identify potential typosquatting attempts.

Example incoming data might look like this:

| @timestamp | forwarded | source_ip | request_type |
| --- | --- | --- | --- |
| 2025-10-15T10:00:00Z | crowdstrike.com | 192.168.1.100 | DNS |
| 2025-10-15T10:01:00Z | crowdstreak.com | 192.168.1.101 | DNS |
| 2025-10-15T10:02:00Z | crownstrike.com | 192.168.1.102 | DNS |
| 2025-10-15T10:03:00Z | logscale.com | 192.168.1.103 | DNS |
| 2025-10-15T10:04:00Z | crwd.com | 192.168.1.104 | DNS |

Step-by-Step

  1. Starting with the source repository events.

  2. text:editDistanceAsArray(target=forwarded, references=["crowdstrike.com","crwd.com"], maxDistance=5)

    Calculates the edit distance between the value in the forwarded field and each reference domain (crowdstrike.com and crwd.com).

    The maxDistance parameter is set to 5, which caps the reported value. For each (target, reference) pair whose calculated distance is less than 5, the result contains that distance; otherwise the result contains 5 (the maxDistance value).

    The function returns an array field named _distance containing objects with distance and reference properties for each comparison.

  3. Event Result set.
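The behavior described in step 2 can be modeled with a short Python sketch: compute a distance per reference, cap it at the maximum, and return one object per comparison. This only mirrors the semantics described above under the assumption that the cap is a simple minimum. The names edit_distance_as_array and levenshtein are hypothetical, and the memoized recursive distance is chosen for brevity rather than speed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def levenshtein(a: str, b: str) -> int:
    """Memoized recursive Levenshtein distance (fine for short strings)."""
    if not a or not b:
        return len(a) + len(b)  # only insertions or deletions remain
    if a[0] == b[0]:
        return levenshtein(a[1:], b[1:])  # matching first characters cost nothing
    return 1 + min(
        levenshtein(a[1:], b),      # delete from a
        levenshtein(a, b[1:]),      # insert into a
        levenshtein(a[1:], b[1:]),  # substitute
    )


def edit_distance_as_array(target: str, references: list[str], max_distance: int) -> list[dict]:
    """Model of the described output: one {distance, reference} object per reference."""
    return [
        {"distance": min(levenshtein(target, ref), max_distance), "reference": ref}
        for ref in references
    ]


print(edit_distance_as_array("crowdstreak.com", ["crowdstrike.com", "crwd.com"], 5))
# [{'distance': 3, 'reference': 'crowdstrike.com'}, {'distance': 5, 'reference': 'crwd.com'}]
```

Because of the cap, a result equal to the maximum only says the strings are at least that far apart, not how far.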

Summary and Results

The query is used to identify domain names that are similar to known legitimate domains, which can help detect potential phishing or typosquatting attempts.

This query is useful, for example, to monitor DNS queries for slightly misspelled versions of legitimate domain names that might be used in phishing campaigns.

Both the text:editDistance() and text:editDistanceAsArray() functions calculate Levenshtein edit distances between strings. They serve similar purposes but differ in how they handle reference values and in their output format. For the text:editDistance() counterpart of this example, see Calculate Edit Distance Between Domain Names.

Sample output from the incoming example data:

| _distance[0].distance | _distance[0].reference | _distance[1].distance | _distance[1].reference |
| --- | --- | --- | --- |
| 0 | crowdstrike.com | 5 | crwd.com |
| 3 | crowdstrike.com | 5 | crwd.com |
| 1 | crowdstrike.com | 5 | crwd.com |
| 5 | crowdstrike.com | 5 | crwd.com |
| 5 | crowdstrike.com | 0 | crwd.com |

Note that a distance of 0 indicates an exact match with the reference domain. The results are in the order of the events.

Also note that each row contains comparison results against all reference domains. A comparison whose distance reaches or exceeds the maxDistance threshold is still listed, with its distance reported as 5.

This data would be well-suited for visualization in a table widget showing the domain names and their edit distances. For security monitoring, you could create alerts for when domains with small but non-zero edit distances are detected. A bar chart could also be used to show the distribution of edit distances over time, helping identify patterns in typosquatting attempts.
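The alerting idea can be sketched outside LogScale as a simple filter over the comparison results. The Python example below uses the sample rows from this page and flags domains with a small but non-zero distance to any reference. The threshold of 3, the Comparison type and the flag_lookalikes name are assumptions made for illustration, not product settings or APIs.

```python
from typing import NamedTuple


class Comparison(NamedTuple):
    forwarded: str
    reference: str
    distance: int  # already capped at maxDistance (5) as in the sample output


# Rows transcribed from the example input and sample output on this page.
comparisons = [
    Comparison("crowdstrike.com", "crowdstrike.com", 0),
    Comparison("crowdstrike.com", "crwd.com", 5),
    Comparison("crowdstreak.com", "crowdstrike.com", 3),
    Comparison("crowdstreak.com", "crwd.com", 5),
    Comparison("crownstrike.com", "crowdstrike.com", 1),
    Comparison("crownstrike.com", "crwd.com", 5),
    Comparison("logscale.com", "crowdstrike.com", 5),
    Comparison("logscale.com", "crwd.com", 5),
    Comparison("crwd.com", "crowdstrike.com", 5),
    Comparison("crwd.com", "crwd.com", 0),
]


def flag_lookalikes(rows: list[Comparison], threshold: int = 3) -> list[str]:
    """Return domains that are close to, but not identical to, a reference."""
    # Domains that exactly match any reference are legitimate and never flagged.
    exact = {row.forwarded for row in rows if row.distance == 0}
    flagged = {
        row.forwarded
        for row in rows
        if row.forwarded not in exact and 0 < row.distance <= threshold
    }
    return sorted(flagged)


print(flag_lookalikes(comparisons))  # ['crowdstreak.com', 'crownstrike.com']
```

A lower threshold reduces false positives on unrelated short domains, while a higher one catches more heavily altered names; the right value depends on the length of the reference domains.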