The text:editDistance() function returns the edit distance (Levenshtein distance) between a target string and a reference string. The Levenshtein distance is the minimum number of single-character edits required to transform one string into another. A variant of the Levenshtein distance additionally counts an adjacent transposition as a single operation.
The text:editDistance() function calculates the number of edit operations (insertion, deletion, or substitution) needed to convert a target string into a reference string, and returns the edit distance as a numeric value (double).
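To make the definition concrete, the following is a minimal Python sketch of the classic Levenshtein recurrence. It is an illustration only, not LogScale's implementation: the function name `levenshtein` is hypothetical, it compares Python code points rather than grapheme clusters, and it returns an integer rather than a double.

```python
def levenshtein(target: str, reference: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn target into reference (compared per code point)."""
    # previous[j] holds the distance between the processed prefix of
    # target and the first j characters of reference.
    previous = list(range(len(reference) + 1))
    for i, t_char in enumerate(target, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if t_char == r_char else 1
            current.append(min(
                previous[j] + 1,         # delete t_char
                current[j - 1] + 1,      # insert r_char
                previous[j - 1] + cost,  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```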
Note
Both the text:editDistance() and text:editDistanceAsArray() functions can be used to calculate Levenshtein edit distances between strings. While the functions serve similar purposes, they differ in their ability to handle reference values and in their output format.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
allowTranspositions | boolean | optional[a] | true | Counts an adjacent character transposition as a single edit operation during distance calculation (instead of two separate operations). |
as | string | optional[a] | _distance | The name of the output field. |
caseInsensitive | boolean | optional[a] | false | When set to true, converts both strings to lowercase before calculation. |
maxDistance | integer | required | | The maximum edit distance (Levenshtein distance) to calculate. If the Levenshtein distance exceeds this value, maxDistance is returned. Minimum: 1. Maximum: 100. |
reference | expression | required | | The comparison string for distance calculation. |
target | expression | required | | The source string for distance calculation. |
[a] Optional parameters use their default value unless explicitly set.
The argument name for string can be omitted.
text:editDistance() Function Operation
The text:editDistance() function has specific implementation and operational considerations, outlined below.
Input Processing
The text:editDistance() function evaluates two string expressions before performing calculations:
The function accepts a target string expression and a reference string expression.
Both expressions are converted to strings during evaluation.
The function produces no output if either string is null or invalid.
Distance Calculation
The text:editDistance() function performs these core operations:
The function calculates the Levenshtein distance between the target and reference strings.
The output appears as a double value.
The calculation stops when the distance exceeds the maxDistance parameter.
A lower maxDistance value improves computation speed.
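The early stop described above can be sketched in Python as follows. This is an assumption about how such a cutoff might work, not a description of LogScale internals: the name `capped_distance` is hypothetical, transpositions are left out for brevity, and the result is an integer rather than a double. The sketch relies on the fact that, for plain Levenshtein, the smallest value in any row of the table is a lower bound on the final distance.

```python
def capped_distance(target: str, reference: str, max_distance: int) -> int:
    """Levenshtein distance, capped at max_distance with an early exit."""
    previous = list(range(len(reference) + 1))
    for i, t_char in enumerate(target, start=1):
        current = [i]
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if t_char == r_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        # Every later cell is at least min(current), so once the whole
        # row exceeds the cap the final distance must exceed it too.
        if min(current) > max_distance:
            return max_distance
        previous = current
    return min(previous[-1], max_distance)


print(capped_distance("crwd.com", "crowdstrike.com", 5))  # 5
```

With a smaller cap, the loop can exit after fewer rows, which is one plausible reason a lower maxDistance is cheaper to compute.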
Parameter Usage
Each parameter influences the calculation in specific ways:
maxDistance
Limits the maximum calculated distance.
Returns the maxDistance value if the actual distance exceeds it.
Improves performance by stopping calculations early.
caseInsensitive
When enabled (set to true), treats uppercase and lowercase characters as identical.
May produce incorrect distances in special cases. For more information, see Special Considerations and Limitations.
allowTranspositions
When enabled (set to true), counts adjacent character swaps as a single edit operation.
When disabled (set to false), counts adjacent character swaps as two separate (substitution) operations.
For more information, see Special Considerations and Limitations.
Special Considerations and Limitations
With case-insensitive comparison, the function can return incorrect edit distances for characters whose uppercase and lowercase forms have different lengths in codepoints.
The following characters are known to cause this issue:
Sr.No. | Uppercase | Lowercase | Comments |
---|---|---|---|
1 | SS | ß | The German sharp s ß (U+00DF) capitalizes to SS. However, the LogScale parser treats SS as two English capital letters S. If possible, replace SS with ẞ (U+1E9E), the capital variant of ß (U+00DF). |
2 | K | k | The K here refers to the kelvin sign (U+212A) and not the English capital letter K. The kelvin sign lowercases to the English lowercase letter k, which has a different length in codepoints. |
3 | İ | i | The Turkish İ (U+0130) lowercases to the English small letter i, which has a different length in codepoints. |
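The length changes listed in the table can be observed with Python's built-in case mapping. This sketch shows Python's behavior only, and LogScale's case conversion may differ. In particular, Python lowercases U+0130 to i followed by a combining dot above (U+0307), not to the single letter i given in the table. The helper name `case_lengths` is hypothetical.

```python
def case_lengths(ch: str) -> dict:
    """Return (code point count, UTF-8 byte count) for ch and for
    its uppercase and lowercase mappings."""
    forms = {"original": ch, "upper": ch.upper(), "lower": ch.lower()}
    return {
        name: (len(text), len(text.encode("utf-8")))
        for name, text in forms.items()
    }


samples = [
    ("sharp s", "\u00df"),             # ß
    ("kelvin sign", "\u212a"),         # K (not the ASCII letter K)
    ("capital I with dot", "\u0130"),  # İ
]
for label, ch in samples:
    print(label, case_lengths(ch))
```

In Python, the sharp s grows from one code point to two when uppercased, and the kelvin sign shrinks from three UTF-8 bytes to one when lowercased, so positions in the original and case-converted strings no longer line up.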
The allowTranspositions parameter determines whether adjacent transpositions are allowed during distance calculation. It defaults to true. When adjacent transpositions are allowed, two adjacent characters can be swapped at a distance of 1 (the transposition requires only one operation instead of two operations).
For example, the distance between abc and acb would be:
2 if adjacent transpositions are not allowed (abc → substitute(b,c) → acc → substitute(c,b) → acb)
1 if adjacent transpositions are allowed (abc → swap(b,c) → acb)
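The abc and acb example can be reproduced with a short Python sketch. It uses the optimal string alignment variant of Damerau-Levenshtein, which is an assumption: the documentation does not state which transposition-aware algorithm LogScale uses. The function name `edit_distance` and its `allow_transpositions` argument are hypothetical stand-ins for the parameter described above.

```python
def edit_distance(target: str, reference: str,
                  allow_transpositions: bool = True) -> int:
    """Edit distance with optional single-cost adjacent swaps
    (optimal string alignment variant)."""
    rows, cols = len(target) + 1, len(reference) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if target[i - 1] == reference[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # An adjacent swap: the last two characters of each prefix
            # are the same pair in opposite order.
            if (allow_transpositions and i > 1 and j > 1
                    and target[i - 1] == reference[j - 2]
                    and target[i - 2] == reference[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("abc", "acb", allow_transpositions=False))  # 2
print(edit_distance("abc", "acb", allow_transpositions=True))   # 1
```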
Grapheme Clusters
The text:editDistance() function works on extended grapheme clusters as defined by the Unicode Standard Annex #29, specifically UAX29-C1-1, similar to the text:substring() and text:length() functions. This means that the edit distance between 🇩🇰😄😁 and 😄😄😁 would be 1.
Furthermore, the text:editDistance() function calculates more accurate edit distances for non-Latin writing systems. For example, the distance between नमस्ते and नमसते would be 2 (replace स्ते with स and add ते).
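The difference between counting code points and counting grapheme clusters can be sketched in Python. Python's standard library does not segment grapheme clusters, so the clusters below are split by hand to match the units named in the examples above. That segmentation is an assumption of this sketch, and `sequence_distance` is a hypothetical helper.

```python
def sequence_distance(a, b) -> int:
    """Levenshtein distance between two sequences of comparable units."""
    previous = list(range(len(b) + 1))
    for i, unit_a in enumerate(a, start=1):
        current = [i]
        for j, unit_b in enumerate(b, start=1):
            cost = 0 if unit_a == unit_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


# Flag of Denmark is two regional-indicator code points but one cluster.
DK, GRIN, BEAM = "\U0001F1E9\U0001F1F0", "\U0001F604", "\U0001F601"
print(sequence_distance([DK, GRIN, BEAM], [GRIN, GRIN, BEAM]))              # 1
print(sequence_distance(list(DK + GRIN + BEAM), list(GRIN + GRIN + BEAM)))  # 2

# Devanagari letters: na, ma, sa, virama, ta, vowel sign e.
NA, MA, SA, VIRAMA, TA, E = "\u0928", "\u092e", "\u0938", "\u094d", "\u0924", "\u0947"
target_clusters = [NA, MA, SA + VIRAMA + TA + E]  # hand-segmented
reference_clusters = [NA, MA, SA, TA + E]         # hand-segmented
print(sequence_distance(target_clusters, reference_clusters))  # 2
print(sequence_distance(list("".join(target_clusters)),
                        list("".join(reference_clusters))))    # 1
```

Comparing by code point gives 2 for the emoji pair and 1 for the Devanagari pair, whereas comparing by the hand-segmented clusters gives the documented values of 1 and 2.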
text:editDistance() Syntax Examples
This example calculates the edit distance between various domain names and crowdstrike.com:
text:editDistance(
target=forwarded,
reference="crowdstrike.com",
maxDistance=5
)
If the input data was forwarded=crowdstrike.com, forwarded=crowdstreak.com, forwarded=crownstrike.com, forwarded=logscale.com, and forwarded=crwd.com, it would return:
_distance |
---|
0 |
3 |
1 |
5 |
5 |
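As a cross-check of the table above, the following Python sketch emulates the query outside LogScale. It is not the LogScale implementation: `osa_distance` and `emulate_edit_distance` are hypothetical names, the transposition handling assumes the optimal string alignment variant, and strings are compared per code point (sufficient here because the domains are plain ASCII).

```python
def osa_distance(target: str, reference: str) -> int:
    """Edit distance in which an adjacent swap costs one operation."""
    rows, cols = len(target) + 1, len(reference) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if target[i - 1] == reference[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if (i > 1 and j > 1
                    and target[i - 1] == reference[j - 2]
                    and target[i - 2] == reference[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def emulate_edit_distance(target: str, reference: str, max_distance: int) -> int:
    """Cap the distance at max_distance, as the maxDistance parameter does."""
    return min(osa_distance(target, reference), max_distance)


domains = ["crowdstrike.com", "crowdstreak.com", "crownstrike.com",
           "logscale.com", "crwd.com"]
distances = [emulate_edit_distance(d, "crowdstrike.com", 5) for d in domains]
print(distances)  # [0, 3, 1, 5, 5]
```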
text:editDistance() Examples
Calculate Edit Distance Between Domain Names
Measure string similarity between domains and a reference value using the text:editDistance() function
Query
text:editDistance(target=forwarded, reference="crowdstrike.com", maxDistance=5)
Introduction
In this example, the text:editDistance() function is used to compare domain names against a reference domain to identify potential typosquatting attempts.
Example incoming data might look like this:
@timestamp | forwarded | source_ip | request_type |
---|---|---|---|
2025-10-15T10:00:00Z | crowdstrike.com | 192.168.1.100 | DNS |
2025-10-15T10:01:00Z | crowdstreak.com | 192.168.1.101 | DNS |
2025-10-15T10:02:00Z | crownstrike.com | 192.168.1.102 | DNS |
2025-10-15T10:03:00Z | logscale.com | 192.168.1.103 | DNS |
2025-10-15T10:04:00Z | crwd.com | 192.168.1.104 | DNS |
Step-by-Step
Starting with the source repository events.
text:editDistance(target=forwarded, reference="crowdstrike.com", maxDistance=5)
Calculates the edit distance between the value in the forwarded field and the reference domain crowdstrike.com. The maxDistance parameter is set to 5, meaning that a calculated distance of 5 or less is returned unchanged, while any greater distance is returned as 5 (the defined maxDistance). The function returns the edit distance value in a field named _distance.
Event Result set.
Summary and Results
The query is used to identify how different a domain name is from a known legitimate domain by counting the minimum number of character changes needed.
This query is useful, for example, to detect potential phishing domains that are slight misspellings of legitimate domain names, or to identify typosquatting attempts in DNS queries.
Both the text:editDistance() and text:editDistanceAsArray() functions can be used to calculate Levenshtein edit distances between strings. While they serve similar purposes, they differ in their ability to handle reference values and in their output format. See Compare Domain Names Using Text Edit Distance Array.
Sample output from the incoming example data:
_distance |
---|
0 |
3 |
1 |
5 |
5 |
Note that a distance of 0 indicates an exact match with the reference domain. The higher the number, the more different the domain is from the reference.
For domains with an edit distance greater than the specified maxDistance, the function returns the defined maxDistance as the result.
This data would be effective in a table widget showing the domain names alongside their edit distances. For security monitoring, you could create a line chart showing edit distances over time to identify patterns of similar domain variations. Consider setting up alerts for when domains with small edit distances (1-3) are detected, as these are common in phishing attempts.