SHA-1 Hash Multiple Fields

SHA-1 hash multiple fields using the crypto:sha1() function

Query

logscale
crypto:sha1(field=[a,b,c])

Introduction

In LogScale, strings can be hashed using algorithms such as MD5, SHA-1, and SHA-256 to produce a hash, also called a fingerprint. Of the three, MD5 is the weakest and SHA-256 is the strongest. The crypto:sha1() function creates a SHA-1 hash by taking a string of any length and reducing it to a 160-bit fingerprint, which is returned as 40 hexadecimal characters. Hashing the same string with the SHA-1 algorithm always produces the same 160-bit output.
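As a rough illustration of these properties outside LogScale, the following Python sketch uses the standard hashlib module to show that a SHA-1 digest is always 40 hexadecimal characters long and that the same input always gives the same digest. The helper name sha1_hex is hypothetical, and the code assumes the input is UTF-8 text; it is not a description of how crypto:sha1() is implemented.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of text as 40 lowercase hex characters.

    Assumption: the string is encoded as UTF-8 before hashing.
    """
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    short = sha1_hex("abc")
    long_ = sha1_hex("abc" * 10_000)
    print(short)                     # a9993e364706816aba3e25717850c26c9cd0d89d
    print(len(short), len(long_))    # 40 40
    print(short == sha1_hex("abc"))  # True
```

Input length does not affect the digest length: 160 bits written as hexadecimal always occupy 40 characters.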

In this example, the crypto:sha1() function is used to hash the fields a, b, and c and return the result in a field named _sha1.

Step-by-Step

  1. Starting with the source repository events.

  2. Flowchart: Events --> Augment Data --> Result Set
    logscale
    crypto:sha1(field=[a,b,c])

    Computes a cryptographic SHA-1 hash of the fields a, b, and c. The field argument name can be omitted, so the same query can be written as: crypto:sha1([a,b,c])

  3. Event Result set.

Summary and Results

The query computes a SHA-1 hash from the values of several fields. When called with multiple fields, the crypto:sha1() function creates a single SHA-1 sum from the combined value of the supplied fields. Combining fields in this way and converting them to a SHA-1 hash can be an effective method of creating a unique ID for a given fieldset, which could be used to identify a specific event type. The SHA-1 is reproducible (that is, supplying the same values will produce the same SHA-1 sum), so it can sometimes be an effective method of creating unique identifier or lookup fields for a join() across two different datasets.
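The idea of a reproducible composite key can be sketched in Python. This is an illustration only: the helper fieldset_key, the separator character, and the sample records are all hypothetical. The way the fields are combined here (joined with a delimiter) is an assumption and not a description of how crypto:sha1() combines values, so these digests will not necessarily match LogScale's output.

```python
import hashlib
from typing import Mapping, Sequence

SEPARATOR = "\x1f"  # hypothetical delimiter so ("ab", "c") differs from ("a", "bc")


def fieldset_key(record: Mapping[str, object], fields: Sequence[str]) -> str:
    """Build one SHA-1 hex digest from several fields of a record.

    Assumption: missing fields are treated as empty strings.
    """
    combined = SEPARATOR.join(str(record.get(name, "")) for name in fields)
    return hashlib.sha1(combined.encode("utf-8")).hexdigest()


# Two hypothetical datasets that share the fields a, b, and c.
events = [{"a": "web", "b": "GET", "c": "200", "bytes": 512}]
lookup = [{"a": "web", "b": "GET", "c": "200", "label": "ok-read"}]

# Index the lookup rows by composite key, then match each event against it.
index = {fieldset_key(row, ["a", "b", "c"]): row["label"] for row in lookup}
joined = [
    {**event, "label": index.get(fieldset_key(event, ["a", "b", "c"]))}
    for event in events
]
print(joined[0]["label"])  # ok-read
```

Because the key depends only on the field values, both datasets derive the same identifier independently, which is what makes it usable as a join key.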