Using match() in Multi-Cluster Scenarios

When executing a multi-cluster query using the match() function, the query is processed in two parts:

  1. Query up to and including the first aggregate function

    This part is executed on the remote clusters. For example, the query:

    logscale
    <filter>
    | match(file="names.csv", field=id, include=[name])

    requires the file names.csv to be present on each remote cluster participating in the search. LogScale expects the files to be identical, in name and content, across all clusters.

    The method of file distribution across clusters depends on your LogScale version:

    LogScale version 1.163 and above:

    • The file from your local cluster is automatically sent to all remote clusters.

    • No manual file synchronization is needed.

    • If a remote cluster already contains a file with the same name, that remote copy is ignored and the version from the local cluster is used instead.

    LogScale versions below 1.163:

    • You must manually upload identical files to all participating clusters.

    • LogScale does not automatically synchronize files across clusters.

  2. Everything after the first aggregate function

    This part is executed on the local cluster. For example, the query:

    logscale
    groupBy(id) 
    | match(file="names.csv", field=id, include=[name])

    matches the results of groupBy() against the names.csv file stored on the local cluster. Because groupBy() is an aggregate function, match() comes after the first aggregate, so LogScale only requires the file to be present on the local cluster.
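
As an illustrative sketch, both parts can appear in a single query. The file owners.csv and its owner column are hypothetical names used only for this example, and <filter> stands for any filter expression, as in the first example.

logscale
// Part 1: runs on the remote clusters, up to and including the first aggregate
<filter>
| match(file="names.csv", field=id, include=[name])
| groupBy([id, name])
// Part 2: runs on the local cluster, after the first aggregate
| match(file="owners.csv", field=id, include=[owner])

Under the rules described above, names.csv has to be available to the remote clusters (sent automatically in version 1.163 and above, uploaded manually in earlier versions), whereas owners.csv only needs to exist on the local cluster.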

Note

For LogScale versions below 1.163, you need to enable the UNSAFE_ALLOW_FEDERATED_MATCH environment variable to use match() in multi-cluster queries.