Using match() in Multi-Cluster Scenarios
When executing a multi-cluster query using the match() function, the query is processed in two parts:
Query up to and including the first aggregate function
This part is executed on the remote clusters. For example, the query:
<filter> | match(file="names.csv", field=id, include=[name])
requires the file names.csv to be present on each remote cluster participating in the search. LogScale expects the files to be identical, in name and content, across all clusters. The method of file distribution across clusters depends on your LogScale version:
LogScale version | File distribution method
1.163 and above | The file from your local cluster is automatically sent to all remote clusters. No manual file synchronization is needed. If a remote cluster contains a file with the same name, it will be ignored, and the version from the local cluster will be used instead.
Below 1.163 | You must manually upload identical files to all participating clusters. LogScale does not automatically synchronize information across the clusters.
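For versions below 1.163, where files are uploaded by hand, it can help to confirm that the copies really are identical in name and content before uploading them. The following is a minimal sketch, assuming you have a local copy of the file intended for each cluster; the function names file_digest and copies_are_identical are hypothetical, and the code uses only the Python standard library, not any LogScale API.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large lookup files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def copies_are_identical(paths: list[Path]) -> bool:
    """True if every path has the same file name and the same content.

    Hypothetical helper: 'paths' are local copies of the lookup file,
    one per cluster, as prepared for manual upload.
    """
    if not paths:
        return True
    names = {p.name for p in paths}
    digests = {file_digest(p) for p in paths}
    return len(names) == 1 and len(digests) == 1
```

A call such as copies_are_identical([Path("cluster_a/names.csv"), Path("cluster_b/names.csv")]) returns False as soon as one copy differs by a single byte, which is the situation the manual upload process needs to avoid.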
Everything after the first aggregate function
This part is executed on the local cluster. For example, the query:
groupBy(id) | match(file="names.csv", field=id, include=[name])
matches the results of groupBy() against the names.csv file stored on the local cluster. This is because groupBy() is an aggregate function, so the match comes after the first aggregate. In this scenario, LogScale only requires the file to be present on the local cluster.
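The two-part rule can be modeled with a short sketch that splits a pipeline at its first aggregate function and reports where match() would run. This is only an illustration of the rule described here, not how LogScale parses queries: the AGGREGATE_FUNCTIONS set is a small illustrative subset, the function names are hypothetical, and the naive split on "|" would mishandle pipes inside strings or regular expressions.

```python
import re

# Assumption: a small, illustrative subset of aggregate functions.
AGGREGATE_FUNCTIONS = {"groupBy", "count", "sum", "avg", "top"}


def split_at_first_aggregate(query: str) -> tuple[list[str], list[str]]:
    """Split a pipeline into (remote_steps, local_steps).

    Steps up to and including the first aggregate go in the first list;
    everything after it goes in the second. Naive: splits on every "|".
    """
    steps = [s.strip() for s in query.split("|")]
    for i, step in enumerate(steps):
        m = re.match(r"(\w+)\s*\(", step)
        if m and m.group(1) in AGGREGATE_FUNCTIONS:
            return steps[: i + 1], steps[i + 1:]
    # No aggregate at all: the whole query belongs to the first part.
    return steps, []


def match_runs_on(query: str) -> str:
    """Return 'remote', 'local', or 'none' for the first match() found."""
    remote, local = split_at_first_aggregate(query)
    if any(s.startswith("match(") for s in remote):
        return "remote"
    if any(s.startswith("match(") for s in local):
        return "local"
    return "none"
```

Applied to the two example queries, the sketch places the match() in the first query in the remote part, because no aggregate precedes it, and the match() in the second query in the local part, because it follows groupBy().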
Note
For LogScale versions below 1.163, you need to enable the UNSAFE_ALLOW_FEDERATED_MATCH environment variable to use match().