Query Function Limitations

Some query functions are not supported on multi-cluster views, specifically those with a complex flow of data and those that depend on state that may be inconsistent between clusters.

Multi-Cluster match() Support

When using the match() function in a multi-cluster scenario, make sure that the same file has been uploaded to each cluster in the multi-cluster view. LogScale does not automatically synchronise information across the clusters.

Although querying is not limited or prevented when the versions of the file do not match, a query may not return the expected results if the contents of the file are not identical on every cluster.
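One way to catch such differences before querying is to compare checksums of the copies of the file intended for each cluster. The following is a minimal sketch in Python, not a LogScale feature: it assumes you keep a local copy of the file for each cluster, and the cluster names, paths, and helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def group_by_digest(copies):
    """Group cluster names by the digest of their copy of the file.

    `copies` maps a (hypothetical) cluster name to the local path of the
    copy uploaded to that cluster. More than one key in the result means
    that at least two clusters hold different content.
    """
    by_digest = {}
    for cluster, path in copies.items():
        by_digest.setdefault(file_digest(path), []).append(cluster)
    return {digest: sorted(names) for digest, names in by_digest.items()}


def all_identical(copies):
    """True when every cluster's copy has the same content."""
    return len(group_by_digest(copies)) <= 1
```

Calling all_identical() with a mapping such as {"cluster-a": "a/names.csv", "cluster-b": "b/names.csv"} returns False as soon as one copy differs by a single byte, which is a cheap signal to re-upload the file before relying on match() results.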

If individual clusters in a multi-cluster view have different versions of the same file, queries will behave in a well-defined but possibly unintuitive or unexpected way.

To understand this, you need to understand a bit about how multi-cluster works. When executing a multi-cluster query, we split the query into two parts:

The query up to and including the first aggregate function. This part is executed on the remote clusters. If a match() appears in this part, each remote cluster will use its own version of the file.

Everything after the first aggregate function. This part is executed on the federating cluster. If a match() appears in this part, it will use the version of the file from the federating cluster.

This means that this multi-cluster query:

logscale
match(file="names.csv", field=id, include=[name])

will use the version of names.csv on each of the remote clusters, whereas this query:

logscale
groupBy(id) | match(file="names.csv", field=id, include=[name])

will use the names.csv present on the federating cluster. This is because groupBy() is an aggregate function, so the match() comes after the first aggregate.
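The splitting rule can be modelled in a few lines of Python to predict which copy of a file a given match() will read. This is only an illustrative sketch, not how LogScale parses queries: the AGGREGATES set is a small, incomplete sample of aggregate function names, the helper names are hypothetical, and the naive split assumes that "|" never appears inside a function argument.

```python
# Illustrative, incomplete sample of aggregate function names (assumption).
AGGREGATES = {"groupBy", "count", "sum", "avg", "top"}


def function_name(stage):
    """Return the function name of a stage such as 'groupBy(id)'."""
    return stage.split("(", 1)[0].strip()


def split_at_first_aggregate(query):
    """Split a '|'-separated query into (remote_part, federating_part).

    The remote part runs up to and including the first aggregate; everything
    after it runs on the federating cluster. A query without an aggregate is
    treated as running entirely on the remote clusters.
    """
    stages = [s.strip() for s in query.split("|")]
    for i, stage in enumerate(stages):
        if function_name(stage) in AGGREGATES:
            return stages[: i + 1], stages[i + 1:]
    return stages, []


def match_locations(query):
    """List where each match() in the query reads its file from."""
    remote, federating = split_at_first_aggregate(query)
    locations = []
    for where, part in (("remote", remote), ("federating", federating)):
        for stage in part:
            if function_name(stage) == "match":
                locations.append(where)
    return locations
```

Applied to the two queries above, match_locations() reports "remote" for the first and "federating" for the second, mirroring the behaviour described in this section.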