Collects fields from multiple events into one event. It has a limit of 1Kb per key when used as part of a groupBy() operation. This limits the number of values you can index during the aggregation.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
fields[a] | array of strings | required | | Names of the fields to keep. |
limit | integer | optional[b] | 2000 | Limit on the number of distinct values to collect. Minimum: 1. |
multival | boolean | optional[b] | true | Collects the resulting values as a multivalue (a single field value joined using separator). |
separator | string | optional[b] | \n | Separator used between multiple values. |
[a] The argument name fields can be omitted.
[b] Optional parameters use their default value unless explicitly set.
Omitted Argument Names
The argument name for fields can be omitted; the following forms of this function are equivalent:
collect(["value"])
and:
collect(fields=["value"])
These examples show basic structure only.
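As a slightly fuller illustration, the optional parameters from the table can be combined in one call. The sketch below collects up to 100 distinct url values per client_ip and joins them with a comma instead of the default newline. The field names client_ip and url are placeholders, and the values chosen for limit and separator are arbitrary examples, not recommendations.

```logscale
// Hypothetical fields: client_ip, url
// limit and separator override the defaults (2000 and \n)
groupBy(client_ip, function=collect([url], limit=100, separator=", "))
```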
The collect() function is limited in the amount of memory it can use while collecting data, before the data is aggregated. The limit depends on whether collect() runs as a top-level function, in which case its limit is 10 MiB:
#type = humio #kind=logs
| collect(myField)
or whether it runs in a subquery or as a sub-aggregator to another function, in which case its limit is 1 MiB:
#type=humio #kind=logs
| groupBy(myField, function=collect(myOtherField))
Warning
Collecting the @timestamp field currently only works when a single timestamp exists. You can work around this restriction by renaming or making another field and collecting that instead, for example:
timestamp := @timestamp
| collect(timestamp)
If you do not need more than a single value and find that the @timestamp field has no value, consider using the selectLast() function or setting limit=1.
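The two alternatives mentioned above might look like the following sketch. The grouping field client_ip is a placeholder, and the two statements are separate queries rather than one pipeline; which one fits depends on whether a single retained timestamp per group is enough for your use case.

```logscale
// Alternative 1 (assumed usage): keep a single @timestamp value per group
groupBy(client_ip, function=selectLast([@timestamp]))

// Alternative 2 (assumed usage): collect at most one value
groupBy(client_ip, function=collect([@timestamp], limit=1))
```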
collect() Examples
Collect the URLs requested by each visitor, where a visitor's session is considered ended after one minute of inactivity:
groupBy(client_ip, function=session(maxpause=1m, collect([url])))
Collect fields from multiple events, counting the collected field:
LocalAddressIP4 = * RemoteAddressIP4 = * aip = *
| groupBy([LocalAddressIP4, RemoteAddressIP4], function=([count(aip, as=aipCount, distinct=true), collect([aip])]))