Event Tags
LogScale stores data in physical partitions called data sources. Parsers can be configured to assign events to a particular data source based on specific fields; this is called tagging. Tagging is an advanced topic, and you should only consider using tags if you need to optimize search speeds. If you create too many different tag combinations, performance will suffer. For more information on tags and data sources, see Tag Fields and Datasources.
When defining tags, the following considerations should be taken into account:
Tagging affects both the repository performance and the performance of the cluster as a whole.
When selecting tags for a repository, aim to create datasources that receive data at a rate of at least 100 kB/s for most tag combinations, as this gives optimum performance.
Datasources with ingest rates above 2 MB/s are sharded automatically. If the tags can be chosen so that the ingest rate stays in the 100 kB/s to 2 MB/s range, it is easier to get good data selection when searching. See Configure Auto-Sharding for High-Volume Data Sources.
For the cluster, the consideration is that more datasources drive up memory requirements for all nodes in the cluster, because increasing the number of tags increases the amount of metadata. There is no hard rule, but the order of magnitude is 100K per datasource.
To ensure that information is not tagged by mistake, there are limits on the number of tags in a single repo. During ingest, events that exceed this limit have the fields #tooManyTagValueCombination = true and #error = true added, and any tags from the input are ignored.
The limit is configurable per repo (see Manage Data Sources Limits). The default can be set cluster-wide using MAX_DATASOURCES.
Avoid using tags that have high cardinality (i.e. a large number of unique values), as this increases the number of combinations and the memory requirements, and affects segment organization.
For more information on tagging and datasources, see Datasources. For more information on increasing the maximum number of data sources per repository when using event tags, see Update Datasources Limit.
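The sizing considerations above can be checked with a little arithmetic before committing to a set of tag fields. The following is a minimal Python sketch, not a LogScale tool: the function names, the sample cardinalities, and the choice of decimal units (100 kB = 100,000 bytes) are assumptions made for illustration. It computes the worst-case number of tag combinations (and therefore datasources) as the product of the distinct values per tag field, and classifies a per-datasource ingest rate against the 100 kB/s and 2 MB/s thresholds mentioned above.

```python
from math import prod

# Thresholds taken from the guidance above; decimal units are an assumption.
MIN_RATE_BYTES_PER_S = 100_000      # 100 kB/s: suggested minimum per datasource
SHARD_RATE_BYTES_PER_S = 2_000_000  # 2 MB/s: above this, auto-sharding applies


def max_tag_combinations(cardinalities):
    """Upper bound on datasources: product of distinct values per tag field."""
    return prod(cardinalities.values())


def classify_rate(bytes_per_second):
    """Place a per-datasource ingest rate relative to the suggested range."""
    if bytes_per_second < MIN_RATE_BYTES_PER_S:
        return "below target"
    if bytes_per_second > SHARD_RATE_BYTES_PER_S:
        return "auto-sharded"
    return "in target range"


# Hypothetical tag fields and their number of distinct values.
cardinalities = {"method": 5, "secret": 2}
print("worst-case datasources:", max_tag_combinations(cardinalities))

# Hypothetical measured ingest rates, in bytes per second.
for rate in (50_000, 500_000, 3_000_000):
    print(rate, "->", classify_rate(rate))
```

Adding a high-cardinality field to `cardinalities` (for example, one with thousands of distinct values) multiplies the worst-case count accordingly, which is why such fields make poor tags.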
Example
Assume you have an Nginx server which is sending access logs to LogScale. You have also defined the two fields method and secret as tagging fields.
Now assume that some URLs contain sensitive information and you would like to limit access to them to only a subset of your LogScale users. In this case, we will say that any URL that starts with one of:
/transactions/
/admin/
should be tagged as secret. Let's write the parser:
// The full accesslog parser has been left out for brevity.
...
| case {
    // CASE: Match events with a url field starting with /transactions/ or /admin/
    // (the field name is repeated so that both patterns are tested against url)
    url = "/transactions/*" OR url = "/admin/*"
      | secret := "true";
    // CASE: Match all other events (* is the catch-all clause)
    *
      | secret := "false";
  }
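To sanity-check which URLs the parser rule above would mark as secret, the same prefix logic can be mirrored outside LogScale. This is a small Python sketch for local experimentation only; `SECRET_PREFIXES`, `is_secret`, and the sample URLs are hypothetical names and values, and the sketch does not call any LogScale API.

```python
# Prefixes from the example above that mark a URL as secret.
SECRET_PREFIXES = ("/transactions/", "/admin/")


def is_secret(url: str) -> bool:
    """Return True if the URL starts with one of the secret prefixes."""
    # str.startswith accepts a tuple and matches any of its members.
    return url.startswith(SECRET_PREFIXES)


# Hypothetical sample URLs from an access log.
for url in ("/admin/users", "/transactions/42", "/index.html", "/administrator"):
    print(url, "->", "secret" if is_secret(url) else "not secret")
```

Note that a path such as `/administrator` does not match, because the prefix includes the trailing slash, just as the wildcard patterns in the parser do.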
We could now create a view that excludes events marked secret=true, for the users who don't have access rights to that data.
Note
We created a new field as part of the parsing process, which was then used to tag the incoming events. Had we not defined secret as a tagging field, the view would still work perfectly fine. In fact, we would get exactly the same results, only without the performance benefit of tag-based search.