Event Tags

LogScale stores data in physical partitions called data sources (see LogScale Internal Architecture). Parsers can be configured to assign events to a particular data source based on specific fields; this is called tagging. Tagging is an advanced topic, and you should only consider using tags if you need to optimize search speeds. If you create too many different tag combinations, performance will suffer. For more information on tags and data sources, see Tag Fields and Datasources.

When defining tags, the following considerations should be taken into account:

  • Tagging affects both the repository performance and the performance of the cluster as a whole.

  • When selecting the tags for a repository, aim to create datasources so that data flows at a rate of at least 100 KB/s for most tag combinations.

    Ingest rates above 2 MB/s are automatically sharded. If the chosen tags keep the ingest rate within the 100 KB/s to 2 MB/s range, it is easier to get good data selection when searching. See Configure Auto-Sharding for High-Volume Data Sources.

  • For the cluster, more datasources drive up memory requirements on all nodes, because increasing the number of tags increases the amount of metadata. There is no hard rule, but the order of magnitude is 100K per datasource.

  • To ensure that information is not tagged by mistake, there is a limit on the number of tag combinations (datasources) in a single repository. During ingest, events that exceed this limit have the fields #tooManyTagValueCombination = true and #error = true added, and any tags from the input are ignored.

    The limit is configurable per repository (see Manage Data Sources Limits). The default can be set cluster-wide using MAX_DATASOURCES.

  • Avoid using tags that have high cardinality (that is, a large number of unique values), as this increases the number of combinations and the memory requirements, and affects segment organization.
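
The rate and cardinality guidance above can be checked with some quick arithmetic before picking tag fields. The following is a minimal sketch, not a LogScale tool: it assumes the worst case where every combination of tag values occurs, assumes ingest is spread evenly across datasources, and uses a hypothetical total ingest rate and hypothetical field cardinalities.

```python
from math import prod

# Thresholds taken from the guidance above (bytes per second).
MIN_RATE = 100_000      # about 100 KB/s per datasource
MAX_RATE = 2_000_000    # about 2 MB/s, above which auto-sharding applies


def estimate_datasources(cardinalities: dict[str, int]) -> int:
    """Worst case: every combination of tag values occurs, one datasource each."""
    return prod(cardinalities.values())


def average_rate(total_bytes_per_sec: float, datasources: int) -> float:
    """Average ingest rate per datasource, assuming an even spread (an assumption)."""
    return total_bytes_per_sec / datasources


def in_target_range(rate: float) -> bool:
    """True when the rate falls inside the 100 KB/s to 2 MB/s window."""
    return MIN_RATE <= rate <= MAX_RATE


# Hypothetical repository ingesting 5 MB/s in total.
total = 5_000_000

# Hypothetical cardinalities: distinct values expected per tag field.
low_cardinality = {"method": 5, "secret": 2}
high_cardinality = {"method": 5, "user_id": 100_000}

for name, tags in [("low", low_cardinality), ("high", high_cardinality)]:
    count = estimate_datasources(tags)
    rate = average_rate(total, count)
    print(f"{name}: {count} datasources, {rate:.0f} B/s each, ok={in_target_range(rate)}")
```

With these made-up numbers, the low-cardinality tag set stays inside the target window, while adding a field such as a user ID pushes the per-datasource rate far below it. Real traffic is rarely spread evenly, so treat the result as a rough bound only.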

For more information on tagging and datasources, see Datasources. For more information on increasing the maximum number of data sources per repository when using event tags, see Update Datasources Limit.

Assigning Tags to Events

Parsers are responsible for assigning tags to events. Proper event tagging depends on how you intend to search in your repository, and therefore most built-in parsers will not tag events at all.

Tags are always assigned based on field values.

  1. To use a field as a tagging field, go to Parsers, open Settings (next to Code), and select Tagging in the side menu to display the Tagging page while editing a parser.

    Figure 47. Tagging Data


  2. On the Tagging page, specify which fields should be used for tagging. Remember to limit yourself, as each unique tag combination will create a separate datasource and require heap memory.

    Make sure you pick fields with a limited value space. For example, for HTTP Access Logs a good tagging field could be the HTTP Method because it has a limited set of possible values (GET, POST, HEAD) and is something which would be part of almost every search query.

  3. Click Save.

Tagging and LogScale's Ingest API

It is also possible to specify tags as part of the data sent to LogScale. For instance, when using LogScale's structured ingest endpoint, all fields are already defined by the client and no parsing is involved during ingest. Here it is possible to specify which tags should be assigned as part of the message.
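
As an illustration, a request body for the structured endpoint can carry the tags next to the events they apply to. The sketch below only builds the JSON and does not send it. The layout (a list of objects with "tags" and "events", where each event has a "timestamp" and "attributes") reflects the structured ingest format as we understand it and should be checked against the Ingest API reference; the helper name, field names, and values are hypothetical.

```python
import json


def build_structured_payload(tags: dict[str, str], events: list[dict]) -> str:
    """Serialize one batch of events that all share the same tags."""
    return json.dumps([{"tags": tags, "events": events}], indent=2)


# Hypothetical access-log event tagged by HTTP method.
payload = build_structured_payload(
    tags={"method": "GET"},
    events=[
        {
            "timestamp": "2024-01-01T12:00:00+00:00",
            "attributes": {"url": "/index.html", "statuscode": "200"},
        }
    ],
)
print(payload)
```

Because the client chooses the tags here, the same cardinality considerations apply as for parser-assigned tags: every distinct combination of tag values sent this way results in a separate datasource.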

Tagging and Filebeat

When shipping data, the recommended way to tag is to specify the tagging fields in the parser settings. When shipping data through Filebeat, you can add tags as part of the Filebeat configuration on the client side. These tags will be added to all events.

The only exception is the #type tag. The #type tag can be used to specify the parser that should be used for ingestion when events arrive at LogScale. In general, however, it is more flexible to assign parsers through Ingest Tokens and avoid specifying the parser on the sender side.

Example

Assume you have an Nginx server which is sending access logs to LogScale. You have also defined the two fields method and secret as tagging fields.

Now assume that some URLs contain sensitive information and you would like to limit access to them to a subset of your LogScale users. In this case we will say that any URL that starts with one of:

  • /transactions/

  • /admin/

should be tagged as secret. Let's write the parser:

logscale
// The full accesslog parser has been left out for brevity.
...
| case {
    // CASE: Match events with a url field starting with /transactions/ or /admin/
    url = "/transactions/*" OR url = "/admin/*"
      | secret := true;
    // CASE: Match all other events
    *
      | secret := false;
  }

We could now create a view for the users who don't have access rights to the data marked as secret=true.

Note

We created a new field as part of the parsing process, which was then used to tag the incoming events. Had we not defined secret as a tagging field, the view would still work perfectly fine. In fact, we would get the exact same results, albeit without the performance enhancement of the tag-based search.