Vector

Vector is a lightweight and ultra-fast tool for building observability pipelines. It can be used to replace Logstash, Fluent, Telegraf, Beats, or similar tools. It has built-in support for shipping logs to LogScale through the humio_logs sink.

Vector can be installed on Linux, Windows, and macOS. The Vector documentation describes several installation methods.

Note

Vector can report its own internal metrics through the internal_metrics source. However, at this time, some of these metrics can cause parsing performance issues and high system load when they are sent to LogScale. For this reason, we recommend that you do not send Vector's internal_metrics to LogScale and instead route them to your other monitoring systems. Vector can send these metrics to Prometheus, statsd, and more.
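
As a minimal sketch of that recommendation, the fragment below routes internal_metrics to a Prometheus endpoint instead of LogScale. It assumes a recent Vector release, where the Prometheus sink is named prometheus_exporter; the component names and the listen address are placeholders.

ini
# Sketch only: component names and the address are illustrative.
[sources.my_internal_metrics]
type = "internal_metrics"

# Expose the metrics for Prometheus to scrape. Do not list this
# source in the inputs of a humio_logs sink.
[sinks.my_prometheus]
type = "prometheus_exporter" # assumption: older releases call this sink "prometheus"
inputs = ["my_internal_metrics"]
address = "0.0.0.0:9598"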

Configuration

To send data to LogScale with Vector, use the humio_logs sink. You only need the URL of the LogScale cluster and an ingest token.

In the example below, Vector reads from standard input (stdin) and sends each line to a humio_logs sink that points at the LogScale cluster. Messages entered at the command line after starting Vector are sent to LogScale.

First, create a Vector configuration file, vector.toml. Use a text editor to add the following lines:

ini
data_dir = "/var/lib/vector"

[sources.my_stdin_source]
type = "stdin"

[sinks.my_humio_cluster]
inputs = ["my_stdin_source"]
type = "humio_logs"
encoding.codec = "json"
host = "${HUMIO_URL}"
token = "${HUMIO_INGEST_TOKEN}"

By default, Vector sends events to LogScale as JSON. Vector version 0.9.1 added the option to send logs to LogScale as raw text by setting encoding.codec to text.
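
For illustration, the sink from the example above could be switched to raw text as sketched below. Only the encoding.codec value changes, and this assumes Vector 0.9.1 or later.

ini
[sinks.my_humio_cluster]
inputs = ["my_stdin_source"]
type = "humio_logs"
# Send each event as a raw text line rather than a JSON document.
encoding.codec = "text"
host = "${HUMIO_URL}"
token = "${HUMIO_INGEST_TOKEN}"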

Now, run Vector with the environment variables HUMIO_URL and HUMIO_INGEST_TOKEN set appropriately and enter test messages:

shell
HUMIO_URL=http://localhost:8080 HUMIO_INGEST_TOKEN=KL95YdaSYEWJ1tV9CPEqWGdMi4FVXghD0xxGrDAU3Wg5 vector --config vector.toml
Mar 04 13:40:19.770  INFO vector: Log level "info" is enabled.
Mar 04 13:40:19.770  INFO vector: Loading configs. path=["vector.toml"]
Mar 04 13:40:19.773  INFO vector: Vector is starting. version="0.8.1" git_version="v0.8.1" released="Wed, 04 Mar 2020 15:11:57 +0000" arch="x86_64"
Mar 04 13:40:19.773  INFO vector::topology: Running healthchecks.
Mar 04 13:40:19.773  INFO vector::topology: Starting source "my_stdin_source"
Mar 04 13:40:19.773  INFO vector::topology: Starting sink "my_humio_cluster"
Mar 04 13:40:19.774  INFO source{name=my_stdin_source type=stdin}: vector::sources::stdin: Capturing STDIN
Mar 04 13:40:19.781  INFO vector::topology::builder: Healthcheck: Passed.
Example Message 1
Example Message 2

If everything started properly, search your LogScale repository for the test messages. The messages in LogScale will have the following structure. Note that Vector adds timestamp and host to the messages.

json
{"@timestamp":1583349673000,"#type":"none","host":"MacBook-Pro.local","#repo":"vector-example","@timezone":"Z","message":"Example Message 2","@rawstring":"{\"host\":\"Daniels-MacBook-Pro.local\",\"message\":\"Example Message 2\"}","@id":"mENFVMQVJyQ2M5pV4D1sFMB9_1_1_1583349673"}
{"@timestamp":1583349669000,"#type":"none","host":"MacBook-Pro.local","#repo":"vector-example","@timezone":"Z","message":"Example Message 1","@rawstring":"{\"host\":\"Daniels-MacBook-Pro.local\",\"message\":\"Example Message 1\"}","@id":"mENFVMQVJyQ2M5pV4D1sFMB9_1_0_1583349669"}

As a next step you should configure Vector to watch some file sources or use one of Vector's many source types to gather data from other parts of your system.
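
A minimal sketch of such a setup, assuming a hypothetical application that writes its logs to /var/log/myapp/: a file source takes the place of stdin and feeds the same kind of sink. The source name and the path are placeholders.

ini
[sources.my_file_source]
type = "file"
# Hypothetical path; point this at your own log files.
include = ["/var/log/myapp/*.log"]

[sinks.my_humio_cluster]
inputs = ["my_file_source"]
type = "humio_logs"
encoding.codec = "json"
host = "${HUMIO_URL}"
token = "${HUMIO_INGEST_TOKEN}"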

Adding Fields

Vector makes it possible to add fields with static values using its transforms capability. In the example below a field called name will be added to the event sent to LogScale with the value set to Name:

ini
[transforms.sourcename_transform]
type = "add_fields"
inputs = ["sourcename"]
fields.name = "Name"

You'll need to update the inputs setting of your sink to point to the transform that you created so that the new field is added to the event, as illustrated below.

ini
[sinks.humio_out]
type = "humio_logs"
inputs = ["sourcename_transform"]
encoding.codec = "json"
token = "${HUMIO_INGEST_TOKEN}"
host = "${HUMIO_URL}"

See Vector's documentation on Adding Fields for more information.
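
Newer Vector releases favor the remap transform, which is written in the Vector Remap Language (VRL), for this kind of change. The sketch below is an assumed equivalent of the add_fields example rather than something taken from the LogScale documentation, so check which transforms your Vector version supports.

ini
[transforms.sourcename_transform]
type = "remap"
inputs = ["sourcename"]
# VRL program: set the static field "name" on every event.
source = '''
.name = "Name"
'''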

Setting the LogScale Parser

The LogScale Logs plug-in for Vector (the humio_logs sink) allows you to specify the parser used to parse the data.

ini
[sinks.my_humio_sink]
type = "humio_logs"
event_type = "my_custom_parser"

Here, the value of the event_type field is the name of the parser you want to use. LogScale automatically uses this value to select the parser when the data is ingested. For more information, see Vector's LogScale logs plug-in documentation.
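
Combined with the connection settings, a complete sink might look like the sketch below. The input name my_stdin_source and the parser name my_custom_parser are placeholders, and the parser is assumed to exist already in the target LogScale repository.

ini
[sinks.my_humio_sink]
type = "humio_logs"
inputs = ["my_stdin_source"]
encoding.codec = "json"
host = "${HUMIO_URL}"
token = "${HUMIO_INGEST_TOKEN}"
# Name of the LogScale parser to apply to these events.
event_type = "my_custom_parser"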

Multi-Line Events

By default, Vector creates one event for each line in a file. However, you can also delimit events in other ways, which matters when a single event spans several lines, as stack traces do in many programming languages.

You can specify multiline settings in the Vector configuration. See Vector's multiline configuration documentation.

Often a log event starts with a timestamp, and we want to read all lines until we see a new line starting with a timestamp. In Vector that can be done like this:

ini
[sources.source_name.multiline]
# Example: [4/28/20 14:59:25:783 EDT]
start_pattern = "^\\[[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
mode = "halt_before"
condition_pattern = "^\\[[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
timeout_ms = 1000

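The start_pattern and condition_pattern should both match your timestamp format. With mode set to halt_before, Vector keeps appending lines to the current event until it reaches a line that matches condition_pattern, which begins the next event.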
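
As a further sketch, here is the same approach for logs whose events begin with an ISO-style date such as 2020-04-28 14:59:25, shown together with its file source. The source name and path are hypothetical, and the patterns use TOML literal strings (single quotes) so that backslashes need no doubling.

ini
[sources.my_app_logs]
type = "file"
include = ["/var/log/myapp/app.log"] # hypothetical path

[sources.my_app_logs.multiline]
# Example: 2020-04-28 14:59:25 ERROR ...
start_pattern = '^\d{4}-\d{2}-\d{2} '
mode = "halt_before"
condition_pattern = '^\d{4}-\d{2}-\d{2} '
timeout_ms = 1000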

Wildcard or "glob" log paths

Vector supports using a wildcard or "glob" to match log file pathnames, which helps when aggregating logs from several hosts.

ini
# Ingest logfiles in a /var/log/%HOSTNAME%/%LOG%.log hierarchy.
[sources.testhostlogs]
type = "file" # required
include = ["/var/log/*.example.com/*.log"]

When using wildcards, keep the following in mind:

  • Wildcards are re-scanned every 1000ms by default. This is controlled by the Vector glob_minimum_cooldown setting.

  • If you have a directory as part of the glob path, as shown above, be sure that the vector user has both "read" and "execute" permissions on the directories used in the path.

Given the example above, without "read" permission on a directory that matches /var/log/*.example.com/, Vector will not be able to examine the directory contents to find matches for the *.log part of the path. Vector will ignore any directories that it cannot read, so check your file permissions if you are not seeing the expected log entries in LogScale.
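
The rescan interval can be tuned in the source itself, as sketched below. The option name is the one given in the list above; newer Vector releases may spell it glob_minimum_cooldown_ms, so check the file source reference for your version. The exclude entry is an illustrative addition.

ini
[sources.testhostlogs]
type = "file"
include = ["/var/log/*.example.com/*.log"]
# Illustrative: skip a noisy file that the glob would otherwise match.
exclude = ["/var/log/*.example.com/debug.log"]
# Rescan the glob every 5 seconds instead of the default 1000 ms.
glob_minimum_cooldown = 5000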

To debug permission problems and other issues, it's helpful to examine Vector's logs with:

shell
journalctl -fu vector

For other suggestions, see the Vector Troubleshooting Guide.