Vector
Vector is a lightweight and ultra-fast tool for building observability pipelines. It can be used to replace Logstash, Fluent, Telegraf, Beats, or similar tools. It has built-in support for shipping logs to LogScale through the humio_logs sink.
Vector can be installed on Linux, Windows, and macOS. The Vector documentation describes several installation methods.
Note
Vector can expose its own internal metrics through the internal_metrics source. However, at this time, some of these metrics can cause parsing performance issues and high system load when they are sent to LogScale. For this reason, we recommend that you do not send Vector's internal_metrics to LogScale, and instead send them to your other monitoring systems. Vector can send these metrics to Prometheus, statsd, and more.
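As a rough sketch of the recommendation above, the internal metrics can be routed to a Prometheus-compatible sink rather than to LogScale. The component names and the listen address are assumptions for illustration, and the sink type name depends on your Vector version, so check the Vector documentation for your release before using it.

```toml
# Sketch only: expose Vector's internal metrics for Prometheus to scrape,
# and do not list this source in the inputs of any humio_logs sink.
[sources.vector_metrics]            # hypothetical component name
type = "internal_metrics"

[sinks.metrics_to_prometheus]       # hypothetical component name
type = "prometheus_exporter"        # older Vector releases call this sink "prometheus"
inputs = ["vector_metrics"]
address = "0.0.0.0:9598"            # assumed listen address
```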
Configuration
Sending data to LogScale with Vector is straightforward using the humio_logs sink. You only need the URL of the LogScale cluster and an ingest token.
In the example below, Vector is configured to read from standard input (stdin) and send each line to a humio_logs sink that points to the LogScale cluster. Messages entered at the command line after starting Vector are sent to LogScale.
First, create a Vector configuration file, vector.toml. Do this with a simple text editor and add the following lines:
data_dir = "/var/lib/vector"
[sources.my_stdin_source]
type = "stdin"
[sinks.my_humio_cluster]
inputs = ["my_stdin_source"]
type = "humio_logs"
encoding.codec = "json"
host = "${HUMIO_URL}"
token = "${HUMIO_INGEST_TOKEN}"
By default, Vector sends events to LogScale as JSON. Vector version 0.9.1 added the option to send logs to LogScale in the raw text format by setting encoding.codec to a value of text.
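For illustration, the sink from the example above could be switched to the raw text format as follows. This is a minimal sketch that reuses the same component names and environment variables, and it assumes Vector 0.9.1 or later.

```toml
[sinks.my_humio_cluster]
inputs = ["my_stdin_source"]
type = "humio_logs"
encoding.codec = "text"           # send the raw message text instead of JSON
host = "${HUMIO_URL}"
token = "${HUMIO_INGEST_TOKEN}"
```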
Now, run Vector with the environment variables HUMIO_URL and HUMIO_INGEST_TOKEN set appropriately and enter test messages:
HUMIO_URL=http://localhost:8080 HUMIO_INGEST_TOKEN=KL95YdaSYEWJ1tV9CPEqWGdMi4FVXghD0xxGrDAU3Wg5 vector --config vector.toml
Mar 04 13:40:19.770 INFO vector: Log level "info" is enabled.
Mar 04 13:40:19.770 INFO vector: Loading configs. path=["vector.toml"]
Mar 04 13:40:19.773 INFO vector: Vector is starting. version="0.8.1" git_version="v0.8.1" released="Wed, 04 Mar 2020 15:11:57 +0000" arch="x86_64"
Mar 04 13:40:19.773 INFO vector::topology: Running healthchecks.
Mar 04 13:40:19.773 INFO vector::topology: Starting source "my_stdin_source"
Mar 04 13:40:19.773 INFO vector::topology: Starting sink "my_humio_cluster"
Mar 04 13:40:19.774 INFO source{name=my_stdin_source type=stdin}: vector::sources::stdin: Capturing STDIN
Mar 04 13:40:19.781 INFO vector::topology::builder: Healthcheck: Passed.
Example Message 1
Example Message 2
If everything started properly, search your LogScale repository for the test messages. The messages in LogScale will have the following structure. Note that Vector adds timestamp and host to the messages.
{"@timestamp":1583349673000,"#type":"none","host":"MacBook-Pro.local","#repo":"vector-example","@timezone":"Z","message":"Example Message 2","@rawstring":"{\"host\":\"Daniels-MacBook-Pro.local\",\"message\":\"Example Message 2\"}","@id":"mENFVMQVJyQ2M5pV4D1sFMB9_1_1_1583349673"}
{"@timestamp":1583349669000,"#type":"none","host":"MacBook-Pro.local","#repo":"vector-example","@timezone":"Z","message":"Example Message 1","@rawstring":"{\"host\":\"Daniels-MacBook-Pro.local\",\"message\":\"Example Message 1\"}","@id":"mENFVMQVJyQ2M5pV4D1sFMB9_1_0_1583349669"}
As a next step, configure Vector to watch file sources, or use one of Vector's many other source types to gather data from other parts of your system.
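A file source might look roughly like the following sketch. The source name and the log path are hypothetical; the sink settings are copied from the stdin example above.

```toml
[sources.my_file_source]                # hypothetical source name
type = "file"
include = ["/var/log/myapp/*.log"]      # hypothetical path; adjust to your logs

[sinks.my_humio_cluster]
inputs = ["my_file_source"]             # read from the file source instead of stdin
type = "humio_logs"
encoding.codec = "json"
host = "${HUMIO_URL}"
token = "${HUMIO_INGEST_TOKEN}"
```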
Adding Fields
Vector makes it possible to add fields with static values using its transforms capability. In the example below, a field called name will be added to the event sent to LogScale with the value set to Name:
[transforms.sourcename_transform]
type = "add_fields"
inputs = ["sourcename"]
fields.name = "Name"
You'll need to update the inputs section of your sink to point to the transform that you created in order for the new field to be added to the event (as illustrated below).
[sinks.humio_out]
type = "humio_logs"
inputs = ["sourcename_transform"]
encoding.codec = "json"
token = "$api-token"
host = "$humio-url"
See Vector's documentation on Adding Fields for more information.
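Recent Vector releases generally favor the remap transform for this kind of change. If add_fields is not available in your version, a roughly equivalent sketch is shown below. It assumes a source named sourcename, as in the example above, and has not been checked against any particular Vector release.

```toml
[transforms.sourcename_transform]
type = "remap"
inputs = ["sourcename"]
# VRL program: set the static field "name" on every event
source = '.name = "Name"'
```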
Setting the LogScale Parser
The LogScale Logs plug-in for Vector allows you to specify the parser used to parse the data.
[sinks.my_humio_sink]
type = "humio_logs"
event_type = "my_custom_parser"
Here, the value of the event_type field is the name of the parser you want to use. LogScale then uses the value of this field to select the parser when the data is ingested. For more information, see Vector's LogScale logs plug-in.
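Put together with the connection settings from the earlier examples, a complete sink definition might look like this sketch. The input name and environment variables are carried over from the stdin example, and my_custom_parser is a placeholder for a parser that exists in your repository.

```toml
[sinks.my_humio_sink]
type = "humio_logs"
inputs = ["my_stdin_source"]            # source from the earlier stdin example
event_type = "my_custom_parser"         # placeholder parser name
encoding.codec = "json"
host = "${HUMIO_URL}"
token = "${HUMIO_INGEST_TOKEN}"
```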
Multi-Line Events
By default, Vector creates one event for each line in a file. However, you can also split events in different ways. For example, stack traces in many programming languages span multiple lines.
You can specify multiline settings in the Vector configuration. See Vector's multiline configuration documentation.
Often a log event starts with a timestamp, and we want to read all lines until we see a new line starting with a timestamp. In Vector that can be done like this:
[sources.source_name.multiline]
# Example: [4/28/20 14:59:25:783 EDT]
start_pattern = "^\\[[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
mode = "halt_before"
condition_pattern = "^\\[[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
timeout_ms = 1000
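The start_pattern should match your timestamp format. In halt_before mode, condition_pattern identifies the line that begins the next event, which is why the example uses the same pattern for both.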
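The multiline table belongs to a source definition. As a sketch, a complete file source using these settings could look like the following, where the log path is a hypothetical example.

```toml
[sources.source_name]
type = "file"
include = ["/var/log/myapp/app.log"]    # hypothetical path

[sources.source_name.multiline]
# Example: [4/28/20 14:59:25:783 EDT]
start_pattern = "^\\[[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
mode = "halt_before"                    # stop before the next line matching condition_pattern
condition_pattern = "^\\[[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
timeout_ms = 1000                       # flush an incomplete event after 1 second
```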
Wildcard or "glob" log paths
Vector supports using a wildcard or "glob" to match log file pathnames, which helps when aggregating logs from several hosts.
# Ingest logfiles in a /var/log/%HOSTNAME%/%LOG%.log hierarchy.
[sources.testhostlogs]
type = "file" # required
include = ["/var/log/*.example.com/*.log"]
When using wildcards, keep the following in mind:
Wildcards are re-scanned every 1000ms by default. This is controlled by the Vector glob_minimum_cooldown setting.
If you have a directory as part of the glob path, as shown above, be sure that the vector user has both "read" and "execute" permissions on the directories used in the path.
Given the example above, without "read" permission on a directory that matches /var/log/*.example.com/, Vector will not be able to examine the directory contents to find matches for the *.log part of the path. Vector will ignore any directories that it cannot read, so check your file permissions if you are not seeing the expected log entries in LogScale.
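The rescan interval mentioned above can be changed on the file source. The sketch below uses the setting name given in this document with an arbitrary value of 5000 ms; some Vector releases may spell the option differently (for example, with an _ms suffix), so confirm the name in the documentation for your version.

```toml
[sources.testhostlogs]
type = "file"
include = ["/var/log/*.example.com/*.log"]
glob_minimum_cooldown = 5000    # assumed value: rescan wildcards every 5 seconds
```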
To debug this and other issues, it's helpful to examine Vector's logs with:
journalctl -fu vector
For other suggestions, see the Vector Troubleshooting Guide.