Filebeat

Filebeat is a lightweight, open source program that can monitor log files and send data to servers. It has some properties that make it a great tool for sending file data to LogScale.

It uses limited resources, which is important because the Filebeat agent must run on every server where you want to capture data. It's also easy to install and run since Filebeat is written in the Go programming language, and is built into one binary. Finally, it handles network problems gracefully. When Filebeat reads a file, it keeps track of the last point it read. If there is no network connection, then Filebeat waits to retry data transmission. It continues data transmission when the connection is restored.

Check out Filebeat's official documentation for more information. You might also read the Getting Started Guide.

Compatibility of Beats/Logstash versions with LogScale versions:

  • Logstash 7.16 and up: incompatible with LogScale 1.36 and below; compatible with LogScale 1.37.

  • Filebeat 7 and below: compatible with LogScale 1.36 and below, and with LogScale 1.37.

  • Filebeat 8.0.0: compatible with both, but requires setup.ilm.enabled: false.

  • Filebeat 8.1.0: compatible with both, but requires setup.ilm.enabled: false and output.elasticsearch.allow_older_versions: true.

Warning

Beats 7.16 and later have compatibility issues with older versions of LogScale.

See Troubleshooting: Beats and Logstash Log Shippers 7.13 and higher No Longer Work with LogScale for more information on compatibility issues.

Installation

We recommend using the latest available version of Beats, which you can download from the Filebeat OSS downloads page.

You can find installation documentation for Filebeat at the Filebeat Installation page. Remember to replace the download URL for Filebeat with the URL for the open source version of Filebeat.

Warning

The Elastic non-OSS version of Filebeat does not work with LogScale. Ensure you download the OSS version.

Configuration

LogScale supports parts of the Elasticsearch bulk ingest API. This API is served both as a sub-path of the standard LogScale API and on its own port (defaulting to 9200). Data can be sent to LogScale by configuring Filebeat to use its built-in Elasticsearch output.
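
As a minimal sketch, the Elasticsearch output can point at either form of the endpoint ($YOUR_LOGSCALE_URL is a placeholder for your LogScale URL, and 443 is only an example port):

yaml
output:
  elasticsearch:
    # Preferred: the Elasticsearch bulk endpoint under the standard LogScale API
    hosts: ["https://$YOUR_LOGSCALE_URL:443/api/v1/ingest/elastic-bulk"]
    # Alternative: the dedicated Elasticsearch-compatible port (default 9200)
    # hosts: ["https://$YOUR_LOGSCALE_URL:9200"]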

You can find configuration documentation for Filebeat at the Filebeat configuration page.

Editing the Configuration

You must make the following changes to the configuration (see Configuration).

  1. Insert a paths section for each log file you want to monitor. It is possible to insert an input configuration (with paths and fields) for each file that Filebeat should monitor.

  2. Add other fields in the fields section. These fields, and their values, will be added to each event.

  3. Insert the URL and port of your LogScale installation in the Elasticsearch output to match your configuration. For example, https://$YOUR_LOGSCALE_URL:443, where $YOUR_LOGSCALE_URL is the URL for your LogScale installation.

  4. Insert an ingest token from the repository as the password (see Ingest Tokens).

  5. Set the username to a value as required; it will be logged in the access log of any proxy on the path, so using the hostname of the sender is a good option.

  6. Specify the text encoding to use when reading files using the encoding field. If the log files use special, non-ASCII characters, then set the encoding here. For example, utf-8 or latin1.

  7. If all your events are fairly small, you can increase bulk_max_size from the default of 200 to 300; the default of 200 is fine for most use cases. The LogScale server does not limit the size of the ingest request, but keep bulk_max_size low, as requests may time out if they get too large. In case of timeouts, Filebeat backs off, resulting in worse performance than with a lower bulk_max_size.

  8. You may want to increase the number of worker instances (worker) from the default of 1 to, say, 5 or 10 to achieve more throughput if Filebeat is not able to keep up with the inputs. To get higher throughput, also increase queue.mem.events, for example to 32000, to allow buffering for more workers.

  9. Restart Filebeat by running:

shell
sudo systemctl restart filebeat

An important next step is choosing a parser for your Filebeat events; see Parsing Data.

Configuration Example

The following example is a simple Filebeat 8.1.0 configuration that sends data to LogScale, assuming the standard LogScale API is hosted on port 8080 and the Elasticsearch-compatible API is available on port 9200:

yaml
filebeat.inputs:
  - paths:
      - /var/log/apache/*.log
    encoding: utf-8
    fields:
      aField: value

queue.mem:
  events: 8000
  flush.min_events: 1000
  flush.timeout: 1s

output:
  elasticsearch:
    hosts: ["https://cloud.humio.com:8080/api/v1/ingest/elastic-bulk"]
    username: anything
    password: 750y0940-ec68-4889-9e3a-e7e78d5536er
    compression_level: 5
    bulk_max_size: 200
    worker: 5

Important

For Filebeat 8.0 and higher you must add the following configuration to ensure compatibility with LogScale:

yaml
setup.ilm.enabled: false

Important

For Filebeat 8.1 and higher, the following line must be added to the configuration:

yaml
output.elasticsearch.allow_older_versions: true

The Filebeat configuration file is located at /etc/filebeat/filebeat.yml on Linux.

Configuration Objects

This section only documents the set of keys and values required to ship data to LogScale; it does not list all of the configuration options available in Filebeat.

paths

This block configures the sources of data (the files to monitor) that will be sent to LogScale.

  • encoding

    Specify the text encoding to use when reading files using the encoding field. If the log files use special, non-ASCII characters, then set the encoding here. For example, utf-8 or latin1.

  • fields

    Specify a field and value to add to each event.

queue.mem
  • events

    The number of events to store in the buffer.

  • flush.min_events

    The minimum number of events to send to LogScale when flushing the pipeline.

  • flush.timeout

    The maximum amount of time to wait before flushing, even if flush.min_events has not been reached.

output

This object specifies all the configuration related to the output of the log shipper and where the data is sent.

  • elasticsearch

    This object contains all the configurations related to where the data is being shipped.

  • hosts

    The URL and port of your LogScale installation: either the standard LogScale API (preferred), $YOUR_LOGSCALE_URL:8080/api/v1/ingest/elastic-bulk, or the dedicated Elasticsearch port, $YOUR_LOGSCALE_URL:9200.

  • username

    This value is not used by LogScale but will be logged by the proxy.

  • password

    Specify the ingest token of your LogScale repository.

  • compression_level

    The level of compression to apply to the events.

  • bulk_max_size

    The default value of 200 is fine for most LogScale use cases, but increasing it to 300 can be worthwhile if you are only handling small events, as timeouts may occur if the requests become too large.

  • worker

    Specifies the number of worker instances, which can increase throughput if Filebeat cannot keep up with the inputs. If you increase this value, you should also increase queue.mem.events to allow buffering for more workers, as shown in the sketch below.
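
The following is a throughput-tuning sketch based on the suggestions above; the exact values are illustrative and depend on your workload:

yaml
queue.mem:
  events: 32000          # allow buffering for more workers
  flush.min_events: 1000
  flush.timeout: 1s

output:
  elasticsearch:
    worker: 10           # more parallel bulk requests
    bulk_max_size: 300   # only worthwhile if your events are fairly small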

Enabling Debug Logging

Add the following to your filebeat.yml configuration file to enable debug logging:

yaml
logging:
  level: debug
  to_files: true
  to_syslog: false
  files:
    path: /var/log/filebeat
    name: filebeat.log
    keepfiles: 3

Warning

If you're using Filebeat with systemd, more recent versions execute Filebeat with the -e flag by default. This will cause Filebeat to ignore many of these logging options. Notably, it will log to /var/log/messages regardless of what you've specified here. To fix this, you should remove Environment="BEAT_LOG_OPTS=-e" from Filebeat's systemd unit file. See this GitHub issue for more details.

Running Filebeat

Run Filebeat as a service on Linux with the following commands.

shell
sudo systemctl enable filebeat
sudo systemctl restart filebeat

On Linux, the Filebeat binary is often located at /usr/share/filebeat/bin/filebeat. To test the configuration, it can be run like this:

shell
/usr/share/filebeat/bin/filebeat -c /etc/filebeat/filebeat.yml

Parsing Data

LogScale uses parsers to parse the data from Filebeat into events. Parsers can extract fields from the input data thereby adding structure to the log events. For more information on parsers, see Parsing Data.

Take a look at LogScale's Built-in Parsers.

The recommended way of choosing a parser is to assign a specific parser to the ingest token used to authenticate the client; see Assigning Parsers to Ingest Tokens. This allows you to change parsers in LogScale without changing the client. Alternatively, you can specify the parser/type for each monitored file using the type field in the fields section of the Filebeat configuration.

yaml
filebeat.inputs:
  - paths:
      - $PATH_TO_LOG_FILE
    encoding: utf-8
    fields:
      "type": $TYPE

If no parser is specified, LogScale's built-in key-value parser (kv) will be used. The key-value parser expects the incoming string to start with a timestamp formatted in ISO 8601, and it also looks for key-value pairs of the form a=b, for example a line such as 2023-05-14T10:23:01Z method=GET status=200.

Parsing JSON Data

We do not recommend that you use the JSON parsing built into Filebeat. Instead, LogScale has its own JSON support. Filebeat processes logs line by line, so JSON parsing will only work if there is one JSON object per line. By using LogScale's json parser you can have JSON fields extracted during ingest. You can also see Example: Parsing JSON to get more control over the fields that are created.
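
For example, assuming one JSON object per line and no parser already assigned to the ingest token, the built-in json parser can be selected via the type field (the path is a placeholder):

yaml
filebeat.inputs:
  - paths:
      - /var/log/myapp/service.log   # hypothetical file with one JSON object per line
    encoding: utf-8
    fields:
      "type": json                   # use LogScale's built-in json parser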

Adding Fields

It's possible to add fields with static values using the fields section. These fields will be added to each event.

Filebeat automatically sends the host (beat.hostname) and filename (source) along with the data. LogScale adds these fields to each event. The fields are added as @host and @source in order to not collide with other fields in the event.

To avoid having the @host and @source fields, specify @host and @source in the fields section with an empty value.
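
For example (a minimal sketch; the path is a placeholder):

yaml
filebeat.inputs:
  - paths:
      - /var/log/myapp/*.log   # hypothetical path
    fields:
      "@host": ""              # suppress the @host field
      "@source": ""            # suppress the @source field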

Tags

LogScale saves data in Data Sources. You can provide a set of Tags to specify which Data Source the data is saved in. See Event Tags for more information about tags and Data Sources.

If a type is configured in Filebeat, it is always used as a tag. Other fields can be used as tags by defining them as tagFields in the parser (see Parsing Data) pointed to by the type. In LogScale, tags always start with a #. When turning a field into a tag, the name of the field is prepended with #.
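
For example, assuming a parser named my-parser that lists datacenter among its tag fields (both names are hypothetical):

yaml
filebeat.inputs:
  - paths:
      - /var/log/myapp/*.log    # hypothetical path
    fields:
      "type": my-parser         # always used as the #type tag
      "datacenter": dc1         # becomes #datacenter if the parser lists it in tagFields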

Keeping All Fields Added by Filebeat Agent

By default, the Filebeat handling in LogScale keeps only a subset of the fields shipped by Filebeat, since the default handling targets just getting the message from the input files into LogScale as @rawstring, not all the extra fields that Filebeat may add. If you want to get the full set of fields, for instance if you are using Processors in the Filebeat configuration, then turn off the default handling by adding these lines to your Filebeat configuration:

yaml
# Skip default Filebeat field handling in LogScale by
# not including the word ``filebeat`` in the index name.
# The parser then gets all fields added by Filebeat.

setup.template.name: "beat"
setup.template.pattern: "beat"
output.elasticsearch.index: "beat"

Multi-Line Events

By default, Filebeat creates one event for each line in a file. However, you can also split events in other ways. For example, stack traces in many programming languages span multiple lines.

You can specify multiline settings in the Filebeat configuration. See Filebeat's multiline configuration documentation.

Often a log event starts with a timestamp, and we want to read all lines until we see a new line starting with a timestamp. In Filebeat that can be done like this:

yaml
multiline.pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2}"
multiline.negate: true
multiline.match: after

The multiline.pattern should match your timestamp format.

Below is an example of all of this. The $YOUR_LOGSCALE_URL variable is the URL for your installation.

yaml
filebeat:
  inputs:
    - paths:
        - /var/log/nginx/access.log
      multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
      multiline.negate: true
      multiline.match: after
      fields:
        aField: value
    - paths:
        - humio_std_out.log
      fields:
        service: humio
      multiline:
        pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        negate: true
        match: after

queue.mem:
  events: 8000
  flush.min_events: 1000
  flush.timeout: 1s

output:
  elasticsearch:
    hosts: ["https://$YOUR_LOGSCALE_URL:8080/api/v1/ingest/elastic-bulk"]
    username: from-me
    password: "some-ingest-token"
    compression_level: 5
    bulk_max_size: 200
    worker: 1

logging:
  level: info
  to_files: true
  to_syslog: false
  files:
    path: /var/log/filebeat
    name: filebeat.log
    keepfiles: 3