Configuration File Examples

The following sections describe the configuration for each source type, along with example configuration files. A description of the fields follows each example.

File Source

The file source ships logs from files matched by glob patterns. It also reads gzip and bzip2 compressed files. When type is set to file, the following configuration parameters apply:

Table: File Source

Parameter	Type	Required	Default Value	Description
`exclude`	string	optional^[a]		Specify the file paths to exclude when collecting data. This field supports environment variable expansion. To use an environment variable, reference it using the syntax `${VAR}`, where VAR is the name of the variable. The {} braces may be omitted, but in that case the variable name can only contain [a-z], [A-Z], [0-9] and "_".
`excludeExtensions`	string	optional^[a]		Specify the file extensions to exclude when collecting data. Some file extensions are automatically ignored even if they match an included pattern: `xz`, `tgz`, `z`, `zip`, `7z`. To include all formats, set `excludeExtensions` to an empty array. Files with those extensions are then not decompressed before ingest.
`inactivityTimeout`	integer	optional^[a]	`60`	Specify the period of inactivity, in seconds, after which the file descriptor of a monitored file is closed to release system resources. Whenever the file changes, it is reopened and the timeout is restarted.
`include`	string	optional^[a]		Specify the file paths to include when collecting data. This field supports environment variable expansion. To use an environment variable, reference it using the syntax `${VAR}`, where VAR is the name of the variable. The {} braces may be omitted, but in that case the variable name can only contain [a-z], [A-Z], [0-9] and "_".
`multiLineBeginsWith`	regex	optional^[a]		The file input can join consecutive lines into a multiline event by using a regular expression. The pattern can match either the beginning or the continuation of a multiline event. For example, if all multiline events begin with a date such as `2022`, use: `multiLineBeginsWith: ^20\d{2}-`. In this case every line that does not match the pattern is appended to the latest line that did.
`multiLineContinuesWith`	regex	optional^[a]		The file input can join consecutive lines into a multiline event by using a regular expression. The pattern can match either the beginning or the continuation of a multiline event. For example, to treat lines indented by whitespace (instead of starting at column 0) as continuations of the previous line, use: `multiLineContinuesWith: ^\s+`. In this case every line that matches the pattern is appended to the latest line that did not.
`parser`	string	optional^[a]		Specify the parser within LogScale to use to parse the logs. If you install the parser through a package, you must specify the type and name as displayed on the parsers page, for example linux/system-logs:linux-filebeat. If a parser is assigned to the ingest token being used, this parser setting is ignored.
`sink`	string	optional^[a]		Name of the configured sink that should be sent the collected events.
`transforms`	string	optional^[a]		for more information, see MySourceName.
^[a]Optional parameters use their default value unless explicitly set.

See Configuration Elements for information on the common elements in the configuration, for example sinks, and their configuration parameters and details on the structure of the configuration files.
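
As a minimal sketch of how the parameters in the table above might be combined, the following fragment defines two file sources. The source names (`app_logs`, `java_logs`), the paths, the `LOG_DIR` environment variable, and the sink name `my_sink` are assumptions for illustration; the sink is assumed to be defined in the `sinks` section of the same configuration.

```yaml
sources:
  # Hypothetical source: events whose first line starts with a date such as "2022-".
  app_logs:
    type: file
    # Hypothetical paths; ${LOG_DIR} is expanded from the environment.
    include: ${LOG_DIR}/app/*.log
    exclude: ${LOG_DIR}/app/debug-*.log
    # Close the file descriptor after 120 seconds without changes (default: 60).
    inactivityTimeout: 120
    # Lines that do not match the pattern are appended to the previous event.
    multiLineBeginsWith: ^20\d{2}-
    sink: my_sink

  # Hypothetical source: indented lines continue the previous event.
  java_logs:
    type: file
    include: /var/log/myapp/*.log
    # Lines that start with whitespace are appended to the previous event.
    multiLineContinuesWith: ^\s+
    sink: my_sink
```

Each source uses only one of the two multiline options, which keeps it unambiguous whether the pattern marks the start or the continuation of an event.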

File Rotation Support

The Falcon LogScale Collector strives to support all kinds of file rotation.

The Collector fingerprints files larger than 256 bytes and increases the fingerprint block size up to 4096 bytes, as applicable.
The Collector supports rotation using the following methods:
- rename
- compression
- truncation
With rename and compression, the rotated files are detected as duplicates. Compressed files are considered static. Renamed files keep their fingerprints, and further updates to them are supported. When a file is truncated, the read offset is set to the new size, which may be 0 or non-zero. If a file is truncated and then quickly updated, the read offset depends on the time between the write and the processing of the event.
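
To illustrate what this means for a configuration, the sketch below uses a single glob that matches both the live file and its rotated copies. The path and the names `rotated_app_logs` and `my_sink` are hypothetical. Based on the behavior described above, a renamed copy such as `app.log.1` should be recognized by its fingerprint rather than ingested a second time, but it is worth verifying this against your own rotation setup.

```yaml
sources:
  rotated_app_logs:
    type: file
    # Matches app.log as well as renamed copies such as app.log.1
    # and compressed copies such as app.log.2.gz (hypothetical paths).
    include: /var/log/myapp/app.log*
    sink: my_sink
```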

Reading Compressed Files

The Falcon LogScale Collector supports reading gzip and bzip2 compressed files.

If gzip or bzip2 compressed files are matched by the configured include patterns, these will be auto detected as gzip/bzip2 files (using the magic number at the beginning of the file), decompressed and ingested.

By default files with the following extensions will be ignored/skipped even if they match a configured include pattern:

.xz
.tgz
.z
.zip
.7z

File extensions to ignore/skip can be configured with the excludeExtensions config option. The default is:

yaml

excludeExtensions: ["xz", "tgz", "z", "zip", "7z"]

Setting excludeExtensions to an empty array overrides the default setting, so no extensions are skipped. Files in those formats are not decompressed before ingest. For example:

yaml

excludeExtensions: []

This effectively sends those files in their compressed format.

If you want to exclude gzip and bzip2 files in addition to the other excluded file extensions, use the following option (provided the compressed files are named *.gz and *.bz2):

yaml

excludeExtensions: ["xz", "tgz", "z", "zip", "7z", "gz", "bz2"]
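
To show the option in context, here is a sketch of a file source that reads archived gzip logs while leaving the default exclusions in place. The source name `archived_logs`, the path, and the sink name `my_sink` are assumptions.

```yaml
sources:
  archived_logs:
    type: file
    # gzip files are detected by their magic number and decompressed before ingest.
    include: /var/log/archive/*.gz
    # Same as the default; listed here only to make the skipped extensions explicit.
    excludeExtensions: ["xz", "tgz", "z", "zip", "7z"]
    sink: my_sink
```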

Syslog

yaml

## This is YAML, so structure and indentation are important.
## Lines can be uncommented by removing the #. You should not need to change the number of spaces after that.
## Configuration options have a single #, comments have a ##. Only uncomment the single # lines if you need them.
#####
# Define the sink (destination) for the logs
sinks:
  logscale_sink:
    type: logscale  # Using LogScale as the destination
    url: "https://cloud.humio.com/"  # Replace with your LogScale instance URL
    token: "${LOGSCALE_TOKEN}"  # Use environment variable for the ingest token
    # Configure the queue for buffering events
    queue:
      # It is recommended to use a disk queue to persist syslog messages,
      # ensuring data integrity during network issues or system restarts.
      type: disk  # Use a disk-based queue for persistence
      maxLimitInMB: 10240  # Set the queue size to 10 GB (10 * 1024 MB)
      # A large disk queue is used to ensure data persistence and handle
      # high volumes of incoming syslog data, providing a robust buffer
      # against network issues or temporary outages.

      # fullAction: deleteOldest
      # Uncomment the line above to delete the oldest events when the queue is full.
      # This can be useful in high-volume environments where it's preferable to
      # lose some old data rather than pause ingestion of new data. However, use
      # this option with caution as it can result in data loss.

# Define the sources for syslog data
sources:
  syslog_udp:
    type: syslog
    mode: udp  # UDP syslog
    port: 514  # Standard syslog port
    sink: logscale_sink

    # Optional: Bind to a specific address
    # bind: "0.0.0.0"

    # Optional: Set the maximum event size (in bytes)
    # maxEventSize: 1048576  # 1 MB
    # The default maxEventSize is 2048 bytes. Increase this value if you expect
    # larger syslog messages. Be cautious when increasing this value, as it
    # affects memory usage and network bandwidth.

    # Optional: Set the number of worker threads (Linux only)
    # workers: 4
    # The 'workers' option controls the number of threads used to read syslog messages.
    # By default, it uses the number of CPU cores available on the system.
    # Adjust this value based on your system's capabilities and the expected message volume.

    # Optional: Configure the parser to be used in LogScale
    # parser: "syslog_rfc5424"

    # Optional: Add static fields
    # transforms:
    #   - type: static_fields
    #     fields:
    #       source_type: "syslog_udp"
    #       environment: "${ENV}"

  syslog_tcp:
    type: syslog
    mode: tcp  # TCP syslog
    port: 1514  # Using a different port for TCP
    sink: logscale_sink

    # Optional: Bind to a specific address
    # bind: "0.0.0.0"

    # Optional: Set the maximum event size (in bytes)
    # maxEventSize: 1048576  # 1 MB
    # The default maxEventSize is 2048 bytes. Increase this value if you expect
    # larger syslog messages. Be cautious when increasing this value, as it
    # affects memory usage and network bandwidth.

    # Optional: Enable strict parsing for TCP
    # strict: true
    # When strict parsing is enabled, the connection will be closed if an
    # invalid message is encountered. This helps maintain data integrity
    # but may result in lost messages if the client doesn't handle reconnection properly.

    # Optional: Support RFC6587 octet counting
    # supportsOctetCounting: true

    # Optional: Configure the parser to be used in LogScale
    # parser: "syslog_rfc5424"

    # Optional: Add static fields
    # transforms:
    #   - type: static_fields
    #     fields:
    #       source_type: "syslog_tcp"
    #       environment: "${ENV}"

Syslog Source

If type is set to syslog you must specify the port and mode fields.

Table: Syslog Source

Parameter	Type	Required	Default Value	Description
`bind`	string	optional^[a]	`all addresses`	Specify the address to bind to.
`maxEventSize`	number	optional^[a]		Maximum allowed syslog event size; syslog events larger than this will be truncated. If maxEventSize is also defined at sinks level the lower of the two values will be applied. Set this to the max value to avoid truncation issues.
		Maximum	`8388608`
`mode`	string	optional^[a]		Specify the protocol to listen to, which can be tcp or udp.
`port`	integer	optional^[a]	`514`	Specify the number of the port on which to listen.
`receiveBufferSize`	integer	optional^[a]	`64 times maxEventSize`	The receiveBufferSize is the size of the read buffer used to copy the received messages into the applications memory. This read buffer has to be able to contain at least one message (the largest). If it is too small, the message gets truncated. If the read buffer is large, it will be able to read several messages at once.
`sink`	string	optional^[a]		Name of the configured sink that should be sent the collected events.
`strict`	boolean	optional^[a]		Enable strict parsing of events. If an invalid event is encountered the connection will be closed. Only relevant when mode is tcp.
`supportsOctetCounting`	boolean	optional^[a]		Enable handling of octet counting framing as per RFC6587. Only relevant when mode is tcp.
`workers`	string	optional^[a]	`multiple workers`	UDP only. Specifies how many workers to use. Set it to 1 to keep the 1.5 behavior, or to another value to override the automatic scaling to the number of CPU cores.
^[a]Optional parameters use their default value unless explicitly set.

See Configuration Elements for information on the common elements in the configuration, for example sinks, and their configuration parameters and details on the structure of the configuration files.
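
The relationship between `maxEventSize` and `receiveBufferSize` can be made explicit in the configuration. The following is a sketch only: the source name `syslog_udp_large` and the chosen sizes are illustrative, and it assumes that `receiveBufferSize` is given in bytes like `maxEventSize`. The sink `logscale_sink` is the one defined in the syslog example above.

```yaml
sources:
  syslog_udp_large:
    type: syslog
    mode: udp
    port: 514
    # Allow events of up to 65536 bytes before truncation (the maximum is 8388608).
    maxEventSize: 65536
    # Spelled out for clarity: 64 * 65536 = 4194304, matching the documented default.
    receiveBufferSize: 4194304
    # Override the automatic scaling to the number of CPU cores.
    workers: 2
    sink: logscale_sink
```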

Journal

yaml

sources:
  journal:
    # Example for reading journald log data (linux only)
    type: journald
    sink: my_humio
    # Optional. If not specified collect from the local journal
    directory: /var/log/journal
    # If specified only collect from these units
    includeUnits:
      - systemd-modules-load.service
    # If specified collect from all units except these
    excludeUnits:
      - systemd-modules-load.service
    # Default: false. If true only collect logs from the current boot
    currentBootOnly: false
sinks:
  my_humio:
    type: humio
    token: <ingest-token-repo2> or an environment variable
    url: https://cloud.us.humio.com
    compression: gzip
    compressionLevel: 9
    tls:
      insecure: false
      caCert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      caFile: /etc/ssl/cert.pem

    proxy: none

    queue:
      fullAction: deleteOldest
      memory:
        flushTimeOutInMillisecond: 200
        maxLimitInMB: 1024

Journal Source

To read journald log data (Linux only), set type to journald. The following fields apply:

Table: Journal Source

Parameter	Type	Required	Default Value	Description
`currentBootOnly`	boolean	optional^[a]	`false`	If true, only collect logs from the current boot.
`directory`	string	optional^[a]		Specify the journal directory to collect from. If not specified, the Collector collects from the local journal.
`excludeUnits`	string	optional^[a]		If specified, the Collector will not collect from these units.
`includeUnits`	string	optional^[a]		If specified, the Collector will only collect from these units.
`sink`	string	required		Name of the sink, which you configured in sinks, that should be sent the collected events.
^[a]Optional parameters use their default value unless explicitly set.

See Configuration Elements for information on the common elements in the configuration, for example sinks, and their configuration parameters and details on the structure of the configuration files.
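
The example above lists the same unit under both `includeUnits` and `excludeUnits` to show the syntax. A narrower sketch that uses only one of the two might look like the following. The source name `journal_ssh` and the choice of `sshd.service` are assumptions; `my_humio` is the sink from the example above.

```yaml
sources:
  journal_ssh:
    type: journald
    # Only collect from this unit (hypothetical choice of unit).
    includeUnits:
      - sshd.service
    # Skip entries from previous boots.
    currentBootOnly: true
    sink: my_humio
```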

Exec Example

yaml

sources:
  cmd_ls:
    type: cmd
    cmd: ls
    # scheduled or streaming
    mode: scheduled
    args:
      - -l
      - -h
    workingDir: /foo
    # Interval between each invocation of the cmd
    interval: 60

    # Output mode when using mode 'scheduled'. Either 'streaming' (default) or 'consolidateOutput'.
    # When outputMode is set to 'consolidateOutput', the entire output of the scheduled command is sent as a single event.
    # outputMode: consolidateOutput

    # Environment variables can be configured and passed to the command
    environment:
      # define CONFIGURED_ENV1 as environment variable
      CONFIGURED_ENV1: my_configured_env_1
      # Pass environment variable: MY_ENV_VAR to command
      MY_ENV_VAR: $MY_ENV_VAR
    sink: my_humio

  cmd_tail:
    type: cmd
    cmd: tail
    mode: streaming
    args:
      - -F
    workingDir: /foo
    sink: my_humio

sinks:
  my_humio:
    type: humio
    token: <ingest-token-repo2> or an environment variable
    url: https://cloud.us.humio.com
    compression: gzip
    compressionLevel: 9
    tls:
      insecure: false
      caCert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      caFile: /etc/ssl/cert.pem

    proxy: none

    queue:
      fullAction: deleteOldest
      memory:
        flushTimeOutInMillisecond: 200
        maxLimitInMB: 1024

yaml

# Example of an exec source that runs PowerShell and passes it an inline script.
sources:
  powershell_monitor:
    type: cmd
    # Using PowerShell with -Command parameter to execute the script
    cmd: powershell
    mode: scheduled
    interval: 300  # Run every 5 minutes
    args:
      - -NoProfile
      - -NonInteractive
      - -Command
      - |
        # Multi-line PowerShell script
        $computerInfo = Get-ComputerInfo
        $processes = Get-Process | Select-Object -First 5
        $memory = Get-CimInstance Win32_OperatingSystem | Select-Object FreePhysicalMemory,TotalVisibleMemorySize

        # Create custom object with collected data
        $result = @{
            'Hostname' = $computerInfo.CsName
            'OS_Version' = $computerInfo.WindowsVersion
            'Top_Processes' = ($processes | ForEach-Object { $_.ProcessName }) -join ','
            'Free_Memory_GB' = [math]::Round($memory.FreePhysicalMemory/1MB, 2)
            'Total_Memory_GB' = [math]::Round($memory.TotalVisibleMemorySize/1MB, 2)
            'Timestamp' = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
        }

        # Output as JSON
        ConvertTo-Json -InputObject $result
    sink: my_humio

Exec Source

If type is set to cmd, the following fields apply:

Table: Exec Source

Parameter	Type	Required	Description
`args`	string	optional^[a]	The arguments of the command.
`environment`	string	optional^[a]	Specify the Environment variables and pass them to commands using this section.
`interval`	string	required	Specifies how frequently, in seconds, the command is invoked when `mode` is set to `scheduled`.
`mode`	string	optional^[a]	Set to `scheduled` to collect data at intervals, in which case you must also specify `interval`, or to `streaming` to collect data continuously. To create a single multiline event when running in scheduled mode, set `outputMode` to `consolidateOutput`.
`sink`	string	optional^[a]	Name of the sink, which you configured in sinks, that should be sent the collected events.
`workingDir`	string	required	Specifies the directory in which to run the command.
^[a]Optional parameters use their default value unless explicitly set.

See Configuration Elements for information on the common elements in the configuration, for example sinks, and their configuration parameters and details on the structure of the configuration files.
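
As a sketch of the scheduled mode with consolidated output, the following source runs a command at a fixed interval and sends each run as one event. The source name `cmd_df`, the command, and the working directory are assumptions chosen for illustration; `my_humio` is the sink from the example above.

```yaml
sources:
  cmd_df:
    type: cmd
    cmd: df
    mode: scheduled
    args:
      - -h
    workingDir: /tmp
    # Run every 300 seconds.
    interval: 300
    # Send the entire output of each run as a single event.
    outputMode: consolidateOutput
    sink: my_humio
```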

Linux Example

yaml

sources:
  # Collect local files.
  var_log:
    type: file
    include: /var/log/*
    exclude: /var/log/*.gz
    sink: humio

  # Collect syslog udp 5140.
  syslog_udp_5140:
    type: syslog
    mode: udp
    port: 5140
    sink: humio
    workers: 1

  # Collect syslog tcp 5140.
  syslog_tcp_5140:
    type: syslog
    mode: tcp
    port: 5140
    sink: humio

sinks:
  humio:
    type: humio
    # Replace with your specified ingest token.
    token: $INGEST_TOKEN
    # Replace with your "standard endpoint" API URL: https://library.humio.com/endpoints/
    url: $HUMIO_URL

File Linux Source

This configuration example uses the file source with specific values for collecting logs from /var/log.

See Configuration Elements for information on the common elements in the configuration, for example sinks, and their configuration parameters and details on the structure of the configuration files.
