Sinks (sinks)

The documentation covers the configuration of sinks in the Falcon LogScale Collector, which determine where collected log data will be sent, including detailed parameters for compression, batch sizes, proxy settings, and TLS configurations. The documentation provides comprehensive tables outlining required and optional parameters for sink configuration, with special attention to proxy server usage and limitations such as the restriction of one sink per syslog data source.

The sinks block configures the sinks (where the data will be sent) that are used by the source or sources.

# Define the sink (destination) for the logs
sinks:
  example_sink:
    type: logscale
    url: "https://cloud.humio.com/"  # Replace with your LogScale instance URL
    token: "${INGEST_TOKEN}"  # Use environment variable for the ingest token

    # Optional: Override the maximum batch size (in bytes)
    # maxBatchSize: 8388608  # 8 MB
    # Default is 16 MB. It should not be increased, but may be lowered if needed.

    # Optional: Override the maximum event size (in bytes)
    # maxEventSize: 1048576  # 1 MB
    # Default is 1 MB, which is the default maximum that LogScale supports.

    # Optional: Specify compression method
    # compression: "gzip"  # Options: "auto", "none", "gzip", "deflate"
    # Default is "auto", which attempts "gzip" but falls back to "none" if unsupported.

    # Optional: Specify compression level (0-9, where 0 is no compression and 9 is best compression)
    # compressionLevel: 1

    # Optional: Specify a proxy
    # proxy: "http://proxy.example.com:8080"
    # Default is "auto", which uses the system proxy if possible, or falls back to "none".
    # You can also use "system" to force system proxy use, or "none" to disable proxy use.

    # Optional: Configure TLS options
    # tls:
    #   insecure: false  # Set to true to disable certificate validation (not recommended)
    #   # Note: caCert and caFile are mutually exclusive. Use only one of them.
    #   caCert: "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----"
    #   # caCert is for providing an inline PEM encoded CA certificate
    #   # caFile: "/path/to/ca/cert.pem"
    #   # caFile is for providing a path to a file containing a PEM encoded CA certificate

    # Optional: Number of worker threads for sending data
    # workers: 4

    # Configure the queue for buffering events
    queue:
      # Memory queue configuration
      type: memory
      maxLimitInMB: 1024  # Maximum queue size in MB
      # Note: The queue size can be lowered if needed, but it should not be necessary to increase it.
      # Optional: Action to take when queue is full
      # fullAction: "pause"  # Options: "pause" or "deleteOldest"
      # Optional: Flush timeout in milliseconds
      # flushTimeOutInMillisecond: 1000

    # Alternate disk queue configuration (uncomment to use)
    # queue:
    #   type: disk
    #   maxLimitInMB: 4096  # Maximum queue size in MB
    #   # Note: Changing the disk queue size requires a rewrite of the queue storage file.
    #   # Optional: Action to take when queue is full
    #   # fullAction: "pause"  # Options: "pause" or "deleteOldest"
    #   # Optional: Storage directory
    #   # storageDir: "/path/to/queue/storage"
    #   # Default storage directory is under the dataDirectory of the program.
  my_examplesink1:
    type: logscale
    url: "https://cloud.humio.com/"  # Replace with your LogScale instance URL
    token: "${INGEST_TOKEN}"  # Use environment variable for the ingest token
# Define the sources for data collection
sources:
  # Add your sources here. Examples include:
  # - file
  # - syslog
  # - wineventlog
  # - journald
  # - cmd
  # - unifiedlog

Each user-defined sink name (MySinkName) is an element under the top-level sinks block; it contains the configuration for that sink and is referenced by your sources.

Table: Sinks

| Parameter | Type | Required | Default Value | Description |
|---|---|---|---|---|
| My Sink Name/s | string | optional[a] |  | The user-defined name for each sink configuration. This name will be referenced in your sources. |

[a] Optional parameters use their default value unless explicitly set.


The elements listed in this table define how each sink is configured.

Table: MySinkName

| Parameter | Type | Required | Default Value | Description |
|---|---|---|---|---|
| compression | string | optional[a] | auto | Specifies the type of data compression; possible values: auto, none, gzip, deflate. |
| compressionLevel | integer | optional[a] |  | Specifies the level of compression, where 1 is best speed and 9 is best compression. If unset or 0, the default level for the algorithm specified in compression is applied. |
| maxBatchSize | integer | optional[a] | 16MB | Specifies the maximum size of a batch (default 16 MB) and works together with the maximum events per request. The limits are also propagated to all sources that reference the sink. |
| maxEventSize | integer | optional[a] | 1MB | Sets the maximum allowed size of a single event (default 1 MB); larger messages are truncated. Applies to syslog and syslog TLS sources only. If maxEventSize is also defined at the source level, the lower of the two values is applied. |
| proxy | string | optional[a] | system | Must be set to none for Windows Server. Otherwise this can be used to specify an override proxy configuration for the sink; possible values: system, none, or a URL such as http://127.0.0.1:3129 for an HTTP proxy. The default is system, which tries to determine the appropriate proxy or falls back to none. |
| tls | object | optional[a] |  | Contains details on the PEM certificates and allows you to override the defaults. Only one of caCert or caFile should be specified: caCert loads a PEM encoded CA certificate inline from the config file; caFile loads a PEM encoded CA certificate from an external file; insecure, if set to true, skips certificate validation. |
| token | string | optional[a] |  | Specifies the ingest token for your repository, or an environment variable that contains it. |
| type | string | optional[a] |  | Specifies the type of sink. Possible values: hec (can only be used for NG-SIEM), humio, logscale, loopback (available from 1.10.0; can only be used with syslog Multi-Destination Sinks). |
| url | string | optional[a] |  | Specifies the URL of your LogScale account, for example https://cloud.humio.com. |
| workers | string | optional[a] | 4 | Specifies how many workers to use when sending to LogScale. Under normal circumstances leave this at the default setting; for more information see Sink Workers. |

[a] Optional parameters use their default value unless explicitly set.
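
The parameters above correspond to the commented options in the example at the top of this section. As a compact sketch, a sink that sets several of them explicitly might look like this (the URL, token, file path, and chosen values are placeholders to adapt to your environment):

sinks:
  logscale_sink:
    type: logscale
    url: "https://cloud.humio.com/"     # replace with your LogScale instance URL
    token: "${INGEST_TOKEN}"            # ingest token, here taken from an environment variable
    compression: gzip                   # auto, none, gzip, or deflate
    compressionLevel: 1                 # 1 = best speed, 9 = best compression
    maxEventSize: 1048576               # 1 MB, the LogScale default maximum
    workers: 4                          # leave at the default unless advised otherwise
    tls:
      caFile: "/path/to/ca/cert.pem"    # PEM encoded CA certificate (mutually exclusive with caCert)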


Using a Proxy Server

The Falcon LogScale Collector supports using a forward proxy server when sending logs using the sink. In some environments, where direct access to LogScale is prohibited, it may be necessary to configure the proxy server manually. The collector attempts to detect the system's proxy automatically. If the collector should use a different proxy than the system's, or instead connect directly, it must be specified in the sink configuration. The proxy option accepts the following keywords: auto, system, and none, but it also accepts a URL specifying the proxy server to use.
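
Using the keywords described above, a brief sketch of overriding the proxy for a sink could look like this (the proxy host and port are placeholders):

sinks:
  example_sink:
    type: logscale
    url: "https://cloud.humio.com/"
    token: "${INGEST_TOKEN}"
    proxy: "http://proxy.example.com:3128"   # send via an explicit forward proxy
    # proxy: "system"                        # force use of the system proxy
    # proxy: "none"                          # connect directly, bypassing any proxy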

Multi-Destination Sinks

Multi-destination routing allows one syslog source to feed multiple destinations, enabling content-based filtering to route logs based on specific criteria, destination-specific transformations to process data differently for each endpoint, and team-specific views to provide customized access to the same underlying data.

Multi-destination routing uses a loopback architecture that consists of these key components:

  1. Syslog Source: Receives syslog messages over UDP or TCP

  2. Loopback Sink: Acts as an intermediary distribution point that stores events in memory

  3. Internal Sources: Connect to the loopback sink and forward events to different destinations

  4. Transforms: Optional processing rules applied to each routing path

  5. Destination Sinks: Final endpoints where logs are delivered (LogScale, NGSIEM, etc.)

Delivery and Backpressure in Multi-Destination Systems

When routing data to multiple destinations that process at different rates, effective backpressure management becomes critical. In multi-destination architectures, each sink may consume data at varying speeds, creating potential bottlenecks.

Primary vs. Secondary Forwarders

Our system implements a priority-based approach to handle these scenarios:

  • Primary Forwarders (default): Can pause the data source when their queue fills up, preventing data loss but potentially slowing overall throughput.

  • Secondary Forwarders: When configured as secondary and their queue fills up, they will not pause the source. This allows continuous data flow to primary destinations at the cost of potential data loss at the secondary sink.

Queue Management - Multi-destination

Each destination has a configurable queue that buffers incoming data. When a destination processes data more slowly than it arrives:

  1. Data accumulates in the destination's queue

  2. Once full, the system's response depends on the forwarder's priority setting

  3. Primary forwarders signal backpressure to the source

  4. Secondary forwarders drop excess data to maintain system throughput

This configurable priority system allows you to tailor the behavior to your specific requirements, balancing between data completeness and system performance.
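
As a sketch of the idea using only the queue options documented earlier in this section (rather than any dedicated primary/secondary setting, and with illustrative names and sizes), a primary destination can pause the source when its queue fills while a secondary destination discards the oldest events instead:

sinks:
  primary_destination:
    type: logscale
    url: "https://cloud.humio.com/"
    token: "${PRIMARY_TOKEN}"
    queue:
      type: memory
      maxLimitInMB: 1024
      fullAction: pause          # apply backpressure: pause the source when the queue is full

  secondary_destination:
    type: logscale
    url: "https://cloud.humio.com/"
    token: "${SECONDARY_TOKEN}"
    queue:
      type: memory
      maxLimitInMB: 512
      fullAction: deleteOldest   # drop the oldest events instead of pausing the source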

How to Use Multi-Destination Sinks

To set multiple destinations for a syslog source, configure one or more internal sources under sources, and under sinks configure a distributor (a loopback sink) together with a sink for each destination.

The fields used by the loopback feature need to be configured in both the source and sinks:

  • In sources, define a source of type internal for each split of the data.

  • In sinks, define a sink of type loopback and the final sinks for the redistributed data.

sources:
  syslog_input:
    type: syslog
    port: 514
    mode: udp  # or tcp
    sink: distributor  # points to a loopback sink
  
  route_to_destination1:
    type: internal
    from: distributor  # references the loopback sink name
    sink: destination1  # points to the first destination sink
    transforms:
      - type: regex_filter  # optional transformation
        pattern: "^ERROR"
        mode: include
  
  route_to_destination2:
    type: internal
    from: distributor
    sink: destination2
    transforms:
      - type: regex_filter
        pattern: "^INFO"
        mode: include

sinks:
  distributor:
    type: loopback  # this is the distribution point
    
  destination1:
    type: logscale  # or any other sink type
    token: your-token-1
    url: https://cloud.logscale.com
    
  destination2:
    type: logscale
    token: your-token-2
    url: https://cloud.logscale.com

Table: Multi-Destination Source

| Parameter | Type | Required | Default Value | Description |
|---|---|---|---|---|
| from | string | optional[a] |  | Specifies the loopback sink from which this internal source reads data; the name of the distributor defined in sinks. |
| sink | string | optional[a] |  | The sink for the redistributed data. |
| transforms | string | optional[a] |  | The transforms to be applied to the data before it is sent to the sink. See How to Use Transforms for more information. |

[a] Optional parameters use their default value unless explicitly set.