Google Cloud Storage Archiving

Security Requirements and Controls

Important

LogScale supports archiving to Google Cloud Storage buckets where object locking is enabled.

Google Cloud Storage Archiving Storage Format and Layout

When uploading a segment file, LogScale creates the Google Cloud Storage object key based on the tags, start date, and repository name of the segment file. The resulting object key makes the archived data browsable through the Google Cloud Storage management console.

LogScale uses the following pattern:

shell
REPOSITORY/TYPE/TAG_KEY_1/TAG_VALUE_1/../TAG_KEY_N/TAG_VALUE_N/YEAR/MONTH/DAY/START_TIME-SEGMENT_ID.gz

Where:

  • REPOSITORY

    Name of the repository

  • type

    Keyword (static) to identfy the format of the enclosed data.

  • TAG_KEY_1

    Name of the tag key (typically the name of parser used to ingest the data, from the #type field)

  • TAG_VALUE

    Value of the corresponding tag key.

  • YEAR

    Year of the timestamp of the events

  • MONTH

    Month of the timestamp of the events

  • DAY

    Day of the timestamp of the events

  • START_TIME

    The start time of the segment, in the format HH-MM-SS

  • SEGMENT_ID

    The unique segment ID of the event data

An example of this layout can be seen in the file list below:

shell
$ s3cmd ls -r s3://logscale2/accesslog/
2023-06-07 08:03         1453  s3://logscale2/accesslog/type/kv/2023/05/02/14-35-52-gy60POKpoe0yYa0zKTAP0o6x.gz
2023-06-07 08:03       373268  s3://logscale2/accesslog/type/kv/humioBackfill/0/2023/03/07/15-09-41-gJ0VFhx2CGlXSYYqSEuBmAx1.gz

For more information about this layout, see Event Tags.

File Format

LogScale supports two formats for storage: native format and NDJSON.

  • Native Format

    The native format is the raw data, i.e. the equivalent of the @rawstring of the ingested data:

    accesslog
    127.0.0.1 - - [07/Mar/2023:15:09:42 +0000] "GET /falcon-logscale/css-images/176f8f5bd5f02b3abfcf894955d7e919.woff2 HTTP/1.1" 200 15736 "http://localhost:81/falcon-logscale/theme.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
    127.0.0.1 - - [07/Mar/2023:15:09:43 +0000] "GET /falcon-logscale/css-images/alert-octagon.svg HTTP/1.1" 200 416 "http://localhost:81/falcon-logscale/theme.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
    127.0.0.1 - - [09/Mar/2023:14:16:56 +0000] "GET /theme-home.css HTTP/1.1" 200 70699 "http://localhost:81/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
    127.0.0.1 - - [09/Mar/2023:14:16:59 +0000] "GET /css-images/help-circle-white.svg HTTP/1.1" 200 358 "http://localhost:81/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
    127.0.0.1 - - [09/Mar/2023:14:16:59 +0000] "GET /css-images/logo-white.svg HTTP/1.1" 200 2275 "http://localhost:81/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
  • NDJSON Format

    The default archiving format is NDJSON When using NDJSON, the parsed fields will be available along with the raw log line. This incurs some extra storage cost compared to using raw log lines but gives the benefit of ease of use when processing the logs in an external system.

    json
    {"#type":"kv","#repo":"weblog","#humioBackfill":"0","@source":"/var/log/apache2/access_log","@timestamp.nanos":"0","@rawstring":"127.0.0.1 - - [07/Mar/2023:15:09:42 +0000] \"GET /falcon-logscale/css-images/176f8f5bd5f02b3abfcf894955d7e919.woff2 HTTP/1.1\" 200 15736 \"http://localhost:81/falcon-logscale/theme.css\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36\"","@id":"XPcjXSqXywOthZV25sOB1hqZ_0_1_1678201782","@timestamp":1678201782000,"@ingesttimestamp":"1691483483696","@host":"ML-C02FL14GMD6V","@timezone":"Z"}
    {"#type":"kv","#repo":"weblog","#humioBackfill":"0","@source":"/var/log/apache2/access_log","@timestamp.nanos":"0","@rawstring":"127.0.0.1 - - [07/Mar/2023:15:09:43 +0000] \"GET /falcon-logscale/css-images/alert-octagon.svg HTTP/1.1\" 200 416 \"http://localhost:81/falcon-logscale/theme.css\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36\"","@id":"XPcjXSqXywOthZV25sOB1hqZ_0_3_1678201783","@timestamp":1678201783000,"@ingesttimestamp":"1691483483696","@host":"ML-C02FL14GMD6V","@timezone":"Z"}
    {"#type":"kv","#repo":"weblog","#humioBackfill":"0","@source":"/var/log/apache2/access_log","@timestamp.nanos":"0","@rawstring":"127.0.0.1 - - [09/Mar/2023:14:16:56 +0000] \"GET /theme-home.css HTTP/1.1\" 200 70699 \"http://localhost:81/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36\"","@id":"XPcjXSqXywOthZV25sOB1hqZ_0_15_1678371416","@timestamp":1678371416000,"@ingesttimestamp":"1691483483696","@host":"ML-C02FL14GMD6V","@timezone":"Z"}
    {"#type":"kv","#repo":"weblog","#humioBackfill":"0","@source":"/var/log/apache2/access_log","@timestamp.nanos":"0","@rawstring":"127.0.0.1 - - [09/Mar/2023:14:16:59 +0000] \"GET /css-images/help-circle-white.svg HTTP/1.1\" 200 358 \"http://localhost:81/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36\"","@id":"XPcjXSqXywOthZV25sOB1hqZ_0_22_1678371419","@timestamp":1678371419000,"@ingesttimestamp":"1691483483696","@host":"ML-C02FL14GMD6V","@timezone":"Z"}
    {"#type":"kv","#repo":"weblog","#humioBackfill":"0","@source":"/var/log/apache2/access_log","@timestamp.nanos":"0","@rawstring":"127.0.0.1 - - [09/Mar/2023:14:16:59 +0000] \"GET /css-images/logo-white.svg HTTP/1.1\" 200 2275 \"http://localhost:81/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36\"","@id":"XPcjXSqXywOthZV25sOB1hqZ_0_23_1678371419","@timestamp":1678371419000,"@ingesttimestamp":"1691483483696","@host":"ML-C02FL14GMD6V","@timezone":"Z"}

    A single NDJSON line is just a JSON object, which formatted looks like this:

    json
    {
    "#humioBackfill" : "0",
    "#repo" : "weblog",
    "#type" : "kv",
    "@host" : "ML-C02FL14GMD6V",
    "@id" : "XPcjXSqXywOthZV25sOB1hqZ_0_1_1678201782",
    "@ingesttimestamp" : "1691483483696",
    "@rawstring" : "127.0.0.1 - - [07/Mar/2023:15:09:42 +0000] \"GET /falcon-logscale/css-images/176f8f5bd5f02b3abfcf894955d7e919.woff2 HTTP/1.1\" 200 15736 \"http://localhost:81/falcon-logscale/theme.css\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36\"",
    "@source" : "/var/log/apache2/access_log",
    "@timestamp" : 1678201782000,
    "@timestamp.nanos" : "0",
    "@timezone" : "Z"
    }

Each record includes the full detail for each event, including parsed fields, the original raw event string, and tagged field entries.

How Data is Uploaded to Google Cloud Storage

Data is uploaded to Google Cloud Storage as soon as a segment file has been created during ingest (for more information, see Ingestion: Digest Phase).

Each segment file is sent as as multipart upload, so the upload of a single file may require multiple Google Cloud Storage requests. The exact number of requests will depend on rate of ingest, but expect a rate of one request for each 8MB of ingested data.

Troubleshoot Google Cloud Storage archiving configuration

If you encounter an access denied error message when configuring Google Cloud Storage archiving, check your configuration settings for missing information or typos.

Tag Grouping

If tag grouping is applied for a repository, the archiving logic will upload one segment into one Google Cloud Storage file, even though the tag grouping makes each segment possibly contain multiple unique combinations of tags. The TAG_VALUE part of the Google Cloud Storage file name that corresponds to a tag with tag grouping will not contain any of the specific values for the tag in that segment, but will instead contain an internal value that denotes which tag group the segment belongs to. This is less human readable than splitting out a segment into a number of Google Cloud Storage files corresponding to each unique tag combination in the segment, but avoids the risk of a single segment being split into an unmanageable amount of Google Cloud Storage files.

Google Cloud Storage archived log re-ingestion

You can re-ingest log data that has been written to a Google Cloud Storage bucket through Google Cloud Storage archiving by using Log Collector and the native JSON parsing within LogScale.

This process has the following requirements:

  • The files need to be downloaded from the Google Cloud Storage bucket to the machine running the Log Collector. The Google Cloud Storage files cannot be accessed natively by the Log Collector.

  • The ingested events will be ingested into the repository that is created for the purpose of receiving the data.

To re-ingest logs:

  1. Create a repo in LogScale where the ingested data will be stored. See Creating a Repository or View.

  2. Create an ingest token, and choose the JSON parser. See Assigning Parsers to Ingest Tokens.

  3. Install the Falcon LogScale Collector to read from a file using the .gz extension as the file match. For example, using a configuration similar to this:

    yaml
    #dataDirectory is only required in the case of local configurations and must not be used for remote configurations files.
    dataDirectory: data
    sources:
    bucketdata:
      type: file
      # Glob patterns
      include:
      - /bucketdata/*.gz
      sink: my_humio_instance
      parser: json
      ...

    For more information, see Sources & Examples.

  4. Copy the log file from the Google Cloud Storage bucket into the configured directory (/bucketdata) in the above example.

The Log Collector reads the file that has been copied, sends it to LogScale, where the JSON event data will be parsed and recreated.