
S3 Archiving
Security Requirements and Controls: Change S3 archiving settings permission
LogScale supports archiving ingested logs to Amazon S3. The archived logs are then available for further processing in any external system that integrates with S3. The files written by LogScale in this format are not searchable by LogScale — this is an export meant for other systems to consume.
For information about using S3 as storage for segments in a format that LogScale can read, see Bucket Storage.
When S3 Archiving is enabled, all existing events in the repository are backfilled into S3, and new events are then archived by a periodic job that runs on all LogScale nodes and looks for new, unarchived segment files. The segment files are read from disk, streamed to an S3 bucket, and marked as archived in LogScale.
An administrator must set up archiving per repository. After selecting a repository in LogScale, the configuration page is available under Settings.
Note
For slow-moving datasources it can take some time before segment files are completed on disk and made available for the archiving job. In the worst case, a segment file is not completed until it contains a gigabyte of uncompressed data or 30 minutes have passed. The exact thresholds are those configured as the limits on mini segments.
Important
S3 archiving is not supported for S3 buckets where object locking is enabled.
For more information on segment files and datasources, see segment files and Datasources.
S3 Archiving Storage Format and Layout
When uploading a segment file, LogScale creates the S3 object key based on the tags, start date, and repository name of the segment file. The resulting object key makes the archived data browsable through the S3 management console.
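For illustration only (the exact key layout is determined by LogScale and can vary between versions), an object key combines these elements roughly as follows, where all names are placeholders:
REPOSITORY_NAME/TAG_KEY/TAG_VALUE/YEAR/MONTH/DAY/segment-file.gz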
File Format
LogScale supports two formats for storage: native format and NDJSON.
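As a rough sketch of the NDJSON format (the field names below are assumptions based on LogScale's standard event fields, not a definitive schema), each line of an NDJSON file is one JSON-encoded event:
{"@timestamp": 1715942400000, "@rawstring": "2024-05-17T10:00:00Z host=web-1 status=200"}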
How Data is Uploaded to S3
Data is uploaded to S3 as soon as a segment file has been created during ingest (for more information, see Ingestion: Digest Phase).
Each segment file is sent as a multipart upload, so the upload of a single file may require multiple S3 requests. The exact number of requests depends on the rate of ingest, but expect roughly one request for each 8 MB of ingested data.
The size of each part of the upload is configured using the S3_STORAGE_CHUNK_SIZE configuration variable.
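As a rough rule of thumb based on the rate above, a 1 GB segment file corresponds to approximately 128 upload requests (1024 MB / 8 MB per request). A configuration sketch, assuming the part size is given in bytes (verify the unit and default for your version before changing it):
# Hypothetical example: 8 MB parts, assuming the value is specified in bytes
S3_STORAGE_CHUNK_SIZE=8388608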
S3 Storage Configuration
For a self-hosted installation of LogScale, you need an IAM user with write access to the buckets used for archiving. That user must have programmatic access to S3, so when adding a new user through the AWS console, make sure programmatic access is checked:
Figure 81. Setup
Later in the process, you can retrieve the access key and secret key:
Figure 82. Setup Key
These keys are needed in LogScale in the following configuration:
S3_ARCHIVING_ACCESSKEY=$ACCESS_KEY
S3_ARCHIVING_SECRETKEY=$SECRET_KEY
The keys are used for authenticating the user against the S3 service. For more guidance on how to retrieve S3 access keys, see AWS access keys. For more details on creating a new user, see creating a new user in IAM.
Once you have completed this configuration, choose whether to set up S3 archiving with an IAM role (recommended) or an IAM user.
Set up S3 archiving with an IAM user
Note
From version 1.171 it is recommended to set up S3 archiving using roles in AWS instead of users. Any previous S3 archiving set up with a user will continue to work as expected.
When setting up S3 archiving with an IAM user, the cluster must have only one organization, or S3_ARCHIVING_REQUIRE_ROLE must be set to false.
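For example, to keep IAM user-based archiving working on a cluster with more than one organization, the LogScale configuration would include:
S3_ARCHIVING_REQUIRE_ROLE=false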
Configuring S3 archiving with an IAM user requires that the user has PutObject (write) permission on the bucket: the AWS IAM user must be able to write objects to the S3 bucket.
Enabling LogScale to write to your S3 bucket means setting up AWS cross-account access.
In AWS:
Log in to the AWS console and navigate to your S3 service page.
Configure the user to have write access to a bucket by attaching a policy to the user.
The following JSON is an example policy configuration.
JSON{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::BUCKET_NAME" ] }, { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject" ], "Resource": [ "arn:aws:s3:::BUCKET_NAME/*" ] } ] }
The policy can be used as an inline policy attached directly to the user through the AWS console:
Figure 83. IAM user example policy
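As an alternative to the console, the same policy can be attached as an inline policy with the AWS CLI. A sketch, where the user name, policy name, and file name are placeholders:
# Attach the policy JSON above as an inline policy (names are placeholders)
aws iam put-user-policy \
  --user-name logscale-archiving \
  --policy-name logscale-s3-archiving \
  --policy-document file://s3-archiving-policy.json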
In LogScale:
Go to the repository you want to archive and open the archiving configuration under Settings.
Configure the bucket name and region, then save the configuration.
Troubleshoot S3 archiving configuration
If you encounter an access denied error message when configuring S3 archiving, check your configuration settings for missing information or typos.
Tag Grouping
From version 1.169, if tag grouping is applied for a repository, the archiving logic uploads each segment as a single S3 file, even though tag grouping means each segment may contain multiple unique combinations of tags. The TAG_VALUE part of the S3 file name that corresponds to a tag with tag grouping will not contain any of the specific values for the tag in that segment; instead it contains an internal value that denotes which tag group the segment belongs to. This is less human readable than splitting a segment into one S3 file per unique tag combination, but it avoids the risk of a single segment being split into an unmanageable number of S3 files.
Other options
HTTP proxy
If LogScale is set up to use an HTTP_PROXY_HOST, it will be used for communicating with S3 by default. To disable it, set the following:
# Use the globally configured HTTP proxy for communicating with S3.
# Default is true.
S3_ARCHIVING_USE_HTTP_PROXY=false
Non-default endpoints
You can point to your own hosting endpoint for S3 to use for archiving if you host an S3-compatible service such as MinIO.
S3_ARCHIVING_ENDPOINT_BASE=http://my-own-s3:8080
Virtual host style (default)
LogScale will construct virtual host-style URLs like https://my-bucket.my-own-s3:8080/path/inside/bucket/file.txt.
For this style of access, you need to set your base URL so that it contains a placeholder for the bucket name.
S3_ARCHIVING_ENDPOINT_BASE=http://{bucket}.my-own-s3:8080
LogScale will replace the placeholder {bucket} with the relevant bucket name at runtime.
Path-style
Some services do not support virtual host-style access and require path-style access. Such URLs have the format https://my-own-s3:8080/my-bucket/path/inside/bucket/file.txt.
If you are using such a service, your endpoint base URL should not contain a bucket placeholder.
S3_ARCHIVING_ENDPOINT_BASE=http://my-own-s3:8080
Additionally, you must set S3_ARCHIVING_PATH_STYLE_ACCESS to true.
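For example, a complete path-style setup could look like this:
S3_ARCHIVING_ENDPOINT_BASE=http://my-own-s3:8080
S3_ARCHIVING_PATH_STYLE_ACCESS=true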
IBM Cloud Storage compatibility
To use S3 Archiving with IBM Cloud Storage, set S3_ARCHIVING_IBM_COMPAT to true.
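For example:
S3_ARCHIVING_IBM_COMPAT=true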
S3 archived log re-ingestion
You can re-ingest log data that has been written to an S3 bucket through S3 archiving by using Log Collector and the native JSON parsing within LogScale.
This process has the following requirements:
The files need to be downloaded from the S3 bucket to the machine running the Log Collector. The S3 files cannot be accessed natively by the Log Collector.
The events will be ingested into a repository created for the purpose of receiving the data.
To re-ingest logs:
Create a repo in LogScale where the ingested data will be stored. See Creating a Repository or View.
Create an ingest token, and choose the JSON parser. See Assigning Parsers to Ingest Tokens.
Install the Falcon LogScale Collector to read from files, using the .gz extension as the file match. For example, use a configuration similar to this:
# dataDirectory is only required in the case of local configurations and must not be used for remote configuration files.
dataDirectory: data
sources:
  bucketdata:
    type: file
    # Glob patterns
    include:
      - /bucketdata/*.gz
    sink: my_humio_instance
    parser: json
...
For more information, see Sources & Examples.
Copy the log files from the S3 bucket into the configured directory (/bucketdata in the example above); see the example command after these steps.
The Log Collector reads the copied files and sends them to LogScale, where the JSON event data is parsed and the original events are recreated.
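One way to copy the archived files into the configured directory is with the AWS CLI. A sketch, where the bucket name and key prefix are placeholders for your own layout:
# Copy all .gz archive files under a given prefix into /bucketdata
aws s3 cp s3://BUCKET_NAME/REPOSITORY_NAME/ /bucketdata/ \
  --recursive --exclude "*" --include "*.gz"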