Ingesting Across Multiple Repositories

The HEC endpoint supports using different types of ingest tokens:

  • Repository ingest tokens - these are the tokens that you create to ingest to a specific repository only.

  • Organization ingest tokens - these tokens enable you to ingest data into all repositories within an organization (except sandbox and system repositories).

  • System ingest tokens - these tokens enable you to ingest data into all repositories in a cluster.

Prerequisites

To be able to create the organization and system tokens you need to enable the PermissionTokens feature flag. When running a new instance you can enable this by setting the INITIAL_FEATURE_FLAGS environment variable as follows:

ini
INITIAL_FEATURE_FLAGS=+PermissionTokens

This works from build 1.36 and later. For older builds, or if you need to enable the flag for an already-running instance, log in as root and navigate to https://$YOUR_HUMIO_URL/docs/api-explorer.

This gives you a GraphQL console where you can run the following mutation:

graphql
mutation {
  enableFeature(feature:PermissionTokens)
}

You may need to log out and then back in for the change to take effect.

Ingest Tokens

In previous versions of LogScale, the HEC required you to provide an ingest token that was tied to a particular repository. With it you could write to that specific repository or, if you enabled ALLOW_CHANGE_REPO_ON_EVENTS, any repo.

Now, the HEC accepts two new multi-repository ingest tokens: organization-wide and system-wide. An organization-wide ingest token allows you to ingest into any repository within the organization it belongs to. A system-wide ingest token allows you to ingest into any repository in the cluster. Neither token type can ingest into system or sandbox repositories. You can generate these tokens through the UI using the following methods:

  • For system tokens: Account (profile) → Cluster Administration → System Tokens → Add new → Ingest across all repositories in cluster.

  • For organization tokens: Account (profile) → Organization Settings → Management Tokens → Add new → Ingest across all repos within organization.

Note

This requires the PermissionTokens feature flag to be enabled, as mentioned in Prerequisites.

When using a multi-repository token you must specify the repository you want to ingest into using the index field in a request to the HEC endpoint. When using a system token, you must also specify the organization the repository belongs to using the organization field.

Note

Repository-specific ingest tokens provide access to ingesting to that repo, without exceptions. For organization and system tokens there are exceptions: they do not permit ingest into system repositories or users' sandboxes. If you want to experiment with multi-repository tokens, using your sandbox won't work.

HTTP Request Fields

With multi-repository support, the index field takes on additional functionality, and new fields have been added. These fields are shown in the following table:

Field Description
index The name or ID of the target repository to ingest into. For repository-specific ingest tokens, this defaults to the token's repository.
organization The ID of the organization the target repository belongs to. For repository or organization-specific ingest tokens, this defaults to the repository's organization and the token's organization respectively.
parserIndex If specified, the repository with this name or ID will be used to look up the parser to parse the event. If not specified, or invalid, the parser is looked up in the destination repository. It is only possible to specify repositories the ingest token has access to. For an organization-wide ingest token, for instance, you can only specify a repository in that organization.
parserOrganization If specified, this gives the ID of the organization in which the parserIndex will be looked up. If not specified the default value is the same as organization. This is only useful when using a system-wide ingest token and using a parser that belongs to a different organization than the target. As with parserIndex you can only specify organizations the ingest token has access to.
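
As a rough illustration of how the fields in the table fit together, the following Python sketch assembles a HEC request body. The helper name build_hec_body and the repository name sharedParsers are hypothetical and used only for illustration; only the field names come from the table above.

```python
import json


def build_hec_body(event, index=None, organization=None,
                   parser_index=None, parser_organization=None):
    """Assemble a HEC request body, leaving out any field that was not given.

    build_hec_body is a hypothetical helper for illustration only; the
    field names are the ones described in the table.
    """
    body = {"event": event}
    optional = {
        "index": index,
        "organization": organization,
        "parserIndex": parser_index,
        "parserOrganization": parser_organization,
    }
    # Only include fields that were actually supplied, so that the
    # defaults (or the event's tags) apply to everything else.
    body.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(body)


# Example for a system-wide token: name the repository and the
# organization, and look the parser up in a different repository.
example = build_hec_body(
    "System Ingest Test Event",
    index="X",
    organization="aNInS0WHvORBcymQTp0HoLIKYDygBiwo",
    parser_index="sharedParsers",  # hypothetical repository name
)
```

The resulting string can be sent as the request body in the same way as the bodies in the Examples section.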

Repositories can be specified using either their name or ID. If you need a guarantee that events are delivered to a particular repository, the safest thing is to use only IDs. This is because names can change whereas IDs are fixed, and in the case of ambiguity it's the interpretation as an ID that is preferred. For organizations you can only use their ID, not their name.

All fields are optional, but note that if you leave out a field and it's one that doesn't have a default value, then a value must be specified through the event's tags. This applies to these fields:

Field Description
index With organization-wide or system-wide tokens, if you don't specify an index in the request, then the target repository will be taken from each event's #repo tag.
organization With system-wide tokens, if you don't specify an organization field in the request then the target organization will be taken from each event's #organization tag.

See the Event Tags section for more details.

Note

This works in single-organization mode as well, though the model is simpler. You can leave out the organization and parserOrganization fields - the index and parserIndex will be looked up within all repositories in the cluster.

Event Tags

The target of an event is taken primarily from the request fields. For instance, if the request specifies an index for an event then that's the repository the event is delivered to. However, it is also permitted to leave out index or organization, even for multi-repository tokens. In that case the event is processed normally. If parserIndex and parserOrganization were specified that parser is applied. Then any missing values are taken from the event's tags:

Field Description
#repo If no index field was given in the request the value of each event's #repo tag is used as the name or ID of the target repository for that event.
#organization Similarly, if no organization field was given in the request the value of each event's #organization tag is used as the ID of the organization of the target repository.

Since tags are only looked at after the event has been parsed, this allows the parser to decide which repository an event should go to, which is the most typical way to use this feature. If a parser is not required though, the #repo and #organization tags can be set on the raw unparsed event and that works too.

It is permitted for an event to contain these tags even if the corresponding fields were given explicitly in the request. In that case it's the request fields that will be used and the event tags will be ignored. The request will fail if either of these is missing in a case where it's required. That is, if neither an index nor #repo is given for an organization-wide or system-wide token, or neither an organization nor #organization is given for a system-wide token, then the request will fail.
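
The precedence rules in this section can be summarized in a small Python model. This is only a sketch of the rules as described here, not LogScale's implementation: the function name resolve_target and its arguments are hypothetical, access checks are not modelled, and for organization-wide tokens the organization is simply taken from the request field or the token.

```python
def resolve_target(token_type, fields, tags, token_organization=None):
    """Return (organization, repository) for one event, or raise ValueError.

    token_type is "organization" or "system"; fields holds the request
    fields and tags holds the event's tags after parsing. This models
    only the precedence rules, not access checks.
    """
    if token_type not in ("organization", "system"):
        raise ValueError("this sketch only covers multi-repository tokens")

    # Request fields win; tags are consulted only for missing values.
    repo = fields.get("index") or tags.get("#repo")
    if not repo:
        raise ValueError("neither index nor #repo was given")

    if token_type == "organization":
        # Defaults to the token's own organization; the #organization
        # tag is not modelled for this token type.
        org = fields.get("organization") or token_organization
    else:
        org = fields.get("organization") or tags.get("#organization")
        if not org:
            raise ValueError(
                "neither organization nor #organization was given")
    return org, repo
```

For example, resolve_target("system", {"index": "A", "organization": "o1"}, {"#repo": "B"}) returns ("o1", "A") because the request fields take precedence over the tag.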

How to Set Event Tags

Note that creating a tag in a parser is not simply a matter of specifying:

logscale
#repo := "myOtherRepo"

Assignments in parsers always create fields, not tags, even if the field names happen to include a #. Tagging is a separate step from parsing that needs to be configured separately. So the code shown previously will create a field called #repo, not a tag, and the ingest logic will ignore it.

What you need to do instead is have the parser set a different field. Typically you would use the same name but with @ instead of #:

logscale
@repo := "myOtherRepo"

Then, under Parsers → Settings → Tagging, add @repo as a tag. The tagging mechanism will strip the @ and create a tag just called #repo, which can then be picked up by the ingest logic and used to determine where to deliver the event.

There is nothing special about @repo as the field name; you could use repo, #repo, or even _repo to the same effect. It's just a convention that makes it stand out.

For more details about tagging see Event Tags.

Security Considerations

To make the security implications of this clear, here are some examples of what is permitted by different combinations of fields and tags. The general rule is that for a particular token, the system allows ingesting into the same set of repositories with event tags, as is allowed with request fields. They are just two different ways of doing the same thing - except that the fields are given preference, if specified.

For instance, if you use a system-wide token and specify neither index nor organization, the parser can ingest without restriction into any repository in the entire cluster by setting the tags. That may be perfectly acceptable, but it's worth keeping in mind. If in that case you only want the parser to be able to ingest into a single organization, specify the organization field in the request; if the parser then sets a different #organization, that tag is ignored. If you don't want the parser to affect the target repository at all, specify both index and organization: the parser can still set #repo and #organization, but they will be ignored.

Similarly with organization-wide tokens, if you don't specify an index in the request then the parser can ingest into any repository in that organization by setting #repo. The parser can also set #organization, but the only value that won't cause the request to fail for lack of access is the token's own organization - which is the default anyway.
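
One way to reason about these cases is to ask which routing tags a parser can still influence for a given request. The following Python sketch encodes the examples above; the function name tags_parser_can_use is hypothetical, and the sketch deliberately ignores the #organization tag for organization-wide tokens because its only workable value is the token's own organization.

```python
def tags_parser_can_use(token_type, request_fields):
    """Return the set of routing tags a parser can still influence.

    request_fields is the collection of field names present in the
    request. Hypothetical helper that mirrors the examples in the text.
    """
    if token_type == "system":
        candidates = {"index": "#repo", "organization": "#organization"}
    elif token_type == "organization":
        # The organization is effectively fixed to the token's own.
        candidates = {"index": "#repo"}
    else:
        raise ValueError("this sketch only covers multi-repository tokens")
    # A tag matters only when the matching request field is absent.
    return {tag for field, tag in candidates.items()
            if field not in request_fields}
```

An empty result means the request fully pins the destination, so whatever the parser sets in #repo or #organization is ignored.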

Batched Events

There are two ways in which multiple events can be batched together in the same request to the HEC. You can simply include multiple events, separated by newlines, in the same request body. Each event in the sequence is parsed independently, so all fields need to be specified for each event.

The other is for the event field to be a list of values, as outlined in the description of that field. In that case, the other fields apply to all the events listed. Since multi-repository ingest potentially requires metadata to be specified for each event, this mechanism can potentially be useful to avoid repeating that metadata for each event.
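
To make the difference between the two batching styles concrete, here is a small Python sketch that builds both request bodies from the same events. The function names and sample messages are hypothetical; the repository X and the organization ID are the sample values used elsewhere on this page.

```python
import json

# Metadata shared by every event in the batch (sample values).
SHARED = {"index": "X", "organization": "aNInS0WHvORBcymQTp0HoLIKYDygBiwo"}
MESSAGES = ["first event", "second event", "third event"]


def newline_batch(messages, shared):
    """One JSON object per line; the metadata is repeated for every event."""
    return "\n".join(json.dumps({"event": m, **shared}) for m in messages)


def list_batch(messages, shared):
    """A single JSON object whose event field is a list; metadata appears once."""
    return json.dumps({"event": list(messages), **shared})


body_a = newline_batch(MESSAGES, SHARED)
body_b = list_batch(MESSAGES, SHARED)
```

In the newline-separated body each line carries its own index and organization, whereas the list form states them once for all three events.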

Error Reporting

If the system is unable to resolve the destination repository for an event, the request fails and returns 422, UnprocessableEntity. If the body of the request contains multiple batched events, then none of the events will be ingested in that case. The system will never ingest a subset of the events in a request; it either ingests all events or fails them all. This only affects multi-repository tokens. With a repository-specific token, there is always a repository for a given event to arrive in - the one associated with the token. In that case the system does not fail the request, but falls back on ingesting into the token's repository.

If the system fails to resolve a parser in the requested parserIndex it will fall back to trying to resolve it in the destination repository, if one has been specified. If that fails as well, the system ingests the event without parsing it. In that case an error is recorded on the event by setting the field @error to true and @error_msg to a string describing the problem.

HTTP Response

The response to an ingest request is a JSON object that contains information about the operation. This is mainly informative, and for debugging. As far as reacting to the response, you should mainly rely on the status code and ignore the JSON. Here is an example of a response:

json
{
  "text": "Success",
  "code": 0,
  "eventCount": 8,
  "unresolvedSourcetypes": [{
    "index": "myRepo",
    "sourcetype": "htpreq"
  }],
  "unresolvedIndexes": [{
    "index": "myReepo",
    "organization": "aNInS0WHvORBcymQTp0HoLIKYDygBiwo"
  }]
}

This indicates that eight events were successfully ingested. However, the system encountered an event that specified the sourcetype htpreq from repository myRepo, which could not be resolved. Also, an event specified that it should be ingested into myReepo, which didn't exist. Since the request as a whole succeeded, it must have used a repository-specific ingest token: despite failing to resolve myReepo, the event was still ingested, but into the token's default repository.
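
A client might handle the response along the following lines. This Python sketch is an assumption about reasonable client behavior, not a prescribed procedure: the function name summarize_response is hypothetical, and treating any 2xx status as success is an assumption. It decides success from the status code alone and uses the JSON body only to collect diagnostic notes.

```python
import json


def summarize_response(status_code, body_text):
    """Return (ok, notes) for a HEC response.

    Success is decided by the status code alone (any 2xx is assumed to
    mean success); the JSON body only contributes diagnostic notes.
    """
    ok = 200 <= status_code < 300
    notes = []
    if status_code == 422:
        notes.append("destination repository could not be resolved; "
                     "no events from this request were ingested")
    try:
        body = json.loads(body_text)
    except (ValueError, TypeError):
        return ok, notes  # The body is informational; ignore parse failures.
    if not isinstance(body, dict):
        return ok, notes
    for entry in body.get("unresolvedSourcetypes", []):
        notes.append("unresolved sourcetype %s in %s"
                     % (entry.get("sourcetype"), entry.get("index")))
    for entry in body.get("unresolvedIndexes", []):
        notes.append("unresolved index %s in organization %s"
                     % (entry.get("index"), entry.get("organization")))
    return ok, notes
```

Applied to the example response above with status 200, this reports success together with two notes, one for the unresolved sourcetype and one for the unresolved index.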

Examples

Here are some examples of complete requests that can be used for testing.

This ingests a single event into a repository X using X's ingest token:

Mac OS or Linux (curl)
shell
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/ingest/hec \
    -H "Authorization: Bearer $INGEST_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{ "event": "Repo Ingest Test Event" }'
Mac OS or Linux (curl) One-line
shell
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/ingest/hec -H "Authorization: Bearer $INGEST_TOKEN" -H "Content-Type: application/json" -d '{ "event": "Repo Ingest Test Event" }'
Windows Cmd and curl
shell
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/ingest/hec ^
    -H "Authorization: Bearer $INGEST_TOKEN" ^
    -H "Content-Type: application/json" ^
    -d "{ \"event\": \"Repo Ingest Test Event\" }"
Windows Powershell and curl
powershell
curl.exe -X POST `
    -H "Authorization: Bearer $INGEST_TOKEN" `
    -H "Content-Type: application/json" `
    -d '{ "event": "Repo Ingest Test Event" }' `
    "$YOUR_LOGSCALE_URL/api/v1/ingest/hec"
Perl
perl
#!/usr/bin/perl

use HTTP::Request;
use LWP;

my $INGEST_TOKEN = "TOKEN";

my $uri = '$YOUR_LOGSCALE_URL/api/v1/ingest/hec';

my $json = '{ "event": "Repo Ingest Test Event" }';
my $req = HTTP::Request->new("POST", $uri );

$req->header("Authorization" => "Bearer $INGEST_TOKEN");
$req->header("Content-Type" => "application/json");

$req->content( $json );

my $lwp = LWP::UserAgent->new;

my $result = $lwp->request( $req );

# Print the response body through the public accessor rather than the internal hash key.
print $result->decoded_content, "\n";
Python
python
#! /usr/local/bin/python3

import requests

url = '$YOUR_LOGSCALE_URL/api/v1/ingest/hec'
mydata = r'''{ "event": "Repo Ingest Test Event" }'''

resp = requests.post(
    url,
    data=mydata,
    headers={
        "Authorization": "Bearer $INGEST_TOKEN",
        "Content-Type": "application/json",
    },
)

print(resp.text)
Node.js
javascript
const https = require('https');

const data = JSON.stringify(
    { "event": "Repo Ingest Test Event" }
);


// Full URL of the HEC endpoint; replace the placeholder with your LogScale URL.
const url = new URL('$YOUR_LOGSCALE_URL/api/v1/ingest/hec');

const options = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    // Use the byte length, which can differ from the string length.
    'Content-Length': Buffer.byteLength(data),
    Authorization: 'Bearer ' + process.env.TOKEN,
    'User-Agent': 'Node',
  },
};

const req = https.request(url, options, (res) => {
  let body = '';
  console.log(`statusCode: ${res.statusCode}`);

  res.on('data', (d) => {
    body += d;
  });
  res.on('end', () => {
    // Print the raw response body returned by the HEC endpoint.
    console.log(body);
  });
});

req.on('error', (error) => {
  console.error(error);
});

req.write(data);
req.end();

The repository ingest token will have a format similar to 989c71b6-577c-4387-8db1-04ab6a94fb87. If you're running LogScale locally your URL might be localhost:3000/humio.

This ingests a single event into repository X using a system-level ingest token, and an organization ID:

Mac OS or Linux (curl)
shell
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/ingest/hec \
    -H "Authorization: Bearer $INGEST_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{ "event": "System Ingest Test Event", "index": "X", "organization": "$ORGANIZATION_ID" }'
Mac OS or Linux (curl) One-line
shell
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/ingest/hec -H "Authorization: Bearer $INGEST_TOKEN" -H "Content-Type: application/json" -d '{ "event": "System Ingest Test Event", "index": "X", "organization": "$ORGANIZATION_ID" }'
Windows Cmd and curl
shell
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/ingest/hec ^
    -H "Authorization: Bearer $INGEST_TOKEN" ^
    -H "Content-Type: application/json" ^
    -d "{ \"event\": \"System Ingest Test Event\", \"index\": \"X\", \"organization\": \"$ORGANIZATION_ID\" }"
Windows Powershell and curl
powershell
curl.exe -X POST `
    -H "Authorization: Bearer $INGEST_TOKEN" `
    -H "Content-Type: application/json" `
    -d '{ "event": "System Ingest Test Event", "index": "X", "organization": "$ORGANIZATION_ID" }' `
    "$YOUR_LOGSCALE_URL/api/v1/ingest/hec"
Perl
perl
#!/usr/bin/perl

use HTTP::Request;
use LWP;

my $INGEST_TOKEN = "TOKEN";

my $uri = '$YOUR_LOGSCALE_URL/api/v1/ingest/hec';

my $json = '{ "event": "System Ingest Test Event", "index": "X", "organization": "$ORGANIZATION_ID" }';
my $req = HTTP::Request->new("POST", $uri );

$req->header("Authorization" => "Bearer $INGEST_TOKEN");
$req->header("Content-Type" => "application/json");

$req->content( $json );

my $lwp = LWP::UserAgent->new;

my $result = $lwp->request( $req );

# Print the response body through the public accessor rather than the internal hash key.
print $result->decoded_content, "\n";
Python
python
#! /usr/local/bin/python3

import requests

url = '$YOUR_LOGSCALE_URL/api/v1/ingest/hec'
mydata = r'''{ "event": "System Ingest Test Event", "index": "X", "organization": "$ORGANIZATION_ID" }'''

resp = requests.post(
    url,
    data=mydata,
    headers={
        "Authorization": "Bearer $INGEST_TOKEN",
        "Content-Type": "application/json",
    },
)

print(resp.text)
Node.js
javascript
const https = require('https');

const data = JSON.stringify(
    { "event": "System Ingest Test Event", "index": "X", "organization": "$ORGANIZATION_ID" }
);


// Full URL of the HEC endpoint; replace the placeholder with your LogScale URL.
const url = new URL('$YOUR_LOGSCALE_URL/api/v1/ingest/hec');

const options = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    // Use the byte length, which can differ from the string length.
    'Content-Length': Buffer.byteLength(data),
    Authorization: 'Bearer ' + process.env.TOKEN,
    'User-Agent': 'Node',
  },
};

const req = https.request(url, options, (res) => {
  let body = '';
  console.log(`statusCode: ${res.statusCode}`);

  res.on('data', (d) => {
    body += d;
  });
  res.on('end', () => {
    // Print the raw response body returned by the HEC endpoint.
    console.log(body);
  });
});

req.on('error', (error) => {
  console.error(error);
});

req.write(data);
req.end();

Note

Make sure you take steps to set the organization ID if using the code examples provided.

The system ingest token will have a format similar to grBqk3PRbxxaE77UKpqFl4IJm1i4ciGn~TQKczelOIGBo93pGBxugG3wImAB1a3vFKy30G05vxhvu; the organization ID will have a format similar to aNInS0WHvORBcymQTp0HoLIKYDygBiwo.