Manage Repositories in the Cluster
Resurrect Deleted Segments
This API endpoint allows undoing the deletion of recently deleted segments, in particular deletions caused by lowering retention settings. The endpoint resets the internal "tombstone" on deleted segments and restores all files that are still available in a bucket when using Bucket Storage.
By default, LogScale keeps files in bucket storage for seven (7) days longer than the retention settings require. This means that extending retention by seven days and then invoking this API can restore approximately the latest seven days' worth of deleted events.
If retention is accidentally lowered from the proper value to something very small, this grace period gives you up to seven days to revert the retention settings and invoke this API endpoint before any events are lost. Invoking this endpoint requires root access.
Description | Restore recently deleted segments. | ||
Method | POST /api/v1/repositories/ | ||
Request Data | |||
Authentication Required | yes | ||
Path Arguments | Description | Data type | Required? |
viewname | The repository name | string | required |
Return Codes | |||
200 | Request complete | ||
400 | Bad authentication | ||
500 | Request failed |
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/repositories/$VIEWNAME/resurrect-deleted-segments \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json"
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/repositories/$VIEWNAME/resurrect-deleted-segments ^
-H "Authorization: Bearer $TOKEN" ^
-H "Content-Type: application/json"
curl.exe -X POST
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: application/json"
"$YOUR_LOGSCALE_URL/api/v1/repositories/$VIEWNAME/resurrect-deleted-segments"
#!/usr/bin/perl
use HTTP::Request;
use LWP;
my $TOKEN = "TOKEN";
my $uri = '$YOUR_LOGSCALE_URL/api/v1/repositories/$VIEWNAME/resurrect-deleted-segments';
my $json = '';
my $req = HTTP::Request->new("POST", $uri );
$req->header("Authorization" => "Bearer $TOKEN");
$req->header("Content-Type" => "application/json");
$req->content( $json );
my $lwp = LWP::UserAgent->new;
my $result = $lwp->request( $req );
print $result->{"_content"},"\n";
#! /usr/local/bin/python3
import requests
url = '$YOUR_LOGSCALE_URL/api/v1/repositories/$VIEWNAME/resurrect-deleted-segments'
mydata = r''
resp = requests.post(url,
data = mydata,
headers = {
"Authorization" : "Bearer $TOKEN",
"Content-Type" : "application/json"
}
)
print(resp.text)
const https = require('https');
const data = '';
const options = {
  hostname: '$YOUR_LOGSCALE_URL',
  path: '/api/v1/repositories/$VIEWNAME/resurrect-deleted-segments',
  port: 443,
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': data.length,
    Authorization: 'Bearer ' + process.env.TOKEN,
    'User-Agent': 'Node',
  },
};
const req = https.request(options, (res) => {
  let body = '';
  console.log(`statusCode: ${res.statusCode}`);
  res.on('data', (d) => {
    body += d;
  });
  res.on('end', () => {
    console.log(body);
  });
});
req.on('error', (error) => {
  console.error(error);
});
req.write(data);
req.end();
Manage Data Sources Limits
LogScale supports control of the default number of datasources limit for each repository.
Show Datasources Limit
Description | See the current default limit on the number of datasources. | ||
Method | GET /api/v1/repositories/ | ||
Authentication Required | yes | ||
Path Arguments | Description | Data type | Required? |
repository | The repository name | string | required |
Return Codes | |||
200 | Request complete | ||
400 | Bad authentication | ||
500 | Request failed |
curl -v -X GET $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json"
curl -v -X GET $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources ^
-H "Authorization: Bearer $TOKEN" ^
-H "Content-Type: application/json"
curl.exe -X GET
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: application/json"
"$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources"
#!/usr/bin/perl
use HTTP::Request;
use LWP;
my $TOKEN = "TOKEN";
my $uri = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources';
my $req = HTTP::Request->new("GET", $uri );
$req->header("Authorization" => "Bearer $TOKEN");
$req->header("Content-Type" => "application/json");
my $lwp = LWP::UserAgent->new;
my $result = $lwp->request( $req );
print $result->{"_content"},"\n";
#! /usr/local/bin/python3
import requests
url = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources'
resp = requests.get(url,
headers = {
"Authorization" : "Bearer $TOKEN",
"Content-Type" : "application/json"
}
)
print(resp.text)
const https = require('https');
let request = https.get('$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources', (res) => {
if (res.statusCode !== 200) {
console.error(`Error from server. Code: ${res.statusCode}`);
res.resume();
return;
}
let data = '';
res.on('data', (chunk) => {
data += chunk;
});
res.on('close', () => {
console.log('Response:');
console.log(JSON.parse(data));
});
});
Update Datasources Limit
The REST API endpoint max-datasources allows setting a new per-repository limit on the maximum number of datasources.
For more information on creating tags during parsing, see Event Tags; for information on tags and datasources, see Tag Fields and Datasources.
Description | Set a new value for the maximum number of allowed datasources. | ||
Method | POST /api/v1/repositories/ | ||
Request Data | |||
Authentication Required | yes | ||
Path Arguments | Description | Data type | Required? |
repository | The repository name | string | required |
Query Arguments | Description | Data type | Required? |
number | Maximum number of datasources | integer | required |
Return Codes | |||
200 | Request complete | ||
400 | Bad authentication | ||
500 | Request failed |
To update the maximum number of datasources, supply the number to the endpoint:
DATASOURCE_MAX=1000
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources?number=$DATASOURCE_MAX \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json"
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources?number=$DATASOURCE_MAX ^
-H "Authorization: Bearer $TOKEN" ^
-H "Content-Type: application/json"
curl.exe -X POST
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: application/json"
"$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources?number=$DATASOURCE_MAX"
#!/usr/bin/perl
use HTTP::Request;
use LWP;
my $TOKEN = "TOKEN";
my $uri = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources?number=$DATASOURCE_MAX';
my $json = '';
my $req = HTTP::Request->new("POST", $uri );
$req->header("Authorization" => "Bearer $TOKEN");
$req->header("Content-Type" => "application/json");
$req->content( $json );
my $lwp = LWP::UserAgent->new;
my $result = $lwp->request( $req );
print $result->{"_content"},"\n";
#! /usr/local/bin/python3
import requests
url = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/max-datasources?number=$DATASOURCE_MAX'
mydata = r''
resp = requests.post(url,
data = mydata,
headers = {
"Authorization" : "Bearer $TOKEN",
"Content-Type" : "application/json"
}
)
print(resp.text)
const https = require('https');
const data = '';
const options = {
  hostname: '$YOUR_LOGSCALE_URL',
  path: '/api/v1/repositories/$REPOSITORY/max-datasources?number=$DATASOURCE_MAX',
  port: 443,
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': data.length,
    Authorization: 'Bearer ' + process.env.TOKEN,
    'User-Agent': 'Node',
  },
};
const req = https.request(options, (res) => {
  let body = '';
  console.log(`statusCode: ${res.statusCode}`);
  res.on('data', (d) => {
    body += d;
  });
  res.on('end', () => {
    console.log(body);
  });
});
req.on('error', (error) => {
  console.error(error);
});
req.write(data);
req.end();
Delete Datasources
The delete datasources endpoint marks a datasource for deletion, internally triggering deletion of all segments in the datasource.
Description | Marks the datasource for deletion, triggering deletion of all segments in the datasource. | ||
Method | DELETE /api/v1/repositories/ | ||
Authentication Required | yes | ||
Path Arguments | Description | Data type | Required? |
datasourceid | The datasource ID number. | integer | required |
repository | The repository name | string | required |
Return Codes | |||
200 | Request complete | ||
400 | Bad authentication | ||
500 | Request failed |
curl -v -X DELETE $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json"
curl -v -X DELETE $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID ^
-H "Authorization: Bearer $TOKEN" ^
-H "Content-Type: application/json"
curl.exe -X DELETE
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: application/json"
"$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID"
#!/usr/bin/perl
use HTTP::Request;
use LWP;
my $TOKEN = "TOKEN";
my $uri = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID';
my $req = HTTP::Request->new("DELETE", $uri );
$req->header("Authorization" => "Bearer $TOKEN");
$req->header("Content-Type" => "application/json");
my $lwp = LWP::UserAgent->new;
my $result = $lwp->request( $req );
print $result->{"_content"},"\n";
#! /usr/local/bin/python3
import requests
url = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID'
resp = requests.delete(url,
headers = {
"Authorization" : "Bearer $TOKEN",
"Content-Type" : "application/json"
}
)
print(resp.text)
const https = require('https');
const options = {
  hostname: '$YOUR_LOGSCALE_URL',
  path: '/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID',
  port: 443,
  method: 'DELETE',
  headers: {
    Authorization: 'Bearer ' + process.env.TOKEN,
    'Content-Type': 'application/json',
  },
};
const req = https.request(options, (res) => {
  if (res.statusCode !== 200) {
    console.error(`Error from server. Code: ${res.statusCode}`);
    res.resume();
    return;
  }
  let data = '';
  res.on('data', (chunk) => {
    data += chunk;
  });
  res.on('close', () => {
    console.log('Response:');
    console.log(data);
  });
});
req.end();
Importing a Repository from Another LogScale Instance (BETA)
Removed: Beta feature removed
This feature is removed starting from LogScale version 1.79.0.
You can import users, dashboards, and segments files from another
LogScale instance. You need to get a copy of the
/data/humio-data/global-data-snapshot.json
from the origin server.
You also need to copy the segments files that you want to import. These
must be placed in the folder
/data/humio-data/ready_for_import_dataspaces
using the following structure:
/data/humio-data/ready_for_import_dataspaces/dataspace_$ID
While the copy is in progress, copy the repository files to the server into another folder, then move the folder to the proper name once the copy is complete. Note that the directory name uses the internal ID of the repository, which is the directory name in the source system.
The folder
/data/humio-data/ready_for_import_dataspaces
must be readable and writable by the
humio-user
running the server, as it moves the files to another directory and
deletes the imported files, one at a time, as it finishes with them.
Example (note that you need both NAME and ID of the repository):
$ NAME="target-repo-name"
$ SRC_NAME="source-repo-name"
$ ID="my-repository-id"
$ sudo mkdir /data/humio-data/ready_for_import_dataspaces
$ sudo mv /data/from-other/dataspace_$ID /data/humio-data/ready_for_import_dataspaces
$ sudo chown -R humio /data/humio-data/ready_for_import_dataspaces/
$ curl -XPOST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d @from-other-global-data-snapshot.json \
"$YOUR_LOGSCALE_URL/api/v1/importrepository/$NAME?importSegmentFilesOnly=true&importFromName=$SRC_NAME"
The POST
imports the metadata, such as users
and dashboards, and moves the repository folder from
/data/humio-data/ready_for_import_dataspaces
to /data/humio-data/import
. A
low-priority background task will then import the actual segments files
from that point on.
You can start using the ingest tokens and other data that are not
actual log events as soon as the POST
has completed.
You can run the POST
starting the import of the
same repository more than once. This is useful if you wish to import
only a fraction of the data files at first, but get all the metadata.
When you rerun the POST
, the metadata is inserted or updated again, but only where it no
longer matches, and any new repository files are copied at that point
in time.
If you re-import the same segment files more than once, you get duplicate events in your target repository.
Note
We strongly recommend that you import to a new repository, at least until you have practiced this procedure. Having the newly imported data in a separate repository makes it easy to delete and try again, while deleting data from an existing repository will be very time consuming and error prone.
Configure Auto-Sharding for High-Volume Data Sources
A data source is ultimately bounded by the volume that one CPU thread can compress and write to the filesystem, typically about 190 GB/day. To handle more ingest traffic from a specific data source, you need to provide more variability in the set of tags. In some cases, however, it may not be possible or desirable to adjust the set of tags or tagged fields in the client. To solve this, LogScale supports adding a synthetic tag that is assigned a random number for each small bulk of events.
LogScale can detect high load on a data source and automatically
trigger this auto-sharding on it. You will see this happening on
"fast" data sources, typically when more than 190 GB/day is delivered
to a single data source. The events then get an extra tag,
#humioAutoShard
, that is assigned a random integer value.
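The synthetic shard tag described above can be sketched as follows. This is an illustrative model only, not LogScale's actual implementation; the function name and event shape are hypothetical.

```python
import random

# Illustrative sketch of the #humioAutoShard mechanism: every small bulk
# of events receives one random integer tag value, spreading a hot data
# source across up to AUTOSHARDING_MAX internal data sources.
AUTOSHARDING_MAX = 1024  # default upper bound on shards per data source


def assign_shard_tag(bulk, num_shards=AUTOSHARDING_MAX):
    """Attach the same random #humioAutoShard value to all events in one bulk."""
    shard = random.randrange(num_shards)
    return [{**event, "#humioAutoShard": shard} for event in bulk]


bulk = [{"message": "a"}, {"message": "b"}]
tagged = assign_shard_tag(bulk)
```

Because the shard value is chosen per bulk rather than per event, events that arrive together stay in the same segment, while successive bulks spread across shards.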
Starting from LogScale v1.152, auto-sharding is handled through
rate monitoring of the ingest flow. This is configured through the
dynamic configuration option TargetMaxRateForDatasource
with a default of 2 MB/s (about 190 GB/day). In previous
LogScale versions, this was handled by measuring ingest delay
through the AUTOSHARDING_TRIGGER_DELAY_MS
and
AUTOSHARDING_CHECKINTERVAL_MS
configuration variables,
which are now deprecated.
The setting AUTOSHARDING_MAX
controls how many different
data sources get created this way for each "real" data source. The
default value is 1,024.
Configure Sticky Auto-Sharding for High-Volume Data Sources
In some use cases, it makes sense to disable the automatic tuning and
manage these settings using the API. Set AUTOSHARDING_MAX
to 1 to make the system never increase the number of autoshards of data
sources, then use the API to set sticky autosharding settings on the
selected data sources that require it. The sticky settings are not
limited by the AUTOSHARDING_MAX
configuration.
Show Autosharding Settings
Description | Show the autosharding settings for a datasource. | ||
Method | GET /api/v1/repositories/ | ||
Authentication Required | no | ||
Path Arguments | Description | Data type | Required? |
datasourceid | The datasource ID number. | integer | required |
repository | The repository name | string | required |
Return Codes | |||
200 | Request complete | ||
400 | Bad authentication | ||
500 | Request failed |
To show the autosharding settings for a specific datasource, run:
curl -v -X GET $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json"
curl -v -X GET $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding ^
-H "Authorization: Bearer $TOKEN" ^
-H "Content-Type: application/json"
curl.exe -X GET
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: application/json"
"$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding"
#!/usr/bin/perl
use HTTP::Request;
use LWP;
my $TOKEN = "TOKEN";
my $uri = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding';
my $req = HTTP::Request->new("GET", $uri );
$req->header("Authorization" => "Bearer $TOKEN");
$req->header("Content-Type" => "application/json");
my $lwp = LWP::UserAgent->new;
my $result = $lwp->request( $req );
print $result->{"_content"},"\n";
#! /usr/local/bin/python3
import requests
url = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding'
resp = requests.get(url,
headers = {
"Authorization" : "Bearer $TOKEN",
"Content-Type" : "application/json"
}
)
print(resp.text)
const https = require('https');
let request = https.get('$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding', (res) => {
if (res.statusCode !== 200) {
console.error(`Error from server. Code: ${res.statusCode}`);
res.resume();
return;
}
let data = '';
res.on('data', (chunk) => {
data += chunk;
});
res.on('close', () => {
console.log('Response:');
console.log(JSON.parse(data));
});
});
Update Autosharding
To update the autosharding settings for a specific datasource, run:
Description | Update the autosharding settings for a datasource. | ||
Method | POST /api/v1/repositories/ | ||
Request Data | |||
Authentication Required | no | ||
Path Arguments | Description | Data type | Required? |
datasourceid | The datasource ID number. | integer | required |
number | Number of autoshards for a datasource. Not limited by other configurations. | integer | required |
repository | The repository name | string | required |
Return Codes | |||
200 | Request complete | ||
400 | Bad authentication | ||
500 | Request failed |
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json"
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding ^
-H "Authorization: Bearer $TOKEN" ^
-H "Content-Type: application/json"
curl.exe -X POST
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: application/json"
"$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding"
#!/usr/bin/perl
use HTTP::Request;
use LWP;
my $TOKEN = "TOKEN";
my $uri = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding';
my $json = '';
my $req = HTTP::Request->new("POST", $uri );
$req->header("Authorization" => "Bearer $TOKEN");
$req->header("Content-Type" => "application/json");
$req->content( $json );
my $lwp = LWP::UserAgent->new;
my $result = $lwp->request( $req );
print $result->{"_content"},"\n";
#! /usr/local/bin/python3
import requests
url = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding'
mydata = r''
resp = requests.post(url,
data = mydata,
headers = {
"Authorization" : "Bearer $TOKEN",
"Content-Type" : "application/json"
}
)
print(resp.text)
const https = require('https');
const data = '';
const options = {
  hostname: '$YOUR_LOGSCALE_URL',
  path: '/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding',
  port: 443,
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': data.length,
    Authorization: 'Bearer ' + process.env.TOKEN,
    'User-Agent': 'Node',
  },
};
const req = https.request(options, (res) => {
  let body = '';
  console.log(`statusCode: ${res.statusCode}`);
  res.on('data', (d) => {
    body += d;
  });
  res.on('end', () => {
    console.log(body);
  });
});
req.on('error', (error) => {
  console.error(error);
});
req.write(data);
req.end();
To update to a specific number of autoshards, run the query as shown below:
$ curl -XPOST -H "Authorization: Bearer $TOKEN" "$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY_NAME/datasources/$DATASOURCEID/autosharding?number=7"
Delete Autosharding
Description | Delete the autosharding settings for a datasource. | ||
Method | DELETE /api/v1/repositories/ | ||
Authentication Required | no | ||
Path Arguments | Description | Data type | Required? |
datasourceid | The datasource ID number. | integer | required |
repository | The repository name | string | required |
Return Codes | |||
200 | Request complete | ||
400 | Bad authentication | ||
500 | Request failed |
To delete the autosharding settings for a specific datasource, run:
curl -v -X DELETE $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json"
curl -v -X DELETE $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding ^
-H "Authorization: Bearer $TOKEN" ^
-H "Content-Type: application/json"
curl.exe -X DELETE
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: application/json"
"$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding"
#!/usr/bin/perl
use HTTP::Request;
use LWP;
my $TOKEN = "TOKEN";
my $uri = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding';
my $req = HTTP::Request->new("DELETE", $uri );
$req->header("Authorization" => "Bearer $TOKEN");
$req->header("Content-Type" => "application/json");
my $lwp = LWP::UserAgent->new;
my $result = $lwp->request( $req );
print $result->{"_content"},"\n";
#! /usr/local/bin/python3
import requests
url = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding'
resp = requests.delete(url,
headers = {
"Authorization" : "Bearer $TOKEN",
"Content-Type" : "application/json"
}
)
print(resp.text)
const https = require('https');
const options = {
  hostname: '$YOUR_LOGSCALE_URL',
  path: '/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/autosharding',
  port: 443,
  method: 'DELETE',
  headers: {
    Authorization: 'Bearer ' + process.env.TOKEN,
    'Content-Type': 'application/json',
  },
};
const req = https.request(options, (res) => {
  if (res.statusCode !== 200) {
    console.error(`Error from server. Code: ${res.statusCode}`);
    res.resume();
    return;
  }
  let data = '';
  res.on('data', (chunk) => {
    data += chunk;
  });
  res.on('close', () => {
    console.log('Response:');
    console.log(data);
  });
});
req.end();
Set Up Grouping of Tags
Important
The GraphQL interface repository() is the preferred method for updating tag groupings.
Note
This is an advanced feature.
Tags are the fields with a prefix of
#
. They are used
internally to shard data into smaller streams. A
data source
is created for every unique combination of tag values set by the clients
(such as log shippers). LogScale will reject ingested events
once a certain number of datasources has been created; the limit is
currently 10,000 datasources per repository.
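To see why unbounded tag values quickly exhaust this limit, note that the number of data sources equals the product of the distinct value counts across tag keys. A short illustrative sketch (tag names and value counts are hypothetical):

```python
# Each unique combination of tag values creates one data source, so the
# total equals the product of the distinct value counts per tag key.
def datasource_count(tag_values):
    count = 1
    for values in tag_values.values():
        count *= len(set(values))
    return count


tags = {
    "#type": ["accesslog", "syslog"],           # 2 distinct parsers
    "#host": [f"host-{i}" for i in range(50)],  # 50 distinct hosts
}
print(datasource_count(tags))  # 2 * 50 = 100 data sources
```

A single high-cardinality tag, such as a client IP with tens of thousands of distinct values, would on its own push the count past the 10,000-datasource limit, which is what the grouping rules below address.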
Show Tag Grouping
Description | List repositories grouped by tags in the cluster. | ||
Method | GET /api/v1/repositories/ | ||
Authentication Required | yes | ||
Path Arguments | Description | Data type | Required? |
repository | The repository name | string | required |
Return Codes | |||
200 | Request complete | ||
400 | Bad authentication | ||
500 | Request failed |
GET /api/v1/repositories/$REPOSITORY_NAME/taggrouping
curl -v -X GET $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json"
curl -v -X GET $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping ^
-H "Authorization: Bearer $TOKEN" ^
-H "Content-Type: application/json"
curl.exe -X GET
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: application/json"
"$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping"
#!/usr/bin/perl
use HTTP::Request;
use LWP;
my $TOKEN = "TOKEN";
my $uri = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping';
my $req = HTTP::Request->new("GET", $uri );
$req->header("Authorization" => "Bearer $TOKEN");
$req->header("Content-Type" => "application/json");
my $lwp = LWP::UserAgent->new;
my $result = $lwp->request( $req );
print $result->{"_content"},"\n";
#! /usr/local/bin/python3
import requests
url = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping'
resp = requests.get(url,
headers = {
"Authorization" : "Bearer $TOKEN",
"Content-Type" : "application/json"
}
)
print(resp.text)
const https = require('https');
let request = https.get('$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping', (res) => {
if (res.statusCode !== 200) {
console.error(`Error from server. Code: ${res.statusCode}`);
res.resume();
return;
}
let data = '';
res.on('data', (chunk) => {
data += chunk;
});
res.on('close', () => {
console.log('Response:');
console.log(JSON.parse(data));
});
});
LogScale recommends that you use only the parser as a tag, in the field #type.
Using more tags may speed up queries on large data volumes, but this
only works with a bounded value set for the tag fields. The speed-up
only affects queries prefixed with
#tag=value
pairs
that significantly filter the input events.
Update Tag Grouping
Note
If you are using a hosted LogScale instance while following this procedure, please contact support if you wish to add grouping rules to your repository.
Adding a new set of rules using POST
replaces
the current set. Previous rule sets are kept, and if an existing data
source matches a previous set, that set is reused. Previous rules remain
in the system but may be deleted by LogScale once all data
sources referring to them have been deleted (through retention settings).
Description | Apply new tag grouping rules to repositories in the cluster. | ||
Method | POST /api/v1/repositories/ | ||
Request Data | |||
Authentication Required | yes | ||
Path Arguments | Description | Data type | Required? |
repository | The repository name | string | required |
Return Codes | |||
200 | Request complete | ||
400 | Bad authentication | ||
500 | Request failed |
POST /api/v1/repositories/$REPOSITORY_NAME/taggrouping
For some use cases, such as having the client IP from an access log as a tag, too many distinct tag values will arise. In such cases, you must either stop using the field as a tag, or create a grouping rule on the tag field. Existing data is not rewritten when grouping rules are added or changed; changing the grouping rules will therefore itself create more data sources.
Example: setting the grouping rules for repository
$REPOSITORY_NAME
to hash the field #host into 8 buckets, and
#client_ip
into 10 buckets. Note that the field names do not include the
#
prefix in the rules.
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '[ {"field":"host","modulus": 8}, {"field":"client_ip","modulus": 10} ]'
curl -v -X POST $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping ^
-H "Authorization: Bearer $TOKEN" ^
-H "Content-Type: application/json" ^
-d '[ {"field":"host","modulus": 8}, {"field":"client_ip","modulus": 10} ]'
curl.exe -X POST
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: application/json"
-d '[ {"field":"host","modulus": 8}, {"field":"client_ip","modulus": 10} ]'
"$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping"
#!/usr/bin/perl
use HTTP::Request;
use LWP;
my $TOKEN = "TOKEN";
my $uri = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping';
my $json = '[ {"field":"host","modulus": 8}, {"field":"client_ip","modulus": 10} ]';
my $req = HTTP::Request->new("POST", $uri );
$req->header("Authorization" => "Bearer $TOKEN");
$req->header("Content-Type" => "application/json");
$req->content( $json );
my $lwp = LWP::UserAgent->new;
my $result = $lwp->request( $req );
print $result->{"_content"},"\n";
#! /usr/local/bin/python3
import requests
url = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/taggrouping'
mydata = r'''[ {"field":"host","modulus": 8}, {"field":"client_ip","modulus": 10} ]'''
resp = requests.post(url,
data = mydata,
headers = {
"Authorization" : "Bearer $TOKEN",
"Content-Type" : "application/json"
}
)
print(resp.text)
const https = require('https');
const data = JSON.stringify(
  [ {"field": "host", "modulus": 8}, {"field": "client_ip", "modulus": 10} ]
);
const options = {
  hostname: '$YOUR_LOGSCALE_URL',
  path: '/api/v1/repositories/$REPOSITORY/taggrouping',
  port: 443,
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': data.length,
    Authorization: 'Bearer ' + process.env.TOKEN,
    'User-Agent': 'Node',
  },
};
const req = https.request(options, (res) => {
  let body = '';
  console.log(`statusCode: ${res.statusCode}`);
  res.on('data', (d) => {
    body += d;
  });
  res.on('end', () => {
    console.log(body);
  });
});
req.on('error', (error) => {
  console.error(error);
});
req.write(data);
req.end();
When using grouped tags in the query, you can expect a speed-up of
approximately the modulus compared to not including the tags in the
query, provided you use an exact match on the field. If you use a
wildcard
(*
) in the value
for a grouped tag, the implementation currently scans all data
sources that have a non-empty value for that field and filters the
events to return only the results that match the wildcard pattern.
For non-grouped tag fields, using a wildcard at either end of the value string is efficient.
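The difference between exact and wildcard matches on grouped tags can be sketched as follows. This is a conceptual model: the hash function (MD5) and bucket arithmetic are illustrative assumptions, not LogScale's internal hashing.

```python
import hashlib

# With a grouping rule of modulus 10, a concrete tag value hashes to
# exactly one bucket, so a query with an exact match only needs to scan
# that one data source. A wildcard pattern gives no value to hash, so
# every bucket must be scanned and the events filtered afterwards.
def bucket(value, modulus):
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % modulus


# Exact match, e.g. #client_ip=10.0.0.1, touches a single bucket:
target = bucket("10.0.0.1", 10)
# A wildcard such as #client_ip=10.0.* could match values in any bucket 0..9,
# which is why wildcard queries on grouped tags lose the modulus speed-up.
```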
LogScale also supports auto-grouping of tags using the
configuration variables MAX_DISTINCT_TAG_VALUES
(default is 1000
)
and TAG_HASHING_BUCKETS
(default is
32
). LogScale
checks the number of distinct values for each key in each tag
combination against MAX_DISTINCT_TAG_VALUES
at regular
intervals. If this threshold is exceeded, a new grouping rule is added
with the modulus set to the value of
TAG_HASHING_BUCKETS
, but only if there is no existing rule for
that tag key. You can thus configure rules using the API above and
decide the number of buckets there. This is preferable to
auto-detection, which works after the fact and thus leaves a large
number of unused data sources that need to be deleted by retention at
some point. Auto-grouping is meant as a safety measure to avoid
accidentally creating many data sources for a single tag key.
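The auto-grouping check described above can be sketched like this. The function and argument names are illustrative, not LogScale internals; only the threshold semantics follow the text.

```python
# When a tag key's distinct-value count exceeds MAX_DISTINCT_TAG_VALUES
# and no grouping rule exists for that key, a rule with modulus
# TAG_HASHING_BUCKETS is added.
MAX_DISTINCT_TAG_VALUES = 1000  # default
TAG_HASHING_BUCKETS = 32        # default


def auto_grouping_rules(distinct_counts, existing_rules):
    """Return new grouping rules for tag keys that exceed the threshold."""
    new_rules = []
    for key, count in distinct_counts.items():
        if count > MAX_DISTINCT_TAG_VALUES and key not in existing_rules:
            new_rules.append({"field": key, "modulus": TAG_HASHING_BUCKETS})
    return new_rules


rules = auto_grouping_rules({"client_ip": 5000, "type": 4}, existing_rules={"host"})
print(rules)  # [{'field': 'client_ip', 'modulus': 32}]
```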
Mark Segment for Deletion
To mark a segment file for deletion, send a DELETE
request to the segment deletion endpoint.
Description | Mark a segment file for deletion. | ||
Method | DELETE /api/v1/repositories/ | ||
Authentication Required | yes | ||
Path Arguments | Description | Data type | Required? |
datasourceid | The datasource ID number. | string | required |
repository | The repository name | string | required |
segmentid | Segment ID to delete for a datasource. | string | required |
Return Codes | |||
200 | Request complete | ||
400 | Bad authentication | ||
500 | Request failed |
/api/v1/repositories/$REPOSITORY_NAME/datasources/$DATASOURCEID/segments/$SEGMENTID
curl -v -X DELETE $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/segments/$SEGMENTID \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json"
curl -v -X DELETE $YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/segments/$SEGMENTID ^
-H "Authorization: Bearer $TOKEN" ^
-H "Content-Type: application/json"
curl.exe -X DELETE
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: application/json"
"$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/segments/$SEGMENTID"
#!/usr/bin/perl
use HTTP::Request;
use LWP;
my $TOKEN = "TOKEN";
my $uri = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/segments/$SEGMENTID';
my $req = HTTP::Request->new("DELETE", $uri );
$req->header("Authorization" => "Bearer $TOKEN");
$req->header("Content-Type" => "application/json");
my $lwp = LWP::UserAgent->new;
my $result = $lwp->request( $req );
print $result->{"_content"},"\n";
#! /usr/local/bin/python3
import requests
url = '$YOUR_LOGSCALE_URL/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/segments/$SEGMENTID'
resp = requests.delete(url,
headers = {
"Authorization" : "Bearer $TOKEN",
"Content-Type" : "application/json"
}
)
print(resp.text)
const https = require('https');
const options = {
  hostname: '$YOUR_LOGSCALE_URL',
  path: '/api/v1/repositories/$REPOSITORY/datasources/$DATASOURCEID/segments/$SEGMENTID',
  port: 443,
  method: 'DELETE',
  headers: {
    Authorization: 'Bearer ' + process.env.TOKEN,
    'Content-Type': 'application/json',
  },
};
const req = https.request(options, (res) => {
  if (res.statusCode !== 200) {
    console.error(`Error from server. Code: ${res.statusCode}`);
    res.resume();
    return;
  }
  let data = '';
  res.on('data', (chunk) => {
    data += chunk;
  });
  res.on('close', () => {
    console.log('Response:');
    console.log(data);
  });
});
req.end();
Doing this marks the segment for deletion, and eventually the metadata for the file is removed.
Important
This is not a typical scenario, but may be required after, for example, losing files from the bucket entrusted with the files for the cluster.