Datasources

Datasources are created by LogScale through a combination of tags on the data during ingestion, and the segments that are written to storage.

The datasource is an important structure because it controls and influences both ingestion and searching:

  • During parsing, the parser can define the tags.

A datasource defines the segment files that are used to store the data for a tag combination. For example, in the diagram below:

block-beta
  columns 8
  A["#host=server1 #source=http.log"]
  block:block1:6
    SegmentA1:1 space:2 SegmentA2:1 SegmentA3:1
  end
  DS1["Datasource"]
  B["#host=server2 #source=http.log"]
  block:block2:6
    SegmentB1:4 space:1 SegmentB2:1
  end
  DS2["Datasource"]
  C["#host=server2 #source=loadbalance"]
  block:block3:6
    SegmentC1 SegmentC2 SegmentC3 SegmentC4 SegmentC5
  end
  DS3["Datasource"]
  D["#host=server3 #source=loadbalance"]
  block:block4:6
    SegmentD1 SegmentD2 SegmentD3 SegmentD4 SegmentD5 SegmentD6
  end
  DS4["Datasource"]
  blockArrowId6<["&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Time&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"]>(right):8
  style DS1 fill:#fff,stroke:#fff;
  style DS2 fill:none,stroke:#fff;
  style DS3 fill:none,stroke:#fff;
  style DS4 fill:none,stroke:#fff;

Each tag combination is an individual datasource, and each datasource has its own series of segments and its own timespan.
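The mapping from tag combinations to datasources can be illustrated with a minimal Python sketch. It is not LogScale's implementation: the event layout and the `datasource_key` and `group_by_datasource` helpers are hypothetical names chosen for illustration. It only shows that events sharing the same set of tag values fall into the same bucket, regardless of the order in which the tags appear.

```python
from collections import defaultdict


def datasource_key(tags):
    """Build an order-independent key from a mapping of tag names to values."""
    return tuple(sorted(tags.items()))


def group_by_datasource(events):
    """Group event messages so each distinct tag combination forms one bucket."""
    datasources = defaultdict(list)
    for event in events:
        datasources[datasource_key(event["tags"])].append(event["message"])
    return dict(datasources)


# Hypothetical events; the tag values mirror the diagram above.
events = [
    {"tags": {"#host": "server1", "#source": "http.log"}, "message": "GET /"},
    {"tags": {"#source": "http.log", "#host": "server1"}, "message": "GET /a"},
    {"tags": {"#host": "server2", "#source": "http.log"}, "message": "GET /b"},
    {"tags": {"#host": "server2", "#source": "loadbalance"}, "message": "up"},
]

grouped = group_by_datasource(events)
print(len(grouped))  # prints 3: three distinct tag combinations
```

The first two events carry the same tags in a different order, so they land in the same bucket.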

Datasources affect both the ingestion and searching of data:

  • During ingestion, tags affect how data flows through the Kafka phase (see Ingestion: Kafka Phase) and the digest phase (see Ingestion: Digest Phase).

  • When searching, the datasource is used to limit the segments needed to return the search results. If the query includes a specific tag, the segments searched can be limited to the matching datasource for the selected tags.
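The search-side effect can be sketched as a simple filter. The following Python example is an illustration under stated assumptions, not LogScale's query engine: the `DATASOURCES` structure and the `segments_to_scan` helper are hypothetical, and the segment names are taken from the diagram above. A query that names a tag only needs the segments of datasources whose tags match; a query with no tags must consider every segment.

```python
# Hypothetical in-memory model of the four datasources in the diagram above.
DATASOURCES = [
    {"tags": {"#host": "server1", "#source": "http.log"},
     "segments": ["SegmentA1", "SegmentA2", "SegmentA3"]},
    {"tags": {"#host": "server2", "#source": "http.log"},
     "segments": ["SegmentB1", "SegmentB2"]},
    {"tags": {"#host": "server2", "#source": "loadbalance"},
     "segments": [f"SegmentC{i}" for i in range(1, 6)]},
    {"tags": {"#host": "server3", "#source": "loadbalance"},
     "segments": [f"SegmentD{i}" for i in range(1, 7)]},
]


def segments_to_scan(datasources, query_tags):
    """Return the segments of every datasource matching all tags in the query."""
    segments = []
    for datasource in datasources:
        tags = datasource["tags"]
        if all(tags.get(name) == value for name, value in query_tags.items()):
            segments.extend(datasource["segments"])
    return segments


print(len(segments_to_scan(DATASOURCES, {})))                    # prints 16
print(len(segments_to_scan(DATASOURCES, {"#host": "server2"})))  # prints 7
```

Adding a second tag to the query, such as `#source=http.log`, narrows the scan further to the two segments of a single datasource.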

Because a datasource affects both ingest and search, you should choose your tags carefully to maximise ingest and query speed. The basic principles can be categorised as follows:

  • A higher number of tags will create a higher number of datasources, and this will increase the number of minisegments and segments in general. This increases the storage size on disk and increases the memory required by the Global Database to store the segment map.

  • A lower number of tags creates fewer datasources, so a search cannot narrow the set of segments as effectively. More segments need to be accessed when searching, which may reduce performance.
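To get a feel for how quickly the number of datasources grows with the number of tags, the following Python sketch counts the distinct tag combinations in a sample of events. The field names and the `count_datasources` helper are hypothetical, and the sample data is made up for illustration; the point is only that each extra tag multiplies the number of combinations by up to the number of distinct values that tag takes.

```python
from itertools import product


def count_datasources(events, tag_fields):
    """Count distinct value combinations for the fields chosen as tags."""
    return len({tuple(event[field] for field in tag_fields) for event in events})


# Hypothetical sample: 3 hosts x 2 sources x 3 status codes.
hosts = ("server1", "server2", "server3")
sources = ("http.log", "loadbalance")
statuses = ("200", "404", "500")
events = [
    {"host": host, "source": source, "status": status}
    for host, source, status in product(hosts, sources, statuses)
]

print(count_datasources(events, ["host"]))                      # prints 3
print(count_datasources(events, ["host", "source"]))            # prints 6
print(count_datasources(events, ["host", "source", "status"]))  # prints 18
```

Running a count like this against a representative sample of your own data is one way to compare candidate tag choices before committing to them.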

Data is processed from the Kafka ingest queue by an individual digest node, which is then responsible for writing the data for a given datasource. In this situation, a low number of datasources can concentrate a high volume of ingest in a single datasource, and this can lead to delays during ingest, as seen in the figure below:

%%{init: {"flowchart": {"defaultRenderer": "elk"}} }%%
graph LR
  subgraph KQ ["Kafka Queues"]
    direction LR
    KQ1["Ingest Queue Partition 1"]
    KQ2["Ingest Queue Partition 2"]
    KQ3["Ingest Queue Partition 3"]
    KQ4["Ingest Queue Partition 4"]
  end
  subgraph D1 ["Digest Node 1"]
    direction LR
    DP1["Digest Partition Worker 1"]
    DS1A["Datasource 1A"]
    DS2A["Datasource 2A"]
    DS3A["Datasource 3A"]
    DP2["Digest Partition Worker 2"]
    DS1B["Datasource 1B"]
    DS2B["Datasource 2B"]
    DS3B["Datasource 3B"]
  end
  subgraph D2 ["Digest Node 2"]
    direction LR
    DP3["Digest Partition Worker 2"]
    DS1C["Datasource 1C"]
    DS2C["Datasource 2C"]
    DS3C["Datasource 3C"]
  end
  KQ1 --> DP1
  KQ2 --> DP2
  KQ3 --> DP3
  DP1 --> DS1A
  DP1 --> DS2A
  DP1 --> DS3A
  DP2 --> DS1B
  DP2 --> DS2B
  DP2 --> DS3B
  DP3 --> DS1C
  DP3 --> DS2C
  DP3 --> DS3C

Increasing the number of tags, or picking a different value to use for tags, may help to distribute the information more evenly across the digest nodes and therefore alleviate contention during ingest.
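The effect of tag choice on distribution can be sketched in Python. This is an illustration only: the rule used here (hash the sorted tag combination and take it modulo the partition count) is an assumption made for the example, not a description of how LogScale assigns data to partitions, and `partition_for` and `partition_load` are hypothetical helpers. The sketch shows that a single tag combination can only ever land on one partition, while several combinations have the chance to spread.

```python
import hashlib
from collections import Counter


def partition_for(tags, partitions):
    """Map a tag combination to a partition (assumed hash-modulo scheme)."""
    key = ",".join(f"{name}={value}" for name, value in sorted(tags.items()))
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def partition_load(workload, partitions):
    """Sum the event counts that land on each partition."""
    load = Counter()
    for tags, event_count in workload:
        load[partition_for(tags, partitions)] += event_count
    return load


# Hypothetical workloads carrying the same total volume of 9000 events.
coarse = [({"#source": "http.log"}, 9000)]
fine = [
    ({"#source": "http.log", "#host": f"server{i}"}, 3000)
    for i in range(1, 4)
]

print(dict(partition_load(coarse, 4)))  # all 9000 events on one partition
print(dict(partition_load(fine, 4)))    # up to three partitions share the load
```

With only a few datasources, two combinations may still hash to the same partition, so more tag combinations improve the odds of an even spread rather than guaranteeing one.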