LogScale Logical Architecture

LogScale operates as a cluster of three or more nodes that work together to store, organize, and query the data. Nodes have specific roles within the cluster, and each node may have multiple roles, each of which supports the operations of the cluster.

The overall logical architecture for these components of a LogScale cluster can be seen in the diagram below. This covers the main logical components involved in getting data into and out of LogScale:

graph LR;
  L1[Log shipper]
  L2[Log shipper]
  L3[Log Collector]
  C1[Client]
  C2[Client]
  subgraph LoadBalancer
    direction LR
    LB[LB Node]
    LBB[LB Node]
    LBC[LB Node]
  end
  IN[[Ingest Nodes]]
  L2 --Ingest Data--> LoadBalancer
  L1 --> LoadBalancer
  L3 --> LoadBalancer
  LoadBalancer --> IN
  subgraph "LogScale"
    direction LR
    KQ[Kafka Queue]
    QCN[Query Coordination Nodes]
    UI[UI/API Nodes]
    DN[Digest Nodes]
    SN[Storage Nodes]
    GD[Global Database]
  end
  BS((Bucket Storage))
  C1 --Query Requests--> LoadBalancer
  C2 --UI/API Requests--> LoadBalancer
  LoadBalancer --> QCN
  LoadBalancer --> UI
  IN --> KQ
  KQ --> DN
  QCN <--> UI
  QCN --Internal Query Requests--> DN & SN
  DN --Merged Segments--> SN
  DN <--Segments--> BS
  SN <--Segments--> BS
  • Ingest Nodes

    Ingest nodes read and parse incoming data, identifying individual fields and optionally tagging the data so that it can be organized to optimize searching. Ingestion is designed to be fast so that data becomes available for processing and querying as quickly as possible.

    For details on the overall ingest process, see Ingesting Data Flow. A more detailed description is available in Ingesting Data. An illustrative example of sending events to an ingest node is shown after this list.

  • Digest Nodes

    Digest nodes take the ingested data and convert it to the storage format used by the rest of LogScale so that the data can be easily searched.

    Digestion of data is an internal process that physically stores the data on disk or in bucket storage. For information on the digest process, see Ingestion: Digest Phase.

  • Storage Nodes

    Storage nodes provide access to the stored data, including reading that data during a search request.

    Storage nodes handle both the storage of information and the retrieval of data when running a query on historical information. For details on how LogScale stores data, see Ingestion: Storage Phase.

  • Query Coordination Nodes

    Query coordination nodes process a query and distribute requests for the data to the digest and storage nodes that hold it. The query coordinators then assemble, format, and aggregate the returned data.

    Query coordination nodes handle searching. For detailed information on the search process, see Search Architecture. An illustrative example of submitting a query is shown after this list.

  • UI Nodes

    The UI nodes support the web interface to the underlying data. This includes a query interface, widgets and dashboards that query, summarize and display the data.

  • Bucket Storage

    LogScale stores data in files called segments and caches as much of this information as possible to ensure the highest performance when querying data. Because the volumes of data can be so large, providing enough local storage (for example, a local SSD or NVMe drive) would be impractical. For this reason, LogScale can optionally store data on bucket storage, including Amazon S3, Google Cloud Storage, or MinIO. When using bucket storage, LogScale writes the data both locally and to the bucket store. Older data in local storage is expired, but if a query needs data that is only in bucket storage, the segments are copied back from bucket storage for processing. This enables LogScale to store and process petabytes of information in an efficient and performant manner.

    For more information on bucket storage, see Ingestion: Storage Phase. A conceptual sketch of this write-through caching pattern is shown after this list.

  • Global Database

    A shared store of key cluster information, including cluster members, segment files, and the distribution of data. Updates to this information are exchanged between nodes using a Kafka queue. Because the global database includes the reference location information for all the stored data, it is also backed up periodically to bucket storage.

    For more information on the global database, see Global Database.
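
As an illustration of the ingest flow, the sketch below sends a small structured batch of events to an ingest node via the load balancer. This is a minimal sketch in Python using the requests library: the URL, ingest token, tags, and field names are placeholder values, and the humio-structured endpoint path shown here should be checked against the ingest API documentation for your version.

    import datetime

    import requests

    LOGSCALE_URL = "https://logscale.example.com"  # load balancer address (placeholder)
    INGEST_TOKEN = "your-ingest-token"             # repository ingest token (placeholder)

    # A structured batch: tags route events to a datasource and help organize
    # the data for searching; attributes are the identified fields.
    batch = [
        {
            "tags": {"host": "web-01", "source": "application-log"},
            "events": [
                {
                    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                    "attributes": {"method": "GET", "path": "/index.html", "status": 200},
                }
            ],
        }
    ]

    response = requests.post(
        f"{LOGSCALE_URL}/api/v1/ingest/humio-structured",
        headers={"Authorization": f"Bearer {INGEST_TOKEN}"},
        json=batch,
        timeout=10,
    )
    response.raise_for_status()  # a 2xx response means the batch was accepted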
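
Similarly, the next sketch submits a search through a query coordination node. Again this is a minimal, hedged example: the repository name, token, time range, and query are placeholders, and the synchronous query endpoint shown should be verified against the Search API documentation for your version.

    import requests

    LOGSCALE_URL = "https://logscale.example.com"
    API_TOKEN = "your-api-token"   # personal or system API token (placeholder)
    REPOSITORY = "web-logs"        # repository to search (placeholder)

    # An LQL pipeline: filter to errors, group by class, sort by count.
    query = {
        "queryString": "loglevel=ERROR | groupBy(class) | sort(_count)",
        "start": "24hours",
        "end": "now",
    }

    response = requests.post(
        f"{LOGSCALE_URL}/api/v1/repositories/{REPOSITORY}/query",
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Accept": "application/json",
        },
        json=query,
        timeout=30,
    )
    response.raise_for_status()
    for row in response.json():  # aggregated results assembled by the coordinator
        print(row)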
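
Finally, to make the Bucket Storage behavior concrete, the toy sketch below models the write-through pattern described above: segments are written both locally and to the bucket, the oldest local copies are expired, and a segment is copied back from the bucket when a read misses locally. This is not LogScale's implementation, only the general shape of the mechanism, with invented names and limits.

    import collections

    class SegmentStore:
        """Toy write-through cache: bounded local disk in front of bucket storage."""

        def __init__(self, bucket, max_local_segments):
            self.bucket = bucket                    # durable, effectively unbounded
            self.local = collections.OrderedDict()  # bounded local copies
            self.max_local_segments = max_local_segments

        def write_segment(self, segment_id, data):
            # Write to the bucket store and keep a local copy for fast queries.
            self.bucket[segment_id] = data
            self._cache_locally(segment_id, data)

        def read_segment(self, segment_id):
            # Serve from local storage while the segment is still cached.
            if segment_id in self.local:
                self.local.move_to_end(segment_id)
                return self.local[segment_id]
            # Otherwise copy the segment back from bucket storage for processing.
            data = self.bucket[segment_id]
            self._cache_locally(segment_id, data)
            return data

        def _cache_locally(self, segment_id, data):
            self.local[segment_id] = data
            self.local.move_to_end(segment_id)
            # Expire the oldest local segments once the local store is full.
            while len(self.local) > self.max_local_segments:
                self.local.popitem(last=False)

    store = SegmentStore(bucket={}, max_local_segments=2)
    store.write_segment("seg-1", b"events...")
    store.write_segment("seg-2", b"events...")
    store.write_segment("seg-3", b"events...")          # expires seg-1 locally
    assert store.read_segment("seg-1") == b"events..."  # fetched back from the bucket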

In addition to the logical components noted above, there are also some elements of a LogScale cluster that are not directly displayed within this logical architecture. Some are covered in Additional Components.

When considering the logical architecture of a LogScale cluster, there are some key principles to keep in mind:

  • Access to the cluster should be via a load balancer, so that ingestion, queries, and UI access can be spread across the different nodes in the cluster while presenting a single IP address for accessing the entire cluster.

  • Clusters can be scaled by adding further nodes. Adding nodes increases potential ingest and query capacity, but also implies additional management load.

  • Data is distributed, and duplicated, across the cluster to improve performance and support high availability. Node unavailability, for example during maintenance, upgrades, or failure, is mitigated through this distribution of data and workload.

  • LogScale stores its own logs and metadata within LogScale itself. This means that you can monitor and understand the operation and performance of LogScale by querying these internal data stores.

  • All data is timestamped. If a timestamp cannot be determined from the incoming data (for example a log file), the time of ingest is used as the timestamp.

  • The smallest data structure within LogScale is the event: a single, timestamped entry that typically corresponds to a single row or record from the source input logs. Events are used internally by LogScale to exchange information. A repository is a collection of timestamped events, and when searching, data is exchanged as a list of events between each stage of the query.

  • All event data is stored using both the identified fields of the data and the raw incoming string. This allows data to be filtered on explicit fields without limiting searches to this structured information; an example event is shown after this list.

  • LogScale uses its own query language, the LogScale Query Language (LQL), which is modeled on the Unix-shell principle of pipes. A list of events is processed through a series of language fragments, each of which filters, augments, or aggregates the list of events from the previous step, with each step connected by a pipe. For example, loglevel=ERROR | groupBy(class) | sort(_count) filters events down to errors, groups them by class, and sorts the resulting counts.
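
To make the event model above concrete, the sketch below shows what a single ingested event might look like: the fields identified at parse time sit alongside the raw incoming string (exposed in LogScale as @rawstring) and the event timestamp (@timestamp). The parsed field names and values here are hypothetical.

    # One ingested event: identified fields plus the raw incoming string.
    event = {
        "@timestamp": "2024-05-01T12:00:00.000Z",  # from the data, or the ingest time
        "@rawstring": '127.0.0.1 - - [01/May/2024:12:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
        # Fields identified at parse time (names depend on the parser used):
        "client": "127.0.0.1",
        "method": "GET",
        "path": "/index.html",
        "status": "200",
    }

    # Filtering can use explicit fields or free-text search against the raw
    # string, since both are stored.
    matches = event["status"] == "200" and "GET" in event["@rawstring"]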

These principles support the efficient operation of the LogScale cluster and have an impact on the basic use and deployment of the cluster.