Repositories

A repository stores a collection of events. There are no restrictions or requirements on the repository, and you do not need to define a structure or provide LogScale with any expectation of the format, size or complexity of the data. Repositories allow you to organize and structure your data. For example:

  • Repositories for each source of data, for example system logs, application logs, web logs

  • Repositories for a group of servers, regardless of the log format

  • Repositories per source log file, for example syslog, access log, error log, security log

The chosen repository structure and organization are important because every query is based on one repository. Other repositories can be included, for example using joins, but the basis for the query is always a single repository.

Repositories do allow you to control the following:

  • Data Retention

    LogScale data is stored according to the timestamp of the data or when it was ingested. Typically, when processing and querying log data, you are looking for immediate or recent events rather than historical data. To save on storage, LogScale can be configured to expire events from the repository according to their age. For example, for a security log you might configure the repository to store only 30 days of data. For an audit log that may be needed for legal purposes, you might configure a retention period of months or years.

  • User permissions and access

    LogScale uses a role-based access control system that allows fine-grained control over accessing and using the data in a repository, whether through the UI or an API. For example, you can allow a user to ingest data but not read it, or to read data but not manage it.

  • Ingestion and parsing of data

    Ingestion of data can be performed through a number of different APIs, allowing for ingestion of raw log lines or structured data. Data can also be ingested directly from Amazon S3.
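The three controls above can be pictured with a small in-memory model. This is only a conceptual sketch, not how LogScale implements repositories: the `Repository` class, the permission names `ingest` and `read`, and the `expire` method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    timestamp: datetime  # event time, or ingest time if the event carries none
    raw: str


@dataclass
class Repository:
    name: str
    retention: timedelta
    # Hypothetical permission names; real roles are more detailed.
    permissions: dict[str, set[str]] = field(default_factory=dict)
    events: list[Event] = field(default_factory=list)

    def _require(self, user: str, permission: str) -> None:
        if permission not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not {permission} in {self.name}")

    def ingest(self, user: str, event: Event) -> None:
        self._require(user, "ingest")
        self.events.append(event)

    def read(self, user: str) -> list[Event]:
        self._require(user, "read")
        return list(self.events)

    def expire(self, now: datetime) -> int:
        """Drop events older than the retention period; return how many were removed."""
        cutoff = now - self.retention
        kept = [e for e in self.events if e.timestamp >= cutoff]
        removed = len(self.events) - len(kept)
        self.events = kept
        return removed


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
security = Repository(
    name="security-log",
    retention=timedelta(days=30),
    # "shipper" may ingest but not read; "analyst" may read but not ingest.
    permissions={"shipper": {"ingest"}, "analyst": {"read"}},
)
security.ingest("shipper", Event(now - timedelta(days=45), "old login failure"))
security.ingest("shipper", Event(now - timedelta(days=2), "recent login failure"))

removed = security.expire(now)  # the 45-day-old event exceeds the 30-day retention
remaining = [e.raw for e in security.read("analyst")]
```

In this model the shipper account can add events but would get a `PermissionError` on `read`, mirroring the "ingest but not read" case described above.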

Views

An additional data abstraction layer is available, called the view. Views do not store their own data. Instead, a view aggregates data from multiple repositories to be queried at the same time. These repositories can be local to the current LogScale instance, or you can use Multi-Cluster Search to access data across multiple LogScale clusters.

Views enable you to set the access permissions individually for each view and also filter the events from the source repository. For example, if you have a repository that contains HTTP access logs for all your web servers, you could create a view that displays only a subset of these, and then apply different access permissions to the view to allow limited access to the data set.

The diagram below shows different repository sources aggregated into a number of views which are then queried through individual dashboards.

%%{init: {"flowchart": {"defaultRenderer": "elk"}} }%%
graph TD
  subgraph Dashboards
    SD[Security Dashboard]
    MD[Metrics Dashboard]
    OD[Operational Dashboard]
  end
  subgraph Views
    SV[Security View]
    MV[Metrics View]
    OV[Operational View]
  end
  subgraph Repositories
    S[Sensors]
    D[Detections]
    L[System Logs]
    M[Metrics]
    A[Application Logs]
  end
  SD-->SV
  MD-->MV
  OD-->OV
  SV-->S
  SV-->D
  MV-->M
  MV-->L
  OV-->L
  OV-->A

In all other respects, a view operates like a repository, allowing users to run queries and build dashboards to view the data.
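As a rough illustration of the idea that a view stores no data of its own and only selects events from its source repositories, the sketch below models a view as a set of per-repository filters. The `View` class, its `connections` mapping, and the sample repository names and fields are assumptions made for this example; they are not LogScale's API or data model.

```python
from dataclasses import dataclass
from typing import Callable

Event = dict[str, str]
Filter = Callable[[Event], bool]


@dataclass
class View:
    name: str
    # Repository name -> filter applied to that repository's events.
    connections: dict[str, Filter]

    def search(self, repositories: dict[str, list[Event]]) -> list[Event]:
        """Return matching events from every connected repository."""
        results: list[Event] = []
        for repo_name, keep in self.connections.items():
            for event in repositories.get(repo_name, []):
                if keep(event):
                    results.append(event)
        return results


# Hypothetical repositories holding the actual data.
repositories = {
    "web-logs": [
        {"host": "web-1", "status": "200", "path": "/"},
        {"host": "web-2", "status": "500", "path": "/checkout"},
    ],
    "app-logs": [
        {"host": "app-1", "level": "ERROR", "message": "payment failed"},
        {"host": "app-1", "level": "INFO", "message": "started"},
    ],
}

# The view holds only filters; the events stay in the repositories.
errors_view = View(
    name="errors",
    connections={
        "web-logs": lambda e: e["status"].startswith("5"),
        "app-logs": lambda e: e["level"] == "ERROR",
    },
)

hits = errors_view.search(repositories)
hosts = [e["host"] for e in hits]
```

Because access would be granted on the view rather than on the underlying repositories, a user given only `errors_view` in this model would see the two error events and nothing else.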

Sandbox Repository

Some accounts have a special private repository called sandbox. The sandbox repository is unique to each user and can be used for learning or testing but has fixed limits.

Each sandbox repository is limited to a retention period of 7 days, an ingest size of 14 GB, and a storage size of 3 GB.

System Repositories

LogScale stores its own log files, metadata, auditing and metric information using the same repository model as that used for user-derived data. This allows the information about LogScale to be queried and accessed using the same query system and for dashboards, alerts and actions to be configured.

The following repositories may exist in your LogScale deployment, with the exact list being dependent on the type of deployment (cloud or self-hosted) and licensing model in place. System repositories can include any of the following:

For more information on system repositories and their contents, see LogScale System Repository Schema Guide.