Best Practice: Comparing Repos and Views
Last Updated: 2022-03-11
Humio organizes data into repositories. A view is a layer of abstraction that sits on top of a repository or a combination of repositories. The following sections describe some best practices related to repositories and views.
The decision to create a repository is influenced by several factors including:
User access control
The primary method of controlling user access to data is at the repository level. Organizations can give users access to repositories via groups (whether created in Humio or inherited from their organization's SSO provider), and groups have roles associated with them that govern how users can interact with the data in the repository.
Data retention is set at the repository level, and all data sources stored in a single repository have the same retention.
Repositories hold collections of saved queries, alerts, scheduled searches, actions, parsers, and files. Any user with access to the repository will have access to the content in that repository.
The easiest approach in terms of repository management is to create a single repository for all of an organization's data sources. This approach works when the following conditions exist:
All data sources have the same retention requirements;
The total number of data sources in the repository is fewer than 10,000 (see Data Sources for more information);
Access control to the data within the single repository can be controlled universally at the repository level or through the use of Views (see Views for more information).
As covered above, the two most common reasons for an organization to create additional repositories are:
Implementing different levels of retention;
A requirement to create more than 10,000 total data sources.
While it isn't possible to apply different levels of retention in a single repository, in most cases it is possible to design data sources so that their number stays within what a single repository allows (see Data Sources for more information).
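The two criteria above (retention and the data source count) can be turned into a rough planning check. The following is a minimal sketch, not a Humio tool: the `plan_repositories` function, the inventory tuples, and the example numbers are all hypothetical. It groups log sources by retention requirement, since each distinct retention needs its own repository, and flags any group whose expected data source count is not below 10,000.

```python
from collections import defaultdict

# Limit stated in this article; the single-repository condition is "fewer than".
MAX_DATA_SOURCES_PER_REPO = 10_000


def plan_repositories(sources):
    """Group log sources by retention and flag groups that exceed the limit.

    `sources` is a list of (name, retention_days, expected_data_sources)
    tuples. Returns a dict keyed by retention_days.
    """
    plan = defaultdict(lambda: {"sources": [], "data_sources": 0, "fits": True})
    for name, retention_days, expected in sources:
        group = plan[retention_days]
        group["sources"].append(name)
        group["data_sources"] += expected
        group["fits"] = group["data_sources"] < MAX_DATA_SOURCES_PER_REPO
    return dict(plan)


# Hypothetical inventory: (source name, retention in days, expected data sources)
inventory = [
    ("aws_vpcflow", 30, 4),
    ("firewall", 30, 12),
    ("audit", 365, 3),
]

plan = plan_repositories(inventory)
for retention_days, group in sorted(plan.items()):
    print(retention_days, group["sources"], group["data_sources"], group["fits"])
```

In this made-up inventory, the two distinct retention periods mean at least two repositories are needed, regardless of the data source count.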
A data source in Humio is identified by its unique combination of tags (Event Tags), or fields denoted by the pound, or hash, character (#). In the screenshot below, the selected event has two tags: #repo = aws_vpcflow and #type = vpcflow_raw.
Figure 273. Event fields with tags
The #repo tag for each event in a given repository will be the same, i.e. the name of the repository. In this case the repository is named aws_vpcflow, so the tag value is aws_vpcflow. The #type tag for each event is set to the name of the parser used to parse the event. In the example above the #type value is vpcflow_raw, so the event was parsed with the vpcflow_raw parser.
You can see the data sources in a repository in --> Data Sources as illustrated in the following figure:
Figure 274. Data source list
Notice that the listing of data sources does not show the #repo tag, as it is the same for every data source.
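The idea that a data source is simply a unique combination of tag values can be illustrated with a short sketch. This is a toy model rather than Humio's implementation: the `data_source_key` helper, the `action` field, and the `vpcflow_parsed` type are hypothetical and exist only for illustration.

```python
from collections import Counter


def data_source_key(event):
    """Return the event's tags (fields starting with '#') as a hashable key."""
    return frozenset((k, v) for k, v in event.items() if k.startswith("#"))


# Hypothetical events; only the '#'-prefixed fields decide the data source.
events = [
    {"#repo": "aws_vpcflow", "#type": "vpcflow_raw", "action": "ACCEPT"},
    {"#repo": "aws_vpcflow", "#type": "vpcflow_raw", "action": "REJECT"},
    {"#repo": "aws_vpcflow", "#type": "vpcflow_parsed", "action": "ACCEPT"},
]

counts = Counter(data_source_key(e) for e in events)
for key, n in counts.items():
    # Like the data source list, omit #repo because it is identical everywhere.
    shown = {k: v for k, v in sorted(key) if k != "#repo"}
    print(shown, n)
```

The three events fall into two data sources, because the non-tag field `action` plays no part in the key.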
Data sources are extremely important to Humio because they determine how data is physically stored within the platform. Humio represents each data source as a unique directory within the repository's directory, creating a separate storage location on disk for each new data source. Restricting searches to specific data sources by using the appropriate tags as search filters (e.g. #type = "vpcflow_raw") can significantly increase search performance, as this minimizes the need for Humio to traverse all of the data sources associated with a repository.
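Why a tag filter helps can be shown with a simplified model in which each data source is a separate bucket of events. The sketch below is an assumption-laden illustration, not how Humio stores or searches data: the `search` function, the `storage` layout, and the `other_parser` type are hypothetical. It counts how many buckets a search has to read with and without a tag filter.

```python
def search(storage, tag_filter, predicate):
    """Scan only the data sources whose tags match every entry in tag_filter.

    `storage` maps a tuple of (tag, value) pairs to a list of events.
    Returns (matching_events, number_of_data_sources_scanned).
    """
    hits, scanned = [], 0
    for tags, events in storage.items():
        tag_map = dict(tags)
        if any(tag_map.get(k) != v for k, v in tag_filter.items()):
            continue  # whole data source skipped without reading its events
        scanned += 1
        hits.extend(e for e in events if predicate(e))
    return hits, scanned


def is_reject(event):
    return event["action"] == "REJECT"


# Hypothetical storage: one bucket per unique tag combination.
storage = {
    (("#repo", "aws_vpcflow"), ("#type", "vpcflow_raw")): [
        {"action": "REJECT"},
        {"action": "ACCEPT"},
    ],
    (("#repo", "aws_vpcflow"), ("#type", "other_parser")): [
        {"action": "REJECT"},
    ],
}

filtered_hits, filtered_scanned = search(storage, {"#type": "vpcflow_raw"}, is_reject)
all_hits, all_scanned = search(storage, {}, is_reject)
print(filtered_scanned, all_scanned)  # prints: 1 2
```

With the tag filter, only one of the two buckets is read; without it, every bucket in the repository has to be examined.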
There are a couple of important considerations to take into account when thinking about tags and data sources (see the following blog post for more details):
At about 10,000 events per second per data source, Humio can no longer sequentially process incoming events. Before this happens, Humio implements a process called auto sharding that adds tags to a data source (e.g. #humioAutoShard=2, etc.) to split the data source into manageable chunks. Each new tag creates a new data source. This means that if the initial data source has the tags #repo = aws_vpcflow and #type = vpcflow_raw, and Humio needs to create three auto shards to manage the data velocity, that one data source is now actually four data sources, as illustrated in the table below:
Humio has a programmatic limit of 10,000 data sources per repository. This limit is designed to prevent issues related to having too many WIP (Work in Progress) buffers (system memory limitations) and too many directories (host operating system limitations).
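The effect of auto sharding on the data source count can be sketched as follows. This follows the counting used in the example above (the original tag combination plus one combination per shard). The `with_auto_shards` helper is hypothetical, and numbering the shards from 1 is an assumption of this sketch rather than documented behavior.

```python
MAX_DATA_SOURCES_PER_REPO = 10_000  # limit stated in this article


def with_auto_shards(base_tags, shard_count):
    """Return the tag combinations that exist once auto shards are added.

    Each #humioAutoShard value yields one additional tag combination, and
    therefore one additional data source. Shard numbering is assumed.
    """
    tag_sets = [dict(base_tags)]
    for shard in range(1, shard_count + 1):
        tag_sets.append({**base_tags, "#humioAutoShard": str(shard)})
    return tag_sets


base = {"#repo": "aws_vpcflow", "#type": "vpcflow_raw"}
sharded = with_auto_shards(base, 3)
for tags in sharded:
    print(tags)

print(len(sharded))  # prints: 4
remaining = MAX_DATA_SOURCES_PER_REPO - len(sharded)
```

High-volume sources therefore consume the per-repository budget faster than their tag design alone would suggest, which is worth factoring into repository planning.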
If you have additional questions about data sources and how they affect the way that your organization uses repositories, please contact your Humio Sales Engineer or Humio Technical Support.
In Humio a View is a type of repository that contains no data of its own. A view is created by connecting one or more repositories as illustrated in the screenshot below:
Figure 275. View Configuration
Views offer the following benefits:
Views allow you to connect multiple repositories to enable searching across them as if they were a single repository;
Views allow you to provide users with access to data in repositories customized to their specific needs. For example, in a scenario where an organization has one repository for all data sources, users can be given access exclusively to their own data sources using a view's Event Filter feature (e.g. #type = "vpcflow_raw"). This use of views also helps to keep the content (events, queries, alerts, dashboards, files, etc.) associated with specific data sources separate from that of other data sources (user groups), since anyone with access to a repository or view has access to all of its content.
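Both benefits can be modeled conceptually as a view that stores no events and only holds connections to repositories, each with an optional event filter. The sketch below is a simplified illustration, not Humio's API: the `View` class, its methods, and the `all_logs` and `audit` repository names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, str]


def allow_all(event: Event) -> bool:
    return True


@dataclass
class View:
    """Holds no events itself, only repository connections with filters."""

    connections: Dict[str, Callable[[Event], bool]] = field(default_factory=dict)

    def connect(self, repo_name: str, event_filter=allow_all) -> None:
        self.connections[repo_name] = event_filter

    def search(self, repositories: Dict[str, List[Event]]) -> List[Event]:
        """Return events from all connected repositories that pass the filters."""
        results: List[Event] = []
        for repo_name, event_filter in self.connections.items():
            results.extend(e for e in repositories[repo_name] if event_filter(e))
        return results


# Hypothetical repositories and events.
repositories = {
    "all_logs": [
        {"#type": "vpcflow_raw", "msg": "flow 1"},
        {"#type": "other_parser", "msg": "unrelated"},
    ],
    "audit": [{"#type": "audit", "msg": "login"}],
}

network_view = View()
# Event filter equivalent in spirit to: #type = "vpcflow_raw"
network_view.connect("all_logs", lambda e: e.get("#type") == "vpcflow_raw")
network_view.connect("audit")

visible = network_view.search(repositories)
print([e["msg"] for e in visible])  # prints: ['flow 1', 'login']
```

Users of `network_view` can search across both repositories at once, yet never see the `other_parser` events stored in `all_logs`.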
See Creating a View for more details on how to implement views.