Node Identifiers
Hosts in LogScale are tracked via unique integer identifiers assigned to each physical node in the cluster. These identifiers are known as vhosts.
For example, a cluster with three physical hosts may identify them as vhosts 1, 2 and 3.
Vhosts are used in a number of places within LogScale when referring to specific hosts. Examples of uses include:
- Tracking segment replication 
- Deciding host partition assignment for processing log data from the ingest queue 
- Assigning subsets of hosts to run particular tasks 
- Identifying the logging node when analyzing LogScale debug logs 
Since vhosts are supposed to identify hosts uniquely, it is important that each host has one, and that vhosts are not shared. Because they are repeated in many places, vhosts are kept short, which is why UUIDs are not used directly.
A LogScale node booting for the first time will generate a host UUID, which is written to the cluster_membership.uuid file.
The UUID is associated with a vhost via global, which then uniquely identifies the node.
Both the vhost and UUID are written to global so LogScale can detect if multiple hosts try to use the same vhost.
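As an illustration, the vhost of each registered node can be read back through the GraphQL API. This is a minimal sketch: cluster() is named later on this page, while the nodes field and its sub-field are assumptions to verify against your cluster's schema.

```graphql
# Sketch: list the numeric id (the vhost) of every registered cluster node.
# The nodes/id field names are assumptions; check your version's schema.
query {
  cluster {
    nodes {
      id   # the node's vhost
    }
  }
}
```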
The UUID never changes, and the vhost assigned to a host will not change either, unless other hosts are manually configured to use the same vhost.
Cluster administrators can control vhost assignment directly in two ways:
- Setting BOOTSTRAP_HOST_ID=5 will make the node use vhost 5 (see the sketch below). This is useful if the administrator can easily enumerate the nodes in the cluster and wishes to manually assign vhosts to nodes.
- Setting BOOTSTRAP_HOST_UUID_COOKIE=xyz will make the node use the UUID xyz. This can be useful if the administrator can assign fixed IDs to each node in the cluster, but can't easily generate numeric identifiers for them.
Important
For both options described above, configuration values must be unique across the cluster. Assigning the vhost 5 to two hosts will likely cause crashes until the configuration is corrected.
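As a sketch, static assignment for a single node could look like this in its environment configuration; the values are illustrative and the two options are alternatives:

```ini
# Option 1: pin this node to vhost 5 directly (must be unique per node)
BOOTSTRAP_HOST_ID=5

# Option 2 (alternative): pin a fixed UUID instead and let LogScale
# map it to a vhost via global
#BOOTSTRAP_HOST_UUID_COOKIE=node-a
```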
Administrators may also opt to let LogScale assign vhosts automatically. The assignment logic has the following properties:
- A host with an empty disk will get a fresh vhost number, which is unlikely to have been used by other hosts recently.
- A host that is currently a cluster member will always regain its old vhost number, as long as it still has its cluster_membership.uuid file.
- A host that is not currently a cluster member but is rejoining can regain its old vhost number, as long as it kept its disk contents.
The mechanism described above works well for clusters where nodes can keep their disks, since the cluster_membership.uuid file is retained over time.
In order to also support running LogScale on systems like Kubernetes, where disks may occasionally be wiped, we have automated some routine cleanup that must happen when nodes join and leave the cluster over time.
Nodes that have ephemeral disks should be configured with USING_EPHEMERAL_DISKS set to true, or use a NODE_ROLES setting that cannot store segments. This will cause LogScale to consider the node ephemeral, and therefore eligible for automatic removal from the cluster if it goes offline for too long.
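A minimal sketch of the relevant setting, to be combined with the node's other configuration:

```ini
# Declare this node's disks ephemeral, making it eligible for automatic
# removal from the cluster if it stays offline for too long
USING_EPHEMERAL_DISKS=true
```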
If an ephemeral node is offline for too long, a periodic task will unregister it from the cluster, and clean up any references to the associated vhost.
The job logs when hosts are removed; these log entries can be found in the humio repository using the query:
class=*DeadEphemeralHostsDeletionJob*
The delay before removing hosts can be adjusted via the GracePeriodBeforeDeletingDeadEphemeralHostsMs dynamic configuration in the GraphQL API; it controls how long an ephemeral node is allowed to be offline before some other node may unregister it from the cluster.
Note
Do not reduce the delay specified in GracePeriodBeforeDeletingDeadEphemeralHostsMs below the default setting of 2 hours. Removing a host with many segments can be very expensive, so it is not recommended unless strictly necessary.
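As a sketch, the grace period could be raised like this, assuming the generic setDynamicConfig mutation in the GraphQL API; verify the mutation and the enum value against your version's schema:

```graphql
# Sketch: allow ephemeral nodes to be offline for 4 hours (14400000 ms)
# before another node may unregister them. setDynamicConfig and the
# enum value are assumptions; check your cluster's schema.
mutation {
  setDynamicConfig(input: {
    config: GracePeriodBeforeDeletingDeadEphemeralHostsMs,
    value: "14400000"
  })
}
```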
If a host is unregistered from the cluster but retains its UUID and local disk, it can rejoin the cluster later and reacquire the vhost it had previously.
Since a host may get a new vhost when a disk is wiped, cluster administrators for clusters using nodes where USING_EPHEMERAL_DISKS is set to true will need to ensure that the storage and digest partitioning tables are kept up to date as hosts join and leave the cluster.
Updating the tables is handled automatically if using the LogScale Kubernetes operator, but for clusters that do not use this operator, cluster administrators should create and run scripts periodically to keep the storage and digest tables up to date.
The cluster() GraphQL query can provide updated tables (the suggestedIngestPartitions and suggestedStoragePartitions fields contain these), which can then be applied via the updateIngestPartitionScheme() and updateStoragePartitionScheme() GraphQL mutations.
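A sketch of the query half of that flow; the suggestedIngestPartitions and suggestedStoragePartitions field names are from this page, while the sub-field shown is an assumption to verify against your schema:

```graphql
# Fetch the suggested partition tables, whose contents can then be fed to
# the updateIngestPartitionScheme() / updateStoragePartitionScheme()
# mutations. The nodeIds sub-field is an assumption; check your schema.
query {
  cluster {
    suggestedIngestPartitions {
      nodeIds
    }
    suggestedStoragePartitions {
      nodeIds
    }
  }
}
```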
In order to ensure a vhost refers uniquely to one host, and to allow unregistered nodes to rejoin easily, vhosts are not reused frequently, even if the previous owner of a vhost is no longer registered in the cluster.
Automatic vhost assignment draws from the range 1-10000, starting at 1 and using each number only once. Over time this pool may be exhausted as hosts join and leave. When this happens, assignment will start over at 1, and vhosts that are no longer used by registered hosts become open for reuse. This should make it very unlikely that a given vhost is reused within a short timeframe.