Configure Log Shippers

If you haven't read the previous pages, you may want to start at the beginning. If you've already read them, you're now ready to start feeding data into your temporary trial repository in LogScale.

Log Shippers

To have your server send data to a repository in LogScale, you'll need a utility on the server that reads the server's data, packages it, and sends it properly to LogScale Cloud. Software that performs this function is known as a log shipper. It could be an application made specifically for this purpose, a utility with multiple uses, or even a custom script you write that achieves the same results.

A log shipper will typically read entries made to logs, but it might instead listen on specific ports and record the traffic, or collect metrics by some other method. You'll have to decide what to use to ship your data based on your needs and the log shipper's capabilities. Whatever methods it uses, it'll then send the logs and metrics it collects to LogScale.

For the purposes of this tutorial, let's try two log shippers that are free and easy to install and configure. The first, vector, will read Apache HTTP log files on a server that hosts a web site on the internet. The second, rsyslog, will collect server metrics. Let's go through how to install and configure these two log shipper utilities.

Vector for HTTP Logs

The vector utility collects entries from a server's log files and can route those entries to LogScale Cloud for storage in a repository. It can do much more, but we'll use it only to get the Apache HTTP log entries. We're assuming you have the Apache web server running on your server.
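
Before going further, you may want to confirm Apache is writing log files where vector will look. The path below is the one used in the configuration later on this page; note that it varies by distribution (e.g., Debian-based systems use /var/log/apache2/):

ls -l /var/log/httpd/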

Depending on your system, you can use a package management tool like yum or apt-get to install vector on your server. See the Vector Documentation for more on installing vector and to get the right package repository. To be sure you have the latest version, visit Vector Downloads.
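
As a rough sketch, once you've added the Vector package repository described in the Vector Documentation, installation is a single command. The exact invocation depends on your distribution:

# On a yum-based system (e.g., CentOS, RHEL):
sudo yum install vector

# On an apt-based system (e.g., Debian, Ubuntu):
sudo apt-get install vector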

After installing vector, edit the configuration file, vector.toml, in the /etc/vector/ sub-directory with a simple text editor. The data directory should be set already; don't change it. You'll need to add or modify two sections: one for input data, known as sources, and another for output, known as sinks. Below is an example of how the configuration file should look:

data_dir = "/var/lib/vector"

# Input Data
[sources.apachelogs]
include = ["/var/log/httpd/*"]
type = "file"

# Output Data
[sinks.humiocloud]
inputs = ["apachelogs"]
type = "humio_logs"
encoding.codec = "json"
host = ""
token = "01a23456-7b89-0123-c456-d7e8f9012f3g"

In the Input Data section, the header reads [sources.apachelogs]. It has to start with sources, but apachelogs could be whatever you want to label it. The include line lists, within square brackets, the log file paths and names for vector to read. We're using a wildcard to get all of the files in /var/log/httpd/. The only other line needed is the type of input: in this case, file.

LogScale Cloud's particulars are in the Output Data section. The header must start with sinks., but can end with whatever you want; we chose humiocloud. The inputs value, apachelogs, corresponds to the input source defined above. The type must be humio_logs, a sink type vector provides for sending to LogScale. Next is how to encode the data it will send to LogScale; we prefer json.

For the host, give it the URL to LogScale Cloud. Last is the authentication token. Don't use the one shown in the example here; use the one you generated earlier in your LogScale Cloud account for your trial repository. Go back to that page and click the Copy icon for the my_vector ingest API token, then paste it into vector.toml.
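
If your build of vector includes the validate subcommand, you can use it as a quick, optional sanity check of the configuration's syntax before starting the service:

vector validate /etc/vector/vector.toml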

When you're done, save the file and start vector. You would do something like the following:

systemctl start vector
journalctl -fu vector

The second command here lets you watch vector's output to see whether it's running without errors. If you see error messages saying permission denied, check the ownership and permissions of the data directory (i.e., data_dir) and of the input directories and files. Vector will need at least read and execute access to those directories and files. Restart vector after resolving a problem.
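
As a sketch of how you might track down such a permission problem (assuming vector runs as a dedicated vector user, as the official packages set up, and that your filesystem supports POSIX ACLs):

# Inspect ownership and permissions on the paths vector reads:
ls -ld /var/lib/vector /var/log/httpd
ls -l /var/log/httpd/

# One possible remedy: grant the vector user read access via an ACL,
# then restart the service. Adjust to your own security policy.
sudo setfacl -R -m u:vector:rX /var/log/httpd
sudo systemctl restart vector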

rsyslog for Server Metrics

Let's add another log shipper, the utility rsyslog. It's similar to vector, but we'll use it to get server metrics to send to LogScale Cloud. To start, install rsyslog along with its Elasticsearch output module using a package management tool (e.g., yum). The Elasticsearch output module provides native support for logging to LogScale, using the Elasticsearch ingest API, which is one of several ingest APIs supported by LogScale:

yum install rsyslog rsyslog-elasticsearch
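
To confirm the output module actually landed on disk, one quick check on an rpm-based system is to list the package's files and look for omelasticsearch, the module the configuration below loads:

rpm -ql rsyslog-elasticsearch | grep omelasticsearch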

Now, add a configuration file to the /etc/rsyslog.d/ sub-directory with a simple text editor. Name it 33-humio.conf, then copy and paste these lines into that file:

module(load="omelasticsearch")

template(name="humiotemplate" type="list" option.json="on") {
    constant(value="{")
    constant(value="\"@timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
    constant(value="\",\"message\":\"") property(name="msg")
    constant(value="\",\"host\":\"") property(name="hostname")
    constant(value="\",\"severity\":\"") property(name="syslogseverity-text")
    constant(value="\",\"facility\":\"") property(name="syslogfacility-text")
    constant(value="\",\"syslogtag\":\"") property(name="syslogtag")
    constant(value="\",\"name\":\"") property(name="programname")
    constant(value="\",\"pid\":\"") property(name="procid")
    constant(value="\"}")
}

*.* action(type="omelasticsearch"
           server="cloud.humio.com"
           template="humiotemplate"
           uid="YOUR_REPOSITORY_NAME"
           pwd="YOUR_INGEST_TOKEN"
           usehttps="on")

The first line above loads the module you'll need; it comes from the rsyslog-elasticsearch package you installed. Next is a template, humiotemplate, for organizing the data to send to LogScale Cloud. Don't make any changes to this template. The *.* selector on the last section tells rsyslog to feed everything it collects to the action that follows.

In the action section, the type and server values need to be as you see them here. However, you'll need to set the uid to the name of your repository on LogScale Cloud. The value for pwd, or password, needs to be changed to the ingest token you created (i.e., my_rsyslog), the one you associated with the syslog parser. You can leave the other two settings as they are above.

When you're finished editing the configuration file, save and close it. Then start the rsyslog service with systemctl. You might then use journalctl to see whether there are any errors, as you did with vector earlier. This utility is pretty straightforward, though, so you shouldn't have any errors. If you do, the cause will probably be a typing mistake or the like in the configuration file, related to connecting to LogScale Cloud.
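
For example, on a systemd-based server, that would look much like the vector commands earlier:

systemctl start rsyslog
journalctl -fu rsyslog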

Check Data Sources on Cloud

Figure 12. Check Data Sources on Cloud

After you've installed, configured, and gotten both vector and rsyslog running and sending data to LogScale Cloud, go back to your browser window to see if there's data in your new repository. To verify this, on the Settings page, click Data Sources in the left margin (see the screenshot in Figure 12, "Check Data Sources on Cloud").

It may take a few minutes, but you should see at least two data sources listed, similar to how it looks in the screenshot here. Depending on your server and how you have it configured, those data sources may grow over time. Notice that each data source shows both an Original Size and a smaller Storage Size; the difference is because LogScale compresses the data.

The other way to check that data is getting to the repository is to click Search in the menu at the top. That will show the raw data it's received and allow you to search it, as you did in the interactive tutorial.

On the next page of this tutorial, we'll look at how to search the repository. We'll have you do some things on your server and then check that entries for those actions or events appear shortly afterwards in the repository. In case you're curious as to where we're ultimately heading: on the last page of this tutorial, we'll put together a dashboard of the searches you'll enter next.
