Calculate Relationship Between X And Y Variables - Example 3

Calculate the linear relationship between server load and the request count for each of several request types using the linReg() function with bucket() and groupBy()

Query

logscale
bucket(function=[ avg(server_load_pct, as=y), groupBy(request_type, function=count(as=x)) ])
| groupBy(request_type, function=linReg(x=x, y=y))

Introduction

The linReg() function can be used to calculate a linear relationship between two variables by using least-squares fitting. The function is used to analyze different performance relationships in a system, for example: response size and transmission time, server load and total response size, or server load and request types.
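For reference, the textbook least-squares fit of a line y = a + b x to n data points is sketched below. This is the standard definition and not a statement about how linReg() is implemented internally; the symbols a, b, and the means are introduced here for illustration only.

$$
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
$$

Here b corresponds to the slope and a to the intercept, with \bar{x} and \bar{y} denoting the means of the x and y values. The slope is undefined when all x values are identical, because the denominator is then zero.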

In this example, the linReg() function is used to calculate the linear relationship between the number of requests of each type (x variable) and server_load_pct (y variable). The example shows, for each HTTP request type, how server load relates to request volume across time.

Example incoming data might look like this:

| @timestamp | server_load_pct | request_type |
| --- | --- | --- |
| 2024-01-15T09:00:00.000Z | 45.2 | GET |
| 2024-01-15T09:00:00.000Z | 45.2 | POST |
| 2024-01-15T09:00:00.000Z | 45.2 | GET |
| 2024-01-15T09:05:00.000Z | 52.8 | GET |
| 2024-01-15T09:05:00.000Z | 52.8 | PUT |
| 2024-01-15T09:05:00.000Z | 52.8 | POST |
| 2024-01-15T09:10:00.000Z | 48.6 | GET |
| 2024-01-15T09:10:00.000Z | 48.6 | GET |
| 2024-01-15T09:10:00.000Z | 48.6 | DELETE |
| 2024-01-15T09:15:00.000Z | 65.3 | POST |
| 2024-01-15T09:15:00.000Z | 65.3 | POST |
| 2024-01-15T09:15:00.000Z | 65.3 | GET |
| 2024-01-15T09:20:00.000Z | 42.1 | GET |
| 2024-01-15T09:20:00.000Z | 42.1 | PUT |
| 2024-01-15T09:20:00.000Z | 42.1 | GET |

Step-by-Step

  1. Starting with the source repository events.

  2. Flowchart: Events --> Aggregate (this step) --> Aggregate --> Result Set
    logscale
    bucket(function=[ avg(server_load_pct, as=y), groupBy(request_type, function=count(as=x)) ])

    Buckets the events by time and calculates the average server load for each time bucket, returning the result in a field named y. Within each bucket, it also groups the events by request_type and counts the requests of each type, returning the count in a field named x.

  3. Flowchart: Events --> Aggregate --> Aggregate (this step) --> Result Set
    logscale
    | groupBy(request_type, function=linReg(x=x, y=y))

    Correlates x with y for each HTTP request type and outputs the results in fields named _slope (slope value), _intercept (intercept value), _r2 (adjusted R-squared value), and _n (number of data points). These four values indicate the strength and reliability of the relationship.

  4. Event Result set.
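The two aggregation steps above can be emulated outside LogScale as a way to check the numbers by hand. The following Python sketch is not LogScale code and does not describe how linReg() is implemented. It assumes that each distinct timestamp in the example data falls into its own time bucket, and that _r2 follows the textbook adjusted R-squared formula for a single predictor. The function names bucket_rows, lin_reg, and fit_by_request_type are hypothetical.

```python
from collections import defaultdict

# Example events: (time of day, server_load_pct, request_type); date omitted.
EVENTS = [
    ("09:00", 45.2, "GET"), ("09:00", 45.2, "POST"), ("09:00", 45.2, "GET"),
    ("09:05", 52.8, "GET"), ("09:05", 52.8, "PUT"), ("09:05", 52.8, "POST"),
    ("09:10", 48.6, "GET"), ("09:10", 48.6, "GET"), ("09:10", 48.6, "DELETE"),
    ("09:15", 65.3, "POST"), ("09:15", 65.3, "POST"), ("09:15", 65.3, "GET"),
    ("09:20", 42.1, "GET"), ("09:20", 42.1, "PUT"), ("09:20", 42.1, "GET"),
]


def bucket_rows(events):
    """Step 2 analogue: per bucket, y = average load, x = request count per type."""
    loads = defaultdict(list)
    counts = defaultdict(lambda: defaultdict(int))
    for ts, load, request_type in events:
        loads[ts].append(load)
        counts[ts][request_type] += 1
    rows = []
    for ts in sorted(loads):
        y = sum(loads[ts]) / len(loads[ts])
        for request_type, x in counts[ts].items():
            rows.append((request_type, x, y))
    return rows


def lin_reg(points):
    """Step 3 analogue: least-squares fit; returns None when no line can be fitted."""
    n = len(points)
    if n < 2:
        return None  # a single point does not determine a line
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if sxx == 0:
        return None  # x never varies, so the slope is undefined
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2_adj = None
    if n > 2 and syy > 0:
        r2 = sxy ** 2 / (sxx * syy)
        # Assumed formula: adjusted R-squared with one predictor.
        r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {"slope": slope, "intercept": intercept, "r2": r2_adj, "n": n}


def fit_by_request_type(events):
    """Group the bucketed rows by request type and fit each group separately."""
    points = defaultdict(list)
    for request_type, x, y in bucket_rows(events):
        points[request_type].append((x, y))
    return {rt: lin_reg(pts) for rt, pts in sorted(points.items())}


if __name__ == "__main__":
    for request_type, fit in fit_by_request_type(EVENTS).items():
        print(request_type, fit)
```

Under these assumptions, the sketch produces a fit for GET and POST only. DELETE has a single data point and PUT has the same count in every bucket where it occurs, so neither has enough variation in x to fit a line.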

Summary and Results

The query is used to analyze how different HTTP request types affect server load. The analysis helps identify which HTTP request types have the strongest impact on server performance.

Sample output from the incoming example data:

| request_type | _slope | _intercept | _r2 | _n |
| --- | --- | --- | --- | --- |
| DELETE | <no value> | <no value> | <no value> | <no value> |
| GET | -13.749999999999941 | 72.7999999999999 | 0.5941824574313592 | 5 |
| POST | 16.29999999999992 | 32.70000000000012 | 0.7196207242484238 | 3 |
| PUT | <no value> | <no value> | <no value> | <no value> |

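_slope is the estimated change in average server load (in percentage points) for each additional request of that type in a time bucket.

_intercept is the estimated baseline server load when there are no requests of that type.

_r2 is the adjusted R-squared value, which indicates how well the fitted line explains the variation in server load; values closer to 1 indicate a better fit.

_n is the number of data points used in the fit, that is, the number of time buckets in which the request type occurred.

DELETE and PUT show <no value> because the sample data contains no variation in x for them: DELETE occurs in only one time bucket, and PUT has a count of 1 in each of its two buckets.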
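As a worked reading of the GET row in the sample output, using values rounded to two decimals: the fitted line gives the estimated server load for a given number of GET requests per time bucket.

$$
\hat{y} = \text{\_intercept} + \text{\_slope} \cdot x
\quad\Rightarrow\quad
\hat{y}(1) \approx 72.8 - 13.75 \cdot 1 = 59.05,
\qquad
\hat{y}(2) \approx 72.8 - 13.75 \cdot 2 = 45.3
$$

The _r2 figures in the sample output are consistent with the textbook adjusted R-squared for a single predictor. This is an assumption about the formula, checked only against the sample values. For GET, the unadjusted R-squared computed from the example data is approximately 0.6956 with n = 5:

$$
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - 2}
\approx 1 - (1 - 0.6956) \cdot \frac{4}{3}
\approx 0.594
$$

With only five data points for GET and three for POST, both fits rest on very little data and are best treated as illustrative.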