Calculate Relationship Between X And Y Variables - Example 2

Calculate the linear relationship between server load and total response size using the linReg() function with bucket()

Query

logscale
bucket(function=[ sum(bytes_sent, as=x), avg(server_load_pct, as=y) ])
| linReg(x=x, y=y)

Introduction

The linReg() function calculates the linear relationship between two variables using least-squares fitting. It can be used to analyze performance relationships in a system, for example between response size and transmission time, server load and total response size, or server load and request types.
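For reference, the standard ordinary least-squares estimates for n data points (x_i, y_i) are shown below, together with the usual adjusted R-squared for a single predictor. This page does not spell out the internal formulas of linReg(), so treat these as the textbook definitions rather than a description of LogScale's implementation; the sample output later on this page is consistent with them.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
\text{slope} &= \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \\
\text{intercept} &= \bar{y} - \text{slope}\cdot\bar{x} \\
R^2 &= \frac{\left(\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right)^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2} \\
R^2_{\text{adj}} &= 1 - \left(1-R^2\right)\frac{n-1}{n-2}
\end{aligned}
```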

In this example, the linReg() function is used to calculate the linear relationship between bytes_sent (x variable) and server_load_pct (y variable). The example shows the relationship between server load percentage and total response size, with both values aggregated per time bucket.

Example incoming data might look like this:

@timestamp              bytes_sent  server_load_pct
2024-01-15T09:00:00Z    156780      45.2
2024-01-15T09:05:00Z    234567      52.8
2024-01-15T09:10:00Z    189234      48.6
2024-01-15T09:15:00Z    345678      65.3
2024-01-15T09:20:00Z    123456      42.1
2024-01-15T09:25:00Z    278901      58.7
2024-01-15T09:30:00Z    198765      51.4
2024-01-15T09:35:00Z    287654      59.2
2024-01-15T09:40:00Z    167890      46.8
2024-01-15T09:45:00Z    298765      61.5

Step-by-Step

  1. Starting with the source repository events.

  2. Aggregate the events into time buckets:
    logscale
    bucket(function=[ sum(bytes_sent, as=x), avg(server_load_pct, as=y) ])

    Buckets the events by time. For each bucket, it then calculates the sum of bytes_sent and returns the result in a field named x, and calculates the average of server_load_pct and returns the result in a field named y.

  3. Aggregate the bucketed results with linReg():
    logscale
    | linReg(x=x, y=y)

    Calculates the linear relationship between x and y using least-squares fitting, and returns the results in fields named _slope (slope value), _intercept (intercept value), _r2 (adjusted R-squared value), and _n (number of data points). Together, these four values describe the relationship and indicate how strong and reliable it is.

  4. Event Result set.
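To make the two aggregation steps concrete, the following Python sketch reproduces them on the example data outside LogScale. It is an illustration, not LogScale's implementation: bucket_events() and lin_reg() are hypothetical helpers, the 5-minute bucket span is an assumption chosen so that each example event lands in its own bucket, only non-empty buckets are kept, and _r2 is computed as adjusted R-squared to match the field description in step 3.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Example events from the table above: (@timestamp, bytes_sent, server_load_pct).
EVENTS = [
    ("2024-01-15T09:00:00Z", 156780, 45.2),
    ("2024-01-15T09:05:00Z", 234567, 52.8),
    ("2024-01-15T09:10:00Z", 189234, 48.6),
    ("2024-01-15T09:15:00Z", 345678, 65.3),
    ("2024-01-15T09:20:00Z", 123456, 42.1),
    ("2024-01-15T09:25:00Z", 278901, 58.7),
    ("2024-01-15T09:30:00Z", 198765, 51.4),
    ("2024-01-15T09:35:00Z", 287654, 59.2),
    ("2024-01-15T09:40:00Z", 167890, 46.8),
    ("2024-01-15T09:45:00Z", 298765, 61.5),
]


def parse_ts(ts: str) -> datetime:
    # Replace the trailing "Z" so datetime.fromisoformat() accepts it on Python < 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def bucket_events(events, span=timedelta(minutes=5)):
    """Hypothetical stand-in for step 2: group events into fixed time spans
    (ASSUMED 5-minute span), then compute x = sum(bytes_sent) and
    y = avg(server_load_pct) for each non-empty bucket."""
    origin = min(parse_ts(ts) for ts, _, _ in events)
    groups = defaultdict(list)
    for ts, bytes_sent, load in events:
        index = (parse_ts(ts) - origin) // span  # timedelta // timedelta -> int
        groups[index].append((bytes_sent, load))
    rows = []
    for index in sorted(groups):
        members = groups[index]
        x = sum(b for b, _ in members)
        y = sum(l for _, l in members) / len(members)
        rows.append((x, y))
    return rows


def lin_reg(rows):
    """Hypothetical stand-in for step 3: ordinary least-squares fit of y on x,
    returning slope, intercept, adjusted R-squared, and the point count."""
    n = len(rows)
    if n < 3:
        raise ValueError("adjusted R-squared needs at least 3 data points")
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    syy = sum((y - mean_y) ** 2 for _, y in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)                # plain R-squared
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # adjusted for one predictor
    return {"_slope": slope, "_intercept": intercept, "_r2": adj_r2, "_n": n}


result = lin_reg(bucket_events(EVENTS))
for field, value in result.items():
    print(field, value)
# Approximately: _slope 0.000106175, _intercept 28.9341, _r2 0.991172, _n 10
```

Under these assumptions the printed values agree with the sample output in Summary and Results to within floating-point rounding. A different bucket span that groups several events together would change x, y, and _n.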

Summary and Results

The query calculates the linear relationship between the total bytes_sent per time bucket (x variable) and the average server_load_pct per time bucket (y variable).

Calculating the relationship between server load percentage and total response size is useful for identifying operational patterns such as performance bottlenecks and resource allocation issues, or for finding system optimization opportunities.

Sample output from the incoming example data:

_slope                  _intercept          _r2                _n
0.00010617525557193158  28.934098111407938  0.991172367336835  10

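_slope is the estimated change in y (average server load percentage) for each additional byte in x (total bytes sent per bucket).

_intercept is the value of y that the fitted line predicts when x is 0, that is, the baseline server load.

_r2 is the adjusted R-squared value: the proportion of the variation in y explained by the linear model, adjusted for the number of data points and predictors. Values close to 1 indicate a strong linear fit.

_n is the number of (x, y) data points used in the calculation.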
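As a worked illustration of how the fields combine, the fitted line can estimate the average server load for a given per-bucket byte total. The input x = 200,000 bytes is a hypothetical value chosen for illustration, and the coefficients are rounded from the sample output above; the second line shows what the slope means for a 10,000-byte increase.

```latex
\begin{aligned}
\hat{y} &= \text{\_intercept} + \text{\_slope}\cdot x
         \approx 28.934 + 0.000106175 \times 200\,000
         = 28.934 + 21.235 = 50.169 \\
\Delta\hat{y} &= \text{\_slope}\cdot\Delta x
         \approx 0.000106175 \times 10\,000 \approx 1.06 \text{ percentage points}
\end{aligned}
```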