The linReg() function calculates a linear relationship between two variables by using least-squares fitting.

The function fits the following relationship between the x and y variables:

y = slope * x + intercept

The result is output in fields named _slope and _intercept, unless a prefix other than _ is specified.

The function also outputs the adjusted R-squared value in a field named _r2 and the number of data points in a field named _n.

These four values indicate the strength and reliability of the relationship.

Note that if all x values are the same, or if all y values are the same, the function cannot calculate a result and outputs nothing.
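The calculation described above can be sketched in Python. This is a minimal illustration, not the function's actual implementation: the name lin_reg is hypothetical, the adjusted R-squared uses the standard single-predictor form 1 - (1 - R²)(n - 1)/(n - 2), and treating fewer than three points as "no result" is an assumption made because that formula divides by n - 2.

```python
from typing import Optional, Sequence


def lin_reg(xs: Sequence[float], ys: Sequence[float]) -> Optional[dict]:
    """Least-squares fit of y = slope * x + intercept (hypothetical helper).

    Returns None when no line can be determined: all x values identical,
    all y values identical, or (an assumption of this sketch) fewer than
    three data points.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 3 or min(xs) == max(xs) or min(ys) == max(ys):
        return None

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    # Adjusted R-squared for a model with one predictor
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {"slope": slope, "intercept": intercept, "r2": adj_r2, "n": n}


# Points that lie exactly on y = 2x + 1
print(lin_reg([1, 2, 3, 4], [3, 5, 7, 9]))
# {'slope': 2.0, 'intercept': 1.0, 'r2': 1.0, 'n': 4}

# All x values identical: no line can be fitted
print(lin_reg([5, 5, 5], [1, 2, 3]))
# None
```

Checking min and max rather than testing the sum of squares against zero avoids floating-point noise when every value is the same.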

| Parameter | Type | Required | Default Value | Description |
|---|---|---|---|---|
| prefix | string | optional[a] | _ | Prefix for all output field names. |
| x | string | required | | Specifies the field name that contains the independent variable. |
| y | string | required | | Specifies the field name that contains the dependent variable. |

[a] Optional parameters use their default value unless explicitly set.
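How the prefix shapes the output field names can be pictured with a short Python sketch. It assumes that each field name is simply the prefix followed by slope, intercept, r2, or n, which is what the default names _slope, _intercept, _r2, and _n suggest. The helper name output_fields and the prefix value net_ are hypothetical.

```python
def output_fields(slope, intercept, r2, n, prefix="_"):
    """Build the output record for a fit (hypothetical helper).

    Assumption: every output field name is the prefix followed by a fixed
    suffix, so the default prefix "_" yields _slope, _intercept, _r2, _n.
    """
    values = {"slope": slope, "intercept": intercept, "r2": r2, "n": n}
    return {prefix + name: value for name, value in values.items()}


print(sorted(output_fields(2.0, 1.0, 1.0, 4)))
# ['_intercept', '_n', '_r2', '_slope']

print(sorted(output_fields(2.0, 1.0, 1.0, 4, prefix="net_")))
# ['net_intercept', 'net_n', 'net_r2', 'net_slope']
```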

linReg() Examples

Calculate Relationship Between X And Y Variables - Example 1

Calculate the linear relationship between message size and transmission time using the linReg() function

Query

linReg(x=bytes_sent, y=send_duration)

Introduction

In this example, the linReg() function is used to calculate the linear relationship between bytes_sent (x variable) and send_duration (y variable). The example shows the relationship between message size (bytes sent by a server) and transmission time (time taken to send the bytes).

Example incoming data might look like this:

| @timestamp | bytes_sent | send_duration |
|---|---|---|
| 2025-04-07 13:00:00 | 1024 | 0.15 |
| 2025-04-07 13:00:01 | 2048 | 0.25 |
| 2025-04-07 13:00:02 | 4096 | 0.45 |
| 2025-04-07 13:00:03 | 8192 | 0.85 |
| 2025-04-07 13:00:04 | 512 | 0.08 |
| 2025-04-07 13:00:05 | 16384 | 1.65 |
| 2025-04-07 13:00:06 | 3072 | 0.35 |
| 2025-04-07 13:00:07 | 6144 | 0.65 |
| 2025-04-07 13:00:08 | 10240 | 1.05 |
| 2025-04-07 13:00:09 | 4608 | 0.48 |

Step-by-Step
  1. Starting with the source repository events.

  2. linReg(x=bytes_sent, y=send_duration)

    Fits a least-squares line with bytes_sent as x and send_duration as y, showing the relationship between message size and transmission time. The results are output in fields named _slope (slope value), _intercept (intercept value), _r2 (adjusted R-squared value), and _n (number of data points). These four values indicate the strength and reliability of the relationship.

  3. Event Result set.

Summary and Results

The query is used to calculate a linear relationship between bytes_sent (x variable) and send_duration (y variable).

Calculating the relationship between the size of the data transferred and the time taken to send it is useful, for example, in trend analysis, performance monitoring, or anomaly detection.

Sample output from the incoming example data:

| _slope | _intercept | _r2 | _n |
|---|---|---|---|
| 9.823069852941172E-5 | 0.04276470588235326 | 0.9996897336895081 | 10 |

_slope is the additional time needed per byte sent.

_intercept is the baseline transmission time.

_r2 is the adjusted R-squared value, which indicates how well the linear model fits the data. Values close to 1 indicate a close fit.

_n is the total number of data points analyzed.
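The sample output above can be checked outside the query language with a short Python sketch. It assumes an ordinary least-squares fit and the standard single-predictor adjusted R-squared, 1 - (1 - R²)(n - 1)/(n - 2). The helper predict_duration is hypothetical and only illustrates how _slope and _intercept can be used together.

```python
# (bytes_sent, send_duration) pairs from the example data
events = [
    (1024, 0.15), (2048, 0.25), (4096, 0.45), (8192, 0.85), (512, 0.08),
    (16384, 1.65), (3072, 0.35), (6144, 0.65), (10240, 1.05), (4608, 0.48),
]

n = len(events)
mean_x = sum(x for x, _ in events) / n
mean_y = sum(y for _, y in events) / n

# Sums of squared deviations and cross-deviations
sxx = sum((x - mean_x) ** 2 for x, _ in events)
syy = sum((y - mean_y) ** 2 for _, y in events)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in events)

slope = sxy / sxx                      # additional time per byte sent
intercept = mean_y - slope * mean_x    # baseline transmission time
r2 = sxy ** 2 / (sxx * syy)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # assumed adjustment formula


def predict_duration(bytes_sent):
    """Estimate send_duration for a message size from the fitted line."""
    return slope * bytes_sent + intercept


print(slope, intercept, adj_r2, n)
print(predict_duration(2048))
```

The printed slope, intercept, and adjusted R-squared agree with the sample output to several significant digits. For a 2048-byte message the fitted line predicts a duration of about 0.244, close to the observed 0.25.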

Calculate Relationship Between X And Y Variables - Example 2

Calculate the linear relationship between server load and total response size using the linReg() function with bucket()

Query

bucket(function=[ sum(bytes_sent, as=x), avg(server_load_pct, as=y) ])
| linReg(x=x, y=y)

Introduction

In this example, the linReg() function is used to calculate the linear relationship between the total bytes_sent per time bucket (x variable) and the average server_load_pct per time bucket (y variable). The example shows the relationship between server load percentage and total response size across time.

Example incoming data might look like this:

| @timestamp | bytes_sent | server_load_pct |
|---|---|---|
| 2024-01-15T09:00:00Z | 156780 | 45.2 |
| 2024-01-15T09:05:00Z | 234567 | 52.8 |
| 2024-01-15T09:10:00Z | 189234 | 48.6 |
| 2024-01-15T09:15:00Z | 345678 | 65.3 |
| 2024-01-15T09:20:00Z | 123456 | 42.1 |
| 2024-01-15T09:25:00Z | 278901 | 58.7 |
| 2024-01-15T09:30:00Z | 198765 | 51.4 |
| 2024-01-15T09:35:00Z | 287654 | 59.2 |
| 2024-01-15T09:40:00Z | 167890 | 46.8 |
| 2024-01-15T09:45:00Z | 298765 | 61.5 |

Step-by-Step
  1. Starting with the source repository events.

  2. bucket(function=[ sum(bytes_sent, as=x), avg(server_load_pct, as=y) ])

    Buckets the data points by time. For each bucket, it calculates the sum of bytes sent, returning the result in a field named x, and the average server load percentage, returning the result in a field named y.

  3. | linReg(x=x, y=y)

    Fits a least-squares line to the bucketed x and y values, showing the relationship between the two variables. The results are output in fields named _slope (slope value), _intercept (intercept value), _r2 (adjusted R-squared value), and _n (number of data points). These four values indicate the strength and reliability of the relationship.

  4. Event Result set.

Summary and Results

The query is used to calculate a linear relationship between bytes_sent (x variable) and server_load_pct (y variable).

Calculating the relationship between server load percentage and total response size is useful for identifying operational patterns such as performance bottlenecks, resource allocation issues, or system optimization opportunities.

Sample output from the incoming example data:

| _slope | _intercept | _r2 | _n |
|---|---|---|---|
| 0.00010617525557193158 | 28.934098111407938 | 0.991172367336835 | 10 |

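_slope is the change in average server load percentage per additional byte sent in a time bucket.

_intercept is the estimated server load percentage when no bytes are sent.

_r2 is the adjusted R-squared value, which indicates how well the linear model fits the data. Values close to 1 indicate a close fit.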

_n is the total number of data points analyzed.
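The bucketing and fitting steps of this example can be sketched together in Python. The fixed bucket width SPAN_SECONDS is a hypothetical value chosen so that each example event falls into its own bucket, which matches the _n value of 10 in the sample output. The adjusted R-squared formula is the standard single-predictor form and is likewise an assumption.

```python
from collections import defaultdict
from datetime import datetime

# (timestamp, bytes_sent, server_load_pct) events from the example data
events = [
    ("2024-01-15T09:00:00Z", 156780, 45.2),
    ("2024-01-15T09:05:00Z", 234567, 52.8),
    ("2024-01-15T09:10:00Z", 189234, 48.6),
    ("2024-01-15T09:15:00Z", 345678, 65.3),
    ("2024-01-15T09:20:00Z", 123456, 42.1),
    ("2024-01-15T09:25:00Z", 278901, 58.7),
    ("2024-01-15T09:30:00Z", 198765, 51.4),
    ("2024-01-15T09:35:00Z", 287654, 59.2),
    ("2024-01-15T09:40:00Z", 167890, 46.8),
    ("2024-01-15T09:45:00Z", 298765, 61.5),
]

SPAN_SECONDS = 300  # hypothetical fixed bucket width (5 minutes)

# Step 1: assign each event to a time bucket
buckets = defaultdict(list)
for ts, bytes_sent, load in events:
    epoch = int(datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp())
    buckets[epoch - epoch % SPAN_SECONDS].append((bytes_sent, load))

# Per bucket: x = sum of bytes_sent, y = average server_load_pct
points = []
for start in sorted(buckets):
    members = buckets[start]
    x = sum(b for b, _ in members)
    y = sum(l for _, l in members) / len(members)
    points.append((x, y))

# Step 2: least-squares fit over the bucketed points
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
sxx = sum((x - mean_x) ** 2 for x, _ in points)
syy = sum((y - mean_y) ** 2 for _, y in points)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r2 = sxy ** 2 / (sxx * syy)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # assumed adjustment formula

print(n, slope, intercept, adj_r2)
# Estimated change in average load per additional 100,000 bytes in a bucket
print(slope * 100_000)
```

With these inputs the fitted values agree with the sample output to several significant digits. The last line prints roughly 10.6, which means each additional 100,000 bytes sent in a bucket corresponds to about 10.6 percentage points of average server load.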

Calculate Relationship Between X And Y Variables - Example 3

Calculate the linear relationship between server load and each of several request types using the linReg() function with bucket() and groupBy()

Query

bucket(function=[ avg(server_load_pct, as=y), groupBy(request_type, function=count(as=x)) ])
| groupBy(request_type, function=linReg(x=x, y=y))

Introduction

In this example, the linReg() function is used to calculate the linear relationship between the number of requests of each request_type per time bucket (x variable) and server_load_pct (y variable). The example shows the relationship between server load and each of several HTTP request types across time.

Example incoming data might look like this:

| @timestamp | server_load_pct | request_type |
|---|---|---|
| 2024-01-15T09:00:00.000Z | 45.2 | GET |
| 2024-01-15T09:00:00.000Z | 45.2 | POST |
| 2024-01-15T09:00:00.000Z | 45.2 | GET |
| 2024-01-15T09:05:00.000Z | 52.8 | GET |
| 2024-01-15T09:05:00.000Z | 52.8 | PUT |
| 2024-01-15T09:05:00.000Z | 52.8 | POST |
| 2024-01-15T09:10:00.000Z | 48.6 | GET |
| 2024-01-15T09:10:00.000Z | 48.6 | GET |
| 2024-01-15T09:10:00.000Z | 48.6 | DELETE |
| 2024-01-15T09:15:00.000Z | 65.3 | POST |
| 2024-01-15T09:15:00.000Z | 65.3 | POST |
| 2024-01-15T09:15:00.000Z | 65.3 | GET |
| 2024-01-15T09:20:00.000Z | 42.1 | GET |
| 2024-01-15T09:20:00.000Z | 42.1 | PUT |
| 2024-01-15T09:20:00.000Z | 42.1 | GET |

Step-by-Step
  1. Starting with the source repository events.

  2. bucket(function=[ avg(server_load_pct, as=y), groupBy(request_type, function=count(as=x)) ])

    Buckets the data points by time and calculates the average server load for each time bucket, returning the result in a field named y. Within each time bucket, it also groups the events by the request_type field and counts the requests of each type, returning the result in a field named x.

  3. | groupBy(request_type, function=linReg(x=x, y=y))

    Fits a least-squares line to the x and y values for each HTTP request type, showing the relationship between request count and server load per type. The results are output in fields named _slope (slope value), _intercept (intercept value), _r2 (adjusted R-squared value), and _n (number of data points). These four values indicate the strength and reliability of the relationship.

  4. Event Result set.

Summary and Results

The query is used to analyze how different HTTP request types affect server load. The analysis helps identify which HTTP request types have the strongest impact on server performance.

Sample output from the incoming example data:

| request_type | _slope | _intercept | _r2 | _n |
|---|---|---|---|---|
| DELETE | <no value> | <no value> | <no value> | <no value> |
| GET | -13.749999999999941 | 72.7999999999999 | 0.5941824574313592 | 5 |
| POST | 16.29999999999992 | 32.70000000000012 | 0.7196207242484238 | 3 |
| PUT | <no value> | <no value> | <no value> | <no value> |

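_slope is the change in server load percentage per additional request of that type in a time bucket.

_intercept is the baseline server load when there are no requests of a specific type.

_r2 is the adjusted R-squared value, which indicates how well the linear model fits the data for that request type. Values close to 1 indicate a close fit.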

_n is the total number of data points analyzed.
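The per-type results above, including the rows with no value, can be reproduced with a Python sketch. It assumes that each distinct timestamp in the example data falls into its own time bucket and that the adjusted R-squared uses the standard single-predictor form. The helper fit is hypothetical, and returning no result for fewer than three points is an assumption of the sketch.

```python
from collections import Counter, defaultdict

# (timestamp, server_load_pct, request_type) events from the example data
events = [
    ("09:00", 45.2, "GET"), ("09:00", 45.2, "POST"), ("09:00", 45.2, "GET"),
    ("09:05", 52.8, "GET"), ("09:05", 52.8, "PUT"), ("09:05", 52.8, "POST"),
    ("09:10", 48.6, "GET"), ("09:10", 48.6, "GET"), ("09:10", 48.6, "DELETE"),
    ("09:15", 65.3, "POST"), ("09:15", 65.3, "POST"), ("09:15", 65.3, "GET"),
    ("09:20", 42.1, "GET"), ("09:20", 42.1, "PUT"), ("09:20", 42.1, "GET"),
]


def fit(points):
    """Least-squares fit; returns None when no line can be determined."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Assumption: fewer than three points, or constant x or y, gives no result
    if n < 3 or min(xs) == max(xs) or min(ys) == max(ys):
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    return {
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r2": 1 - (1 - r2) * (n - 1) / (n - 2),
        "n": n,
    }


# Step 1: per time bucket, collect load values and count requests per type
loads = defaultdict(list)
counts = Counter()
for ts, load, request_type in events:
    loads[ts].append(load)
    counts[(ts, request_type)] += 1

# One (x, y) point per bucket and type: x = request count, y = average load
points = defaultdict(list)
for (ts, request_type), x in counts.items():
    y = sum(loads[ts]) / len(loads[ts])
    points[request_type].append((x, y))

# Step 2: fit one line per request type
results = {rt: fit(pts) for rt, pts in sorted(points.items())}
for request_type, result in results.items():
    print(request_type, result)
```

In this data DELETE appears in a single bucket and PUT appears once in each of two buckets, so neither has enough variation in x to determine a line. That is consistent with the note that nothing is output when all x values are the same.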