tags : Infrastructure,Database,Observability

PromQL

Alerting

Read

Doubts ⚠

Placement of by, are these the same?

sum by (job)(rate(http_requests_total{job="node"}[5m]))
sum (rate(http_requests_total{job="node"}[5m]))  by (job)
  • what are modifiers (offset, bool)

TODO Range Vector vs Range Selector vs Range Query?

  • Different things?
  • PromQL query and HTTP query different things obviously
| Terms | Description |
| --- | --- |
| Range Vector | |
| Range Vector Selector | |
| Range Query (HTTP) | |
| Range function | |
| Instant Vector | |
| Instant Vector Selector | |
| Instant Query (HTTP) | |

Instant vector

Range Vector

Range Vector Selector

Range function

Instant vs Range Query (HTTP)

FAQ

Other Notes

TSDB vs AWS Cloudwatch

Data Model

# stores values over time where the identifier stays the same.
<identifier> -> [(t0,v0), (t1,v1), ...]
t0 = int64, Unix timestamp in milliseconds
v0 = float64
# each of the following are totally different time series
prometheus_http_requests_total{code="200", handler="/api/v1/query", instance="localhost:9090", job="prometheus"}
prometheus_http_requests_total{code="200", handler="/graph", instance="localhost:9090", job="prometheus"}
  • We use metric name with labels for the identifier
  • Vertical writes: In a short timeframe, we update every current active time series not necessarily at the same time but in one scrape cycle.
  • On disk: Each two-hour block consists of a directory containing one or more chunk files that contain all time series samples for that window of time, as well as a metadata file and index file.
  • index file: indexes metric names and labels to time series in the chunk files.
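
To make the identifier-to-samples idea concrete, here is a minimal in-memory sketch. It is not how the Prometheus TSDB is implemented; `series_id`, `store` and `append` are hypothetical names used only for illustration.

```python
from collections import defaultdict

def series_id(name, labels):
    """Build a hashable identifier from a metric name and its labels."""
    return (name, tuple(sorted(labels.items())))

# identifier -> list of (timestamp_ms, value) samples, appended in time order
store = defaultdict(list)

def append(name, labels, t_ms, value):
    """Append one sample (int64-like timestamp, float64-like value) to a series."""
    store[series_id(name, labels)].append((int(t_ms), float(value)))

# One scrape cycle writes one new sample to every active series (a "vertical write").
name = "prometheus_http_requests_total"
graph = {"code": "200", "handler": "/graph"}
query = {"code": "200", "handler": "/api/v1/query"}
append(name, graph, 1000, 3)
append(name, query, 1000, 7)
# Next scrape cycle for one of the series.
append(name, graph, 16000, 5)
```

Changing any label value produces a different key, which is why the two `prometheus_http_requests_total` lines above are totally different time series.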

Functions & Operators

| Type | Sub Types | Operands (expression types) | Label dropping |
| --- | --- | --- | --- |
| Operators | Binary | Scalar & Instant Vectors | Does not drop labels, unless it's the match label |
| | Aggregation | Instant vectors | Drops based on use of by / without |
| Functions | - | Can be anything depending on the function | |
| Modifiers/keywords | - | Depends, e.g. bool, on, ignoring, group_left etc. | |

Operators

Binary

  • Types
    • Arithmetic binary operators + - * / % ^
    • Comparison binary operators == != > < >= <=
    • Logical/set binary operators and (intersection) | or (union) | unless (complement)
  • Works with both scalar and instant vectors; let & be some binary operator.
  • See official docs for proper info
| Operation Type | Evaluation | Resultant | Impact |
| --- | --- | --- | --- |
| Arithmetic (+, - etc.) | Scalar & Scalar | Scalar | Scalar |
| | Scalar & Instant Vector | Instant vector | Metric name dropped |
| | Instant Vector & Instant Vector | Instant vector | Metric name dropped, non-matching entries dropped, group labels added |
| Comparison (==, !=, >= etc.) | Scalar & Scalar | Scalar | w bool: 0 or 1 |
| | Scalar & Instant Vector | Instant vector | w/o bool: drop, w bool: 0 if false, 1 if true |
| | Instant Vector & Instant Vector | Instant vector | Filter, w/o bool: drop not matching, w bool: 0 if false, 1 if true |
| Logical (and, or, etc.) | Scalar & Scalar | DOES NOT APPLY | |
| | Scalar & Instant Vector | DOES NOT APPLY | |
| | Instant Vector & Instant Vector | Instant vector | Result depends on or, and / unless |
  • Matching for instant vectors (on and ignoring)

    • When doing instant vector x instant vector operation, we need to do matching of LHS & RHS
      • i.e For an operation to happen between instant-vector (s), they must “match”. To make things match, we can use on and ignoring as needed.
      • Simply using on and ignoring handles 1-1 matches, which is OK in most cases.
      • If we want n-1 / 1-n we need to use the group_right, group_left modifiers.
    • matching happens based on two keywords: on and ignoring
      • These keywords are sort of part of the operation rather than the operand. i.e if you’re doing X / on(abc) Y it’s (X, / on(abc), Y). The on modifier says, only do / when things match on abc.
      • We provide the labels to match on to on/ignoring
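
The matching rules above can be sketched in Python. This is a simplified model of one-to-one matching only, not the engine's code: an instant vector is assumed to be a list of `(labels, value)` pairs, and `signature` and `binary_op` are hypothetical helper names.

```python
def signature(labels, on=None, ignoring=()):
    """Labels that take part in matching; the metric name is left out by default."""
    if on is not None:
        keep = set(on)
    else:
        keep = set(labels) - set(ignoring) - {"__name__"}
    return tuple(sorted((k, v) for k, v in labels.items() if k in keep))

def binary_op(lhs, rhs, op, on=None, ignoring=()):
    """One-to-one match of two instant vectors, then apply op to each matched pair."""
    rhs_by_sig = {}
    for labels, value in rhs:
        sig = signature(labels, on, ignoring)
        if sig in rhs_by_sig:
            # Two RHS series share a signature: this would need group_left/group_right.
            raise ValueError("many-to-one match is not handled in this sketch")
        rhs_by_sig[sig] = value
    out = []
    for labels, value in lhs:
        sig = signature(labels, on, ignoring)
        if sig in rhs_by_sig:  # LHS entries without a partner are dropped
            out.append((dict(sig), op(value, rhs_by_sig[sig])))
    return out

# Hypothetical sample data: errors per job and code, requests per job.
errors = [
    ({"__name__": "errors_total", "job": "api", "code": "500"}, 5.0),
    ({"__name__": "errors_total", "job": "web", "code": "500"}, 1.0),
]
requests = [({"__name__": "requests_total", "job": "api"}, 100.0)]

# Roughly: errors_total / on(job) requests_total
ratio_on = binary_op(errors, requests, lambda a, b: a / b, on=["job"])
# Roughly: errors_total / ignoring(code) requests_total
ratio_ignoring = binary_op(errors, requests, lambda a, b: a / b, ignoring=["code"])
```

In both calls only the `job="api"` pair matches, so the `web` series is dropped and the metric name does not appear in the result.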

Aggregation

  • Only takes instant vectors as inputs and only returns instant vectors as outputs. E.g. sum, min, max, stddev, stdvar
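
A rough sketch of how `by` and `without` decide which labels survive an aggregation, assuming an instant vector is a list of `(labels, value)` pairs. `sum_agg` is a hypothetical helper, not a Prometheus API.

```python
from collections import defaultdict

def sum_agg(vector, by=(), without=None):
    """Sum an instant vector, grouping by the labels that survive by/without."""
    groups = defaultdict(float)
    for labels, value in vector:
        if without is not None:
            # without(...): keep every label except the listed ones and the metric name
            drop = set(without) | {"__name__"}
            key = {k: v for k, v in labels.items() if k not in drop}
        else:
            # by(...): keep only the listed labels; an empty list gives one single group
            key = {k: v for k, v in labels.items() if k in by}
        groups[tuple(sorted(key.items()))] += value
    return [(dict(k), v) for k, v in groups.items()]

# Hypothetical input vector.
vector = [
    ({"__name__": "http_requests_total", "job": "node", "instance": "a"}, 1.0),
    ({"__name__": "http_requests_total", "job": "node", "instance": "b"}, 2.0),
    ({"__name__": "http_requests_total", "job": "api", "instance": "c"}, 4.0),
]

by_job = sum_agg(vector, by=["job"])                  # like: sum by (job) (...)
without_instance = sum_agg(vector, without=["instance"])  # like: sum without (instance) (...)
total = sum_agg(vector)                               # like: sum(...)
```

For this input, `by (job)` and `without (instance)` give the same groups because `job` and `instance` are the only labels besides the metric name.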

Modifiers

  • bool is a modifier; you usually use it right after the operator (e.g. w/o bool: something == 1, w bool: something == bool 1)
  • on() with an empty label list matches on no labels, so all series on each side fall into a single match group
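
The difference between filtering and `bool` can be sketched as follows for a vector-to-scalar comparison. `compare` is a hypothetical helper and the `(labels, value)` representation is an assumption made for illustration.

```python
import operator

def compare(vector, op, scalar, use_bool=False):
    """Compare each sample to a scalar: filter by default, return 0/1 with bool."""
    out = []
    for labels, value in vector:
        ok = op(value, scalar)
        if use_bool:
            # With bool every series is kept, the value becomes 0 or 1,
            # and the metric name is dropped.
            no_name = {k: v for k, v in labels.items() if k != "__name__"}
            out.append((no_name, 1.0 if ok else 0.0))
        elif ok:
            # Without bool the comparison acts as a filter and keeps the original sample.
            out.append((labels, value))
    return out

# Hypothetical input vector.
up = [
    ({"__name__": "up", "job": "node"}, 1.0),
    ({"__name__": "up", "job": "api"}, 0.0),
]

filtered = compare(up, operator.eq, 1)                # like: up == 1
as_bool = compare(up, operator.eq, 1, use_bool=True)  # like: up == bool 1
```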

Functions

Takes argument of any promql type, gives output in any promql type.

rate(v range_vector)

  • It should be always used with counter variables as it calculates the per second increase of your counter in the specified time range.
  • It makes no sense to take the rate of a gauge.
  • It's a good rule to never compare raw counters and to always use rate(). rate makes use of all the datapoints in the range vector, unlike delta, and returns the per-second average rate of increase of the time series over the range.
  • Use rate for alerts and slow-moving counters. With irate, a small dip in the rate can reset the FOR clause of an alert, so prefer rate.
  • irate (instant rate) basically takes the rate of the last two samples, i.e. it looks back at the last two samples in the range as the window slides.
  • irate should only be used when graphing volatile, fast-moving counters. See comparison here.

When taking a rate, it's advisable to make the range at least 4x the scrape interval. With a scrape_interval of 15s, for example, the range should be at least 1m. rate needs at least two data points in the range to calculate the rate of increase, and the extra margin tolerates a missed scrape.
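
A simplified sketch of what rate and irate compute over the samples of one range vector. It handles counter resets but deliberately skips the extrapolation to the window boundaries that the real rate() performs, so the numbers will not match Prometheus exactly. `increase_with_resets`, `simple_rate` and `simple_irate` are hypothetical names.

```python
def increase_with_resets(samples):
    """Total counter increase across samples, treating a drop as a reset to 0."""
    total = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        # After a reset the counter restarted from 0, so the new value is the increase.
        total += value if value < prev else value - prev
        prev = value
    return total

def simple_rate(samples):
    """Per-second average increase over all samples in the range (no extrapolation)."""
    if len(samples) < 2:
        return None  # fewer than two data points: nothing to compute
    return increase_with_resets(samples) / (samples[-1][0] - samples[0][0])

def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    return simple_rate(samples[-2:])

# A 1m window at a 15s scrape interval: (timestamp_seconds, counter_value)
window = [(0, 100), (15, 115), (30, 130), (45, 145), (60, 220)]
avg = simple_rate(window)    # (220 - 100) / 60
last = simple_irate(window)  # (220 - 145) / 15
```

The spike in the last scrape dominates the irate result while rate averages it out, which is why rate is the safer choice for alerts.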

Querying

There are terms such as instant queries, range queries, instant and range vector selectors, offset modifiers, subqueries, the time, start and end query parameters, Grafana's $__interval, steps and resolutions. It can be pretty intimidating at first to understand how all of these relate. The documentation actually explains everything nicely, but here's a summary.

There are basically 2 ways you can select time series, they are called the time series selectors;

  • instant vector selectors
  • range vector selectors.

The offset modifier can be used to get historical instant or range vectors which can be useful for alerts when comparing against the past.

Instant vs Range vectors (Incorrect)

An instant vector selector returns, for each matching time series, the most recent sample at or before the evaluation time. This is how we get an instant vector at any (past or present) point in time even though scrapes happen only at specific intervals.

A range vector selector basically just dumps the raw samples in a window going back from the current instant. The timestamps for different time series will mostly be different, but it's interesting to notice that consecutive samples of a series mostly differ by the scrape_interval.
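
The two selectors can be sketched over the samples of a single series. This is a simplification: `instant_select` and `range_select` are hypothetical names, staleness markers are ignored, and treating the window as open on the left is an assumption (boundary handling has differed between Prometheus versions). The 5 minute lookback is the Prometheus default.

```python
LOOKBACK_SECONDS = 300  # default lookback delta is 5 minutes

def instant_select(samples, t, lookback=LOOKBACK_SECONDS):
    """Most recent sample at or before t, reported at the evaluation time t."""
    recent = [v for ts, v in samples if t - lookback < ts <= t]
    return (t, recent[-1]) if recent else None

def range_select(samples, t, window):
    """All raw samples in the window (t - window, t], with their own timestamps."""
    return [(ts, v) for ts, v in samples if t - window < ts <= t]

# One series scraped every 15s: (timestamp_seconds, value)
series = [(0, 1.0), (15, 2.0), (30, 3.0), (45, 4.0)]

at_40 = instant_select(series, 40)      # picks the sample scraped at t=30
too_old = instant_select(series, 400)   # newest sample is older than the lookback
last_30s = range_select(series, 45, 30)
```

Note that the instant result carries the evaluation timestamp, while the range result keeps the original scrape timestamps.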

The API

Request and Response combinations (INCORRECT)

  • THIS PART MIXES range query with range vector, which is super wrong. A range query can return an instant vector, a scalar and other things, i.e. ValScalar, ValVector, ValMatrix, ValString
  • See https://pkg.go.dev/github.com/prometheus/client_golang/api/prometheus/v1#API for QueryRange and its return type.
    • i.e range query and range vector have no direct relation
  • The following needs fixing; I'll come back later.
  • Instant Vector

    See the parameters here.

    | endpoint | query | notes | responseType |
    | --- | --- | --- | --- |
    | /query | instant vector with time | The Prometheus UI in console mode uses this. | vector |
    | /query | instant vector with time and offset | Even if offset is set and values will be different, the timestamp of an instant vector result is always that of the evaluation time. | vector |
    | /query_range | instant vector with start, end and step | The Prometheus UI in graph mode and Grafana use this to get instant vectors to plot. | matrix |
    | /query_range | instant vector with start, end, step and offset | Similar to /query with offset | matrix |
  • Range Vector

    See the parameters here.

    | endpoint | query | notes | responseType |
    | --- | --- | --- | --- |
    | /query | range vector with time | - | matrix |
    | /query | range vector with time and offset | The timestamps shown will be from the time of the offset | matrix |
    | /query_range | range vector with start, end and step | Not Allowed | - |
    | /query_range | range vector with start, end, step and offset | Not Allowed | - |
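
For reference, the two HTTP endpoints differ only in their time parameters. The sketch below just builds request URLs and sends nothing; the base URL is an assumed local Prometheus, and `instant_query_url` and `range_query_url` are hypothetical helper names.

```python
from urllib.parse import urlencode

def instant_query_url(base, query, time=None):
    """URL for /api/v1/query; time defaults to the server's current time if omitted."""
    params = {"query": query}
    if time is not None:
        params["time"] = time
    return f"{base}/api/v1/query?{urlencode(params)}"

def range_query_url(base, query, start, end, step):
    """URL for /api/v1/query_range; the query is evaluated at every step in [start, end]."""
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{base}/api/v1/query_range?{urlencode(params)}"

BASE = "http://localhost:9090"  # assumed address of a local Prometheus
instant_url = instant_query_url(BASE, "up", time=1700000000)
range_url = range_query_url(BASE, "up", start=1700000000, end=1700000060, step=15)
```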

Steps and Resolution

These are specific to range queries(/query_range).

Instant queries(/query) do not have a step parameter, as they are evaluated at a single point in time (the current time, or a custom time if set)

Resolution in the Prometheus world is measured in seconds, and a smaller step means a higher resolution, e.g. 1ms resolution > 1s resolution. A 1ms resolution will have far more noise. When querying Prometheus data, we use the step parameter of /query_range to set the resolution; the query is then evaluated at each step, independent of the stored samples. A larger step lets us cover larger time ranges at the cost of detail.

The terms query step, query resolution, resolution step, evaluation step and interval (source) are sometimes used interchangeably. The step, start and end parameters determine how many points you get back.

What if you specify a resolution of 5s for a time series scraped with scrape_interval set to 1m? Prometheus doesn't have more datapoints to evaluate!

This explanation is equally true for any value for the resolution. Sample timestamps for different time series can be at arbitrary intervals and not time-aligned with each other but you still need to be able to select multiple series and aggregate over them, so they need to be artificially aligned.

The way PromQL does this is by having an independent evaluation interval (the resolution step that you chose) for the query as a whole, independent of the underlying details of the data. So scrape_interval=1m and step=5s simply means that new data is put into the time series at every scrape_interval, but the query is evaluated at each step for every time series that is matched for that identifier. Functions like rate don't know whether they're being called as part of a range query, nor do they know what the step is.

promql engine basically runs an evaluation loop where we start at the range’s start timestamp, then increase the timestamp by interval on every iteration, and abort the loop when the timestamp becomes larger than the range’s end timestamp.

Example timeseries: [(t,v)] => [(1,1),(4,4),(7,7),(11,11),(15,15)]

Running the evaluation loop on this time series with start=0, end=16 and step=5 will give us [(5,4),(10,7),(15,15)]; the step at t=0 returns nothing.

PromQL selects the value of the sample that is the most recent before (or exactly at) the evaluation step timestamp. It never looks at "future" data, so for t=0 (start=0) the result is simply empty instead of 1.
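
The evaluation loop described above can be sketched in a few lines. This is a simplified model: `evaluate_range` is a hypothetical name, and the lookback limit and staleness handling of the real engine are left out.

```python
def evaluate_range(samples, start, end, step):
    """Evaluate an instant selector at start, start+step, ... up to end."""
    out = []
    t = start
    while t <= end:
        # Only samples at or before t are visible; "future" data is never used.
        past = [v for ts, v in samples if ts <= t]
        if past:
            out.append((t, past[-1]))  # most recent sample, stamped with the step time
        t += step
    return out

series = [(1, 1), (4, 4), (7, 7), (11, 11), (15, 15)]
result = evaluate_range(series, start=0, end=16, step=5)
```

The output timestamps follow the steps (5, 10, 15), not the original sample timestamps, and the step at t=0 contributes nothing because no sample exists yet.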

Prometheus UI and Grafana

Grafana has a few parameters/options to access the prometheus api endpoints in a few different ways, it’s best to consult the official grafana prometheus datasource doc.

Since Grafana has to deal with graphs and not just the metrics, it has to take extra care when plotting so that there are never more data points than the graph has pixels. It has a few options which can easily be confused with the parameters the Prometheus API takes.

Tweaking these Grafana options really depends on what one wants to see in the graph, but there are some variables and options which are not well stated in the official documentation.

  • $__interval (seconds): A global variable that is dynamically calculated from the time range and the width of the graph (the number of pixels). The $__from and $__to variables specify the time range. The formula is $__interval = ($__to - $__from) / resolution, where resolution means the display resolution (the width of the panel). For example, if the panel is 1099px wide and the time range is 24h = 86400s, $__interval = 86400/1099 ≈ 78s; 78s is then rounded up to 2m, which becomes the value of the $__interval variable.
  • When using the prometheus data source, the value of the $__interval variable is used as the step parameter for the prometheus range queries(/query_range)
  • In the Prometheus query editor we get another option called Resolution, which is a ratio (1/1, 1/2, …). Resolution here basically means how many pixels per datapoint (1/1 > 1/10). This modifies the $__interval variable, ergo the step.

Example,

  • If $__interval was 2m and Resolution was set to 1/1, step would be 2m.
  • If $__interval was 2m and Resolution was set to 1/2, step would be 2m x 2 = 4m.
  • The minimum value of step is (or can be) limited in various ways, both for the entire panel of queries (Min interval) and in the individual metrics queries (Min step)
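
The arithmetic in the examples above can be written out as a small sketch. It does not model how Grafana rounds the raw interval to a "nice" value, and `raw_interval_seconds` and `step_seconds` are hypothetical names rather than Grafana functions.

```python
def raw_interval_seconds(from_s, to_s, panel_width_px):
    """Unrounded interval: time range divided by the panel width in pixels."""
    return (to_s - from_s) / panel_width_px

def step_seconds(interval_s, resolution_denominator=1, min_step_s=0):
    """Step after applying a Resolution of 1/N and an optional minimum step."""
    return max(interval_s * resolution_denominator, min_step_s)

raw = raw_interval_seconds(0, 86400, 1099)      # 24h range on a 1099px wide panel
step_1_1 = step_seconds(120)                    # $__interval = 2m, Resolution 1/1
step_1_2 = step_seconds(120, 2)                 # $__interval = 2m, Resolution 1/2
step_min = step_seconds(120, 1, min_step_s=300) # a 5m minimum step wins over 2m
```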

  • In grafana you can make your rate() range intervals auto-adjust to fit the steps. One could use $__interval to choose an appropriate value such as rate(a_metric_name[$__interval]) for the range vector.

  • The range interval you likely want to use for rate() depends partly on your query step:

    • Might want the rate() range interval equal to the query step
    • Might want the rate() range somewhat more than the query step
    • If we do something like rate(metric_name[3s]) where the step is greater than the range of the vector, we'll undersample and skip over the data between steps. And if the range is shorter than about two scrape intervals, it covers at most one sample, and neither rate nor irate can operate on a single data point.
  • There are also the $__range and $__range_ms variables, which are basically to - from (CONFUSION!!!)

  • Takeaway: when working with Prometheus and Grafana there are 3 different meanings of the same word, resolution. The actual Prometheus resolution/step (seconds), the panel width resolution, and the Resolution ratio option of the Prometheus datasource.

Issues

  • In console mode, the older React UI increments the time by 10 mins per click, while the new React UI does so by 30 mins. Sometimes the classic UI increments the microseconds instead. Weird.
  • This article points to some more complex scenarios.

Staleness

Subquery