The problem with step alignment in Prometheus

Prometheus has a fixed alignment for all queries, and the user cannot change it. You can specify an arbitrary step, but you have no control over where the step boundaries fall. Usually this is not an issue, since you rarely care where exactly your 15m or 1h boundaries are. It does become a problem with larger steps, such as a weekly step, where the boundaries are obvious and often meaningful to humans. In the weekly case, the boundaries fall on Thursday 00:00 UTC. This is an artifact of the internal implementation, most likely because the Unix epoch (1970-01-01) was a Thursday, so timestamps that are whole multiples of one week land on Thursdays. A further unfortunate effect is that the data from the last step boundary up to the end of the query range is missing from the result.
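To see where a Thursday boundary could come from, here is a minimal Python sketch. It assumes that boundaries are placed at whole multiples of the step counted from the Unix epoch, which matches the behaviour described above but is not taken from the Prometheus source. The helper name align_to_step and the example date are made up for illustration.

```python
from datetime import datetime, timezone

WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800 seconds, i.e. a 168h step


def align_to_step(ts: int, step: int = WEEK_SECONDS) -> int:
    """Floor a Unix timestamp to the closest lower multiple of the step.

    Assumption: this mirrors how the step boundaries are chosen.
    """
    return ts - (ts % step)


# Hypothetical end of the query range: Tuesday 2024-03-12 15:30 UTC
query_end = int(datetime(2024, 3, 12, 15, 30, tzinfo=timezone.utc).timestamp())

boundary = align_to_step(query_end)
boundary_dt = datetime.fromtimestamp(boundary, tz=timezone.utc)

# Prints: Thursday 2024-03-07 00:00 UTC
print(boundary_dt.strftime("%A %Y-%m-%d %H:%M UTC"))
```

Under this assumption, everything between Thursday 00:00 and the Tuesday afternoon end of the range lies after the last boundary, which is the missing tail mentioned above.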

A very hacky solution

There is, to the best of my knowledge, no clean way to solve this issue. What we can do, however, is reduce the step (e.g. from 1w to 1d) and filter out the irrelevant datapoints, which brings us reasonably close to the result we are looking for. Keep in mind that this may cause significant computation overhead: with a 1d step the expression is evaluated seven times as often over the same range as with a 1w step.

Take for example a basic query for the number of HTTP requests per week, with a step of one week (Grafana Playground):

Expr: sum(increase(prometheus_http_requests_total[168h]))
Step: 168h0m0s

This query exhibits the problem described above: the datapoints are aligned to Thursdays. What you see depends on the day on which you run the query. If it happens to be a Thursday, everything will look quite reasonable; in that case, simply shift the range start and end to a different day.

We can now change the step to 24h to evaluate the query on all days of the week (Grafana Playground):

Expr: sum(increase(prometheus_http_requests_total[168h]))
Step: 24h0m0s

This gives us the data we want, but it still contains superfluous datapoints. We can filter the result so that only datapoints for a specific weekday are returned, here Sunday, since day_of_week() returns 0 for Sunday (Grafana Playground):

Expr: sum(increase(prometheus_http_requests_total[168h])) and on() day_of_week() == 0
Step: 24h0m0s
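The effect of the filter can be simulated outside of Prometheus. The following Python sketch is only an illustration under stated assumptions: the evaluation timestamps are assumed to fall on midnight UTC, and the names prom_day_of_week and filter_weekday are hypothetical helpers, not part of any Prometheus or Grafana API.

```python
from datetime import datetime, timedelta, timezone


def prom_day_of_week(dt: datetime) -> int:
    """Mimic PromQL day_of_week(): 0 = Sunday, ..., 6 = Saturday."""
    # isoweekday() is 1 (Monday) .. 7 (Sunday), so modulo 7 maps Sunday to 0.
    return dt.isoweekday() % 7


def filter_weekday(start: datetime, days: int, weekday: int = 0) -> list:
    """Generate one evaluation point per day, keep only the given weekday."""
    points = [start + timedelta(days=i) for i in range(days)]
    return [p for p in points if prom_day_of_week(p) == weekday]


# Hypothetical three-week range starting on Friday 2024-03-01
start = datetime(2024, 3, 1, tzinfo=timezone.utc)
kept = filter_weekday(start, days=21)

for point in kept:
    print(point.date())  # 2024-03-03, 2024-03-10, 2024-03-17
```

Of the 21 daily evaluations only three survive the filter, one per week, which is the weekly series we were after, just anchored on Sunday instead of Thursday.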

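The same approach extends to any other filter rule you can come up with. For example, to align the datapoints to the final day of the range selected in Grafana, you can compare against the weekday of $__to. Grafana provides $__to in milliseconds, hence the division by 1000 (Grafana Playground):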

Expr: sum(increase(prometheus_http_requests_total[168h])) and on() day_of_week() == day_of_week(vector($__to / 1000))
Step: 24h0m0s
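As a rough sketch of what this expression selects, the Python snippet below computes the weekday of the range end and keeps only the daily evaluation timestamps that share it. It assumes that the 24h evaluations fall on midnight UTC; to_ms stands in for Grafana's $__to, and the function names are hypothetical.

```python
from datetime import datetime, timezone

DAY_SECONDS = 24 * 60 * 60


def day_of_week(ts: int) -> int:
    """Weekday of a Unix timestamp in UTC, 0 = Sunday like PromQL."""
    # The Unix epoch (ts = 0) was a Thursday, which is 4 in this numbering.
    return (ts // DAY_SECONDS + 4) % 7


def aligned_to_range_end(eval_timestamps, to_ms: int) -> list:
    """Keep evaluations whose weekday matches the weekday of the range end."""
    target = day_of_week(to_ms // 1000)  # milliseconds to seconds
    return [ts for ts in eval_timestamps if day_of_week(ts) == target]


# Hypothetical range end: Tuesday 2024-03-12 15:30 UTC, in milliseconds
to_ms = int(datetime(2024, 3, 12, 15, 30, tzinfo=timezone.utc).timestamp() * 1000)

# 21 daily evaluations at midnight UTC, the last one on the final day
last_midnight = (to_ms // 1000 // DAY_SECONDS) * DAY_SECONDS
eval_timestamps = [last_midnight - i * DAY_SECONDS for i in range(20, -1, -1)]

kept = aligned_to_range_end(eval_timestamps, to_ms)
for ts in kept:
    print(datetime.fromtimestamp(ts, tz=timezone.utc).date())
```

With this range end the surviving datapoints are all Tuesdays, so the last weekly datapoint sits on the final day of the selected range rather than on the preceding Thursday.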

Tips

  • In the standard Grafana “Time series” visualization this may result in large gaps between the individual datapoints and line charts will not be connected properly. This can be mitigated by switching to the “Bar chart” visualization.

References