28 May 2025

Per-backend IO stats in PostgreSQL 18

One of my favorite new PostgreSQL 18 features which hasn’t had much attention yet is tracking session-level IO stats. It’s going to be really helpful for more quickly finding problematic queries in real-time – kudos in particular to Bertrand Drouvot for the work on it!

What’s the problem?

It’s easy enough to know when you’re under IO pressure – views like pg_stat_io give you a high-level understanding, as can OS-level or disk-level tools & metrics. But until now, working out where that activity is coming from could be quite an involved process.

For in-flight queries, often I’ve had to ssh into a machine, using tools like iotop to work out which processes are heavy on IO, before marrying those up to process ids in pg_stat_activity to work out which query is responsible. It’s fiddly, and of course if you’re using a DBaaS offering to run PostgreSQL, you most likely won’t have that option anyway because of the limited access to the underlying environment.

You’ll also see a query’s IO impact in pg_stat_statements by looking at columns like shared_blks_read (assuming that extension is installed). But that logs queries only after they’ve finished executing, which won’t help you if the guilty one hasn’t finished running yet. That’s often the case for example with ad-hoc reporting queries, which can be notoriously IO-heavy, and are invisible in pg_stat_statements while they’re still in flight. Even assuming a query has finished running, the pg_stat_statements view typically shows each query’s cost over all time, so unless you use snapshotting techniques it’s also tricky to work out whether the reported IO activity is recent, or just relates to something which might have happened months ago.

Enter per-backend IO stats

PostgreSQL 18 introduces two new functions, pg_stat_get_backend_io() and pg_stat_get_backend_wal(). When passed the id of a process, those two functions provide a wealth of information covering reads, writes, evictions, fsyncs, timing (if track_io_timing / track_wal_io_timing are enabled), write-ahead log use, and more.

Example output of pg_stat_get_backend_io(pid)

Example output of pg_stat_get_backend_wal(pid)

More importantly, you can combine this information with pg_stat_activity, allowing you to identify which sessions are responsible for most IO. For example:

SELECT

a.pid,

any_value(a.query) query,

pg_size_pretty(sum(io.read_bytes)) read,

pg_size_pretty(sum(io.write_bytes)) written,

pg_size_pretty(sum(wal.wal_bytes)) wal

FROM pg_stat_activity a

LEFT JOIN pg_stat_get_backend_io(a.pid) io ON true

LEFT JOIN pg_stat_get_backend_wal(a.pid) wal ON true

GROUP BY a.pid

HAVING sum(io.read_bytes) > 0

ORDER BY sum(io.read_bytes) DESC

Example output of identifying sessions with most read activity

This data is so useful that in pgNow, Redgate’s free PostgreSQL troubleshooting tool, we show it by default when looking at live session information on a PostgreSQL 18 instance. It’s the ideal way to quickly determine which sessions are most likely to be worth exploring, without the complexity that other approaches have needed in the past. Having identified a session of interest, the drilldown lets you dig into richer information like stats on the tables referenced by the query, an explain plan, and more.

Session-level IO activity in pgNow

You can also easily use this data to group IO activity by other facets, like the Login or Application, further simplifying exploring the data.

Session-level IO activity grouped by application in pgNow

A word of warning

PostgreSQL records these stats for the lifetime of a session, so when a connection gets reused to run multiple queries, the data you’re seeing is the total cost of all queries which have been executed on that connection.

That means, for example, that if you have a web app which uses connection pooling to maintain a few live connections, what you’re seeing in this view is the cost of all the workload which happens to have been run on that same connection, while the reported query is just the most recent one which was executed on it.

When looking at a session with high IO, it’s certainly therefore possible that it was a previous query on the connection, rather than the most recent one, which was responsible for that IO activity – do bare that in mind.

When can you use this?

We’re a few months away from the official PostgreSQL 18 release so it’s not quite ready for production use yet, but Beta 1 is available for testing now. This information is shown in pgNow v0.2.12+.

I’d be interested to hear whether you’re similarly excited about this capability!

Redgate Test Data Manager

Redgate Flyway

Redgate Monitor

Overview

Protect

Automate

Monitor

Redgate Blog

Per-backend IO stats in PostgreSQL 18

What’s the problem?

Enter per-backend IO stats

A word of warning

When can you use this?

Redgate Test Data Manager

Redgate Flyway

Redgate Monitor

Overview

Protect

Automate

Monitor

What’s the problem?

Enter per-backend IO stats

A word of warning

When can you use this?

You may also like