
Explaining PostgreSQL's EXPLAIN command


Context

SQL is a declarative language: we use it to tell the database engine what we would like to do, delegating to the engine the decision on how to do it in the most efficient way.

Every now and then we may encounter a query that takes much longer to run than we expect: in those situations we can use the EXPLAIN command to understand what is causing the slowdown and see if there are opportunities to improve it, either by optimizing the query or by helping Postgres (e.g. by adding or changing indexes).

While it is an amazing tool of the trade, EXPLAIN produces output that is quite cryptic unless you know how to read it, so I’m hoping to provide a reading lens here.

How Postgres executes queries

A short digression to introduce some terms.

When we submit a query to Postgres, the database engine starts a process to determine the best way to execute it. The following is an extremely simplified overview of the steps involved:

flowchart LR;
    A(SQL Query) -->|Parse| B(Query Plan Trees)
    B -->|Optimize| C(Execution Plan)
    C -->|Execute| D(Result)

The SQL query is converted into a data structure called a query plan tree, where each part of the query becomes a branch. There can be several possible ways to obtain the desired results, each represented by a different tree, so the optimizer performs a cost analysis and selects the optimal execution plan according to the query’s characteristics and the statistics of the tables involved. The execution plan is then executed, returning its results.
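If you are curious about those intermediate trees, Postgres can dump them to the server log (a sketch; debug_print_parse and debug_print_plan are standard server parameters, and their output goes to the log, not to the client):

-- Dump the internal trees to the server log for subsequent statements.
SET debug_print_parse = on;
SET debug_print_plan = on;

SELECT 1;  -- any statement will do; the trees appear in the server log

RESET debug_print_parse;
RESET debug_print_plan;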

The EXPLAIN command

There are a few flavours to it, the two main ones being:

  • EXPLAIN: returns the execution plan and an estimation of the costs of executing the query, without actually executing it.

  • EXPLAIN ANALYZE: using this option, EXPLAIN also executes the query, so it is able to return the actual execution time on top of its usual output.
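Because ANALYZE actually executes the statement, it will also apply any changes a data-modifying statement makes. A common precaution, sketched here with a hypothetical table t, is to wrap the statement in a transaction and roll it back:

BEGIN;

-- t and x are hypothetical stand-ins for a real table and column.
EXPLAIN ANALYZE
UPDATE t SET x = x + 1;

ROLLBACK;  -- the UPDATE really ran, but its effects are discarded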

It’s important to note here that estimated costs are not expressed in time units:

The most critical part of the display is the estimated statement execution cost […] (measured in cost units that are arbitrary, but conventionally mean disk page fetches).

source: https://www.postgresql.org/docs/current/sql-explain.html

Bigger values thus do indicate longer execution times, but they cannot be directly converted into them.
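The baseline for those arbitrary units is configurable: the planner prices every operation relative to a handful of cost parameters, using a sequential disk page fetch as the reference point:

-- Planner cost parameters, shown here with their default values; all
-- estimated costs are multiples of a sequential page fetch.
SHOW seq_page_cost;     -- 1.0: fetching a page sequentially
SHOW random_page_cost;  -- 4.0: fetching a page non-sequentially
SHOW cpu_tuple_cost;    -- 0.01: processing a single row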

Example schema

Let’s take this example schema (I have omitted fields that are not relevant to the queries for brevity’s sake):

create table customers (
    id bigserial primary key,
    ...
);

create table invoices (
    id          bigserial primary key,
    customer_id bigint not null references customers,
    ...
);

I’ve dumped some data into the schema using a script, so each table now has a few hundred rows.
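The script itself isn’t shown here, but a minimal sketch of such a seeding step could look like this (the row counts are my assumption, picked to roughly match the actual row counts in the plans below):

-- Hypothetical seeding script, not the original one.
INSERT INTO customers (id)
SELECT g FROM generate_series(1, 49) AS g;

INSERT INTO invoices (customer_id)
SELECT (random() * 48)::bigint + 1  -- a random customer between 1 and 49
FROM generate_series(1, 280);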

Note: Table size affects the execution plan choice: for example, if a table has a small number of rows, Postgres might decide that it’s more efficient to simply sequentially scan the table rather than using the index (or the other way around). For this reason, I’m not going to dig into the details of why Postgres chose one strategy over another in the examples below.
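One way to observe this effect is to temporarily discourage sequential scans for the current session and compare the plans; enable_seqscan is a standard planner setting (a sketch, using the example schema):

-- With sequential scans discouraged, the optimizer will use the primary
-- key index on customers if it possibly can.
SET enable_seqscan = off;

EXPLAIN SELECT * FROM customers WHERE id = 1;

RESET enable_seqscan;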

The execution plan

Let’s now use EXPLAIN to see what steps Postgres will take to run a query that selects all invoices along with their customers’ details:

EXPLAIN  
SELECT * FROM invoices i
JOIN customers c ON i.customer_id = c.id;

Result:

Hash Join  (cost=21.02..27.57 rows=280 width=187)
  Hash Cond: (i.customer_id = c.id)
  ->  Seq Scan on invoices i  (cost=0.00..5.80 rows=280 width=51)
  ->  Hash  (cost=14.90..14.90 rows=490 width=136)
        ->  Seq Scan on customers c  (cost=0.00..14.90 rows=490 width=136)

Hash Join is the root of the tree and every line starting with -> is a child node, so we can represent the above as follows:

flowchart TD;
    A[Hash Join] --- B[Seq Scan on invoices]
    A --- C[Hash]
    C --- D[Seq Scan on customers]

Executing such a tree means evaluating all its nodes, starting from its leaf nodes and then walking up towards the root. For this query, the execution order will be:

  1. Sequential Scan on customers
  2. Hash
  3. Sequential Scan on invoices
  4. Hash Join

The Hash node has to consume the entire customers scan to build its hash table before the Hash Join can start pulling rows from the invoices scan and probing them against it.

Going back to the text output, the figures between parentheses are the planner’s estimates:

  • cost=x..y are the estimated startup and total costs, respectively
  • rows=z is the estimated count of rows selected by the node
  • width=w is the estimated average size of data that will be returned for each row, in bytes
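These estimates come from statistics that Postgres keeps about each table (notice, further down, that the plan estimates 490 customer rows while only 49 actually come back); refreshing and inspecting them is straightforward (a sketch using the standard pg_class catalog):

-- Refresh the statistics the planner bases its estimates on.
ANALYZE customers;
ANALYZE invoices;

-- Inspect the planner's view of each table's size.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('customers', 'invoices');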

The startup cost is the estimated cost a node incurs before it can return its first row. A startup cost of 0.00 means the node can start producing rows immediately; a nonzero startup cost means some work, typically the complete evaluation of a child node, has to happen first. For example, the evaluation of

Seq Scan on customers c  (cost=0.00..14.90 rows=490 width=136)

starts as soon as the execution starts, whereas the evaluation of

Hash  (cost=14.90..14.90 rows=490 width=136)

cannot return anything until the sequential scan on customers completes, which is why its startup cost equals the scan’s total cost of 14.90.
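Sort nodes make startup cost especially visible: a sort has to consume its whole input before it can emit the first row, so its startup cost is close to its total cost. A quick way to see one (a sketch; the output is not shown here):

EXPLAIN
SELECT * FROM invoices
ORDER BY customer_id;

-- The plan will contain a Sort node whose startup cost is nonzero,
-- because sorting must finish before the first row can be returned.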

Running EXPLAIN ANALYZE on the same query yields a very similar result, with the addition of the actual per-node timings and row counts, plus the overall planning and execution times:

Hash Join  (cost=21.02..27.57 rows=280 width=187) (actual time=0.024..0.101 rows=280 loops=1)
  Hash Cond: (i.customer_id = c.id)
  ->  Seq Scan on invoices i  (cost=0.00..5.80 rows=280 width=51) (actual time=0.005..0.022 rows=280 loops=1)
  ->  Hash  (cost=14.90..14.90 rows=490 width=136) (actual time=0.012..0.012 rows=49 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 11kB
        ->  Seq Scan on customers c  (cost=0.00..14.90 rows=490 width=136) (actual time=0.002..0.005 rows=49 loops=1)
Planning Time: 0.221 ms
Execution Time: 0.122 ms

As queries become more complex, the resulting execution plans will also become more complex, but these are the basics to read them.

The advantage of using ANALYZE is that we get the actual execution time; the disadvantage is that the query actually needs to be executed, so it may not be a viable option, for example if we are trying to debug a query that puts excessive load on the database engine. In those situations, it’s better to start with vanilla EXPLAIN and look for opportunities for improvement: depending on the situation, that may mean reworking the query, adding or changing indexes, or even redesigning the schema if it really comes to that.
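For instance, a typical first experiment on the example schema (an illustration on my part, not something measured above) would be to index the join column and compare the resulting plan:

-- Hypothetical follow-up: index the foreign key used in the join, then
-- re-run EXPLAIN to see whether the optimizer picks a different plan.
CREATE INDEX invoices_customer_id_idx ON invoices (customer_id);

EXPLAIN
SELECT * FROM invoices i
JOIN customers c ON i.customer_id = c.id;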