estimating and reclaiming bloat in PostgreSQL
Table of Contents
Context #
Postgres tables and indexes may become bloated over time, especially when their contents change frequently and significantly. Bloat in Postgres means that they use more disk space than they actually need, causing query performance degradation on top of the additional storage requirements.
For this reason, Postgres provides several ways to keep bloat under control, either automatically or manually, performing an activity called vacuuming (from the name of the command used to execute it, VACUUM
).
Detecting bloat #
This repository provides queries to estimate table and index bloat in Postgres, so it’s a great starting point to understand if vacuuming is needed at all.
Reclaiming bloat #
Reclaiming bloat is achieved by running
VACUUM <my_table>
, orVACUUM FULL <my_table>
The first form can run in parallel with other operations (except for those modifying the table definition) and performs a soft space reclamation: what I mean with this is that it identifies obsolete row versions in tables and indexes, marking the space they occupy as available for reuse to the engine, without returning that space to the operating system.
The second form requires an exclusive table lock, meaning that it cannot run in parallel with any other database operation. It does, however return bloat space to the operating system.
Automatic vacuuming #
To automate the execution of VACUUM
, Postgres has an “optional but highly recommended” (quoting from the documentation) feature called autovacuum. This feature consists of a set of processes that periodically check if vacuuming is needed for some tables, according to a number of parameters.
Automatic or manual? #
Both: enabling automatic vacuuming is, in most situations, highly desirable. Automatic vacuuming tries its best to perform maintenance while keeping its impact low, so it might not be able to do enough work on its own. A concrete example: if we have just imported a significant amount of data into the table because of a data migration, manually reclaiming bloat space will almost surely help the engine to choose the right query execution plan, without waiting for automatic vacuuming to kick in.