Vacuum in PostgreSQL is one of the most important points to consider when managing a PostgreSQL instance. Usually vacuum is running in the background and just gets the job done. But, as always, there are situations when you need to get a closer look at what is going on. PostgreSQL 9.6 (currently in Beta1) introduced a new view which allows to see the progress of the vacuum worker processes. Lets have a short look at it.

To have something available for vacuum to do lets generate some data:

(postgres@[local]:5432) [postgres] > create table t1 ( a int, b varchar(40), c timestamp );
CREATE TABLE
insert into t1 ( a, b, c )
select aa, bb, cc
  from generate_series(1,10000000) aa
     , md5(aa::varchar) bb
     , now() cc;
NSERT 0 10000000

Vacuuming this table right now does not make any sense as there are no tuples to cleanup. We’ll need to delete or update some data for this. Before doing this I’ll disable autovacuum so I can kick it off manually when I am ready with my data:

(postgres@[local]:5432) [postgres] > show autovacuum;
 autovacuum 
------------
 on
(1 row)

(postgres@[local]:5432) [postgres] > alter system set autovacuum='off';
ALTER SYSTEM
(postgres@[local]:5432) [postgres] > select * from pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

(postgres@[local]:5432) [postgres] > 

Having a look at the log file confirms that the autovacuum launcher process was shut down:

2016-05-17 07:33:02.648 CEST - 4 - 2703 -  - @ LOG:  received SIGHUP, reloading configuration files
2016-05-17 07:33:02.653 CEST - 5 - 2703 -  - @ LOG:  parameter "autovacuum" changed to "off"
2016-05-17 07:33:02.655 CEST - 2 - 2709 -  - @ LOG:  autovacuum launcher shutting down

Time to delete some data from the test table:

delete from t1
 where mod(a,7) = 0
;
DELETE 1428571

This should result in some dead tuples which we can check by querying pg_stat_user_tables:

(postgres@[local]:5432) [postgres] > select n_tup_del
                                          , n_live_tup
                                          , n_dead_tup 
                                       from pg_stat_user_tables 
                                      where relname='t1';
 n_tup_del | n_live_tup | n_dead_tup 
-----------+------------+------------
   1571428 |    8571429 |    1428571
(1 row)

The amount of dead tuples corresponds to the number of rows we deleted. Now we can start vacuum on the table and check the new pg_stat_progress_vacuum for what is going on in a seconds session.

Session 1:

postgres@[local]:5432) [postgres] > vacuum verbose t1;

Session 2:

(postgres@[local]:5432) [postgres] > x
Expanded display is on.
(postgres@[local]:5432) [postgres] > select * from pg_stat_progress_vacuum;

The result of the above command (executed twice with some seconds in between) is:

(postgres@[local]:5432) [postgres] > select * from pg_stat_progress_vacuum;
-[ RECORD 1 ]------+--------------
pid                | 3051
datid              | 13322
datname            | postgres
relid              | 16388
phase              | scanning heap
heap_blks_total    | 93458
heap_blks_scanned  | 1568
heap_blks_vacuumed | 1567
index_vacuum_count | 0
max_dead_tuples    | 291
num_dead_tuples    | 16


(postgres@[local]:5432) [postgres] > select * from pg_stat_progress_vacuum;
-[ RECORD 1 ]------+--------------
pid                | 3051
datid              | 13322
datname            | postgres
relid              | 16388
phase              | scanning heap
heap_blks_total    | 93458
heap_blks_scanned  | 90618
heap_blks_vacuumed | 90617
index_vacuum_count | 0
max_dead_tuples    | 291
num_dead_tuples    | 15

Pretty cool. Now we can estimate how long the operation will take or how much work is already done (when heap_blks_scanned comes close to heap_blks_total work should almost be done). According to the documentation and to the thread on hackers this is only the first view providing progress information…more will probably come in future releases.