Infrastructure at your Service

PostgreSQL 9.6 introduced a new view called pg_stat_progress_vacuum which gives information about currently running vacuum processes. That was a great addition because since then you could easily estimate on how long a specific vacuum process will need to complete. PostgreSQL 12 will use the same infrastructure to extend that to more operations.

The first operation that can now be tracked is cluster. For that a new view is available which gives the following information:

postgres=# \d pg_stat_progress_cluster
           View "pg_catalog.pg_stat_progress_cluster"
       Column        |  Type   | Collation | Nullable | Default 
---------------------+---------+-----------+----------+---------
 pid                 | integer |           |          | 
 datid               | oid     |           |          | 
 datname             | name    |           |          | 
 relid               | oid     |           |          | 
 command             | text    |           |          | 
 phase               | text    |           |          | 
 cluster_index_relid | bigint  |           |          | 
 heap_tuples_scanned | bigint  |           |          | 
 heap_tuples_written | bigint  |           |          | 
 heap_blks_total     | bigint  |           |          | 
 heap_blks_scanned   | bigint  |           |          | 
 index_rebuild_count | bigint  |           |          | 

As always, lets generate a sample table and some indexes so we have something cluster can work on:

postgres=# create table t1 as select a,md5(a::text) as txt, now() as date from generate_series(1,3000000) a;
SELECT 3000000
postgres=# create index i1 on t1(a);
CREATE INDEX
postgres=# create index i2 on t1(txt);
CREATE INDEX
postgres=# create index i3 on t1(date);
CREATE INDEX
postgres=# \d t1
                         Table "public.t1"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 txt    | text                     |           |          | 
 date   | timestamp with time zone |           |          | 
Indexes:
    "i1" btree (a)
    "i2" btree (txt)
    "i3" btree (date)

Once we cluster that table we should see the progress in pg_stat_progress_cluster, so in the first session:

postgres=# cluster verbose t1 using i1;
psql: INFO:  clustering "public.t1" using index scan on "i1"
psql: INFO:  "t1": found 0 removable, 3000000 nonremovable row versions in 28038 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.82 s, system: 0.55 s, elapsed: 1.87 s.
CLUSTER
postgres=# 

… and in a second session:

postgres=# select * from pg_stat_progress_cluster;
 pid  | datid | datname  | relid | command |      phase       | cluster_index_relid | heap_tuples_scanned | heap_tuples_written | heap_blks_total | heap_blks_scanned | index_rebuild_count 
------+-------+----------+-------+---------+------------------+---------------------+---------------------+---------------------+-----------------+-------------------+---------------------
 1669 | 13586 | postgres | 16384 | CLUSTER | rebuilding index |               16390 |             3000000 |             3000000 |               0 |                 0 |                   2
(1 row)

Nice. And the same is now available when indexes get created:

postgres=# \d pg_stat_progress_create_index 
        View "pg_catalog.pg_stat_progress_create_index"
       Column       |  Type   | Collation | Nullable | Default 
--------------------+---------+-----------+----------+---------
 pid                | integer |           |          | 
 datid              | oid     |           |          | 
 datname            | name    |           |          | 
 relid              | oid     |           |          | 
 phase              | text    |           |          | 
 lockers_total      | bigint  |           |          | 
 lockers_done       | bigint  |           |          | 
 current_locker_pid | bigint  |           |          | 
 blocks_total       | bigint  |           |          | 
 blocks_done        | bigint  |           |          | 
 tuples_total       | bigint  |           |          | 
 tuples_done        | bigint  |           |          | 
 partitions_total   | bigint  |           |          | 
 partitions_done    | bigint  |           |          | 

2 Comments

  • Hans says:

    I find the choice of one pg_stat_progress_XXX view per command a bit unfortunate.
    A single pg_stat_progress with a column named “command” would be much easier to use for monitoring purposes.

    • Daniel Westermann says:

      I didn’t check but I am pretty sure you can find the answer to that in the mail thread of these features. There are reasons for both implementations, I guess.

      Cheers,
      Daniel

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure