We are slowly coming to the end of this little series about partitioning in PostgreSQL. In the last post we had a look at indexing and constraints and today we will have a look at sub partitioning. Sub partitioning means you go one step further and partition the partitions as well. Although it is not required to read all the posts of this series to follow this one: If you want, here they are:
- PostgreSQL partitioning (1): Preparing the data set
- PostgreSQL partitioning (2): Range partitioning
- PostgreSQL partitioning (3): List partitioning
- PostgreSQL partitioning (4) : Hash partitioning
- PostgreSQL partitioning (5): Partition pruning
- PostgreSQL partitioning (6): Attaching and detaching partitions
- PostgreSQL partitioning (7): Indexing and constraints
Coming back to our range partitioned table this is how it looks like currently:
postgres=# d+ traffic_violations_p Partitioned table "public.traffic_violations_p" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description -------------------------+------------------------+-----------+----------+---------+----------+--------------+------------- seqid | text | | | | extended | | date_of_stop | date | | | | plain | | time_of_stop | time without time zone | | | | plain | | agency | text | | | | extended | | subagency | text | | | | extended | | description | text | | | | extended | | location | text | | | | extended | | latitude | numeric | | | | main | | longitude | numeric | | | | main | | accident | text | | | | extended | | belts | boolean | | | | plain | | personal_injury | boolean | | | | plain | | property_damage | boolean | | | | plain | | fatal | boolean | | | | plain | | commercial_license | boolean | | | | plain | | hazmat | boolean | | | | plain | | commercial_vehicle | boolean | | | | plain | | alcohol | boolean | | | | plain | | workzone | boolean | | | | plain | | state | text | | | | extended | | vehicletype | text | | | | extended | | year | smallint | | | | plain | | make | text | | | | extended | | model | text | | | | extended | | color | text | | | | extended | | violation_type | text | | | | extended | | charge | text | | | | extended | | article | text | | | | extended | | contributed_to_accident | boolean | | | | plain | | race | text | | | | extended | | gender | text | | | | extended | | driver_city | text | | | | extended | | driver_state | text | | | | extended | | dl_state | text | | | | extended | | arrest_type | text | | | | extended | | geolocation | point | | | | plain | | council_districts | smallint | | | | plain | | councils | smallint | | | | plain | | communities | smallint | | | | plain | | zip_codes | smallint | | | | plain | | municipalities | smallint | | | | plain | | Partition key: RANGE (date_of_stop) Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'), traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'), traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'), traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'), traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'), traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'), traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'), traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'), traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'), traffic_violations_p_default DEFAULT
Lets assume that you expect that traffic violations will grow exponentially in 2022 because more and more cars will be on the road and when there will be more cars there will be more traffic violations. To be prepared for that you do not only want to partition by year but also by month. In other words: Add a new partition for 2022 but sub partition that by month. First of all you need a new partition for 2022 that itself is partitioned as well:
create table traffic_violations_p_2022 partition of traffic_violations_p for values from ('2022-01-01') to ('2023-01-01') partition by range(date_of_stop);
Now we can add partitions to the just created partitioned partition:
create table traffic_violations_p_2022_jan partition of traffic_violations_p_2022 for values from ('2022-01-01') to ('2022-02-01'); create table traffic_violations_p_2022_feb partition of traffic_violations_p_2022 for values from ('2022-02-01') to ('2022-03-01'); create table traffic_violations_p_2022_mar partition of traffic_violations_p_2022 for values from ('2022-03-01') to ('2022-04-01'); create table traffic_violations_p_2022_apr partition of traffic_violations_p_2022 for values from ('2022-04-01') to ('2022-05-01'); create table traffic_violations_p_2022_may partition of traffic_violations_p_2022 for values from ('2022-05-01') to ('2022-06-01'); create table traffic_violations_p_2022_jun partition of traffic_violations_p_2022 for values from ('2022-06-01') to ('2022-07-01'); create table traffic_violations_p_2022_jul partition of traffic_violations_p_2022 for values from ('2022-07-01') to ('2022-08-01'); create table traffic_violations_p_2022_aug partition of traffic_violations_p_2022 for values from ('2022-08-01') to ('2022-09-01'); create table traffic_violations_p_2022_sep partition of traffic_violations_p_2022 for values from ('2022-09-01') to ('2022-10-01'); create table traffic_violations_p_2022_oct partition of traffic_violations_p_2022 for values from ('2022-10-01') to ('2022-11-01'); create table traffic_violations_p_2022_nov partition of traffic_violations_p_2022 for values from ('2022-11-01') to ('2022-12-01'); create table traffic_violations_p_2022_dec partition of traffic_violations_p_2022 for values from ('2022-12-01') to ('2023-01-01');
Looking at psql’s output when we describe the partitioned table not very much changed, just the keyword “PARTITIONED” is showing up beside our new partition for 2022:
postgres=# d+ traffic_violations_p Partitioned table "public.traffic_violations_p" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description -------------------------+------------------------+-----------+----------+---------+----------+--------------+------------- seqid | text | | | | extended | | date_of_stop | date | | | | plain | | time_of_stop | time without time zone | | | | plain | | agency | text | | | | extended | | subagency | text | | | | extended | | description | text | | | | extended | | location | text | | | | extended | | latitude | numeric | | | | main | | longitude | numeric | | | | main | | accident | text | | | | extended | | belts | boolean | | | | plain | | personal_injury | boolean | | | | plain | | property_damage | boolean | | | | plain | | fatal | boolean | | | | plain | | commercial_license | boolean | | | | plain | | hazmat | boolean | | | | plain | | commercial_vehicle | boolean | | | | plain | | alcohol | boolean | | | | plain | | workzone | boolean | | | | plain | | state | text | | | | extended | | vehicletype | text | | | | extended | | year | smallint | | | | plain | | make | text | | | | extended | | model | text | | | | extended | | color | text | | | | extended | | violation_type | text | | | | extended | | charge | text | | | | extended | | article | text | | | | extended | | contributed_to_accident | boolean | | | | plain | | race | text | | | | extended | | gender | text | | | | extended | | driver_city | text | | | | extended | | driver_state | text | | | | extended | | dl_state | text | | | | extended | | arrest_type | text | | | | extended | | geolocation | point | | | | plain | | council_districts | smallint | | | | plain | | councils | smallint | | | | plain | | communities | smallint | | | | plain | | zip_codes | smallint | | | | plain | | municipalities | smallint | | | | plain | | Partition key: RANGE (date_of_stop) Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'), traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'), traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'), traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'), traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'), traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'), traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'), traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'), traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'), traffic_violations_p_2022 FOR VALUES FROM ('2022-01-01') TO ('2023-01-01'), PARTITIONED, traffic_violations_p_default DEFAULT
The is where the new functions in PostgreSQL 12 become very handy:
postgres=# select * from pg_partition_tree('traffic_violations_p'); relid | parentrelid | isleaf | level -------------------------------+---------------------------+--------+------- traffic_violations_p | | f | 0 traffic_violations_p_default | traffic_violations_p | t | 1 traffic_violations_p_2013 | traffic_violations_p | t | 1 traffic_violations_p_2014 | traffic_violations_p | t | 1 traffic_violations_p_2015 | traffic_violations_p | t | 1 traffic_violations_p_2016 | traffic_violations_p | t | 1 traffic_violations_p_2017 | traffic_violations_p | t | 1 traffic_violations_p_2018 | traffic_violations_p | t | 1 traffic_violations_p_2019 | traffic_violations_p | t | 1 traffic_violations_p_2020 | traffic_violations_p | t | 1 traffic_violations_p_2021 | traffic_violations_p | t | 1 traffic_violations_p_2022 | traffic_violations_p | f | 1 traffic_violations_p_2022_jan | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_feb | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_mar | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_apr | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_may | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_jun | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_jul | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_aug | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_sep | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_oct | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_nov | traffic_violations_p_2022 | t | 2 traffic_violations_p_2022_dec | traffic_violations_p_2022 | t | 2
To verify if data is routed correctly to the sub partitions let’s add some data for 2022:
insert into traffic_violations_p (date_of_stop) select * from generate_series ( date('01-01-2022') , date('12-31-2022') , interval '1 day' );
If we did the partitioning correctly we should see data in the new partitions:
postgres=# select count(*) from traffic_violations_p_2022_nov; count ------- 30 (1 row) postgres=# select count(*) from traffic_violations_p_2022_dec; count ------- 31 (1 row) postgres=# select count(*) from traffic_violations_p_2022_feb; count ------- 28 (1 row)
Here we go. Of course you could go even further and sub-partition the monthly partitions further by day or week. You can also partition by list and then sub-partition the list partitions by range. Or partition by range and then sub-partition by list, e.g.:
postgres=# create table traffic_violations_p_list_dummy partition of traffic_violations_p_list for values in ('dummy') partition by range (date_of_stop); CREATE TABLE postgres=# create table traffic_violations_p_list_dummy_2019 partition of traffic_violations_p_list_dummy for values from ('2022-01-01') to ('2023-01-01'); CREATE TABLE postgres=# insert into traffic_violations_p_list (seqid, violation_type , date_of_stop) values (-1,'dummy',date('2022-12-01')); INSERT 0 1 postgres=# select date_of_stop,violation_type from traffic_violations_p_list_dummy_2019; date_of_stop | violation_type --------------+---------------- 2022-12-01 | dummy (1 row)
That’s it for sub-partitioning. In the final post we will look at some corner cases with partitioning in PostgreSQL.