Infrastructure at your Service

Daniel Westermann

Getting started with Exasol – Some words about indexes and transactions

If you followed Getting started with Exasol – Setting up an environment and Getting started with Exasol – Loading data from PostgreSQL you should have an Exasol test system up and running and a virtual schema pointing to a PostgreSQL schema. What we’ll be doing in this post is to load the data from PostgreSQL into Exasol and then have a look at how transactions work in Exasol and how Exasol is handling indexes.

As a quick reminder the virtual schema we’ve created in the previous post looks like this:

SQL_EXA> select table_name from exa_dba_tables where TABLE_SCHEMA = 'POSTGRESQL_REMOTE';
EXA: select table_name from exa_dba_tables where TABLE_SCHEMA = 'POSTGRESQL...

TABLE_NAME                                                                                                                      
--------------------------------------------------------------------------------------------------------------------------------
PGBENCH_ACCOUNTS                                                                                                                
PGBENCH_BRANCHES                                                                                                                
PGBENCH_HISTORY                                                                                                                 
PGBENCH_TELLERS                                                                                                                 

4 rows in resultset.

As these table are actually in PostgreSQL we’re going to load them locally into an Exasol schema:

SQL_EXA> create schema demo;
EXA: create schema demo;

Rows affected: 0

SQL_EXA> open schema demo;
EXA: open schema demo;

Rows affected: 0

SQL_EXA> create table PGBENCH_ACCOUNTS as select * from POSTGRESQL_REMOTE.PGBENCH_ACCOUNTS;
EXA: create table PGBENCH_ACCOUNTS as select * from POSTGRESQL_REMOTE.PGBEN...

Rows affected: 10000000

An alternative method to do the same is to use SELECT INTO statement in Exasol:

SQL_EXA> select * into table PGBENCH_BRANCHES from POSTGRESQL_REMOTE.PGBENCH_BRANCHES;
EXA: select * into table PGBENCH_BRANCHES from POSTGRESQL_REMOTE.PGBENCH_BR...

Rows affected: 100

SQL_EXA> select * into table PGBENCH_HISTORY from POSTGRESQL_REMOTE.PGBENCH_HISTORY;
EXA: select * into table PGBENCH_HISTORY from POSTGRESQL_REMOTE.PGBENCH_HIS...

Rows affected: 0

SQL_EXA> select * into table PGBENCH_TELLERS from POSTGRESQL_REMOTE.PGBENCH_TELLERS;
EXA: select * into table PGBENCH_TELLERS from POSTGRESQL_REMOTE.PGBENCH_TEL...

Rows affected: 1000

Of course we lost all primary keys, foreign keys and check constraints by doing this:

SQL_EXA> select count(*) from EXA_DBA_CONSTRAINTS where CONSTRAINT_SCHEMA = 'DEMO';
EXA: select count(*) from EXA_DBA_CONSTRAINTS where CONSTRAINT_SCHEMA = 'DE...

COUNT(*)             
---------------------
                    0

1 row in resultset.

Let’s add the same constraints as we have them on the PostgreSQL side:

SQL_EXA> alter table PGBENCH_ACCOUNTS add constraint PGBENCH_ACCOUNTS_PK primary key (AID);
EXA: alter table PGBENCH_ACCOUNTS add constraint PGBENCH_ACCOUNTS_PK primar...

Rows affected: 0

SQL_EXA> alter table PGBENCH_BRANCHES add constraint PGBENCH_BRANCHES_PK primary key (BID);
EXA: alter table PGBENCH_BRANCHES add constraint PGBENCH_BRANCHES_PK primar...

Rows affected: 0

SQL_EXA> alter table PGBENCH_TELLERS add constraint PGBENCH_TELLERS_PK primary key (TID);
EXA: alter table PGBENCH_TELLERS add constraint PGBENCH_TELLERS_PK primary ...

Rows affected: 0

SQL_EXA> select count(*) from EXA_DBA_CONSTRAINTS where CONSTRAINT_SCHEMA = 'DEMO';
EXA: select count(*) from EXA_DBA_CONSTRAINTS where CONSTRAINT_SCHEMA = 'DE...

COUNT(*)             
---------------------
                    3

1 row in resultset.

Now we have exactly the same setup as in PostgreSQL. Creating the primary keys should also have created some indexes in the background:

SQL_EXA> col index_schema for a30;
COLUMN   index_schema ON
FORMAT   a30
SQL_EXA> col index_table for a30;
COLUMN   index_table ON
FORMAT   a30
SQL_EXA> col index_type for a15;
COLUMN   index_type ON
FORMAT   a15
SQL_EXA> select INDEX_SCHEMA,INDEX_TABLE,INDEX_TYPE from EXA_DBA_INDICES where INDEX_SCHEMA = 'DEMO';
EXA: select INDEX_SCHEMA,INDEX_TABLE,INDEX_TYPE from EXA_DBA_INDICES where ...

INDEX_SCHEMA                   INDEX_TABLE                    INDEX_TYPE     
------------------------------ ------------------------------ ---------------
DEMO                           PGBENCH_ACCOUNTS               GLOBAL         
DEMO                           PGBENCH_BRANCHES               GLOBAL         
DEMO                           PGBENCH_TELLERS                GLOBAL         

3 rows in resultset.

SQL_EXA> 

If you’re looking for an index name, there is none:

SQL_EXA> col column_name for a30;
COLUMN   column_name ON
FORMAT   a30
SQL_EXA> col column_comment for a50;
COLUMN   column_comment ON
FORMAT   a50
SQL_EXA> desc full EXA_DBA_INDICES;
EXA: desc full EXA_DBA_INDICES;

COLUMN_NAME                    SQL_TYPE                                 NULLABLE DISTRIBUTION_KEY PARTITION_KEY    COLUMN_COMMENT                                    
------------------------------ ---------------------------------------- -------- ---------------- ---------------- --------------------------------------------------
INDEX_SCHEMA                   VARCHAR(128) UTF8                                                                   Schema of the index                               
INDEX_TABLE                    VARCHAR(128) UTF8                                                                   Table of the index                                
INDEX_OWNER                    VARCHAR(128) UTF8                                                                   Owner of the index                                
INDEX_OBJECT_ID                DECIMAL(18,0)                                                                       ID of the index                                   
INDEX_TYPE                     VARCHAR(20) UTF8                                                                    Index type                                        
IS_GEOMETRY                    BOOLEAN                                                                             States whether this is a geospatial index         
MEM_OBJECT_SIZE                DECIMAL(18,0)                                                                       Index size in bytes (at last COMMIT)              
CREATED                        TIMESTAMP                                                                           Timestamp of when the index was created           
LAST_COMMIT                    TIMESTAMP                                                                           Last time the object was changed in the DB        
REMARKS                        VARCHAR(100000) UTF8                                                                Additional information about the index            

10 rows in resultset.
SQL_EXA> 
tree (bi

There isn’t even a CREATE INDEX statement in Exasol. A “Global” index exists on all nodes of the cluster while a “Local” index only exits on the given node. Indexes are created and maintained automatically based on the queries the system is processing. We can easily see this when joining two simple tables:

SQL_EXA> create table demo1 ( a int, b int, c int );
EXA: create table demo1 ( a int, b int, c int );

Rows affected: 0

SQL_EXA> insert into demo1 select 1,1,1 from pgbench_accounts where aid < 1000;
EXA: insert into demo1 select 1,1,1 from pgbench_accounts where aid  create table demo2 ( a int, b int, c int );

SQL_EXA> create table demo2 ( a int, b int, c int );
EXA: create table demo2 ( a int, b int, c int );

Rows affected: 0

SQL_EXA> insert into demo2 select * from demo1;
EXA: insert into demo2 select * from demo1;

Rows affected: 999

SQL_EXA> select count(*) from demo1 a, demo2 b where a.a = b.a;
EXA: select count(*) from demo1 a, demo2 b where a.a = b.a;

COUNT(*)             
---------------------
               998001

1 row in resultset.

SQL_EXA> select INDEX_SCHEMA,INDEX_TABLE,INDEX_TYPE from EXA_DBA_INDICES where INDEX_SCHEMA = 'DEMO';
EXA: select INDEX_SCHEMA,INDEX_TABLE,INDEX_TYPE from EXA_DBA_INDICES where ...

INDEX_SCHEMA                                 INDEX_TABLE                    INDEX_TYPE          
-------------------------------------------- ------------------------------ --------------------
DEMO                                         PGBENCH_ACCOUNTS               GLOBAL              
DEMO                                         PGBENCH_BRANCHES               GLOBAL              
DEMO                                         PGBENCH_TELLERS                GLOBAL              
DEMO                                         DEMO1                          LOCAL               

If an index is not used for 5 weeks it will be dropped automatically.

Coming to transaction handling. First of all you need to know the auto commit is enabled by default (the same as in PostgreSQL) in the clients:

SQL_EXA> show autocommit;
AUTOCOMMIT = "ON"

Exasol supports the serializable transaction isolation level. This is the strongest level but also comes with some downsides. It can happen that transaction need to wait or even abort when the order of the transactions can not be guaranteed. DDL is transactional as well:

-- session 1
SQL_EXA> set AUTOCOMMIT off;
SQL_EXA> create table t1 ( a int );
EXA: create table t1 ( a int );


-- session 2
SQL_EXA> set AUTOCOMMIT off;
SQL_EXA> 
SQL_EXA> select * from t1;
EXA: select * from t1;
Error: [42000] object T1 not found [line 1, column 15] (Session: 1679358615345561600)

-- session 1
SQL_EXA> commit;
EXA: commit;

-- session 2
SQL_EXA> select * from t1;
EXA: select * from t1;

A                    
---------------------

0 rows in resultset.

SQL_EXA> 

The recommendation is to go with the default and let auto commit on to keep transactions as short as possible. Otherwise this can happen:

-- session 1
SQL_EXA> set AUTOCOMMIT off;
SQL_EXA> insert into t1 values(1);
EXA: insert into t1 values(1);

Rows affected: 1

SQL_EXA> insert into t1 values(2);
EXA: insert into t1 values(2);

Rows affected: 1

SQL_EXA> insert into t1 values(3);
EXA: insert into t1 values(3);

Rows affected: 1

SQL_EXA> commit;
EXA: commit;

Rows affected: 0

SQL_EXA> update t1 set a = 1 where a = 2;
EXA: update t1 set a = 1 where a = 2;

Rows affected: 1

SQL_EXA> 


-- session 2
SQL_EXA> set AUTOCOMMIT off;
SQL_EXA> insert into t1 values (1);
EXA: insert into t1 values (1);    --- must wait for session on to either commit or rollback

… or even this:

-- session 2
SQL_EXA> select * from t1;
EXA: select * from t1;

A                    
---------------------
                    1
                    1
                    3
                    1

4 rows in resultset.


-- session 1
SQL_EXA> update t1 set a = 1 where a = 2;
EXA: update t1 set a = 1 where a = 2;

Rows affected: 0

SQL_EXA> 

-- session 2
SQL_EXA> insert into t1 values (5);
EXA: insert into t1 values (5);
Error: [40001] GlobalTransactionRollback msg: Transaction collision: automatic transaction rollback. (Session: 1679360319468797952)

These kind of conflict are recorded in the catalog and you can check them:

SQL_EXA> col conflict_objects for a30;
COLUMN   conflict_objects ON
FORMAT   a30
SQL_EXA> select * from EXA_DBA_TRANSACTION_CONFLICTS;
EXA: select * from EXA_DBA_TRANSACTION_CONFLICTS;

SESSION_ID            CONFLICT_SESSION_ID   START_TIME                 STOP_TIME                  CONFLICT_TYPE        CONFLICT_OBJECTS               CONFLICT_INFO                           
--------------------- --------------------- -------------------------- -------------------------- -------------------- ------------------------------ ----------------------
  1679358615345561600                       2020-10-01 14:13:12.871000 2020-10-01 14:13:12.871000 TRANSACTION ROLLBACK DEMO.T1                        intern merged sessions                  
  1679358615345561600   1679357953088880640 2020-10-01 14:23:31.669000 2020-10-01 14:27:35.941000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679359718013206528   1679357953088880640 2020-10-01 14:24:05.772000 2020-10-01 14:27:35.941000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679359718013206528   1679357953088880640 2020-10-01 14:29:05.514000 2020-10-01 14:32:06.309000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679360160561037312   1679357953088880640 2020-10-01 14:31:11.577000 2020-10-01 14:32:06.309000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679360185071960064   1679357953088880640 2020-10-01 14:31:59.416000 2020-10-01 14:32:06.309000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679360185071960064   1679359718013206528 2020-10-01 14:32:06.352000 2020-10-01 14:32:12.255000 WAIT FOR COMMIT      DEMO.T1                                                                
  1679359718013206528   1679357953088880640 2020-10-01 14:32:51.164000 2020-10-01 14:32:51.164000 TRANSACTION ROLLBACK DEMO.T1                                                                

8 rows in resultset.

As soon as you write to an object you are working on your own (temporary copy) of the object, Exasol calls this Multi-Copy. When a transaction commits that version of the objects becomes the valid one. Transactions that started before will still use the previous version of the object.

The recommendation therefore is to always have auto commit turned on in the client and for the cases where you need to turn it off: Keep the transactions as short as possible. That’s it for the introduction of indexes and transactions in Exasol. In the next post we’ll look at database sessions and auditing.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure