Infrastructure at your Service

Daniel Westermann

Getting started with Exasol – Backup and restore (2)

In the previous post we did the groundwork for being able to backup the Exasol database to an S3 bucket. We learned that you can do full level 0 backups and of top of that you can do incremental backups, that refer to each other. A restore of the full backup worked fine and pretty fast. In this post we’ll restore both incremental backups and check if the result is what we expect. In addition we’ll revise the points I’ve outlined need to be implemented when it comes to backup and restore:

  • Backups should be non-blocking
  • Given that Exasol is targeting massive data volumes, there should be a way to parallelize backups and restores
  • There should be a way to check the integrity of backups
  • Point in time recovery of course is a must
  • Compression of backups becomes important with huge data, also this is not a blocker

Restoring an incremental backup is not different from restoring a full backup. You will need to shutdown the database, and then initiate the restore from backup section:

Once completed let’s check what data we have in our demo table:

[email protected]:~/EXAplus-7.0.0$ /home/dwe/EXAplus-7.0.0/exaplus -c 192.168.178.111:8563 -u sys -p exasol 
EXAplus 7.0.0 (c) EXASOL AG

Friday, October 9, 2020 12:26:48 PM CEST
Connected to database my_exa_db1 as user sys.
EXASolution 7.0.2 (c) EXASOL AG

SQL_EXA> open schema demo;
EXA: open schema demo;

Rows affected: 0

SQL_EXA> select * from t1;
EXA: select * from t1;

A                    
---------------------
                    1
                    2
                    3
                   -1
                   -2
                   -3

6 rows in resultset.

All fine. Last test with the incremental 2 backup:

The results are fine here as well:

[email protected]:~/EXAplus-7.0.0$ /home/dwe/EXAplus-7.0.0/exaplus -c 192.168.178.111:8563 -u sys -p exasol 
EXAplus 7.0.0 (c) EXASOL AG

Friday, October 9, 2020 1:00:46 PM CEST
Connected to database my_exa_db1 as user sys.
EXASolution 7.0.2 (c) EXASOL AG

EXA: open schema demo;

Rows affected: 0

SQL_EXA> select * from t1;
EXA: select * from t1;

A                    
---------------------
                    1
                    2
                    3
                   -1
                   -2
                   -3
                 -300
                 -301
                 -302

9 rows in resultset.

SQL_EXA> 

The rows we added in the last post after we did the second incremental backup are lost. Exasol does not support point in time recovery. You can restore only to the last backup you have available.

For getting to know if backups are done in parallel we need to increase our data set. The idea is to do two tests:

  1. Increase the size of data set and then check if there are parallel processes backing up a single
  2. Create a copy of the table and again check if there are multiple backup processes working on the backup

The data set is available for download here. This is the table definition ( the varchar(500) everywhere is just because I was too lazy to check the actual column sizes):

create table demo (afamiliarity varchar(500),
                   ahotttnesss varchar(500),
                   aid varchar(500),
                   alatitude varchar(500),
                   alocation varchar(500),
                   alongitude varchar(500),
                   aname varchar(500),
                   asimilar varchar(500),
                   aterms varchar(500),
                   aterms_freq varchar(500),
                   rid varchar(500),
                   rname varchar(500),
                   sartist_mbtags varchar(500),
                   sartist_mbtags_count varchar(500),
                   sbars_confidence varchar(500),
                   sbars_start varchar(500),
                   sbeats_confidence varchar(500),
                   sbeats_start varchar(500),
                   sduration varchar(500),
                   send_of_fade_in varchar(500),
                   shotttnesss varchar(500),
                   sid varchar(500),
                   skey varchar(500),
                   skey_confidence varchar(500),
                   sloudness varchar(500),
                   smode varchar(500),
                   smode_confidence varchar(500),
                   sstart_of_fade_out varchar(500),
                   statums_confidence varchar(500),
                   statums_start varchar(500),
                   stempo varchar(500),
                   stime_signature varchar(500),
                   stime_signature_confidence varchar(500),
                   stitle varchar(500),
                   syear varchar(500));

Loading the table locally from my workstation:

SQL_EXA> IMPORT INTO demo  
1    >        FROM LOCAL CSV FILE '/home/dwe/Downloads/music.csv'
2    >        COLUMN SEPARATOR = ',' SKIP = 5;
EXA: IMPORT INTO demo  ...

Rows affected: 9996

This gives us a table of approx. 2MB:

SQL_EXA> select OBJECT_NAME,OBJECT_TYPE,RAW_OBJECT_SIZE,MEM_OBJECT_SIZE,RAW_OBJECT_SIZE_LIMIT from EXA_DBA_OBJECT_SIZES where OBJECT_NAME = 'DEMO' AND OBJECT_TYPE='TABLE';
EXA: select OBJECT_NAME,OBJECT_TYPE,RAW_OBJECT_SIZE,MEM_OBJECT_SIZE,RAW_OBJ...

OBJECT_NAME     OBJECT_TYPE     RAW_OBJECT_SIZE       MEM_OBJECT_SIZE       RAW_OBJECT_SIZE_LIMIT
--------------- --------------- --------------------- --------------------- ---------------------
DEMO            TABLE                         2126754               3964283                      

1 row in resultset.

Let’s increase that a bit:

SQL_EXA> insert into demo select * from demo;
EXA: insert into demo select * from demo;

Rows affected: 9996
--
-- repeated that until:
--
SQL_EXA> select OBJECT_NAME,OBJECT_TYPE,RAW_OBJECT_SIZE,MEM_OBJECT_SIZE,RAW_OBJECT_SIZE_LIMIT from EXA_DBA_OBJECT_SIZES where OBJECT_NAME = 'DEMO' AND OBJECT_TYPE='TABLE';
EXA: select OBJECT_NAME,OBJECT_TYPE,RAW_OBJECT_SIZE,MEM_OBJECT_SIZE,RAW_OBJ...

OBJECT_NAME     OBJECT_TYPE     RAW_OBJECT_SIZE       MEM_OBJECT_SIZE       RAW_OBJECT_SIZE_LIMIT
--------------- --------------- --------------------- --------------------- ---------------------
DEMO            TABLE                     34844737536            7379579363                      

1 row in resultset.

That gives us a table of around 32GB. What we can see here as well is, that the compression ratio in memory is quite well. The actual size of the table in memory is only around 7GB. Exasol uses a columnar storage technique, that means data is not stored in tuples/rows but rather in columns. This is much better for compression, as data in the same column is usually much better compress-able than data that is stored in tuples.

Back to the original question: Will Exasol use multiple processes to backup the data? Currently we do see these sessions:

SQL_EXA> select USER_NAME,EFFECTIVE_USER,STATUS,COMMAND_NAME,ACTIVITY,SQL_TEXT from exa_dba_sessions;
EXA: select USER_NAME,EFFECTIVE_USER,STATUS,COMMAND_NAME,ACTIVITY,SQL_TEXT ...

USER_NAME                      EFFECTIVE_USER                 STATUS          COMMAND_NAME                             ACTIVITY        SQL_TEXT                      
------------------------------ ------------------------------ --------------- ---------------------------------------- --------------- ------------------------------
SYS                            SYS                            IDLE            NOT SPECIFIED                                                                          
SYS                            SYS                            EXECUTE SQL     SELECT                                                   select USER_NAME,EFFECTIVE_USE
                                                                                                                                       R,STATUS,COMMAND_NAME,ACTIVITY
                                                                                                                                       ,SQL_TEXT from exa_dba_session
                                                                                                                                       s;                            
SYS                            SYS                            IDLE            NOT SPECIFIED                                                              

What you can do when it comes to backups, is to kick off an ad-hoc backup as well:

The test is, to kickoff an ad-hoc backup and then check if we can see multiple sessions working on that. We can see that the backup started in the overview section of EXASolution:

What do we see in exa_dba_sessions while the backup is running?

SQL_EXA> select USER_NAME,EFFECTIVE_USER,STATUS,COMMAND_NAME,ACTIVITY,SQL_TEXT from exa_dba_sessions;
EXA: select USER_NAME,EFFECTIVE_USER,STATUS,COMMAND_NAME,ACTIVITY,SQL_TEXT ...

USER_NAME                      EFFECTIVE_USER                 STATUS          COMMAND_NAME                             ACTIVITY                       SQL_TEXT                      
------------------------------ ------------------------------ --------------- ---------------------------------------- ------------------------------ ------------------------------
SYS                            SYS                            IDLE            NOT SPECIFIED                                                                                         
SYS                            SYS                            EXECUTE SQL     SELECT                                                                  select USER_NAME,EFFECTIVE_USE
                                                                                                                                                      R,STATUS,COMMAND_NAME,ACTIVITY
                                                                                                                                                      ,SQL_TEXT from exa_dba_session
                                                                                                                                                      s;                            

2 rows in resultset.

Nothing at all. Backup operations are not visible in exa_dba_sessions. Where else can we have a look at? There is exa_system_events, but that does not give much information either, except for the markes when the backup started and when it completed:

SQL_EXA> select * from EXA_SYSTEM_EVENTS;
EXA: select * from EXA_SYSTEM_EVENTS;

CLUSTER_NAME                   MEASURE_TIME               EVENT_TYPE                     DBMS_VERSION                   NODES   DB_RAM_SIZE  PARAMETERS                    
------------------------------ -------------------------- ------------------------------ ------------------------------ ------- ------------ ------------------------------
MASTER                         2020-10-08 13:31:10.079000 STARTUP                        7.0.2                                1          2.0                               
MASTER                         2020-10-08 15:30:00.000000 RESTART                        7.0.2                                1          2.0                               
MASTER                         2020-10-08 15:30:02.866000 BACKUP_START                   7.0.2                                1          2.0                               
MASTER                         2020-10-08 15:30:10.144000 STARTUP                        7.0.2                                1          2.0                               
MASTER                         2020-10-08 15:40:00.000000 RESTART                        7.0.2                                1          2.0                               
MASTER                         2020-10-08 15:40:11.015000 STARTUP                        7.0.2                                1          2.0                               
MASTER                         2020-10-08 15:50:00.000000 RESTART                        7.0.2                                1          2.0                               
MASTER                         2020-10-09 13:00:26.663000 RESTORE_START                  7.0.2                                1          2.0                               
MASTER                         2020-10-09 13:00:34.410000 RESTORE_END                    7.0.2                                1          2.0                               
MASTER                         2020-10-09 13:00:39.317000 STARTUP                        7.0.2                                1          2.0                               
MASTER                         2020-10-09 15:30:00.000000 RESTART                        7.0.2                                1          2.0                               
MASTER                         2020-10-09 15:30:03.125000 BACKUP_START                   7.0.2                                1          2.0                               
MASTER                         2020-10-09 15:30:09.368000 STARTUP                        7.0.2                                1          2.0                               
MASTER                         2020-10-09 15:40:00.000000 RESTART                        7.0.2                                1          2.0                               
MASTER                         2020-10-09 15:40:02.901000 BACKUP_START                   7.0.2                                1          2.0                               
MASTER                         2020-10-09 15:40:11.151000 STARTUP                        7.0.2                                1          2.0                               
MASTER                         2020-10-09 15:50:00.000000 RESTART                        7.0.2                                1          2.0                               
MASTER                         2020-10-09 15:50:13.353000 STARTUP                        7.0.2                                1          2.0                               
MASTER                         2020-10-09 17:05:34.462000 BACKUP_START                   7.0.2                                1          2.0                               
MASTER                         2020-10-09 17:09:00.000000 RESTART                        7.0.2                                1          2.0                               
MASTER                         2020-10-09 20:37:37.643000 STARTUP                        7.0.2                                1          2.0                               
MASTER                         2020-10-09 20:43:39.873000 BACKUP_START                   7.0.2                                1          2.0                            

As there is no catalog table or view to monitor the backups processes (at least not to my knowledge), the question about parallelism remains unanswered. What we can answer, is the question about compressed backups. Checking the S3 bucket we can see that the backup is only 3.4 GB so that for sure is compressed:
.

In the community edition of Exasol you can not connect via SSH as root so there is no way of digging deeper in what’s going on. That’s it for now, in the next post we’ll bring up a real multi-node cluster in AWS, maybe we see more there. Once that is up an running we’ll have a look at what Exasol is doing differently from other database (data distribution, paritioning, …).

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure