Infrastructure at your Service

Cesare Cervini

A SQLite extension for gawk (part III)

Welcome to part III of a three-part article on extending gawk with a SQLite binding. Part I is here. Part II is here.
In this final part, I’ll explain succinctly the code that was listed in part II.
This part is best read while having Part II opened in another browser window.

sqlite_gawk.c

Line 48: All the operations on a database requires a handle that is initialized once the database is successfully opened. The handle is of type (sqlite3 *). Obviously, gawk does not know pointers; therefore what gets returned to gawk after a database has been opened is an integer which is an index into a static array of (sqlite3 *). The array has a fixed size of 100 for up to 100 concurrently opened database, which should be largely enough (if not, just increase MAX_DB on line 46 and recompile, or implement a dynamically resizable array instead). When a database is closed, the free slot is of course recycled for the next database to open.

Line 56: Some initialization is performed, mostly of the array of database handles.

Lines 77 to 315: this code has been lifted from sqlite3.c, the SQLite shell, and only slightly edited. It provides the SQL extension functions readfile() (read file into a blob) and writefile() (dumps a blob into file) for working with blobs from SQL statements. These functions can then be used seamlessly as follows:

INSERT INTO test_with_blob(n1, my_blob) VALUES(1, readfile("gawk-4.2.1.tar.gz")
SELECT n1, writefile('yy' || rowid, my_blob) FROM test_with_blob
UPDATE test_with_blob set my_blob = readfile('/home/dmadmin/dmgawk/gawk-4.2.1/extension/sqlite_gawk.c') where n1 = 1000

SQLite makes it possible to extend the set of functions that can be called in SQL statements. The function sqlite3_create_function() allows to register a new user-defined function, which must comply with a standard profile. Afterwards, they are invoked by SQLite as callbacks if they are present in SQL statements. This is not an exclusive feature since Oracle for example allows such functions but they are stored functions written in PL/SQL (or in java or C/C++ for external procedures with a PL/SQL interface to the database). SQLite even allows those user-defined functions to perform aggregation like the usual sum(), count() or avg() functions, which is useful for example to supply the missing stddev() function.
readfile() works with a buffer allocated from memory with the same size as the file to read, which can can be problematic with huge files, but blob size is currently limited to about 10^9 bytes by the compilation, and to about 2 GiB by the implementation (see the limits here). A readfile() test with a 900 millions bytes file worked flawlessly (see the test script tsqlite.awk). In memory challenged machines, SQLite incremental blob I/O functions have been put to good use to work with smaller buffers (see overloadd function sqlite_exec() starting at line 406).

Line 323: get_free_sqlite_handle() is called each time a database is opened. It returns the index of the next free slot in the array of database handles. It does a sequential search in the array but again a maximum of 100 opened database are currently supported, so it is acceptable.

Line 335: do_sqllite_open() is the interface’s wrapper around SQLite sqlite3_open(). It passes it the name of the database and, if successful, receives a database handle. SQLite databases are entirely contained in one file and no password are needed to open it. The received handle is saved into the array of handles and its index is returned to gawk for future use. When gawk calls another function for the opened database, it passes this index to the function which looks into the array for the corresponding handle, which is used to talk to SQlite functions.
In do_sqllite_open(), the readfile() and writefile() user-defined functions are registered to be used in SQL statements as callbacks.

Line 387: do_sqlite_close() closes a database and marks the slot in the array of handles as available. Nothing special here.

Line 406: do_sqllite_exec(). This is a big one as, in addition to the expected DML/DDL statements, it also deals with incremental blob I/Os. It implements both acceptations of the gawk function sqlite_exec(). Both are distinguishable from each other by the number of parameters they take:
1. int sqlite_exec(db, stmt);
2. sqlite_exec(db, db_name, table, blob_column, rowid, “readfile(file_name)” | “writefile(file_name)”);
The first version takes 2 parameters, the database handle and the expected DML/DDL statement to send to the database. It returns 0 if OK and -1 if not.
The second one is for incremental blob I/Os and takes 6 parameters:
the database handle;
the database name; this is an surprising one but is required later by SQLite in function sqlite3_blob_open() documented here. It is a symbolic name (not the file name of the opened database) and must be set to either “main” for the main database, to “temp” for a temporary table, and to the name after the AS clause for ATTACHed databases (see here).
the triplet (table, column, rowid) is also required to identify the blob to access;
the last parameter tells the operation to perform on the blob, either update it with a content of from a file (parameter “readfile(filename)”) or dump it to a file (parameter “writefile(filename”)). Here, contrary to what has been said for lines 77 to 315, readfile and writefile are not user-defined SQL extensions but just place-holders for the requested operations. An alternative syntax could have been:

sqlite_exec(db, db_name, table, blob_column, rowid, "read" | "write", file_name);
or
sqlite_exec(db, db_name, table, blob_column, rowid, "load" | "dump", file_name);

but it was felt that reusing the verbs readfile() and writefile() was more standard, albeit admittedly a bit weird.
The filename between the parentheses should not be quoted for the real parameters to sqlite_exec() are the strings “readfile(filename)” or “writefile(filename)”. Internally, cf. starting at line 500, they are parsed and the operation and the filename get extracted from the strings; see regular expressions strings *RE_readfile and *RE_writefile.
When reading into an existing blob, SQLite does not allow its size to be changed in function sqlite3_blob_write(), therefore the blob is first UPDATEd with a value of the new size. This is done by assigning it zero bytes via the SQLite function zeroblob() during the UPDATE; this function is efficient in that it does not need to first instantiate a memory buffer of that size and fill it with zeros before assigning it to the blob. Once the blob is updated, it must be reopened.

Line 573 and 605: For an efficient transfer of data between the file and the blob, a 10 MiB large memory buffer is allocated.
Overall, there are a lot of goto statements in this function. Don’t be shocked or disgusted and rather consider them as “exit

Line 678: All the data functions that displays results from a SQL SELECT statement are actually callbacks; they are invoked by SQLite sqlite3_exec() from within the extension do_sqlite_select() and must comply to a precise profile. In order to keep some control of their behavior from our own code, it is possible to pass them a pointer to some data, which is the purpose of the DISPLAYED_TABLE C struct. It contains formatting parameters and on familiar-to-awker variable, NR, which gets updated by the callbacks at each call. sqlite3_exec() invokes the callback for each row that it fetches from the database, and the callback (all the select_callback_* except select_callback_array which stores it into a gawk associative array, more on this later) displays that row according to the parameters in struct DISPLAYED_TABLE. The callbacks are:

select_callback_raw()
select_callback_draft()
select_callback_sized()
select_callback_array()

Each of these displays the rows with its particular format, as discussed in here, so I won’t repeat myself here. As it is expected, presentation code is always quite lengthy, especially for all this line-truncation with or without ellipsis, and line-wrapping.

Line 960: An interesting functionality, much shorter to implement since no presentation is required, is storing the rows into a gawk associative array, done in function select_callback_array(). When gawk invokes sqlite_select() with 4 parameters (the 3rd is a dummy one introduced just to tell the acceptations apart), select_callback_array() is selected as sqlite3_exec()’s callback, e.g.:

sqlite_select(db, "select rowid from test_with_blob where n1 = 1000 limit 1", 0, a_test)

The associative array a_test receives the rows and columns and stores them as shown in the function’s comment. On lines 1000 to 1011, another approach was attempted, which however I couldn’t make work. This is not so bad as it looked a bit weird. It consisted in programmatically creating the gawk array (with a the given name) and injecting it into the gawk script so that it would magically appear in the script’s global variables. The gawk call would have been:

sqlite_select(db, "select rowid from test_with_blob where n1 = 1000 limit 1", 0, "a_test")

Note the string “a_test”, the name of the array to create. Upon returning, an array with that name would have been created and added to the script’s symbol table, and could be accessed as usual through a_test[line][col]. Currently, the array must already exist and have been “typed” as an array in the gawk script, which is more natural. The array is then filled using gawk’s extension API as documented in the Gawk Manual.

Line 1190: do_sqlite_select() terminates by directly invoking the callbacks to give them the opportunity to print an epilogue such as the number of processed rows or optimum column widths (by select_callback_draft()).
Line 1206: the extension functions are registered in gawk as usual. Note the varying arities, do_sqlite_exec() takes from 2 to 6 parameters while do_sqlite_select() takes from 2 to 4 parameters.

tsqlite.awk

This is the gawk test script.

Line 6: the SQLite extension is loaded here. The name is the one that was given to it on line 1218 of sqlite_gawk.c.

Line 9 to 20: Database opening is tested here. Apparently, it is not a problem to have the same database opened several times. In effect, SQLite allows concurrent reading but locks the entire database when a writing is performed; its concurrency is far limited here but in practice it is not a problem as the locks stay for a few milliseconds only.

Line 23: sqlite_exec() is tested with a DDL statement. Actually, SQLite allows to insert any value in a column, even though the types and sizes don’t match. It uses type affinity to convert between types (see here for details). It’s even possible to store a blob in a CHAR(10) column.

Lines 26 to 37: sqlite_exec() is tested with a few DML statements.

Lines 40 to 51: The select into a gawk array feature is tested. A dummy 3rd parameter is inserted to force the select_callback_array() callback to be invoked. The result is output through 2 ways: the dumparray() function from the manual (its code is given at the end of the script), a function that uses the gawk isarray() introspection function to recursively display an array (and possibly, of array of … etc.). This test confirms that the receiving array has the right structure, an integer-indexed array of associative arrays indexed by column names.

Lines 53 to 112: we test the different output formats from the other 3 callbacks select_callback_raw(), select_callback_draft() and select_callback_sized(). As said, these functions are selected based on the type and number of parameters of the overloaded function sqlite_select()

Line 114: sqlite_exec() tests the DELETE DML statement. To be noted that SQLite performs a TRUNCATE TABLE optimization here when no condition is present (i.e. there is no WHERE clause).

Lines 126 to 181: we test the blob operations through the SQL extension functions readfile() and writefile(). Files of different sizes, up to 990 million bytes are stuffed into a blob with no problem, except the ingestion time is quite perceptible for big files. As said before, the implementation of the readfile() function that reads data from a disk file int a blob allocated a buffer in memory to entirely held the file. If it fails, incremental blob I/O functions are available, read on.

Lines 183 to 200: They test the incremental blob I/O functions. These allow to work around any memory size problem as they use a fixed-sized 10 MiB buffer insted of allocating one the size of the file to read in or one for the blob to dump out. As sqlite_exec() needs here the blob’s rowid, it must first be read through a SELECT statement. The received value is passed into a gawk array a_test in a_test[0][“rowid”], ie. the result has one row only (stored in a_test[0]) and this row has only one column, named “rowid”. The currently opened database has the alias “main”. On line 191, note how writefile(filename) is quoted so the function is not evaluated by the gawk interpreter but passed through to do_sqlite_exec(). Note also how filename is not quoted, it does not need to be; it may look strange though and if this is insufferable just edit the REs on lines 501 and 502 of the extension to add them. Beware that they’ll be enclosed between an already quoted string so some escaping might be necessary to placate gawk’s parsing.

Lines 201 to 214: We are testing the on-screen formatting of the columns when a large chunk of text from a blob is displayed. We noticed that if the text contains line feeds, it corrupts the display; this why the replace() SQL function is used on line 210 to change it to the literal string “\\n”, which won’t be interpreted by the terminal.

Line 216: The database is closed.

A stress test

How about stressing SQLite and it gawk interface a little bit ? Here is a very simple script that recursively scans a given file directory up to a predefined depth and inserts the file entries, a few metadata and content, into a table.

# stress test program for the sqlite_gawk, interface to sqlite3;
# Cesare Cervini
# dbi-services.com
# 8/2018

# the test consists in scanning a disk directory for specifed files and loading into a sqlite db table all the files's metadata and content;
# the database stress-test.db must have been created beforehand;
# the used table will be created by the test;
# Usage:
#   AWKLIBPATH=gawk-4.2.1/extension/.libs find ~/ -type f -printf "%M %u %g %s %TY-%Tm-%Td %TH:%TM %p\n" | gawk-4.2.1/gawk -f stress_sqlite.awk
# or (if if AWKLIBPATH is defined appropriately or the SQLite extension has been deployed on system-wide):
#   find ~/ -type f -L -printf "%M %u %g %s %TY-%Tm-%Td %TH:%TM %p\n" | gawk -f stress_sqlite.awk
# for testing:
#    max_depth=5; limit=10; AWKLIBPATH=gawk-4.2.1/extension/.libs find ~/ -maxdepth $max_depth -type f -printf "%M %u %g %s %TY-%Tm-%Td %TH:%TM %p\n" | gawk -v limit=$limit -f stress_sqlite.awk
#

@load "sqlite_gawk"

BEGIN {
   my_db = sqlite_open("/home/dmadmin/dmgawk/stress-test.db")
   print "my_db =", my_db, "after db opened"

   rc = sqlite_select(my_db, "SELECT sqlite_version()", 20)
   printf "\n"

   rc = sqlite_exec(my_db, "DROP TABLE IF EXISTS stress")
   rc = sqlite_exec(my_db, "vacuum")

   rc = sqlite_exec(my_db, "CREATE TABLE stress(" \
                                                "file_type CHAR,"     \
                                                "permissions TEXT,"   \
                                                "owner_name TEXT,"    \
                                                "group_name TEXT,"    \
                                                "file_size NUMBER,"   \
                                                "file_date DATE,"     \
                                                "file_name TEXT,"     \
                                                "content_type TEXT,"  \
                                                "content BLOB)")
   print "return code=", rc, "after table creation"
   printf "\n"
}
{
   file_type = substr($1, 1, 1)
   permissions = substr($1, 2)
   owner_name = $2
   group_name = $3
   file_size = $4
   file_date = $5 " " $6
   file_name = $7
   cmd = "file -b " $7; cmd | getline content_type; close(cmd)
  
   stmt = sprintf("INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES(\"%s\", \"%s\", \"%s\", \"%s\", \"%s\", \"%s\", \"%s\", \"%s\", %s)",
                                      file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, "readfile('" file_name "')")
   print stmt
   rc = sqlite_exec(my_db, stmt)
   printf("return code=%d after insertion\n\n", rc)

   if (NR >= limit)
      exit
}
END {
   printf "\n"
   rc = sqlite_select(my_db, "SELECT file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, CASE WHEN instr(content_type, 'ASCII text') > 0 THEN " \
                             "CASE WHEN length(content) > 200 THEN replace(replace(replace(substr(content, 1, 200) || '...', '\n', '\\n'), '\t', '\\t'), '\r', '\\r') ELSE replace(replace(replace(content, '\n', '\\n'), '\t', '\\t'), '\r', '\\r') END ELSE substr(hex(content), 1, 200) || '...' END as text FROM stress", 0, my_array)
   print "return code = ", rc, "after select into array"
   dumparray("my_array", my_array)

   printf "\n"
   rc = sqlite_select(my_db, "SELECT file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, CASE WHEN instr(content_type, 'ASCII text') > 0 THEN " \
                             "CASE WHEN length(content) > 200 THEN replace(replace(replace(substr(content, 1, 200) || '...', '\n', '\\n'), '\t', '\\t'), '\r', '\\r') ELSE replace(replace(replace(content, '\n', '\\n'), '\t', '\\t'), '\r', '\\r') END ELSE substr(hex(content), 1, 200) || '...' END as text FROM stress", "9 11 10 10 9 16 30w 12w 50w")
   print "return code = ", rc, "after select"

   printf "\n"
   rc = sqlite_close(my_db)
   print "return code=", rc, "after closing database"
}

function dumparray(name, array, i) {
   for (i in array)
      if (isarray(array[i]))
         dumparray(name "[\"" i "\"]", array[i])
      else
         printf("%s[\"%s\"] = %s\n", name, i, array[i])
}

One line 23, the version of SQLite is printed; it should be 3.24.0.
On line 28, the stress table is created with a suitable structure to accept the files’ metadata of interest as output by the “ls -l” command, e.g.:

-r-xr-xr-x 1 dmadmin dmadmin 1641005 2017-01-26 05:14 /home/dmadmin/oracle/instantclient_12_2/libsqlplusic.so

Lines 43 to 50 extract into variables the metadata from each line of input to gawk from the find command. This intermediary step has been introduced for clarity as $* variables could directly be used on line 52.
Line 52 builds the INSERT SQL statement that gets passed to line 55 for execution.
Line 58 checks whether the file count limit has been reached and aborts the loop is so.
Finally, line 63 extracts the content of the stress table into an array which is dumped on line 66 and line 69 displays it in such a way that the output is not too messy.

Here is an example of the result:

max_depth=5; limit=20; find ~/ -maxdepth $max_depth -type f -printf "%M %u %g %s %TY-%Tm-%Td %TH:%TM %p\n" | gawk -v limit=$limit -f stress_sqlite.awk
my_db = 0 after db opened
sqlite_version()
--------------------
3.24.0
1 rows selected

return code= 0 after table creation

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-rw----", "dmadmin", "dmadmin", "2009", "2017-11-17 18:16", "/home/dmadmin/oraInventory/logs/oraInstall2017-11-17_06-11-23PM.err", "ASCII text", readfile('/home/dmadmin/oraInventory/logs/oraInstall2017-11-17_06-11-23PM.err'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-rw----", "dmadmin", "dmadmin", "1309", "2017-11-17 19:25", "/home/dmadmin/oraInventory/logs/oraInstall2017-11-17_06-11-23PM.out", "ASCII text", readfile('/home/dmadmin/oraInventory/logs/oraInstall2017-11-17_06-11-23PM.out'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-rw----", "dmadmin", "dmadmin", "2973490", "2017-11-17 19:29", "/home/dmadmin/oraInventory/logs/installActions2017-11-17_06-11-23PM.log", "ASCII text, with very long lines", readfile('/home/dmadmin/oraInventory/logs/installActions2017-11-17_06-11-23PM.log'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-rw----", "dmadmin", "dmadmin", "292", "2017-11-17 19:26", "/home/dmadmin/oraInventory/ContentsXML/libs.xml", "XML 1.0 document, ASCII text", readfile('/home/dmadmin/oraInventory/ContentsXML/libs.xml'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-rw----", "dmadmin", "dmadmin", "468", "2017-11-17 19:25", "/home/dmadmin/oraInventory/ContentsXML/inventory.xml", "XML 1.0 document, ASCII text", readfile('/home/dmadmin/oraInventory/ContentsXML/inventory.xml'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-rw----", "dmadmin", "dmadmin", "329", "2017-11-17 19:26", "/home/dmadmin/oraInventory/ContentsXML/comps.xml", "XML 1.0 document, ASCII text", readfile('/home/dmadmin/oraInventory/ContentsXML/comps.xml'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-rw----", "dmadmin", "dmadmin", "60", "2017-11-17 19:26", "/home/dmadmin/oraInventory/oraInst.loc", "ASCII text", readfile('/home/dmadmin/oraInventory/oraInst.loc'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rwxrwx---", "dmadmin", "dmadmin", "1661", "2017-11-17 19:26", "/home/dmadmin/oraInventory/orainstRoot.sh", "POSIX shell script, ASCII text executable", readfile('/home/dmadmin/oraInventory/orainstRoot.sh'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-rw----", "dmadmin", "dmadmin", "362", "2017-11-17 18:16", "/home/dmadmin/oraInventory/oui/srcs.lst", "XML 1.0 document, ASCII text", readfile('/home/dmadmin/oraInventory/oui/srcs.lst'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-------", "dmadmin", "dmadmin", "4747", "2018-09-01 13:59", "/home/dmadmin/.sqlite_history", "assembler source, ASCII text", readfile('/home/dmadmin/.sqlite_history'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-rw-r--", "dmadmin", "dmadmin", "176", "2018-08-08 22:49", "/home/dmadmin/mp-fact.awk", "ASCII text", readfile('/home/dmadmin/mp-fact.awk'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rwxrwxr-x", "dmadmin", "dmadmin", "24951775", "2016-11-04 12:13", "/home/dmadmin/setup_files/dfcUnix.zip", "Zip archive data, at least v1.0 to extract", readfile('/home/dmadmin/setup_files/dfcUnix.zip'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rwxrwxr-x", "dmadmin", "dmadmin", "2179169", "2016-11-04 12:24", "/home/dmadmin/setup_files/serviceWrapperManager.jar", "Java archive data (JAR)", readfile('/home/dmadmin/setup_files/serviceWrapperManager.jar'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rwxrwxr-x", "dmadmin", "dmadmin", "53170976", "2005-03-23 15:29", "/home/dmadmin/setup_files/53/dfcSetup.jar", "Zip archive data, at least v2.0 to extract", readfile('/home/dmadmin/setup_files/53/dfcSetup.jar'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rwxrwxr-x", "dmadmin", "dmadmin", "129883", "2005-03-20 08:29", "/home/dmadmin/setup_files/53/consistency_checker.ebs", "ASCII text, with CRLF line terminators", readfile('/home/dmadmin/setup_files/53/consistency_checker.ebs'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rw-rw-r--", "dmadmin", "dmadmin", "803", "2017-11-28 09:44", "/home/dmadmin/setup_files/53/init53", "ASCII text", readfile('/home/dmadmin/setup_files/53/init53'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rwxrwxr-x", "dmadmin", "dmadmin", "174586519", "2005-03-23 20:16", "/home/dmadmin/setup_files/53/server.jar", "Zip archive data, at least v2.0 to extract", readfile('/home/dmadmin/setup_files/53/server.jar'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rwxrwxr-x", "dmadmin", "dmadmin", "9843412", "2004-12-03 01:25", "/home/dmadmin/setup_files/53/tomcat4127Setup.jar", "Zip archive data, at least v2.0 to extract", readfile('/home/dmadmin/setup_files/53/tomcat4127Setup.jar'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rwxrwxr-x", "dmadmin", "dmadmin", "5587942", "2005-03-23 15:27", "/home/dmadmin/setup_files/53/bofcollaborationSetup.jar", "Zip archive data, at least v2.0 to extract", readfile('/home/dmadmin/setup_files/53/bofcollaborationSetup.jar'))
return code=0 after insertion

INSERT INTO stress(file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, content) VALUES("-", "rwxrwxr-x", "dmadmin", "dmadmin", "5900841", "2005-03-23 15:27", "/home/dmadmin/setup_files/53/bofworkflowSetup.jar", "Zip archive data, at least v2.0 to extract", readfile('/home/dmadmin/setup_files/53/bofworkflowSetup.jar'))
return code=0 after insertion

20 rows selected
return code = 0 after select into array
my_array["0"]["owner_name"] = dmadmin
my_array["0"]["file_type"] = -
my_array["0"]["group_name"] = dmadmin
my_array["0"]["permissions"] = rw-rw----
my_array["0"]["text"] = java.lang.NullPointerException\n\tat oracle.sysman.oii.oiin.OiinNetOps.addNICInfo(OiinNetOps.java:144)\n\tat oracle.sysman.oii.oiin.OiinNetOps.computeNICList(OiinNetOps.java:109)\n\tat oracle.sysman.oii.oii...
my_array["0"]["file_name"] = /home/dmadmin/oraInventory/logs/oraInstall2017-11-17_06-11-23PM.err
my_array["0"]["file_size"] = 2009
my_array["0"]["file_date"] = 2017-11-17 18:16
my_array["0"]["content_type"] = ASCII text
my_array["1"]["owner_name"] = dmadmin
my_array["1"]["file_type"] = -
my_array["1"]["group_name"] = dmadmin
my_array["1"]["permissions"] = rw-rw----
my_array["1"]["text"] = You can find the log of this install session at:\n /home/dmadmin/oraInventory/logs/installActions2017-11-17_06-11-23PM.log\nError in invoking target 'install' of makefile '/home/dmadmin/oracle/database/...
my_array["1"]["file_name"] = /home/dmadmin/oraInventory/logs/oraInstall2017-11-17_06-11-23PM.out
my_array["1"]["file_size"] = 1309
my_array["1"]["file_date"] = 2017-11-17 19:25
my_array["1"]["content_type"] = ASCII text
my_array["2"]["owner_name"] = dmadmin
my_array["2"]["file_type"] = -
my_array["2"]["group_name"] = dmadmin
my_array["2"]["permissions"] = rw-rw----
my_array["2"]["text"] = INFO: Using paramFile: /home/dmadmin/Downloads/database/install/oraparam.ini\nINFO: \nINFO: \nINFO: Checking Temp space: must be greater than 500 MB. Actual 75674 MB Passed\nINFO: Checking swap space...
my_array["2"]["file_name"] = /home/dmadmin/oraInventory/logs/installActions2017-11-17_06-11-23PM.log
my_array["2"]["file_size"] = 2973490
my_array["2"]["file_date"] = 2017-11-17 19:29
my_array["2"]["content_type"] = ASCII text, with very long lines
my_array["3"]["owner_name"] = dmadmin
my_array["3"]["file_type"] = -
my_array["3"]["group_name"] = dmadmin
my_array["3"]["permissions"] = rw-rw----
my_array["3"]["text"] = \n\n\n\n<QUER...
my_array["3"]["file_name"] = /home/dmadmin/oraInventory/ContentsXML/libs.xml
my_array["3"]["file_size"] = 292
my_array["3"]["file_date"] = 2017-11-17 19:26
my_array["3"]["content_type"] = XML 1.0 document, ASCII text
my_array["4"]["owner_name"] = dmadmin
my_array["4"]["file_type"] = -
my_array["4"]["group_name"] = dmadmin
my_array["4"]["permissions"] = rw-rw----
my_array["4"]["text"] = \n\n\n\n<VER...
my_array["4"]["file_name"] = /home/dmadmin/oraInventory/ContentsXML/inventory.xml
my_array["4"]["file_size"] = 468
my_array["4"]["file_date"] = 2017-11-17 19:25
my_array["4"]["content_type"] = XML 1.0 document, ASCII text
my_array["5"]["owner_name"] = dmadmin
my_array["5"]["file_type"] = -
my_array["5"]["group_name"] = dmadmin
my_array["5"]["permissions"] = rw-rw----
my_array["5"]["text"] = \n\n\n\n<TL_L...
my_array["5"]["file_name"] = /home/dmadmin/oraInventory/ContentsXML/comps.xml
my_array["5"]["file_size"] = 329
my_array["5"]["file_date"] = 2017-11-17 19:26
my_array["5"]["content_type"] = XML 1.0 document, ASCII text
my_array["6"]["owner_name"] = dmadmin
my_array["6"]["file_type"] = -
my_array["6"]["group_name"] = dmadmin
my_array["6"]["permissions"] = rw-rw----
my_array["6"]["text"] = inventory_loc=/home/dmadmin/oraInventory\ninst_group=dmadmin\n
my_array["6"]["file_name"] = /home/dmadmin/oraInventory/oraInst.loc
my_array["6"]["file_size"] = 60
my_array["6"]["file_date"] = 2017-11-17 19:26
my_array["6"]["content_type"] = ASCII text
my_array["7"]["owner_name"] = dmadmin
my_array["7"]["file_type"] = -
my_array["7"]["group_name"] = dmadmin
my_array["7"]["permissions"] = rwxrwx---
my_array["7"]["text"] = #!/bin/sh\nAWK=/bin/awk\nCHMOD=/bin/chmod\nCHGRP=/bin/chgrp\nCP=/bin/cp\nECHO=/bin/echo\nMKDIR=/bin/mkdir\nRUID=`/usr/bin/id|$AWK -F\( '{print $1}'|$AWK -F\= '{print $2}'`\nif [ ${RUID} != "0" ];then\n $ECHO...
my_array["7"]["file_name"] = /home/dmadmin/oraInventory/orainstRoot.sh
my_array["7"]["file_size"] = 1661
my_array["7"]["file_date"] = 2017-11-17 19:26
my_array["7"]["content_type"] = POSIX shell script, ASCII text executable
my_array["8"]["owner_name"] = dmadmin
my_array["8"]["file_type"] = -
my_array["8"]["group_name"] = dmadmin
my_array["8"]["permissions"] = rw-rw----
my_array["8"]["text"] = \n\n\n\n...
my_array["8"]["file_name"] = /home/dmadmin/oraInventory/oui/srcs.lst
my_array["8"]["file_size"] = 362
my_array["8"]["file_date"] = 2017-11-17 18:16
my_array["8"]["content_type"] = XML 1.0 document, ASCII text
my_array["9"]["owner_name"] = dmadmin
my_array["9"]["file_type"] = -
my_array["9"]["group_name"] = dmadmin
my_array["9"]["permissions"] = rw-------
my_array["9"]["text"] = .help\n.help\n.exit\n.help create\n.tables\ncreate table test(a1 number)\ngo\n.tables\nselect * from test\n/\nselect * from test\ngo\n.save test.db\n.exit\n.open stress-test.db\n.h\n.healp\n.help\n.schema\n.schema %\n.sc...
my_array["9"]["file_name"] = /home/dmadmin/.sqlite_history
my_array["9"]["file_size"] = 4747
my_array["9"]["file_date"] = 2018-09-01 13:59
my_array["9"]["content_type"] = assembler source, ASCII text
my_array["10"]["owner_name"] = dmadmin
my_array["10"]["file_type"] = -
my_array["10"]["group_name"] = dmadmin
my_array["10"]["permissions"] = rw-rw-r--
my_array["10"]["text"] = echo 10000 | gawk -M '{facto = factorial($0); printf("%d\n%d digits\n", facto, length(facto))} function factorial(n) {r = 1; for (i = 1; i <= n; i++) r *= i; return r}' | less\n
my_array["10"]["file_name"] = /home/dmadmin/mp-fact.awk
my_array["10"]["file_size"] = 176
my_array["10"]["file_date"] = 2018-08-08 22:49
my_array["10"]["content_type"] = ASCII text
my_array["11"]["owner_name"] = dmadmin
my_array["11"]["file_type"] = -
my_array["11"]["group_name"] = dmadmin
my_array["11"]["permissions"] = rwxrwxr-x
my_array["11"]["text"] = 504B03040A0000000000A9596449000000000000000000000000040000006466632F504B03040A000000000071863D490000000000000000000000000D0000006466632F6C6963656E7365732F504B03040A000000000071863D49000000000000000000...
my_array["11"]["file_name"] = /home/dmadmin/setup_files/dfcUnix.zip
my_array["11"]["file_size"] = 24951775
my_array["11"]["file_date"] = 2016-11-04 12:13
my_array["11"]["content_type"] = Zip archive data, at least v1.0 to extract
my_array["12"]["owner_name"] = dmadmin
my_array["12"]["file_type"] = -
my_array["12"]["group_name"] = dmadmin
my_array["12"]["permissions"] = rwxrwxr-x
my_array["12"]["text"] = 504B03040A0000000000785BD842000000000000000000000000090004004D4554412D494E462FFECA0000504B03040A0000000800775BD8428241A6B65C00000066000000140000004D4554412D494E462F4D414E49464553542E4D46F34DCCCB4C4B2D...
my_array["12"]["file_name"] = /home/dmadmin/setup_files/serviceWrapperManager.jar
my_array["12"]["file_size"] = 2179169
my_array["12"]["file_date"] = 2016-11-04 12:24
my_array["12"]["content_type"] = Java archive data (JAR)
my_array["13"]["owner_name"] = dmadmin
my_array["13"]["file_type"] = -
my_array["13"]["group_name"] = dmadmin
my_array["13"]["permissions"] = rwxrwxr-x
my_array["13"]["text"] = 504B030414000800080036337732000000000000000000000000140000004D4554412D494E462F4D414E49464553542E4D46F34DCCCB4C4B2D2ED10D4B2D2ACECCCFB35230D433E0E5722E4A4D2C494DD175AAB452F0CC2B2E49CCC909CEC84CCD49E1E5...
my_array["13"]["file_name"] = /home/dmadmin/setup_files/53/dfcSetup.jar
my_array["13"]["file_size"] = 53170976
my_array["13"]["file_date"] = 2005-03-23 15:29
my_array["13"]["content_type"] = Zip archive data, at least v2.0 to extract
my_array["14"]["owner_name"] = dmadmin
my_array["14"]["file_type"] = -
my_array["14"]["group_name"] = dmadmin
my_array["14"]["permissions"] = rwxrwxr-x
my_array["14"]["text"] = '####################################################################\r\n'\r\n' Name:\r\n'\r\n' consistency_checker.ebs\r\n'\r\n' Description:\r\n'\r\n' Docbase Consistency Checker Job and Utilities\r\n'\r\n' NOTE:...
my_array["14"]["file_name"] = /home/dmadmin/setup_files/53/consistency_checker.ebs
my_array["14"]["file_size"] = 129883
my_array["14"]["file_date"] = 2005-03-20 08:29
my_array["14"]["content_type"] = ASCII text, with CRLF line terminators
my_array["15"]["owner_name"] = dmadmin
my_array["15"]["file_type"] = -
my_array["15"]["group_name"] = dmadmin
my_array["15"]["permissions"] = rw-rw-r--
my_array["15"]["text"] = export DOCUMENTUM=/home/dmadmin/documentum53\nexport DM_HOME=$DOCUMENTUM/product/5.3\nexport DOCUMENTUM_SHARED=$DOCUMENTUM/shared\nexport dfcpath=$DOCUMENTUM_SHARED/dfc\n\n#export JAVA_HOME=$DOCUMENTUM_SHA...
my_array["15"]["file_name"] = /home/dmadmin/setup_files/53/init53
my_array["15"]["file_size"] = 803
my_array["15"]["file_date"] = 2017-11-28 09:44
my_array["15"]["content_type"] = ASCII text
my_array["16"]["owner_name"] = dmadmin
my_array["16"]["file_type"] = -
my_array["16"]["group_name"] = dmadmin
my_array["16"]["permissions"] = rwxrwxr-x
my_array["16"]["text"] = 504B0304140008000800BB597732000000000000000000000000140000004D4554412D494E462F4D414E49464553542E4D46F34DCCCB4C4B2D2ED10D4B2D2ACECCCFB35230D433E0E5722E4A4D2C494DD175AAB452F0CC2B2E49CCC909CEC84CCD49E1E5...
my_array["16"]["file_name"] = /home/dmadmin/setup_files/53/server.jar
my_array["16"]["file_size"] = 174586519
my_array["16"]["file_date"] = 2005-03-23 20:16
my_array["16"]["content_type"] = Zip archive data, at least v2.0 to extract
my_array["17"]["owner_name"] = dmadmin
my_array["17"]["file_type"] = -
my_array["17"]["group_name"] = dmadmin
my_array["17"]["permissions"] = rwxrwxr-x
my_array["17"]["text"] = 504B030414000800080031838231000000000000000000000000140000004D4554412D494E462F4D414E49464553542E4D46F34DCCCB4C4B2D2ED10D4B2D2ACECCCFB35230D433E0E5722E4A4D2C494DD175AAB452F0CC2B2E49CCC909CEC84CCD49E1E5...
my_array["17"]["file_name"] = /home/dmadmin/setup_files/53/tomcat4127Setup.jar
my_array["17"]["file_size"] = 9843412
my_array["17"]["file_date"] = 2004-12-03 01:25
my_array["17"]["content_type"] = Zip archive data, at least v2.0 to extract
my_array["18"]["owner_name"] = dmadmin
my_array["18"]["file_type"] = -
my_array["18"]["group_name"] = dmadmin
my_array["18"]["permissions"] = rwxrwxr-x
my_array["18"]["text"] = 504B0304140008000800E3317732000000000000000000000000140000004D4554412D494E462F4D414E49464553542E4D46F34DCCCB4C4B2D2ED10D4B2D2ACECCCFB35230D433E0E5722E4A4D2C494DD175AAB452F0CC2B2E49CCC909CEC84CCD49E1E5...
my_array["18"]["file_name"] = /home/dmadmin/setup_files/53/bofcollaborationSetup.jar
my_array["18"]["file_size"] = 5587942
my_array["18"]["file_date"] = 2005-03-23 15:27
my_array["18"]["content_type"] = Zip archive data, at least v2.0 to extract
my_array["19"]["owner_name"] = dmadmin
my_array["19"]["file_type"] = -
my_array["19"]["group_name"] = dmadmin
my_array["19"]["permissions"] = rwxrwxr-x
my_array["19"]["text"] = 504B0304140008000800F5317732000000000000000000000000140000004D4554412D494E462F4D414E49464553542E4D46F34DCCCB4C4B2D2ED10D4B2D2ACECCCFB35230D433E0E5722E4A4D2C494DD175AAB452F0CC2B2E49CCC909CEC84CCD49E1E5...
my_array["19"]["file_name"] = /home/dmadmin/setup_files/53/bofworkflowSetup.jar
my_array["19"]["file_size"] = 5900841
my_array["19"]["file_date"] = 2005-03-23 15:27
my_array["19"]["content_type"] = Zip archive data, at least v2.0 to extract

file_type permissions owner_name group_name file_size file_date file_name content_type text
--------- ----------- ---------- ---------- --------- ---------------- ------------------------------ ------------ --------------------------------------------------
- rw-rw---- dmadmin dmadmin 2009 2017-11-17 18:16 /home/dmadmin/oraInventory/log ASCII text java.lang.NullPointerException\n\tat oracle.sysman
s/oraInstall2017-11-17_06-11-2 .oii.oiin.OiinNetOps.addNICInfo(OiinNetOps.java:14
3PM.err 4)\n\tat oracle.sysman.oii.oiin.OiinNetOps.compute
NICList(OiinNetOps.java:109)\n\tat oracle.sysman.o
ii.oii...
- rw-rw---- dmadmin dmadmin 1309 2017-11-17 19:25 /home/dmadmin/oraInventory/log ASCII text You can find the log of this install session at:\n
s/oraInstall2017-11-17_06-11-2 /home/dmadmin/oraInventory/logs/installActions201
3PM.out 7-11-17_06-11-23PM.log\nError in invoking target '
install' of makefile '/home/dmadmin/oracle/databas
e/...
- rw-rw---- dmadmin dmadmin 2973490 2017-11-17 19:29 /home/dmadmin/oraInventory/log ASCII text, INFO: Using paramFile: /home/dmadmin/Downloads/dat
s/installActions2017-11-17_06- with very lo abase/install/oraparam.ini\nINFO: \nINFO: \nINFO:
11-23PM.log ng lines Checking Temp space: must be greater than 500 MB.
Actual 75674 MB Passed\nINFO: Checking swap s
pace...
- rw-rw---- dmadmin dmadmin 292 2017-11-17 19:26 /home/dmadmin/oraInventory/Con XML 1.0 docu \n\n\n\n
<QUER...
- rw-rw---- dmadmin dmadmin 468 2017-11-17 19:25 /home/dmadmin/oraInventory/Con XML 1.0 docu \n\n\n\
n<VER...
- rw-rw---- dmadmin dmadmin 329 2017-11-17 19:26 /home/dmadmin/oraInventory/Con XML 1.0 docu \n\n\n\n
<TL_L...
- rw-rw---- dmadmin dmadmin 60 2017-11-17 19:26 /home/dmadmin/oraInventory/ora ASCII text inventory_loc=/home/dmadmin/oraInventory\ninst_gro
Inst.loc up=dmadmin\n
- rwxrwx--- dmadmin dmadmin 1661 2017-11-17 19:26 /home/dmadmin/oraInventory/ora POSIX shell #!/bin/sh\nAWK=/bin/awk\nCHMOD=/bin/chmod\nCHGRP=/
instRoot.sh script, ASCI bin/chgrp\nCP=/bin/cp\nECHO=/bin/echo\nMKDIR=/bin/
I text execu mkdir\nRUID=`/usr/bin/id|$AWK -F\( '{print $1}'|$A
table WK -F\= '{print $2}'`\nif [ ${RUID} != "0" ];then\
n $ECHO...
- rw-rw---- dmadmin dmadmin 362 2017-11-17 18:16 /home/dmadmin/oraInventory/oui XML 1.0 docu \n\n\n\n...
- rw------- dmadmin dmadmin 4747 2018-09-01 13:59 /home/dmadmin/.sqlite_history assembler so .help\n.help\n.exit\n.help create\n.tables\ncreate
urce, ASCII table test(a1 number)\ngo\n.tables\nselect * from
text test\n/\nselect * from test\ngo\n.save test.db\n.
exit\n.open stress-test.db\n.h\n.healp\n.help\n.sc
hema\n.schema %\n.sc...
- rw-rw-r-- dmadmin dmadmin 176 2018-08-08 22:49 /home/dmadmin/mp-fact.awk ASCII text echo 10000 | gawk -M '{facto = factorial($0); prin
tf("%d\n%d digits\n", facto, length(facto))} funct
ion factorial(n) {r = 1; for (i = 1; i <= n; i++)
r *= i; return r}' | less\n
- rwxrwxr-x dmadmin dmadmin 24951775 2016-11-04 12:13 /home/dmadmin/setup_files/dfcU Zip archive 504B03040A0000000000A95964490000000000000000000000
nix.zip data, at lea 00040000006466632F504B03040A000000000071863D490000
st v1.0 to e 000000000000000000000D0000006466632F6C6963656E7365
xtract 732F504B03040A000000000071863D49000000000000000000
...
- rwxrwxr-x dmadmin dmadmin 2179169 2016-11-04 12:24 /home/dmadmin/setup_files/serv Java archive 504B03040A0000000000785BD8420000000000000000000000
iceWrapperManager.jar data (JAR) 00090004004D4554412D494E462FFECA0000504B03040A0000
000800775BD8428241A6B65C00000066000000140000004D45
54412D494E462F4D414E49464553542E4D46F34DCCCB4C4B2D
...
- rwxrwxr-x dmadmin dmadmin 53170976 2005-03-23 15:29 /home/dmadmin/setup_files/53/d Zip archive 504B0304140008000800363377320000000000000000000000
fcSetup.jar data, at lea 00140000004D4554412D494E462F4D414E49464553542E4D46
st v2.0 to e F34DCCCB4C4B2D2ED10D4B2D2ACECCCFB35230D433E0E5722E
xtract 4A4D2C494DD175AAB452F0CC2B2E49CCC909CEC84CCD49E1E5
...
- rwxrwxr-x dmadmin dmadmin 129883 2005-03-20 08:29 /home/dmadmin/setup_files/53/c ASCII text, '#################################################
onsistency_checker.ebs with CRLF li ###################\r\n'\r\n' Name:\r\n'\r\n' co
ne terminato nsistency_checker.ebs\r\n'\r\n' Description:\r\n'\
rs r\n' Docbase Consistency Checker Job and Utiliti
es\r\n'\r\n' NOTE:...
- rw-rw-r-- dmadmin dmadmin 803 2017-11-28 09:44 /home/dmadmin/setup_files/53/i ASCII text export DOCUMENTUM=/home/dmadmin/documentum53\nexpo
nit53 rt DM_HOME=$DOCUMENTUM/product/5.3\nexport DOCUMEN
TUM_SHARED=$DOCUMENTUM/shared\nexport dfcpath=$DOC
UMENTUM_SHARED/dfc\n\n#export JAVA_HOME=$DOCUMENTU
M_SHA...
- rwxrwxr-x dmadmin dmadmin 174586519 2005-03-23 20:16 /home/dmadmin/setup_files/53/s Zip archive 504B0304140008000800BB5977320000000000000000000000
erver.jar data, at lea 00140000004D4554412D494E462F4D414E49464553542E4D46
st v2.0 to e F34DCCCB4C4B2D2ED10D4B2D2ACECCCFB35230D433E0E5722E
xtract 4A4D2C494DD175AAB452F0CC2B2E49CCC909CEC84CCD49E1E5
...
- rwxrwxr-x dmadmin dmadmin 9843412 2004-12-03 01:25 /home/dmadmin/setup_files/53/t Zip archive 504B0304140008000800318382310000000000000000000000
omcat4127Setup.jar data, at lea 00140000004D4554412D494E462F4D414E49464553542E4D46
st v2.0 to e F34DCCCB4C4B2D2ED10D4B2D2ACECCCFB35230D433E0E5722E
xtract 4A4D2C494DD175AAB452F0CC2B2E49CCC909CEC84CCD49E1E5
...
- rwxrwxr-x dmadmin dmadmin 5587942 2005-03-23 15:27 /home/dmadmin/setup_files/53/b Zip archive 504B0304140008000800E33177320000000000000000000000
ofcollaborationSetup.jar data, at lea 00140000004D4554412D494E462F4D414E49464553542E4D46
st v2.0 to e F34DCCCB4C4B2D2ED10D4B2D2ACECCCFB35230D433E0E5722E
xtract 4A4D2C494DD175AAB452F0CC2B2E49CCC909CEC84CCD49E1E5
...
- rwxrwxr-x dmadmin dmadmin 5900841 2005-03-23 15:27 /home/dmadmin/setup_files/53/b Zip archive 504B0304140008000800F53177320000000000000000000000
ofworkflowSetup.jar data, at lea 00140000004D4554412D494E462F4D414E49464553542E4D46
st v2.0 to e F34DCCCB4C4B2D2ED10D4B2D2ACECCCFB35230D433E0E5722E
xtract 4A4D2C494DD175AAB452F0CC2B2E49CCC909CEC84CCD49E1E5
...
20 rows selected
return code = 0 after select

return code= 0 after closing database


It can be used as follows:

AWKLIBPATH=gawk-4.2.1/extension/.libs find root_dir -maxdepth max_depth -type f -printf "%M %u %g %s %TY-%Tm-%Td %TH:%TM %p\n" | gawk -v limit=how_many_files -f stress_sqlite.awk

Example:

max_depth=5; limit=50; AWKLIBPATH=gawk-4.2.1/extension/.libs find ~/ -maxdepth $max_depth -type f -printf "%M %u %g %s %TY-%Tm-%Td %TH:%TM %p\n" | gawk -v limit=$limit -f stress_sqlite.awk

Noticed the ugly nested replace() function calls in the SELECT statement ?:

"SELECT file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, CASE WHEN instr(content_type, 'ASCII text') > 0 THEN " \
"CASE WHEN length(content) > 200 THEN replace(replace(replace(substr(content, 1, 200) || '...', '\n', '\\n'), '\t', '\\t'), '\r', '\\r') ELSE replace(replace(replace(content, '\n', '\\n'), '\t', '\\t'), '\r', '\\r') END ELSE substr(hex(content), 1, 200) || '...' END as text FROM stress"

This is because there no SQLite equivalent to the Unix command “tr set1 set2″ or to Oracle’s translate() function, but such a function could easily be user-defined. This is left as an exercise for the reader.
The column widths passed to the last sqlite_select() call came from prior a “draft” execution with following call:

rc = sqlite_select(my_db, "SELECT file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, CASE content_type WHEN 'ASCII text' THEN " \
"CASE WHEN length(content) > 200 THEN replace(replace(substr(content, 1, 200) || '...', '\n', '\\n'), '\t', '\\t') ELSE replace(replace(content, '\n', '\\n'), '\t', '\\t') END ELSE '' END as text FROM stress")

And its the result was:

...
Optimum column widths
=====================
for query: SELECT file_type, permissions, owner_name, group_name, file_size, file_date, file_name, content_type, CASE content_type WHEN 'ASCII text' THEN CASE WHEN length(content) > 200 THEN replace(replace(substr(content, 1, 200) || '...', '
', '\n'), ' ', '\t') ELSE replace(replace(content, '
', '\n'), ' ', '\t') END ELSE '' END as text FROM stress
file_type 9
permissions 11
owner_name 10
group_name 10
file_size 9
file_date 16
file_name 71
content_type 42
text 209

So those values were injected back after some adjustments.
Beware that files can add up pretty quickly, hence the file limit parameter. Ditto for the database file. Here, the 50 files in ~/dmadmin sum up to almost 2 Gb already so make sure there is enough free space for the database file.
If the way sqlite_select() works does not appeal to you, despite all the different display formats it offers, you can give a try to sqlitebrowser available here.

Bulk updates from an array

With the sqlite_exec() primitive available, it is possibe to build up a bulk insert, update or delete function that takes its database parameters from a gawk array. Let’s name it bulk_exec(). Here is a possible implementation:

function bulk_exec(db, op, table_name, larray     , rowid, stmt, columns, values, col) {
   for (rowid in larray) {
      if ("delete" == op) {
         stmt = sprintf("DELETE FROM test1 WHERE ROWID = %s", rowid)
      }
      else if ("update" == op) {
         columns = ""
         values = ""
         for (col in larray[rowid]) {
            columns = columns ("" == columns ? "" : ", ") col
            values = values ("" == values ? "" : ", ") "'" larray[rowid][col] "'"
         }
         stmt = sprintf("UPDATE test1 SET (%s) = (%s) WHERE ROWID = %s", columns, values, rowid)
      }
      else if ("insert" == op) {
         columns = ""
         values = ""
         for (col in larray[rowid]) {
            columns = columns ("" == columns ? "" : ", ") col
            values = values ("" == values ? "" : ", ") "'" larray[rowid][col] "'"
         }
         stmt = sprintf("INSERT INTO test1 (%s) VALUES (%s)", columns, values)
      }
      rc = sqlite_exec(my_db, stmt)
      print "return code = ", rc
   }
}

The op test is made here for each row, which is inefficient, but this is just an example.
db is the database handle;
op is the operation to perform, e.g. one of “insert”, “update” or “delete”;
table_name is the table to modify;
larray is the array that contains the tabular data to apply to table_name.
The structure of the array larray is important. For one thing, it is indexed by rowids so as to univoquely identify the rows to delete or update (obviously not used when inserting rows); this is a difference from the sequence number usually used as an index. The second dimension of the array must contain the column values for updating existing rows or inserting new rows (obviously not used when deleting rows). If the gawk array comes from a preceding call to sqlite_select(), it may be necessary to include the rowid column in the SELECT statement’s list of columns (see line 2 below) and remap the array to be indexed by that rowid (see lines 9 to 16 below). Here is a code snipped showing this for the 3 bulk operations:

...
   stmt = "SELECT rowid, n1, s1, s2 FROM test1";
   rc = sqlite_select(my_db, stmt, 0, a_test)
   delete del_test
   delete mod_test
   delete ins_test
   for (i in a_test)
      if (0 == i % 4)
         del_test[a_test[i]["rowid"]] = 0
      else if (1 == i % 4) {
         mod_test[a_test[i]["rowid"]]["n1"] = a_test[i]["n1"] * 100
         mod_test[a_test[i]["rowid"]]["s1"] = a_test[i]["s1"] " - s1 modified"
         mod_test[a_test[i]["rowid"]]["s2"] = a_test[i]["s2"] " - s2 modified"
      }
      else if (2 == i % 4) {
         ins_test[a_test[i]["rowid"]]["n1"] = int(1000 * rand()) + 1000
         ins_test[a_test[i]["rowid"]]["s1"] = a_test[i]["s1"] " - s1 inserted"
         ins_test[a_test[i]["rowid"]]["s2"] = a_test[i]["s2"] " - s2 inserted"
      }
   dumparray("a_test", a_test)
   dumparray("del_test", del_test)
   dumparray("mod_test", mod_test)
   dumparray("ins_test", ins_test)
   bulk_exec(my_db, "update", "test1", mod_test)
   bulk_exec(my_db, "delete", "test1", del_test)
   bulk_exec(my_db, "insert", "test1", ins_test)
   dumparray("a_test", a_test)
...

An alternate implementation could still use the sequential integer index as returned by sqlite_select() and, if needed, extract the rowid column from the array. Here is an illustration for the bulk delete:

function bulk_exec(db, op, table_name, larray     , rowid, stmt, columns, values, col) {
   for (i in larray) {
      if ("delete" == op) {
         stmt = sprintf("DELETE FROM test1 WHERE ROWID = %s", larray[i]["rowid"])
      }
...

bulk_exec() could have been implemented in the extension C code but the benefits would have been limited.

Conclusion

One important remark is that although SQLite supports UTF-8 and UTF-16, no particular attention has been applied here when performing C string operations; C strings are still char * in the extension but it shouldn’t be too difficult to convert them to wchar_t * for multi-byte characters.
The extension is quite heavy on blobs because I find them fascinating. However, they may be OK to store that 4 Mb Despacito MP3 song but not so much for that uncompressed 4 Gb Mamma Mia ! Editor’s Cuts Blu-ray Edition. The fact that it is possible to store anything in a database does not imply that it is a sensible solution to do so. Even Documentum proposes blob stores for faster content retrieval but the content sizes are limited to 64 Kb. The contents are still stored in filesystems because there is no gain to deal with huge (in the Tb or above range) database’s datafiles (although with data files on raw disks it may be a different story for repositories). It might be simpler to have everything in one place, metadata and contents, but performance, administration, and backup would be much more painful. A more reasonable use of blobs is to store small, frequently used files in-line and larger ones on disk with their full path name saved as metadata in the repository. That’s what Documentum, and the iPod, do. It is the best alternative given how slow it is to scan a file tree on disk. Scanning inside the database is way faster that in a filesystem, especially the Documentum filestores due to their particular structure (4-level deep with all the files in the last level). Anyway, the feature is here and can turn useful if used appropriately.
This extension proves again how easy gawk lets itself customize for a broader, almost unexpected usage. So far, we can use it to work with Documentum repositories (dmgawk), work with databases (SQLite and PostgreSQL), process xml (awk/xml), etc. The fact that all this is free and open-source helps greatly; no wonder this development model has gained such a tremendous momentum.

 

Leave a Reply

Cesare Cervini
Cesare Cervini