Cesare Cervini

Adding a Documentum Extension to gawk, part I

Recently, I was searching my NAS for some files which, to end this intolerable suspense, I did not find. On the other hand, I did stumble across a couple of interesting dmawk scripts I wrote for a customer more than 20 years ago. Some statements looked a bit odd, e.g. access to elements of multi-dimensional arrays such as a[i1][i2], or "delete A" to empty an array (instead of the well-known awk idiom split("", A)). After a while, I understood what was going on: the dmawk that came with content server v3.x was based on gawk, the GNU dialect of awk. It was already a more powerful interpreter than the standard AT&T awk available on the Unix I was using, namely HP-UX, and nowadays it has become even better, as you can see for yourself in the official manual here.

At that time, gawk could already be extended, which Documentum took advantage of by turning it into a DMCL client, dmawk. However, it was quite a tedious task because it required hacking deeply into the source code. Years later, when I was playing with this and trying to add Oracle connectivity to gawk (turning it into an Oracle OCI-based client, oragawk ;-), I remember, for instance, one step that required editing the symbol table in order to add the new functions, and possibly inserting their implementation code in the bison file; finally, the whole gawk source had to be recompiled and relinked. Tedious yet bearable, as it didn’t prevent passionate people from forking custom versions with useful new functionality such as XML processing in xgawk.

Over the years, gawk has evolved a new, much easier mechanism for adding extensions (aka plugins), available since v4.1. It is named dynamic extension (see documentation here); it lets one load shared libraries at run-time and invoke their functions from within a gawk script through a minimal interface to be provided; the other way around, i.e. callbacks, is also possible for the brave, or really motivated, ones. Several powerful extensions have been developed through this system, such as JSON serialization from/to an associative array (useful e.g. to prepare parameters to pass to JavaScript functions such as the ones in the Highcharts or flot libraries), arbitrary-precision integer and floating point support, PostgreSQL database access, etc. (for a list of current extensions, see here). If you have a recent gawk compiled with MPFR support, try for example the command below:
gawk -M -v prec=16000 'BEGIN{PREC = prec; printf("a big fp number:\n%f\n\n", 3.0**10**4); n = 3**10**4; print "a large integer number:\n" n; print "it has " length(n) " digits"; exit}'
Impressive, isn’t it?
To check whether your local gawk has this option, just ask it:
gawk --version
GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.4, GNU MP 6.1.0)
Copyright (C) 1989, 1991-2015 Free Software Foundation.

This one has it.
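
The same check can be scripted by looking for the string MPFR in the banner. Below is a minimal sketch, assuming the first line of the banner keeps the layout shown above; the helper name has_mpfr is made up for this illustration.

```shell
#!/bin/sh
# has_mpfr [BANNER]: succeed if the version banner mentions MPFR.
# Without an argument, the banner is taken from the gawk found in PATH.
has_mpfr() {
    banner=${1:-$(gawk --version 2>/dev/null | head -n 1)}
    case "$banner" in
        *MPFR*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example with the banner shown above
if has_mpfr "GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.4, GNU MP 6.1.0)"; then
    echo "arbitrary precision available"
fi
```

Called without an argument, the function queries whichever gawk comes first in PATH, which may differ from a locally compiled one.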

So why hasn’t Documentum kept up with gawk, going for the standard, lesser awk instead? Maybe because of a licensing issue? Gawk is protected by the GNU GPL (see its full text in the gawk manual above), which mandates that not only its modified source code but also all extension code statically or dynamically linked with it be provided in source form and included in the delivered package, and maybe this was disturbing at the time. Now that open source is omnipresent, from operating systems to RDBMSs, from network software to web development frameworks, from ERP to CRM software, even a restrictive license such as the GPL no longer shocks anyone. Linux itself, or GNU/Linux as some like to call it, contains a massive amount of GPL-ed software, Linux per se being limited to the kernel, which is itself under the GPL.

Thus, with the new extension mechanism, the GPL requires publishing the source code of not only the interface to be linked with gawk but also of the proprietary shared library (see the definition of Corresponding Source) that is loaded at run-time. But wait: since that library relies heavily on other proprietary Documentum libraries, their source code would also need to be published, and so on, transitively. Obviously, this licensing is by design not favorable to closed source, which is exactly the raison d’être of the FSF (the Free Software Foundation). If Documentum chose not to go the GPL way at that time with the simpler EDMS, it is very unlikely that it will in the future, unless it drastically changes its business model and becomes, say, the Red Hat of document management!

Fortunately, there is no reason to hold one’s breath until that day, for I propose here a simple, straightforward interface between gawk and the Documentum run-time libraries which will not violate the GPL, as nothing closed source is distributed. Its usage is exactly the same as documented in the API reference manual. I’ll show you hereafter how to prepare and use it. Since this article will be quite long, I’ll split it into two parts. Part I, which you are reading, presents the interface and explains how to compile it as a gawk extension. Part II here will deal with a wrapper around that interface providing a few helper functions and a test program to show how to use the extension.


o  Check the currently installed version of gawk;

$ gawk --version
GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.4, GNU MP 6.1.0)
Copyright (C) 1989, 1991-2015 Free Software Foundation.

o  This is an old release; let’s take this opportunity to upgrade to the latest one at the time of writing, 4.2.1;
    prepare a working directory; the name “dmgawk” fits it well;
$ mkdir ~/dmgawk; cd ~/dmgawk

o  Let’s get the source code of gawk 4.2.1;
$ wget
$ tar -xpvzf gawk-4.2.1.tar.gz
$ cd gawk-4.2.1

o  Compile it;
$ ./configure
$ make

o  Check the new version;
$ ./gawk --version
GNU Awk 4.2.1, API: 2.0
Copyright (C) 1989, 1991-2018 Free Software Foundation.
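
The API number in the banner matters because an extension is compiled against one version of the extension API; the interface in this article was built against API 2.0. The sketch below extracts the major number from a banner line so that a build script can test it. It assumes the "API: x.y" layout shown above, and the helper name api_major is made up for this illustration.

```shell
#!/bin/sh
# api_major BANNER: print the major number of the extension API
# advertised in the first line of "gawk --version".
api_major() {
    printf '%s\n' "$1" | sed -n 's/.*API: \([0-9][0-9]*\)\.[0-9].*/\1/p'
}

# Example with the banner of the freshly compiled gawk
banner="GNU Awk 4.2.1, API: 2.0"
if [ "$(api_major "$banner")" -ge 2 ]; then
    echo "extension API 2.x or later"
fi
```

For a banner without an "API:" field the function prints nothing, so a real script should test for an empty result before comparing numbers.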

o  Note that MPFR and GMP support was not linked in; that’s because those libraries were not present on my system;
    to have them, just download, compile and install them, and then run gawk’s ./configure and make again;
    fine so far; we’ll extend this local version; whether it will or can be installed system-wide is up to you;
    move to the extensions directory;
    edit the automake file and add the dctm references shown below for the new interface; let’s call it dctm.c;
$ cd extension
$ vi
pkgextension_LTLIBRARIES = \
... \
time.la \
dctm.la
...
time_la_SOURCES = time.c
time_la_LDFLAGS = $(MY_MODULE_FLAGS)
time_la_LIBADD = $(MY_LIBS)
dctm_la_SOURCES = dctm.c
dctm_la_LDFLAGS = $(MY_MODULE_FLAGS)
dctm_la_LIBADD = $(MY_LIBS)

o  save and quit; that’s all for the make file;
    let’s edit the interface dctm.c now and insert the code below;
$ vi dctm.c

/*
 * dctm.c - Builtin functions that provide an interface to Documentum dmapp.h;
 * see dmapp.h for description of functions;
 * Cesare Cervini
 * 5/2018
 * after the build, go to .libs and link dctm.o with the Documentum library with: gcc -o -shared dctm.o path-to-the-shared-library/;
 */
#include <config.h>

#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/stat.h>

#include "gawkapi.h"

#include "gettext.h"
#define _(msgid)  gettext(msgid)
#define N_(msgid) msgid

/* make it point to the Documentum dmapp.h on your system */
#include "/home/dmadmin/documentum/share/sdk/include/dmapp.h"

static const gawk_api_t *api;	/* for convenience macros to work */
static awk_ext_id_t ext_id;
static const char *ext_version = "dctm extension: version 1.0";
static awk_bool_t (*init_func)(void) = NULL;

int plugin_is_GPL_compatible;

/*  do_dmAPIInit */
static awk_value_t *
do_dmAPIInit(int nargs, awk_value_t *result, struct awk_ext_func *unused) {
   unsigned int ret = 0;

   assert(result != NULL);

   ret = dmAPIInit();
   ret &= 0xff;

   return make_number(ret, result);
}

/*  do_dmAPIDeInit */
static awk_value_t *
do_dmAPIDeInit(int nargs, awk_value_t *result, struct awk_ext_func *unused) {
   unsigned int ret = 0;

   assert(result != NULL);

   ret = dmAPIDeInit();
   ret &= 0xff;

   return make_number(ret, result);
}

/*  do_dmAPIExec */
static awk_value_t *
do_dmAPIExec(int nargs, awk_value_t *result, struct awk_ext_func *unused) {
   awk_value_t str;
   unsigned int ret = 0;

   assert(result != NULL);

   if (get_argument(0, AWK_STRING, & str)) {
      ret = dmAPIExec(str.str_value.str);
      ret &= 0xff;
   } else if (do_lint)
      lintwarn(ext_id, _("dmAPIExec: called with inappropriate argument(s)"));

   return make_number(ret, result);
}

/*  do_dmAPIGet */
static awk_value_t *
do_dmAPIGet(int nargs, awk_value_t *result, struct awk_ext_func *unused) {
   awk_value_t str;
   char *got_value = NULL;   /* stays NULL if the argument cannot be used */

   assert(result != NULL);

   if (get_argument(0, AWK_STRING, & str)) {
      got_value = dmAPIGet(str.str_value.str);
   } else if (do_lint)
      lintwarn(ext_id, _("dmAPIGet: called with inappropriate argument(s)"));

   /* map a NULL result to the empty string before calling strlen() */
   if (got_value == NULL)
      got_value = "";
   make_const_string(got_value, strlen(got_value), result);
   return result;
}

/*  do_dmAPISet */
static awk_value_t *
do_dmAPISet(int nargs, awk_value_t *result, struct awk_ext_func *unused) {
   awk_value_t str1;
   awk_value_t str2;
   unsigned int ret = 0;

   assert(result != NULL);

   /* first argument is at index 0, second one at index 1 */
   if (get_argument(0, AWK_STRING, & str1) && get_argument(1, AWK_STRING, & str2)) {
      ret = dmAPISet(str1.str_value.str, str2.str_value.str);
      ret &= 0xff;
   } else if (do_lint)
      lintwarn(ext_id, _("dmAPISet: called with inappropriate argument(s)"));

   return make_number(ret, result);
}

/* these are the exported functions along with their min and max arities; */
static awk_ext_func_t func_table[] = {
	{ "dmAPIInit",   do_dmAPIInit,   0, 0, awk_false, NULL },
	{ "dmAPIDeInit", do_dmAPIDeInit, 0, 0, awk_false, NULL },
	{ "dmAPIExec",   do_dmAPIExec,   1, 1, awk_false, NULL },
	{ "dmAPIGet",    do_dmAPIGet,    1, 1, awk_false, NULL },
	{ "dmAPISet",    do_dmAPISet,    2, 2, awk_false, NULL },
};

/* define the dl_load function using the boilerplate macro */

dl_load_func(func_table, dctm, "")

o  Again, run configure and build the extensions;
$ pwd
$ ./configure
$ make

o  The extensions’ object files and shared libraries are actually contained in the hidden .libs directory below extension; so move there and check the generated library;

$ cd .libs
$ ldd
 => (0x00007ffc1d7e5000)
 => /lib/x86_64-linux-gnu/ (0x00007f3452930000)
/lib64/ (0x00007f3452efd000)
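
Besides its dependencies, it is worth checking what the library exports: gawk only accepts an extension that defines the symbol plugin_is_GPL_compatible, and the dl_load_func macro expands to an entry point named dl_load. The sketch below filters the output of nm for a given symbol. The helper name has_symbol is made up for this illustration, and dctm.so is only assumed to be the file name libtool produced.

```shell
#!/bin/sh
# has_symbol NAME: read "nm -D --defined-only" output on stdin and
# succeed if NAME is the last field of some line, i.e. an exported symbol.
has_symbol() {
    awk -v sym="$1" '$NF == sym { found = 1 } END { exit !found }'
}

# Assumed usage from inside .libs (dctm.so is an assumed file name):
#   for s in dl_load plugin_is_GPL_compatible; do
#       nm -D --defined-only dctm.so | has_symbol "$s" || echo "missing: $s"
#   done
```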

o  The library still has no reference to the Documentum library; let’s link the two and check again;
    find that Documentum library on your system; on mine, it’s in /home/dmadmin/documentum/product/7.3/bin;
$ gcc -o -shared dctm.o /home/dmadmin/documentum/product/7.3/bin/
$ ldd
 => (0x00007fff55dbf000)
/home/dmadmin/documentum/product/7.3/bin/ (0x00007fece91bb000)
 => /lib/x86_64-linux-gnu/ (0x00007fece8df1000)
 => /lib/x86_64-linux-gnu/ (0x00007fece8bb9000)
 => /lib/x86_64-linux-gnu/ (0x00007fece899c000)
 => /usr/lib/x86_64-linux-gnu/ (0x00007fece861a000)
 => /lib/x86_64-linux-gnu/ (0x00007fece8311000)
 => /lib/x86_64-linux-gnu/ (0x00007fece80fb000)
/lib64/ (0x00007fece95ca000)
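
On a system where a dependency cannot be located, ldd flags it as "not found" instead of printing a path. The sketch below lists such entries so that the check can be automated. It assumes the "name => not found" wording used by the glibc ldd; the helper name ldd_missing is made up for this illustration, and dctm.so is only assumed to be the name of the linked library.

```shell
#!/bin/sh
# ldd_missing: read ldd output on stdin and print the name of every
# dependency that the dynamic loader could not resolve.
ldd_missing() {
    awk '/=> not found/ { print $1 }'
}

# Assumed usage from inside .libs (dctm.so is an assumed file name):
#   ldd dctm.so | ldd_missing
```

An empty result means that every dependency was resolved; otherwise, adding the directory of the missing library to LD_LIBRARY_PATH is the usual remedy.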

Good. So far, the extension has been linked with the Documentum library and is thus able to access the dmAPI*() functions. gawk lets us load dynamic extensions in two ways:
1. the statement @load "name_of_extension" inserted in the client awk script;
2. the -l | --load "name_of_extension" command-line options.
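
As a rough illustration of both ways, here is a minimal sketch. It assumes that the extension ended up as dctm.so in the .libs directory used above, that the locally built gawk is used, and that the Documentum library directory must be in LD_LIBRARY_PATH at run-time; the variable names and the function name load_demo are made up for this illustration.

```shell
#!/bin/sh
# Assumed locations, taken from the steps above; adjust them to your system.
GAWK=${GAWK:-$HOME/dmgawk/gawk-4.2.1/gawk}
EXT_DIR=${EXT_DIR:-$HOME/dmgawk/gawk-4.2.1/extension/.libs}
DCTM_LIB=${DCTM_LIB:-/home/dmadmin/documentum/product/7.3/bin}

load_demo() {
    # dctm.so is the file name libtool is assumed to have produced
    if [ ! -f "$EXT_DIR/dctm.so" ]; then
        echo "dctm.so not found in $EXT_DIR" >&2
        return 1
    fi
    # gawk searches AWKLIBPATH for extensions; the dynamic loader uses
    # LD_LIBRARY_PATH to locate the Documentum library at run-time
    AWKLIBPATH=$EXT_DIR
    LD_LIBRARY_PATH=$DCTM_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export AWKLIBPATH LD_LIBRARY_PATH

    # way 1: @load directive inside the script
    "$GAWK" '@load "dctm"
BEGIN { print "dmAPIInit() returned", dmAPIInit(); dmAPIDeInit() }'

    # way 2: -l option on the command line
    "$GAWK" -l dctm 'BEGIN { print "dmAPIInit() returned", dmAPIInit(); dmAPIDeInit() }'
}
```

Calling load_demo runs the same one-liner twice, once per loading method; it returns 1 without invoking gawk when the library is not where it is expected.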
Please turn now to part II here for the rest of this article.

