Cesare Cervini

Adding a Documentum extension into python

Many years ago, out of frustration with the poor scripting tools available in Documentum, I wrote a Documentum binding for python using distutils, and I remember how easy and straightforward it was, even for someone not really into these things on a daily basis. Recently, I wanted to reuse that work but couldn’t find the source files; there were not many of them, but I did not want to do it all over again. Finally, I had to give up and admit it: I had lost them for good somewhere among the tens of machines, O/Ses, disks and USB drives I played with during that period. Those were sunny, cloudless times (got it?). Anyway, I decided to do it again but, as I hate repeating work, with another method of extending python: this time I went for ctypes (cf. the documentation here: https://docs.python.org/3/library/ctypes.html).
One of the advantages of ctypes over distutils is that no compilation is needed, even when changing the version of python or of the O/S, because the interface is written in python and gets interpreted at run-time. Thanks to ctypes, there isn’t much to do, and interfacing to a run-time library such as libdmcl40.so is a no-brainer.
There was however a big change in the evolution from python 2 to python 3: strings are no longer arrays of bytes but a distinct, incompatible type storing unicode characters. Conversion functions are of course provided to go from one type to the other and back. For low-level work such as interfacing with a C/C++ shared library, the distinction is important because in C, strings are accessed as “char *”, i.e. arrays of bytes, and one cannot just pass around python text strings with 1 to 4 bytes per character. Fortunately, there was no need to produce two versions of the interface because python 2.7, the last version of python 2, understands the type conversion functions used here:

string.encode('ascii', 'ignore') to convert python 3's unicode strings to arrays of bytes (python 2's plain strings) compatible with C/C++ char*; characters that cannot be represented in ASCII are silently dropped
b.decode() to convert arrays of bytes back to python 3's unicode strings

(see https://docs.python.org/3/howto/unicode.html). Thus, it was sufficient to just write one version for python 3 and it would also be compatible with python 2.
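To make the two conversions concrete, here is a minimal sketch of the round trip. The helper names to_c_bytes() and from_c_bytes() are hypothetical and only serve the illustration; they are not part of the interface presented later.

```python
# -*- coding: utf-8 -*-

def to_c_bytes(text):
    """Convert a text string to bytes suitable for a C 'char *' parameter."""
    # 'ignore' silently drops any character that has no ASCII representation
    return text.encode('ascii', 'ignore')

def from_c_bytes(raw):
    """Convert bytes received from a C function back to a text string."""
    return raw.decode()

stmt = u"select count(*) from dm_document"
raw = to_c_bytes(stmt)

# the round trip is lossless as long as the text is pure ASCII
print(from_c_bytes(raw) == stmt)

# non-ASCII characters are lost on the way to C
print(to_c_bytes(u"r\u00e9sum\u00e9"))
```

The second print shows the price of the 'ignore' error handler: accented characters disappear without any warning, which is worth keeping in mind when object names or attribute values are not plain ASCII.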
I started my work in a venerable 32-bit Ubuntu 14.04 with python 2.7. The Documentum library I used there was the 32-bit libdmcl40.so included in the Content server v5.3 binaries. Later, I installed python 3 on the same VM and made the necessary changes to the interface so it would be accepted by this interpreter. Later on, I copied the interface file to another VM running a 64-bit Ubuntu 16.04 and v7.3 Documentum ContentServer binaries but I couldn’t make it work with the included 64-bit libdmcl40.so. I kept receiving SIGSEGV and core dumps from both python2 and python3. A case for a gdb session sometime… Fortunately, with the java-enabled libdmcl.so library, both pythons worked well, albeit with a perceptible delay at startup because of all the jars to load, a small price to pay though.

The interface

The C functions libdmcl*.so exports are the following:

int dmAPIInit();
int dmAPIDeInit();
int dmAPIExec(const char *str);
char* dmAPIGet(const char *str);
int dmAPISet(const char *str, const char *arg);
char* dmAPIDesc(const char *str, int *code, int *type, int *sess);
char* dmAPIEval(const char *str, const char *arg);
char* dmGetPassword(const char *str);
int dmGetVersion( int *, int * );

However, the last 2 functions don’t seem to really be available from the library. Also, dmAPIDesc() (not to be confused with the describe server method) and dmAPIEval() are not documented in the API reference manual. Therefore, I’ve only considered the first 5 functions, the ones that really do the job as the building blocks of any Documentum script.
From within python, those functions are accessed through the wrapper functions below:

def dmInit()
def dmAPIDeInit()
def dmAPIGet(s)
def dmAPISet(s, value)
def dmAPIExec(stmt)

Those take care of the string conversion operations so you don’t have to; they are the only ones that talk directly to the Documentum API and the only ones to use for API work. Generally, they return True or a string if successful, and False or None if not.
Every Documentum client should start with a call to dmInit() in order to load and initialize the libdmcl*.so library’s internal state. To guarantee that, the interface does it itself at load time. As this function is idempotent, further calls at script start-up don’t have any effect. On the other hand, dmAPIDeInit() is not really necessary; just exiting the script will do.
Here, I named the proxy function dmInit() instead of dmAPIInit() for a reason. This function does not just invoke the library’s dmAPIInit() but also initializes the python interface and its usage of ctypes: it loads the shared library and describes the types of the API functions’ arguments (argtypes) and return values (restype). Here is a snippet of its main part:

dmlib = 'libdmcl40.so'
...
dm = ctypes.cdll.LoadLibrary(dmlib); dm.restype = ctypes.c_char_p
...
dm.dmAPIInit.restype = ctypes.c_int;
dm.dmAPIDeInit.restype = ctypes.c_int;
dm.dmAPIGet.restype = ctypes.c_char_p; dm.dmAPIGet.argtypes = [ctypes.c_char_p]
dm.dmAPISet.restype = ctypes.c_int; dm.dmAPISet.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
dm.dmAPIExec.restype = ctypes.c_int; dm.dmAPIExec.argtypes = [ctypes.c_char_p]
status = dm.dmAPIInit()

The shared library whose name is in dmlib must be in a directory listed in LD_LIBRARY_PATH (or SHLIB_PATH or LIBPATH, depending on the Unix flavor); specifying its full path name works too. As mentioned earlier, if the script crashes, try setting dmlib to libdmcl.so instead, if it is available.
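The argtypes/restype mechanism can be tried out without any Documentum installation. The sketch below applies it to getenv() from the standard C library, which has the same shape as dmAPIGet(): a char * in, a char * out. It assumes a Unix-like system, and the environment variable names are made up for the demonstration.

```python
import ctypes
import ctypes.util
import os

# Assumption: a Unix-like system; libc stands in for libdmcl40.so here.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C prototype: char *getenv(const char *name);
libc.getenv.argtypes = [ctypes.c_char_p]
# without this line ctypes would assume an int return value,
# which cannot hold a pointer on a 64-bit system
libc.getenv.restype = ctypes.c_char_p

os.environ["DCTM_DEMO_VAR"] = "hello"        # hypothetical variable name
os.environ.pop("DCTM_DEMO_UNSET_VAR", None)  # make sure this one does not exist

found = libc.getenv(b"DCTM_DEMO_VAR")          # bytes in, bytes out
missing = libc.getenv(b"DCTM_DEMO_UNSET_VAR")  # a NULL pointer comes back as None

print(found.decode())
print(missing)
```

Two details carry over to the Documentum library: parameters declared as c_char_p must be passed as bytes, and a NULL char * returned by the C side shows up as None in python.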
The wrapper functions are used by all the verbs documented in the API Reference Manual. When the manual says for example:

Fetch
Purpose Fetches an object from the repository without placing a lock on the object.
Syntax
dmAPIExec("fetch,session,object_id[,type][,persistent_cache][,consistency_check_value]")
...
Return value
The Fetch method returns TRUE if successful or FALSE if unsuccessful.
...

it is the function dmAPIExec() that conveys the verb “fetch” and its arguments to the shared library. It takes just one argument, a string; the library returns 0 if the call failed and a non-zero integer if it succeeded, which the python wrapper maps to False and True respectively.
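Since every verb and its arguments travel as one comma-separated string, a tiny helper can take care of the concatenation. This is only a sketch: api_stmt() is a hypothetical name, not part of the interface, and "s0" is a placeholder for a real session id.

```python
def api_stmt(verb, *args):
    """Join an API verb and its arguments into one comma-separated string."""
    return ",".join([verb] + [str(arg) for arg in args])

# "s0" is a placeholder session id used for the illustration only
print(api_stmt("fetch", "s0", "0900c35080008107"))
print(api_stmt("getmessage", "s0"))
```

The resulting strings are exactly what would be handed over to dmAPIExec() or dmAPIGet(), so the syntax documented in the API manual stays recognizable.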

Another example:

Getservermap
Purpose Returns information about the servers known to a connection broker.
Syntax
dmAPIGet("getservermap,session,repository_name[,protocol][,host_name][,port_number]")
...
Return value
The Getservermap method returns a non-persistent ID for a server locator object.
...

Here, it’s dmAPIGet() that does it for the verb “getservermap”. It returns an empty string if the call failed (remapped to None to be more pythonic), a non-empty one with an ID if it succeeded.
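As an illustration of that remapping, here is a minimal sketch of a conversion step that treats both a NULL pointer (None on the python side) and an empty string as a failure. The helper name to_text() is hypothetical, and handling the empty string this way is an assumption of the sketch; the dmAPIGet() wrapper in the module listed further down only tests for None.

```python
def to_text(raw):
    """Map the raw result of a char*-returning call to text, or to None on failure."""
    if not raw:
        # covers both None (NULL pointer) and b"" (empty string)
        return None
    return raw.decode()

print(to_text(None))
print(to_text(b""))
print(to_text(b"0900c35080008107"))
```

The trade-off is that a legitimately empty attribute value would then also be reported as None, so the caller has to know which of the two meanings applies.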

For more usage comfort, a few functions have been added in the interface:

def connect(docbase, user_name, password):
   """
   connects to given docbase as user_name/password;
   returns a session id if OK, None otherwise
   """
def execute(session, dql_stmt):
   """
   execute non-SELECT DQL statements;
   returns TRUE if OK, False otherwise;
   """
def select(session, dql_stmt, attribute_names):
   """
   execute the DQL SELECT statement passed in dql_stmt and outputs the result to stdout;
   attributes_names is a list of attributes to extract from the result set;
   return True if OK, False otherwise;
   """
def disconnect(session):
   """
   closes the given session;
   returns True if no error, False otherwise;
   """

Basically, they only wrap some error handling code around the calls to dmAPIGet()/dmAPIExec(). execute() and select() are just examples of how to use the interface and could be removed from it. Let’s take a look at the latter, for instance:

def select(session, dql_stmt, attribute_names):
   """
   execute the DQL SELECT statement passed in dql_stmt and outputs the result to stdout;
   attributes_names is a list of attributes to extract from the result set;
   return True if OK, False otherwise;
   """
   show("in select(), dql_stmt=" + dql_stmt)
   try:
      query_id = dmAPIGet("query," + session + "," + dql_stmt)
      if query_id is None:
         raise(getOutOfHere)

      s = ""
      for attr in attribute_names:
         s += "[" + attr + "]\t"
      print(s)
      resp_cntr = 0
      while dmAPIExec("next," + session + "," + query_id):
         s = ""
         for attr in attribute_names:
            value = dmAPIGet("get," + session + "," + query_id + "," + attr)
            if "r_object_id" == attr and value is None:
               raise(getOutOfHere)
            s += "[" + (value if value else "None") + "]\t"
         resp_cntr += 1
         show(str(resp_cntr) + ": " + s)
      show(str(resp_cntr) + " rows iterated")

      err_flag = dmAPIExec("close," + session + "," + query_id)
      if not err_flag:
         raise(getOutOfHere)

      status = True
   except getOutOfHere:
      show(dmAPIGet("getmessage," + session).rstrip())
      status = False
   except Exception as e:
      print("Exception in select():")
      print(e)
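      traceback.print_exc()
      # debugging aid; some of these variables may still be unset if the failure happened early;
      for name in ("resp_cntr", "attr", "s", "value"):
         print(name + "=[" + str(locals().get(name)) + "]")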
      status = False
   finally:
      show("exiting select()")
      return status

If it weren’t for the error handling, it would really look like dmawk code fresh from the API manual!
And here are two invocations:

   print("")
   stmt = "select r_object_id, object_name, owner_name, acl_domain, acl_name from dm_document"
   status = DctmAPI.select(session, stmt, ("r_object_id", "object_name", "owner_name", "acl_domain", "acl_name"))
   if status:
      print("select [" + stmt + "] was successful")
   else:
      print("select [" + stmt + "] was not successful")

   print("")
   stmt = "select count(*) from dm_document"
   status = DctmAPI.select(session, stmt,  ["count(*)"])
   if status:
      print("select [" + stmt + "] was successful")
   else:
      print("select [" + stmt + "] was not successful")

Resulting in the following output:

in select(), dql_stmt=select r_object_id, object_name, owner_name, acl_domain, acl_name from dm_document
[r_object_id] [object_name] [owner_name] [acl_domain] [acl_name]
1: [0900c350800001d0] [Default Signature Page Template] [dmadmin] [dmadmin] [dm_4500c35080000101]
2: [6700c35080000100] [CSEC Plugin] [dmadmin] [dmadmin] [dm_4500c35080000101]
3: [6700c35080000101] [Snaplock Connector] [dmadmin] [dmadmin] [dm_4500c35080000101]
4: [0900c350800001ff] [Blank Word 2007 / 2010 Document] [dmadmin] [dmadmin] [dm_4500c35080000101]
5: [0900c35080000200] [Blank Word 2007 / 2010 Template] [dmadmin] [dmadmin] [dm_4500c35080000101]
6: [0900c35080000201] [Blank Word 2007 / 2010 Macro-enabled Document] [dmadmin] [dmadmin] [dm_4500c35080000101]
7: [0900c35080000202] [Blank Word 2007 / 2010 Macro-enabled Template] [dmadmin] [dmadmin] [dm_4500c35080000101]
8: [0900c35080000203] [Blank Excel 2007 / 2010 Workbook] [dmadmin] [dmadmin] [dm_4500c35080000101]
9: [0900c35080000204] [Blank Excel 2007 / 2010 Template] [dmadmin] [dmadmin] [dm_4500c35080000101]
10: [0900c350800001da] [11/21/2017 16:31:10 dm_PostUpgradeAction] [dmadmin] [dmadmin] [dm_4500c35080000101]
11: [0900c35080000205] [Blank Excel 2007 / 2010 Macro-enabled Workbook] [dmadmin] [dmadmin] [dm_4500c35080000101]
12: [0900c35080000206] [Blank Excel 2007 / 2010 Macro-enabled Template] [dmadmin] [dmadmin] [dm_4500c35080000101]
13: [0900c35080000207] [Blank Excel 2007 / 2010 Binary Workbook] [dmadmin] [dmadmin] [dm_4500c35080000101]
14: [0900c35080000208] [Blank PowerPoint 2007 / 2010 Presentation] [dmadmin] [dmadmin] [dm_4500c35080000101]
15: [0900c35080000209] [Blank PowerPoint 2007 / 2010 Slide Show] [dmadmin] [dmadmin] [dm_4500c35080000101]
...
880 rows iterated
exiting select()
select [select r_object_id, object_name, owner_name, acl_domain, acl_name from dm_document] was successful
in select(), dql_stmt=select count(*) from dm_document
[count(*)]
1: [880]
1 rows iterated
exiting select()
select [select count(*) from dm_document] was successful

Admittedly, the above select() function could be smarter and find the queried attributes by itself by inspecting the returned collection; this is done by the variant select2dict() (see below). Also, the output could be more structured. Stay tuned on this channel, it’s coming up!

The packaging

In order to make the interface easily usable, it has been packaged into a module named DctmAPI. To use it, just add an “import DctmAPI” statement in the client script and prefix the functions from the module with “DctmAPI”, the module namespace, when calling them, as shown in the example above.
I gave some thought to making a class out of it, but the benefits were not obvious: the functions are so generic that most of them would have been @staticmethod of the class anyway. Moreover, the only state variable would have been the session id, so instead of carrying the session id around, one would carry an instance of the class around; no real improvement here. Even worse, as the session id would be hidden in the instance, the statements passed to an instance would have to omit it and leave that to the instance, which would break the habit of using the standard API argument format; also, as a few API verbs don’t need a session id, exceptions to the rule would have to be introduced, which would clutter the class even more. Therefore, I chose to stick as closely as possible to the syntax documented in the API manual, at the only cost of introducing a namespace with the module.
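Short of a full class, one possible middle ground is a context manager that only guarantees the disconnection while leaving the session id visible, so that statements keep the standard API format. The sketch below is one way this could look, not something the module provides: open_session() is a hypothetical name, and the connect/disconnect functions are passed in as parameters so that the example can run with stand-ins instead of a real repository.

```python
from contextlib import contextmanager

@contextmanager
def open_session(connect, disconnect, docbase, user_name, password):
    """Yield a session id and make sure the session is closed afterwards."""
    session = connect(docbase, user_name, password)
    if session is None:
        raise RuntimeError("could not connect to docbase " + docbase)
    try:
        yield session
    finally:
        disconnect(session)

# Stand-ins for the real connect()/disconnect(), for demonstration only.
calls = []

def fake_connect(docbase, user_name, password):
    calls.append("connect")
    return "s0"   # placeholder session id

def fake_disconnect(session):
    calls.append("disconnect " + session)
    return True

with open_session(fake_connect, fake_disconnect, "dmtest", "dmadmin", "dmadmin") as session:
    calls.append("work with " + session)

print(calls)
```

With the real module, DctmAPI.connect and DctmAPI.disconnect would be passed in place of the two stand-ins, and the body of the with block would use the session id exactly as in the rest of this article.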

The source

Without further ado, here is the full interface module DctmAPI.py:

"""
This module is a python - Documentum binding based on ctypes;
requires libdmcl40.so/libdmcl.so to be reachable through LD_LIBRARY_PATH;
C. Cervini - dbi-services.com

The binding works as-is for both python2 and python3; no recompilation required; that's the good thing with ctypes compared to e.g. distutils/SWIG;
Under a 32-bit O/S, it must use the libdmcl40.so, whereas under a 64-bit Linux it must use the java backed one, libdmcl.so;

For compatibility with python3 (where strings are now unicode ones and no longer arrays of bytes), ctypes string parameters are always converted to bytes, either by prefixing them
with a b if literal or by invoking their encode('ascii', 'ignore') method; to get back to text from bytes, b.decode() is used; these work in python2 as well as in python3 so the source is compatible with these two versions of the language;
"""

import os
import ctypes
import sys, traceback

# use foreign C library;
# use this library under a 32-bit O/S;
dmlib = 'libdmcl40.so'
# use this library in Content server >= v6.x, 64-bit Linux;
#dmlib = 'libdmcl.so'

dm = 0
logLevel = 1

class getOutOfHere(Exception):
   pass

def show(mesg):
   "displays the message mesg if allowed"
   if logLevel > 0:
      print(mesg)

def dmInit():
   """
   initializes the Documentum part;
   returns True if successful, False otherwise;
   """

   show("in dmInit()")
   global dm

   try:
      dm = ctypes.cdll.LoadLibrary(dmlib);  dm.restype = ctypes.c_char_p
      show("dm=" + str(dm) + " after loading library " + dmlib)
      dm.dmAPIInit.restype    = ctypes.c_int;
      dm.dmAPIDeInit.restype  = ctypes.c_int;
      dm.dmAPIGet.restype     = ctypes.c_char_p;      dm.dmAPIGet.argtypes  = [ctypes.c_char_p]
      dm.dmAPISet.restype     = ctypes.c_int;         dm.dmAPISet.argtypes  = [ctypes.c_char_p, ctypes.c_char_p]
      dm.dmAPIExec.restype    = ctypes.c_int;         dm.dmAPIExec.argtypes = [ctypes.c_char_p]
      status  = dm.dmAPIInit()
   except Exception as e:
      print("exception in dmInit(): ")
      print(e)
      traceback.print_exc()
      status = False
   finally:
      show("exiting dmInit()")
      return True if 0 != status else False
   
def dmAPIDeInit():
   """
   releases the memory structures in documentum's library;
   returns True if no error, False otherwise;
   """
   status = dm.dmAPIDeInit()
   return True if 0 != status else False
   
def dmAPIGet(s):
   """
   passes the string s to dmAPIGet() method;
   returns a non-empty string if OK, None otherwise;
   """
   value = dm.dmAPIGet(s.encode('ascii', 'ignore'))
   return value.decode() if value is not None else None

def dmAPISet(s, value):
   """
   passes the string s to dmAPISet() method;
   returns TRUE if OK, False otherwise;
   """
   status = dm.dmAPISet(s.encode('ascii', 'ignore'), value.encode('ascii', 'ignore'))
   return True if 0 != status else False

def dmAPIExec(stmt):
   """
   passes the string stmt to dmAPIExec() method;
   returns TRUE if OK, False otherwise;
   """
   status = dm.dmAPIExec(stmt.encode('ascii', 'ignore'))
   return True if 0 != status else False

def connect(docbase, user_name, password):
   """
   connects to given docbase as user_name/password;
   returns a session id if OK, None otherwise
   """
   show("in connect(), docbase = " + docbase + ", user_name = " + user_name + ", password = " + password) 
   try:
      session = dmAPIGet("connect," + docbase + "," + user_name + "," + password)
      if session is None or not session:
         raise(getOutOfHere)
      else:
         show("successful session " + session)
         show(dmAPIGet("getmessage," + session).rstrip())
   except getOutOfHere:
      print("unsuccessful connection to docbase " + docbase + " as user " + user_name)
      session = None
   except Exception as e:
      print("Exception in connect():")
      print(e)
      traceback.print_exc()
      session = None
   finally:
      show("exiting connect()")
      return session

def execute(session, dql_stmt):
   """
   execute non-SELECT DQL statements;
   returns TRUE if OK, False otherwise;
   """
   show("in execute(), dql_stmt=" + dql_stmt)
   try:
      query_id = dmAPIGet("query," + session + "," + dql_stmt)
      if query_id is None:
         raise(getOutOfHere)
      err_flag = dmAPIExec("close," + session + "," + query_id)
      if not err_flag:
         raise(getOutOfHere)
      status = True
   except getOutOfHere:
      show(dmAPIGet("getmessage," + session).rstrip())
      status = False
   except Exception as e:
      print("Exception in execute():")
      print(e)
      traceback.print_exc()
      status = False
   finally:
      show(dmAPIGet("getmessage," + session).rstrip())
      show("exiting execute()")
      return status

def select(session, dql_stmt, attribute_names):
   """
   execute the DQL SELECT statement passed in dql_stmt and outputs the result to stdout;
   attributes_names is a list of attributes to extract from the result set;
   return True if OK, False otherwise;
   """
   show("in select(), dql_stmt=" + dql_stmt)
   try:
      query_id = dmAPIGet("query," + session + "," + dql_stmt)
      if query_id is None:
         raise(getOutOfHere)

      s = ""
      for attr in attribute_names:
         s += "[" + attr + "]\t"
      print(s)
      resp_cntr = 0
      while dmAPIExec("next," + session + "," + query_id):
         s = ""
         for attr in attribute_names:
            value = dmAPIGet("get," + session + "," + query_id + "," + attr)
            if "r_object_id" == attr and value is None:
               raise(getOutOfHere)
            s += "[" + (value if value else "None") + "]\t"
         resp_cntr += 1
         show(str(resp_cntr) + ": " + s)
      show(str(resp_cntr) + " rows iterated")

      err_flag = dmAPIExec("close," + session + "," + query_id)
      if not err_flag:
         raise(getOutOfHere)

      status = True
   except getOutOfHere:
      show(dmAPIGet("getmessage," + session).rstrip())
      status = False
   except Exception as e:
      print("Exception in select():")
      print(e)
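      traceback.print_exc()
      # debugging aid; some of these variables may still be unset if the failure happened early;
      for name in ("resp_cntr", "attr", "s", "value"):
         print(name + "=[" + str(locals().get(name)) + "]")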
      status = False
   finally:
      show("exiting select()")
      return status

def select2dict(session, dql_stmt, result):
   """
   execute the DQL SELECT statement passed in dql_stmt and stores the result into result, an array of dictionaries;
   return True if OK, False otherwise;
   """
   show("in select2dict(), dql_stmt=" + dql_stmt)

   status = False
   try:
      query_id = dmAPIGet("query," + session + "," + dql_stmt)
      if query_id is None:
         raise(getOutOfHere)

      # iterate through the result set;
      row_counter = 0
      attr_name = []
      width = {}
      while dmAPIExec("next," + session + "," + query_id):
         result.append({})
         nb_attrs = dmAPIGet("count," + session + "," + query_id)
         if nb_attrs is None:
            show("Error retrieving the count of returned attributes: " + dmAPIGet("getmessage," + session))
            raise(getOutOfHere)
         nb_attrs = int(nb_attrs) 
         for i in range(nb_attrs):
            if 0 == row_counter:
               # get the attributes' names only once for the whole query;
               value = dmAPIGet("get," + session + "," + query_id + ",_names[" + str(i) + "]")
               if value is None:
                  show("error while getting the attribute name at position " + str(i) + ": " + dmAPIGet("getmessage," + session))
                  raise(getOutOfHere)
               attr_name.append(value)
               if value in width:
                  width[value] = max(width[attr_name[i]], len(value))
               else:
                  width[value] = len(value)

            is_repeating = dmAPIGet("repeating," + session + "," + query_id + "," + attr_name[i])
            if is_repeating is None:
               show("error while getting the arity of attribute " + attr_name[i] + ": " + dmAPIGet("getmessage," + session))
               raise(getOutOfHere)
            is_repeating = int(is_repeating)

            if 1 == is_repeating:
               # multi-valued attributes;
               result[row_counter] [attr_name[i]] = []
               count = dmAPIGet("values," + session + "," + query_id + "," + attr_name[i])
               if count is None:
                  show("error while getting the number of values of attribute " + attr_name[i] + ": " + dmAPIGet("getmessage," + session))
                  raise(getOutOfHere)
               count = int(count)

               for j in range(count):
                  value = dmAPIGet("get," + session + "," + query_id + "," + attr_name[i] + "[" + str(j) + "]")
                  if value is None:
                     value = "null"
                  result[row_counter] [attr_name[i]].append(value)
            else:
               # mono-valued attributes;
               value = dmAPIGet("get," + session + "," + query_id + "," + attr_name[i])
               if value is None:
                  value = "null"
               width[attr_name[i]] = len(attr_name[i])
               result[row_counter][attr_name[i]] = value
         row_counter += 1
      err_flag = dmAPIExec("close," + session + "," + query_id)
      if not err_flag:
         show("Error closing the query collection: " + dmAPIGet("getmessage," + session))
         raise(getOutOfHere)

      status = True

   except getOutOfHere:
      show(dmAPIGet("getmessage," + session).rstrip())
      status = False
   except Exception as e:
      print("Exception in select2dict():")
      print(e)
      traceback.print_exc()
      status = False
   finally:
      return status

def disconnect(session):
   """
   closes the given session;
   returns True if no error, False otherwise;
   """
   show("in disconnect()")
   try:
      status = dmAPIExec("disconnect," + session)
   except Exception as e:
      print("Exception in disconnect():")
      print(e)
      traceback.print_exc()
      status = False
   finally:
      show("exiting disconnect()")
      return status

# initializes the interface;
dmInit()

The select2dict() function is an enhancement of select(). It dynamically gets the list of SELECTed attributes, so there is no need to provide one. In addition, instead of printing the result set, it stores it in an array of dictionaries passed in by the caller, which is handy if some further processing is required later.
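As an example of such further processing, here is a minimal sketch that turns an array of dictionaries shaped like select2dict()'s result into aligned text columns. The function name format_rows() is hypothetical; the sample rows are written by hand with values copied from the output shown earlier, so no repository is needed to run it.

```python
def format_rows(rows, names):
    """Return rows (a list of dictionaries) as aligned text lines, header first."""
    def cell(value):
        # repeating attributes are stored as lists by select2dict()
        return ",".join(value) if isinstance(value, list) else str(value)

    # each column is as wide as its longest value, header included
    widths = {}
    for n in names:
        widths[n] = max([len(n)] + [len(cell(row[n])) for row in rows])

    lines = ["  ".join(n.ljust(widths[n]) for n in names)]
    for row in rows:
        lines.append("  ".join(cell(row[n]).ljust(widths[n]) for n in names))
    return [line.rstrip() for line in lines]

# Hand-written sample rows in the shape produced by select2dict().
rows = [
    {"r_object_id": "6700c35080000100", "object_name": "CSEC Plugin"},
    {"r_object_id": "6700c35080000101", "object_name": "Snaplock Connector"},
]
for line in format_rows(rows, ["r_object_id", "object_name"]):
    print(line)
```

The list of column names is passed explicitly because dictionaries do not guarantee any ordering in older versions of python; with the real function, the rows would come from a call such as DctmAPI.select2dict(session, stmt, result) followed by format_rows(result, ...).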

The test script

Here is an example script showing how to use the interface:

#!/usr/bin/env python

"""
Test the ctypes-based python interface to Documentum API;
"""

import DctmAPI

# -----------------
# main;
if __name__ == "__main__":
   DctmAPI.logLevel = 1

   # not really needed as it is done in the module itself;
   status = DctmAPI.dmInit()
   if status:
      print("dmInit() was successful")
   else:
      print("dmInit() was not successful")

   print("")
   session = DctmAPI.connect(docbase = "dmtest", user_name = "dmadmin", password = "dmadmin")
   if session is None:
      print("no session opened, exiting ...")
      exit(1)
   
   print("")
   dump = DctmAPI.dmAPIGet("dump," + session + "," + "0900c35080008107")
   print("object 0900c35080008107 dumped:\n" + dump)
   
   print("")
   stmt = "update dm_document object set language_code = 'FR' where r_object_id = '0900c35080008107'"
   status = DctmAPI.execute(session, stmt)
   if status:
      print("execute [" + stmt + "] was successful")
   else:
      print("execute [" + stmt + "] was not successful")

   print("")
   stmt = "select r_object_id, object_name, owner_name, acl_domain, acl_name from dm_document"
   status = DctmAPI.select(session, stmt, ("r_object_id", "object_name", "owner_name", "acl_domain", "acl_name"))
   if status:
      print("select [" + stmt + "] was successful")
   else:
      print("select [" + stmt + "] was not successful")

   print("")
   stmt = "select count(*) from dm_document"
   status = DctmAPI.select(session, stmt,  ["count(*)"])
   if status:
      print("select [" + stmt + "] was successful")
   else:
      print("select [" + stmt + "] was not successful")

   print("")
   status = DctmAPI.disconnect(session)
   if status:
      print("successfully disconnected")
   else:
      print("error while disconnecting")

   print("")
   status = DctmAPI.dmAPIDeInit()
   if status:
      print("successfully deInited")
   else:
      print("error during deInit")

I’m not a day-to-day user of python so I guess there are ways to make the interface more idiomatic, or pythonic as they say. Feel free to adapt it to your tastes and needs. Comments and suggestions are welcome of course.
