Infrastructure at your Service

Cesare Cervini

An awk filter to truncate or wrap around tabular output

In my previous blog “idql and its column output”, see link here, I provided a small awk filter to reflow the idql output’s columns by wrapping their content around. Later I came of thinking that it could be useful to be able to truncate the columns instead, in order to have an even compacter output. A shorter, denser table can certainly help is some cases, such as when a quick overview of a query’s result is enough to get an idea of its correctness or its performance.
Of course, in case of truncation, a user-definable trailing string, defaulting to the ellipsis, would be appended to show that a truncation has been applied; this is to avoid to present incomplete and therefore incorrect data to an unsuspecting or distracted reader.
So, here is the script:

# Usage:
#    gawk [-v maxw=nn] [-v truncate=0|1] [-v ellipsis=...] [-v wrap=1|0] -f compact_wwa.awk file
# or:
#    cmd | gawk [-v maxw=nn] [-v truncate=0|1] [-v ellipsis=...] [-v wrap=1|0] -f compact_wwa.awk
# where:
#     missing numeric parameters assume the value 0;
#     maxw is the maximum column width and defaults to 32;
#         characters outside this limit are either wrapped around in their own column or truncated, depending on the parameters truncate and wrap;
#     wrapping is the default and has priority over truncating;
#     truncate=0 wrap=0 --> wrapping around;
#     truncate=0 wrap=1 --> wrapping around;
#     truncate=1 wrap=0 --> truncate;
#     truncate=1 wrap=1 --> wrapping around;
#     truncating is therefore only done when explicitly and unambiguously requested; this is to preserve data integrity whenever possible;
#     ellipsis string is only considered when truncating and defaults to '...'; thus, there is always one to warn of truncation;
# example:
#     gawk -v maxw=50 -f compact_wwa.awk tmp_file  | less -S
#     C. Cervini, dbi-services;
BEGIN {
   if (!maxw) maxw = 32
   if (!truncate)
      if (!wrap)
         wrap = 1
      else;
   else if (!wrap);
   else
      truncate = 0
   if (!ellipsis)
      ellipsis = "..."
   while (getline && !match($0, /^([0-9]+> )+/));
   header = substr($0, RLENGTH + 1)
   getline
   nbFields = NF
   fs[0] = 0; fw[0] = -1 # just so that fs[1] = 1, see below;
   headerLine = ""; sepLine = ""
   for (i = 1; i <= NF; i++) {
      fs[i] = fs[i - 1] + fw[i - 1] + 2
      fw[i] = length($i)
      Min = min(fw[i], maxw)
      if (1 == truncate)
         Min = max(Min, length(ellipsis))
      sepLine = sepLine sprintf("%s  ", substr($0, fs[i], Min))
   }
   printWithWA(header)
   printf("%s\n", sepLine)
}
{
   if (match($0, /^\([0-9]+ rows? affected\)/)) {
      print
      exit
   }
   printWithWA($0)
}
function printWithWA(S) {
   do {
      left_over = ""
      for (i = 1; i <= nbFields; i++) {
         Min = min(fw[i], maxw)
         if (1 == truncate)
            Min = max(Min, length(ellipsis))
         columnS = substr(S, fs[i], Min)
         if (1 == truncate) {
            restColumn = substr(S, fs[i] + Min, fw[i] - Min); gsub(/ +$/, "", restColumn)
            if (restColumn)
               columnS = substr(columnS, 1, length(columnS) - length(ellipsis)) ellipsis
         }
         printf("%s  ", columnS)
         restS = substr(S, fs[i] + Min, fw[i] - Min)
         if (length(restS) > 0) {
            left_over = left_over sprintf("%-*s  ", fw[i], restS)
         }
         else
            left_over = left_over sprintf("%*s  ", fw[i], "")
      }
      printf "\n"
      gsub(/ +$/, "", left_over)
      S = left_over
   } while (left_over && wrap)
}
function min(x, y) {
   return(x <= y ? x : y)
}
function max(x, y) {
   return(x >= y ? x : y)
}

Example of execution

idql dmtest -Udmadmin -Pdmadmin -w500 <<EoQ | gawk -v maxw=25 -v truncate=1 -v ellipsis="(...)" -f compact_wwa.awk | less -S
select r_object_id, user_name, user_os_name, user_address, user_group_name, user_privileges, owner_def_permit, world_def_permit, group_def_permit, default_folder, user_db_name, description,
acl_domain, acl_name, user_os_domain, home_docbase, user_state, client_capability, globally_managed, user_delegation, workflow_disabled, alias_set_id, user_source, user_ldap_dn, user_xprivileges,
failed_auth_attempt, user_admin, user_global_unique_id, user_login_name, user_login_domain, user_initials, USER_PASSWORD, user_web_page, first_failed_auth_utc_time, last_login_utc_time,
deactivated_utc_time, deactivated_ip_addr, root_log_dir
from
dm_user
go
exit
EoQ

Result:
blog10
blog11
Here the ellipsis string has been set to “(…)”.
One could wonder why there are 2 apparently contradictory options to specify the wrapping, truncate and wrap. Wouldn’t one be enough since truncate = non wrap ? And it would also remove the 7 lines of validation and prioritization at the beginning. One of the reason is comfort as it is often easier to think in positive rather than in negative logic. The other is that both options are not exactly opposite; the opposite of truncate can be “do nothing” as well as “wrap around”, and vice-versa. It is better to be explicit rather than implicit so to avoid any ambiguity.
What if a do-nothing filter is needed while still calling compact_wwa.awk (have I said that wwa stands for “with wrap-around” ?), e.g. from another script that passes the command-line options to it ? Just specify some large value for maxw and that’s it. As its name suggests it, compact_wwa.awk only attempts to shrink columns if larger than maxw; narrower columns are not modified.

Possible emprovements

There are still lots of ways to enhance this script. For instance, one of them is to provide a maxw per column. Here, the parameter maxw is applied to all the columns but it would be nice to compress (whether by truncating or by wrapping around) only the uninteresting columns so to have more space for the relevant ones.
Another way would be to make the script work both ways, enlarging columns as well as shrinking them, which could sometimes simplify their parsing since they’ll all have the same width. Maybe the next time…

Wrapping up

I think this script can prove useful to those who need to work with idql and have to deal with its limits. Countless times in the past, I had to open a DQL extraction’s output in UltraEdit to manually shrink all those columns. Too bad that I didn’t have this little utility before, it could have spared me tons of time doing all this tedious work.

 

Leave a Reply

Cesare Cervini
Cesare Cervini