Infrastructure at your Service

Morgan Patou

Documentum – DFC traces setup & investigation

When working with Documentum, you will most probably have to enable the DFC traces one day or another and then work with these traces to analyze them. The purpose of this blog is simply to show how the DFC traces can be enabled, which tools can be used to quickly process them and what are the limitations of such things.

Enabling the DFC traces can be done very easily by updating the dfc.properties file of the client. This client can be a DA, D2, JMS, Index Agent, aso… The change is applied directly (if enabled=true) and disabled by default (if commented or enable=false). If you have a dfc.properties that is inside a war file (for DA/D2 for example) and that you deployed your application as a war file (not exploded), then disabling the tracing might need a restart of your application. To avoid that, you can have a dfc.properties inside the war file that just point to another one outside of the war file and then enabling/disabling the traces from this second file will work properly. There are a lot of options to customize how the traces should be generated. A first example with only a few properties that you can use and reuse every time you need traces:

dfc.tracing.enable=true
dfc.tracing.verbose=true
dfc.tracing.max_stack_depth=0
dfc.tracing.mode=compact
dfc.tracing.dir=/tmp/dfc_tracing

 

Another example with more properties to really specify what you want to see:

dfc.tracing.enable=true
dfc.tracing.verbose=true
dfc.tracing.max_stack_depth=4
dfc.tracing.include_rpcs=true
dfc.tracing.mode=standard
dfc.tracing.include_session_id=true
dfc.tracing.user_name_filter[0]=dmadmin
dfc.tracing.user_name_filter[1]=myuser
dfc.tracing.thread_name_filter[0]=Thread-3
dfc.tracing.thread_name_filter[1]=Thread-25
dfc.tracing.timing_style=milliseconds_from_start
dfc.tracing.dir=/tmp/dfc_tracing
dfc.tracing.file_prefix=mydfc
dfc.tracing.max_backup_index=10
dfc.tracing.max_file_size=104857600
...

 

All these properties are quite easy to understand even without explanation but you can probably find more information and all the possible options in the official Documentum documentation. It’s not the main purpose of this blog so I’m just mentioning a few properties to get started. By default, the name of the generated files will be something like “dfctrace.timestamp.log”, you can change that by setting the “dfc.tracing.file_prefix” for example. Adding and customizing the properties will change the display format and style inside the files so if you want to have a way to analyze these DFC traces, it is better to use more or less always the same set of options. For the example below, OTX asked me to use these properties only:

dfc.tracing.enable=true
dfc.tracing.verbose=true
dfc.tracing.max_stack_depth=4
dfc.tracing.include_rpcs=true
dfc.tracing.mode=compact
dfc.tracing.include_session_id=true
dfc.tracing.dir=/tmp/dfc_tracing

 

When you have your DFC traces, you need a way to analyze them. They are pretty much readable but it will be complicated to get something out of it without spending a certain amount of time – unless you already know what you are looking for – simply because there are a lot of information inside… For that purpose, Ed Bueche developed more than 10 years ago some AWK scripts to parse the DFC traces files: traceD6.awk and trace_rpc_histD6.awk. You can find these scripts at the following locations (all EMC links… So might not be working at some point in the future):

As you can see above, it is not really maintained and the same scripts or a mix of several versions can be found at several locations so it can be a little bit confusing. All the old links are about the awk scripts but since 2013, there is now a python script too (also developed by Ed Bueche).

In this blog, I wanted to talk about the AWK scripts mainly. Earlier this month, I was working with OTX on some performance tuning tasks and for that, I gathered the DFC traces for several scenarios, in different log files, well separated, aso… Then, I provided them to OTX for the analysis. OTX came back to me a few minutes later saying that most of the traces were corrupted and asking me to regenerate them. Since it takes time and because there were some testing in progress on this environment, then gathering clean DFC traces for several scenarios would have forced the tests to be stopped, aso… So if possible, I wanted to avoid doing that. (Ok ok you got me, I’m just lazy ;))

The content of the DFC traces looked correct to me and after a quick verification, I saw that OTX was trying to use the AWK scripts (traceD6.awk and trace_rpc_histD6.awk) to analyze the logs but they were apparently getting an error. The files didn’t look corrupted to me so I mentioned to OTX that the issue might very well be with the AWK scripts they were using. They weren’t really convinced about that so I decided to take a look at these scripts and try to find it out. I already used these scripts but never really looked inside so it was the perfect opportunity to take some time for that:

[dmadmin@content_server_01 ~]$ cd /tmp/dfc_tracing/
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ ls -l trace* dfctrace.*.log
-rw-r-----. 1 dmadmin dmadmin 92661060 Nov 3 09:24 dfctrace.1510220481672.log
-rw-r-----. 1 dmadmin dmadmin 3240 Nov 4 14:10 traceD6.awk
-rw-r-----. 1 dmadmin dmadmin 7379 Nov 4 14:10 traceD6.py
-rw-r-----. 1 dmadmin dmadmin 5191 Nov 4 14:10 trace_rpc_histD6.awk
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ awk -f traceD6.awk < dfctrace.1510220481672.log > output_awk_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -l output_awk_1.log
2 output_awk_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ awk -f trace_rpc_histD6.awk < dfctrace.1510220481672.log > output_awk_2.log
awk: trace_rpc_histD6.awk:203: (FILENAME=- FNR=428309) fatal: division by zero attempted
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -l output_awk_2.log
4 output_awk_2.log
[dmadmin@content_server_01 dfc_tracing]$

 

As you can see above, the first script generated a log file that contains only 2 lines, so this is already suspicious even if there are no errors. The second script generated an error and its log file contains only 4 lines… The input DFC trace file has a size of 90Mb so it’s clear that there is something wrong and that’s why OTX said that the DFC traces were corrupted. The error message shows the line (203) as the origin of the issue as well as a “division by zero attempted” message. This obviously means that somewhere on this line, there is a division and that the divisor is equal to 0 or at least not set at all. Since I love all kind of UNIX scripting, I would rather fix the bug in the script than having to generate a new set of DFC traces (and the new set would still be impacted by the issue anyway…)! So checking inside the trace_rpc_histD6.awk file, the line 203 is the following one:

[dmadmin@content_server_01 dfc_tracing]$ grep -n -C1 "TIME SPENT" trace_rpc_histD6.awk
202-    printf ("DURATION (secs):\t%17.3f\n", ((curr_tms - st_tms)) );
203:    printf ("TIME SPENT EXECUTING RPCs (secs):%8.3f (which is %3.2f percent of total time)\n", total_rpc_time, 100*total_rpc_time/(curr_tms - st_tms));
204-    printf ("Threads :\t%25d\n", thread_cnt);
[dmadmin@content_server_01 dfc_tracing]$

 

The only division on this line is the total time taken to execute the RPCs divided by the duration of the log file (timestamp of last message – first message). So the value of “curr_tms – st_tms” is 0. Potentially, it could be that both variables have the exact same value but since the first and last messages on the DFC traces don’t have the same timestamp, this isn’t possible and therefore both variables are actually 0 or not set. To check where these variables are defined, how and in which function:

[dmadmin@content_server_01 dfc_tracing]$ grep -n -C15 -E "curr_tms|st_tms" trace_rpc_histD6.awk | grep -E "curr_tms|st_tms|^[0-9]*[:-][^[:space:]]"
144-/ .RPC:/ {
159:                    st_tms = $1;
162:            curr_tms = $1;
175-}
177-/obtained from pool/ {
--
187-}
188-/.INFO: Session/ {
193-}
197-END {
202:    printf ("DURATION (secs):\t%17.3f\n", ((curr_tms - st_tms)) );
203:    printf ("TIME SPENT EXECUTING RPCs (secs):%8.3f (which is %3.2f percent of total time)\n", total_rpc_time, 100*total_rpc_time/(curr_tms - st_tms));
[dmadmin@content_server_01 dfc_tracing]$

 

This shows that the only location where these two variables are set is inside the matching pattern “/ .RPC:/” (st_tms is set to $1 only on the first execution). So it means that this portion of code is never executed so in other words: this pattern is never found in the DFC trace file. Why is that? Well that’s pretty simple: the DFC traces file contains a lot of RPC calls but these lines never contain ” .RPC:”, there are always at least two dots (so something like that: ” ..RPC:” or ” …RPC:” or ” ….RPC:”). The reason why there are several dots is simply because the RPC are placed where they are called… In this case, OTX asked us to use “dfc.tracing.max_stack_depth=4″ so this is what I did and it is the reason why the AWK scripts cannot work by default because they need “dfc.tracing.max_stack_depth=0″, that’s written at the beginning of the scripts in the comment sections.

So a simple way to fix the AWK scripts is to remove the space at the beginning of the pattern for both the traceD6.awk and trace_rpc_histD6.awk scripts and after doing that, it will work for all max_stack_depth values:

[dmadmin@content_server_01 dfc_tracing]$ grep -n ".RPC:/" *.awk
traceD6.awk:145:/ .RPC:/ {
trace_rpc_histD6.awk:144:/ .RPC:/ {
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ sed -i 's,/ .RPC:/,/.RPC:/,' *.awk
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ grep -n ".RPC:/" *.awk
traceD6.awk:145:/.RPC:/ {
trace_rpc_histD6.awk:144:/.RPC:/ {
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ awk -f traceD6.awk < dfctrace.1510220481672.log > output_awk_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -lc output_awk_1.log
 1961 163788 output_awk_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ awk -f trace_rpc_histD6.awk < dfctrace.1510220481672.log > output_awk_2.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -l output_awk_2.log
 367 49050 output_awk_2.log
[dmadmin@content_server_01 dfc_tracing]$

 

That looks much much better… Basically, the first script list all RPCs with their thread, name and times while the second script creates a sorted list of queries that took the most time to execute as well as a list of calls and occurrences per types/names.

The AWK and Python scripts, even if they are globally working, might have some issues with commas, parenthesis and stuff like that (again it depends which dfc.tracing options you selected). This is why I mentioned above that there is actually both a AWK and Python version of these scripts. Sometimes, the AWK scripts will contain the right information, sometimes it is the Python version that will but in all cases, the later will run much faster. So if you want to work with these scripts, you will have to juggle a little bit:

[dmadmin@content_server_01 dfc_tracing]$ python traceD6.py dfctrace.1510220481672.log > output_py_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -l output_py_1.log
 1959 194011 output_py_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ python traceD6.py dfctrace.1510220481672.log -profile > output_py_2.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -l output_py_2.log
 342 65917 output_py_2.log
[dmadmin@content_server_01 dfc_tracing]$

 

As you can see, there are fewer lines in the python output files but that’s because some unnecessary headers have been removed in the python version so it’s actually normal. However there are much more characters so it shows that, in this case, the extracted DQL queries contain more characters but it does not mean that these characters are actually part of the DQL queries: you will see below that there are references to “, FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)” => This is NOT part of the DQL but it is present on the output of the Python script while it is not for the AWK one:

[dmadmin@content_server_01 dfc_tracing]$ head -15 output_awk_1.log
analysis program version 2 based on DFC build 6.0.0.76
68354.130 & 0.005 & [http--0.0.0.0-9082-3] & EXEC_QUERY  select r_object_id from dm_sysobject where folder ('/Home') and object_name = 'Morgan Patou'
68354.135 & 0.000 & [http--0.0.0.0-9082-3] & multiNext
68354.136 & 0.005 & [http--0.0.0.0-9082-3] & SysObjFullFetch  0b0f12345004f0de
68354.165 & 0.002 & [http--0.0.0.0-9082-4] & EXEC_QUERY  select r_object_id from dm_user where user_name = 'Morgan Patou'
68354.167 & 0.000 & [http--0.0.0.0-9082-4] & multiNext
68354.167 & 0.002 & [http--0.0.0.0-9082-4] & IsCurrent
68354.170 & 0.003 & [http--0.0.0.0-9082-4] & EXEC_QUERY  SELECT COUNT(*) AS items FROM dm_group WHERE group_name = 'report_user' AND ANY i_all_users_names = 'Morgan Patou'
68354.173 & 0.001 & [http--0.0.0.0-9082-4] & multiNext
68354.175 & 0.003 & [http--0.0.0.0-9082-4] & EXEC_QUERY  select r_object_id from dm_sysobject where folder ('/myInsight') and object_name = 'myInsight.license'
68354.178 & 0.001 & [http--0.0.0.0-9082-4] & multiNext
68354.179 & 0.001 & [http--0.0.0.0-9082-4] & IsCurrent
68354.165 & 0.010 & [http--0.0.0.0-9082-3] & SysObjGetPermit
68354.175 & 0.006 & [http--0.0.0.0-9082-3] & SysObjGetXPermit
68354.181 & 0.006 & [http--0.0.0.0-9082-4] & MAKE_PULLER
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ head -15 output_py_1.log
68354.130 & 0.005 & [http--0.0.0.0-9082-3] & EXEC_QUERY & select r_object_id from dm_sysobject where folder ('/Home') and object_name = 'Morgan Patou', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)
68354.135 & 0.000 & [http--0.0.0.0-9082-3] & multiNext &
68354.136 & 0.005 & [http--0.0.0.0-9082-3] & SysObjFullFetch & 0b0f12345004f0de
68354.165 & 0.002 & [http--0.0.0.0-9082-4] & EXEC_QUERY & select r_object_id from dm_user where user_name = 'Morgan Patou', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)
68354.167 & 0.000 & [http--0.0.0.0-9082-4] & multiNext &
68354.167 & 0.002 & [http--0.0.0.0-9082-4] & IsCurrent & 110f123450001d07
68354.170 & 0.003 & [http--0.0.0.0-9082-4] & EXEC_QUERY & SELECT COUNT(*) AS items FROM dm_group WHERE group_name = 'report_user' AND ANY i_all_users_names = 'Morgan Patou', FOR_UPDATE=T, BATCH_HINT=50, BOF_DQL=T, FLUSH_BATCH=-1]],50,true,true)
68354.173 & 0.001 & [http--0.0.0.0-9082-4] & multiNext &
68354.175 & 0.003 & [http--0.0.0.0-9082-4] & EXEC_QUERY & select r_object_id from dm_sysobject where folder ('/myInsight') and object_name = 'myInsight.license', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)
68354.178 & 0.001 & [http--0.0.0.0-9082-4] & multiNext &
68354.179 & 0.001 & [http--0.0.0.0-9082-4] & IsCurrent & 090f123450023f63
68354.165 & 0.010 & [http--0.0.0.0-9082-3] & SysObjGetPermit & 0b0f12345004f0de
68354.175 & 0.006 & [http--0.0.0.0-9082-3] & SysObjGetXPermit & 0b0f12345004f0de
68354.181 & 0.006 & [http--0.0.0.0-9082-4] & MAKE_PULLER & null
68354.187 & 0.000 & [http--0.0.0.0-9082-4] & getBlock &
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ head -35 output_py_2.log

****** PROFILE OF rpc CALLS *****
     3.273           0.080              41      AUTHENTICATE_USER
     0.032           0.002              17      BEGIN_TRANS
     0.001           0.000              14      END_PUSH_V2
     0.202           0.012              17      END_TRANS
    21.898           0.071             310      EXEC_QUERY
     0.028           0.005               6      FETCH_CONTENT
     0.011           0.000              55      GET_ERRORS
     0.117           0.004              27      GET_LOGIN
     0.290           0.002             163      IsCurrent
     0.013           0.000              82      KILL_PULLER
     0.003           0.000              14      KILL_PUSHER
     0.991           0.012              82      MAKE_PULLER
     0.005           0.000              14      MAKE_PUSHER
     0.002           0.000               5      NEXT_ID_LIST
     0.083           0.002              38      NORPC
     0.015           0.005               3      RelationCopy
     0.446           0.032              14      SAVE
     0.274           0.014              20      SAVE_CONT_ATTRS
     0.140           0.010              14      START_PUSH
     0.134           0.045               3      SysObjCheckin
     0.048           0.016               3      SysObjCheckout
     2.199           0.009             240      SysObjFullFetch
     0.913           0.006             141      SysObjGetPermit
     0.764           0.005             141      SysObjGetXPermit
     0.642           0.046              14      SysObjSave
     0.033           0.000              82      getBlock
     1.454           0.004             399      multiNext

**** QUERY RESPONSE SORTED IN DESCENDING ORDER ****

10.317  select distinct wf.object_name as workflow_name, pr.object_name as process_name, i.name as Performer_Name, i.task_name as Task_Name, i.date_sent as Date_Task_Sent, i.actual_start_date as Date_Task_Acquired, wf.r_creator_name as Workflow_Initiator, cd.primary_group as "group", cd.subgroup as subgroup, cd.artifact_name as Artifact_Name, cd.object_name as document_name, cd.r_version_label as version_label, cd.title as Document_Title, cd.r_object_id as object_id from cd_common_ref_model(all) cd, dmi_package p, dmi_queue_item i, dm_workflow wf, dm_process pr
0.607   select r_object_id from dm_sysobject where folder ('/myInsight/Presentations/Standard Presentations/Graphical Reports') and object_name = 'FusionInterface.xsl', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)
0.505   select r_object_id from dm_sysobject where folder ('/myInsight/Presentations/Life Sciences') and object_name = 'Unique Templates.xsl', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)
[dmadmin@content_server_01 dfc_tracing]$

 

To conclude this blog on a more philosophical note: think twice before doing the same thing over and over again because it might just be useless. ;)

 

 

Leave a Reply


4 − four =

Morgan Patou
Morgan Patou

Senior Consultant