The Documentum job dm_FTIndexAgentBoot no longer starts, and its job status reads: "The job object indicated the job was in progress". In this blog post, I describe how to analyze and solve this issue.

Analysis

The Index Agent does not start automatically when the repository is started, although start_index_agents = T is set in server.ini.
When the server starts, the dm_FTIndexAgentBoot job is normally executed, but not in our case. All other jobs start fine.

The job status indicates:
The job object indicated the job was in progress, but the job was not actually running.  It is likely that the dm_agent_exec utility was stopped while the job was in progress

To get more information, we have to start dm_agent_exec with the trace option enabled.

  1. Use DA to change the agent_exec_method method Verb parameter to:
    ./dm_agent_exec -trace_level 1
  2. Kill the dm_agent_exec process of the affected repository:
    > ps -ef | grep dm_agent | grep "docbase_name test"
    dmadmin     16922 16117  0 14:14 ?        00:00:02 ./dm_agent_exec -docbase_name test.test -docbase_owner dmadmin -sleep_duration 0
    > kill 16922
  3. The dm_agent_exec process is restarted automatically with the new trace level:
    > ps -ef | grep dm_agent | grep "docbase_name test"
    dmadmin     16486 16117  0 14:07 ?        00:00:02 ./dm_agent_exec -trace_level 1 -docbase_name test.test -docbase_owner dmadmin -sleep_duration 0
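
Step 2 can be sketched as a small shell helper. This is only a sketch: the function name agent_exec_pids is hypothetical, and it assumes the "ps -ef" column layout shown above, where the PID is the second field and the command is the eighth. The repository name is matched as a prefix, exactly like the grep above, so "test" also matches "test.test".

```shell
# Hypothetical helper, not part of Documentum.
# Reads `ps -ef` style lines on stdin and prints the PID of every
# dm_agent_exec process whose arguments contain "-docbase_name <name>".
# Assumes the PID is field 2 and the command is field 8, as in the output above.
agent_exec_pids() {
    awk -v pat="-docbase_name $1" \
        '$8 ~ /dm_agent_exec$/ && index($0, pat) > 0 { print $2 }'
}

# Example (prints the PIDs only; review them before passing them to kill):
# ps -ef | agent_exec_pids test
```

Because the filter looks at the command field instead of the whole line, the filtering process itself is not reported, which can happen with a plain grep.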

Once the trace_level is set to 1 (true), more information is written to the agent_exec.log file.
By analyzing the log file, we can see that the dm_agent_exec process checks which jobs have to be started by using the following queries:

execquery,s0,F,SELECT ALL   r_object_id, a_last_invocation,   a_last_completion, a_special_app FROM dm_job WHERE ( ((a_last_invocation IS NOT NULLDATE) AND    (a_last_completion IS NULLDATE))  OR  ((a_special_app = 'agentexec') AND (r_lock_machine = 'testcs')) ) AND (i_is_reference = 0 OR i_is_reference is NULL) AND (i_is_replica = 0 OR i_is_replica is NULL)
execquery,s0,F,SELECT ALL r_object_id, a_next_invocation FROM dm_job WHERE ( (run_now = 1) OR ((is_inactive = 0) AND ( ( a_next_invocation date > DATE('now')) OR (expiration_date IS NULLDATE)) AND ((max_iterations = 0) OR (a_iterations < max_iterations))) ) AND (i_is_reference = 0 OR i_is_reference is NULL) AND (i_is_replica = 0 OR i_is_replica is NULL) ORDER BY a_next_invocation, r_object_id

As these queries return no rows, the job should have been started. So what is wrong?
The log file also contains the following line: Output File Name: /pkgs/dms/opt/documentum/dba/log/00010b9a/agentexec/job_08010b9a800003f5
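
To pull such lines out of a long trace, a filter like the following may help. It is a minimal sketch: job_output_files is a hypothetical name, and it assumes that every relevant log line carries the label "Output File Name:" followed only by the path, as in the line quoted above.

```shell
# Hypothetical helper, not part of Documentum.
# Reads a log on stdin and prints the path that follows
# "Output File Name:" on every line that contains this label.
job_output_files() {
    sed -n 's/^.*Output File Name:[[:space:]]*//p'
}

# Example (the log file location depends on the installation):
# job_output_files < agent_exec.log | sort -u
```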

"ls -ltr /pkgs/dms/opt/documentum/dba/log/00010b9a/agentexec" does not show any job_08010b9a800003f5 log file.

But "ls -l job_08010b9a800003f5" returns:
---------- 1 dmadmin dmadmind 245 Aug 27 14:30 /pkgs/dms/opt/documentum/dba/log/00010b9a/agentexec/job_08010b9a800003f5

The file exists, but it has no permission bits set at all, so the agent_exec process cannot write to it!
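
The same check can be run over the whole directory to see whether other job output files are affected. This is a minimal sketch, assuming the job output files are named job_<id> as above; unwritable_job_logs is a hypothetical name. It tests the owner write bit instead of asking whether the current user can write, so the result is the same when it is run as root.

```shell
# Hypothetical helper, not part of Documentum.
# Lists the job output files below the given directory whose
# owner write bit (octal 200) is not set.
unwritable_job_logs() {
    find "$1" -type f -name 'job_*' ! -perm -200
}

# Example:
# unwritable_job_logs /pkgs/dms/opt/documentum/dba/log/00010b9a/agentexec
```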

Solution

The solution is to set the correct write permissions:

chmod 640 /pkgs/dms/opt/documentum/dba/log/00010b9a/agentexec/job_08010b9a800003f5

Do not forget to disable the trace flag on the agent_exec_method!

After that, the dm_FTIndexAgentBoot job starts correctly, because the output file can be written.
Of course, this fix applies only to this specific issue. The most important part of the procedure is the use of the trace flag to see what happens on the dm_agent_exec side.