The application support team informed me that their jobs were no longer running. When I started the analysis, I found that none of the active jobs had started for a few weeks.
First of all, I decided to work on one specific job, which does not belong to the application team and which I know I can start several times without impacting the business.
Do you know which one? dm_ContentWarning
I checked the job attributes such as start_date, expiration_date, is_inactive, target_server (as we have several Content Servers to cover high availability), a_last_invocation, a_next_invocation and, of course, a_current_status.
Once this first check was done, I started the job from Documentum Administrator (DA): I selected "Run now" and saved the job.
    object_name : dm_ContentWarning
    start_date : 5/30/2017 20:00:00
    expiration_date : 5/30/2025 20:00:00
    max_iterations : 0
    run_interval : 1
    run_mode : 3
    is_inactive : F
    inactivate_after_failure : F
    target_server : [email protected]
    a_last_invocation : 9/20/2018 19:05:29
    a_last_completion : 9/20/2018 19:07:00
    a_current_status : ContentWarning Tool Completed at 9/20/2018 19:06:50. Total duration was 1 minutes.
    a_next_invocation : 9/21/2018 19:05:00
A few minutes later, I checked the result again. This time I did not look at all the attributes, only a_last_completion, a_current_status and a_next_invocation, plus the content of the job log file. The job ran as expected when I forced it to run.
    a_last_completion : 10/31/2018 10:41:25
    a_current_status : ContentWarning Tool Completed at 10/31/2018 10:41:14. Total duration was 2 minutes.
    a_next_invocation : 10/31/2018 19:05:00

    [[email protected] agentexec]$ more job_0801234380000359
    Wed Oct 31 10:39:54 2018 [INFORMATION] [LAUNCHER 12071] Detected while preparing job dm_ContentWarning for execution: Agent Exec connected to server Docbase1: [DM_SESSION_I_SESSION_START]info: "Session 01012343807badd5 started for user dmadmin."
    ...
    ...
OK, the job ran and a_next_invocation was set according to run_interval and run_mode (in our case, once a day). I thought I had found the reason for the issue: the repository had been stopped for a few days, so when it was restarted the a_next_invocation date was in the past (a_next_invocation: 9/21/2018 19:05:00). So I decided to check the result the next day, once the job had run on its defined schedule (a_next_invocation: 10/31/2018 19:05:00).
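The two a_next_invocation values shown so far (9/21/2018 19:05:00 before the forced run, 10/31/2018 19:05:00 after it) are consistent with the stale slot being moved forward by whole run intervals until it lands in the future. The Python sketch below only illustrates that arithmetic. It is an assumption about the behaviour, not the Content Server's actual implementation, and roll_forward is a hypothetical helper.

```python
from datetime import datetime, timedelta


def roll_forward(next_invocation, now, interval):
    """Advance a stale schedule slot by whole intervals until it is after `now`.

    Hypothetical helper: it models the observed values, not Documentum code.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    while next_invocation <= now:
        next_invocation += interval
    return next_invocation


# Values taken from the job attributes in this post.
stale_slot = datetime(2018, 9, 21, 19, 5, 0)         # a_next_invocation before the forced run
forced_run_done = datetime(2018, 10, 31, 10, 41, 25)  # a_last_completion of the forced run
one_day = timedelta(days=1)                           # run_interval 1, run_mode 3 (once a day, per the text)

new_slot = roll_forward(stale_slot, forced_run_done, one_day)
print(new_slot)  # 2018-10-31 19:05:00
```

The result matches the a_next_invocation reported after the forced run, which is why the schedule looked healthy at this point.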
The next day… the job did not run. Strange!
I decided to think a bit deeper ;-). To go a step further, I set the a_next_invocation date so that the job would run 5 minutes later.
    update dm_job objects set a_next_invocation = date('01.11.2018 11:53:00','dd.mm.yyyy hh:mi:ss') where object_name = 'dm_ContentWarning';
    1
    select r_object_id, object_name, a_next_invocation from dm_job where object_name = 'dm_ContentWarning';
    0801234380000359 dm_ContentWarning 11/01/2018 11:53:00
Result: the job did not start. 🙁 Hmmm, why?
Before continuing to work on the job, I did some other checks, such as analyzing the log files (repository, agent_exec, sysadmin, etc.).
I found that the DB had been down a few days earlier, so I decided to restart the repository and set a_next_invocation again, but unfortunately this did not help.
To be sure the issue was not related to the installation as a whole, I successfully ran a distributed job (dm_contentWarningvmcs2_Docbase1) on the second Content Server. This meant the issue was located only on my first Content Server.
I knew, even though I have not used it often in my last 20 years in the Documentum world, that we can trace the agent_exec, so let's look at this point:
- add the parameter -trace_level 1 to the dm_agent_method
- reinit the server
- kill the dm_agent_exec process related to Docbase1; the process will be restarted automatically after a few minutes.
    [[email protected] agentexec]$ ps -ef | grep agent | grep Docbase1
    dmadmin 27312 26944 0 Oct31 ? 00:00:49 ./dm_agent_exec -enable_ha_setup 1 -docbase_name Docbase1.Docbase1 -docbase_owner dmadmin -sleep_duration 0
    [[email protected] agentexec]$ kill -9 27312
    [[email protected] agentexec]$ ps -ef | grep agent | grep Docbase1
    [[email protected] agentexec]$
    [[email protected] agentexec]$ ps -ef | grep agent | grep Docbase1
    dmadmin 15440 26944 57 07:48 ? 00:00:06 ./dm_agent_exec -enable_ha_setup 1 -trace_level 1 -docbase_name Docbase1.Docbase1 -docbase_owner dmadmin -sleep_duration 0
    [[email protected] agentexec]$
I changed a_next_invocation again and checked the agent_exec log file, where the executed queries had been recorded.
Two recorded queries seemed to be important:
    SELECT count(r_object_id) as cnt
    FROM dm_job
    WHERE ( (run_now = 1) OR ((is_inactive = 0) AND ( ( a_next_invocation <= DATE('now') AND a_next_invocation IS NOT NULLDATE ) OR ( a_next_continuation DATE('now')) OR (expiration_date IS NULLDATE)) AND ((max_iterations = 0) OR (a_iterations < max_iterations))) )
    AND (i_is_reference = 0 OR i_is_reference is NULL)
    AND (i_is_replica = 0 OR i_is_replica is NULL)
    AND UPPER(target_server) = '[email protected]'

    SELECT ALL r_object_id, a_next_invocation
    FROM dm_job
    WHERE ( (run_now = 1) OR ((is_inactive = 0) AND ( ( a_next_invocation <= DATE('now') AND a_next_invocation IS NOT NULLDATE ) OR ( a_next_continuation DATE('now')) OR (expiration_date IS NULLDATE)) AND ((max_iterations = 0) OR (a_iterations < max_iterations))) )
    AND (i_is_reference = 0 OR i_is_reference is NULL)
    AND (i_is_replica = 0 OR i_is_replica is NULL)
    AND UPPER(target_server) = '[email protected]'
    ORDER BY run_now DESC, a_next_invocation, r_object_id
    ENABLE (RETURN_TOP 3 )
I executed the second query and it returned three jobs (RETURN_TOP 3), all of them from the application team. Because these three jobs have an old a_next_invocation value, they never run, yet they are selected again each time the query is executed. Unfortunately, this means my dm_ContentWarning job will never be selected for automatic execution.
I informed the application team that I would keep only one job active (dm_ContentWarning) to see whether it would run. And guess what, it ran … YES!
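To make the starvation effect concrete, here is a small Python simulation of the selection step. It is a sketch under stated assumptions: the WHERE clause is reduced to run_now, is_inactive and a_next_invocation; the three application jobs (app_job_1 to app_job_3) and their ids and dates are made up; and select_due_jobs is a hypothetical stand-in for the logged query, not Documentum code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Job:
    r_object_id: str
    object_name: str
    a_next_invocation: datetime
    run_now: bool = False
    is_inactive: bool = False


def select_due_jobs(jobs, now, top=3):
    """Simplified model of the logged query: due jobs, ordered by
    run_now DESC, a_next_invocation, r_object_id, limited to `top` rows."""
    due = [
        j for j in jobs
        if j.run_now or (not j.is_inactive and j.a_next_invocation <= now)
    ]
    # `not j.run_now` sorts run_now jobs first, which mimics run_now DESC.
    due.sort(key=lambda j: (not j.run_now, j.a_next_invocation, j.r_object_id))
    return due[:top]


now = datetime(2018, 11, 1, 11, 55)
jobs = [
    Job("id_app_1", "app_job_1", datetime(2018, 9, 21, 1, 0)),  # hypothetical
    Job("id_app_2", "app_job_2", datetime(2018, 9, 21, 2, 0)),  # hypothetical
    Job("id_app_3", "app_job_3", datetime(2018, 9, 21, 3, 0)),  # hypothetical
    Job("0801234380000359", "dm_ContentWarning", datetime(2018, 11, 1, 11, 53)),
]

# Three stale jobs fill all three slots; dm_ContentWarning is due but left out.
blocked = [j.object_name for j in select_due_jobs(jobs, now)]
print(blocked)  # ['app_job_1', 'app_job_2', 'app_job_3']

# "Run now" moves a job to the front of the ordering.
jobs[3].run_now = True
forced = [j.object_name for j in select_due_jobs(jobs, now)]
jobs[3].run_now = False

# With the application jobs deactivated, dm_ContentWarning is selected.
for app_job in jobs[:3]:
    app_job.is_inactive = True
unblocked = [j.object_name for j in select_due_jobs(jobs, now)]
print(unblocked)  # ['dm_ContentWarning']
```

In this model, as long as the three stale jobs never get a newer a_next_invocation, every poll returns the same three rows. The ORDER BY run_now DESC part would also explain why the forced run from DA worked.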
Okay, now we have the solution:
- reactivate all previously deactivated jobs
- set the a_next_invocation to a future date
And do not forget to deactivate the trace for the dm_agent_exec.
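If many jobs need a new a_next_invocation, the UPDATE statements can be generated rather than typed by hand. The Python sketch below only builds the DQL text, using the same date pattern as the statement shown earlier in this post. build_reschedule_dql is a hypothetical helper, and running the output in your DQL client of choice is left to you.

```python
from datetime import datetime


def build_reschedule_dql(job_name, when):
    """Return a DQL UPDATE that sets a_next_invocation for one job.

    Hypothetical helper: it formats text only and never connects to a repository.
    """
    if "'" in job_name:
        # Refuse names that would break the quoted literal instead of guessing an escape rule.
        raise ValueError("job name must not contain a single quote")
    stamp = when.strftime("%d.%m.%Y %H:%M:%S")
    return (
        "update dm_job objects "
        f"set a_next_invocation = date('{stamp}','dd.mm.yyyy hh:mi:ss') "
        f"where object_name = '{job_name}'"
    )


statement = build_reschedule_dql("dm_ContentWarning", datetime(2018, 11, 1, 11, 53, 0))
print(statement)
```

For the example values, the printed statement is the same UPDATE used earlier for dm_ContentWarning, minus the trailing semicolon.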