
Hello everybody,

 

Introduction:

Today we will debug a classic scheduler problem: the “wait host” status.

 

Context:

In the Monitoring domain, one or more jobs are not submitted to the host and stay in the blue (“wait host”) status.

Example:

The aim is to know what is happening to this job and fix it.

Analysis:

The aim is to find out why the host is unavailable (there can be several root causes) and to fix it.

Result:

The job is in “wait host” status (you can also see this in the run information section under the right panel).
We can also read that no machine is available.

The next step is to check whether the machine is still alive by pinging it:
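As a minimal sketch of this check, the function below wraps a plain ping. It assumes a Linux-style `ping` that accepts `-c` (number of packets) and `-W` (reply timeout in seconds). The host name `myserver` and the function name `check_host_alive` are placeholders chosen for illustration.

```shell
#!/bin/sh
# Report whether a host answers ICMP echo requests.
# Assumes Linux ping options: -c <count> and -W <timeout in seconds>.
check_host_alive() {
    host="$1"
    if ping -c 2 -W 2 "$host" >/dev/null 2>&1; then
        echo "$host is reachable"
    else
        echo "$host is NOT reachable"
        return 1
    fi
}

# Usage (myserver is a placeholder):
#   check_host_alive myserver
```

A successful ping only shows that the machine answers on the network. It says nothing yet about the agent itself.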

 

Note:

You can also use the nc command to check whether the communication port to the agent is open:

nc -v <server> <port>

Example: nc myserver 7006

Note: this does not tell you whether the agent itself is running!
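If you want to script this port check, a hedged sketch could look like the following. It assumes an `nc` build that supports `-z` (connect without sending data) and `-w` (timeout in seconds), which is common but not universal. The function name `check_agent_port` is hypothetical, and 7006 is simply the port from the example above.

```shell
#!/bin/sh
# Check whether a TCP port accepts connections.
# Assumes an nc build that supports -z (no data sent) and -w <seconds>.
check_agent_port() {
    host="$1"
    port="${2:-7006}"   # 7006 is the port used in the example above
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
        echo "port $port on $host is open"
    else
        echo "port $port on $host is closed or filtered"
        return 1
    fi
}

# Usage (myserver is a placeholder):
#   check_agent_port myserver 7006
```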

Result:

The ping is OK, so the machine is alive (of course, other factors such as firewall rules or stopped services can still block communication).

If the machine is available, the issue may come from the agent, so we will check it.

1) By using CCM

We note that the agent is unavailable. To get more information, we will connect to the machine where the agent is installed.

2) By connecting to the agent’s machine

After connecting to the machine as the Control-M agent user, we see that the agent is not running, which is probably the cause of our trouble.
Below is the output of the ag_diag_comm command (used to check the agent status):

 

Starting it again may solve the issue.

a) Restarting the Control-M/Agent:

By using the CTM utilities shut-ag and start-ag:

 

# /home/controlm/ctm_agent/ctm/scripts/start-ag

 Enter Control-M/Agent UNIX username [controlm]:

 Enter Control-M/Agent Process Name <AG|AT|AR|ALL> [ALL]:


Starting the agent as 'root' user

Control-M/Agent Listener started. pid: 23534
Control-M/Agent Tracker started. pid: 23603

Control-M/Agent started successfully.
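To avoid answering the prompts by hand, the restart can be sketched as a small function. This is an assumption-laden example: it supposes that shut-ag and start-ag accept `-u <user>` and `-p <process>` options matching the two prompts shown above, so please verify this against the documentation of your agent version. The path and user name are the ones from the session above, and `restart_agent` is a hypothetical name.

```shell
#!/bin/sh
# Restart a Control-M/Agent without the interactive prompts.
# ASSUMPTION: shut-ag and start-ag accept -u <user> and -p <process>,
# mirroring the two prompts of the interactive session.
AGENT_SCRIPTS="${AGENT_SCRIPTS:-/home/controlm/ctm_agent/ctm/scripts}"
AGENT_USER="${AGENT_USER:-controlm}"

restart_agent() {
    # A failing shut-ag is not treated as fatal: the agent may already be down.
    "$AGENT_SCRIPTS/shut-ag" -u "$AGENT_USER" -p ALL \
        || echo "shut-ag reported an error (agent may already be stopped)"
    # The function's exit status is the one returned by start-ag.
    "$AGENT_SCRIPTS/start-ag" -u "$AGENT_USER" -p ALL
}

# Usage, as in the session above (run as root):
#   restart_agent
```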

b) Check agent status

Result:

Agent is running successfully, so let’s go back to our blue job in the monitoring domain:
Now we can see that the job has been submitted to the host for execution:
As soon as we started the agent again, the job could be executed.

To finish, we can check the agent status on the CCM:

 

Agent is OK on CCM and jobs can now be executed.

Note:

This ping from the CCM can also be performed from the command line on the Control-M/Server:

The “ctmping -HOSTID <server>” command is more meaningful than a plain ping, as it takes into account the configuration of the Control-M/Server and the Control-M/Agent.
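When several agents need to be checked, the command can be wrapped in a loop. The sketch below uses only the `-HOSTID` form quoted above and prints the raw ctmping output for each host without interpreting it. The function name `ping_agents` and the host names are placeholders, and the function is meant to be run as the Control-M/Server user.

```shell
#!/bin/sh
# Run ctmping against each host passed as an argument and show its raw output.
# Only the -HOSTID option quoted in this post is used.
ping_agents() {
    for host in "$@"; do
        echo "=== $host ==="
        # Report a non-zero exit status instead of stopping the loop.
        ctmping -HOSTID "$host" || echo "ctmping exited with status $?"
    done
}

# Usage (host names are placeholders):
#   ping_agents myserver otherserver
```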

Conclusion:

Now that we have the steps to fix this “wait host” status, you will be able to correct it quickly 😊

Note:

If you get the “no machine available” message and your ping is not OK, please ask your system administrator to bring the server hosting the agent back up. Then check whether the agent can be started successfully.

Feel free to read the posts of our dbi bloggers for more tips and tricks, and you can also have a look at the BMC site.

 

See you in the next blog, and don’t forget to share and comment!


Nabil Saoual

Consultant