Hello everybody,
Introduction
Today we will debug a classic scheduler problem: the “wait host” status.
Context:
In the Monitoring domain, one or more jobs are not submitted to the host and stay in the blue (wait host) status.
Example:
The aim is to understand what is happening to this job and fix it.
Analysis
The aim is to find out why the host is unavailable, identify one of the possible root causes and fix it.
Result
The job is in “wait host” status (you can also see this in the run information section under the right panel).
We can also read that no machine is available.
The next step is to check whether the machine is still alive by pinging it.
Note:
You can also use the nc command to check whether the agent’s port is reachable:
nc -v <server> <port>
Example: nc myserver 7006
Note: an open port does not confirm that the agent itself is running!
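If nc is not installed on the machine you are checking from, a similar reachability test can be sketched with bash's built-in /dev/tcp redirection. This is only a minimal sketch: the function name check_agent_port is made up for this example, port 7006 is simply the one from the example above, and the 3-second timeout is an arbitrary choice. As with nc, an open port only shows that something is listening; it says nothing about the health of the agent.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Control-M utility): test whether a TCP port accepts connections.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
check_agent_port() {
  local host="$1" port="$2"
  # Open the connection on file descriptor 3 in a child bash; give up after 3 seconds.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed-or-filtered"
  fi
}

# Usage: check_agent_port myserver 7006
```

The function prints a single word, so it can be used directly in a condition inside a larger check script.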
Result:
The ping is OK, so the machine is alive (of course, other factors such as firewall rules or inactive services can still be involved).
If the machine is available, the issue may come from the agent, so we will check it.
By using CCM
We can see that the agent is unavailable. To get more information, we will connect to the machine where the agent is installed.
By connecting to the agent’s machine
After connecting to the machine as the Control-M agent user, we see that the agent is not running, which is probably the cause of our trouble.
Below is the result of the ag_diag_comm command (used to check the agent status):
Starting the agent again may solve the issue.
Restarting the Control-M/Agent:
By using the CTM utilities start-ag and shut-ag
[root@CTMSRVCENTOS scripts]# /home/controlm/ctm_agent/ctm/scripts/start-ag
Enter Control-M/Agent UNIX username [controlm]:
Enter Control-M/Agent Process Name <AG|AT|AR|ALL> [ALL]:
Starting the agent as 'root' user
Control-M/Agent Listener started. pid: 23534
Control-M/Agent Tracker started. pid: 23603
Control-M/Agent started successfully.
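The prompts shown by start-ag can usually be skipped by passing the user and process name on the command line. The sketch below only prints the commands instead of running them, so you can review them first. It assumes that shut-ag and start-ag accept -u (user) and -p (process) options, which you should confirm against the documentation for your agent version. The function name agent_restart_commands and the AGENT_SCRIPTS variable are invented for this example; the default path and user are the ones visible in the start-ag session in this post.

```shell
#!/usr/bin/env bash
# Scripts directory as seen in the start-ag session in this post; override it for your installation.
AGENT_SCRIPTS="${AGENT_SCRIPTS:-/home/controlm/ctm_agent/ctm/scripts}"

# Hypothetical helper: print (not run) a shut-ag / start-ag sequence.
# ASSUMPTION: both utilities accept -u <user> and -p <AG|AT|AR|ALL>.
agent_restart_commands() {
  local user="${1:-controlm}" proc="${2:-ALL}"
  printf '%s/shut-ag -u %s -p %s\n' "$AGENT_SCRIPTS" "$user" "$proc"
  printf '%s/start-ag -u %s -p %s\n' "$AGENT_SCRIPTS" "$user" "$proc"
}

# Usage: review the printed commands, then run them yourself.
# agent_restart_commands controlm ALL
```

Printing rather than executing is a deliberate choice here: stopping an agent on a production host is something you want to confirm by eye first.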
Check agent status
Result
Agent is running successfully, so let’s go back to our blue job in the monitoring domain:
Now we can see that the job has been submitted to the host for execution:
As soon as we started the agent again, the job could be executed.
To finish, we can check the agent status on the CCM:
Agent is OK on CCM and jobs can now be executed.
Note:
The ping done from CCM can also be run from the command line on the Control-M server:
The “ctmping -HOSTID <server>” command is more effective than a plain ping because it takes into account the Control-M/Server and Control-M/Agent configuration.
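To check several agents in one go, the ctmping command can be wrapped in a small loop. This is a sketch only: the function name ping_agents and the CTMPING_BIN variable are made up for this example, and it assumes that ctmping returns a non-zero exit code when the agent cannot be reached, which is worth verifying on your Control-M/Server before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "ctmping -HOSTID <host>" for every host given as an argument.
# ASSUMPTION: ctmping exits with a non-zero status when the agent is unreachable.
# CTMPING_BIN is an invented override so the loop can be tried without Control-M installed.
ping_agents() {
  local ctmping="${CTMPING_BIN:-ctmping}" host rc=0
  for host in "$@"; do
    if "$ctmping" -HOSTID "$host" >/dev/null 2>&1; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      rc=1
    fi
  done
  # Return 1 if at least one host failed, 0 otherwise.
  return "$rc"
}

# Usage (as the Control-M/Server user): ping_agents myserver otherserver
```

The return code makes the function usable from a monitoring script: any failed host turns the whole check red.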
Conclusion
Now that we have the steps to fix this “wait host” status, you will be able to correct it quickly 😊
Note:
If you get the “no machine available” message and your ping is not OK, ask your system administrator to bring the server hosting the agent back up. Then check whether the agent starts successfully.
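The decision path followed in this post can be summarised as a small function. It is a sketch for illustration only: next_step and its yes/no arguments are invented here, and the messages simply restate the steps described in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the check results from this post to the next action.
# Arguments are "yes" or "no": $1 = host answers ping, $2 = agent is running.
next_step() {
  local ping_ok="$1" agent_running="$2"
  if [ "$ping_ok" != "yes" ]; then
    echo "ask the system admin to bring the server back up"
  elif [ "$agent_running" != "yes" ]; then
    echo "start the agent with start-ag"
  else
    echo "check the agent status in CCM or with ctmping"
  fi
}

# Usage: next_step yes no
```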
Feel free to read the other dbi bloggers for more tips and tricks, and have a look at the BMC site as well.
See you in the next blog, and don’t forget to share and comment!
yipityup
18.07.2023
This was a great write up and immediately addressed my needs.
Thanks Nabil!
Hello,
Thanks also for your feedback, happy to help you, you are welcome :)
Nabil