A few years ago, I wrote a blog about a WLST script to monitor a WebLogic Server. At that time, we were managing a Documentum Platform with 115 servers. It’s now more than 700 servers, so I wanted to come back with an update on the WLST script.
1. Update of the WLST script needed
Over the past two years, we installed a lot of new servers with a lot of new components. Some of these components required us to slightly adapt our monitoring solution so that it handles all servers of our Platform in the same, efficient way: we want a single solution that fits all cases. The new cases we came across were WebLogic Clustering as well as EAR Applications.
In the past, we only had WAR files related to Documentum: D2.war, da.war, D2-REST.war, etc. All these WAR files are quite simple to monitor because one “ApplicationRuntimes” equals one “ComponentRuntimes” (I’m talking here about the WLST script from the previous blog). So if you want to check the number of open sessions [get('OpenSessionsCurrentCount')] or the total number of sessions [get('SessionsOpenedTotalCount')], it’s just one value each. EAR files often contain one or more WAR files as well as other components, so you potentially have a lot of “ComponentRuntimes” for each “ApplicationRuntimes”. Therefore, the best way I found to keep a single monitoring solution for all WebLogic Servers, no matter what application is deployed, was to loop over the components, add up the open sessions and the total sessions of each component, and return these two sums for the application.
In addition to that, we also started to deploy some WebLogic Servers in Cluster, so the monitoring script needed to take that into account as well. The previous version of the WLST script assumed that the deployment target was a single local Managed Server (local to the AdminServer). With a WLS Cluster, the deployment target can be the cluster itself, and in that case the script wouldn’t find the correct monitoring value. I therefore introduced a check on whether the Application is deployed on a cluster; if it is, the script selects the local Managed Server that is part of this cluster. We are using the NodeManager Listen Address to know whether a Managed Server is local, so the script expects the NodeManager and the Managed Server to use the same Listen Address.
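The accumulation described above can be sketched in plain Python, outside of WLST. This is only an illustration under assumptions: the dictionaries are hypothetical stand-ins for the ComponentRuntimes MBeans, and components without session counters (for example an EJB module inside an EAR) are simply skipped.

```python
def sum_sessions(components):
    """Add up open and total sessions over all components of one application.

    `components` is a hypothetical list of dicts standing in for the
    ComponentRuntimes MBeans; it is not a WLST structure.
    """
    open_sessions = 0
    total_sessions = 0
    for component in components:
        try:
            open_count = component['OpenSessionsCurrentCount']
            total_count = component['SessionsOpenedTotalCount']
        except KeyError:
            # Not a web component (no session counters): ignore it.
            continue
        open_sessions += open_count
        total_sessions += total_count
    return open_sessions, total_sessions


# A WAR has a single component, an EAR can have several (made-up values).
war = [{'OpenSessionsCurrentCount': 4, 'SessionsOpenedTotalCount': 120}]
ear = [
    {'OpenSessionsCurrentCount': 3, 'SessionsOpenedTotalCount': 50},
    {'OpenSessionsCurrentCount': 2, 'SessionsOpenedTotalCount': 30},
    {'Name': 'some-ejb.jar'},  # hypothetical component without sessions
]
war_result = sum_sessions(war)  # same values as the single component
ear_result = sum_sessions(ear)  # sums of the two web components
```

For a WAR, the result is identical to reading the single component directly, which is why the same logic can be used for every application.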
As a side note, in case you have a WebLogic Cluster that is deploying an Application only on certain machines of the WebLogic Domain (for example, you have 3 machines but a cluster only targets 2 of them), then on the machine(s) where the Application isn’t deployed by the WebLogic Cluster, the monitoring will still try to find the Application on a local Managed Server and it will not succeed. This will still create a log file for this Application with the following content: 'CRITICAL - The Managed Server ' + appTargetName + ' or the Application ' + app.getName() + ' is not started'. This is expected since the Application isn’t deployed there, but it’s then your job to either set the monitoring tool to expect a CRITICAL or just not check this specific log file for this machine.
Finally, the last modification was to use a properties file instead of embedded properties. We are now deploying more and more WebLogic Servers with our silent scripts (it takes a few minutes to have a WLS fully installed and configured, with clustering, SSL, etc.), and it is easier to have one properties file per WebLogic Domain that is used by our WebLogic Servers as well as by the Monitoring System to know what’s installed, whether it’s a cluster, where the AdminServer is, whether it uses t3 or t3s, etc.
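On the monitoring side, "expecting a CRITICAL" could look like the sketch below. It is not part of our setup: the function name and the `expect_deployed` flag are hypothetical, and only the return codes (0 to 3) follow the usual Nagios plugin convention.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # usual Nagios plugin return codes


def status_from_output(content, expect_deployed=True):
    """Map the content of one wl_sessions_*.out file to (code, message).

    `expect_deployed` is a hypothetical per-machine setting: set it to False
    on machines that the cluster does not target for this application.
    """
    line = content.strip()
    if not line:
        return UNKNOWN, 'UNKNOWN - output file is empty'
    if line.startswith('CRITICAL'):
        if expect_deployed:
            return CRITICAL, line
        # The application is not deployed here by design: no alert.
        return OK, 'OK - application not deployed on this machine'
    return OK, 'OK - ' + line
```

The same CRITICAL line therefore raises an alert on a machine where the application should run and is ignored where it is not deployed.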
2. WebLogic Domain properties file
As mentioned above, we started to use properties files with our silent scripts to describe what is installed on the local server. This is an extract of a domain.properties file that we are using:

    $ cat /app/weblogic/wlst/domain.properties
    ...
    NM_HOST=weblogic_server_01.dbi-services.com
    ADMIN_URL=t3s://weblogic_server_01.dbi-services.com:8443
    DOMAIN_NAME=MyDomain
    ...
    CLUSTERS=clusterWS-01:msWS-011,machine-01,weblogic_server_01.dbi-services.com,8080,8081:msWS-012,machine-02,weblogic_server_02.dbi-services.com,8080,8081|clusterWS-02:msWS-021,machine-01,weblogic_server_01.dbi-services.com,8082,8083:msWS-022,machine-02,weblogic_server_02.dbi-services.com,8082,8083
    ...
The parameter “CLUSTERS” in this properties file is composed in the following way:
- If it’s a WebLogic Domain with Clustering: CLUSTERS=cluster1:ms11,machine11,listen11,http11,https11:ms12,machine12,…|cluster2:ms21,machine21,…:ms22,machine22,…:ms23,machine23,…
  - ms11 and ms12 being 2 Managed Servers part of the cluster cluster1
  - ms21, ms22 and ms23 being 3 Managed Servers part of the cluster cluster2
- If it’s not a WebLogic Domain with Clustering: CLUSTERS= (left empty, no value is needed)
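To make the format concrete, here is a minimal sketch in plain Python that splits a CLUSTERS value and looks up the local Managed Server of a cluster by comparing its Listen Address with NM_HOST. The function names are hypothetical; the sample value is the one from the extract above. It assumes the Listen Address never contains ':' (so no raw IPv6 addresses).

```python
def parse_clusters(clusters):
    """Turn a CLUSTERS value into {cluster_name: [member, ...]}."""
    result = {}
    if not clusters:
        return result  # Domain without clustering: CLUSTERS is empty.
    for cluster_def in clusters.split('|'):
        fields = cluster_def.split(':')
        members = []
        for member in fields[1:]:
            # Member format: name,machine,listen address,HTTP port,HTTPS port
            name, machine, listen, http, https = member.split(',')
            members.append({'name': name, 'machine': machine,
                            'listen': listen, 'http': int(http),
                            'https': int(https)})
        result[fields[0]] = members
    return result


def local_server_name(clusters, cluster_name, nm_host):
    """Return the Managed Server of `cluster_name` listening on `nm_host`."""
    for member in parse_clusters(clusters).get(cluster_name, []):
        if member['listen'] == nm_host:
            return member['name']
    return ''  # No local member for this cluster.


CLUSTERS = ('clusterWS-01'
            ':msWS-011,machine-01,weblogic_server_01.dbi-services.com,8080,8081'
            ':msWS-012,machine-02,weblogic_server_02.dbi-services.com,8080,8081'
            '|clusterWS-02'
            ':msWS-021,machine-01,weblogic_server_01.dbi-services.com,8082,8083'
            ':msWS-022,machine-02,weblogic_server_02.dbi-services.com,8082,8083')
NM_HOST = 'weblogic_server_01.dbi-services.com'

local_ms = local_server_name(CLUSTERS, 'clusterWS-02', NM_HOST)
```

With these values, `local_ms` is the member of clusterWS-02 whose Listen Address equals NM_HOST, which is msWS-021.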
There are other properties in this domain.properties of ours, like the config and key secure files that WebLogic is using (different from the Nagios ones), the NodeManager configuration (port, type, config & key secure files as well) and a few other things about the AdminServer, the list of Managed Servers, etc. None of these properties are needed for the monitoring topic, so I’m only showing the ones that matter here.
3. New version of the WLST script
Enough talk, I assume you came here for the WLST script so here it is. Compared to the previous version, the changes are the loading of the properties file, the getLocalServerName function for the clusters and the loop on the components of each application:
    $ cat /app/nagios/etc/objects/scripts/MyDomain_check_weblogic.wls
    # WLST
    # Identification: check_weblogic.wls  v1.2  15/08/2018
    #
    # File: check_weblogic.wls
    # Purpose: check if a WebLogic Server is running properly
    # Author: dbi services (Morgan Patou)
    # Version: 1.0 23/03/2016
    # Version: 1.1 14/06/2018 - re-formatting
    # Version: 1.2 15/08/2018 - including cluster & EAR support
    #
    ###################################################

    from java.io import File
    from java.io import FileOutputStream
    import re

    properties='/app/weblogic/wlst/domain.properties'

    # Load NM_HOST, ADMIN_URL, DOMAIN_NAME and CLUSTERS as WLST variables
    try:
      loadProperties(properties)
    except:
      exit()

    directory='/app/nagios/etc/objects/scripts'
    userConfig=directory + '/' + DOMAIN_NAME + '_configfile.secure'
    userKey=directory + '/' + DOMAIN_NAME + '_keyfile.secure'

    try:
      connect(userConfigFile=userConfig, userKeyFile=userKey, url=ADMIN_URL)
    except:
      exit()

    def setOutputToFile(fileName):
      outputFile=File(fileName)
      fos=FileOutputStream(outputFile)
      theInterpreter.setOut(fos)

    def setOutputToNull():
      outputFile=File('/dev/null')
      fos=FileOutputStream(outputFile)
      theInterpreter.setOut(fos)

    # Return the Managed Server of the given cluster that runs on this host
    def getLocalServerName(clustername):
      localServerName=""
      for clusterList in CLUSTERS.split('|'):
        found=0
        for clusterMember in clusterList.split(':'):
          if found == 1:
            # Member format: name,machine,listen address,HTTP port,HTTPS port
            clusterMemberDetails=clusterMember.split(',')
            if clusterMemberDetails[2] == NM_HOST:
              localServerName=clusterMemberDetails[0]
          if clusterMember == clustername:
            found=1
      return localServerName

    while 1:
      domainRuntime()
      for server in domainRuntimeService.getServerRuntimes():
        setOutputToFile(directory + '/wl_threadpool_' + domainName + '_' + server.getName() + '.out')
        cd('/ServerRuntimes/' + server.getName() + '/ThreadPoolRuntime/ThreadPoolRuntime')
        print 'threadpool_' + domainName + '_' + server.getName() + '_OUT',get('ExecuteThreadTotalCount'),get('HoggingThreadCount'),get('PendingUserRequestCount'),get('CompletedRequestCount'),get('Throughput'),get('HealthState')
        setOutputToNull()

        setOutputToFile(directory + '/wl_heapfree_' + domainName + '_' + server.getName() + '.out')
        cd('/ServerRuntimes/' + server.getName() + '/JVMRuntime/' + server.getName())
        print 'heapfree_' + domainName + '_' + server.getName() + '_OUT',get('HeapFreeCurrent'),get('HeapSizeCurrent'),get('HeapFreePercent')
        setOutputToNull()

      try:
        setOutputToFile(directory + '/wl_sessions_' + domainName + '_console.out')
        cd('/ServerRuntimes/AdminServer/ApplicationRuntimes/consoleapp/ComponentRuntimes/AdminServer_/console')
        print 'sessions_' + domainName + '_console_OUT',get('OpenSessionsCurrentCount'),get('SessionsOpenedTotalCount')
        setOutputToNull()
      except WLSTException,e:
        setOutputToFile(directory + '/wl_sessions_' + domainName + '_console.out')
        print 'CRITICAL - The Server AdminServer or the Administrator Console is not started'
        setOutputToNull()

      domainConfig()
      for app in cmo.getAppDeployments():
        domainConfig()
        cd('/AppDeployments/' + app.getName())
        for appTarget in cmo.getTargets():
          # For a cluster target, monitor the local Managed Server of this cluster
          if appTarget.getType() == "Cluster":
            appTargetName=getLocalServerName(appTarget.getName())
          else:
            appTargetName=appTarget.getName()
          print appTargetName
          domainRuntime()
          try:
            setOutputToFile(directory + '/wl_sessions_' + domainName + '_' + app.getName() + '.out')
            cd('/ServerRuntimes/' + appTargetName + '/ApplicationRuntimes/' + app.getName())
            openSessions=0
            totalSessions=0
            # Add up the sessions of all components (one for a WAR, several for an EAR)
            for appComponent in cmo.getComponentRuntimes():
              result=re.search(appTargetName,appComponent.getName())
              if result != None:
                cd('ComponentRuntimes/' + appComponent.getName())
                try:
                  openSessions+=get('OpenSessionsCurrentCount')
                  totalSessions+=get('SessionsOpenedTotalCount')
                except WLSTException,e:
                  cd('/ServerRuntimes/' + appTargetName + '/ApplicationRuntimes/' + app.getName())
                cd('/ServerRuntimes/' + appTargetName + '/ApplicationRuntimes/' + app.getName())
            print 'sessions_' + domainName + '_' + app.getName() + '_OUT',openSessions,totalSessions
            setOutputToNull()
          except WLSTException,e:
            setOutputToFile(directory + '/wl_sessions_' + domainName + '_' + app.getName() + '.out')
            print 'CRITICAL - The Managed Server ' + appTargetName + ' or the Application ' + app.getName() + ' is not started'
            setOutputToNull()

      java.lang.Thread.sleep(120000)
For all our WAR files, even though the WLST script changed, the outcome is the same since there is only one component. For the EAR files, it simply adds all open sessions into a global count. Obviously, this doesn’t necessarily represent the real number of “user” sessions, but it’s an estimation of the load. We do not really care about a specific number; we want to see how the load evolves during the day, and we can adjust our thresholds to take into account that it’s a global count and not a single component’s sessions.
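As an illustration of thresholds on this global count, the sketch below reads one line in the shape the script prints ('sessions_<domain>_<app>_OUT <open> <total>') and compares the open sessions with two limits. The function names, the application name and the threshold values are hypothetical; pick limits per application, higher for an EAR whose count covers several components.

```python
def parse_sessions_line(line):
    """Return (open, total) from 'sessions_<domain>_<app>_OUT <open> <total>'.

    Returns None for anything else, for example a CRITICAL message.
    """
    parts = line.split()
    if len(parts) != 3 or not parts[0].endswith('_OUT'):
        return None
    try:
        return int(parts[1]), int(parts[2])
    except ValueError:
        return None


def load_status(open_sessions, warn, crit):
    """Compare the cumulated open sessions with hypothetical thresholds."""
    if open_sessions >= crit:
        return 'CRITICAL'
    if open_sessions >= warn:
        return 'WARNING'
    return 'OK'


# Made-up sample line and thresholds, for illustration only.
counts = parse_sessions_line('sessions_MyDomain_D2_OUT 250 4000')
status = load_status(counts[0], warn=200, crit=400)
```

Only the open sessions are compared here; the total count grows until the server restarts, so it is more useful as a trend than as a threshold.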
You can obviously tweak the script to match your needs but this is working pretty well for us on all our environments. If you have ideas about what could be updated to make it even better, don’t hesitate to share!