As already mentioned in a previous blog, ActiveMQ has been used in the scope of Alfresco since quite some time. Initially, as an optional component but then, it became mandatory. Just like for the Database, if ActiveMQ isn’t installed or not reachable, Alfresco will never completely start. Of course, for the Database, you will see that quickly since there will be errors on the Alfresco startup logs but that’s not really the case for ActiveMQ because it is only a subsystem and Alfresco will just quietly wait for this “Messaging” subsystem to be available before continuing the startup process. In this blog, I will show a case I faced recently where ActiveMQ wouldn’t start for a non-obvious reason and therefore Alfresco was stuck in the startup process as well.
You can always try to disable the automatic startup of the “Messaging” subsystem (using “messaging.subsystem.autoStart=false“) and you might want to disable the Event one as well. However, it should only stop the automatic startup of the subsystem, that doesn’t mean that it won’t be started later, on-demand. Another alternative could be to use the in-memory broker queue (using something like “messaging.broker.url=vm://localhost?broker.persistent=false” and making sure you have the activemq broker jar loaded). These options are out of scope of this blog since I needed a real ActiveMQ installation, working.
So, in case the startup of Alfresco is blocked on the “Messaging” subsystem, the logs will contain something like:
alfresco@alf01:~$ cat /opt/tomcat/logs/catalina.out ... 2021-07-09 12:15:00,670 INFO [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Starting 'Subscriptions' subsystem, ID: [Subscriptions, default] 2021-07-09 12:15:00,692 INFO [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Startup of 'Subscriptions' subsystem, ID: [Subscriptions, default] complete 2021-07-09 12:15:00,713 INFO [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Starting 'Synchronization' subsystem, ID: [Synchronization, default] 2021-07-09 12:15:00,903 INFO [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Startup of 'Synchronization' subsystem, ID: [Synchronization, default] complete 2021-07-09 12:15:00,979 INFO [service.descriptor.DescriptorService] [localhost-startStop-1] Alfresco JVM - v11.0.10+9-Ubuntu-0ubuntu1.18.04; maximum heap size 4096.000MB 2021-07-09 12:15:01,019 INFO [service.descriptor.DescriptorService] [localhost-startStop-1] ================================================================ 2021-07-09 12:15:01,020 INFO [service.descriptor.DescriptorService] [localhost-startStop-1] Alfresco license: Mode ENTERPRISE, cluster:enabled granted to Trial User limited to 2 days expiring Sun Jul 11 00:00:00 CEST 2021 (2 days remaining). 2021-07-09 12:15:01,032 INFO [service.descriptor.DescriptorService] [localhost-startStop-1] Note: this is a limited trial of the Enterprise version of Alfresco Content Services that goes into read-only mode after 2 days. Request an extended 30-day trial at: https://www.alfresco.com/platform/content-services-ecm/trial/docker 2021-07-09 12:15:01,033 INFO [service.descriptor.DescriptorService] [localhost-startStop-1] ================================================================ 2021-07-09 12:15:01,034 INFO [service.descriptor.DescriptorService] [localhost-startStop-1] Server Mode :PRODUCTION 2021-07-09 12:15:01,035 INFO [service.descriptor.DescriptorService] [localhost-startStop-1] Alfresco Content Services started (Enterprise). Current version: 6.1.1 (3 r1dba1394-b23) schema 12,005. Originally installed version: 6.1.1 (3 r1dba1394-b23) schema 12,005. 2021-07-09 12:15:01,102 INFO [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Starting 'ActivitiesFeed' subsystem, ID: [ActivitiesFeed, default] 2021-07-09 12:15:01,892 INFO [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Startup of 'ActivitiesFeed' subsystem, ID: [ActivitiesFeed, default] complete 2021-07-09 12:15:02,612 INFO [repo.authorization.AuthorizationService] [localhost-startStop-1] Preauthorized user: admin 2021-07-09 12:15:02,620 INFO [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Starting 'Replication' subsystem, ID: [Replication, default] 2021-07-09 12:15:02,640 INFO [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Startup of 'Replication' subsystem, ID: [Replication, default] complete 2021-07-09 12:15:05,278 INFO [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Starting 'Messaging' subsystem, ID: [Messaging, default] 2021-07-09 12:15:44,013 WARN [repo.usage.RepoUsageMonitor] [DefaultScheduler_Worker-3] The Alfresco Content Services license will expire in 2 days. 09-Jul-2021 12:16:17.538 INFO [hz._hzInstance_1_slingshot.cached.thread-3] com.hazelcast.nio.SocketAcceptor.null [alf01.dbi-services.com]:5801 [slingshot] 5801 is accepting socket connection from /10.10.10.12:56601 09-Jul-2021 12:16:17.551 INFO [hz._hzInstance_1_slingshot.cached.thread-3] com.hazelcast.nio.ConnectionManager.null [alf01.dbi-services.com]:5801 [slingshot] 5801 accepted socket connection from /10.10.10.12:56601 09-Jul-2021 12:16:24.522 INFO [hz._hzInstance_1_slingshot.ServiceThread] com.hazelcast.cluster.ClusterManager.null [alf01.dbi-services.com]:5801 [slingshot] Members [2] { Member [alf01.dbi-services.com]:5801 this Member [alf02.dbi-services.com]:5801 } 2021-07-09 12:17:29,624 INFO [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-e941bb53-3778-456a-b330-ecfb98f54c2a.event-5] Member joined: 10.10.10.12:5701 (hostname: alf02.dbi-services.com) 2021-07-09 12:17:29,629 INFO [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-e941bb53-3778-456a-b330-ecfb98f54c2a.event-5] Current cluster members: 10.10.10.11:5701 (hostname: alf01.dbi-services.com) 10.10.10.12:5701 (hostname: alf02.dbi-services.com) alfresco@alf01:~$
As shown above, the last message during startup is “Starting ‘Messaging’ subsystem, ID: [Messaging, default]“. After that, nothing happens anymore (remaining messages are related to license status and cluster management). As mentioned previously, this is because ActiveMQ is either not running at all, currently starting or just unreachable. Looking at the ActiveMQ logs showed the below:
alfresco@alf01:~$ date Fri Jul 9 12:18:01 CEST 2021 alfresco@alf01:~$ alfresco@alf01:~$ cat /opt/activemq/logs/activemq.log 2021-07-09 12:14:33,968 | INFO | Refreshing org.apache.activemq.xbean.XBeanBrokerFactory$1@130161f7: startup date [Fri Jul 09 12:14:33 CEST 2021]; root of context hierarchy | org.apache.activemq.xbean.XBeanBrokerFactory$1 | main 2021-07-09 12:14:35,745 | INFO | Using Persistence Adapter: KahaDBPersistenceAdapter[/data/activemq/kahadb] | org.apache.activemq.broker.BrokerService | main alfresco@alf01:~$
ActiveMQ is apparently in the starting process but it is also stuck on the Persistence Adapter. Looking at the configuration and the current files present:
alfresco@alf01:~$ grep -A2 "<persistenceAdapter" /opt/activemq/conf/activemq.xml <persistenceAdapter> <kahaDB directory="/data/activemq/kahadb"/> </persistenceAdapter> alfresco@alf01:~$ alfresco@alf01:~$ ll /data/activemq/kahadb/ total 180 drwxr-x--- 1 alfresco alfresco 52 Jul 9 12:14 ./ drwxr-x--- 1 alfresco alfresco 12 Jul 9 12:14 ../ -rw-r----- 1 alfresco alfresco 8 Jul 9 12:14 lock alfresco@alf01:~$
So, the lock file seems present but there is no DB yet. This usually happens when the lock wasn’t set yet. This environment is an Alfresco Cluster, with therefore multiple nodes and multiple ActiveMQ as well. The simplest setup in this case is to use a shared storage for the kahaDB (using NAS in this case). To debug the issue, I tried to start the ActiveMQ but with a local path, which isn’t on a NAS. It started properly:
alfresco@alf01:~$ cat /opt/activemq/logs/activemq.log 2021-07-09 12:22:11,112 | INFO | Refreshing org.apache.activemq.xbean.XBeanBrokerFactory$1@130161f7: startup date [Fri Jul 09 12:22:11 CEST 2021]; root of context hierarchy | org.apache.activemq.xbean.XBeanBrokerFactory$1 | main 2021-07-09 12:22:14,604 | INFO | Using Persistence Adapter: KahaDBPersistenceAdapter[/opt/activemq/data/kahadb] | org.apache.activemq.broker.BrokerService | main 2021-07-09 12:22:14,839 | INFO | PListStore:[/opt/activemq/data/alfdms.b1/tmp_storage] started | org.apache.activemq.store.kahadb.plist.PListStoreImpl | main 2021-07-09 12:22:15,088 | INFO | Apache ActiveMQ 5.15.6 (alfdms.b1, ID:alf01-37899-1625826134875-0:1) is starting | org.apache.activemq.broker.BrokerService | main 2021-07-09 12:22:15,176 | INFO | Listening for connections at: tcp://alf01:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main 2021-07-09 12:22:15,181 | INFO | Connector openwire started | org.apache.activemq.broker.TransportConnector | main 2021-07-09 12:22:15,207 | INFO | Listening for connections at: amqp://alf01:5672?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main 2021-07-09 12:22:15,215 | INFO | Connector amqp started | org.apache.activemq.broker.TransportConnector | main 2021-07-09 12:22:15,235 | INFO | Listening for connections at: stomp://alf01:61613?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main 2021-07-09 12:22:15,244 | INFO | Connector stomp started | org.apache.activemq.broker.TransportConnector | main 2021-07-09 12:22:15,263 | INFO | Listening for connections at: mqtt://alf01:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main 2021-07-09 12:22:15,268 | INFO | Connector mqtt started | org.apache.activemq.broker.TransportConnector | main 2021-07-09 12:22:15,855 | WARN | [email protected]@4f449e8f{/,null,STARTING} has uncovered http methods for path: / | org.eclipse.jetty.security.SecurityHandler | main 2021-07-09 12:22:16,039 | INFO | Listening for connections at ws://alf01:61614?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.ws.WSTransportServer | main 2021-07-09 12:22:16,041 | INFO | Connector ws started | org.apache.activemq.broker.TransportConnector | main 2021-07-09 12:22:16,043 | INFO | Apache ActiveMQ 5.15.6 (alfdms.b1, ID:alf01-37899-1625826134875-0:1) started | org.apache.activemq.broker.BrokerService | main 2021-07-09 12:22:16,045 | INFO | For help or more information please see: http://activemq.apache.org | org.apache.activemq.broker.BrokerService | main 2021-07-09 12:22:16,052 | WARN | Store limit is 10240 mb (current store usage is 0 mb). The data directory: /opt/activemq/data/kahadb only has 8960 mb of usable space. - resetting to maximum available disk space: 8960 mb | org.apache.activemq.broker.BrokerService | main 2021-07-09 12:22:17,585 | INFO | No Spring WebApplicationInitializer types detected on classpath | /admin | main 2021-07-09 12:22:18,425 | INFO | ActiveMQ WebConsole available at http://0.0.0.0:8161/ | org.apache.activemq.web.WebConsoleStarter | main 2021-07-09 12:22:18,426 | INFO | ActiveMQ Jolokia REST API available at http://0.0.0.0:8161/api/jolokia/ | org.apache.activemq.web.WebConsoleStarter | main 2021-07-09 12:22:18,526 | INFO | Initializing Spring FrameworkServlet 'dispatcher' | /admin | main 2021-07-09 12:22:18,802 | INFO | No Spring WebApplicationInitializer types detected on classpath | /api | main 2021-07-09 12:22:18,992 | INFO | jolokia-agent: Using policy access restrictor classpath:/jolokia-access.xml | /api | main alfresco@alf01:~$
As soon as ActiveMQ was running, Alfresco could continue the startup process as well:
2021-07-09 12:17:29,624 INFO [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-e941bb53-3778-456a-b330-ecfb98f54c2a.event-5] Member joined: 10.10.10.12:5701 (hostname: alf02.dbi-services.com) 2021-07-09 12:17:29,629 INFO [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-e941bb53-3778-456a-b330-ecfb98f54c2a.event-5] Current cluster members: 10.10.10.11:5701 (hostname: alf01.dbi-services.com) 10.10.10.12:5701 (hostname: alf02.dbi-services.com) 2021-07-09 12:22:25,435 INFO [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Startup of 'Messaging' subsystem, ID: [Messaging, default] complete 2021-07-09 12:22:36,007 INFO [extensions.webscripts.DeclarativeRegistry] [asynchronouslyRefreshedCacheThreadPool1] Registered 500 Web Scripts (+0 failed), 656 URLs 2021-07-09 12:22:36,020 INFO [extensions.webscripts.DeclarativeRegistry] [asynchronouslyRefreshedCacheThreadPool1] Registered 1 Package Description Documents (+0 failed) 2021-07-09 12:22:36,021 INFO [extensions.webscripts.DeclarativeRegistry] [asynchronouslyRefreshedCacheThreadPool1] Registered 0 Schema Description Documents (+0 failed) 2021-07-09 12:22:36,106 INFO [extensions.webscripts.DeclarativeRegistry] [asynchronouslyRefreshedCacheThreadPool1] Registered 0 Web Scripts (+0 failed), 0 URLs ...
So, this issue is linked to the NAS. Since I already faced this kind of issue in the past years, I tried looking at the ports used by the NAS for the nlockmgr (Lock Manager):
alfresco@alf01:~$ rpcinfo -p 10.10.1.101 | grep nlockmgr 100021 1 udp 52352 nlockmgr 100021 3 udp 52352 nlockmgr 100021 4 udp 52352 nlockmgr 100021 1 tcp 46364 nlockmgr 100021 3 tcp 46364 nlockmgr 100021 4 tcp 46364 nlockmgr alfresco@alf01:~$
I asked a colleague to look at the traces on the network layer, to see if something was being blocked/dropped and he could confirm me that the issue was simply that the port tcp/46364 was closed on the FireWall between the Alfresco/ActiveMQ VM and the NAS. Opening the port allowed ActiveMQ to request a lock of the file and therefore to start properly while using the NAS (port opened ~6min after the startup):
alfresco@alf01:~$ cat /opt/activemq/logs/activemq.log 2021-07-09 13:51:07,231 | INFO | Refreshing org.apache.activemq.xbean.XBeanBrokerFactory$1@130161f7: startup date [Fri Jul 09 13:51:07 CEST 2021]; root of context hierarchy | org.apache.activemq.xbean.XBeanBrokerFactory$1 | main 2021-07-09 13:51:10,349 | INFO | Using Persistence Adapter: KahaDBPersistenceAdapter[/data/activemq/kahadb] | org.apache.activemq.broker.BrokerService | main 2021-07-09 13:57:44,643 | INFO | PListStore:[/opt/activemq/data/alfdms.b1/tmp_storage] started | org.apache.activemq.store.kahadb.plist.PListStoreImpl | main 2021-07-09 13:57:44,794 | INFO | Apache ActiveMQ 5.15.6 (alfdms.b1, ID:alf01-40847-1625831864661-0:1) is starting | org.apache.activemq.broker.BrokerService | main 2021-07-09 13:57:44,819 | INFO | Listening for connections at: tcp://alf01:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main 2021-07-09 13:57:44,821 | INFO | Connector openwire started | org.apache.activemq.broker.TransportConnector | main 2021-07-09 13:57:44,830 | INFO | Listening for connections at: amqp://alf01:5672?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main 2021-07-09 13:57:44,834 | INFO | Connector amqp started | org.apache.activemq.broker.TransportConnector | main 2021-07-09 13:57:44,843 | INFO | Listening for connections at: stomp://alf01:61613?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main 2021-07-09 13:57:44,846 | INFO | Connector stomp started | org.apache.activemq.broker.TransportConnector | main 2021-07-09 13:57:44,857 | INFO | Listening for connections at: mqtt://alf01:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main 2021-07-09 13:57:44,859 | INFO | Connector mqtt started | org.apache.activemq.broker.TransportConnector | main 2021-07-09 13:57:45,171 | WARN | [email protected]@4f449e8f{/,null,STARTING} has uncovered http methods for path: / | org.eclipse.jetty.security.SecurityHandler | main 2021-07-09 13:57:45,273 | INFO | Listening for connections at ws://alf01:61614?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.ws.WSTransportServer | main 2021-07-09 13:57:45,274 | INFO | Connector ws started | org.apache.activemq.broker.TransportConnector | main 2021-07-09 13:57:45,283 | INFO | Apache ActiveMQ 5.15.6 (alfdms.b1, ID:alf01-40847-1625831864661-0:1) started | org.apache.activemq.broker.BrokerService | main 2021-07-09 13:57:45,285 | INFO | For help or more information please see: http://activemq.apache.org | org.apache.activemq.broker.BrokerService | main 2021-07-09 13:57:47,106 | INFO | No Spring WebApplicationInitializer types detected on classpath | /admin | main 2021-07-09 13:57:48,247 | INFO | ActiveMQ WebConsole available at http://0.0.0.0:8161/ | org.apache.activemq.web.WebConsoleStarter | main 2021-07-09 13:57:48,247 | INFO | ActiveMQ Jolokia REST API available at http://0.0.0.0:8161/api/jolokia/ | org.apache.activemq.web.WebConsoleStarter | main 2021-07-09 13:57:48,297 | INFO | Initializing Spring FrameworkServlet 'dispatcher' | /admin | main 2021-07-09 13:57:48,484 | INFO | No Spring WebApplicationInitializer types detected on classpath | /api | main 2021-07-09 13:57:48,640 | INFO | jolokia-agent: Using policy access restrictor classpath:/jolokia-access.xml | /api | main alfresco@alf01:~$ alfresco@alf01:~$ ll /data/activemq/kahadb/ total 180 drwxr-x--- 1 alfresco alfresco 52 Jul 9 13:57 ./ drwxr-x--- 1 alfresco alfresco 12 Jul 9 13:57 ../ -rw-r----- 1 alfresco alfresco 3554432 Jul 9 14:03 db-1.log -rw-r----- 1 alfresco alfresco 53248 Jul 9 14:03 db.data -rw-r----- 1 alfresco alfresco 53344 Jul 9 14:03 db.redo -rw-r----- 1 alfresco alfresco 8 Jul 9 13:57 lock alfresco@alf01:~$
There are other reasons why ActiveMQ might not start properly but usually, what’s happening will be written inside the logs. This one is one of the few cases I faced where it’s not that obvious…