Infrastructure at your Service

As already mentioned in a previous blog, ActiveMQ has been used in the scope of Alfresco since quite some time. Initially, as an optional component but then, it became mandatory. Just like for the Database, if ActiveMQ isn’t installed or not reachable, Alfresco will never completely start. Of course, for the Database, you will see that quickly since there will be errors on the Alfresco startup logs but that’s not really the case for ActiveMQ because it is only a subsystem and Alfresco will just quietly wait for this “Messaging” subsystem to be available before continuing the startup process. In this blog, I will show a case I faced recently where ActiveMQ wouldn’t start for a non-obvious reason and therefore Alfresco was stuck in the startup process as well.

 

You can always try to disable the automatic startup of the “Messaging” subsystem (using “messaging.subsystem.autoStart=false“) and you might want to disable the Event one as well. However, it should only stop the automatic startup of the subsystem, that doesn’t mean that it won’t be started later, on-demand. Another alternative could be to use the in-memory broker queue (using something like “messaging.broker.url=vm://localhost?broker.persistent=false” and making sure you have the activemq broker jar loaded). These options are out of scope of this blog since I needed a real ActiveMQ installation, working.

 

So, in case the startup of Alfresco is blocked on the “Messaging” subsystem, the logs will contain something like:

[email protected]:~$ cat /opt/tomcat/logs/catalina.out
...
2021-07-09 12:15:00,670  INFO  [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Starting 'Subscriptions' subsystem, ID: [Subscriptions, default]
2021-07-09 12:15:00,692  INFO  [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Startup of 'Subscriptions' subsystem, ID: [Subscriptions, default] complete
2021-07-09 12:15:00,713  INFO  [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Starting 'Synchronization' subsystem, ID: [Synchronization, default]
2021-07-09 12:15:00,903  INFO  [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Startup of 'Synchronization' subsystem, ID: [Synchronization, default] complete
2021-07-09 12:15:00,979  INFO  [service.descriptor.DescriptorService] [localhost-startStop-1] Alfresco JVM - v11.0.10+9-Ubuntu-0ubuntu1.18.04; maximum heap size 4096.000MB
2021-07-09 12:15:01,019  INFO  [service.descriptor.DescriptorService] [localhost-startStop-1] ================================================================
2021-07-09 12:15:01,020  INFO  [service.descriptor.DescriptorService] [localhost-startStop-1] Alfresco license: Mode ENTERPRISE, cluster:enabled granted to Trial User limited to 2 days expiring Sun Jul 11 00:00:00 CEST 2021 (2 days remaining).
2021-07-09 12:15:01,032  INFO  [service.descriptor.DescriptorService] [localhost-startStop-1] Note: this is a limited trial of the Enterprise version of Alfresco Content Services that goes into read-only mode after 2 days. Request an extended 30-day trial at: https://www.alfresco.com/platform/content-services-ecm/trial/docker
2021-07-09 12:15:01,033  INFO  [service.descriptor.DescriptorService] [localhost-startStop-1] ================================================================
2021-07-09 12:15:01,034  INFO  [service.descriptor.DescriptorService] [localhost-startStop-1] Server Mode :PRODUCTION
2021-07-09 12:15:01,035  INFO  [service.descriptor.DescriptorService] [localhost-startStop-1] Alfresco Content Services started (Enterprise). Current version: 6.1.1 (3 r1dba1394-b23) schema 12,005. Originally installed version: 6.1.1 (3 r1dba1394-b23) schema 12,005.
2021-07-09 12:15:01,102  INFO  [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Starting 'ActivitiesFeed' subsystem, ID: [ActivitiesFeed, default]
2021-07-09 12:15:01,892  INFO  [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Startup of 'ActivitiesFeed' subsystem, ID: [ActivitiesFeed, default] complete
2021-07-09 12:15:02,612  INFO  [repo.authorization.AuthorizationService] [localhost-startStop-1] Preauthorized user: admin
2021-07-09 12:15:02,620  INFO  [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Starting 'Replication' subsystem, ID: [Replication, default]
2021-07-09 12:15:02,640  INFO  [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Startup of 'Replication' subsystem, ID: [Replication, default] complete
2021-07-09 12:15:05,278  INFO  [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Starting 'Messaging' subsystem, ID: [Messaging, default]
2021-07-09 12:15:44,013  WARN  [repo.usage.RepoUsageMonitor] [DefaultScheduler_Worker-3] The Alfresco Content Services license will expire in 2 days.
09-Jul-2021 12:16:17.538 INFO [hz._hzInstance_1_slingshot.cached.thread-3] com.hazelcast.nio.SocketAcceptor.null [alf01.dbi-services.com]:5801 [slingshot] 5801 is accepting socket connection from /10.10.10.12:56601
09-Jul-2021 12:16:17.551 INFO [hz._hzInstance_1_slingshot.cached.thread-3] com.hazelcast.nio.ConnectionManager.null [alf01.dbi-services.com]:5801 [slingshot] 5801 accepted socket connection from /10.10.10.12:56601
09-Jul-2021 12:16:24.522 INFO [hz._hzInstance_1_slingshot.ServiceThread] com.hazelcast.cluster.ClusterManager.null [alf01.dbi-services.com]:5801 [slingshot]

Members [2] {
 Member [alf01.dbi-services.com]:5801 this
 Member [alf02.dbi-services.com]:5801
}

2021-07-09 12:17:29,624  INFO  [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-e941bb53-3778-456a-b330-ecfb98f54c2a.event-5] Member joined: 10.10.10.12:5701 (hostname: alf02.dbi-services.com)
2021-07-09 12:17:29,629  INFO  [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-e941bb53-3778-456a-b330-ecfb98f54c2a.event-5] Current cluster members:
 10.10.10.11:5701 (hostname: alf01.dbi-services.com)
 10.10.10.12:5701 (hostname: alf02.dbi-services.com)
[email protected]:~$

 

As shown above, the last message during startup is “Starting ‘Messaging’ subsystem, ID: [Messaging, default]“. After that, nothing happens anymore (remaining messages are related to license status and cluster management). As mentioned previously, this is because ActiveMQ is either not running at all, currently starting or just unreachable. Looking at the ActiveMQ logs showed the below:

[email protected]:~$ date
Fri Jul  9 12:18:01 CEST 2021
[email protected]:~$
[email protected]:~$ cat /opt/activemq/logs/activemq.log
2021-07-09 12:14:33,968 | INFO  | Refreshing [email protected]: startup date [Fri Jul 09 12:14:33 CEST 2021]; root of context hierarchy | org.apache.activemq.xbean.XBeanBrokerFactory$1 | main
2021-07-09 12:14:35,745 | INFO  | Using Persistence Adapter: KahaDBPersistenceAdapter[/data/activemq/kahadb] | org.apache.activemq.broker.BrokerService | main
[email protected]:~$

 

ActiveMQ is apparently in the starting process but it is also stuck on the Persistence Adapter. Looking at the configuration and the current files present:

[email protected]:~$ grep -A2 "<persistenceAdapter" /opt/activemq/conf/activemq.xml
        <persistenceAdapter>
            <kahaDB directory="/data/activemq/kahadb"/>
        </persistenceAdapter>
[email protected]:~$
[email protected]:~$ ll /data/activemq/kahadb/
total 180
drwxr-x--- 1 alfresco alfresco       52 Jul  9 12:14 ./
drwxr-x--- 1 alfresco alfresco       12 Jul  9 12:14 ../
-rw-r----- 1 alfresco alfresco        8 Jul  9 12:14 lock
[email protected]:~$

 

So, the lock file seems present but there is no DB yet. This usually happens when the lock wasn’t set yet. This environment is an Alfresco Cluster, with therefore multiple nodes and multiple ActiveMQ as well. The simplest setup in this case is to use a shared storage for the kahaDB (using NAS in this case). To debug the issue, I tried to start the ActiveMQ but with a local path, which isn’t on a NAS. It started properly:

[email protected]:~$ cat /opt/activemq/logs/activemq.log
2021-07-09 12:22:11,112 | INFO  | Refreshing [email protected]: startup date [Fri Jul 09 12:22:11 CEST 2021]; root of context hierarchy | org.apache.activemq.xbean.XBeanBrokerFactory$1 | main
2021-07-09 12:22:14,604 | INFO  | Using Persistence Adapter: KahaDBPersistenceAdapter[/opt/activemq/data/kahadb] | org.apache.activemq.broker.BrokerService | main
2021-07-09 12:22:14,839 | INFO  | PListStore:[/opt/activemq/data/alfdms.b1/tmp_storage] started | org.apache.activemq.store.kahadb.plist.PListStoreImpl | main
2021-07-09 12:22:15,088 | INFO  | Apache ActiveMQ 5.15.6 (alfdms.b1, ID:alf01-37899-1625826134875-0:1) is starting | org.apache.activemq.broker.BrokerService | main
2021-07-09 12:22:15,176 | INFO  | Listening for connections at: tcp://alf01:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2021-07-09 12:22:15,181 | INFO  | Connector openwire started | org.apache.activemq.broker.TransportConnector | main
2021-07-09 12:22:15,207 | INFO  | Listening for connections at: amqp://alf01:5672?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2021-07-09 12:22:15,215 | INFO  | Connector amqp started | org.apache.activemq.broker.TransportConnector | main
2021-07-09 12:22:15,235 | INFO  | Listening for connections at: stomp://alf01:61613?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2021-07-09 12:22:15,244 | INFO  | Connector stomp started | org.apache.activemq.broker.TransportConnector | main
2021-07-09 12:22:15,263 | INFO  | Listening for connections at: mqtt://alf01:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2021-07-09 12:22:15,268 | INFO  | Connector mqtt started | org.apache.activemq.broker.TransportConnector | main
2021-07-09 12:22:15,855 | WARN  | [email protected]@4f449e8f{/,null,STARTING} has uncovered http methods for path: / | org.eclipse.jetty.security.SecurityHandler | main
2021-07-09 12:22:16,039 | INFO  | Listening for connections at ws://alf01:61614?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.ws.WSTransportServer | main
2021-07-09 12:22:16,041 | INFO  | Connector ws started | org.apache.activemq.broker.TransportConnector | main
2021-07-09 12:22:16,043 | INFO  | Apache ActiveMQ 5.15.6 (alfdms.b1, ID:alf01-37899-1625826134875-0:1) started | org.apache.activemq.broker.BrokerService | main
2021-07-09 12:22:16,045 | INFO  | For help or more information please see: http://activemq.apache.org | org.apache.activemq.broker.BrokerService | main
2021-07-09 12:22:16,052 | WARN  | Store limit is 10240 mb (current store usage is 0 mb). The data directory: /opt/activemq/data/kahadb only has 8960 mb of usable space. - resetting to maximum available disk space: 8960 mb | org.apache.activemq.broker.BrokerService | main
2021-07-09 12:22:17,585 | INFO  | No Spring WebApplicationInitializer types detected on classpath | /admin | main
2021-07-09 12:22:18,425 | INFO  | ActiveMQ WebConsole available at http://0.0.0.0:8161/ | org.apache.activemq.web.WebConsoleStarter | main
2021-07-09 12:22:18,426 | INFO  | ActiveMQ Jolokia REST API available at http://0.0.0.0:8161/api/jolokia/ | org.apache.activemq.web.WebConsoleStarter | main
2021-07-09 12:22:18,526 | INFO  | Initializing Spring FrameworkServlet 'dispatcher' | /admin | main
2021-07-09 12:22:18,802 | INFO  | No Spring WebApplicationInitializer types detected on classpath | /api | main
2021-07-09 12:22:18,992 | INFO  | jolokia-agent: Using policy access restrictor classpath:/jolokia-access.xml | /api | main
[email protected]:~$

 

As soon as ActiveMQ was running, Alfresco could continue the startup process as well:

2021-07-09 12:17:29,624  INFO  [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-e941bb53-3778-456a-b330-ecfb98f54c2a.event-5] Member joined: 10.10.10.12:5701 (hostname: alf02.dbi-services.com)
2021-07-09 12:17:29,629  INFO  [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-e941bb53-3778-456a-b330-ecfb98f54c2a.event-5] Current cluster members:
 10.10.10.11:5701 (hostname: alf01.dbi-services.com)
 10.10.10.12:5701 (hostname: alf02.dbi-services.com)

2021-07-09 12:22:25,435  INFO  [management.subsystems.ChildApplicationContextFactory] [localhost-startStop-1] Startup of 'Messaging' subsystem, ID: [Messaging, default] complete
2021-07-09 12:22:36,007  INFO  [extensions.webscripts.DeclarativeRegistry] [asynchronouslyRefreshedCacheThreadPool1] Registered 500 Web Scripts (+0 failed), 656 URLs
2021-07-09 12:22:36,020  INFO  [extensions.webscripts.DeclarativeRegistry] [asynchronouslyRefreshedCacheThreadPool1] Registered 1 Package Description Documents (+0 failed)
2021-07-09 12:22:36,021  INFO  [extensions.webscripts.DeclarativeRegistry] [asynchronouslyRefreshedCacheThreadPool1] Registered 0 Schema Description Documents (+0 failed)
2021-07-09 12:22:36,106  INFO  [extensions.webscripts.DeclarativeRegistry] [asynchronouslyRefreshedCacheThreadPool1] Registered 0 Web Scripts (+0 failed), 0 URLs
...

 

So, this issue is linked to the NAS. Since I already faced this kind of issue in the past years, I tried looking at the ports used by the NAS for the nlockmgr (Lock Manager):

[email protected]:~$ rpcinfo -p 10.10.1.101 | grep nlockmgr
    100021    1   udp  52352  nlockmgr
    100021    3   udp  52352  nlockmgr
    100021    4   udp  52352  nlockmgr
    100021    1   tcp  46364  nlockmgr
    100021    3   tcp  46364  nlockmgr
    100021    4   tcp  46364  nlockmgr
[email protected]:~$

 

I asked a colleague to look at the traces on the network layer, to see if something was being blocked/dropped and he could confirm me that the issue was simply that the port tcp/46364 was closed on the FireWall between the Alfresco/ActiveMQ VM and the NAS. Opening the port allowed ActiveMQ to request a lock of the file and therefore to start properly while using the NAS (port opened ~6min after the startup):

[email protected]:~$ cat /opt/activemq/logs/activemq.log
2021-07-09 13:51:07,231 | INFO  | Refreshing [email protected]: startup date [Fri Jul 09 13:51:07 CEST 2021]; root of context hierarchy | org.apache.activemq.xbean.XBeanBrokerFactory$1 | main
2021-07-09 13:51:10,349 | INFO  | Using Persistence Adapter: KahaDBPersistenceAdapter[/data/activemq/kahadb] | org.apache.activemq.broker.BrokerService | main
2021-07-09 13:57:44,643 | INFO  | PListStore:[/opt/activemq/data/alfdms.b1/tmp_storage] started | org.apache.activemq.store.kahadb.plist.PListStoreImpl | main
2021-07-09 13:57:44,794 | INFO  | Apache ActiveMQ 5.15.6 (alfdms.b1, ID:alf01-40847-1625831864661-0:1) is starting | org.apache.activemq.broker.BrokerService | main
2021-07-09 13:57:44,819 | INFO  | Listening for connections at: tcp://alf01:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2021-07-09 13:57:44,821 | INFO  | Connector openwire started | org.apache.activemq.broker.TransportConnector | main
2021-07-09 13:57:44,830 | INFO  | Listening for connections at: amqp://alf01:5672?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2021-07-09 13:57:44,834 | INFO  | Connector amqp started | org.apache.activemq.broker.TransportConnector | main
2021-07-09 13:57:44,843 | INFO  | Listening for connections at: stomp://alf01:61613?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2021-07-09 13:57:44,846 | INFO  | Connector stomp started | org.apache.activemq.broker.TransportConnector | main
2021-07-09 13:57:44,857 | INFO  | Listening for connections at: mqtt://alf01:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2021-07-09 13:57:44,859 | INFO  | Connector mqtt started | org.apache.activemq.broker.TransportConnector | main
2021-07-09 13:57:45,171 | WARN  | [email protected]@4f449e8f{/,null,STARTING} has uncovered http methods for path: / | org.eclipse.jetty.security.SecurityHandler | main
2021-07-09 13:57:45,273 | INFO  | Listening for connections at ws://alf01:61614?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.ws.WSTransportServer | main
2021-07-09 13:57:45,274 | INFO  | Connector ws started | org.apache.activemq.broker.TransportConnector | main
2021-07-09 13:57:45,283 | INFO  | Apache ActiveMQ 5.15.6 (alfdms.b1, ID:alf01-40847-1625831864661-0:1) started | org.apache.activemq.broker.BrokerService | main
2021-07-09 13:57:45,285 | INFO  | For help or more information please see: http://activemq.apache.org | org.apache.activemq.broker.BrokerService | main
2021-07-09 13:57:47,106 | INFO  | No Spring WebApplicationInitializer types detected on classpath | /admin | main
2021-07-09 13:57:48,247 | INFO  | ActiveMQ WebConsole available at http://0.0.0.0:8161/ | org.apache.activemq.web.WebConsoleStarter | main
2021-07-09 13:57:48,247 | INFO  | ActiveMQ Jolokia REST API available at http://0.0.0.0:8161/api/jolokia/ | org.apache.activemq.web.WebConsoleStarter | main
2021-07-09 13:57:48,297 | INFO  | Initializing Spring FrameworkServlet 'dispatcher' | /admin | main
2021-07-09 13:57:48,484 | INFO  | No Spring WebApplicationInitializer types detected on classpath | /api | main
2021-07-09 13:57:48,640 | INFO  | jolokia-agent: Using policy access restrictor classpath:/jolokia-access.xml | /api | main
[email protected]:~$
[email protected]:~$ ll /data/activemq/kahadb/
total 180
drwxr-x--- 1 alfresco alfresco       52 Jul  9 13:57 ./
drwxr-x--- 1 alfresco alfresco       12 Jul  9 13:57 ../
-rw-r----- 1 alfresco alfresco  3554432 Jul  9 14:03 db-1.log
-rw-r----- 1 alfresco alfresco    53248 Jul  9 14:03 db.data
-rw-r----- 1 alfresco alfresco    53344 Jul  9 14:03 db.redo
-rw-r----- 1 alfresco alfresco        8 Jul  9 13:57 lock
[email protected]:~$

 

There are other reasons why ActiveMQ might not start properly but usually, what’s happening will be written inside the logs. This one is one of the few cases I faced where it’s not that obvious…

 

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Morgan Patou
Morgan Patou

Technology Leader ECM & Senior Consultant