Infrastructure at your Service

In a previous blog, I talked about the possible usage of K8s Services in place of the default headless/pod name and the issues that it brings. This one can be seen as a continuation, since it is also related to the usage of K8s Services to install Documentum, but this time with another issue that is specific to an RCS/CFS. This issue & its solution might be interesting for you even if you aren’t using K8s.

As mentioned in that previous blog, installing a Primary CS using K8s Services is possible, but it might cause some trouble with a few repository objects. To push the testing further, without fixing the issues on the first CS, we tried to install an RCS/CFS (a second CS for High Availability) with exactly the same parameters. As a reminder, this is what was used:

  • Primary Content Server:
    • headless/pod: documentum-server-0.documentum-server.dbi-ns01.svc.cluster.local
    • K8s Service: cs01.dbi-ns01.svc.cluster.local
  • Remote Content Server:
    • headless/pod: documentum-server-1.documentum-server.dbi-ns01.svc.cluster.local
    • K8s Service: cs02.dbi-ns01.svc.cluster.local
  • Repository & Service: gr_repo
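The two kinds of names in this list follow the standard Kubernetes DNS conventions. Assuming, as the -0/-1 suffixes suggest, that the pods belong to a StatefulSet governed by the headless Service "documentum-server", each pod gets `<pod>.<headless-service>.<namespace>.svc.<cluster-domain>`, while a regular Service gets `<service>.<namespace>.svc.<cluster-domain>`. A small sketch that builds both, assuming the default `cluster.local` cluster domain; the function names and the CLUSTER_DOMAIN variable are hypothetical:

```shell
# Hypothetical helpers: build the DNS names Kubernetes assigns.
# The cluster domain defaults to "cluster.local" unless CLUSTER_DOMAIN is set.

# StatefulSet pod behind a headless Service:
#   <pod>.<headless-service>.<namespace>.svc.<cluster-domain>
pod_fqdn() {
  local pod="$1" headless_svc="$2" ns="$3"
  printf '%s.%s.%s.svc.%s\n' "$pod" "$headless_svc" "$ns" "${CLUSTER_DOMAIN:-cluster.local}"
}

# Regular Service:
#   <service>.<namespace>.svc.<cluster-domain>
service_fqdn() {
  local svc="$1" ns="$2"
  printf '%s.%s.svc.%s\n' "$svc" "$ns" "${CLUSTER_DOMAIN:-cluster.local}"
}
```

For example, `service_fqdn cs02 dbi-ns01` prints `cs02.dbi-ns01.svc.cluster.local`, the value used for SERVER.FQDN on the Remote CS.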

Therefore, the Repository silent properties file contained the following on this second CS:

[[email protected] ~]$ grep -E "FQDN|HOST" RCS_Docbase_Global.properties
SERVER.FQDN=cs02.dbi-ns01.svc.cluster.local
SERVER.REPOSITORY_HOSTNAME=cs01.dbi-ns01.svc.cluster.local
SERVER.PRIMARY_CONNECTION_BROKER_HOST=cs01.dbi-ns01.svc.cluster.local
SERVER.PROJECTED_CONNECTION_BROKER_HOST=cs02.dbi-ns01.svc.cluster.local
SERVER.PROJECTED_DOCBROKER_HOST_OTHER=cs01.dbi-ns01.svc.cluster.local
[[email protected] ~]$


I started the silent installation of the Repository and, after a few seconds, the installer exited, which obviously meant that something went wrong. Checking the installation logs:

[[email protected] ~]$ cd $DM_HOME/install/logs
[[email protected] logs]$ cat install.log
13:42:26,225  INFO [main] com.documentum.install.shared.installanywhere.actions.InitializeSharedLibrary - The product name is: CfsConfigurator
13:42:26,225  INFO [main] com.documentum.install.shared.installanywhere.actions.InitializeSharedLibrary - The product version is: 16.4.0000.0248
13:42:26,225  INFO [main]  -
13:42:26,308  INFO [main] com.documentum.install.shared.installanywhere.actions.InitializeSharedLibrary - Done InitializeSharedLibrary ...
13:42:26,332  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCfsInitializeImportantServerVariables - The installer is gathering system configuration information.
13:42:26,349  INFO [main] com.documentum.install.server.installanywhere.actions.DiWASilentRemoteServerValidation - Start to verify the password
13:42:29,357  INFO [main] com.documentum.install.server.installanywhere.actions.DiWASilentRemoteServerValidation - FQDN is invalid
13:42:29,359 ERROR [main] com.documentum.install.server.installanywhere.actions.DiWASilentRemoteServerValidation - Fail to reach the computer with the FQDN "cs02.dbi-ns01.svc.cluster.local". Check the value you specified. Click Yes to ignore this error, or click No to re-enter the FQDN.
com.documentum.install.shared.common.error.DiException: Fail to reach the computer with the FQDN "cs02.dbi-ns01.svc.cluster.local". Check the value you specified. Click Yes to ignore this error, or click No to re-enter the FQDN.
        at com.documentum.install.server.installanywhere.actions.DiWASilentRemoteServerValidation.setup(DiWASilentRemoteServerValidation.java:64)
        at com.documentum.install.shared.installanywhere.actions.InstallWizardAction.install(InstallWizardAction.java:73)
        at com.zerog.ia.installer.actions.CustomAction.installSelf(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.an(Unknown Source)
        at com.zerog.ia.installer.ConsoleBasedAAMgr.ac(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.am(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.runNextInstallPiece(Unknown Source)
        at com.zerog.ia.installer.ConsoleBasedAAMgr.ac(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.am(Unknown Source)
        ...
        at com.zerog.ia.installer.AAMgrBase.runNextInstallPiece(Unknown Source)
        at com.zerog.ia.installer.ConsoleBasedAAMgr.ac(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.am(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.runNextInstallPiece(Unknown Source)
        at com.zerog.ia.installer.ConsoleBasedAAMgr.ac(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.runPreInstall(Unknown Source)
        at com.zerog.ia.installer.LifeCycleManager.consoleInstallMain(Unknown Source)
        at com.zerog.ia.installer.LifeCycleManager.executeApplication(Unknown Source)
        at com.zerog.ia.installer.Main.main(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.zerog.lax.LAX.launch(Unknown Source)
        at com.zerog.lax.LAX.main(Unknown Source)
[[email protected] logs]$


On the Primary CS, the installation using the K8s Service went smoothly, without any error, but on the Remote CS with the exact same setup, it failed with the message: ‘Fail to reach the computer with the FQDN “cs02.dbi-ns01.svc.cluster.local”. Check the value you specified. Click Yes to ignore this error, or click No to re-enter the FQDN.’. So the installer binaries behave differently depending on whether it’s a PCS or an RCS/CFS. Another funny thing is the part of the message that says ‘Click Yes to ignore this error, or click No to re-enter the FQDN’… That’s obviously a GUI message being printed to the logs but, fortunately, the silent installer doesn’t just sit there waiting for an input that will never come.
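Since the silent installer simply exits and leaves the reason in install.log, an automated deployment could look for this specific message to tell the FQDN validation failure apart from other errors. A minimal sketch; the function name is hypothetical and it only relies on the exact ERROR text shown in the log above:

```shell
# Hypothetical helper: print the FQDN rejected by the RCS/CFS installer's
# FQDN validation, based on the ERROR line written to install.log.
# Returns 0 (and prints the FQDN) if the message is found, 1 otherwise.
fqdn_validation_failure() {
  local log_file="$1" fqdn
  # Keep only the ERROR line(s) and extract the quoted FQDN from the message
  fqdn=$(grep ' ERROR .*Fail to reach the computer with the FQDN' "$log_file" \
    | sed -n 's/.*with the FQDN "\([^"]*\)".*/\1/p' | head -n 1)
  [ -n "$fqdn" ] || return 1
  printf '%s\n' "$fqdn"
}
```

Run against the failing log above, `fqdn_validation_failure "$DM_HOME/install/logs/install.log"` would print `cs02.dbi-ns01.svc.cluster.local`.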

I assumed that this had something to do with the K8s Services and some kind of network/hostname validation that the RCS/CFS installer performs (and that isn’t done on the Primary). Therefore, I tried a few things: checking the name resolution with nslookup, pinging both Services and validating that the docbrokers were responding:

[[email protected] logs]$ nslookup cs01.dbi-ns01.svc.cluster.local
Server: 1.1.1.10
Address: 1.1.1.10#53

Name: cs01.dbi-ns01.svc.cluster.local
Address: 1.1.1.100
[[email protected] logs]$
[[email protected] logs]$ ping cs01.dbi-ns01.svc.cluster.local
PING cs01.dbi-ns01.svc.cluster.local (1.1.1.100) 56(84) bytes of data.
^C
--- cs01.dbi-ns01.svc.cluster.local ping statistics ---
12 packets transmitted, 0 received, 100% packet loss, time 10999ms
[[email protected] logs]$
[[email protected] logs]$ dmqdocbroker -t cs01.dbi-ns01.svc.cluster.local -p 1489 -c ping
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0110.0058
Using specified port: 1489
Successful reply from docbroker at host (documentum-server-0) on port(1490) running software version (16.4.0110.0167  Linux64).
[[email protected] logs]$
[[email protected] logs]$
[[email protected] logs]$
[[email protected] logs]$ nslookup cs02.dbi-ns01.svc.cluster.local
Server: 1.1.1.10
Address: 1.1.1.10#53

Name: cs02.dbi-ns01.svc.cluster.local
Address: 1.1.1.200
[[email protected] logs]$
[[email protected] logs]$ ping cs02.dbi-ns01.svc.cluster.local
PING cs02.dbi-ns01.svc.cluster.local (1.1.1.200) 56(84) bytes of data.
^C
--- cs02.dbi-ns01.svc.cluster.local ping statistics ---
12 packets transmitted, 0 received, 100% packet loss, time 10999ms
[[email protected] logs]$
[[email protected] logs]$ dmqdocbroker -t cs02.dbi-ns01.svc.cluster.local -p 1489 -c ping
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0110.0058
Using specified port: 1489
Successful reply from docbroker at host (documentum-server-1) on port(1490) running software version (16.4.0110.0167  Linux64).
[[email protected] logs]$
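The three checks above (name resolution, ICMP ping and a connection to the docbroker port) can be grouped into one function to compare any FQDN before giving it to the installer. This is only a sketch: it uses `getent hosts` instead of nslookup, because getent goes through NSS and therefore honours /etc/hosts like most applications do, while nslookup queries DNS directly. It also uses a plain TCP connect through Bash's `/dev/tcp` instead of dmqdocbroker, so it only proves that the port accepts connections, not that a docbroker answers. The function name and output format are hypothetical:

```shell
# Hypothetical helper reproducing the manual checks for one host/port:
# NSS resolution, ICMP echo and a TCP connection.
# Prints one "check=result" line per test.
check_fqdn() {
  local host="$1" port="$2"

  # Resolution through NSS, so /etc/hosts entries are honoured (unlike nslookup)
  if getent hosts "$host" >/dev/null; then echo "dns=ok"; else echo "dns=fail"; fi

  # One ICMP echo with a 2 second timeout; a K8s ClusterIP usually does not answer
  if ! command -v ping >/dev/null 2>&1; then
    echo "icmp=unavailable"
  elif ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
    echo "icmp=ok"
  else
    echo "icmp=fail"
  fi

  # Plain TCP connect using Bash's /dev/tcp pseudo-device, bounded by timeout
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "tcp=ok"
  else
    echo "tcp=fail"
  fi
}
```

On the Remote CS, `check_fqdn cs02.dbi-ns01.svc.cluster.local 1489` would be expected to report dns=ok, icmp=fail and tcp=ok, matching the transcript above.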


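As you can see above, the results are the same for the Primary CS and the Remote one. The only thing not responding is the ping, but that’s expected for a K8s Service: with the default kube-proxy setup, its ClusterIP is a virtual IP that only forwards the ports declared in the Service, so ICMP echo requests typically get no answer. At this point, I assumed that the RCS/CFS installer was doing something like a ping, which failed and therefore caused the error in the log and the installer to stop. To validate that, I simply added an entry to the /etc/hosts file (as root, obviously), mapping the K8s Service name of the Remote CS to the same IP as the pod’s own entry: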

[[email protected] ~]$ cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1       localhost ip6-localhost ip6-loopback
fe00::0   ip6-localnet
fe00::0   ip6-mcastprefix
fe00::1   ip6-allnodes
fe00::2   ip6-allrouters
1.1.1.200  documentum-server-1.documentum-server.dbi-ns01.svc.cluster.local  documentum-server-1
[[email protected] ~]$
[[email protected] ~]$ echo '1.1.1.200  cs02.dbi-ns01.svc.cluster.local' >> /etc/hosts
[[email protected] ~]$
[[email protected] ~]$ cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1       localhost ip6-localhost ip6-loopback
fe00::0   ip6-localnet
fe00::0   ip6-mcastprefix
fe00::1   ip6-allnodes
fe00::2   ip6-allrouters
1.1.1.200  documentum-server-1.documentum-server.dbi-ns01.svc.cluster.local  documentum-server-1
1.1.1.200  cs02.dbi-ns01.svc.cluster.local
[[email protected] ~]$
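The same workaround can be scripted so that it is idempotent, which is handy if the installation runs from a container entrypoint. A sketch, assuming the Service name should point to the pod’s own IP and that `hostname -i` returns that single IP on this pod; the function name is hypothetical and it takes the hosts file as a parameter so that it can be tried on a copy first. Keep in mind that /etc/hosts is managed by Kubernetes here, so a manual entry like this one is lost when the pod is recreated:

```shell
# Hypothetical helper: append "<ip>  <name>" to a hosts file unless the
# name is already listed there. Writing to /etc/hosts itself requires root.
add_hosts_entry() {
  local hosts_file="$1" ip="$2" name="$3"
  # Look for the name as a whole field (ignoring comment lines) so that a
  # longer name that merely contains it does not count as a match.
  if awk -v n="$name" '/^[ \t]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$hosts_file"; then
    return 0
  fi
  printf '%s  %s\n' "$ip" "$name" >> "$hosts_file"
}
```

As root on the Remote CS, `add_hosts_entry /etc/hosts "$(hostname -i)" cs02.dbi-ns01.svc.cluster.local` would add the same line as the echo above, and calling it a second time changes nothing.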


After doing that, I started the RCS/CFS silent installer again (exact same command, no change to the properties file) and this time, it completed the installation without issue:

[[email protected] ~]$ cd $DM_HOME/install/logs
[[email protected] logs]$ cat install.log
14:01:33,199 INFO [main] com.documentum.install.shared.installanywhere.actions.InitializeSharedLibrary - The product name is: CfsConfigurator
14:01:33,199 INFO [main] com.documentum.install.shared.installanywhere.actions.InitializeSharedLibrary - The product version is: 16.4.0000.0248
14:01:33,199 INFO [main] -
14:01:33,247 INFO [main] com.documentum.install.shared.installanywhere.actions.InitializeSharedLibrary - Done InitializeSharedLibrary ...
14:01:33,278 INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCfsInitializeImportantServerVariables - The installer is gathering system configuration information.
14:01:33,296 INFO [main] com.documentum.install.server.installanywhere.actions.DiWASilentRemoteServerValidation - Start to verify the password
14:01:33,906 INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/089972.tmp/dfc.keystore
14:01:34,394 INFO [main] com.documentum.fc.client.security.internal.CreateIdentityCredential$MultiFormatPKIKeyPair - generated RSA (2,048-bit strength) mutiformat key pair in 468 ms
14:01:34,428 INFO [main] com.documentum.fc.client.security.internal.CreateIdentityCredential - certificate created for DFC <CN=dfc_MlM5tLi5T9u1r82AdbulKv14vr8a,O=EMC,OU=Documentum> valid from Tue Sep 10 13:56:33 UTC 2019 to Fri Sep 07 14:01:33 UTC 2029:
14:01:34,429 INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/089972.tmp/dfc.keystore
14:01:34,446 INFO [main] com.documentum.fc.client.security.impl.InitializeKeystoreForDfc - [DFC_SECURITY_IDENTITY_INITIALIZED] Initialized new identity in keystore, DFC alias=dfc, identity=dfc_MlM5tLi5T9u1r82AdbulKv14vr8a
14:01:34,448 INFO [main] com.documentum.fc.client.security.impl.AuthenticationMgrForDfc - identity for authentication is dfc_MlM5tLi5T9u1r82AdbulKv14vr8a
14:01:34,449 INFO [main] com.documentum.fc.impl.RuntimeContext - DFC Version is 16.4.0110.0058
14:01:34,472 INFO [Timer-3] com.documentum.fc.client.impl.bof.cache.ClassCacheManager$CacheCleanupTask - [DFC_BOF_RUNNING_CLEANUP] Running class cache cleanup task
...
[[email protected] logs]$


Since this obviously looks like a bug, I opened an SR with OpenText Support (#4252205). The outcome of this ticket is that the RCS/CFS installer indeed performs a different validation than the PCS installer, which is why the issue only affects the RCS/CFS. At the moment, there is no way to skip this validation with the silent installer (contrary to the GUI, which lets you ‘click Yes’). Therefore, OpenText decided to add a new parameter, starting with CS 16.4 P20 (end of December 2019), to control whether the FQDN validation should be done or skipped. This new parameter will be “SERVER.VALIDATE_FQDN” and it will be a Boolean value. The default value will be “true”, so by default the FQDN validation will still be done. To skip it starting with P20, just set the value to false and the RCS/CFS installer should be able to complete successfully. To be tested once the patch is out!
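Once on a patch level that supports it, the parameter should just be one more line in the same silent properties file. A sketch that sets it idempotently, assuming the parameter name and value announced by OpenText (SERVER.VALIDATE_FQDN=false) and CS 16.4 P20 or later; it is untested against the actual installer, the function name is hypothetical, and it relies on GNU sed for `sed -i`:

```shell
# Hypothetical helper: set KEY=VALUE in a Java-style .properties file,
# replacing an existing "KEY=" line or appending one if it is missing.
# Values containing "|" or "&" are not supported by the sed expression.
set_property() {
  local file="$1" key="$2" value="$3" key_re
  # Escape the dots of the key so that grep/sed treat them literally
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^${key_re}=" "$file"; then
    # Replace the existing value in place (GNU sed -i)
    sed -i "s|^${key_re}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, `set_property RCS_Docbase_Global.properties SERVER.VALIDATE_FQDN false` before starting the silent installer.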



Morgan Patou

Senior Consultant & Technology Leader ECM