Updated: Apr 27, 2021
Finding a configuration that reliably builds a working vIDM 8.x cluster was a huge problem for me.
So I thought of sharing what worked while I was building a cluster in my lab.
So, let's start.
Creating CA Signed Certificates
The first step is to keep our vIDM CA-signed certificate ready. To do this, generate a CSR (Certificate Signing Request) from vRLCM using the Generate CSR option available under Locker --- Certificate --- Generate CSR
Then fill in all the details requested in the Generate CSR form.
Remember the Common Name should always be your Load Balancer FQDN
Server Domain/Hostname should contain all the hostnames which would be part of the vIDM cluster
All IP addresses corresponding to the vIDM nodes involved in this cluster must be documented
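The three rules above are easy to get wrong when typing into the form, so here is a minimal sketch that checks the planned values before you click GENERATE. The function name, the node hostnames and the IP addresses are made up for illustration; only premidm.prem.com (the LB FQDN used in this lab) comes from this post.

```python
def check_csr_inputs(common_name, san_hostnames, san_ips,
                     lb_fqdn, node_fqdns, node_ips):
    """Return a list of problems in the planned CSR values (empty if none).

    This is a hypothetical helper, not part of vRLCM. It only encodes the
    three rules from the text: CN is the LB FQDN, every node hostname is
    listed, and every node IP is listed.
    """
    problems = []
    if common_name.lower() != lb_fqdn.lower():
        problems.append(
            f"Common Name {common_name!r} is not the load balancer FQDN {lb_fqdn!r}"
        )
    listed_hosts = {h.lower() for h in san_hostnames}
    for fqdn in node_fqdns:
        if fqdn.lower() not in listed_hosts:
            problems.append(f"hostname {fqdn!r} missing from Server Domain/Hostname")
    for ip in node_ips:
        if ip not in san_ips:
            problems.append(f"IP {ip!r} missing from IP addresses")
    return problems


if __name__ == "__main__":
    # Node names and IPs below are placeholders, not values from the lab.
    nodes = ["vidm1.prem.com", "vidm2.prem.com", "vidm3.prem.com"]
    ips = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
    for line in check_csr_inputs("premidm.prem.com", nodes, ips,
                                 "premidm.prem.com", nodes, ips):
        print(line)
```

An empty result means the values satisfy the three rules; it says nothing about whether the CA will accept the request.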
Once we click on GENERATE after filling in all the details as shown above, a PEM file is downloaded.
This PEM file must be given to your certificate authority to get this signed and generate a CA-signed certificate which can be used to deploy our vIDM cluster
Your authority would give you a Key file and a Secure Certificate
Once we have these files, we use them to import the certificate into vRLCM
To import the certificate into vRLCM, go to Locker, click on Certificate and then on Import certificate
We have two files at hand: the first one is vidm.key and the second one is the premidm certificate.
Open vidm.key using notepad, then copy and paste its content into the Private Key section
Open the premidm certificate using notepad, then copy and paste its content into the Certificate Chain section
Then click on IMPORT to get this certificate imported into Locker
With the CA-signed certificate for our vIDM cluster imported into vRLCM, we will now import the same certificate into the NSX-V load balancer
Upload vIDM certificate chain and the corresponding root CA certificates onto NSX-V Edge
Browse to NSX-V Edge, then under the configure tab, click on Certificates and then Add a new Certificate
We need to enter the certificate details the same way we did before and that would import the certificate onto the edge
Once the above step is done, the Server Certificate is imported into the Edge
The Root certificate has to be added in the same way, after exporting it out of the Server certificate
Paste this content into a separate file and save it as root.cer, then on the NSX Edge, go to Configure and click on Add new CA certificate
That's it, you will have both the ROOT and the SERVER certificates in place
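If your CA delivered the whole chain in one PEM file, pulling the root out by hand is error prone. Below is a minimal sketch that splits such a file into its individual certificate blocks. It assumes the chain is ordered with the server certificate first and the root last, which is common but not guaranteed, so check the result before uploading it. The file name premidm.pem is a placeholder.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_chain(chain_text):
    """Return every certificate block found in chain_text, in file order."""
    return [match.group(0) + "\n" for match in PEM_CERT.finditer(chain_text)]


if __name__ == "__main__":
    # Placeholder file name; assumes server certificate first, root last.
    with open("premidm.pem", encoding="ascii") as handle:
        certs = split_pem_chain(handle.read())
    with open("root.cer", "w", encoding="ascii") as handle:
        handle.write(certs[-1])
    print(f"found {len(certs)} certificate(s), wrote the last one to root.cer")
```

The script only slices text; it does not verify that the last block really is a self-signed root.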
Configuring NSX-V Loadbalancer to support a clustered vIDM
Here's the configuration that the Application Profile needs in order to support the vIDM LB
Application Profile Type: HTTPS End to End
Cookie Name: JSESSIONID
Mode: App Session
Expires in: 3600
Insert X-Forwarded-For-HTTP header: Enabled
Under Client SSL tab
Client Authentication: Ignore
Under Server SSL, Server Authentication must be enabled
Under Service Monitoring
Max Retries: 3
Ensure there are no mistakes while typing or copying URL information
IP Filter: IPv4
Monitor Port: 443
Virtual Server: Enable
Port/Port Range: 443
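To keep track of the settings listed above, it can help to write them down once and compare them against what is actually configured. The sketch below does just that with a plain dictionary. The key names are my own labels, not NSX-V API field names, and the "actual" values have to be filled in by hand from the Edge UI.

```python
# Settings from this section, keyed by made-up labels (not NSX-V API fields).
EXPECTED = {
    "application_profile_type": "HTTPS End to End",
    "cookie_name": "JSESSIONID",
    "mode": "App Session",
    "expires_in": 3600,
    "insert_x_forwarded_for": True,
    "client_authentication": "Ignore",
    "server_authentication": True,
    "monitor_max_retries": 3,
    "ip_filter": "IPv4",
    "monitor_port": 443,
    "virtual_server_enabled": True,
    "virtual_server_port": 443,
}


def diff_settings(actual, expected=EXPECTED):
    """Return {setting: (expected, actual)} for every value that differs."""
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if actual.get(key) != want
    }


if __name__ == "__main__":
    # Example: everything matches except the cookie expiry.
    configured = dict(EXPECTED, expires_in=60)
    for key, (want, got) in diff_settings(configured).items():
        print(f"{key}: expected {want!r}, found {got!r}")
```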
Creating a Request in vRLCM for a Clustered vIDM deployment
This is not a complicated task, so I won't discuss it much. One important aspect is the DELEGATE IP: ensure this IP is not resolvable
This will be used during the PGPOOL configuration of your vIDM Cluster.
Before we submit the request we need to ensure all pre-validation is successful.
If the certificate steps above are not done, your vIDM deployment will fail at Stage 3 as shown below
com.vmware.vrealize.lcm.common.exception.EngineException: vIDM install prevalidation failed
    at com.vmware.vrealize.lcm.vidm.core.task.VidmInstallPrecheckTask.execute(VidmInstallPrecheckTask.java:62)
    at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:45)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Attached a vIDM PreCheck report below for reference.
Once we have all the prechecks successful, then we can submit the request.
Post-Submission Stages during Cluster Deployment
There are 20 stages in a vIDM clustered deployment. They are:
Stage 1: validateenvironmentdata
As the name itself suggests it validates the environment data submitted
Stage 2: infraprevalidation
Infrastructure details provided will be validated, the same stuff which was performed during prechecks
Stage 3: vidmprevalidation
vIDM validations will be performed, the same as the ones in prechecks
Stage 4: deployvidm
Deploys the vIDM OVAs on vCenter
Stage 5: vidmconfiguremaster,vidmprepareslave,vidmprepareslave
During this stage, the Master (the first node in your vIDM cluster) is configured, and the Slaves (the second and third nodes) are prepared
Note: Stage 5 has a phase called VidmFQDNUpdate. If a failure is observed in this phase, as shown below in the screenshot, your primary vIDM appliance is trying to reach your vIDM LB and expects a valid response, which it is not getting.
In /opt/vmware/horizon/workspace/configurator.log the following entries will be seen
2020-09-03T12:37:37,599 INFO (Thread-146) [;;;] com.vmware.horizon.svadmin.service.ApplicationSetupService - Invalid status code validating FQDN: https://premidm.prem.com : 503
2020-09-03T13:06:20,493 INFO (Thread-3) [;;;] com.vmware.horizon.svadmin.service.ApplicationSetupService - Invalid status code validating FQDN: https://premidm.prem.com : 503
2020-09-03T13:14:39,678 INFO (Thread-189) [;;;] com.vmware.horizon.svadmin.service.ApplicationSetupService - Invalid status code validating FQDN: https://premidm.prem.com : 503
2020-09-03T13:14:49,407 INFO (Thread-189) [;;;] com.vmware.horizon.svadmin.service.ApplicationSetupService - Invalid status code validating FQDN: https://premidm.prem.com : 503
2020-09-03T13:17:01,765 INFO (Thread-189) [;;;] com.vmware.horizon.svadmin.service.ApplicationSetupService - Invalid status code validating FQDN: https://premidm.prem.com : 503
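Since configurator.log is busy, it helps to filter out just these FQDN validation entries. The following is a minimal sketch with a regular expression written against the log lines quoted above; the pattern is my own guess at the line layout and may need adjusting for other vIDM versions.

```python
import re

# Pattern derived from the log lines quoted in this post; it is not an
# official format description.
FQDN_FAILURE = re.compile(
    r"^(?P<ts>\S+)\s+\w+\s+\([^)]*\).*?"
    r"Invalid status code validating FQDN: (?P<url>\S+) : (?P<status>\d{3})"
)


def fqdn_failures(lines):
    """Return (timestamp, url, status) for every FQDN validation failure."""
    hits = []
    for line in lines:
        match = FQDN_FAILURE.search(line)
        if match:
            hits.append((match["ts"], match["url"], int(match["status"])))
    return hits


if __name__ == "__main__":
    path = "/opt/vmware/horizon/workspace/configurator.log"
    with open(path, encoding="utf-8", errors="replace") as handle:
        for ts, url, status in fqdn_failures(handle):
            print(ts, url, status)
```

A steady stream of 503 entries for the LB URL points at the load balancer rather than at vIDM itself.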
At this moment you need to check if your Primary vIDM is responsive
I'll check if https://<<primaryvidmfqdn>>/ is responsive and the vIDM landing page opens up
If this works, I'll then check whether https://<<vidmlbhostname>>/ responds or redirects me to an available node and gives me the vIDM landing page. If this also works, you will not see the above failure in Stage 5. If it does not, you have to fix your Load Balancer configuration before you can proceed. Until then, no matter how many retries you perform, you will only see that one line repeated in configurator.log and nothing else.
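The same two checks can be scripted instead of done in a browser. Here is a minimal sketch using only the Python standard library. The function names are my own, the hostnames are placeholders (only premidm.prem.com appears in this post), and TLS verification is switched off in the example purely on the assumption that the lab machine may not trust the CA yet.

```python
import ssl
import urllib.error
import urllib.request


def http_status(url, timeout=10, verify_tls=True):
    """Return the final HTTP status for url, or None if nothing answered."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Lab-only shortcut: skip certificate checks.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        # urlopen follows redirects, so this is the status of the final page.
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 from the load balancer
    except (urllib.error.URLError, OSError):
        return None


def diagnose(primary_status, lb_status):
    """Turn the two status codes into the conclusion described in the text."""
    if primary_status != 200:
        return "primary node is not serving the landing page"
    if lb_status != 200:
        return "primary is up but the load balancer is not passing traffic"
    return "both respond"


if __name__ == "__main__":
    primary = http_status("https://vidm1.prem.com/", verify_tls=False)  # placeholder
    lb = http_status("https://premidm.prem.com/", verify_tls=False)
    print(primary, lb, diagnose(primary, lb))
```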
At this stage, the LB pool status should show as up for your primary vIDM appliance
Stage 6: vidmconfigureslave
Stage 7: vidmclusterverify
Stage 8: vidmclusterverify
Stage 9: vidmenableconnector
Stage 11: vidmpreparemasterpgpool
Stage 12: vidmconfigurepgpool
You might encounter an exception in the VidmAddSSHPostgresKeys task during this stage. Your vIDM appliances are rebooted at this point, and if they do not come back up in time, LCM fails to execute the scripts that complete the deployment. All we need to do is confirm that SSH to the nodes is working and then perform a Retry
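Rather than guessing when the appliances are back, you can poll them before hitting Retry. The sketch below only checks that TCP port 22 accepts connections, which is a weaker test than an actual SSH login, and the hostnames in the example are placeholders.

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ssh(hosts, attempts=30, delay=10, port=22):
    """Poll until every host accepts connections; return hosts still down."""
    pending = list(hosts)
    for _ in range(attempts):
        pending = [host for host in pending if not ssh_port_open(host, port)]
        if not pending:
            return []
        time.sleep(delay)
    return pending


if __name__ == "__main__":
    # Placeholder node names; replace with your three vIDM appliances.
    still_down = wait_for_ssh(["vidm1.prem.com", "vidm2.prem.com", "vidm3.prem.com"])
    print("ready for Retry" if not still_down else f"still down: {still_down}")
```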
Stage 13: vidmstartmasterpgpoolservices
Stage 14: vidmstartslavepgpoolservices
Stage 15: vidminitialconfigprep
Stage 16: savevmoidtoinventory
Stage 17: environmentupdate
Stage 18: notificationschedules
Stage 19: setauthprovider
Stage 20: vidmClusterHealthScheduler