• Arun Nukula

vRA fails to deploy from vRSLCM if Secondary or Tertiary DNS servers are unable to resolve hostnames

I attempted to install vRA 8.x quite a number of times, but was never successful because of one simple problem. Let me explain what it was.


Every time I ran the installation, it failed at the point where it installs client-secrets:

Release "client-secrets" does not exist. Installing it now.
Error: Job failed: BackoffLimitExceeded
helm failed to upgrade 'client-secrets' in namespace 'prelude'

Note: The above snippet is taken from deploy.log.


When we check csp-fixture-job-XXXX.log under /services-logs/csp-clients-fixture, we see that curl could not resolve the hostname:



Logging in
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: premvra.prem.com
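When the log is long, it can help to pull out just the hostnames that curl failed to resolve. The following is a minimal sketch, not part of any vRA tooling: the function name `unresolved_hosts` is made up for illustration, and the pattern is based only on the curl message shown above.

```python
import re

# curl's "exit code 6" message, as it appears in the log excerpt above.
CURL_UNRESOLVED = re.compile(r"curl: \(6\) Could not resolve host: (\S+)")

def unresolved_hosts(log_text):
    """Return the distinct hostnames curl could not resolve, in order of first appearance."""
    seen = []
    for match in CURL_UNRESOLVED.finditer(log_text):
        host = match.group(1)
        if host not in seen:
            seen.append(host)
    return seen

# Sample line copied from the log excerpt above (hypothetical usage).
sample = (
    "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--"
    "     0curl: (6) Could not resolve host: premvra.prem.com\n"
)
print(unresolved_hosts(sample))  # prints ['premvra.prem.com']
```

In practice you would read the contents of the csp-fixture-job log into a string and pass it to the function.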




But before starting the install, we cross-checked that nslookup against my DNS server was working absolutely fine, so why this problem?


premvra which is our vRA node

premidm which is our vIDM node

premlcm which is our vRLCM node


When you trigger the Easy Installer, it asks for network information, as you can see in the screenshot below.



The first DNS server in my case is my Windows Active Directory server, which has forward and reverse lookup zones configured and contains the DNS records for premlcm, premidm and premvra as well as the rest of the VMware environment.


The second DNS server, 10.yy.yy.yy, is our router, which also functions as a DNS server for all other systems outside my lab environment. This router cannot resolve anything within the DNS zone hosted on the MS DNS server, but it is reachable from all systems.


While the vRA installation is in progress, at the stage where client-secrets is being installed, certain POST calls are made in the background for a few registrations.


From my research, it looks like a round-robin load-balancing mechanism is used when multiple DNS servers are configured. In my case, the servers (premlcm, premvra and premidm) can only be resolved through my primary DNS.


If a POST call happens to use the secondary DNS for name resolution, it fails and throws the exception below:

2020-04-28 10:03:41.430+0000 ERROR  43 --- [or-http-epoll-1]    c.v.i.common.util.HealthUtilComponent : premidm.prem.com: Name or service not known
java.net.UnknownHostException: premidm.prem.com: Name or service not known
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_241]
        Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
        |_ checkpoint ⇢ Request to POST https://premidm.prem.com/SAAS/API/1.0/oauth2/token?grant_type=client_credentials [DefaultWebClient]
Stack trace:
                at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_241]
                at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) ~[na:1.8.0_241]
                at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) ~[na:1.8.0_241]
                at java.net.InetAddress.getAllByName0(InetAddress.java:1277) ~[na:1.8.0_241]
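To make the failure mode concrete, here is a toy simulation of what happens if lookups really are rotated across the configured DNS servers. It is only a sketch under that assumption: it does not query any real resolver, and the record sets and the `simulate` function are invented for illustration.

```python
from itertools import cycle

# Hypothetical record sets: only the primary (Active Directory) DNS knows the lab hosts.
PRIMARY = {"premvra.prem.com", "premidm.prem.com", "premlcm.prem.com"}
SECONDARY = set()  # the router: reachable, but holds no records for the lab zone

def simulate(lookups, servers):
    """Send each lookup to the next server in turn; return the (host, server) pairs that fail."""
    rotation = cycle(servers)
    failures = []
    for host in lookups:
        name, records = next(rotation)
        if host not in records:
            failures.append((host, name))
    return failures

lookups = ["premidm.prem.com"] * 4

# Two servers configured: every second lookup lands on the router and fails.
print(simulate(lookups, [("primary", PRIMARY), ("secondary", SECONDARY)]))

# Only the primary configured: nothing fails.
print(simulate(lookups, [("primary", PRIMARY)]))
```

With both servers in rotation, half of the lookups in this model fail, which is consistent with an installation that breaks partway through even though nslookup against the primary works.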


After scrapping the existing deployment, I started the installation again with only one DNS server, which has entries for all the nodes and can resolve them, and finally the installation was successful.


This scenario might occur in a lab where not all DNS servers are configured for name resolution, or even in production environments where DNS replication has issues.


After numerous attempts, it was so heartening to see this screen where it says "INSTALLED".



Every DNS server specified during installation should be able to resolve all three nodes; otherwise the installation will fail.
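A plain nslookup only proves that whichever server answered knows the name, so it is worth asking each DNS server individually before starting the installer. The sketch below does that with a hand-built DNS query over UDP using only the Python standard library. It is an illustration rather than a supported tool: the function names are made up, it only checks A records, and it ignores details such as TCP fallback and matching the response ID.

```python
import socket
import struct

def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def answered(response):
    """True if the response has RCODE 0 (no error) and at least one answer record."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return (flags & 0x000F) == 0 and ancount > 0

def resolves(hostname, server, timeout=3.0):
    """Ask one specific DNS server for hostname; False on error, timeout or no answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(hostname), (server, 53))
            response, _ = sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return False
    return len(response) >= 12 and answered(response)

def preflight(hostnames, servers):
    """Return the (server, hostname) pairs that failed to resolve."""
    return [(s, h) for s in servers for h in hostnames if not resolves(h, s)]
```

Calling `preflight(["premlcm.prem.com", "premidm.prem.com", "premvra.prem.com"], servers)`, where `servers` is the list of DNS server IPs you plan to enter in the installer, should return an empty list. Any pair it returns points at a server that would need the missing records, or that should be left out of the installer's DNS list.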





Copyright © 2019 nukescloud