Arun Nukula

NSX data collection unavailable

In preparation for using NSX network, security, and load balancing capabilities in vRealize Automation, we first have to create an NSX endpoint.


I was asked to look into a problem where, even after an endpoint was created successfully and its association mapped, selecting data collection under Compute Resource did not show the Network and Security Inventory.


Looking at the logs after the NSX endpoint was created, we do see a data collection workitem being created: VCNSInventory.


Reference: ManagerService / All.log

[UTC:2019-09-03 10:29:48 Local:2019-09-03 15:59:48] [Debug]: [sub-thread-Id="45"  context=""  token=""] DC: Created data collection item, WorkflowInstanceId 183022, Task VCNSInventory, EntityID 8ed67519-99fb-4afa-811f-227e753a24eb, StatusID = 457b3af7-b739-45b2-ab9f-0cdd79596af0


Taking instance 183022 as an example and inspecting the worker logs:


The worker initialises the instance:


2019-09-03T10:29:49.962Z DC-DEM02 vcac: [component="iaas:DynamicOps.DEM.exe" priority="Trace" thread="4268"] [sub-thread-Id="27"  context=""  token=""] Worker Controller: initializing instance 183022 - vSphereVCNSInventory of the workflow execution unit

2019-09-03T10:29:52.009Z DC-DEM02 vcac: [component="iaas:DynamicOps.DEM.exe" priority="Trace" thread="4268"] [sub-thread-Id="27"  context=""  token=""] WorkflowExecutionUnit: initialize started: 183022


2019-09-03T10:30:14.401Z DC-DEM02 vcac: [component="iaas:DynamicOps.DEM.exe" priority="Trace" thread="4864"] [sub-thread-Id="28"  context=""  token=""] Workflow ID: 183022 Activity <Mark Data Collection Complete>: State: Closed


2019-09-03T10:30:14.417Z DC-DEM02 vcac: [component="iaas:DynamicOps.DEM.exe" priority="Debug" thread="4864"] [sub-thread-Id="28"  context=""  token=""] Workflow Complete183022 - Successful

2019-09-03T10:30:14.417Z DC-DEM02 vcac: [component="iaas:DynamicOps.DEM.exe" priority="Trace" thread="4864"] [sub-thread-Id="28"  context=""  token=""] Worker Controller: WriteCompletedWorkflow


As shown above, the workflow did go through data collection and was marked successful, but the inventory never showed up in the UI.


At this point we performed a Test Connection on the endpoint and clicked OK; although the test connection was successful, vRA was unable to save the endpoint.


That's when I suspected something was wrong with the endpoints table.


The assumption turned into confirmation after reviewing the API data captured in a HAR file.


Now that we knew there was definitely something wrong with the endpoints, we ran the query select * from ManagementEndpoints and found stale entries for all the vSphere endpoints.


Ideally there should be only one entry per vSphere endpoint in this table, but here we had two per vSphere endpoint.
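The duplicates described above can be spotted with a grouping query. This is a minimal sketch: ManagementEndpointID is confirmed by the walkthrough, but the ManagementEndpointName column is an assumption — adjust it to whatever the name column is called in your schema.

```sql
-- Sketch: find endpoints with more than one row in the IaaS database.
-- ManagementEndpointName is an assumed column name.
SELECT ManagementEndpointName, COUNT(*) AS entry_count
FROM dbo.ManagementEndpoints
GROUP BY ManagementEndpointName
HAVING COUNT(*) > 1;
```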


How do we now identify which entry is the correct one and which ManagementEndpointId should be deleted?


For this you have to grep vSphereAgent.log (the Proxy Agent log) for managementEndpointId. The managementEndpointId you find in the log is the correct one, and that entry must remain under ManagementEndpointID in the dbo.ManagementEndpoints table.


Example:

2019-09-09T03:54:23.466Z DC-AGENT01 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="900"] [sub-thread-Id="6"  context=""  token=""] Ping Sent Successfully : [<?xml version="1.0" encoding="utf-16"?><pingReport agentName="vCenter" agentVersion="7.3.0.0" agentLocation="PRDVC" WorkitemsProcessed="9254"><Endpoint externalReferenceId="cbeebd33-245a-4b18-a8a8-d337e8c46627" productName="VMware vCenter Server" version="6.5.0" licenseName="VMware vCenter Server 6 Standard" /><ManagementEndpoint Name="vCenter" /><Nodes><Node name="SINGAPORE" type="Cluster" identity="prodvc/IDBI DC/host/SINGAPORE" datacenterExternalReferenceId="datacenter-21" externalReferenceId="domain-c26" isCluster="True" managementEndpointId="e5b052e1-0792-465a-a2a8-6b8b031f48ac" /><Node name="DC_PRODUCTION_RHEL_CLUSTER" type="Cluster" identity="prodvc/SGP/host/SINGAPORE" datacenterExternalReferenceId="datacenter-21" externalReferenceId="domain-c1310" isCluster="True" managementEndpointId="e5b052e1-0792-465a-a2a8-6b8b031f48ac" /></Nodes><AgentTypes><AgentType name="Hypervisor" /><AgentType name="vSphereHypervisor" /></AgentTypes></pingReport>]



Now that we knew which entries were correct by cross-checking vSphereAgent.log against the ManagementEndpoints table, we had to remove the stale entries from that table.


We took a backup of the SQL IaaS database along with snapshots, then executed delete statements on the entries we had identified as stale:


delete from dbo.ManagementEndpoints where ManagementEndpointID = 'E15DFAAE-229E-4874-AACB-793BDB6076F4';

delete from dbo.ManagementEndpoints where ManagementEndpointID = '03CACB31-23DD-444C-A493-8DDC8BC4E4CF';


But this did not solve our problem. After removing the stale entries, saving the endpoints threw a different exception this time.


When you create an endpoint in vRA, it not only creates an entry in IaaS but also creates an entry in vRA's Postgres database.


We explored a table called epconf_endpoint. This table holds an entry for every endpoint created through the vRA UI, and its id in the Postgres database must match the ManagementEndpointId in the SQL (IaaS) database.
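To compare the two sides by hand, you can list the ids from each database and match them up; a minimal sketch, assuming the tables described in this post:

```sql
-- On the vRA appliance Postgres database (run via psql):
SELECT id, name FROM epconf_endpoint ORDER BY name;

-- On the IaaS SQL Server database; each id above must appear here.
SELECT ManagementEndpointID FROM dbo.ManagementEndpoints;
```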


Remember, these were the ids we deleted from SQL; this discrepancy between IaaS and Postgres is the reason for the error "Endpoint with id [xxxxxx] is not found in iaas".


Updating the ids for the appropriate endpoints here in Postgres would resolve this data mismatch. But there is a catch.


There is already an NSX endpoint created — which we know, as that is what we were troubleshooting to make work.

Along with the NSX endpoint there is an association, and this association information is stored in the epconf_association table.


This association table contains:

id : the id of the association

from_endpoint_id : your NSXEndpointId from the IaaS database, and the Id from epconf_endpoint in your Postgres database

to_endpoint_id : the mapping you created to one of the vSphere endpoints.
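The association can be inspected with a join; this is a sketch that assumes the column names listed above and that from_endpoint_id / to_endpoint_id reference epconf_endpoint.id:

```sql
-- Sketch: resolve the association to endpoint names to verify the mapping.
SELECT a.id,
       f.name AS nsx_endpoint,
       t.name AS mapped_vsphere_endpoint
FROM epconf_association a
JOIN epconf_endpoint f ON f.id = a.from_endpoint_id
JOIN epconf_endpoint t ON t.id = a.to_endpoint_id;
```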


Note: NSX endpoint information is stored in the table [DynamicOps.VCNSModel].[VCNSEndpoints] of the IaaS database.


This is where we found the answer to our problem:

  • The to_endpoint_id inside epconf_association was pointing to a wrong id

  • Both ids under epconf_endpoint had to be modified to the ones present in IaaS


Remediation Plan

As a first step, we deleted the NSX endpoint from the vRA UI. This removed its entry from epconf_association, so there was no need to update that table anymore.


After removing the NSX endpoint from the UI, we moved on to epconf_endpoint to update the ids with the correct ones taken from the IaaS database.


Updating the vCenter endpoint:

update epconf_endpoint set id = 'e5b052e1-0792-465a-a2a8-6b8b031f48ac' where name = 'vCenter'


Updating the vCenter01 endpoint:

update epconf_endpoint set id = '5646fa1e-6a2b-4d08-9381-219fe6d92a5e' where name = 'vCenter01'


After we corrected the ids inside epconf_endpoint (Postgres) to match ManagementEndpointID (IaaS database), we were able to save the endpoints successfully.
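As a sanity check after the updates, the corrected ids can be read back from Postgres and compared with the IaaS table; a sketch, assuming the endpoint names used above:

```sql
-- Postgres: confirm the corrected ids.
SELECT id, name FROM epconf_endpoint WHERE name IN ('vCenter', 'vCenter01');

-- IaaS SQL Server: these ManagementEndpointID values should match the ids above.
SELECT ManagementEndpointID FROM dbo.ManagementEndpoints;
```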



After this, creating the NSX endpoint and mapping it to the correct vSphere endpoint resulted in a successful NSX data collection.



!! Hope this helps !!


Copyright © 2019 nukescloud