identity-service-app initialization failure after upgrading to vRealize Automation 8.4.2
Updated: Apr 24, 2022
Click here for detailed analysis
vRealize Automation 8.4.2 was release few days back and it had some fixes which we were looking forward to w.r.t custom resources and dynamic types.
I was testing this behavior in my lab and came across an exception which led to upgrade failure. Others might bump into this as well. So let me explain what it was...
My environment was built on vRA 8.2 , where a migration was performed , specific to Custom Resources and then we did upgrade to 8.4.1. There is a known issue in version vRA 8.4.1 with Customer Resources where we get 500 OK exception when we try and execute Day-2 actions.
After 8.4.1 upgrade , we then wanted to 8.4.2
Upgrade goes through but the exception is seen when identity-service-app is being initialized
2021-06-26T07:31:56.205Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] com.zaxxer.hikari.HikariDataSource.getConnection:110 - HikariPool-1 - Starting...
2021-06-26T07:31:56.677Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] com.zaxxer.hikari.HikariDataSource.getConnection:123 - HikariPool-1 - Start completed.
2021-06-26T07:31:57.474Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.lockservice.log:23 - Successfully acquired change log lock
2021-06-26T07:31:58.508Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Reading from public.databasechangelog
2021-06-26T07:31:58.683Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.lockservice.log:23 - Successfully released change log lock
2021-06-26T07:31:58.689Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.lockservice.log:23 - Successfully acquired change log lock
2021-06-26T07:31:58.690Z WARN identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.hub.log:23 - Skipping auto-registration
2021-06-26T07:31:58.806Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Table identity_lock created
2021-06-26T07:31:58.814Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - ChangeSet db/changelog/28-add-lock-table.xml::1619005213735-1::zshindova (generated) ran successfully in 115ms
2021-06-26T07:31:58.834Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Marking ChangeSet: db/changelog/29-update-migration-roles.xml::1622846852-1::tairskiv ran despite precondition failure due to onFail='MARK_RAN':
db/changelog/db.changelog-master.xml : SQL Precondition failed. Expected '0' got '1'
2021-06-26T07:31:58.845Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_user_service_role
2021-06-26T07:31:58.847Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_user_service_role
2021-06-26T07:31:58.852Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_group_role
2021-06-26T07:31:58.853Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_group_role
2021-06-26T07:31:58.874Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.lockservice.log:23 - Successfully released change log lock
2021-06-26T07:31:58.876Z WARN identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] o.s.b.w.r.c.AnnotationConfigReactiveWebServerApplicationContext.refresh:559 - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'liquibase' defined in class path resource [org/springframework/boot/autoconfigure/liquibase/LiquibaseAutoConfiguration$LiquibaseConfiguration.class]: Invocation of init method failed; nested exception is liquibase.exception.LiquibaseException: liquibase.exception.MigrationFailedException: Migration failed for change set db/changelog/29-update-migration-roles.xml::1622846852-2::tairskiv:
Reason: liquibase.exception.DatabaseException: ERROR: update or delete on table "identity_service_role" violates foreign key constraint "fkhcfahltn20109qfxny3vpogm" on table "identity_group_service_role"
Detail: Key (id)=(6799b5da-19f5-4aa3-9bd3-9376b3ac4890) is still referenced from table "identity_group_service_role". [Failed SQL: (0) DELETE FROM public.identity_service_role WHERE service_definition_id = (SELECT id FROM identity_service_definition WHERE name = 'Migration')]
2021-06-26T07:31:58.876Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] com.zaxxer.hikari.HikariDataSource.close:350 - HikariPool-1 - Shutdown initiated...
2021-06-26T07:31:58.898Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] com.zaxxer.hikari.HikariDataSource.close:352 - HikariPool-1 - Shutdown completed.
2021-06-26T07:31:58.905Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] org.ehcache.core.EhcacheManager.removeCache:216 - Cache 'users' removed from EhcacheManager.
2021-06-26T07:31:58.906Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] org.ehcache.core.EhcacheManager.removeCache:216 - Cache 'user_groups' removed from EhcacheManager.
2021-06-26T07:32:03.963Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] o.s.b.a.l.ConditionEvaluationReportLoggingListener.logMessage:136 -
Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2021-06-26T07:32:03.993Z ERROR identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] o.s.boot.SpringApplication.reportFailure:834 - Application run failed
Note:
1. Below command will help you to connect to postgres database
vracli dev psql
\c identity-db
**** STRONGLY ENCOURAGED NOT TO CONNECT UNLESS INSTRUCTED BY VMWARE ****
2. The best way would be to take a backup , restore it on a postgres server installed on Linux or Windows VM , then execute your queries on it...
vracli db backup identity-db > /root/identity-db-<<date>>.psql
Above command will help you to take a backup.
<<date>> is a variable , substitute it accordingly.
3. To restore
psql -U postgres -d **destinationdb** < **path-to-the-sql-file**
Create the dummy database first to restore
As part of investigation connected to identity service database and then found out that this id referenced in the snippet is part of three tables
ID : 6799b5da-19f5-4aa3-9bd3-9376b3ac4890

Then went ahead and looked into each of those tables
select * from identity_service_role where id = '6799b5da-19f5-4aa3-9bd3-9376b3ac4890';

As shown in the above screenshot this belonged to service role " Migration Assistant Administrator"
The identity-service-app was stuck in "CrashLoopBackOff" state

After a while upgrade did fail with following exception

Exception is nothing different a generic one.
com.vmware.vrealize.lcm.common.exception.EngineException: Upgrade on vRealize Automation VA nbnvra.nukescloud.com failed with state fatal. To know more about the failure, run command "vracli upgrade status --details" on the vRealize Automation VA nbnvra.nukescloud.com. If user wants to revert snapshot and trigger upgrade again, click RETRY with revertSnapshotNRetryUpgrade property set to true (or) If user wants to cancel the whole uprgade and revert to the state before upgrade click RETRY with cancelUpgradeNRevertBack property set to true. If both the retry properties are set to true, revertSnapshotNRetryUpgrade property will take precedence and will be honoured at com.vmware.vrealize.lcm.plugin.core.vra80.task.VraVaUpgradeStatusCheckTask.execute(VraVaUpgradeStatusCheckTask.java:329) at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:45) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.base/java.lang.Thread.run(Unknown Source)
Since the logs from identity-service-app were stating that it's unable to make a update or change to postgres database to a certain ID which belongs to Migration Assistant Administrator , this clearly gave an indication to a recent change made to this service role in 8.4.1
We have Migration Assistant Administrator recently moved into Cloud Assembly role. This change was introduced in 8.4.1. And 8.4.2 was supposed to bring in some enhancements to this.
Had no choice but to revert this failed upgrade back to 8.4.1.
Then the first thing I did is to go and remove the vRA Migration Assistant Administrator role and then add the new Cloud Assembly based role in place

This is how it looks after the change

Post that another upgrade attempt has been made and it definitely succeeded
2021-06-26T12:19:16.708Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - ChangeSet db/changelog/28-add-lock-table.xml::1619005213735-1::zshindova (generated) ran successfully in 86ms
2021-06-26T12:19:16.724Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Marking ChangeSet: db/changelog/29-update-migration-roles.xml::1622846852-1::tairskiv ran despite precondition failure due to onFail='MARK_RAN':
db/changelog/db.changelog-master.xml : SQL Precondition failed. Expected '0' got '1'
2021-06-26T12:19:16.738Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_user_service_role
2021-06-26T12:19:16.739Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_user_service_role
2021-06-26T12:19:16.744Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_group_role
2021-06-26T12:19:16.746Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_group_role
2021-06-26T12:19:16.749Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelo