In this post we describe an issue that can occur when upgrading a Configuration Manager infrastructure that includes secondary sites.
After upgrading our primary site to the 2006 CM Hotfix Rollup (KB4578605), we followed the KB article provided by Microsoft to upgrade the secondary sites (https://support.microsoft.com/help/4578605).
As indicated in the article, a secondary site must be recovered in order to update it to the new version. After we started recovering the first two sites, they got stuck in the Pending state and we could not perform any operation on them. The other sites, which we upgraded one by one, were updated correctly.
After some investigation of the logs, the issue seemed to be related to a failed database replication link; unfortunately, our secondary sites remained in this state even after we ran the Replication Link Analyzer.
We also tried the following troubleshooting steps, with no luck:
1) OS restart of the two secondary sites;
2) Restart of the SMS_EXECUTIVE, SMS_REPLICATION_MANAGER and SMS_REPLICATION_CONFIGURATION_MONITOR components on the primary site.
None of the main log files related to those components (hman.log, rcmctrl.log, replmgr.log) showed any errors, either on the primary or on the secondary sites.
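If you want to double-check those logs without opening each one by hand, a small script can help. The following is only a minimal sketch, not a Configuration Manager tool: the function name `scan_logs`, the keyword list and the log folder you pass in are our own assumptions, and a plain keyword match can easily miss or over-report problems.

```python
from pathlib import Path

# Log files mentioned in this post; the keyword list is an assumption.
LOG_NAMES = ("hman.log", "rcmctrl.log", "replmgr.log")
KEYWORDS = ("error", "failed")


def scan_logs(log_dir, names=LOG_NAMES, keywords=KEYWORDS):
    """Return (file name, line number, line text) for every line that
    contains one of the keywords, ignoring case."""
    hits = []
    for name in names:
        path = Path(log_dir) / name
        if not path.is_file():
            # Skip logs that do not exist in this folder.
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(keyword in lowered for keyword in keywords):
                    hits.append((name, number, line.rstrip()))
    return hits
```

Calling `scan_logs(r"<CM_Installation_Path>\Logs")` (the folder is a placeholder to adapt to your installation) returns an empty list when no line matches, which is consistent with what we saw.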
At this point we focused on the status of the secondary site on the SQL side, and we noticed that the secondary site was reported as active from the SQL perspective while the replication link was reported as failed. By manually triggering the “Secondary_Site_Replication_Configuration” replication group, we saw that replication was actually working properly in the background. To test replication from the secondary site, we created a file named “Secondary_Site_Replication_Configuration.pub” inside the folder “<CM_Installation_Path>\inboxes\rcm.box” on the secondary site. Replication was working, but the secondary site was still stuck at “2 – Pending”, “1011 – Recover a secondary site”.
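The trigger file described above can also be created from a script. This is a minimal sketch under a few assumptions: the helper name `drop_replication_trigger` is ours, we assume an empty file is enough, and the folder must be the real rcm.box path of your installation.

```python
from pathlib import Path


def drop_replication_trigger(rcm_box,
                             group="Secondary_Site_Replication_Configuration"):
    """Create an empty '<group>.pub' file in the given rcm.box folder
    and return its path. The folder must already exist."""
    folder = Path(rcm_box)
    if not folder.is_dir():
        raise FileNotFoundError(f"rcm.box folder not found: {folder}")
    trigger = folder / f"{group}.pub"
    # Assumption: an empty file is sufficient as a trigger.
    trigger.touch(exist_ok=True)
    return trigger
```

For example, `drop_replication_trigger(r"<CM_Installation_Path>\inboxes\rcm.box")`, with the placeholder replaced by your installation path, creates the same file we created by hand.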
Before trying the fix below in your production environment, take a backup of the Configuration Manager database; I would also recommend doing this with the help of Microsoft support.
To solve the issue, we reset the site status to Recovery Failed, which made it possible to run the recovery from the console once more. To do this, we executed the following stored procedure on the primary site database:
spUpdateSiteStatus '<secondary_site_code>', 6, '-2'
This allowed us to manually change the status of the recovery process to Recovery Failed.
With the secondary site in this state, we were able to trigger the secondary site recovery once more, and this time it completed successfully.
Thanks to our colleague Andrea Lupini for the contribution.