This week I was hit by the perfect storm. I came across an environment where two issues collided and created a nightmare that cost me some sleep. Had these events occurred separately, I would have had no problems and navigated them quite easily. Well, if our jobs were easy, we would be bored and easily replaced by computer subroutines. And those of us who are consultants wouldn't get the nice perks that come with the job. So, this week I dropped everything, cried a little, asked my boss if I could quit, and faced the nightmare like a good consultant. Enough bloviating; let's get to it.
First, this environment runs on the now EOS/EOL MCS 7845-I3, which in itself is a great teacher of patience with its (what feels like) 20-minute UEFI boot times. The Publisher started displaying that amber light we've all seen before on one of the hard drives. No big deal, right? I logged in and discovered that the Publisher's filesystem had gone into read-only mode. Great. A 'show hardware' revealed that three of the four hard drives were gone: one had failed and two were in imminent failure mode. I opened a TAC case to get the drives replaced. Done. Next step: grab the last successful DRS backup to prepare for a Publisher restore. Life's OK.
Here is where I started to get upset, because this is where the second event occurs: DRS had been failing for months. Only the Publisher showed as complete. At this point I'm thinking, great, I have to attempt a restore from an incomplete backup, which I've never seen work, but this is me, so it'll work this time, right? So the drives come in and I go through the forever process of installing UCM on the Publisher, which was easy. During this time I remembered why I love UCS and Collaboration in a virtualized environment, pondered life, and attempted to formulate a plan for rebuilding a production cluster from scratch if this restore didn't work. Four or so hours later I got to attempt the restore and... wait, what? DRS will only restore CDR from those incomplete backups. Great. I called it a night and went to bed, seriously.
After a sleepless night I reached out to Cisco TAC and to one of the best Collaboration SEs I've ever worked with, who is also a CCIE. After a few minutes the SE shared this document on how to restore a Publisher from a Subscriber with no previous DRS backups. First, I felt like he should have delivered it to me as an LMGTFY link, and second, I was thankful for all of the previous cases opened by people who were burned by lazy consultants or bad network engineers who never bothered to make sure backups were set up. After three hours I was able to successfully restore the Publisher without impacting call processing. I also chose this moment to set up those pesky RSA IMM boards and update the server firmware, so I did cause brief outages, but the document itself worked great.

A couple of caveats:
- I knew the cluster Security Password; if you don't, I believe you're out of luck.
- The Publisher was glass-housed.
If you ever find yourself in this situation, follow that document to the letter.