
Friday, May 23, 2008

A Nice Welcome Back


Back in the middle of April my wife and I took a trip for our 4th anniversary. We saved up over the past couple of anniversaries to go somewhere big. We went to the Dominican Republic. It was wonderful.

The night before we left to come home I got a call from work to let me know that our Exchange server had crashed and burned. My stomach immediately turned and my mind started to race. I was out of the country where my mobile phone didn't work and I had no computer access, so there was absolutely nothing I could do. I kept asking myself whether, if I did have some kind of connectivity, I could actually be doing something to fix the problem.

I was able to contact the church from our hotel phone and got some more details. None of them were good. After a $75 phone call I found out...

  1. We lost a disk in our array for Exchange
  2. They were able to call in a friend to come and help
  3. A new disk was ordered and installed in the array
  4. As it turns out, when installing a replacement disk into an array on Dell PowerEdge servers, there is a possibility that the array will become corrupted
  5. Our array became corrupted when the replacement disk was installed
  6. The most recent complete Exchange backup was a month old (a whole other story)
  7. The friend we had help us was able to purchase a program to restore the data from the corrupted array (Arax Disk Doctor - totally saved my bacon)
  8. After restoring the Exchange database files we found out that they were corrupt

Needless to say I didn't sleep very much the next couple of days.

As I found out more information I learned that the server had crashed the night that I left for vacation and had essentially been down the entire week I was gone. At the Sunday service they announced that we weren't mad at the congregation and we weren't dodging their emails, but our server had crashed. My heart sank, as did my body in the pew.

Over the next few days I worked nonstop to do a hard repair and a defrag of our database (which wasn't easy, since the db was 40GB and we had almost no servers with enough room/power to deal with files that large). We were able to restore Exchange by Wednesday morning at 3:30 am with no noticeable data loss. The outage lasted about a week, which is far beyond unacceptable. The only good thing is that it brings disaster recovery (DR) to the forefront of management's mind, which is something I've been trying to do this entire year.

Needless to say I needed another vacation to recover from the welcome back I received from my last one. The unfortunate part is that I may never be able to take a vacation again given what happened during this past one.

I don't know what it is about servers but somehow they know when you leave and the worst possible things happen when you are gone. I think that it has something to do with separation anxiety.