Friday, May 23, 2008

A Nice Welcome Back


Back in the middle of April my wife and I took a trip for our 4th anniversary. We saved up over the past couple of anniversaries to go somewhere big. We went to the Dominican Republic. It was wonderful.

The night before we left to come home, I got a call from work letting me know that our Exchange server had crashed and burned. My stomach immediately turned and my mind started to race. I was out of the country, my mobile phone didn't work, and I had no computer access, so there was absolutely nothing I could do. I kept asking myself whether, if I'd had some kind of connectivity, I could actually have done anything to fix the problem.

I was able to call the church from our hotel phone and get some more details. None of them were good. After a $75 phone call, I found out...

  1. We lost a disk in our array for Exchange
  2. They were able to call in a friend to come and help
  3. A new disk was ordered and installed in the array
  4. As it turns out, when a replacement disk is installed into an array on Dell PowerEdge servers, there is a possibility that the array will become corrupted
  5. Our array became corrupted when the replacement disk was installed
  6. The most recent complete Exchange backup was a month old (a whole other story)
  7. The friend who was helping us was able to purchase a program to restore the data from the corrupted array (Arax Disk Doctor, which totally saved my bacon)
  8. After restoring the Exchange database files, we found out that they were corrupt

Needless to say I didn't sleep very much the next couple of days.

As I found out more, I learned that the server had crashed the night I left for vacation and had essentially been down the entire week I was gone. At the Sunday service they announced that we weren't mad at the congregation and weren't dodging their emails; our server had crashed. My heart sank, as did my body in the pew.

Over the next few days I worked nonstop to do a hard repair and a defrag of our database. That wasn't easy, since the database was 40GB and we had almost no servers with enough room or power to handle files that large. We were able to restore Exchange by 3:30 am Wednesday morning with no noticeable data loss. The outage lasted about a week, which is far beyond acceptable. The only good thing is that it brings DR to the forefront of management's mind, which is something I've been trying to do this entire year.

Needless to say, I needed another vacation to recover from the welcome back I received after my last one. The unfortunate part is that I may never be able to take a vacation again, given what happened on this one.

I don't know what it is about servers but somehow they know when you leave and the worst possible things happen when you are gone. I think that it has something to do with separation anxiety.

6 comments:

Anonymous said...

Ouch...this is a good example why I want to move the church to Google Apps :)

Foster said...

believe me Google Apps was on my mind the entire flight back to Oklahoma

bsharpe said...

we just moved off an "Exchange server in the closet" for this very reason. I'm on vacation now and am sleeping great.

bsharpe said...

doh! And google apps was our answer.

Jason Lee said...

So what's the new DR plan and hardware?

Foster said...

The new DR plan is being driven by our Management Team. In the past, all major decisions were made at the IT Director's discretion, based on the cost/benefit of available resources. The leadership of the church should be more involved in the plan, mostly to make sure it meets their expectations.

I think we will contract with some kind of 3rd party to provide hardware and a location to restore our backups to. As far as backups go, I'm really leaning toward an online backup solution like Mozy, because tapes are unreliable and so are the people who are in charge of rotating them and moving them off site.