Just how good is your backup regime? A colleague here at Serio tells this story.
I managed a direct sales IT system. It was a very busy environment – people asking for new reports, account managers shouting, IT staff either leaving or threatening to leave – and often I simply followed the current crisis (whatever that was), having been in the job for just a few months.
One of my first projects was to move our direct sales system onto a new ‘fault-tolerant’ system. This went well, and I delegated the task of ‘sorting out the backups’ to one of my team who was not in the process of leaving that week. I never thought any more about the backups; after all, I had delegated them.
One Thursday morning, I came into work at 8:00am as usual, and our new server was making a funny noise: when you switched it on it went ‘boing...’, and then refused to boot. With increasing panic, I switched the machine off and back on, hoping for a boot-up, but all it did was go ‘boing...’. I even tried pushing the ‘on’ button slowly, as if that would help.
An engineer was called, who pronounced the RAID system and one of its disks ‘toast’. ‘Never seen a problem like it,’ he said. ‘It’s not that fault tolerant,’ he added helpfully.
Worse was to come. I scrambled for the backup, only to find that the entire Oracle database had been skipped from the backup tape because its files were in use: the database was, of course, still running when the backup job ran. By this time customers were phoning, and the call centre was operational (in that it had staff answering the phone).
We had approximately £50,000 (a usual day’s trading) of unfulfilled orders. Many orders were part-shipped, some were ready to go, and there was stock to be booked in. Things looked grim. How was I going to explain we had no backup, and all was lost? A posse of account managers was waiting outside my office. They looked mean and angry, especially the females.
Just then, ‘the new guy’ came in, who had joined us no more than a fortnight earlier. ‘Wassup?’ he said. I explained.
‘Oh, I’ve got a backup here from last night. I couldn’t see any decent backup, so I did a full Oracle export to network filestore, using cron to schedule the job.’ And sure enough, there it was – I wasn’t going to get fired after all, although I probably deserved it.
I relate this story for the simple reason that just about every backup mistake you could make is in it:
- Failure to plan for backup and recovery
- Managerial delegation of responsibility with inadequate checks or verification
- Over-dependence on ‘fault-tolerant’ technology, leading to complacency
- Failure to understand how to back up the software in use (here, a live Oracle database whose open files the tape backup skipped)
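The ‘inadequate checks or verification’ point is the easiest one to do something about. As a rough illustration only, here is a minimal Python sketch of the kind of check that could have been run each morning against the backup output. The function name `check_backup`, its thresholds and the idea of checking a single export file are all assumptions made for this example; nothing like it appears in the story.

```python
import os
import time


def check_backup(path, max_age_hours=24, min_size_bytes=1, now=None):
    """Return a list of problems found with the backup file at `path`.

    An empty list means the file exists, is at least `min_size_bytes`
    long and was written within the last `max_age_hours`. The name and
    thresholds are hypothetical; tune them to your own backup schedule.
    """
    now = time.time() if now is None else now

    # A missing file is the whole story; nothing else can be checked.
    if not os.path.isfile(path):
        return [f"missing: {path}"]

    problems = []
    info = os.stat(path)

    # A suspiciously small file suggests the data was skipped.
    if info.st_size < min_size_bytes:
        problems.append(f"too small: {info.st_size} bytes")

    # An old file suggests last night's job never ran.
    age_hours = (now - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"stale: last written {age_hours:.1f} hours ago")

    return problems
```

Run from a scheduler and wired up to an email or alert, something like this would flag a missing, empty or day-old export before anyone needed it. It only shows that a file is there; it does not show that the file can be restored, so it is no substitute for a periodic test restore.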
I’ll post some information on Monday about the correct way to approach backing up your database system, focusing in particular on SQL Server: the types of backup, the transaction log and so on.
(With thanks to AndyW for input).