Thank goodness for backups, utility software, and partitioned hard
drives. After spending most of Thursday and Friday with a damaged hard
drive, I'm thrilled to be up and running again. Just before 3:00 p.m.
today, we finally posted our first site update since early Thursday
morning.
I don't think we've had site updates this late in the day since I
worked an IT job full time and did Low End Mac after I got home from
work. I also don't think we've gone a day and a half without new
content in years (excluding weekends and holidays, of course).
What Happened?
Thursday morning Software Update launched and asked if I wanted to
install the latest security update, which had been out since Monday. I
hadn't heard anything bad about it, so I installed it, rebooted my
TiBook, and nothing has been quite the same since. (For the record, I
don't think the Security Update had anything to do with the drive
problem, but you never know....)
The problem didn't manifest itself when browsing or working with
email, but when I tried to do a global search-and-replace throughout
the site, it became evident that something was very, very wrong. BBEdit
Lite, which is usually such a speed demon, would pause for maybe 30
seconds, maybe two minutes, as it tried to read files. I'd never seen
anything like it before.
When the problem didn't disappear with a reboot, I decided to boot
into OS 9 and run some utilities - Disk First Aid, the AppleCare
version of TechTool, and Norton Utilities. Disk First Aid and TechTool
didn't find the problem; I'm sure DiskWarrior 2.1 would have missed it,
too, if I'd been able to locate the CD.
Norton, on the other hand, found such a big problem while running the
surface scan on my work partition that it simply hung. I booted back
into OS X and ran Apple's Disk Utility, which found nothing, but
every time I ran Norton it hung at exactly the same spot. Nothing could
make it go past that point during the media scan.
Backing Up
I'd been working on a mailbag piece, among other things, and didn't
want to lose that or any other changes I'd made so far that morning, so
I connected my old 10 GB drive (the one that came inside my TiBook and
now lives in a FireWire enclosure) and tried to copy everything from my
work disk to the external drive. Like Norton, it would get so far and
then hang up.
Booting back into OS 9, I tried the same thing, hoping that Copy
Agent would be able to work around the trouble. No luck. Back and
forth. Try this. Try that. Get nowhere - and then it was time to go to
work.
Before I left, I launched File Synchronization in hopes it might
succeed where regular copying and Copy Agent had failed. Almost. It
came a lot closer, but it also strained against the bad part of the
hard drive.
I had another go at it Thursday night, but although File
Synchronization strove valiantly to copy everything, the drive problems
made it incredibly slow. So I canceled and resolved to tackle it again
Friday morning.
In the end, it was a combination of File Synchronization, Copy
Agent, and Retrospect that let me get back to work - and it still took
hours.
Recovery
I booted into OS 9 and used Copy Agent to move all of the loose
files from the bad partition's main directory to a folder on the
FireWire drive. Then I dragged the folders over one by one, noting
which ones hung things up. Folders affected by the damaged hard drive
included the System Folder, the Norton folder, and the folder where I
keep my working copy of Low End Mac.
I trashed the System Folder and Norton folder, since I could copy
the first from my main partition if I needed to and reinstall Norton
from a CD. The important thing was to recover as much of LEM as
possible, so I opened the folder and dragged folders from the damaged
partition to the LEM folder on the external drive. And I noted where
the problems were.
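For readers who would rather script this kind of salvage than drag folders by hand, here is a minimal Python sketch of the same idea: copy everything you can, and write down what fails instead of stopping. The function name `salvage_tree` is made up for illustration, and the sketch assumes a bad file shows up as an ordinary read error (`OSError`). A drive that simply hangs, as mine did, would stall a script like this just as it stalled the Finder.

```python
import shutil
from pathlib import Path


def salvage_tree(src, dst):
    """Copy src into dst file by file; return the files that could not be copied.

    Hypothetical helper for illustration, not any utility named in the article.
    """
    src, dst = Path(src), Path(dst)
    failed = []
    for path in sorted(src.rglob("*")):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves modification times
        except OSError as err:
            # Assumption: a damaged file raises an error rather than hanging.
            failed.append((path, str(err)))  # note the problem and keep going
    return failed
```

The returned list plays the role of my notes on which folders caused trouble: those are the files to pull from a backup later.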
Then I ran Norton on the external drive. It identified a few mangled
files, which I trashed. With any luck at all, I'd have 99.99% recovery
(LEM has well over 10,000 files).
After this, I erased the troublesome partition and ran another
Norton surface scan. Same story. The damage appeared to be physical,
not something in software like crosslinked files or a damaged filing
system. It does look like I'll need to replace the hard drive, which
I've had for less than a year.
But back to site recovery. After trashing the damaged files, I
synchronized the copy of LEM on the external drive with my backup copy
on my main partition using File Synchronization. The deleted files were
back where they belonged, although they might have been slightly older
versions than what had been lost.
After this, the next step was to set up my backup computer, which
I'd moved earlier in the day while Norton was trying to fix things, and
get all of the site files back from the most recent backup. As luck
would have it, that backup was about one hour before the problems
began.
I restored yet another copy of my LEM working files to my main
partition and then synchronized it with the copy on the external drive
telling it to use the most recent version of any file where the time
stamp was different.
It worked. I ended up with a couple of damaged files after all that;
I had yet another local copy of one, and I'll still have to get the
other from an earlier backup. But at that point I had recovered my
files.
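The "use the most recent version wherever the time stamp differs" rule can be sketched in a few lines of Python. This is only an illustration of the rule, not how File Synchronization works internally. The name `sync_newest` is hypothetical, and the sketch assumes that modification times on both folders can be trusted and that a file missing from one side should be restored from the other.

```python
import shutil
from pathlib import Path


def sync_newest(dir_a, dir_b):
    """Give both folders the newest copy of every file; return the names copied.

    Hypothetical helper for illustration only.
    """
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    names = set()
    for root in (dir_a, dir_b):
        names.update(p.relative_to(root) for p in root.rglob("*") if p.is_file())

    copied = []
    for rel in sorted(names):
        a, b = dir_a / rel, dir_b / rel
        # A missing file counts as older than any existing one.
        time_a = a.stat().st_mtime if a.exists() else float("-inf")
        time_b = b.stat().st_mtime if b.exists() else float("-inf")
        if time_a == time_b:
            continue  # same time stamp: leave both copies alone
        newer, older = (a, b) if time_a > time_b else (b, a)
        older.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(newer, older)  # copy2 carries the time stamp across too
        copied.append(rel.as_posix())
    return copied
```

Because a missing file is treated as older, files trashed from one copy come back from the other. The catch is the same one I ran into: the restored version may be a little older than the one that was lost.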
An Ounce of Prevention
Redundant backups are never a bad idea. Low End Mac is hosted on a
remote server, so even if my house burns down or my TiBook is stolen,
there's going to be a copy of it. And I also have two CD-R copies of
the website - one by the computer and another copy at my other job.
These aren't quite current, but in a pinch I could use the CD-R to
recover a file that had somehow been mangled.
Then there are the two local copies. I kept my working copies on a
separate partition and used File Synchronization about once a day to
duplicate them to my main partition. This is useful mostly when I forget
to rename today's article before I edit and save it. Oops, get it from
the main partition.
Finally, there's good old Retrospect network backup. In our
cyberlair (what would be a family room in a home with fewer geeks) sits
my old SuperMac J700 with a single task to do - run network backup. It
does this at about 50 MB per minute to an 80 GB FireWire hard drive,
and recovery from the drive is nearly as fast as backing up. That was a
nice lesson to learn today; prior to switching to FireWire, I'd been
using tape drives, and file recovery was always a long, slow process
that involved two or three tape swaps.
Fortunately I'm a firm believer in backups. At this point I still
haven't checked the status of the mailbag column I was working on
before I decided to install the security update, but at the very worst
I'll be able to recover it from the outgoing mail folder in Apple's
Mail program.
Benefits of the Crash
Can anything good come of this kind of hard drive failure? Perhaps.
It definitely reinforces my conviction that backup is important.
Regular backup. Redundant backup.
I'm also grateful that I do partition my hard drives, since if I
hadn't done so I would have lost the whole 20 GB drive instead of only
a 2 GB partition. In a pinch, I could have set up an external 3.5"
FireWire drive, but I would have completely lost my portability. And my
old 10 GB drive just doesn't have the space for the 12-13 GB of files
on my computer.
Working with an external bus-powered FireWire hard drive keeps me
portable, although it is one extra piece to carry. It may also be a bit
more efficient. Maybe.
Here's my thinking on that. My main hard drive is a fast 5400 rpm
unit with an 8 MB data buffer, but both partitions had to share
that buffer. The 10 GB Toshiba drive doesn't spin as fast or have as
large a buffer, but the whole buffer is used just for my website files.
Further, the FireWire bus may be more efficient than the IDE bus used
by the internal drive - I've certainly found that a fast FireWire drive
can run circles around even the 20 GB 5400 rpm TravelStar.
Finally, this convinced me to stop postponing ordering the
DiskWarrior 3.0 upgrade. Not only will this be a full-featured utility
program that boots and runs in OS X, but it has a new feature that
can monitor your drives looking for potential failure. I have to wonder
if that would have caught this problem before it became so big.
Peace of Mind
Once upon a time I backed LEM up to a Zip disk, but we've grown too
big for that. I've been doing network backups at home for years,
something I learned all about in my IT job years ago. With a backup,
you can recover from almost any disaster that doesn't level the
building. Been there. Done that.
Whether you back up crucial files to floppy, Zip, CD-ROM, your .Mac
storage space, or a network server, it's always a good idea to have at
least one backup of anything that's important. I've never had a hard
drive problem like this before, and without backup the situation would
be a lot worse than it is tonight.
Bad stuff happens. With multiple backups, you should be able to
recover from almost all of it.
Dan Knight has been using Macs since 1986,
sold Macs for several years, supported them for many more years, and
has been publishing Low End Mac since April 1997.