This is the current status and collected knowledge so far:
This issue applies to all WD5000AACS and WD10EACS disks sold by us before mid-September 2008. In early or mid-September we received a batch with updated firmware, and the issue has not appeared since. The newer WD10EADS disks are not affected, and neither are our Seagate 80 GB and SSD disks.
The issue is that the disk unloads its head after a 5-second idle timeout to save power. This works well in many environments (Windows etc.), but in some Linux environments the journalling filesystem makes the system touch the disk often, generally quite soon after the head has unloaded (apparently every 5-15 seconds). This is why the "Load_Cycle_Count" counter increases rapidly on some systems. Western Digital recommends staying below 300 000 cycles, which some systems reach quite soon. However, WD does not accept warranty returns on this basis unless the disk is actually broken. It is unclear whether this really is a serious issue; there are reports of these disks exceeding 1.5 million cycles without malfunctioning. Nevertheless, we are taking it seriously just in case.
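To get a feel for the numbers above, here is a rough sketch that reads the counter and estimates how long it would take to reach the 300 000 figure. It assumes smartmontools is installed and that "smartctl -A" prints its usual attribute table (name in column 2, raw value in column 10). The function names are made up for this example, and the 10-second interval is just an assumed midpoint of the 5-15 seconds mentioned above, with the disk running around the clock.

```shell
#!/bin/bash
# Extract the raw value of one SMART attribute from "smartctl -A" output on stdin.
# Assumes the standard attribute table: name in column 2, raw value in column 10.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Days until the counter reaches a limit, given the current count and the
# average number of seconds between unloads (integer arithmetic, rounds down).
days_to_limit() {
    local current=$1 limit=$2 seconds_per_cycle=$3
    local cycles_per_day=$(( 86400 / seconds_per_cycle ))
    echo $(( (limit - current) / cycles_per_day ))
}

# Example: a new disk that unloads once every 10 seconds, against 300 000 cycles.
days_to_limit 0 300000 10   # prints 34
```

To read the current value on a real system, something like "smartctl -A /dev/sda | smart_raw_value Load_Cycle_Count" should work as root; the device name /dev/sda is an assumption and may differ on your system.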
Also, not all users are affected, even if they have the affected disks. This is probably because their servers run with a higher average disk load, so the disk is touched often enough to prevent the head from unloading.
There was at one point a suspicion that WD had only masked the real value of Load_Cycle_Count and replaced it with a copy of Start_Stop_Count, since these values are often (always?) identical on disks newer than the September 2008 batch. However, this is most likely wrong. The probable cause is that every Start/Stop cycle induces a Load/Unload cycle, and if the timeout for power-saving Load/Unload cycles is long enough, power-saving unloads never occur in our system, so the two values stay identical.
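If you want to check the two counters on your own disk, the sketch below compares them. It assumes the usual "smartctl -A" table layout (attribute name in column 2, raw value in column 10); the function name is made up for this example.

```shell
#!/bin/bash
# Read "smartctl -A" output on stdin and report whether the raw values of
# Start_Stop_Count and Load_Cycle_Count are identical.
# Assumes the standard attribute table: name in column 2, raw value in column 10.
compare_cycle_counters() {
    awk '
        $2 == "Start_Stop_Count" { start = $10 }
        $2 == "Load_Cycle_Count" { load = $10 }
        END {
            if (start == "" || load == "") { print "missing attribute"; exit 1 }
            if (start == load) print "identical: " start
            else print "differ: Start_Stop_Count=" start " Load_Cycle_Count=" load
        }'
}
```

On a real system this would be used as "smartctl -A /dev/sda | compare_cycle_counters", run as root, where /dev/sda is an assumed device name.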
If you listen carefully to the disks, a small "klonk" sound can be heard every now and then. This is not necessarily the load/unload cycle; it can simply be disk activity.
Excito is now working on a fix (stopping the load/unload count increase) that can be distributed through our update service, so that it also reaches customers who do not read this. We are currently developing and testing it. Please be patient: releasing new software is a large process for us, since all possible side effects have to be assessed and tested.
Meanwhile, customers who worry about this issue are recommended to use the script shown below, and to verify that it actually works by monitoring the load/unload cycle counter over some time.
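Log on using ssh and create the script file:
Code:
nano wd_stop_unload.sh
Enter the following content, then save and exit:
Code:
#!/bin/bash
# Touch a file every 5 seconds so the disk is accessed before the head unloads.
while true; do
    sleep 5
    touch /tmp/wdfix
done
Make the script executable:
Code:
chmod +x wd_stop_unload.sh
Start it in the background:
Code:
nohup /full/path/to/wd_stop_unload.sh &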
The script will keep running after you close your ssh session, but it will not start automatically after a reboot.
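To verify that the workaround has an effect, the counter can be sampled twice with some time in between. The following is only a sketch: it assumes smartmontools is installed, that the disk is /dev/sda, and that "smartctl -A" prints its usual table with the raw value in column 10. The file name check_unload.sh and the function names are made up for this example.

```shell
#!/bin/bash
# Sample Load_Cycle_Count twice and print how much it grew in between.
# DEVICE and INTERVAL are assumptions; adjust them for your system.
DEVICE=${DEVICE:-/dev/sda}
INTERVAL=${INTERVAL:-600}   # seconds between the two samples

read_load_cycles() {
    # Raw value is column 10 of the standard "smartctl -A" attribute table.
    smartctl -A "$1" | awk '$2 == "Load_Cycle_Count" { print $10 }'
}

report_growth() {
    local before=$1 after=$2 seconds=$3
    echo "Load_Cycle_Count grew by $(( after - before )) in ${seconds}s"
}

main() {
    local before after
    before=$(read_load_cycles "$DEVICE")
    sleep "$INTERVAL"
    after=$(read_load_cycles "$DEVICE")
    report_growth "$before" "$after" "$INTERVAL"
}

# Only take measurements when called with the argument "run".
if [ "${1:-}" = "run" ]; then
    main
fi
```

Saved as check_unload.sh and made executable, it would be started as root with "./check_unload.sh run". With the workaround active, the growth over ten minutes should be close to zero. At one unload every 5-15 seconds, an unprotected disk would be expected to gain roughly 40-120 cycles in the same time.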
Please also report back to us whether this really works for you. Meanwhile, as I said, we are working on a real fix that will be distributed through our update service.
We will keep you posted on the progress.