[SOLVED] [Old] WD Green Power drives may kill themselves !!

Post by **johannes** » 20 Feb 2009, 02:53

Hi all,

This is the current status and collected knowledge so far:

This issue applies to all WD5000AACS and WD10EACS disks sold by us before mid-September 2008. In early or mid-September we received a batch with updated firmware, and the issue has been gone since then. The newer WD10EADS disks are also not affected, and neither are our Seagate 80 GB and SSD disks.

The issue is that the head unloads after a 5-second idle timeout, to save power. This works well in many environments (Windows etc.), but in some Linux environments the journaling filesystem makes the system touch the disk often, generally quite soon after the head has unloaded (apparently every 5-15 seconds). This is why the "Load_Cycle_Count" counter increases rapidly on some systems. Western Digital recommends staying below 300 000 cycles, and some systems reach that quite soon. However, WD does not accept warranty returns for this reason unless the disk is actually broken. It is unclear whether this really is a serious issue; there are reports of these disks reaching over 1.5 million cycles without malfunctioning. However, we are taking it seriously just in case.
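A rough worked example of why the limit can be reached so quickly. It assumes, as a simplification, exactly one load/unload cycle each time the disk is touched after the 5-second timeout; the 300 000 figure and the 5-15 second interval come from the paragraph above, and the function name `days_to_limit` is only for illustration.

```bash
#!/bin/bash
# Rough estimate of days until a drive reaches a load/unload cycle limit,
# ASSUMING exactly one load/unload cycle per touch interval.
# Usage: days_to_limit LIMIT CURRENT_COUNT SECONDS_BETWEEN_CYCLES
days_to_limit() {
    local limit=$1 current=$2 interval=$3
    # Remaining cycles times seconds per cycle, divided by seconds per day.
    # Bash arithmetic is integer-only, so the result is rounded down.
    echo $(( (limit - current) * interval / 86400 ))
}

for interval in 5 10 15; do
    printf '%2d s between touches: about %d days to reach 300000 cycles\n' \
        "$interval" "$(days_to_limit 300000 0 "$interval")"
done
```

Under that assumption, a new disk touched every 10 seconds around the clock would pass the recommended figure in roughly a month (about 34 days).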

Also, not all users are affected by this, even if they have the affected disks. This is probably because they use their server with a higher average disk load, which touches the disk often enough to prevent the head from unloading.

There was at one point a suspicion that WD had merely masked the real value of Load_Cycle_Count and replaced it with a copy of Start_Stop_Count, since these values are often (always?) identical on disks newer than the September 2008 batch. However, this is most likely wrong. The probable explanation is that every start/stop cycle also causes a load/unload cycle, and if the timeout for power-saving load/unload cycles is long enough, such power-saving cycles never occur in our system, so the two values stay identical.
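To check this on your own disk, here is a small sketch that reads `smartctl -A` output (from smartmontools) on standard input and prints the raw Start_Stop_Count and Load_Cycle_Count values side by side. It assumes smartctl's usual attribute table, where the attribute name is the second column and the raw value is the last; the function name is hypothetical.

```bash
#!/bin/bash
# Compare the raw Start_Stop_Count and Load_Cycle_Count values in
# `smartctl -A` output read on stdin. Assumes the attribute name is
# field 2 and the raw value is the last field of each table row.
compare_cycle_counters() {
    awk '
        $2 == "Start_Stop_Count" { start_stop = $NF }
        $2 == "Load_Cycle_Count" { load_cycle = $NF }
        END {
            if (start_stop == "" || load_cycle == "") {
                print "attribute not found"
                exit 1
            }
            printf "Start_Stop_Count=%s Load_Cycle_Count=%s", start_stop, load_cycle
            print (start_stop == load_cycle ? " (identical)" : " (different)")
        }'
}

# Example (as root; /dev/sda is an assumed device name):
#   smartctl -A /dev/sda | compare_cycle_counters
```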

If you listen carefully to the disks, a small "klonk" sound can be heard every now and then. This is not necessarily the load/unload cycle, just disk activity.

Excito is now working on a fix (one that stops the load/unload count from increasing) that can be distributed through our update service, so that it also reaches customers who do not read this post. We are currently developing and testing it. Please be patient: releasing new software is a large process for us, since all possible side effects of such software have to be assessed and tested.

Meanwhile, customers who are worried about this issue are recommended to use the script shown below, and to verify that it actually works by monitoring the load/unload cycle counter over some time.
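One way to do that monitoring, as a minimal sketch: it assumes smartmontools' `smartctl` is installed, that it runs as root, and that the disk is `/dev/sda` (adjust as needed). It prints a timestamp and the raw Load_Cycle_Count at a fixed interval; the script and function names are hypothetical.

```bash
#!/bin/bash
# Log the raw Load_Cycle_Count of a disk at a fixed interval.
# Usage (as root): ./log_load_cycles.sh /dev/sda [SECONDS]
# Without arguments the script only defines its functions and exits.

# Print the raw value (last field) of the Load_Cycle_Count attribute
# from `smartctl -A` output read on stdin.
extract_load_cycles() {
    awk '$2 == "Load_Cycle_Count" { print $NF }'
}

# Print one timestamped reading for the given device.
log_once() {
    printf '%s  Load_Cycle_Count=%s\n' "$(date)" "$(smartctl -A "$1" | extract_load_cycles)"
}

if [ $# -ge 1 ]; then
    device=$1
    interval=${2:-600}   # default: every 10 minutes
    while true; do
        log_once "$device"
        sleep "$interval"
    done
fi
```

For example, `nohup /full/path/to/log_load_cycles.sh /dev/sda >> /root/load_cycles.log &` (the log path is just an example) keeps a record you can compare over an hour or so; if the workaround is effective, the count should stay (almost) constant.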

Log in using ssh and run:

```bash
nano wd_stop_unload.sh
```

Add the following to this file:

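```bash
#!/bin/bash
# Touch a file every 5 seconds; the idea is to access the disk often
# enough that the heads never sit idle long enough to unload.
while true; do
    sleep 5
    touch /tmp/wdfix
done
```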

Make the script executable:

```bash
chmod +x wd_stop_unload.sh
```

Run the script:

```bash
nohup /full/path/to/wd_stop_unload.sh &
```

[EDIT]: Forgot that the full path to the script is required, for instance /home/user/wd_stop... See above. [/EDIT]

This script will also keep running after you close your ssh session, but it will not start automatically after a reboot.
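To have the script start after a reboot as well, one option is to launch it from `/etc/rc.local`. This assumes the system runs that file at the end of boot, as stock Debian does, and that the script lives at `/home/user/wd_stop_unload.sh` (an example path; substitute your own full path).

```bash
#!/bin/sh -e
#
# /etc/rc.local: run once at the end of each multiuser runlevel (Debian).
# Start the keep-alive script in the background; the path is an example.
nohup /home/user/wd_stop_unload.sh > /dev/null 2>&1 &

exit 0
```

If `/etc/rc.local` already exists, add only the `nohup` line above its final `exit 0` rather than replacing the file.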

Please also report back to us whether this really works for you. Meanwhile, as I said, we are working on a real fix, to be distributed through our update service.

We will keep you posted on the progress.

Post by **paolol61** » 21 Feb 2009, 05:37

I get this error:

```
bubba2:/home/paolol61# nohup wd_stop_unload.sh &
[1] 18952
bubba2:/home/paolol61# nohup: appending output to `nohup.out'
nohup: cannot run command `wd_stop_unload.sh': No such file or directory
```

Thanks.

Post by **joost** » 21 Feb 2009, 07:50

And I get the following error:

```
nano wd_stop_unload.sh
chmod +x wd_stop_unload.sh
nohup wd_stop_unload.sh &

nohup: appending output to `nohup.out'
nohup: cannot run command `wd_stop_unload.sh': No such file or directory
[1] 3409
[1]+  Exit 127                nohup wd_stop_unload.sh
```

Post by **johannes** » 21 Feb 2009, 09:39

Ah, my bad, was too fast there. Guide corrected above.
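For reference, the errors above happen because a command name without a slash, such as `wd_stop_unload.sh`, is looked up only in the directories listed in `$PATH`, which normally does not include the current directory. A name containing a slash (`./wd_stop_unload.sh` or a full path) is used as given. This small demonstration uses a throwaway script (`demo.sh`, a hypothetical name) in a temporary directory:

```bash
#!/bin/bash
# Demonstrate PATH lookup versus an explicit path, in a throwaway directory.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1

printf '#!/bin/bash\necho "script ran"\n' > demo.sh
chmod +x demo.sh

# Bare name: searched for in $PATH only (normally not found, since the
# current directory is not in PATH).
if command -v demo.sh > /dev/null; then
    echo "demo.sh found via PATH"
else
    echo "demo.sh not found via PATH"
fi

# Names containing a slash bypass the PATH search.
./demo.sh
"$demo_dir/demo.sh"
```

Applied to this thread, that means `nohup ./wd_stop_unload.sh &` from the directory where the script was saved, or `nohup /full/path/to/wd_stop_unload.sh &` from anywhere.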

Post by **paolol61** » 21 Feb 2009, 09:48

Thanks a lot.
It works.

Post by **johannes** » 21 Feb 2009, 10:14

Great. Does it also prevent the load cycle counter from increasing?

Post by **joost** » 21 Feb 2009, 11:48

Yep. This seems to work.
My counter now seems to stay put at 75848.

Thanks!

Post by **nitram** » 21 Feb 2009, 14:39

Unfortunately, the script does not stop the load cycle count from increasing on my disk:

```
Sat Feb 21 20:24:16 CET 2009
Device Model: WDC WD1000FYPS-01ZKB0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3453
193 Load_Cycle_Count 0x0032 083 083 000 Old_age Always - 351099
194 Temperature_Celsius 0x0022 112 110 000 Old_age Always - 40

Sat Feb 21 20:34:16 CET 2009
Device Model: WDC WD1000FYPS-01ZKB0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3453
193 Load_Cycle_Count 0x0032 083 083 000 Old_age Always - 351138
194 Temperature_Celsius 0x0022 112 110 000 Old_age Always - 40
```

Any other suggestions?

Martin

Post by **nitram** » 21 Feb 2009, 17:04

I changed the script to use the tail on /var/log/messages mentioned earlier in this thread, and that seems to slow it down:

```
Sat Feb 21 21:00:45 CET 2009
Device Model: WDC WD1000FYPS-01ZKB0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3454
193 Load_Cycle_Count 0x0032 083 083 000 Old_age Always - 351214
194 Temperature_Celsius 0x0022 113 110 000 Old_age Always - 39

Sat Feb 21 22:59:29 CET 2009
Device Model: WDC WD1000FYPS-01ZKB0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3456
193 Load_Cycle_Count 0x0032 083 083 000 Old_age Always - 351223
194 Temperature_Celsius 0x0022 112 110 000 Old_age Always - 40
```

Martin
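To put a number on "slows it down", here is a small sketch that turns two counter readings and their clock times (same day, HH:MM:SS) into cycles per hour, applied to Martin's readings above. Bash arithmetic is integer-only, so results are rounded down; the function names are illustrative.

```bash
#!/bin/bash
# Convert HH:MM:SS to seconds since midnight.
# The 10# prefix forces base 10, so values like "08" are not read as octal.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# cycles_per_hour START_TIME START_COUNT END_TIME END_COUNT (same day only)
cycles_per_hour() {
    local elapsed=$(( $(to_seconds "$3") - $(to_seconds "$1") ))
    if [ "$elapsed" -le 0 ]; then
        echo "end time must be after start time" >&2
        return 1
    fi
    echo $(( ($4 - $2) * 3600 / elapsed ))
}

echo "touch script: $(cycles_per_hour 20:24:16 351099 20:34:16 351138) cycles/hour"
echo "tail script:  $(cycles_per_hour 21:00:45 351214 22:59:29 351223) cycles/hour"
```

This prints 234 cycles/hour for the touch version and 4 cycles/hour for the tail version. RobV's readings just below (141499 to 141503 between 22:55:30 and 23:05:23) work out the same way to about 24 cycles/hour.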

Post by **RobV** » 21 Feb 2009, 17:15

No, the script does NOT stop the load cycle count from increasing on my disk.

```
Sat Feb 21 22:55:30 CET 2009
Device Model: WDC WD1000FYPS-01ZKB0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3340
193 Load_Cycle_Count 0x0032 153 153 000 Old_age Always - 141499
194 Temperature_Celsius 0x0022 106 104 000 Old_age Always - 46

Sat Feb 21 23:05:23 CET 2009
Device Model: WDC WD1000FYPS-01ZKB0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3341
193 Load_Cycle_Count 0x0032 153 153 000 Old_age Always - 141503
194 Temperature_Celsius 0x0022 105 104 000 Old_age Always - 47
```

ps -f output:

```
UID  PID  PPID  C STIME TTY          TIME CMD
root  29853 29820 0 22:48 pts/0 00:00:00 su
root  29854 29853 0 22:48 pts/0 00:00:00 bash
root  30007 29854 0 22:55 pts/0 00:00:00 /bin/bash /home/user/wd_stop_unload.sh
root  30455 30007 0 23:08 pts/0 00:00:00 sleep 5
root  30457 29854 0 23:08 pts/0 00:00:00 ps -f
```

Or is it just supposed to slow the process down a little?

Some background:
My B2 has been in use for almost 5 months, and I use it for torrent downloading now and then.

Post by **Puma** » 22 Feb 2009, 06:27

Hello,

I can confirm that the script does not stop the load cycles.

Since yesterday evening there have been about 2800 more load cycles with the script running.
(I checked in "top" that it was running.)

So I will now start torrenting.

WD type and serial:
WD5000ABPS-01ZZB0 / WD-WCASU5232253
It is an RE2, so it should handle 600,000 cycles...

I am now starting to worry!
- Should I replace the disk? (It seems to be making more noise... or am I imagining it?)
- Which HD should I buy?
- RAID 1 would be a great help.

Please come up with a solution soon.

Puma

Post by **johannes** » 22 Feb 2009, 08:46

Hi all,

Ok, sorry about this. We still don't have any disks here to test with (on their way back here now), so the "touch dummyfile" approach was a guess.

Does it help if you replace the line

```bash
touch /tmp/wdfix
```

with the following?

```bash
tail /var/log/messages > /tmp/wdfix
```

Thanks,

Post by **rewien** » 22 Feb 2009, 10:28


Hello Puma,

Have you tried this script?

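```bash
while true; do
    sleep 5
    # Note: on Linux, ">NUL" writes to a regular file named NUL in the
    # current directory (NUL is only a null device on Windows/DOS).
    tail /var/log/messages > NUL
done
```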

When you log in with PuTTY, just copy and paste the script.
It works for me: even after I close PuTTY, it stops the cycles from increasing.
But when you restart your Bubba, you have to add the script again.
Be sure to check the cycle count when you add the script to see whether it stops increasing.

While the script is running, everything, including the temperature, stays stable.

Of course this is a big problem. The best thing would be to clone the disk as a backup; someone suggested Clonezilla. But then again, we would need a disk that we know for sure is not affected by this problem to back it up onto.

Gr.Rewien

Post by **RobV** » 22 Feb 2009, 19:40

Hi Johannes,

The good news is: my load counter stays quiet with the script.
Unfortunately, I had to go all the way down to a sleep of 1 second!!!
Intermediate values of 4, 3 and even 2 seconds did not do the job; they still caused a few load cycle increases per minute...

Is it possible that my drive has somewhat different internal settings?

```
Mon Feb 23 01:17:03 CET 2009
Device Model:     WDC WD1000FYPS-01ZKB0
Serial Number:    WD-WCASJ1898184
Firmware Version: 02.01B01
```

I have experimented with both the original code 'touch /tmp/wdfix' and your suggested 'tail /var/log/messages > /tmp/wdfix'.
Both work fine, but I prefer the touch command as it should cause less drive access (every single second).

Right now my fine working script looks like this:

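```bash
#!/bin/bash
# Touch a file every second; on this drive, intervals of 2-4 seconds
# still let a few load cycles through.
while true; do
    sleep 1
    touch /tmp/wdfix
done
```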

I wonder what kind of wear this workaround could cause.
I'm just curious: does Linux overwrite the same spot on the disk over and over again, or does it move on to a different physical location to avoid the wear caused by rewriting the same file again and again?

Rob
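Since the interval needed seems to differ between drives (5 seconds was enough for some posters, 1 second was needed here), a variant of the script that takes the interval and target file as arguments may make such experiments easier. This is only a sketch, not Excito's fix; the function name and the optional COUNT argument are hypothetical, and COUNT exists only so the function can be tried out without looping forever.

```bash
#!/bin/bash
# Usage: wd_stop_unload.sh INTERVAL [FILE]
#   e.g. nohup /full/path/to/wd_stop_unload.sh 1 &
# Without arguments the script only defines the function and exits.

# keep_disk_busy INTERVAL FILE [COUNT]: touch FILE every INTERVAL seconds,
# COUNT times, or forever when COUNT is omitted or 0.
keep_disk_busy() {
    local interval=$1 file=$2 count=${3:-0} i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        sleep "$interval"
        touch "$file"
        i=$((i + 1))
    done
}

if [ $# -ge 1 ]; then
    keep_disk_busy "$1" "${2:-/tmp/wdfix}"
fi
```

For example, `nohup /full/path/to/wd_stop_unload.sh 2 &` tries a 2-second interval; check the counter for a while before settling on a value.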

Post by **johannes** » 23 Feb 2009, 01:10

Robv,

Just to double-check: that disk doesn't look like it came from us, correct? Anyway, a 1-second sleep timeout on the disk does sound very strange. What were your load cycle count and power-on hours before applying the fix?

forum.excito.org
