Sunday, January 18, 2009

Can consumers trust disk drive MTBF?

Linked to this post is Seagate's knowledge base, where owners of Seagate's 1 TB 7200.11 drives (including me) will find six steps for determining whether their Seagate or Maxtor hard drive needs new firmware, and what to do if they lose data on that drive. The problem is serious enough that Seagate is offering to recover lost data for free.

Hard drive failures are second only to Windows corruption among my ongoing computing headaches this decade. I have two Hitachi 500 GB Deskstar drives in the basement pile of electronic junk. Both failed catastrophically within a year of ordinary use.

Hard drive manufacturers quote average hard drive life as Mean Time Between Failures (MTBF). Ordinary consumer drives are in the 500,000-hour MTBF range, while enterprise (i.e., more expensive) hard drives can have MTBFs of up to 1.5 million hours. Since there are 8,760 hours in a 24x7 year, a 500,000-hour average MTBF is a lot of years. Right? Yeah, 57 years is the answer. The key word in this consumer claim is "average". Real drives last anywhere from fifteen minutes to fifteen years.
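
The hours-to-years conversion above is easy to check. The Python sketch below also estimates the share of drives expected to fail within one year of 24x7 use, assuming a constant failure rate (the 1 - exp(-h/MTBF) model that comes up in the comments). That assumption and the function names are illustrative only, not any vendor's published method.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a 24x7 year

def mtbf_in_years(mtbf_hours: float) -> float:
    """Convert an MTBF quoted in hours into years of continuous 24x7 operation."""
    return mtbf_hours / HOURS_PER_YEAR

def annualized_failure_rate(mtbf_hours: float) -> float:
    """Fraction of drives expected to fail within one 24x7 year.

    Assumes a constant failure rate, so P(fail by t) = 1 - exp(-t / MTBF).
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

for label, mtbf in [("consumer", 500_000), ("enterprise", 1_500_000)]:
    print(f"{label}: {mtbf:,} h MTBF = {mtbf_in_years(mtbf):.1f} years, "
          f"{annualized_failure_rate(mtbf):.2%} expected to fail in year one")
```

Under that assumption, a 500,000-hour MTBF still implies that roughly 1.7% of drives fail within their first year, which is why an "average" of 57 years says little about any single drive.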

But based on my own miserable experience, I challenge these vendor 500,000-hour MTBF claims as misleading and unproven. I'd like to see a state attorney general document what the real expected life of a consumer hard drive is, and how consumers should treat their drives to maximize that life. For instance, does letting Windows shut down an idle disk drive after 10 minutes or so cause thermal stress from continual power-cycle starts and stops?

What is your experience with hard drive longevity in a consumer environment?

7 comments:

  1. Peter,

    I've had VERY bad luck over the years with IBM/Hitachi "DeskStar" 3.5" hard disk drives. Premature failure (less than 1 year of run time) was the common thread. As a matter of fact, there is a slang name for the "DeskStar" drives: "DeathStar". On the other hand, I've had excellent experiences with the IBM/Hitachi 2.5" laptop drives. I've NEVER had one of the IBM/Hitachi laptop drives fail on me. For laptop drives, based on 18 years of experience with them, I'd rate Toshiba 1st, IBM/Hitachi 2nd, Fujitsu 3rd, WD 4th and, WAY BACK, Seagate 5th. For 3.5" drives I'd rate WD 1st, Fujitsu 2nd, Seagate 3rd and, again WAY BACK in last place, IBM/Hitachi.

    The MTBF is a math calculation based on the cumulative MTBF of the parts in the drive. Realistically, it is FICTION.

  2. Anonymous 5:33 AM

    I had a Toshiba 2.5 inch PATA drive fail in one of my laptops after about 4 years of use.

    Of course, that's just one data point.

  3. Anonymous 12:51 PM

    MTBF of a drive is obtained by multiplying a large quantity of drives (thousands) by the number of hours they run between failures in the batch. For example, if a disk manufacturer batch-tested 1,500 units of a hard disk and saw, on average, 30 days of operation between individual unit failures in the batch, then the MTBF of the disk is 1,500 x 30 x 24 hours = 1,080,000 hours, or roughly 1 million hours.

    Replies
    1. You can't just boost the 'mean' value by testing more disks to the same value. You have to divide your total hours by the number of drives tested!

  4. DANIEL 1:35 AM

    The MTBF is the result of (the number of units tested * hours tested) divided by the number of failures. To get the probability of failure, use the following equation: 1 - exp(-hours expected to be used / MTBF).
    (The exponential formula is an approximation that extrapolates the failure probability beyond the hours actually tested.)

    Additionally, you can find information on the web about how room temperature, not to mention dust and altitude, can affect MTBF. This is the part the manufacturers don't want you to know.

    To be representative, the MTBF figure has to rest on larger numbers (meaning more test hours and more units tested). A good policy would be to require manufacturers to publish their MTBF figures according to a minimum standard.

    In the IT industry, MTBF is what determines the minimum amount of spare equipment you have to buy for stock.

  5. The meaning of MTBF is that, after the declared number of hours, there is a 63% probability that your hard disk has failed (using the formula below, which assumes a constant failure rate).

    An MTBF of 500,000 hours means that after 50,000 hours the probability of failure is 9.5%.

    Here is a table of failure probabilities for a hard disk with an MTBF of 500,000 hours.

    MTBF = 500,000 hours

    Hours      Probability of failure
    10,000     2.0%
    20,000     3.9%
    30,000     5.8%
    40,000     7.7%
    50,000     9.5%
    100,000    18.1%
    150,000    25.9%
    200,000    33.0%
    250,000    39.3%
    300,000    45.1%
    350,000    50.3%
    400,000    55.1%
    450,000    59.3%
    500,000    63.2%

    How much risk of failure are you willing to accept?
    If you don't want to risk more than 5%,
    you should replace the hard disk before 30,000 hours (the risk is already 5.8% at that point).

    If you are comfortable with more risk, you can keep it longer.
    The manufacturer can't decide for you how much risk you are willing to take.

    I think a 5% risk is enough, so after 25,000 hours it's time to replace the hard disk.

    The formula is 1 - exp(-h/MTBF), where h is the number of operating hours.

  6. What effect does the amount of read/write activity have on the MTBF?
    Are there guidelines on the maximum recommended activity per hour or similar?

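
The MTBF arithmetic debated in comments 3, 4 and 5 can be checked with a short Python sketch. It uses the point estimate from comment 4 (total unit-hours divided by the number of failures), the cumulative failure formula 1 - exp(-h/MTBF) from comments 4 and 5, and the inverse of that formula to find when a chosen risk threshold is reached. The function names are illustrative, and the formula assumes a constant failure rate, which real drives (with early "infant mortality" and late wear-out) do not strictly follow.

```python
import math

def mtbf_from_test(units: int, hours_each: float, failures: int) -> float:
    """Point estimate: total unit-hours of operation divided by number of failures."""
    if failures <= 0:
        raise ValueError("need at least one failure for this point estimate")
    return units * hours_each / failures

def failure_probability(hours: float, mtbf: float) -> float:
    """P(failure by `hours`) = 1 - exp(-hours / MTBF), assuming a constant failure rate."""
    return 1.0 - math.exp(-hours / mtbf)

def hours_for_risk(risk: float, mtbf: float) -> float:
    """Invert the formula above: hours at which the failure probability reaches `risk`."""
    return -mtbf * math.log(1.0 - risk)

# Comment 3's example: 1,500 drives, one failure every 30 days across the batch,
# i.e. 1,500 * 720 unit-hours of operation per failure.
print(f"{mtbf_from_test(1500, 30 * 24, 1):,.0f} hours")  # 1,080,000 hours

mtbf = 500_000
for h in (10_000, 50_000, 100_000, 250_000, 500_000):
    print(f"{h:>7,} h  {failure_probability(h, mtbf):6.1%}")
print(f"5% risk reached after {hours_for_risk(0.05, mtbf):,.0f} hours")
```

With the 500,000-hour figure, the inverse puts the 5% threshold at roughly 25,600 hours, in line with comment 5's suggestion to replace the disk at around 25,000 hours.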

