Conclusion from Google: Memory Chips More Unreliable Than Previously Believed
Over the last two-and-a-half years, search engine giant Google analyzed the performance of the thousands of computers it uses in its own data centers. The surprising trend? Based on real-world data, Google concluded that the error rates of memory chips are higher than previously believed. Much, much higher:
How many errors? On average, about one in three Google servers experienced a correctable memory error each year and one in a hundred an uncorrectable error, an event that typically causes a crash.
Older research suggested that a memory chip would fail, on average, around 200 to 5,000 times per billion hours of operation. Google’s project revealed a much higher rate: 25,000 to 75,000 failures over the same period.
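To put those per-billion-hour figures on a human timescale, here is a back-of-the-envelope conversion to failures per chip per year. This is an illustration only: it assumes the quoted rate applies to a single chip running continuously, which the article does not spell out.

```python
# Back-of-the-envelope conversion of Google's quoted failure rates from
# "per billion hours" to "per chip-year". Assumes one chip running
# continuously; 8,760 hours/year ignores leap years.

HOURS_PER_YEAR = 24 * 365          # 8,760
BILLION_HOURS = 1e9

for rate in (25_000, 75_000):      # failures per billion hours (Google's range)
    per_year = rate * HOURS_PER_YEAR / BILLION_HOURS
    print(f"{rate:,} per billion hours ~ {per_year:.2f} failures per chip-year")
```

At the low end that works out to roughly one failure every four to five years for a single chip, which is at least in the same ballpark as the one-in-three-servers-per-year figure once you remember that a server holds many chips.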
To be clear, those numbers aren’t significant for the typical consumer. A billion hours is equal to more than 100,000 years, after all, worth at least ten thousand lifetimes. Nevertheless, Google’s research should be useful for projects or businesses that have to rely on lots of memory chips to get the job done. A prominent example is, of course, other companies that maintain their own data centers.
Of course, Google couldn’t pass up the opportunity to market its own technical expertise, discussing the various technologies it uses to protect end users from crashes caused by memory errors. Apparently “error-correcting code” (ECC) and “chipkill” are just two of the techniques Google relies on to protect you from evil data corruption. (Source, thanks to Sheree for the pic!)
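For a flavor of what “error-correcting code” means in practice, here is a toy sketch of a Hamming(7,4) code: four data bits are stored with three parity bits, and any single flipped bit can be detected and corrected. Real ECC memory uses wider codes (and chipkill spreads data across chips so a whole failed chip can be tolerated), but the core idea is the same; the function names here are made up for the example.

```python
# Toy Hamming(7,4) sketch: 4 data bits + 3 parity bits, corrects any
# single flipped bit -- the same idea (at miniature scale) behind the
# ECC memory Google describes.

def encode(d):
    """d: list of 4 data bits -> 7-bit codeword at positions 1..7."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def correct(c):
    """c: 7-bit codeword, possibly with one flipped bit -> data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1          # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]   # extract d1..d4

data = [1, 0, 1, 1]
word = encode(data)
word[4] ^= 1                          # simulate a single-bit memory error
assert correct(word) == data          # the error is corrected transparently
```

An uncorrectable error, in this picture, is what happens when more bits flip than the code can repair: the syndrome points at the wrong position (or the damage is detected but not fixable), and the system typically crashes, which is the one-in-a-hundred case the article mentions.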