IT infrastructure doesn’t typically ‘wear out’ so much as it becomes obsolete. Technology changes fast enough that newer models offer substantial benefits over systems that may be only a few years old. This is particularly true of controller hardware, where obsolescence is driven by faster processors and greater numbers of CPU cores, which follow a roughly 3- to 5-year refresh cycle. But storage systems are made up of more than just controllers. They also include disk drives and, more recently, solid-state drives (SSDs) or all-flash storage, which do wear out.
Historically, controllers and storage have had roughly comparable refresh cycles. But SSDs are now coming out with longer lifespans, roughly twice that of the CPUs that drive controller upgrades. This lifecycle mismatch can have a decided economic impact on the cost of storage systems, and justify a move towards disaggregation of media from controllers in storage system design.
The Evaluator Group has developed an economic model to calculate the potential savings of storage disaggregation. This model can be accessed on the Evaluator Group website.
NAND Flash Lifespan
Flash endurance is commonly expressed as drive writes per day (DWPD), the number of times a device’s full capacity can be written each day, which is extrapolated out to a lifespan in years. Historically, SSDs have been guaranteed for 3 or sometimes 5 years. But new developments in flash technology have pushed this to 7 or 10 years, or more. This will cause problems as customers realize they’ll be wasting money replacing storage media before its time if they want to upgrade storage system performance and functionality as soon as
Why is this happening?
Flash storage media has a finite lifespan that’s a function of the number of times its cells are erased, as the energy applied during each erasure degrades the substrate a small amount. Unlike magnetic disk drives, which can overwrite data in place, NAND flash is erased in whole blocks rather than individual bytes. To accommodate this, flash devices regularly run a ‘garbage collection’ step in which they consolidate the data that is still valid so the rest of a block can be erased. This copying increases the number of times data is physically written during its lifetime, an effect called “write amplification”, and effectively shortens flash lifespan.
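To make the relationship between endurance ratings, write amplification and lifespan concrete, here is a minimal Python sketch. The function names and every figure in it (a 4 TB drive, 3,000 erase cycles per cell, 2 TB of host writes per day) are illustrative assumptions rather than vendor data, and the model assumes perfectly even wear with no spare cells.

```python
def rated_host_writes_tb(capacity_tb, drive_writes_per_day, warranty_years):
    """Total host writes (TB) implied by a 'fills per day' endurance rating."""
    return capacity_tb * drive_writes_per_day * 365 * warranty_years


def write_amplification(nand_tb_written, host_tb_written):
    """Ratio of data physically written to flash vs. data the host asked to write."""
    return nand_tb_written / host_tb_written


def estimated_lifespan_years(capacity_tb, erase_cycles, host_tb_per_day, waf):
    """Years until every cell reaches its assumed erase-cycle limit."""
    total_flash_writes_tb = capacity_tb * erase_cycles   # lifetime write budget
    flash_tb_per_day = host_tb_per_day * waf             # host writes plus garbage-collection copies
    return total_flash_writes_tb / flash_tb_per_day / 365


# Hypothetical drive: 4 TB, rated for 1 fill per day over 5 years.
print(rated_host_writes_tb(4, 1, 5))

# Same hypothetical drive under two assumed write amplification factors.
for waf in (3.0, 1.5):
    years = estimated_lifespan_years(4, 3000, 2, waf)
    print(f"write amplification {waf}: about {years:.1f} years")
```

Under these assumed numbers, halving write amplification from 3.0 to 1.5 doubles the estimated lifespan, from about 5.5 years to about 11 years.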
As flash has become mainstream, technology advances have improved the endurance of flash devices and flash storage systems. Higher density flash technologies have made it feasible to increase over-allocation, providing more of the spare cells each SSD uses to replace those that wear out. Flash manufacturers have also learned how to minimize the power used in the erasure process, reducing its impact on the flash substrate that’s at the foundation of flash endurance. Storage controllers are also taking steps to reduce write amplification.
Writes are now regularly cached and then coalesced into a more sequential data stream, one that is written to the media in a less random fashion. This process keeps larger segments of associated data physically together on the flash media, reducing the amount of additional copying during garbage collection.
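The caching and coalescing step can be sketched in a few lines of Python. This is a simplified illustration, not how any particular controller works: the function name, the use of logical block addresses as keys, and the sample writes are all assumptions made for the example.

```python
def coalesce_writes(pending):
    """Turn a cache of random writes into sorted, contiguous runs.

    pending: list of (block_address, payload) tuples in arrival order.
    Returns a list of (start_address, [payloads]) runs.
    """
    latest = {}
    for address, payload in pending:
        # A later write to the same block replaces the cached one,
        # so the overwritten data never reaches the flash media.
        latest[address] = payload

    runs = []
    for address in sorted(latest):
        if runs and address == runs[-1][0] + len(runs[-1][1]):
            runs[-1][1].append(latest[address])     # extends the current run
        else:
            runs.append((address, [latest[address]]))  # starts a new run
    return runs


# Hypothetical burst of five random writes, one of them an overwrite of block 3.
cached = [(7, "a"), (3, "b"), (8, "c"), (4, "d"), (3, "e")]
print(coalesce_writes(cached))
```

In this example five random writes reach the media as two sequential runs, and the first write to block 3 is absorbed by the cache entirely.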
What does this mean?
This disparity between the refresh rates of storage controller technologies and flash storage media has an economic impact on traditional storage systems. Replacing SSDs before their time in order to get the latest controllers, for example, adds to the long-term cost of storage in two ways: the SSDs themselves are replaced more often, and data must be migrated from the old system to the new one after each of these upgrades.
Disaggregating the storage controller from the storage media allows the controller to be upgraded while the data stays in place on the SSDs, usually in enclosures. It also enables companies to change out storage media when it does make sense, such as when major capacity or performance increases are released, even though the controller may not be ready for an upgrade.
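A rough way to see the effect is to compare the two approaches over a fixed planning horizon. The Python sketch below is a deliberately simple illustration and is not the Evaluator Group model; the function names, the refresh cycles and all cost figures are hypothetical, and it assumes costs stay flat over time.

```python
import math


def purchases(horizon_years, cycle_years):
    """Number of purchases needed to cover the horizon, including the first."""
    return math.ceil(horizon_years / cycle_years)


def coupled_cost(horizon, controller_cycle, controller_cost, media_cost, migration_cost):
    """Media is replaced, and data migrated, every time the controllers are."""
    n = purchases(horizon, controller_cycle)
    return n * (controller_cost + media_cost) + (n - 1) * migration_cost


def disaggregated_cost(horizon, controller_cycle, media_cycle,
                       controller_cost, media_cost, migration_cost):
    """Controllers and media are each replaced on their own cycle;
    data is migrated only when the media is replaced."""
    n_controllers = purchases(horizon, controller_cycle)
    n_media = purchases(horizon, media_cycle)
    return (n_controllers * controller_cost
            + n_media * media_cost
            + (n_media - 1) * migration_cost)


# Hypothetical inputs: 10-year horizon, 5-year controller cycle, 10-year media life.
print(coupled_cost(10, 5, controller_cost=100, media_cost=300, migration_cost=20))
print(disaggregated_cost(10, 5, 10, controller_cost=100, media_cost=300, migration_cost=20))
```

With these assumed inputs the coupled system costs 820 units over ten years and the disaggregated one 500, the difference being one avoided media purchase and one avoided migration.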
We’re starting to see a movement towards disaggregation with the rise of software-defined storage (SDS) and the popularity of “commodity” hardware. SDS has decoupled storage functions and features from the hardware, the first step towards making it easier to change out these resources independently of each other. The Open Storage Platform (OSP) is a new technology category that incorporates software-defined storage solutions. More information is available in the Evaluator Series Research on OSP.
Facebook recently discussed their use of disaggregated storage with their “Lightning Flash Sled” at the Flash Memory Summit 2016. The purpose of disaggregation in this context (replacing embedded PCIe flash) is a little different from the lifespan disparity discussed in this blog, but the concept of improving flexibility and economics is the same.
For a more complete picture of the economics behind disaggregation, see the interactive economic model. There is also a paper and video explaining the use of this tool.