Scott McNealy, the former CEO of Sun Microsystems (since acquired by Oracle), once infamously opined that storage was a (mere) feature of the server. The problem was that, at the time he made that comment, the storage industry was writing its declaration of independence. Fibre Channel-based SANs were consolidating and replacing direct attached storage (DAS) architectures in many of the world’s large data centers. IP-based network attached storage (NAS) systems were consolidating and replacing print and file servers, much to the chagrin of both McNealy and Steve Ballmer.
Vendors with a server agenda have traditionally assumed that storage just goes along for the ride. Indeed, many vendors that sell both servers and storage will say that when the two are sold as a package deal, buyers often give server considerations higher priority. On the other hand, one very prominent storage vendor was very successful at displacing a server vendor’s storage with its own server-plus-storage deals. After all, storage is where data lives. Storage keeps data alive.
Server vs. storage attitudes are now giving way to server + storage convergence: not just server racks and storage arrays packaged together, but a closer coupling of server and storage technologies. One driver of this recent trend has been server virtualization. VMware and Microsoft, for example, now offer application programming interfaces (APIs) that integrate the storage and data management functions living within intelligent storage arrays with management applications such as vCenter and System Center.
Another way to couple servers and storage more closely is to use a storage device’s native processing power to run applications. Yes, you heard me correctly: a storage platform that runs applications. Purpose-built appliances have already seen an uptick in adoption across many applications. The next iteration will be to leverage storage-based compute power by installing I/O-intensive applications directly on the storage, resulting in greater efficiency and productivity.
Compute-plus-storage convergence is an idea that has been kicked around in storage circles for a few years. Shortly after EMC bought VMware, an analyst colleague proclaimed that the acquisition meant EMC intended to partition its Symmetrix array controllers so that it could run data-intensive apps on Symmetrix. While that prediction hasn’t panned out yet, who knows? Maybe he’ll ultimately be proven right when future versions of EMC’s VMAX are announced.
I think it more likely that, later this year, we’ll see the first real adoption of general-purpose storage systems, specifically scale-out storage platforms with data-intensive applications built in as part of the storage system. Scale-out NAS systems such as EMC Isilon and IBM SONAS, for example, are not there yet. I’ve written that Big Data Storage and Big Data Analytics are two different things, separated by an architectural wall commonly known as “shared nothing.”
Hadoop’s shared-nothing clusters favor relatively small units of storage, either spinning disk or solid-state, embedded within each processing node, rather than sending I/Os to, for example, a general-purpose NAS platform. To maximize performance, data is placed as close as possible to Hadoop’s processing power to limit system latency. But suppose you were to do the opposite, i.e., move the processing power to where the data lives. That’s what I believe the computing industry will see later this year: the convergence of Big Data storage and Big Data analytics.
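To make the locality idea concrete, here is a minimal sketch built on a toy cluster model. It assumes a hypothetical scheduler that prefers to run each task on a node already holding a replica of the task’s input block, and falls back to any node with a free slot only when no local placement is possible. The function name `schedule`, the block IDs and the node IDs are illustrative assumptions, not Hadoop’s actual scheduler API.

```python
from collections import deque


def schedule(tasks, replicas, free_slots):
    """Toy locality-aware scheduler (hypothetical, not Hadoop's API).

    tasks:      list of block IDs, one task per block
    replicas:   dict block ID -> list of node IDs storing that block
    free_slots: dict node ID -> number of free task slots
    Returns (assignments, local_count).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignments = {}
    local_count = 0
    pending = deque()

    # Pass 1: data-local placement, i.e. move the compute to the data.
    for block in tasks:
        node = next((n for n in replicas.get(block, []) if slots.get(n, 0) > 0), None)
        if node is not None:
            slots[node] -= 1
            assignments[block] = node
            local_count += 1
        else:
            pending.append(block)

    # Pass 2: remote placement; these tasks must pull their data over the network.
    while pending:
        block = pending.popleft()
        node = next((n for n, s in slots.items() if s > 0), None)
        if node is None:
            break  # no capacity left; remaining tasks stay unscheduled
        slots[node] -= 1
        assignments[block] = node

    return assignments, local_count


if __name__ == "__main__":
    replicas = {"b1": ["n1", "n2"], "b2": ["n2"], "b3": ["n3"]}
    free_slots = {"n1": 2, "n2": 1, "n3": 0}
    assignments, local = schedule(["b1", "b2", "b3"], replicas, free_slots)
    print(assignments)  # {'b1': 'n1', 'b2': 'n2', 'b3': 'n1'}
    print(local)        # 2
```

The second pass is the case a shared-nothing design tries to avoid: each remote assignment (here, `b3` landing on `n1` because `n3` has no free slot) means shipping data across the network before any computation can begin.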