How to Compare the Cost of HCI Systems – ITPro Today Blog by Eric Slack

By Eric Slack, Saturday, March 9th 2019

Every year, Evaluator Group conducts a survey-based study of HCI in the enterprise. The study is designed to gauge attitudes toward and acceptance of this technology in large companies, and to understand which products are being considered and how they’re being evaluated. In the current and previous years’ studies, the two most important product attributes were performance and economics, characteristics that can be difficult to measure in HCI systems.

Traditional storage metrics are inadequate for a comprehensive solution that includes compute and storage, and traditional cost calculations ignore the relationship with performance. In this blog we’ll discuss what each of these characteristics means, how they’re related and how they can impact a hyperconverged infrastructure (HCI) evaluation.

VMs per Node

HCIs are configured and deployed based on how many VMs they need to support, typically using a baseline VM-per-node estimate for the initial cluster configuration. As VMs are added, IT monitors the applications to determine when the system has reached capacity, usually when performance drops below an acceptable threshold. Then, more nodes are added to the cluster to support more VMs. The challenge with this process is that it relies on an upfront estimate of how many nodes will be required for the initial configuration, which can lead to overprovisioning. It also doesn’t give you an accurate basis for comparing the cost of different HCI systems at the outset, since you have no way of knowing if or how many more VMs could be supported with the initial configuration.
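To make the sizing step concrete, here is a minimal sketch of how an initial node count follows from a VMs-per-node estimate. The function name, the 400-VM target and both estimates are hypothetical, chosen only for illustration.

```python
import math


def nodes_required(target_vms: int, vms_per_node: int) -> int:
    """Return the number of nodes needed to host target_vms, rounding up."""
    if target_vms < 0 or vms_per_node <= 0:
        raise ValueError("target_vms must be >= 0 and vms_per_node must be > 0")
    return math.ceil(target_vms / vms_per_node)


# Hypothetical figures: 400 VMs to host.
# A cautious upfront estimate of 40 VMs per node...
estimated_nodes = nodes_required(400, 40)
# ...versus a (hypothetical) measured capacity of 55 VMs per node.
measured_nodes = nodes_required(400, 55)

print(f"Nodes from estimate: {estimated_nodes}")
print(f"Nodes from measured capacity: {measured_nodes}")
print(f"Overprovisioned nodes: {estimated_nodes - measured_nodes}")
```

With these made-up numbers, the cautious estimate buys two nodes more than the measured capacity would call for, which is the overprovisioning risk described above.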

Application performance is a function of the compute and storage performance of the HCI nodes. Because all the leading HCIs use Intel processors, compute performance comes down to clock speed and the number of CPU cores available. Given the high core counts available with modern servers used in HCIs, compute performance can quickly outpace storage performance, making storage the gating factor for total performance.

In storage systems, performance is typically expressed in terms of throughput (GB per second) and I/O operations per second (IOPS). These measurements can be taken with HCI solutions as well, reflecting the ability of the software-defined storage (SDS) layer to transfer data and complete storage input and output transactions. But this is only a component of the performance story for HCI.

Performance and Latency

Throughput and IOPS each measure performance in the aggregate, specifically how much data can be moved into or out of the system or how many I/O operations can be completed in a given timeframe. The basis for these two numbers is another performance characteristic, latency, measured in microseconds (usec). This is the time it takes a single transaction to pass through the entire storage system, from input to output. Latency is particularly variable in software-intensive architectures, like HCI systems that have SDS at their foundation. The way the software distributes data around the cluster, the way it handles data services and the overall efficiency of the software stack all figure into this latency number.
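One common way to see how latency underlies the aggregate numbers is Little's Law, which in steady state ties the number of I/Os in flight to IOPS and average latency. The sketch below is a simplification under that steady-state assumption; the queue depth, latency and block size are hypothetical values, not measurements from any HCI product.

```python
def iops_from_latency(outstanding_ios: int, latency_us: float) -> float:
    """Little's Law in steady state: IOPS = I/Os in flight / average latency (seconds)."""
    if latency_us <= 0:
        raise ValueError("latency_us must be positive")
    return outstanding_ios * 1_000_000 / latency_us


def throughput_gb_per_s(iops: float, block_size_bytes: int) -> float:
    """Throughput in GB per second = IOPS x block size in bytes / 10^9."""
    return iops * block_size_bytes / 1e9


# Hypothetical example: 32 I/Os in flight, 500 usec average latency, 4 KiB blocks.
iops = iops_from_latency(outstanding_ios=32, latency_us=500.0)
gbps = throughput_gb_per_s(iops, block_size_bytes=4096)

print(f"IOPS: {iops:,.0f}")
print(f"Throughput: {gbps:.3f} GB/s")

# Doubling latency at the same queue depth halves IOPS.
slower_iops = iops_from_latency(outstanding_ios=32, latency_us=1000.0)
print(f"IOPS at 1,000 usec: {slower_iops:,.0f}")
```

The point of the sketch is only that, for a fixed amount of concurrency, any latency added by the software stack shows up directly as lower IOPS and throughput.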

While performance metrics can tell you a lot about the storage architecture, they can’t tell you which HCI system will be the best for a given environment. This is because performance is workload-dependent; it varies from one application to the next and, more importantly, depends on the mix of workloads a given system is subjected to. The concept of workload mix is especially pertinent in HCIs since these systems are very often used as a consolidation platform for many applications. In fact, the Evaluator Group studies mentioned earlier show that consolidation is the most common use case for HCI systems.

Workload dependency is why it’s most useful to express HCI performance in terms of the number of VMs a given configuration can support while keeping latency below an acceptable threshold. The best way to determine that number is to measure the latency of real-world applications running in VMs on a given HCI configuration, then add VMs until the system can no longer hold latency under that threshold. This is the process that companies use to determine when they need to expand an HCI cluster, as described above.
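The add-VMs-until-latency-breaks procedure can be sketched as a simple loop. Everything here is an assumption for illustration: `measure_latency_us` stands in for whatever monitoring or test harness reports latency at a given VM count, and `toy_latency_us` is a made-up model, not data from a real cluster.

```python
from typing import Callable


def max_vms_under_threshold(
    measure_latency_us: Callable[[int], float],
    threshold_us: float,
    max_vms: int = 1000,
) -> int:
    """Add VMs one at a time and return the largest count that stays within the threshold."""
    supported = 0
    for vm_count in range(1, max_vms + 1):
        if measure_latency_us(vm_count) > threshold_us:
            break  # latency exceeded the threshold; the previous count is the limit
        supported = vm_count
    return supported


def toy_latency_us(vm_count: int) -> float:
    """Hypothetical latency model: a 400 usec floor that rises as VMs are added."""
    return 400.0 + 0.5 * vm_count ** 2


supported_vms = max_vms_under_threshold(toy_latency_us, threshold_us=2000.0)
print(f"VMs supported under a 2,000 usec threshold: {supported_vms}")
```

In practice the measurement function would run a real or synthetic workload and report observed application latency; the loop structure stays the same.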

VM and VDI Performance Testing

In lieu of running a test on real VMs, representative workloads can be generated by an appropriate testing platform and run on the HCI cluster, adding copies of these workloads until the latency threshold is reached. VMmark is an industry standard for measuring server VM workloads that is used by testing suites such as IOmark VM to calculate the maximum number of VMs an HCI system can support. Dividing this VM total by the number of nodes in the HCI cluster gives the VMs-per-node metric that’s needed to determine the true cost of an HCI solution. Similar tests can be run in a VDI environment to determine an accurate number of virtual desktops a given HCI cluster can support.
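As a rough illustration of how the VMs-per-node metric feeds a cost comparison, the sketch below divides a tested VM total by the node count and derives a hardware cost per VM. The system names, VM totals and per-node prices are hypothetical, and the calculation covers purchase cost only, not a full TCO.

```python
from dataclasses import dataclass


@dataclass
class HciResult:
    name: str
    nodes: int            # nodes in the tested cluster
    max_vms: int          # VMs supported under the latency threshold
    cost_per_node: float  # hypothetical purchase price per node

    @property
    def vms_per_node(self) -> float:
        return self.max_vms / self.nodes

    @property
    def cost_per_vm(self) -> float:
        return self.cost_per_node * self.nodes / self.max_vms


# Hypothetical test results for two four-node clusters.
system_a = HciResult("System A", nodes=4, max_vms=240, cost_per_node=30_000.0)
system_b = HciResult("System B", nodes=4, max_vms=160, cost_per_node=24_000.0)

for system in (system_a, system_b):
    print(
        f"{system.name}: {system.vms_per_node:.0f} VMs per node, "
        f"${system.cost_per_vm:,.2f} per VM"
    )
```

With these made-up figures, System B is cheaper per node but more expensive per VM, which is why a per-node price alone is a poor basis for comparison.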

The comprehensive, scale-out architecture of HCIs makes them flexible and easier to deploy, but it also makes performance and cost more difficult to determine. An accurate measure of VMs or virtual desktops per HCI node is needed for a meaningful total cost of ownership (TCO) calculation. A testing platform that incorporates realistic virtual machine and virtual desktop workloads can provide these data. One such platform is IOmark.