Recently we have been doing quite a bit of database testing in the Evaluator Group labs, using a variety of hardware and storage systems. We noticed some performance issues which reminded me that I needed to write a blog on this, since the issues we were seeing have been around for quite some time. However, due to increasing system performance, these issues are becoming an ever-larger problem.
An ongoing joke in the IT industry since at least the 1990’s is that if you ask a DBA what the cause of their problems is, they will blame storage. If you ask storage admins what the problem is, they blame the database people. Well, like many jokes there is a kernel of truth here, because in fact storage is quite often a bottleneck for many transactional databases. However, finding and fixing the problem is difficult, in part due to the way systems report performance.
Let me explain with a little bit of background before I dive into the issues. We have been running the HammerDB workload tool on several system configurations, looking primarily at the performance impact of different storage solutions. For those who are not familiar with HammerDB, it is an open-source tool that can run standard transactional or data warehouse workloads against a variety of databases and operating systems. The primary measurement provided by HammerDB is the number of “New Orders per Minute” that are processed, which is a function of the speed of the system, including processors, memory and of course storage.
Some of our recent testing is comparing the impact of different storage on the transaction rate of a “TPC-C like” workload. The OS and database we are using are Windows Server 2022 along with SQL Server 2019. What we noticed is that the monitoring tools within Windows report VASTLY different performance results than do the storage system monitoring tools. Specifically, the response times reported are off by two orders of magnitude.
While running workloads, we found that the external storage system tools report that all is well, with storage response times of under 1 milli-second for both reads and writes. Meanwhile, Windows along with SQL Server Management Studio (SSMS) are reporting response times averaging more than 100 ms.
Even more interesting is the fact that this occurs regardless of the external SAN storage system we use and remains true with external iSCSI storage, lest anyone concur that the problem must be FC SANs. Windows consistently reports response times that are about 100X greater than what the storage system itself is reporting.
Now, to make this even more interesting, this discrepancy does not always exist, it is workload dependent, and usually seen with bursty workloads like HammerDB. In order to verify this, we ran a different workload using the identical storage configuration.
We found that running a SQL backup of our database to the same storage system, Windows reported a response time of 2 milli-seconds, while the storage system reported 1.9 milli-seconds. In this case, the values are nearly identical.
“How can this be?” you may ask, and rightfully so.
By now I hope I have piqued your interest, I’m sure many of you are asking “What is going on and how can I accurately measure I/O response times?”
Well, it turns out this is a great question. We did some additional investigation and found that while running a typical HammerDB workload, I/O’s can momentarily spike up to nearly 20,000 outstanding I/O’s, and then drop back down to less than 100. During these spikes, the response time climbs dramatically, due to queuing delays. The Windows OS is accurately accounting for all delays in the round-trip of an I/O, including the local MPIO drivers, the SCSI driver and then the I/O to the storage along with queueing incurred. In contrast, the storage system only sees the few hundred outstanding I/O’s sent to it at any one time, and accordingly attributes the sub milli-second response times it is providing.
In essence, both points of view are correct, but the storage system never knows about the additional thousands of I/O’s being queued up on the host.
That leads to the next question, which is “Why are I/O’s queueing between my host and storage system?” Like many seemingly simple questions, the answer can get complicated very quickly. If you have been following advances in storage technology, you know that a new protocol based upon memory semantics has been gaining adoption over the past several years.
The NVMe protocol is designed to replace other (meaning primarily SATA and SCSI) protocols with a more modern protocol that has significantly less overhead along with numerous other improvements. We at Evaluator Group have been talking and writing about NVMe for at least five years now and I’m sure many of you are very familiar with NVMe also.
It turns out that NVMe is designed to solve EXACTLY the type of problem encountered here. Specifically, occasions when ultra-low latency response times are required, along with the ability to support thousands of outstanding I/O’s.
By now I’m hoping you are saying to yourself, “Yes, great so what is the solution to the problem?” Which is a perfect time to tell you that you’re going to have to wait for part two of this blog, where I go into details on potential solutions. Fear not, we will not be pointing to some vendor’s product as the answer to all your problems. Instead, I will provide a discussion of some potential solutions and options, along with additional data points we collected during our testing.
Stay tuned for Part II of “Why DBA’s and Storage Admins are Both Right.”