A Few Thoughts on Copy Data Management

As we mentioned in last month’s blog, Copy Data Management (CDM) is an area of enterprise data management that has been gaining in popularity and relevance.  CDM is about locating and managing all the copies of your data, wherever they live, and optimizing the storage of those copies.  We have been evaluating the CDM market and wanted to share some initial thoughts.

  • The problem of too many copies is real. Storage technologies have made it easy to copy our data, and we take full advantage of that.  Many of these copies are needed and have official business value – snapshots for restore, full backups for air gaps, replication for disaster recovery, long-term retention for compliance, operational copies for DevTest and training, etc.  Other copies are just as useful but more unofficial in nature – extra copies of backups just to be safe, copies generated by application, VM, and database admins for self-recovery, end user copies of data, etc.  Within many IT organizations, a large number of these copies sit idle (too numerous, too old, forgotten) across several storage silos.  As a result, secondary storage can represent the majority of an organization’s storage capacity, with some industry reports saying it’s half of total capacity while others indicate it can be as high as 80%.
  • The desire to use copies of data is growing. Organizations want to derive more business value from their production data, and that desire continues to grow even as the problem of too many copies persists.  Groups working on DevOps automation, data analytics, machine learning, reporting, and employee training are looking for copies that are as close to the actual production data as possible and refreshed on a regular basis.  In some cases, sensitive fields (like a social security number) need to be masked so that the format stays correct but no real information is exposed.
  • The benefits of CDM can be significant. The premise of today’s CDM solutions is that they can minimize the problem of too many copies by discovering which copies already exist and providing centralized management of all data copies going forward.  The benefits can include reduced storage hardware costs (primary and secondary) from reclaimed storage and slower storage growth, lower storage software costs (especially for products with capacity-based licensing), improved data management through better visibility into where data lives and how it is used, and improved compliance with current and future data privacy regulations.
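
The masking idea mentioned above (keep the format, hide the real value) can be illustrated with a small Python sketch.  This is only one possible approach under our own assumptions: the function name mask_ssn, the secret key, and the keyed-hash technique are hypothetical and are not taken from any particular CDM product.

```python
import hashlib
import hmac


def mask_ssn(ssn: str, key: bytes) -> str:
    """Swap each digit for a pseudorandom digit, keeping dashes and length.

    Assumption: a keyed hash (HMAC-SHA256) is an acceptable source of
    replacement digits. The same input and key always give the same mask,
    so masked copies stay consistent with each other.
    """
    digest = hmac.new(key, ssn.encode("utf-8"), hashlib.sha256).hexdigest()
    # Convert the hex digest to decimal digits; padded to 78 characters,
    # which is plenty for short identifiers such as a 9-digit SSN.
    digits = iter(str(int(digest, 16)).zfill(78))
    return "".join(next(digits) if ch.isdigit() else ch for ch in ssn)


if __name__ == "__main__":
    # Hypothetical key and sample value, for illustration only.
    print(mask_ssn("123-45-6789", key=b"demo-key"))
```

The masked value still looks like NNN-NN-NNNN, so test and training systems that validate the format keep working, while the original number never leaves production.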

As with many of today’s IT solutions, CDM solutions take several different approaches to discovering and identifying data copies and to orchestrating how existing and new copies are coordinated, managed, and optimized going forward.  There are basically two architectural approaches to copy data management, which we refer to as in-place CDM and capture CDM.

  • In-place CDM. In-place CDM solutions discover data copies on existing server and storage resources, catalog the copies, and then orchestrate the management of existing and future copies using the same infrastructure resources.  Advantages of the in-place approach are that organizations can continue to use the infrastructure solutions they use today, little data has to cross the network, and additional workloads like DevOps can run on the same storage stack that is actually used for production.
  • Capture CDM. Capture CDM solutions capture a copy of the data from applications and servers, create a single instance or golden copy of the data, and then orchestrate the management of all copies from that golden image.  Capture CDM solutions are typically delivered as physical or virtual appliances with new secondary storage.  Advantages of this approach are the consolidation of secondary storage onto a new platform, the creation of a single system of record for data, and a highly integrated solution which often includes data protection and archiving capabilities.
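
To make the "discover and catalog" step more concrete, here is a toy Python sketch that walks a directory tree, groups files with identical content by hash, and estimates how much space the redundant copies consume.  It is a simplified illustration under our own assumptions: real CDM products generally work with snapshots, storage arrays, and application metadata rather than plain files, and the names catalog_copies and reclaimable_bytes are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def catalog_copies(root: str) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; return groups with 2+ copies."""
    catalog: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch,
            # a real tool would hash large files in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            catalog[digest].append(path)
    return {d: paths for d, paths in catalog.items() if len(paths) > 1}


def reclaimable_bytes(duplicates: dict[str, list[Path]]) -> int:
    """Space freed if every group were reduced to a single copy."""
    return sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicates.values()
    )


if __name__ == "__main__":
    dups = catalog_copies(".")
    for digest, paths in dups.items():
        print(digest[:12], [str(p) for p in paths])
    print("reclaimable bytes:", reclaimable_bytes(dups))
```

Keeping one file per hash group is loosely analogous to the golden copy in the capture approach, while reporting on the duplicates where they already sit is closer in spirit to the in-place approach.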

Based on these initial observations of the market, it’s easy to see why there is growing interest in CDM solutions.  That leads to the next level of questions around CDM, such as:  What are the key features of a CDM solution?  Is CDM the new Backup?  Does it replace Archiving?  How does it work across on-premises and cloud?  Which vendors offer CDM solutions?

Time for more evaluation and another blog.