Establishing data relevance can help archiving strategy, Storage Soup blog by Randy Kerns

By Randy Kerns, Wednesday, March 30th 2011

As many reports have highlighted, the massive amounts of data being created today are a concern for IT professionals, especially those who manage storage.

These concerns involve how to process the data, what to keep, and where to put it. Specific issues include how to present the information, which storage to use, what requirements apply to the data, and how much it will cost to retain it. Most of this newly created data takes the form of files.
The massive amount of data being created is great news for storage vendors, because it means more storage is required. For organizations, however, storing all of this new data may be unsustainable because of the cost, as well as the physical space and power that storage systems consume.

All of this data may require a new approach to storage and archiving: establishing data relevance as part of the analytics performed when data is ingested. Determining relevance at ingest implies that the data is analyzed immediately. For example, data received from a source (monitoring equipment, feedback data, etc.) would go straight into a data analytics process. The relevant source data would be sent to an archive storage system while the analytic processing continued, and the valuable information, in intermediate form, would be retained on the analytic nodes or on a shared storage system.
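The ingest flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a product design: `Record`, `is_relevant`, `ingest`, and the "normal band" of 10 to 90 are hypothetical; a background thread and an in-memory queue stand in for the archive system so that analysis keeps running while relevant data is handed off; and records judged not relevant are simply not archived.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Record:
    source: str   # hypothetical source id, e.g. "sensor-1"
    value: float  # raw reading from monitoring equipment


@dataclass
class IngestResult:
    archived: list = field(default_factory=list)      # source data sent to the archive
    intermediate: dict = field(default_factory=dict)  # per-source stats kept on the analytic node


def is_relevant(rec: Record, low: float = 10.0, high: float = 90.0) -> bool:
    # Hypothetical rule: readings outside the normal band are relevant.
    return not (low <= rec.value <= high)


def ingest(records) -> IngestResult:
    result = IngestResult()
    archive_q = queue.Queue()

    def archive_writer() -> None:
        # Stand-in for the archive storage system; drains the queue in the background.
        while True:
            rec = archive_q.get()
            if rec is None:  # sentinel: ingest finished
                break
            result.archived.append(rec)

    writer = threading.Thread(target=archive_writer)
    writer.start()

    for rec in records:
        # Analytics sees every record; the intermediate result stays on this node.
        stats = result.intermediate.setdefault(rec.source, {"count": 0, "total": 0.0})
        stats["count"] += 1
        stats["total"] += rec.value
        # Relevant source data is handed to the archive without pausing the analysis.
        if is_relevant(rec):
            archive_q.put(rec)

    archive_q.put(None)
    writer.join()
    return result


if __name__ == "__main__":
    readings = [Record("sensor-1", 50.0), Record("sensor-1", 95.5), Record("sensor-2", 3.2)]
    out = ingest(readings)
    print([r.value for r in out.archived])  # [95.5, 3.2]
    print(out.intermediate["sensor-1"])     # {'count': 2, 'total': 145.5}
```

Only the out-of-band readings reach the archive, while the running statistics for every reading stay with the analytics step.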

The source data sent to the archive would remain available for data mining or reprocessing if required. The archive system would handle data protection, applying a one-time protection to new data based on the requirements established by the business.
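A minimal sketch of one-time protection at the archive, assuming that "protection" means writing a policy-defined number of copies plus a SHA-256 checksum when the data first arrives. `ProtectionPolicy`, `POLICIES`, and `Archive` are hypothetical names, and plain dicts stand in for separate archive devices or sites; a real archive might use erasure coding or another scheme instead.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionPolicy:
    copies: int  # number of independent copies the business requires


# Hypothetical business requirements, keyed by data class.
POLICIES = {
    "regulatory": ProtectionPolicy(copies=3),
    "operational": ProtectionPolicy(copies=2),
}


class Archive:
    def __init__(self, targets: int = 3) -> None:
        # Each dict stands in for a separate archive device or site.
        self.targets = [{} for _ in range(targets)]
        # key -> (SHA-256 digest recorded at ingest, number of copies written)
        self.catalog = {}

    def store(self, key: str, data: bytes, data_class: str) -> None:
        # Protection is applied once, when the data enters the archive.
        if key in self.catalog:
            raise ValueError(f"{key} is already archived")
        policy = POLICIES[data_class]
        if policy.copies > len(self.targets):
            raise ValueError("policy requires more copies than there are targets")
        digest = hashlib.sha256(data).hexdigest()
        for target in self.targets[: policy.copies]:
            target[key] = data
        self.catalog[key] = (digest, policy.copies)

    def retrieve(self, key: str) -> bytes:
        # Return the first copy that still matches its checksum, e.g. for mining or reprocessing.
        digest, copies = self.catalog[key]
        for target in self.targets[:copies]:
            data = target.get(key)
            if data is not None and hashlib.sha256(data).hexdigest() == digest:
                return data
        raise OSError(f"no intact copy of {key}")


if __name__ == "__main__":
    archive = Archive()
    archive.store("sensor-1/2011-03-30", b"95.5", "operational")
    archive.targets[0]["sensor-1/2011-03-30"] = b"corrupt"  # simulate damage to one copy
    print(archive.retrieve("sensor-1/2011-03-30"))  # b'95.5'
```

Because the copies and checksum are created at ingest, there is no separate backup pass later; a damaged copy is detected on read and an intact one is returned.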

The processing to establish data relevance could be performed by the analytics engine or as an advanced function of a storage system. The data relevance engine would move relevant data to the most appropriate location according to a set of rules applied to the results of the analysis. Some data could be retained on primary storage, but the majority would be stored directly on a more economical archiving system.
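The rule-driven placement could look something like the following sketch. The rules, the analysis fields (`score`, `reads_per_day`, `age_days`), and the tier names are hypothetical; the point is only that ordered rules evaluated against analysis results send a small share of data to primary storage and everything else to the archive by default.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical output of the analytics step for one data set.
Analysis = Dict[str, float]


@dataclass
class Rule:
    name: str
    matches: Callable[[Analysis], bool]
    tier: str


# Ordered rules: the first rule that matches decides the location.
RULES: List[Rule] = [
    Rule("frequently read", lambda a: a["reads_per_day"] >= 10, "primary"),
    Rule("recent and high value", lambda a: a["age_days"] <= 7 and a["score"] >= 0.9, "primary"),
]
DEFAULT_TIER = "archive"  # most data goes straight to the economical archive


def place(analysis: Analysis, rules: List[Rule] = RULES) -> str:
    for rule in rules:
        if rule.matches(analysis):
            return rule.tier
    return DEFAULT_TIER


def plan(items: Dict[str, Analysis]) -> Dict[str, List[str]]:
    # Group data set names by the tier the rules assign them to.
    placement: Dict[str, List[str]] = {}
    for name, analysis in items.items():
        placement.setdefault(place(analysis), []).append(name)
    return placement


if __name__ == "__main__":
    items = {
        "a": {"score": 0.95, "reads_per_day": 2, "age_days": 1},
        "b": {"score": 0.5, "reads_per_day": 25, "age_days": 30},
        "c": {"score": 0.6, "reads_per_day": 0, "age_days": 3},
        "d": {"score": 0.4, "reads_per_day": 1, "age_days": 60},
    }
    print(plan(items))  # {'primary': ['a', 'b'], 'archive': ['c', 'd']}
```

An ordered, first-match rule list keeps the placement policy easy to read and audit; a weighted scoring model would be another way to express the same idea.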

This may not be an entirely new model, but it reduces the number of steps and the time it takes to manage the data. Making a solution like this available to IT would have high economic value and deliver an immediate benefit in dealing with the massive amounts of data being created.