The Joys and Benefits of Application-Aware Copy Data Management

It was more than 10 years ago that Microsoft’s Steve Ballmer famously chanted “Developers! Developers! Developers!” while on stage. He’s taken a good deal of ribbing about it, but he was on to something. Since then, developers have become ever more critical to organizations of all kinds. Today, it doesn’t matter what industry you are in; odds are your company is, at least in part, a software company. Developers rule.

But this increased importance of developers and the databases they work with has led to increased stress on IT organizations. In fact, in worst-case scenarios, it even pits one side against the other. Why is this?

Gartner offers a straightforward way to explain the new dynamic through what it calls bimodal IT. Mode 1 is what you might think of as “traditional IT,” which has historically focused on values such as reliability, security and performance. Notice some words that are not there: fast, agile, responsive. Mode 1 IT is about keeping the lights on, keeping the bad guys out, and making the bits move as fast as they need to move. These are all good things, and they aren’t easy.

But Mode 2 is where developers and databases increasingly live. Mode 2 is all about fast, agile, responsive. Gone are the days of 2-year software development projects. Now organizations are striving for code updates that drop monthly, weekly, even daily. And their demands on IT match the speed at which the developers move. They want data, and they want it now.

This is where the friction happens. A developer will say, “I need a workable copy of Oracle today.” And IT responds, “OK, we’ll have that in about 2 weeks.” What’s a developer to do? Sometimes this leads to the use of synthetic datasets (one might also refer to this simply as “fake data”). These never work as well as real data, and the result is more bugs and slower software development. Other times the Dev/Test side of the house will expense spinning up a bunch of systems in the cloud, so-called “shadow IT.” This gives security and compliance teams gray hairs, as well it should, and still doesn’t provide an ideal development environment.  

Nobody is happy about any of this. Both sides are frustrated. Conceptually, the answer is easy: just make the Mode 1 side faster and more responsive! OK, but how? By hiring a lot more IT staff? That’s not happening. By buying double and triple the amount of storage and server hardware? That’s not happening either. And even if it did, there are too many manual procedures clogging up the works. Procedures don’t get twice as fast because you add twice as much hardware.  

Fortunately, the past few years have seen the advent of copy data management (CDM). The original focus of CDM was copy reduction. Copy sprawl was eating up resources and costing money, so CDM went to battle against wasteful space consumption. That’s still a legitimate problem, but it’s far less important than the problem of how to achieve development speed and agility. Extra copies might cost an organization some budget, but an organization that is not agile enough to compete could eventually end up turning the lights out permanently.

When it comes to developers and databases, CDM is about delivering copies of data to the people who need them; delivering copies quickly (within minutes); and delivering copies securely (with tight access controls).

Different vendors have different approaches to solving the problem, and it’s worth exploring multiple vendors to see which way best fits your needs. But ultimately what matters are results, whether you are on the IT side of the house or the development side.  When reviewing CDM solutions, look for one that will deliver the following capabilities.

Fast access to data copies. This is Job One, and the primary reason to consider CDM for development use cases. Copy delivery should only take minutes, whether using the latest copy of data (closest to current state) or a copy from days or even weeks earlier (may be required for troubleshooting). Most solutions will offer copy mounting ability, i.e., presenting a copy of a database to a server. But also consider copy manipulation, such as the ability to change the database name on the fly. The goal is to eliminate as many manual steps as possible.

Data security. Naturally, a good CDM solution will protect data even as it provides access to it. Making a copy accessible isn’t any good if everyone can see it. Look for solutions that offer proper access controls, preferably leveraging already existing access methods such as Active Directory or LDAP.

Full system delivery. Mapping to a data copy can save a great deal of time compared to traditional restore-from-a-backup approaches, but it can still leave many manual steps. Look for a CDM solution that can bring up an entire working environment: compute, networking and data, with the ability to manipulate all of these as required (e.g., to bring up systems on a different IP network).

User self-service. While this is not a core requirement, a self-service option can really help ease the tension between IT and dev. With a self-service model, developers and testers can access data copies whenever they need them, based on their work needs, without having to go through IT, submit a ticket, wait for approvals, or any of the processes that slow things down. On the IT side, a self-service capability needs to have enforceable guard rails so that developers cannot overstep their bounds, such as by asking for 50 copies instead of five, which could hurt system performance.
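
To make the guard-rail idea concrete, here is a minimal sketch in Python of a per-team copy limit. The class name CopyQuota, its methods, and the limit of five copies are all hypothetical, chosen only for illustration; a real CDM product would enforce limits through its own policy settings.

```python
from dataclasses import dataclass, field


@dataclass
class CopyQuota:
    """Hypothetical guard rail: caps the number of mounted copies per team."""

    max_copies_per_team: int
    active: dict = field(default_factory=dict)  # team name -> copies in use

    def request(self, team: str, count: int) -> bool:
        """Grant the request only if the team stays within its limit."""
        in_use = self.active.get(team, 0)
        if count <= 0 or in_use + count > self.max_copies_per_team:
            return False  # denied; IT can review the exception manually
        self.active[team] = in_use + count
        return True

    def release(self, team: str, count: int) -> None:
        """Return copies to the pool when a team is done with them."""
        self.active[team] = max(0, self.active.get(team, 0) - count)


quota = CopyQuota(max_copies_per_team=5)
print(quota.request("dev", 50))  # over the limit, so denied
print(quota.request("dev", 5))   # within the limit, so granted
```

The point of the sketch is that the limit is checked automatically at request time, so developers get an immediate answer and IT does not have to police each request by hand.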

Data masking capabilities. A good CDM solution will not just deliver data; it will deliver data that has already been through your required data masking protocols. In fact, being able to access data quickly might not be any help at all if that data can’t comply with required security protections.  
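
As a rough illustration of what masking before delivery can mean, the Python sketch below replaces sensitive fields with deterministic pseudonyms, so the same input always maps to the same masked value and joins across tables still line up. The field names, the salt, and the functions mask_value and mask_row are assumptions made for this example, not part of any particular CDM product; real masking rules are normally dictated by your compliance requirements.

```python
import hashlib

# Hypothetical list of columns that must never reach dev/test in clear text.
SENSITIVE_FIELDS = {"name", "email"}


def mask_value(value: str, salt: str) -> str:
    """Replace a value with a salted SHA-256 pseudonym (truncated for brevity)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "masked_" + digest[:12]


def mask_row(row: dict, salt: str = "example-salt") -> dict:
    """Mask only the sensitive fields; leave all other fields untouched."""
    return {
        key: mask_value(str(value), salt) if key in SENSITIVE_FIELDS else value
        for key, value in row.items()
    }


row = {"id": 42, "name": "Pat Smith", "email": "pat@example.com", "plan": "gold"}
print(mask_row(row))
```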

RESTful APIs. Automation is the order of the day, and a CDM solution should provide API-based access to functionality. This way, developers and testers can include copy data delivery as part of dev and test automation workflows. For example, a suite of application tests can call the CDM solution via its APIs and have copies of fresh data mounted to the test servers before the test suite begins.
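
The test-suite scenario described above might look something like the following Python sketch, which uses only the standard library. Everything vendor-specific here is an assumption: the base URL, the /copies/{id}/mount path, the target_host field, and the bearer-token header are invented for illustration, so check your vendor's API documentation for the real endpoints and payloads.

```python
import json
import urllib.request


def build_mount_request(base_url, copy_id, target_host, token):
    """Build a POST request asking a (hypothetical) CDM API to mount a copy."""
    payload = json.dumps({"target_host": target_host}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/copies/{copy_id}/mount",  # assumed endpoint layout
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


def mount_copy(base_url, copy_id, target_host, token, send=urllib.request.urlopen):
    """Send the mount request and return the decoded JSON response.

    The send parameter is injectable so the function can be exercised
    without a live CDM server.
    """
    request = build_mount_request(base_url, copy_id, target_host, token)
    with send(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request that would be sent; no network call is made here.
req = build_mount_request("https://cdm.example.com/api", "copy-123", "test-db-01", "secret")
print(req.get_method(), req.full_url)
```

In a test framework, a call like mount_copy would typically sit in the suite's setup step, so that every run starts against a freshly mounted copy.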

DevOps integration. Not all organizations have made the move to DevOps, but if DevOps is on the table either now or in the future, look for a CDM offering that already integrates with popular DevOps tools such as Chef, Puppet, Jenkins and so on. While an API set is really all you need to integrate CDM with DevOps, it’s helpful if the vendor has already provided sample integration scripts or plug-ins.

Each organization will have its own specific requirements, but the list above should provide a starting point for evaluating CDM as it relates to software development use cases. The bottom line is that dramatic improvements in infrastructure delivery times are possible with CDM, reducing copy wait times from days or week