Being Agile While Data Comes In and Goes Out

Dealing with data warehouses, data marts, and even data lakes can be awkward in an agile environment. While adding a single metric to a dashboard can be very natural, no one builds a dimension table a few columns at a time. This awkwardness has led to many variations in how agile methodology is applied to analytics databases.

Sometimes folks keep the veneer of “agile” while defining a deliverable as a component along the way, such as a pertinent design or perhaps a requirements document, rather than delivering something of concrete value to the end user. Trouble can arise if an organization chooses to see a single delivered report, dashboard, or even metric as the “thing of value” for the agile team’s sprint when that team’s effort depends on data that does not yet exist within the data mart or data warehouse.

Or perhaps an organization is sophisticated enough to split off the database side of development and establish a separate Kanban stream that serves as a resource for multiple agile teams delivering code of one sort or another. Business users may request specific data through the Kanban stream; this assumes they do enough ad hoc query work that having the requested data available in the database counts as delivered value to them. Conveniently, the agile teams are waiting for that very same new data to appear. If the Kanban stream can keep ahead of the agile teams’ data needs, things can progress smoothly.

Another option, albeit rarely seen, is to redefine the customer of the story or feature. Data going into an analytics database does not come in the same-sized chunks as data going out of it for a dashboard: a source may add dozens of data elements to the database, while a given report may need only a handful of data points. In this way, loading the data is a different amount of work from using it in a dashboard. If the users benefiting from the work were defined as the dashboard developers rather than the ultimate business users of the dashboard, the value received would clearly be the availability of a new data source. With this new data in hand, the dashboard developers could begin work on their own tasks, stories, and features that use it. The mental block with this approach seems to be that dashboard developers are not considered members of the business and therefore not people who need to receive value. It is a curious idea to ponder, since everyone works within the business, don’t they?

The reason no one builds a dimension table one column at a time is that adding extra data items while pulling in a new source is often trivial. Going back and reworking existing code to add a column or two is also a smallish effort, but it is significantly more work than including those columns in the first place. It is more natural to pull in all the data items that will likely be needed at once, in that initial iteration. Pulling data in and using data in analysis are very separate tasks, with separate focal points and even separate tool sets and skills. An agile team that tries to bring in a new data source and then use that source in a new or enhanced dashboard within a single sprint is often making a chaotic mistake that only creates opportunities for the team to fail.
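To make the contrast between data going in and data going out concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is an assumption made for illustration: the dim_customer table, its columns, the sample rows, and the two helper functions are hypothetical and stand in for whatever loading and reporting tools a team actually uses. The load step brings in every attribute the source offers in one pass, while the dashboard-side query touches only one of them.

```python
import sqlite3

# Hypothetical source extract: many attributes arrive together.
SOURCE_ROWS = [
    {"customer_id": 1, "name": "Acme", "region": "West",
     "segment": "Retail", "tier": "Gold", "signup_year": 2019},
    {"customer_id": 2, "name": "Birch", "region": "East",
     "segment": "Wholesale", "tier": "Silver", "signup_year": 2021},
    {"customer_id": 3, "name": "Cedar", "region": "West",
     "segment": "Retail", "tier": "Bronze", "signup_year": 2022},
]


def load_dimension(conn, rows):
    """Data going in: create dim_customer with every column in the extract."""
    columns = list(rows[0].keys())
    conn.execute(f"CREATE TABLE dim_customer ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO dim_customer VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )
    return columns


def customers_per_region(conn):
    """Data going out: a dashboard metric that needs only one column."""
    cursor = conn.execute(
        "SELECT region, COUNT(*) FROM dim_customer "
        "GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
loaded_columns = load_dimension(conn, SOURCE_ROWS)
report = customers_per_region(conn)
print(len(loaded_columns), "columns loaded;", "report rows:", report)
```

In this sketch, six columns are loaded so that one metric can be reported today. The other five are already in place when the next dashboard story needs them, which is the sense in which the two kinds of work come in different-sized chunks.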

As an industry, we need to embrace a means to remain agile even when we are doing things very differently from building a web interface or app. And perhaps we need to also embrace the idea that business and IT are not an “us versus them” dynamic. We are all part of the business.