Spanning the Dataverse with a Unified Data Strategy and Data Persona Analytics – Part 4: A Complex Problem
In the first installment of this series, we discussed the phenomenon of the big data explosion and its impact on traditional data management, and we proposed a foundation for a cohesive approach to the design of modern data platforms. In the second installment, we suggested that “An Alternative Approach” exists and that the widely accepted approach to managing data should be challenged. In the third installment, we showed how this approach is rooted in the “personality” of each type of data and in the necessity of a platform design that can leverage data at the moment its value is highest. In this fourth installment, we illustrate these concepts through a classic example that demonstrates the higher value of data when it is leveraged on a state-of-the-art infrastructure.
The architecture that emerges from a unified data strategy (UDS), introduced in the first installment of this series, combined with the efficiency and flexibility of a software-defined data center, will form the essential core of modern data-centric IT. The methods of the science of data persona analytics will allow for the development of clear definitions as well as a deep understanding of every unique dataset. The evolution to these approaches is being driven by the developing recognition of the utility and respective value of data in all forms. The open-ended variations of data and the seemingly unlimited amounts of data in every form constitute the source of limitless innovation. The classic models and systems will not only survive but thrive, as they will be joined in a UDS by modern data management systems that exist to manage modern forms of data. The data will flow freely between these systems to satisfy the aforementioned business requirements and functions.
The systems that manage these new forms of data, as part of what is effectively a data-enabled cloud, will need all of the capability, flexibility and elasticity that are commonly associated with the most sophisticated cloud implementations. Therefore, the most effective systems will run on heavily virtualized infrastructure. Only a true platform of virtualized hardware, one that provides these capabilities as innate aspects of the platform, can facilitate boundless data flow between applications with disparate requirements for injection, access, manipulation, and protection. Such systems will be comprehensive in their ability to manage data in every conceivable dimension to meet the requirements of the modern business. Consequently, data will be discovered, identified and ultimately treated as the most valuable asset in the 21st century economy.
Maximize the Value of Your Data
Companies must endeavor to first recognize all forms of information as different dimensions and manifestations of data. At that point, each company must determine through Data Persona Analytics the individual attributes and requirements of each and every distinct dataset. Then they will determine which data management systems to use to manage that dataset. Only at that point will they be able to construct a true Unified Data Strategy to comprehensively and holistically manage the data and optimize its aggregate value.
The idea of the “moment of attention” introduced in part 3 of this series emphasizes the time-sensitive nature of accessing the right data at the right time. Datasets may be transient or permanent in nature, but the value of that data is directly related to the amount of time that it is meaningful to someone under some set of circumstances. Clearly the “moment of attention” and the actual length of the “moment” vary greatly, but regardless of that variation, the value of the data diminishes once the “moment” has passed. The use case below describes the massive amount of data that is generated by each occurrence of that great American icon, a baseball game. It should be obvious that the length of the “moment” changes, as it depends on the intended use of the various assortments of data generated.
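The relationship between the “moment of attention” and the value of data can be sketched in a few lines of code. The example below is only an illustration and not a model proposed in this series: the function name, the parameters and the choice of a half-life decay after the “moment” has passed are all assumptions.

```python
def data_value(initial_value, age_seconds, moment_seconds, half_life_seconds):
    """Estimate the remaining value of a piece of data.

    Assumption: the data keeps its full value for the length of its
    "moment of attention", then loses half of its remaining value every
    half_life_seconds. The decay curve is hypothetical.
    """
    if age_seconds <= moment_seconds:
        return initial_value
    overdue = age_seconds - moment_seconds
    return initial_value * 0.5 ** (overdue / half_life_seconds)


# A hypothetical play-by-play data point with a 30 second "moment"
# and a 60 second half-life once that moment has passed.
for age in (10, 90, 150):
    print(age, data_value(100.0, age, moment_seconds=30, half_life_seconds=60))
```

Under these assumptions, a dataset with a very long “moment” would be modeled with a very large moment_seconds, while a transient dataset would have a short “moment” and a short half-life.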
A future part five of this series will endeavor to prescribe an approach and depict an example of a rudimentary data persona analytics exercise. We will expand on the variety of data described below, all of which emanates from the American pastime, by showing how to use simple survey questions, basic algorithms and weighted parameters to define an entire unified data strategy. Ultimately, we will endeavor to describe “the product” rather than just a methodology, which will enable firms to adopt a UDS.
A Common Use-Case for UDS - A Day at the Ballpark
Consider a professional baseball game or any other popular professional sporting event. When a fan sits in the upper deck of Dodger Stadium in Los Angeles, or any other sporting arena on earth, the fan is happily distracted from the real world. For the time period of the event the focus of the fan varies from the tactical aspects of the next play to the strategic implications of that game’s results. The fan is immersed in the game in many dimensions from the taking of photographs and videos to the recording of individual data points related to the events on the field to be shared instantly or at a later time with friends or some greater community.
Ultimately, professional sports constitutes a trillion-dollar industry, an industry whose product is on the surface entertainment; when one barely pierces that thin veneer, however, it quickly becomes clear that the more significant product is data. The aforementioned fan is a consumer of this data, as are billions of other fans. And if that fan also happens to be a data scientist, the plethora of data in various forms and at disparate levels of value becomes clear. The main product of America’s pastime is not entertainment; it is data.
A base hit may result in a run for the batting team and may or may not give the batter a “run batted in” (RBI). The RBI is a cherished data point: a typical major league baseball player might translate it into many thousands of dollars, and when it is aggregated with other positive statistics, thousands of dollars become tens of millions of dollars. This example starts with the classic statistics of American baseball. Classic stats are simple, and most fans are aware of them at least in general terms. These stats are very important to the history of the game, and they appear in standard tuple/column formats in an almost unlimited number of repositories of data pertaining to the sport. For years, however, these stats survived only in the archives of newspapers and in official documents in the offices of the sport’s authorities.
In modern times, classic stats have a vast array of uses and formats. Each game has an “official scorer” who serves as the arbiter of how each play on the field is described. Those stats, which are simple row-and-column data, are sacrosanct and must be preserved for eternity in the vaults of the sport along with its most valued artifacts and relics. This data’s persona is obvious: 100% data protection is an absolute requirement, so only the most carefully designed RDBMS with the best physical infrastructure will be adequate for this set of data.
But there is so much more. Every pitch and every play produces a seemingly unlimited amount of extraneous and often superfluous data that may or may not be of value to someone at some time. As the sport has evolved, new analytical techniques have been developed to examine varied sets of data at levels of detail that could not have been conceived of decades ago. Some of this data can be derived from other data, but some is completely unique and, to a certain degree, subjective.
New stats with equally obscure acronyms have emerged, such as “wins above replacement” (WAR) and “ERA+”, along with a plethora of other metrics, some of them purely subjective assessments such as those that estimate a fielder’s range. This is the realm of the mathematicians and statisticians who define the sport via the ideas of “sabermetrics.” This advanced data may be no more than an extension of the classic stats, and only on occasion may it provide vital insight, but in either case it has an enormous impact. Countless organizations and websites are dedicated to the ideas and the analysis behind this relatively new analytical methodology. Individuals who previously would not have dreamed of becoming significant figures in a professional sport are sought out by upper management with the same enthusiasm with which it might recruit a star player. Numeric data of all types is captured through an unlimited array of sources, defined through some mechanism, filtered with specific criteria and dispersed by any number of apparatuses.
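To make the idea of derived data concrete, the sketch below computes a classic stat, the earned run average (ERA), from two recorded values and then derives a simplified form of ERA+ from it. The function names are hypothetical, and the ERA+ shown here deliberately omits the ballpark adjustment that published versions of the stat apply, so it should be read as an approximation only.

```python
def earned_run_average(earned_runs, innings_pitched):
    """Classic stat: earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9.0 * earned_runs / innings_pitched


def era_plus_simplified(pitcher_era, league_era):
    """Simplified ERA+: league ERA relative to the pitcher's ERA, scaled to 100.

    Assumption: no ballpark adjustment is applied, unlike the published stat.
    A result above 100 means the pitcher was better than the league average.
    """
    if pitcher_era <= 0:
        raise ValueError("pitcher_era must be positive")
    return 100.0 * league_era / pitcher_era


# Hypothetical season line: 50 earned runs over 150 innings, league ERA of 4.50.
era = earned_run_average(50, 150)
print(era, era_plus_simplified(era, 4.5))
```

Nothing new is recorded on the field to produce either number; both are derived entirely from data that the official scorer already captured.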
The fantasy baseball industry cannot exist without this data, and the massive amount of advertising income associated with it would vanish without it. Each fantasy league owner, whether a corporation or an informal neighborhood league run from someone’s personal laptop, must utilize subsets of this data. The data must be dispersed in real time, and the applications that use the data must be massively scalable and provisioned immediately through self-service. The resulting data, however, most often has minimal protection requirements, and simple freeware data management systems will usually suffice.
The same data, however, will be utilized by the professional clubs’ management when considering bonuses and the value and length of contracts. The professional club will also use many other forms of data that neither the gamers nor competing clubs will have access to, such as personality tests, medical tests, specific scientifically based athletic tests and more.
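The contrast between these datasets can be sketched as a small data persona exercise. The example below is a minimal illustration and not the methodology of this series: the DataPersona attributes, the numeric protection scale, the thresholds and the names of the recommended stores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataPersona:
    """Hypothetical set of attributes describing one dataset."""
    name: str
    protection: float        # 0.0 (disposable) to 1.0 (must never be lost)
    real_time: bool          # must be dispersed as events happen
    elastic_scale: bool      # must scale up and down on demand
    restricted_access: bool  # visible only to its owner


def recommend_store(persona):
    """Map a persona to a class of data management system (illustrative rules)."""
    if persona.protection >= 0.99:
        return "transactional RDBMS on hardened infrastructure"
    if persona.restricted_access:
        return "access-controlled private store"
    if persona.real_time and persona.elastic_scale:
        return "elastically scaled freeware store"
    return "general-purpose store"


official_stats = DataPersona("official scorer stats", 1.0, False, False, False)
fantasy_feed = DataPersona("fantasy league feed", 0.2, True, True, False)
club_medical = DataPersona("club medical tests", 0.9, False, False, True)

for persona in (official_stats, fantasy_feed, club_medical):
    print(f"{persona.name}: {recommend_store(persona)}")
```

The rules are intentionally crude. The point is only that datasets generated by the same game can carry very different personas and therefore land on very different systems within one unified data strategy.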