Data Persona Analytics: The Data-Application Matchmaker

Spanning the Dataverse with a Unified Data Strategy and Data Persona Analytics – Part 3: Data Persona Analytics and the Moment of Attention

In the first installment of this series, we discussed the shortcomings of traditional data management, the explosion of information as data, and the importance of a cohesive approach to designing new infrastructures. We introduced the concept of a Unified Data Strategy (UDS) as the foundation for that approach. In the second installment, we suggested that there is an alternative to classic data management methodologies. In this, the third installment, we describe how the characteristics of data, and the way the value of that data changes over time, inform the design of that foundation.

The 21st-century approach to data management will be far more flexible and comprehensive than previous approaches. Datasets need to be managed according to their individual merits and attributes, rather than being structured around the limitations of a dominant model or a particular product. A UDS will allow a business to manage its most important asset, its data, comprehensively. This does not mean that the value, or even the level of use, of the classic RDBMS that the industry has embraced over the past four decades will evaporate. Rather, the use of each technology, including the RDBMS, will require more careful scrutiny, because modern forms of data require new and innovative management strategies.

In an ideal scenario, data will flow between applications based on each application's interest in that data, rather than being unnecessarily restricted and therefore minimally accessible. Most importantly, data will be filtered according to dynamic demands, so that the subset of data that flows in response to a request is appropriate to that request.
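
One way to picture interest-based flow is a simple subscription filter. The sketch below is a minimal illustration, not a method described in this series: it assumes each application registers a predicate stating which records it cares about plus the fields it needs, and the names `Subscription` and `route` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Subscription:
    """Hypothetical: an application's declared interest in a subset of data."""
    app_name: str
    interest: Callable[[Record], bool]  # predicate evaluated on each record
    fields: List[str]                   # only these fields are delivered


def route(records: List[Record], subs: List[Subscription]) -> Dict[str, List[Record]]:
    """Deliver to each application only the records and fields it asked for."""
    delivered: Dict[str, List[Record]] = {s.app_name: [] for s in subs}
    for rec in records:
        for sub in subs:
            if sub.interest(rec):
                # Project the record down to the requested fields only.
                delivered[sub.app_name].append({f: rec[f] for f in sub.fields if f in rec})
    return delivered


if __name__ == "__main__":
    orders = [
        {"id": 1, "region": "EU", "amount": 120.0, "card_token": "tok_a"},
        {"id": 2, "region": "US", "amount": 15.5, "card_token": "tok_b"},
        {"id": 3, "region": "EU", "amount": 9.99, "card_token": "tok_c"},
    ]
    subs = [
        Subscription("eu_reporting", lambda r: r["region"] == "EU", ["id", "amount"]),
        Subscription("fraud_check", lambda r: r["amount"] > 100, ["id", "card_token"]),
    ]
    result = route(orders, subs)
    print(result["eu_reporting"])  # [{'id': 1, 'amount': 120.0}, {'id': 3, 'amount': 9.99}]
    print(result["fraud_check"])   # [{'id': 1, 'card_token': 'tok_a'}]
```

Each application sees only its own slice: the reporting job never receives card tokens, and the fraud check never receives low-value orders.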

As data management matures, new methodologies will be employed to categorize and filter data so as to present exactly the desired dataset. The “personality” of a particular dataset should be determined before its method of management is selected. A new science called Data Persona Analytics (DPA) is emerging. DPA is defined as the science of determining the static and dynamic attributes of a given dataset in order to construct an optimized infrastructure that manages and monitors data injection, alteration, analysis, storage and protection while facilitating data flow. Each unique dataset, whether transient or permanent, has a descriptive Data Personality Profile that can be determined through analysis using the methodologies of DPA.
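
A Data Personality Profile could be represented in many ways. As a minimal sketch, assuming that static attributes are fixed when a dataset is defined and that dynamic attributes are derived from observed access over a time window, a profile might look like this (`StaticAttributes`, `DataPersonalityProfile` and every field name are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class StaticAttributes:
    """Hypothetical traits fixed when the dataset is defined."""
    business_function: str
    transient: bool
    must_protect: bool


@dataclass
class DataPersonalityProfile:
    """Hypothetical profile: static traits plus dynamic traits measured over time."""
    name: str
    static: StaticAttributes
    access_log: List[float] = field(default_factory=list)  # access timestamps in seconds

    def record_access(self, ts: float) -> None:
        self.access_log.append(ts)

    def dynamic(self, now: float, window: float = 3600.0) -> Dict[str, float]:
        """Derive dynamic attributes from accesses in the last `window` seconds."""
        recent = [t for t in self.access_log if now - window <= t <= now]
        return {
            "accesses_in_window": float(len(recent)),
            "seconds_since_last_access": (
                now - max(self.access_log) if self.access_log else float("inf")
            ),
        }


if __name__ == "__main__":
    profile = DataPersonalityProfile(
        "clickstream", StaticAttributes("marketing", transient=True, must_protect=False)
    )
    for ts in (0.0, 1800.0, 3500.0):
        profile.record_access(ts)
    print(profile.dynamic(now=4000.0))
    # {'accesses_in_window': 2.0, 'seconds_since_last_access': 500.0}
```

Keeping the two kinds of attribute separate lets the static traits stay immutable (`frozen=True`) while the dynamic traits are recomputed as usage, and therefore value, changes over time.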

Although this science has yet to be widely accepted, its various components already exist, albeit mostly informally. The science of DPA will guide the appropriate application of the wide variety of data management systems. Data scientists of the 21st century will endeavor to perfect the algorithms that constitute this new science. To accomplish the goals of Data Persona Analytics, those algorithms will be developed in a manner similar to the way psychological personality tests were developed.

The business function of a given dataset must be determined first, because that business function shapes the persona of the dataset. It is also important to note that applications not only consume data but often transform ingested data into completely new datasets with entirely different personas.
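
To make the point about transformation concrete, here is a minimal sketch in which an application rolls per-customer transactions up into daily totals. The helper `persona_of` is a hypothetical stand-in for a real DPA assessment and checks only size, shape and whether any field is flagged as sensitive:

```python
from collections import defaultdict
from typing import Any, Dict, List, Set

Record = Dict[str, Any]


def persona_of(records: List[Record], sensitive: Set[str]) -> Dict[str, Any]:
    """Hypothetical persona summary: row count, field set, and protection need."""
    fields: Set[str] = set()
    for r in records:
        fields.update(r)  # collect every field name seen
    return {
        "rows": len(records),
        "fields": sorted(fields),
        "needs_protection": bool(fields & sensitive),
    }


def daily_totals(transactions: List[Record]) -> List[Record]:
    """Application-style transformation: per-customer detail -> per-day aggregates."""
    totals: Dict[str, float] = defaultdict(float)
    for t in transactions:
        totals[t["day"]] += t["amount"]
    return [{"day": d, "total": amt} for d, amt in sorted(totals.items())]


if __name__ == "__main__":
    SENSITIVE = {"customer"}
    raw = [
        {"day": "mon", "customer": "c1", "amount": 10.0},
        {"day": "mon", "customer": "c2", "amount": 5.0},
        {"day": "tue", "customer": "c1", "amount": 7.5},
    ]
    print(persona_of(raw, SENSITIVE))
    # {'rows': 3, 'fields': ['amount', 'customer', 'day'], 'needs_protection': True}
    derived = daily_totals(raw)
    print(persona_of(derived, SENSITIVE))
    # {'rows': 2, 'fields': ['day', 'total'], 'needs_protection': False}
```

The same source data yields a derived dataset that is smaller, differently shaped and no longer carries customer identifiers, so it would warrant different management and protection.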

DPA is the mechanism by which a UDS is executed. From a mathematical perspective, DPA is a centralized notion built on an unbounded set of orthogonal vectors, each of which corresponds to a tangible, unique property of a transient or permanent dataset. These properties may be simple concepts such as "need to protect from access, manipulation or loss" or "need to be accessed with ultra-low latency". Transactional systems might require "very/ultra-low latency", while a data-mining workload may require "extreme HPC logical processing capability", which in turn may require the data to be pre-sorted or even pre-staged. Some data may be "metadata rich" or "optimal for bit-based searches". These are but a few of the many attributes that can and will be considered.
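
The vector picture can be made concrete under some loud assumptions: each property is treated as an independent axis scored from 0 to 1, a dataset's persona is a vector of requirements on those axes, and each candidate management system has a vector of capabilities. The axis names, system names and every score below are illustrative inventions, not measurements of real products, and the shortfall rule is just one plausible matching criterion:

```python
from typing import Dict

# Hypothetical axes, one per independent ("orthogonal") property from the text.
AXES = ("protection", "low_latency", "hpc_processing", "metadata_rich", "bit_search")

Vector = Dict[str, float]  # axis name -> requirement or capability in [0, 1]


def shortfall(need: Vector, capability: Vector) -> float:
    """Total amount by which a system falls short of a dataset's needs, axis by axis."""
    return sum(max(0.0, need.get(a, 0.0) - capability.get(a, 0.0)) for a in AXES)


def best_match(persona: Vector, systems: Dict[str, Vector]) -> str:
    """Pick the system with the smallest total shortfall (the first one wins ties)."""
    return min(systems, key=lambda name: shortfall(persona, systems[name]))


# Illustrative capability scores only; not measurements of any real product.
SYSTEMS: Dict[str, Vector] = {
    "relational_oltp": {"protection": 0.9, "low_latency": 0.7, "hpc_processing": 0.3,
                        "metadata_rich": 0.4, "bit_search": 0.2},
    "in_memory_store": {"protection": 0.6, "low_latency": 1.0, "hpc_processing": 0.4,
                        "metadata_rich": 0.2, "bit_search": 0.3},
    "analytic_cluster": {"protection": 0.6, "low_latency": 0.2, "hpc_processing": 1.0,
                         "metadata_rich": 0.6, "bit_search": 0.8},
}

if __name__ == "__main__":
    trading = {"protection": 0.8, "low_latency": 1.0}   # transactional persona
    mining = {"hpc_processing": 0.9, "bit_search": 0.7}  # data-mining persona
    print(best_match(trading, SYSTEMS))  # in_memory_store
    print(best_match(mining, SYSTEMS))   # analytic_cluster
```

Scoring only the shortfall, rather than a symmetric distance such as cosine similarity, means a system is never penalized for exceeding a requirement, which is usually harmless.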

Data consumption goes through five phases: Capture, Define, Filter, Store and Disperse. Once the protection requirements of a given dataset are added to these phases, it is clear that each dataset has a unique personality that should be understood as such. It is worth reiterating that the data, and consequently the RDBMS, will no longer control the company; the company will control the data.
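
The five phases can be sketched as a simple linear pipeline. This is a toy illustration under stated assumptions: input arrives as comma-separated `id,user,amount` lines, a Python dict stands in for the storage engine, and protection is modeled as masking flagged fields at dispersal time. The function `run_pipeline` and its parameters are hypothetical:

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def run_pipeline(raw_lines: List[str], keep: Callable[[Record], bool],
                 sensitive: List[str], store: Dict[int, Record]) -> List[Record]:
    """Walk a dataset through Capture, Define, Filter, Store and Disperse in order."""
    # CAPTURE: ingest raw lines, dropping empty ones.
    captured = [line.strip() for line in raw_lines if line.strip()]
    # DEFINE: impose structure (assumed column layout: id, user, amount).
    defined = [{"id": int(i), "user": u, "amount": float(a)}
               for i, u, a in (line.split(",") for line in captured)]
    # FILTER: keep only the subset the business function needs.
    filtered = [r for r in defined if keep(r)]
    # STORE: persist by id (the dict stands in for any storage engine).
    for r in filtered:
        store[r["id"]] = r
    # DISPERSE: hand out copies, masking fields flagged for protection (assumption).
    return [{k: ("***" if k in sensitive else v) for k, v in r.items()} for r in filtered]


if __name__ == "__main__":
    store: Dict[int, Record] = {}
    out = run_pipeline(["1,alice,50", "2,bob,5", ""],
                       keep=lambda r: r["amount"] >= 10, sensitive=["user"], store=store)
    print(out)    # [{'id': 1, 'user': '***', 'amount': 50.0}]
    print(store)  # {1: {'id': 1, 'user': 'alice', 'amount': 50.0}}
```

The stored copy keeps the full record while consumers receive a masked one, which is one simple way protection requirements can differ from phase to phase.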

“Attention”: a 21st Century Commodity

In 21st-century parlance, “attention” is a valuable commodity, and to succeed a company must be able to capture and focus “attention” in an environment where distractions are endless. Data comes in many forms, has many attributes, and has been transformed to the point where it can now be considered an actual, tangible commodity. The successful businesses will be those able to attract and hold the “attention” of their customers, each of whom has multiple devices capable of virtually unlimited data access and visibility into the greater universe of data, or what we have called the “Dataverse”. To accomplish this lofty aspiration, data must flow with near-light-speed alacrity while being customized to the viewer’s interest, or the “moment of attention” will be lost.

The modern data modeler or analyst should be considered a Data Scientist. Just as General Relativity showed that Newtonian physics was incomplete, the relational model, although mathematically sound, is incomplete, and new data models will go beyond what the relational model reveals. The objective of all the new data models will be to access the exact requested dataset as efficiently and swiftly as possible. The systems that accomplish this objective most successfully will capture that “moment of attention” and all the value subsequently derived from it. Data Persona Analytics will reveal and define the optimal datasets to attract and hold attention.