An Elegant Organizational, Monitoring and Management Methodology
The provisioning and organizational tools and features that must be available to the modern Data Scientist to expedite the building of a UDS must address a series of disparate but complementary requirements. The Information Technology industry has produced many products and platforms, such as VMware vSphere, that are effective at addressing individual aspects of this challenging problem. Presently, however, no comprehensive platform exists that can completely address all of the extended requirements of a proposed UDS framework, which are listed below:
- An application and instances of the application must be built and templatized with a tool that maintains sensitivity to the unique aspects of each individual application but is also designed as a framework tool which is agnostic to all applications. Many instances of the application must be provisionable to the infrastructure via a tool that is designed for the purpose of mass deployment of applications.
- All data must be analyzed and all subsets defined so as to accurately classify the datasets. At this point the most effective data management system can be chosen to manage the individual datasets. Adequate server and storage resources can then be automatically provisioned to adhere to the requirements of those datasets and the business functions that delineate those requirements. Some of the questions to be asked are listed below:
- Which datasets must be analyzed instantly, and what types of data do those datasets comprise?
- What data is the focus of transactions which must be processed with very/ultra-low latency?
- What data has strict requirements for referential integrity?
- What data must be protected, and to what extent?
- Data management systems must be provisionable and complementary to the instances of the applications as they are provisioned. This tool, or feature of another tool, should have the capability of creating base servers (VMs) which are predesigned and configured to house the data management systems. This tool must also be able to build each needed data management system within these generic servers and subsequently templatize that data management system and server pair. Finally, the tool should be able to provision an individual Data Management System (DMS)-server pair and use simple screens to associate customizable levels of resources, security attributes, storage, data protection requirements and access controls. Application-data pairing is a central theme of the UDS. Other attributes that should be present in this tool are listed below:
- The ability to ingest data and use that data as the source for provisionable and cataloged data management systems.
- The tool must be capable of including, within the provisioned system or template, availability capabilities that meet any business Service-Level Agreement (SLA).
- The tool should have an open API which allows third-party developers of new data management tools to instrument their new systems to be provisioned and managed by this tool.
- A variety of different data management systems must be available to the above data systems management framework tool. All types of data management systems must be considered candidates for deployment. Therefore, many different data management systems must be available as part of the individual UDS:
- Classic RDBMS systems, which may include but are not limited to Oracle, Sybase and SQL Server.
- Freeware RDBMS systems that are built with limited relational capabilities, such as MySQL and vPostgres.
- In-memory data management systems such as GemFire and its respective SQL interface, SQLFire, or the like.
- Unstructured data management systems, usually based on but not limited to Hadoop technologies such as vHadoop.
- Other data management systems respective to the forms of data that comprise the individual UDS.
- The system must include a master control tool that can maintain catalogs of all application templates with references to their server-DMS counterparts. This master tool must have a comprehensive view of the entire system and allow for the invocation of each tool within the overall system.
- The system should provide a view of each of the tools, with drill-down capabilities into all of the sub-tools as well as the individual functional capabilities and metadata pertaining to the datasets that those tools control. Pooled resources, software license pools, stages of deployment and the overall system configuration should be visible from this single tool.
- The system should be capable of creating scripted and customized applications using advanced capabilities that were not originally envisioned when developing the initial application and data management templates.
- The system must include the capability to expose, examine and analyze data from each level of the stack and the stack as a whole.
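To make the classification questions and the master catalog requirement more concrete, the following is a minimal Python sketch, not a description of any existing product. Every name in it (`DatasetProfile`, `choose_dms`, `DMSServerPair`, `MasterCatalog`) is hypothetical, and the order in which the rules are applied is an assumption chosen purely for illustration. It shows how the answers to the questions could drive the choice of a data management system family, and how a catalog could keep each application template paired with its DMS-server counterparts.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class DMSKind(Enum):
    """Hypothetical families mirroring the DMS categories in the list above."""
    RDBMS = "relational"
    IN_MEMORY = "in-memory"
    UNSTRUCTURED = "unstructured"


@dataclass(frozen=True)
class DatasetProfile:
    """Answers to the classification questions for one dataset (illustrative fields)."""
    name: str
    needs_instant_analysis: bool = False
    needs_low_latency_transactions: bool = False
    needs_referential_integrity: bool = False
    is_structured: bool = True


def choose_dms(profile: DatasetProfile) -> DMSKind:
    """Pick a DMS family for a dataset; the rule order is an assumption."""
    if not profile.is_structured:
        return DMSKind.UNSTRUCTURED
    if profile.needs_referential_integrity:
        return DMSKind.RDBMS
    if profile.needs_low_latency_transactions or profile.needs_instant_analysis:
        return DMSKind.IN_MEMORY
    return DMSKind.RDBMS  # assumed default for structured data with no special needs


@dataclass(frozen=True)
class DMSServerPair:
    """A templatized pairing of a DMS family with a base server (VM) template."""
    dms: DMSKind
    server_template: str


@dataclass
class MasterCatalog:
    """Maps each application template name to its DMS-server pairs."""
    entries: Dict[str, List[DMSServerPair]] = field(default_factory=dict)

    def register(self, app_template: str, pair: DMSServerPair) -> None:
        self.entries.setdefault(app_template, []).append(pair)

    def pairs_for(self, app_template: str) -> List[DMSServerPair]:
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self.entries.get(app_template, []))


if __name__ == "__main__":
    orders = DatasetProfile("orders", needs_referential_integrity=True)
    catalog = MasterCatalog()
    catalog.register("order-entry-app", DMSServerPair(choose_dms(orders), "base-vm-rdbms"))
    print(catalog.pairs_for("order-entry-app"))
```

In a real UDS the rules would come from the business functions that delineate each dataset's requirements rather than from a fixed chain of conditions. The sketch only illustrates that classification comes first and that the pairing is then recorded in one place.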
The agnostic nature of each of these tools with regard to the data management systems they manage and organize is the central and most important aspect of the UDS. The compilation of these tools effectively defines an optimized system of Cloud Provisioning Services that is conceptually integrated with the Unified Data Strategy. The ability to differentiate between Tier 2/3 apps and true Business Critical Apps (BCAs), or Tier 1 apps, will be essential. A true UDS will address all data in all forms with all levels of importance as well as the applications that use data of all levels of importance in all forms.
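One way to picture this agnosticism, together with the open API requirement, is a framework that knows only a small provider contract and nothing about any particular data management system. The Python sketch below is an assumption made for illustration: `DMSProvider`, `ProvisioningFramework` and `InMemoryGridProvider` are hypothetical names, and the stand-in provider merely formats an identifier instead of provisioning anything.

```python
from abc import ABC, abstractmethod
from typing import Dict


class DMSProvider(ABC):
    """Hypothetical contract a third-party DMS would implement to be provisionable."""

    @abstractmethod
    def provision(self, server_template: str, resources: Dict[str, int]) -> str:
        """Create one DMS-server pair and return an identifier for it."""


class InMemoryGridProvider(DMSProvider):
    """Stand-in provider; it only formats an identifier and provisions nothing."""

    def provision(self, server_template: str, resources: Dict[str, int]) -> str:
        return f"grid@{server_template}:{resources.get('memory_gb', 0)}GB"


class ProvisioningFramework:
    """Framework tool that stays agnostic to the systems it provisions."""

    def __init__(self) -> None:
        self._providers: Dict[str, DMSProvider] = {}

    def register(self, kind: str, provider: DMSProvider) -> None:
        # New data management systems plug in here without framework changes.
        self._providers[kind] = provider

    def provision(self, kind: str, server_template: str, resources: Dict[str, int]) -> str:
        if kind not in self._providers:
            raise KeyError(f"no provider registered for {kind!r}")
        return self._providers[kind].provision(server_template, resources)


if __name__ == "__main__":
    framework = ProvisioningFramework()
    framework.register("in-memory", InMemoryGridProvider())
    print(framework.provision("in-memory", "base-vm", {"memory_gb": 8}))
```

The design choice worth noting is that the framework never branches on the type of data management system. Supporting a new system means registering another provider, which is the property the requirements above ask for.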
Check next month for the fourth part of this article, which will focus on a compelling use case for a UDS and how Data Persona Analytics can be used to comprehensively manage the data that emanates from a truly iconic American institution.