Rapid ROI and Value: Federation lends itself to extremely rapid implementation, use, and value. Because there is no requirement to define, create, and maintain a new data store and its associated structures and processes, analytics and reporting can be implemented and deployed quickly; all that is required is read-only query access to the appropriate existing data stores. As a direct result, total lifecycle operating expenses are typically significantly lower.
Flexibility: Another benefit of this approach is that it lends itself to highly flexible analysis, because extremely disparate types and sources of data can be directly and rapidly accessed with no need to “map” or “force-fit” disparate data types into a single schema/architecture. This allows easy analysis across data sources and types that had previously never been considered together, due to the difficulty, or in some cases impossibility, of putting them into a single storage paradigm. For example, non-time-series metadata from one data source can be used as a grouping criterion for time-series data from another.
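As a minimal sketch of that example, the snippet below groups time-series samples from one hypothetical source by metadata drawn from another; all host names, metrics, and values are illustrative, not drawn from any particular product:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical extract from a time-series monitoring source:
# (host, cpu_utilization_percent) samples.
cpu_samples = [
    ("web01", 72.0), ("web02", 65.0),
    ("db01", 91.0), ("db02", 88.0),
]

# Hypothetical non-time-series metadata from a separate source
# (e.g., an asset inventory): host -> service tier.
asset_tier = {"web01": "web", "web02": "web",
              "db01": "database", "db02": "database"}

# Federated grouping: bucket time-series values from one source
# by metadata that lives only in the other source.
by_tier = defaultdict(list)
for host, util in cpu_samples:
    by_tier[asset_tier[host]].append(util)

avg_by_tier = {tier: mean(vals) for tier, vals in by_tier.items()}
print(avg_by_tier)  # {'web': 68.5, 'database': 89.5}
```

Neither source had to be reshaped to fit the other; the correlation happens at analysis time.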
Reduced Expertise Requirements: Implementers and users of federated approaches do not need to be experts in the creation, implementation, and use of relational databases. They can focus on the types of analysis and reporting they want to perform without worrying about the underlying data mart/database structure and its associated programming.
Analytic Performance Challenges: Certain types of queries against federated data sets may not perform as well as highly complex, optimized, customized SQL queries. This is somewhat balanced by the benefit that the end user of the analysis does not require as high a level of SQL expertise.
Data Retention Challenges: Not all desired types of data are kept in individual data repositories long enough to feed the desired federated analytics. Some real-time infrastructure monitors keep data for only days or weeks, for example, because they are optimized for real-time, operational uses rather than longer-term analytic uses. This is often mitigated by simply creating an adjunct data store and ensuring it is populated from such sources on a daily or weekly basis. Note that such adjunct data stores will obviously inherit the previously described limitations.
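One way such an adjunct store might be populated is an incremental copy keyed on a timestamp high-water mark, so each daily or weekly run appends only what the short-retention source has added since the last run. The sketch below uses an in-memory SQLite table and illustrative names; it is a sketch of the pattern, not any particular tool's mechanism:

```python
import sqlite3

# Adjunct store that outlives the short-retention source.
adjunct = sqlite3.connect(":memory:")
adjunct.execute("CREATE TABLE samples (ts TEXT PRIMARY KEY, value REAL)")

def copy_new_samples(source_rows):
    """Append only rows newer than what the adjunct store already holds."""
    last = adjunct.execute(
        "SELECT COALESCE(MAX(ts), '') FROM samples"
    ).fetchone()[0]
    new = [r for r in source_rows if r[0] > last]  # ISO timestamps sort lexically
    adjunct.executemany("INSERT INTO samples VALUES (?, ?)", new)
    return len(new)

# Day 1 run: the monitor still holds both samples.
copy_new_samples([("2024-01-01T00:00Z", 1.0), ("2024-01-01T01:00Z", 2.0)])
# Day 2 run: the monitor has aged out older data but has one new sample.
n = copy_new_samples([("2024-01-01T01:00Z", 2.0), ("2024-01-02T00:00Z", 3.0)])
print(n)  # 1
```

The overlap between runs is harmless because already-seen timestamps are filtered out before insertion.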
Data Access Considerations
Just as data storage architectural considerations fall into two main areas, data access methodologies generally align to two fundamentally different approaches: structured queries into structured databases (SQL and/or proprietary), and more general-purpose APIs, including web service APIs.
Structured Datamart/Database Queries
The structured database query approach to data access has been the traditional means by which to access data, especially time series performance data.
Advantages that accrue to this type of data access:
- The ability to mine data that is effectively already “read optimized” means that response times to queries are typically quite rapid
- The use of structured-query approaches allows distribution of the analytic processing functions, enabling more complex queries to be executed in a time-effective fashion
- The ability to leverage the concept of a database “join” to merge together and subsequently analyze disparate data sources
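The join-based merge described in the last point can be sketched with an in-memory SQLite database standing in for two data sources that have landed in one relational engine; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two tables standing in for two disparate sources:
# performance data and an asset inventory.
conn.execute("CREATE TABLE perf (host TEXT, avg_cpu REAL)")
conn.execute("CREATE TABLE inventory (host TEXT, datacenter TEXT)")
conn.executemany("INSERT INTO perf VALUES (?, ?)",
                 [("web01", 72.0), ("db01", 91.0)])
conn.executemany("INSERT INTO inventory VALUES (?, ?)",
                 [("web01", "east"), ("db01", "west")])

# A join merges the sources on their shared key for combined analysis.
rows = conn.execute(
    """SELECT i.datacenter, p.avg_cpu
       FROM perf p JOIN inventory i ON p.host = i.host
       ORDER BY i.datacenter"""
).fetchall()
print(rows)  # [('east', 72.0), ('west', 91.0)]
```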
Several downsides to the structured data query approach:
Most significant, and potentially limiting, is the requirement that all desired data sources already be in structured and/or relational databases. Many toolsets, especially those that manage and monitor in real time, do not have such structured backend data stores. Workarounds can be created by which API-based data stores are queried programmatically using their existing APIs, with the resultant data then rewritten into a relational data store.
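A minimal sketch of that workaround follows. A stub function stands in for a real monitoring tool's API (in practice this would be an HTTP call returning JSON); the function, field names, and values are all hypothetical:

```python
import sqlite3

def fetch_metrics_from_api():
    # Stand-in for a programmatic query against an API-based data store;
    # a real implementation would issue an HTTP request and parse JSON.
    return [
        {"host": "web01", "ts": "2024-01-01T00:00:00Z", "cpu": 72.0},
        {"host": "db01", "ts": "2024-01-01T00:00:00Z", "cpu": 91.0},
    ]

# Rewrite the API results into a relational store so they become
# available to structured (SQL) queries and joins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (host TEXT, ts TEXT, cpu REAL)")
conn.executemany("INSERT INTO metrics VALUES (:host, :ts, :cpu)",
                 fetch_metrics_from_api())
count = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
print(count)  # 2
```

Once landed in the relational store, the data can participate in the join-style analysis described earlier.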