Transitioning Big Data Processing for Quicker Decision-Making

Data Transactions

Data transactions adhere to ACID properties. ACID stands for atomicity, consistency, isolation and durability. Atomicity requires that a transaction is “all or nothing”: either every step of the transaction takes effect or none does. Consistency ensures that the system is in a valid state both before and after the transaction, with no ambiguity. Isolation ensures that concurrent transactions behave as if they were executed one after another, effectively isolating each from the others. Durability ensures that committed data survives power losses, memory or CPU failures and other errors. Most commercial relational databases support ACID.
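Atomicity is the easiest of these properties to see in action. The following minimal sketch uses Python's built-in sqlite3 module; the accounts table, the account names and the make_db, transfer and balances functions are hypothetical illustrations. The second UPDATE in a transfer violates a CHECK constraint when the source account would go negative, so the whole transfer is rolled back, including the credit that had already been applied.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failed debit leaves something to undo.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            # Violates the CHECK constraint if src would go negative.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {account name: balance}."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on a fresh database returns False and leaves both balances unchanged, because sqlite3 rolls back the open transaction when the with block exits with an exception.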

Transaction Times

ACID compliance does not come for free. To provide these guarantees, the database performs auxiliary work during every transaction. The broad categories of this work are:

  • Recovery Logs – writing a record of each transaction to a log file
  • Buffer Pool Management – managing in-memory buffer pools shared by concurrent transactions
  • Data Locking – taking table-level and row-level locks
  • Latching Resources – acquiring OS semaphores and other latches that protect shared resources
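To make the recovery-log and locking overheads concrete, here is a toy sketch in Python. TinyStore and recover are hypothetical names; a Python list stands in for the log file, a dict stands in for the buffer pool, and a single threading.Lock stands in for table or row locks. Real engines also force the log to stable storage at commit, which this sketch omits. Each transaction writes one log record per change plus a commit record before the data itself is updated, and recovery replays only transactions whose commit record reached the log.

```python
import threading

class TinyStore:
    """Toy key-value store illustrating a write-ahead recovery log and locking."""

    def __init__(self):
        self.data = {}                # stands in for the in-memory buffer pool
        self.log = []                 # stands in for the on-disk recovery log
        self.lock = threading.Lock()  # coarse lock; real engines lock rows or tables
        self.next_txid = 1

    def commit_txn(self, writes):
        """Apply a dict of writes as one transaction and return its id."""
        with self.lock:
            txid = self.next_txid
            self.next_txid += 1
            # 1. Write-ahead: log every change before touching the data.
            for key, value in writes.items():
                self.log.append(("SET", txid, key, value))
            self.log.append(("COMMIT", txid))
            # 2. Only then apply the changes to the in-memory copy.
            self.data.update(writes)
            return txid

def recover(log):
    """Rebuild state after a crash by replaying only committed transactions."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    state = {}
    for rec in log:
        if rec[0] == "SET" and rec[1] in committed:
            _, _, key, value = rec
            state[key] = value  # later records overwrite earlier ones
    return state
```

Even in this toy, every logical write costs an extra log append and a lock acquisition, which is the kind of bookkeeping the list above refers to.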

The overhead of these activities is high compared with the actual steps of the transaction. The pie chart in Figure 2, which breaks down the time taken by a transaction, shows that 96% of the time is spent on these overheads and only 4% on the actual transaction work. This is a heavy penalty for ACID compliance, so such semantics should be applied judiciously in an enterprise data center.
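The 96%/4% split also bounds what relaxing ACID semantics could gain. Treating the overhead as the removable part of the run time, Amdahl's law gives the speedup as 1 / (1 - f × p), where f is the overhead fraction and p is the fraction of that overhead actually eliminated. The Python sketch below is an illustrative calculation, not a measurement, and max_speedup is a hypothetical name: with f = 0.96 and p = 1 the bound is 1 / 0.04 = 25×, while eliminating only half of the overhead yields about 1.9×.

```python
def max_speedup(overhead_fraction, removed_fraction=1.0):
    """Amdahl-style bound on speedup when part of the per-transaction overhead is removed.

    overhead_fraction: share of transaction time spent on logging, buffer pool
        management, locking and latching (0.96 in Figure 2).
    removed_fraction: share of that overhead an alternative design avoids.
    """
    if not (0.0 <= overhead_fraction < 1.0 and 0.0 <= removed_fraction <= 1.0):
        raise ValueError("overhead_fraction must be in [0, 1) and removed_fraction in [0, 1]")
    remaining = 1.0 - overhead_fraction * removed_fraction  # time left, as a fraction
    return 1.0 / remaining
```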

Distributed Computing

Distributed computing frameworks such as the open source Hadoop are increasingly used in enterprises for historical and exploratory analytics. Hadoop brings the “sum of the parts” of a cluster to bear on computationally intensive problems such as building search indexes.
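Hadoop's MapReduce model splits such a job into a map phase that runs in parallel across the cluster, a shuffle that groups intermediate results by key, and a reduce phase that combines each group. The following single-process Python sketch simulates those three phases to build a small inverted index for search. It does not use Hadoop's actual Java API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (word, doc_id) pairs for one document, as a mapper would."""
    for word in set(text.lower().split()):
        yield word, doc_id

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, doc_ids):
    """Produce the sorted posting list for one word."""
    return word, sorted(doc_ids)

def build_inverted_index(docs):
    """docs maps doc_id -> text; returns word -> list of doc_ids containing it."""
    # On a real cluster, each map_phase call would run on a different node.
    mapped = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(word, ids) for word, ids in shuffle(mapped).items())
```

Because each map call depends only on its own document and each reduce call only on its own key, both phases can be spread across as many machines as the cluster provides.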

CAP Theorem

Professor Eric Brewer conjectured that compromise is unavoidable in a distributed system: it cannot simultaneously guarantee Consistency, Availability and Partition tolerance (CAP). The CAP theorem gives a clear way to state which of these guarantees a product offers. By picking two of the three CAP properties, a product can commit to a contract for the properties it selected. Figure 3 places existing products on the CAP graph. Separately, a product chooses the data model it supports, e.g., key-value, document, column-oriented or relational. We observe that traditional RDBMSs support consistency and availability.
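The trade-off is easiest to see when a network partition actually occurs. The toy Python model below is a hypothetical illustration, not how any product in Figure 3 is built: two replicas hold one value, and while the link between them is down, a "CP" configuration rejects writes to stay consistent, whereas an "AP" configuration keeps accepting writes and lets the replicas diverge.

```python
class Cluster:
    """Toy two-replica register; `partitioned` simulates a broken network link."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]  # one stored value per replica
        self.partitioned = False

    def write(self, replica, value):
        """Write via one replica; return True if the write was accepted."""
        if not self.partitioned:
            self.values = [value, value]  # link up: replicate to both synchronously
            return True
        if self.mode == "CP":
            return False  # sacrifice availability: refuse rather than diverge
        self.values[replica] = value  # sacrifice consistency: accept locally only
        return True

    def read(self, replica):
        """Return the value currently stored on one replica."""
        return self.values[replica]
```

Reconciling the diverged AP replicas once the link is restored is a separate problem that this sketch leaves out.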


Enterprise Data Center Architecture

Enterprise back-end systems typically include relational databases (DB2, Oracle, MySQL, SQL Server) for ACID transactions, a transaction processing monitor for high-speed reliable messaging and, more recently, Hadoop for processing on a distributed cluster. An enterprise may use any mix of these technologies (Figure 4).
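Because any mix is possible, it can help to make the division of labour explicit. The sketch below is purely illustrative: the Workload categories, the DEFAULT_BACKENDS table and the route function are hypothetical names that simply encode the roles described above, and route raises an error when the back end a workload needs is not part of a given deployment.

```python
from enum import Enum

class Workload(Enum):
    OLTP = "ACID transaction"
    MESSAGING = "reliable messaging"
    BATCH_ANALYTICS = "historical/exploratory analytics"

# Hypothetical mapping mirroring the roles described above; a real enterprise
# may deploy any subset of these back ends.
DEFAULT_BACKENDS = {
    Workload.OLTP: "relational database",
    Workload.MESSAGING: "transaction processing monitor",
    Workload.BATCH_ANALYTICS: "Hadoop cluster",
}

def route(workload, deployed):
    """Return the back end for `workload`, checking it exists in `deployed`."""
    backend = DEFAULT_BACKENDS[workload]
    if backend not in deployed:
        raise LookupError(f"no {backend} deployed for {workload.value} workloads")
    return backend
```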
