Syncsort Bridges Gap from Big Iron to Hadoop Data Lake with New CDC

Syncsort, a provider of data integrity and integration solutions for next-generation analytics, has announced new capabilities in its mainframe data access and integration solution that populates Hadoop data lakes with changes in mainframe data.

The new DMX Change Data Capture (DMX CDC) functionality delivers real-time data replication, enabling organizations to continuously keep Hadoop data in sync with changes made on the mainframe so that the most current information is available in the Hadoop data lake for analytics.

Many organizations are using Syncsort’s big data integration solution, DMX-h, to populate their data lakes with enterprise-wide data, including complex data from the mainframe, for a variety of use cases such as Hadoop as a service, data as a service, data archive in the cloud, fraud detection, anti-money laundering, and customer 360 initiatives, said Tendü Yogurtçu, CTO, Syncsort. But after populating the data lake, it is important to keep that data fresh to enable real-time analytics and accurate decisions based on up-to-date information, she noted. The new CDC offering gives customers a way to refresh the data lake in real-time with incremental updates, while meeting SLAs and conserving network resources.

Syncsort already has many large production deployments for accessing data from traditional enterprise systems and integrating it with Hadoop, including DataFunnel, its single-click solution for populating the data lake with data from thousands of tables and automatically creating the corresponding metadata in Hadoop. The new functionality extends this with real-time data replication, reducing the network load and providing up-to-the-minute mainframe data for analytics.

According to Syncsort, the new offering saves time and resources through the DMX-h GUI and dynamic optimizations, which eliminate the need for coding and tuning; requires virtually no use of chargeable mainframe CPU resources; and eliminates impact on mainframe database performance by avoiding database triggers.

Providing reliable data transfer, even during loss of mainframe-to-Hadoop connections or Hadoop cluster failures, DMX CDC can pick up where a transfer stopped without restarting the entire process.
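The resume-from-interruption behavior described above generally relies on persisting a checkpoint of replication progress. The article does not describe DMX CDC's internals; the sketch below is only a generic illustration of the checkpointing idea, using an in-memory source and a simple integer offset (real CDC tools track positions in a database log, and the function and field names here are hypothetical):

```python
def replicate(source, target, checkpoint, fail_at=None):
    """Copy records from source to target, resuming from checkpoint['offset'].

    `fail_at` optionally raises ConnectionError at that index to simulate
    a dropped mainframe-to-Hadoop link mid-transfer.
    """
    start = checkpoint.get("offset", 0)
    for i in range(start, len(source)):
        if fail_at is not None and i == fail_at:
            raise ConnectionError("link to cluster lost")
        target.append(source[i])
        # Persist progress after each record so a restart skips completed work.
        checkpoint["offset"] = i + 1
    return target
```

Because the checkpoint survives the failure, a second call resumes at the record where the first attempt stopped instead of re-sending the whole data set, which is what avoids restarting the entire process.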

Currently, the solution supports IBM DB2 for z/OS and IBM z/OS VSAM files, with more sources to be added.

For more information on the new CDC capabilities, go to