The Value of Data Shaping from Legacy Relational Databases for NoSQL

Relational databases, once the epitome of data management technology, are becoming increasingly archaic as single servers lack the nuance to support the large quantities of data generated by modern enterprises. NoSQL databases offer a solution to legacy database architectures, offering scalable, consistent performance—but how do you migrate such large quantities of data without overcomplicating a new architecture?

DBTA recently hosted a webinar sponsored by AWS, titled “Using Your Existing SQL Skills to Shape Data for NoSQL,” featuring speaker Robert McCauley, specialist solutions architect at Amazon Web Services, specializing in Amazon DynamoDB, to discuss a methodological approach towards relational database migration to a NoSQL database by leveraging SQL skills with Amazon DynamoDB.

Since DynamoDB’s inception in 2012, Amazon has embraced a future of scalable database technology to counter heavy application usage. By analyzing access patterns and applying that knowledge to database solutions, Amazon created DynamoDB as the default database for all new applications, now having replaced all Amazon relational database engines and running more than half of the systems within the company, including AWS. This wholehearted adoption of DynamoDB serves as testament towards its efficiency and scalable qualities—necessary capabilities for enterprises to keep up with data generation and modernization.

McCauley highlighted the issues of legacy relational databases in use, citing its monolithic nature as a massive pain point for any enterprise. Existing as a single server, relational databases are handling too many applications without the ability to scale efficiently. Unpredictable performance as you attempt to scale up, JOINS taxing CPU operability, and higher cost of ownership due to maintenance and efforts towards scale render relational databases as more costly than they are useful for many organizations. 

How do you know if your relational application architecture would benefit from a migration to DynamoDB? According to McCauley, key-value lookups, 1:many queries, serverless apps, and JSON document storage are instances that can benefit greatly from said migration. ERP apps and data warehouses/data marts, however, are best suited for relational SQL databases.

McCauley then detailed DynamoDB tables and their specific advantages for legacy relational migration, listed as the following:

  • No instances
  • Developers create a new, empty table at no cost
  • SSD-backed storage
  • Schemaless attributes that eliminate schema changes for storing new variables
  • Global secondary index (GSI) for diverse access pattern support

Migration from relational databases to a single table format requires a conceptual strategy called data shaping, which refers to the process in which developers prepare data for integration with DynamoDB or any other NoSQL database. McCauley bases this concept with sample schema from customer tables, illustrating how they can be converted to a DynamoDB single table. A notable feature pointed out by McCauley is the elimination of JOINS through UNION ALL functions that stack tables vertically to avoid unnecessary data duplication with DynamoDB. When duplication is necessary, users can employ SQL to duplicate data for queries, and for future use.

Migration strategies and tools are then explained by McCauley, using AWS Database Migration Service (DMS) as the flagship example for efficient migration of relational source data. Since AWS DMS is multi-threaded, multi-server, and multi-host, it can source huge amounts of data for optimal migration. This is not without some limitations, McCauley added; since the solution transfers data from one table to one table, it does not combine tables for NoSQL table format. To mitigate this challenge, DMS can be used after you prepare your data with a SQL VIEW to look like your DynamoDB table. McCauley then listed relevant strategies for data migration to NoSQL databases:

  • Use DMS or your favorite Extract, Transform, Load tool.
  • Write and run custom scripts to gain experience with bulk loads.
  • Archive legacy data and migrate only dimension data, starting fresh.
  • Write to both databases for some time to verify DynamoDB.

The webinar is bolstered by a variety of visual examples supplied by McCauley, critically demonstrating the disadvantages of legacy relational databases and the advantages of adopting and migrating to NoSQL databases, such as DynamoDB.

You can view an archived version of this webinar here.