Key Considerations When Data Modeling for Big Data


Traditionally, the cardinal rule has been to model data first and load it later. But with big data and new technologies and repositories such as Hadoop, NoSQL, and data lakes, the rule is being flipped: load first, model later.

And, with SQL remaining an effective and widely embraced query language, companies must balance traditional methods against the need for newer ones. This topic was recently covered by Danny Sandwell, director of product marketing with CA Erwin Modeling; Gertjan Vlug, director of Attunity Compose business development; and Erin Franz, lead analyst with Looker, in a DBTA roundtable webinar.

Data has commonly been classified by the three "V's": volume, velocity, and variety. The influx of big data has presented new opportunities for organizations, but those opportunities have also come with more questions and issues. "Specifically, the lack of schema on write makes it inherently difficult to manage," noted Sandwell. Traditional data modeling followed the steps of design, document, implement, integrate, and publish.
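The schema-on-write vs. schema-on-read distinction Sandwell alludes to can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation; the record fields and schema here are hypothetical.

```python
import json

# Hypothetical schema used by both styles of store.
SCHEMA = {"id": int, "name": str}

def write_with_schema(store, record):
    """Schema-on-write: reject records that violate the schema at load time."""
    for field, ftype in SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"record violates schema on field {field!r}")
    store.append(record)

def write_raw(store, line):
    """Schema-on-read (data lake style): accept any raw line now;
    structure is imposed only when the data is read."""
    store.append(line)

def read_with_schema(store):
    """Apply the schema while reading, skipping lines that don't conform.
    The management cost deferred at write time is paid here."""
    for line in store:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed lines were accepted at write time
        if all(isinstance(record.get(f), t) for f, t in SCHEMA.items()):
            yield record

modeled, lake = [], []
write_with_schema(modeled, {"id": 1, "name": "a"})  # validated up front
write_raw(lake, '{"id": 2, "name": "b"}')           # accepted as-is
write_raw(lake, "not even json")                    # also accepted -- deferred cost
print(list(read_with_schema(lake)))                 # only conforming records survive
```

The point of the sketch is the trade-off: the lake ingests everything without friction, but every reader must now carry the validation logic that a schema-on-write store enforces once.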

The shift to big data has changed this process. The steps for big data modeling are discover, document, standardize, integrate, and publish. The goal of the new steps is to take the complexity of managing big data volumes away from the end user.

Recently, the Wall Street Journal conducted a survey that found that for every dollar spent on big data analysis, the return is 55 cents. "Of course, there are very profitable use cases though," noted Vlug, who added that repetitive and non-repetitive data differ in value. Non-repetitive data is unique, in contrast to repetitive data such as information gathered on social media. "Big data integrations are probably giving you much more value than just big data analysis. If you start to understand that big data integration gives information to the mainstream users, that will probably bring you more value," stated Vlug.

Traditionally, BI has relied on a complex ETL process. This is useful for answering questions users have already formulated, or for confirming answers they suspect they know. But as a business changes and grows, new questions can emerge that a user may not have anticipated. "We like to do a more flexible approach. The approach that Looker is taking is that, instead of siloing the data, we are connecting to a unified data warehouse," explained Franz.
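Why a fixed ETL pipeline only answers pre-defined questions can be seen in a minimal sketch. This is an illustrative Python example with hypothetical field names, not a description of any product discussed in the webinar.

```python
def extract(source):
    """Extract: pull raw rows from a source system (here, an in-memory list)."""
    return list(source)

def transform(rows):
    """Transform: clean and reshape rows to answer one pre-defined question
    (per-customer totals). Any question not baked in here requires a new
    pipeline -- the rigidity the flexible, warehouse-direct approach avoids."""
    return [
        {"customer": r["cust_id"], "total": r["amount"]}
        for r in rows
        if r.get("amount") is not None  # drop rows missing the amount
    ]

def load(warehouse, rows):
    """Load: append the transformed rows into the target store."""
    warehouse.extend(rows)

source = [
    {"cust_id": 1, "amount": 9.99},
    {"cust_id": 2, "amount": None},  # dirty row, dropped in transform
]
warehouse = []
load(warehouse, transform(extract(source)))
print(warehouse)
```

Note that the shape of the output is fixed at design time: the pipeline discards everything except the columns the original question needed, which is exactly why unforeseen questions force rework.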
