Data lakes are an alluring option for users with enormous amounts of information, yet questions remain regarding data accuracy, security, and relevancy.
Three experts in the big data space participated in a roundtable discussion at Data Summit 2016 that focused on these questions and more regarding data lakes: Anne Buff, business solutions manager for SAS best practices at the SAS Institute; Abhik Roy, database solution engineer at Experion; and Tassos Sarbanes, data architect at Credit Suisse.
“It’s a big red flag when you go into a company and the number one thing they say is, ‘Let’s get our data in one place, then we can do something,’” Buff said. “The idea that co-located data automatically becomes integrated is false.”
Buff suggested that a data lake is either an appropriate place for well-trained, analytically capable people to work and discover new possibilities for data, or a place where data first comes into an enterprise before being moved securely to other places.
The participants debated whether a data lake is a good place to store data for long periods. The question then became how to take that data out and distribute it after insights are discovered.
The data lake’s real purpose should be to get light, modern data to work together, Buff said.
As the cloud grows and instantaneous real-time data becomes more accessible, does the data lake still have a place in the enterprise?
It does, everyone agreed, but for different reasons, including easy access to on-premises data, provided the enterprise already has an established governance policy.
As different teams within the organization access data inside the data lake, security remains a prominent focus.
Buff suggested employees should be trained in how to handle data security and in what their own access to data allows.
CEOs and chief data officers play an important part in owning and controlling the data in an organization, Roy said.
“Data governance is extremely important,” Roy said.
Participants stressed that the big data ecosystem should have a distinct role in delivering what the business is trying to achieve when it comes to running analytics and batch processing with a data lake.
Data Summit is an annual 2-day conference, preceded by a day of workshops, that offers a comprehensive educational experience designed to guide you through all of the key issues in data management and analysis today. The event brings together IT managers, data architects, application developers, data analysts, project managers, and business managers for an intense immersion into the key technologies and strategies for becoming a data-informed business.
Many presentations from Data Summit 2016 have been made available for review at www.dbta.com/DataSummit/2016/Presentations.aspx.