As data continues to proliferate throughout organizations, many enterprises are adopting data lakes to support data discovery, data science, and real-time operational analytics.
The ability to inexpensively store large volumes of data from diverse sources and make that data readily accessible to workers and applications across the enterprise is a huge advantage for companies pursuing new types of analytics, especially those involving the Internet of Things and cognitive computing use cases.
However, building and maintaining a data lake to support new analytics applications involves a number of technical challenges, from data architecture and integration, to data security and governance.
DBTA recently held a webinar with Joe McKendrick, lead analyst at Unisphere Research, and Mark Van de Wiel, CTO at HVR, who highlighted common pitfalls, key best practices, and success stories in building and maintaining a data lake for big data analytics.
The challenges of operating in a data-driven world include information that moves too slowly, information overload, and the abundance of new innovations and approaches that surface daily, McKendrick explained.
According to a survey conducted by Unisphere Research, fewer than one-third of enterprises can make use of most of their data, and ETL still dominates data movement.
To get started on the data lake journey, McKendrick outlined these crucial steps businesses should take:
- Determine what the business needs
- Address governance and security concerns
- Provide ample training and guidance
A data lake stores large volumes of diverse data in its original or raw format, can receive data continuously, enables quick access to data, and allows users to prepare data at any time, according to Van de Wiel.
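The "raw format, schema applied later" idea Van de Wiel describes can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: records are appended unmodified into a date-partitioned raw zone, and interpretation is deferred to read time. The directory layout and function name are assumptions for the example.

```python
import json
import os
from datetime import datetime, timezone

def ingest_raw(record: dict, base_dir: str, source: str) -> str:
    """Append a record, unmodified, to a date-partitioned raw zone.

    No schema is enforced on write; the lake stores whatever arrives,
    and consumers apply structure later (schema-on-read).
    """
    now = datetime.now(timezone.utc)
    partition = os.path.join(base_dir, "raw", source, now.strftime("%Y/%m/%d"))
    os.makedirs(partition, exist_ok=True)
    path = os.path.join(partition, "events.jsonl")
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return path
```

Because writes never reshape the data, the same raw files can serve discovery, data science, and operational analytics without re-ingestion.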
HVR can help enterprises set up a data lake, allowing businesses to take full advantage of the technology. HVR's platform offers log-based change data capture (CDC), Hive integration, data publication, encryption, optimized data transfer, and big data compare.
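Log-based CDC is the alternative to the bulk ETL loads cited earlier in the survey: instead of reloading whole tables, change events read from a source's transaction log are replayed against a target copy. The sketch below shows only the general technique; the event format and function names are hypothetical and are not HVR's actual API.

```python
# Illustrative log-based CDC replay: each change event (insert, update,
# or delete) from a source transaction log is applied to a target copy,
# keeping it in sync without full reloads. Event shape is assumed.

def apply_change(target: dict, event: dict) -> None:
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        target[key] = event["row"]
    elif op == "delete":
        target.pop(key, None)

# A hypothetical slice of a change log for an employee table.
changelog = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "dept": "R&D"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "dept": "Eng"}},
    {"op": "insert", "key": 2, "row": {"name": "Grace", "dept": "Ops"}},
    {"op": "delete", "key": 2, "row": None},
]

replica = {}
for event in changelog:
    apply_change(replica, event)
# After replay, the replica reflects only the net effect of the log.
```

Replaying the log rather than re-copying tables is what lets a data lake receive data continuously with low impact on the source systems.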
An archived on-demand replay of this webinar is available here.