Harnessing operational big data does not come with a “one size fits all” solution. Organizations are at different points in their data management cycles, and whether they are building new applications or optimizing existing ones, each needs a unique solution. This was among the key points made during a special DBTA roundtable webinar on harnessing operational big data. The webinar featured Matt Allen, senior product marketing manager with MarkLogic; Kevin Petrie, senior director with Attunity; and Jason Paul Kazarian, senior architect with Hewlett Packard Enterprise.
Traditional pain points with enterprise data warehouses (EDW) and operational data stores (ODS) include that they are not suited for unstructured data, that legacy RDBMSs do not scale flexibly or efficiently, and that they depend on ETL to move large volumes of data. Dealing with these issues can be frustrating for enterprise architects, who are beginning to look for new solutions. One of those solutions could be Hadoop.
“There are a lot of advantages to choosing Hadoop, including low-cost scale and the fact that Hadoop manages raw data quite well,” explained Allen. “There are some significant limitations, and the fact that it is not a database means it can only be used for batch processing and can’t be used for real time applications,” he noted. According to Allen, MarkLogic addresses some of these pain points through flexibility, allowing users to run both transactional apps and analytical queries.
“The rate of growth is starting to force some hard decisions about how to meet operational and analytical requirements without breaking the bank,” observed Attunity’s Petrie. With many new data platforms on the market, there are opportunities but also confusion as to what is right for each organization. Attunity aims to have an organization’s data in the right place at the right time, and it pursues that goal in three steps: measure data and resource usage, accelerate data flows, and automate data transformation.
One of the purposes of harnessing operational big data is to share that data with others, but once data is shared, security also becomes an important factor, especially with Hadoop, pointed out HPE’s Kazarian. He noted that while this specific conversation concerns Hadoop, security is an issue wherever there is a collection of data. A few of the security issues with Hadoop stem from constant change in an open source community, multiple feeds of data with different protection needs, and multiple types of data in the data lake. “Collecting everything in a Hadoop data lake makes it an ideal target for malfeasance,” noted Kazarian. Kazarian discussed the importance of protecting not only “identifiers” but “quasi-identifiers” as well. Examples of quasi-identifiers would be a person’s zip code, sex, and birthday. While none of these outright names someone, a hacker could deduce a person’s identity from the combination. “Using the idea of abstracting data gives a huge leverage to share data with third-party firms,” said Kazarian.
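To make the quasi-identifier risk concrete, the following is a minimal Python sketch, not something presented in the webinar. It assumes a small set of hypothetical records and uses a k-anonymity-style check (a technique chosen here for illustration) to show how zip code, sex, and birth date can single out a person, and how coarsening those fields makes records less distinguishable. The record values and the function names `smallest_group` and `generalize` are invented for this example.

```python
from collections import Counter

# Hypothetical records: (zip_code, sex, birth_date). No names or other direct identifiers.
records = [
    ("02138", "F", "1975-03-14"),
    ("02139", "F", "1975-11-02"),
    ("02138", "M", "1980-06-21"),
    ("02139", "M", "1980-01-09"),
]

def smallest_group(rows):
    """Return the size of the smallest group of rows with identical values.

    A result of 1 means at least one row is unique and could be re-identified
    by anyone who knows that person's quasi-identifiers.
    """
    return min(Counter(rows).values())

def generalize(row):
    """Coarsen quasi-identifiers: keep a 3-digit zip prefix and the birth year only."""
    zip_code, sex, birth_date = row
    return (zip_code[:3] + "**", sex, birth_date[:4])

raw_k = smallest_group(records)
generalized = [generalize(r) for r in records]
generalized_k = smallest_group(generalized)

print("smallest group before generalizing:", raw_k)
print("smallest group after generalizing:", generalized_k)
```

In this toy data set every raw record is unique, while after generalization each record shares its values with one other record. Real data sets would need a far more careful analysis before being shared with third parties.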
To watch a replay of this webinar, go here.