2015 is going to be a big year for big data in the enterprise, according to Oracle. Neil Mendelson, Oracle vice president of big data and advanced analytics, shared Oracle’s “Top 7” big data predictions for 2015.
- “Corporate boardrooms will talk about data capital, not big data.” According to Oracle, as data is increasingly seen as a form of capital, companies will need to create new products, services, and ways of working with it, just as they do with financial capital. The onus will be on CEOs to secure access to, and increase the use of, data capital by digitizing and “datafying” key activities with customers, suppliers and partners. For CIOs, this means providing “data liquidity” – the ability to get the data the firm needs into the shape it needs with minimal time, cost and risk. “When you think of any asset, you think of managing that asset. Often, assets are thought of as being tangible - things that really have true value - and data is one of those true assets. It is not just real estate or physical inventory. Data about people, places, and things can, in many cases, be the most important asset that the company possesses,” notes Mendelson.
- “Big data management will grow – up.” Hadoop and NoSQL will advance from experimental pilots to standard components of enterprise data management, taking on a role right alongside relational databases. Over the course of the year, firms described as the “early majority” will decide on the best roles for Hadoop and NoSQL in their organizations. The need for data liquidity will cause architects to find new ways to complete the full big data environment as a mature enterprise-grade system, including Hadoop, NoSQL, and relational technologies. “When big data first came to the fore, for many people that meant Hadoop and Hadoop alone,” said Mendelson. “But as we see big data applications begin to mature, you will see applications that have Hadoop alongside relational,” he said. For example, in order to analyze clickstream data, you need to merge it with the information about customers that is in a relational database repository. You want to be able to query it without being forced to move it. This requires much more robust capabilities than were available before. The power lies not in the ability to use Hadoop alone or NoSQL alone, but in the combination.
- “Companies will demand a SQL for all seasons.” Companies will demand that SQL works with all big data, not just data in a Hadoop, NoSQL, or relational silo. They’ll also demand that this big data SQL works just like the full-fledged modern SQL that their applications and developers already use. This will put pressure on nascent Hadoop-only SQL to mature very quickly. “SQL has been, since the late 1980s, the language of data but that is broadening out to include Hadoop and NoSQL.” According to Mendelson, in the early days of big data, vendors said SQL was not needed since they had APIs and programmatic interfaces, but not every company has the resources to deal with those, and if access is going to be truly opened up to more people in the organization, standard SQL access will be a requirement. “Everything old is new again,” says Mendelson. “If you want to really open up data, you open it up by access to SQL.” An added wrinkle is that SQL is increasingly being used to describe different things. “For the last 20 years, when someone said ‘SQL’ they all meant the same thing - the structured query language as defined by a standards body, and there are different levels of that standard. For the last 20 years, we have had the notion of a minimum entry level when we talk in terms of SQL. We are definitely seeing SQL being embraced by new SQL engines built on top of NoSQL and Hadoop. But these engines are not adhering to the same standards that we once thought everyone adhered to.” Vendors are describing their offerings as having SQL access, but they support a dialect that is radically smaller than what was classically thought of as the minimum viable set, he notes.
- “Just-in-time transformation will transform ETL.” New in-memory streaming technologies will speed the rate at which people can act on data, causing a re-examination of extract, transform, and load (ETL) processes. To get data into Hadoop, data scientists will increasingly opt for real-time data replication tools instead of the batch-oriented ones that have been the norm, and they will also want to take advantage of distributed in-memory processing to make data transformation rapid enough to support interactive exploration. “Increasingly, the latencies that are inherent to an ETL process in the past have had to speed up because people are interested in more real time information and they may not even land that data before they begin to transform it and analyze,” said Mendelson. “Increasingly there will be the need to do this in real time, not just in batch loads, but really on the fly.”
- “Self-service discovery and visualization tools will come to big data.” New data discovery and visualization tools will allow people with expertise in the business - but not in technology - to use big data in daily decisions. In addition, because a great deal of this data will come from outside the organization, and beyond the careful curation of enterprise data policies, there will be the need to simplify this complexity, using new technologies that combine consumer-grade user experience with sophisticated algorithmic classification, analysis and enrichment. This will allow business users to explore big data more easily. “For a platform to really take hold it has to move from use by just the privileged few to a much greater set,” says Mendelson. “While few people are able to work with technologies such as Pig and Sqoop, and more people know SQL, an even greater number of people want to be able to access it from a visual point of view because that is what they are used to. To allow business analysts to work more productively with more data, visual interfaces will come to Hadoop, transforming the availability of information to a much broader group of users.”
- “Security and governance will increase big data innovation.” According to Oracle, many firms have actually found their big data pilots shut down by compliance officers concerned about legal or regulatory violations. This is particularly an issue when creating new data combinations that include customer data. Firms may be surprised to find big data experimentation easier to open up when the data involved is more locked-down. This will include extending modern security practices like data masking and redaction to the full big data environment, in addition to the must-haves of access, authorization, and auditing. “One of the things that is currently hampering the broader use of big data is the fact security and governance have been so nascent. If you have a very important asset that is really strategic, what you want to be able to do is lock that down and limit the number of people that can get to it. But if you have enterprise-grade security and governance then you can allow a broader range of people to access the information, because you are essentially controlling what they see and what they don’t.”
- “Production workloads blend cloud and on-premise capabilities.” Once companies see enterprise security and governance extended to high-performance cloud environments, they’ll start to shift workloads around as needed. For example, an auto manufacturer that wants to combine dealer data born in the cloud with vehicle manufacturing data in an on-premise warehouse may ship the warehouse data to the cloud for transformation and analysis, only to send the results back to the warehouse for real-time querying. “Up until this point we have seen a very small number of elite companies begin to really put production workloads in Hadoop in the cloud, and we think in 2015 we are going to see that in a much broader way. Part of this is the overall acceptance of these technologies and part of this is how it is actually offered. Until now, what has been largely available in the cloud has been a virtualized Hadoop environment and what we think it will evolve to is not only being able to offer a virtualized environment but a high performance environment that runs on effectively bare metal. It is the performance at the end of the day that people are looking for – that scalability that allows them to put an enterprise-grade real production application in the cloud and, as they do, they are going to need both cloud-to-cloud communication and ground-to-cloud communication, so you will start seeing the emergence of big data in the cloud in a high-performant manner but you will see it communicating with other information sources both in the cloud and on-prem.”
Paraphrasing Microsoft co-founder Bill Gates, Mendelson said that vendors often overemphasize what can happen in 2 years and underemphasize what can happen in 10 years. 2015 will be a big year for big data, he notes. “The technology is moving very quickly and it is gaining to the point where a broader set of people can get into it not just because it is affordable but because they no longer require specialized skills in order to take advantage of it.”