New Oracle Big Data SQL Helps Customers Securely Integrate Data Across Hadoop, NoSQL, and Oracle Database

Organizations are looking for innovative ways to manage more data from more sources than ever before. Although technologies like Hadoop and NoSQL offer specific ways of addressing big data problems, they can introduce data silos that complicate the data access and analysis needed to generate critical insights. To maximize the value of information and deliver on the promise of big data, companies need to evolve their data management architecture into a big data management system that seamlessly integrates all types of data from a variety of sources, including Hadoop, relational, and NoSQL. While simplifying access to all data, a big data management system should also enable organizations to leverage existing skills and maintain enterprise-grade data security and governance for sensitive or regulated information.

To address a growing need for comprehensive big data solutions, Oracle today introduced Oracle Big Data SQL, which allows customers to run a single SQL query across Hadoop, NoSQL, and Oracle Database, minimizing data movement while increasing performance and overcoming data silos. According to Oracle, the new solution helps customers gain a competitive advantage by making it easier to uncover insights faster, and allows them to leverage existing SQL skills while protecting data security and enforcing governance.

Oracle Big Data SQL runs on Oracle Big Data Appliance and can work in conjunction with Oracle Exadata Database Machine.

The Oracle Database has been able to handle structured, unstructured, and semi-structured data for a long time, said Neil Mendelson, vice president, Big Data & Advanced Analytics. But beyond the variety of data types, new computing architectures such as Hadoop are coming into play, he noted. Customers want to put these architectures to work to take advantage of data they had not leveraged before, such as medical records and images, social feeds from Twitter and Facebook, and sensor data in the manufacturing and oil & gas industries. They want to put that data to work as part of a shift from analyzing historical data to predicting what will happen in the future.

At the same time, Hadoop is reaching a turning point, as companies seek to move it beyond experimental use and into large-scale operational implementations. However, three main challenges remain, said Mendelson. First, to avoid creating a new data silo, companies must be able to integrate Hadoop with the rest of the data in their enterprise environment. Second, customers have been slow to find the skills needed to exploit the technology. Third, from a security standpoint, they cannot take the technology from the lab to operational use without the same data security and data governance they have for the rest of their data sets. “That is really where Big Data SQL comes in,” said Mendelson. “Big Data SQL allows the full dialect of Oracle SQL to apply not only [to] objects within the Oracle Database but across Hadoop and NoSQL as well.”

Speed to Insight

By leveraging SQL to query and analyze data across a range of data management systems, organizations no longer have to copy and move data between platforms, analyze it with a MapReduce-powered language, or construct separate queries for each platform and then figure out how to connect the results. Instead, Smart Scan technology, inherited from Oracle Exadata, executes locally on the nodes where the data is stored to find the data needed for a given query, minimizing data movement and increasing performance.

“Smart Scan is what makes Exadata run as fast as it does,” said Mendelson. Smart Scan is a software layer that sits on the storage nodes and performs filtering and predicate pushdown at the storage layer. For example, if a query calls for only the records of customers in New York, the system does not have to fetch every record off the disk and then throw away all but the New York records. Instead, that predicate is pushed down to the storage layer, which sends up only the applicable records, so less data moves within the machine and performance goes up. “We have taken similar technology from Exadata and now we have moved that on to Hadoop,” said Mendelson.
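The New York example can be made concrete with a toy model of predicate pushdown. The sketch below is a minimal illustration only, assuming a table split across two storage nodes; the names `StorageNode`, `smart_scan`, and the query helpers are hypothetical and do not describe Oracle's actual Smart Scan implementation. It compares how many records are shipped off the storage nodes when filtering happens at the compute layer versus at the storage layer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Record:
    customer_id: int
    state: str
    amount: float

Predicate = Callable[[Record], bool]

class StorageNode:
    """Hypothetical storage node holding one slice of a table."""

    def __init__(self, records: List[Record]) -> None:
        self.records = records

    def scan_all(self) -> List[Record]:
        # Without pushdown: every record leaves the node.
        return list(self.records)

    def smart_scan(self, predicate: Predicate) -> List[Record]:
        # With pushdown (hypothetical name): the filter runs where
        # the data lives, so only matching records leave the node.
        return [r for r in self.records if predicate(r)]

def query_without_pushdown(
    nodes: List[StorageNode], predicate: Predicate
) -> Tuple[List[Record], int]:
    # Fetch everything, then discard non-matching rows at the compute layer.
    shipped = [r for node in nodes for r in node.scan_all()]
    return [r for r in shipped if predicate(r)], len(shipped)

def query_with_pushdown(
    nodes: List[StorageNode], predicate: Predicate
) -> Tuple[List[Record], int]:
    # Each node filters locally; only the results cross the interconnect.
    shipped = [r for node in nodes for r in node.smart_scan(predicate)]
    return shipped, len(shipped)

if __name__ == "__main__":
    nodes = [
        StorageNode([Record(1, "NY", 10.0), Record(2, "CA", 20.0), Record(3, "TX", 5.0)]),
        StorageNode([Record(4, "NY", 7.5), Record(5, "NJ", 3.0), Record(6, "NY", 1.0)]),
    ]
    in_new_york = lambda r: r.state == "NY"
    _, moved_plain = query_without_pushdown(nodes, in_new_york)
    _, moved_pushed = query_with_pushdown(nodes, in_new_york)
    print(f"records moved without pushdown: {moved_plain}")
    print(f"records moved with pushdown: {moved_pushed}")
```

Both strategies return the same answer. Only the number of records moved differs, which is the source of the performance gain the article describes.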

SQL Skills to Access Hadoop and NoSQL

Oracle simplifies access to enterprise big data by combining relational and non-relational technologies into a single architecture, using SQL, the industry-standard language for accessing data. By extending SQL across the entire enterprise data infrastructure, organizations can use existing SQL skills to uncover and analyze all data. In addition, according to Oracle, integration with Oracle engineered systems for data management offers fast deployment and a low total cost of ownership for Hadoop, relational, and NoSQL data, and allows customers to secure data insights immediately.

Data Security

With Oracle Big Data SQL, customers can now apply existing security policies to data in Hadoop and NoSQL, extending enterprise governance and security to sensitive and regulated data stored in those systems.

Oracle Big Data Appliance includes comprehensive data encryption capabilities for protecting data privacy and meeting regulatory requirements. With data-at-rest and network encryption, sensitive and regulated information stored on Oracle Big Data Appliance is protected against theft and unauthorized access. Oracle Big Data Appliance also includes enterprise-grade authentication (Kerberos), authorization (LDAP and the Apache Sentry project), and auditing (Oracle Audit Vault and Database Firewall) that can be set up automatically on installation, greatly simplifying the process of hardening Hadoop.

To ensure organizations can maintain data governance across big data, Oracle Big Data SQL extends the advanced security capabilities of Oracle Database to Hadoop and NoSQL data. With Oracle Big Data SQL, customers can take advantage of proven Oracle Database security solutions for data redaction, privilege analysis, and strong controls that limit privileged user access to data.
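The idea of defining a redaction rule once and applying it to query results regardless of where the rows are stored can be illustrated with a small sketch. This is a conceptual model only, assuming a column-level policy with an exempt-user list. `RedactionPolicy`, `mask_all`, and `mask_all_but_last4` are hypothetical names, and the code does not reproduce Oracle Data Redaction's actual API or configuration.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, str]
Masker = Callable[[str], str]

def mask_all(value: str) -> str:
    """Full redaction: replace every character with '*'."""
    return "*" * len(value)

def mask_all_but_last4(value: str) -> str:
    """Partial redaction: show only the last four characters."""
    if len(value) <= 4:
        # Too short to reveal anything safely; redact fully.
        return mask_all(value)
    return "*" * (len(value) - 4) + value[-4:]

class RedactionPolicy:
    """Hypothetical column-level policy, defined once and applied to
    result rows no matter which system (relational, Hadoop, NoSQL) they came from."""

    def __init__(self, maskers: Dict[str, Masker], exempt_users: Set[str]) -> None:
        self.maskers = maskers
        self.exempt_users = exempt_users

    def apply(self, user: str, rows: List[Row]) -> List[Row]:
        # Privileged (exempt) users see the original values.
        if user in self.exempt_users:
            return [dict(r) for r in rows]
        redacted: List[Row] = []
        for row in rows:
            new_row = dict(row)  # copy so the source rows are never modified
            for column, masker in self.maskers.items():
                if column in new_row:
                    new_row[column] = masker(new_row[column])
            redacted.append(new_row)
        return redacted

if __name__ == "__main__":
    rows = [
        {"source": "oracle", "name": "Ann", "ssn": "123-45-6789"},
        {"source": "hadoop", "name": "Bob", "ssn": "987-65-4321"},
        {"source": "nosql", "name": "Cy", "ssn": "555-12-3456"},
    ]
    policy = RedactionPolicy({"ssn": mask_all_but_last4}, exempt_users={"auditor"})
    for row in policy.apply("analyst", rows):
        print(row)
```

Because the policy is attached to the query result rather than to each storage system, the same rule covers rows from all three sources without being re-implemented per platform.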