MarkLogic 7 Enhances Enterprise NoSQL With Automated Tiered Storage and Expanded Support for Hadoop

Oct 10, 2013

MarkLogic today announced the latest version of its Enterprise NoSQL database platform, MarkLogic 7. To help organizations gain better operational agility and optimize storage costs, MarkLogic 7 supports cloud computing, extends support for Hadoop, and adds new features that enable database elasticity and searchable tiered storage. Additionally, to help users understand and gain more meaning from their data, MarkLogic is introducing MarkLogic Semantics, which combines the power of documents, values, and RDF triples to enable analysts to make more informed decisions and to support the delivery of the most contextually relevant information to users.

Reducing storage costs with MarkLogic 7

With this release, MarkLogic automates the placement of data into different storage tiers, so that each piece of data lands on the storage that is most cost-effective and performance-appropriate for it, explained Joe Pasqua, Senior Vice President, Product Strategy, MarkLogic.

“What we have done is made it possible to run a single instance of MarkLogic across multiple different types of storage transparently at the same time,” said Pasqua. The data remains readily available with sub-second query performance, so it can still adhere to governance, compliance and security policies over the entire lifetime of the data, even while leveraging cost-effective public or private cloud platforms. “Customers that are upgrading from MarkLogic 6 to MarkLogic 7 are going to have huge cost reduction for storage without doing anything,” said Pasqua.
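
As a rough illustration of what automated tier placement involves, the sketch below picks a storage tier for a document based on its age. The tier names, the age thresholds, and the `choose_tier` function are all hypothetical and chosen for illustration only; this is not MarkLogic's API, and the article does not say which criteria the product actually uses.

```python
from datetime import date

# Hypothetical tiers, ordered from fastest and most expensive to cheapest.
# Each entry is (maximum age in days, tier name); None means "no upper bound".
TIERS = [
    (30, "ssd"),
    (365, "local-disk"),
    (None, "hdfs-or-s3"),
]


def choose_tier(last_modified, today):
    """Return the first tier whose age limit covers the document's age."""
    age_days = (today - last_modified).days
    for max_age, tier in TIERS:
        if max_age is None or age_days <= max_age:
            return tier
    # Unreachable while the last tier has no upper bound; kept as a safe default.
    return TIERS[-1][1]


today = date(2013, 10, 10)
recent = choose_tier(date(2013, 10, 1), today)    # 9 days old
older = choose_tier(date(2013, 1, 1), today)      # 282 days old
archived = choose_tier(date(2010, 1, 1), today)   # several years old
```

In this toy policy a document simply moves to cheaper storage as it ages; a real deployment would also have to keep the data queryable on every tier, which is the part Pasqua emphasizes.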

Expanded support for Hadoop in MarkLogic 7

In addition to local disk, SAN, NAS and other traditional file systems, MarkLogic can run directly on top of the Hadoop Distributed File System (HDFS) for either operational or archive data. “We have been integrating more and more with Hadoop over time. We now have a number of different touch points and ways that we integrate with Hadoop, and the most recent thing we have done is added it as a tier in our tiered storage mechanism. We can also run on it natively independent of our tiered storage. We can use Hadoop to run big parallel jobs for us,” said Pasqua. In addition, Pasqua added, “we have something called MLCP which is our MarkLogic Content Pump and we have integrated that with Hadoop so we can also use Hadoop to do a massive parallel ingestion job into MarkLogic. We have a roadmap with Hadoop and we are adding more and more capability to it over time.”

Elasticity at the data layer in MarkLogic 7

“One of the major areas that we have added in MarkLogic 7 is elasticity and the goal is to make the data layer dynamically expand and contract in response to changing load requirements,” said Pasqua. “The whole idea of elasticity has come to the forefront in people’s minds with public clouds like Amazon. However, it has been elusive to achieve at the data layer – and it is a lot harder there. Because when you are expanding and contracting at the data layer, it is not just about adding more servers, it is about having your data in the right place. If you just add a bunch of servers, but don’t distribute your data to make the data more accessible then you are not really providing the best service because you are still going to bottleneck on the data.”
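
Pasqua's point about data placement can be made concrete with a small sketch. It assumes, purely for illustration, that documents are assigned to hosts by hashing their identifiers; the host names, document IDs, and helper functions are hypothetical and do not reflect how MarkLogic distributes data internally.

```python
import hashlib
from collections import Counter


def bucket(doc_id):
    """Stable integer hash of a document id (md5 used only for determinism)."""
    return int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)


def place(doc_ids, hosts):
    """Assign every document to a host by hashing its id."""
    return {d: hosts[bucket(d) % len(hosts)] for d in doc_ids}


def load(placement, hosts):
    """Count documents per host, including hosts that hold nothing."""
    counts = Counter(placement.values())
    return {h: counts.get(h, 0) for h in hosts}


docs = [f"doc-{i}" for i in range(1000)]
old_hosts = ["host-1", "host-2"]
new_hosts = old_hosts + ["host-3"]

# Adding a server without moving data: the new host holds nothing,
# so every query still lands on the original two hosts.
no_rebalance = load(place(docs, old_hosts), new_hosts)

# Rebalancing: documents are redistributed across all three hosts.
after_rebalance = load(place(docs, new_hosts), new_hosts)
```

Here `no_rebalance` shows the new host sitting empty, while `after_rebalance` shows the documents spread over all three. Simple modulo hashing moves far more documents than necessary when the host count changes, so a production rebalancer would likely use a scheme that limits data movement.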

With MarkLogic 7, a combination of advanced performance monitoring, programmatic control of cluster size, sophisticated data rebalancing, and granular control of resources enables this elasticity at the data layer, whether it is on-premises or in the cloud, according to the company. MarkLogic can also leverage Amazon S3 either as a native storage tier or for cloud backup from an on-premises deployment, enabling hybrid on-premises/cloud architectures. This is possible, according to the vendor, because MarkLogic supports ACID transactions, which are necessary for any database that will be used for mission-critical applications.

MarkLogic Semantics option

MarkLogic 7 offers a new semantics option that stores RDF (Resource Description Framework) triples (also known as Linked Data), documents and values in the same proven NoSQL database, maintaining context and making the facts available for decision-making. And, with a specialized triple index, MarkLogic Semantics enables industry-standard SPARQL queries combined with queries against documents and values, so that all relevant information is delivered in applications and analytic reports. According to MarkLogic, until now, triple stores have been separated from the data source itself, which often causes meaning and context to be lost, but with MarkLogic Semantics this problem is solved by adding the RDF triple store to the Enterprise NoSQL document, value and metadata store.
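
To give a sense of what combining triple queries with document queries means, here is a minimal sketch in plain Python. It uses neither SPARQL nor any MarkLogic API: the documents, triples, predicate names, and the `match` and `combined_query` helpers are all made up for illustration.

```python
# Hypothetical document store: id -> text.
DOCUMENTS = {
    "doc-1": "Quarterly report on storage costs in the cloud",
    "doc-2": "Notes on Hadoop ingestion jobs",
    "doc-3": "Storage tiering overview for archive data",
}

# Hypothetical facts about those documents as (subject, predicate, object).
TRIPLES = [
    ("doc-1", "hasTopic", "storage"),
    ("doc-2", "hasTopic", "hadoop"),
    ("doc-3", "hasTopic", "storage"),
    ("doc-1", "writtenBy", "alice"),
    ("doc-3", "writtenBy", "bob"),
]


def match(pattern, triples=TRIPLES):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


def combined_query(word, predicate, obj):
    """Documents that satisfy a triple pattern AND contain a word."""
    subjects = {s for s, _, _ in match((None, predicate, obj))}
    return sorted(
        d for d in subjects
        if word.lower() in DOCUMENTS.get(d, "").lower()
    )


cloud_storage_docs = combined_query("cloud", "hasTopic", "storage")
bob_storage_docs = combined_query("storage", "writtenBy", "bob")
```

Because the facts and the text live side by side, a single query can filter on both, which is the benefit the company claims over keeping a triple store separate from the source data.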

Available now through the Early Access program, MarkLogic 7 will be generally available for production use in late November. The Tiered Storage and Semantics options are also available in Early Access and will be available along with the production release.