Breaking News - MarkLogic 5 Enables Big Data Applications, Combines Power of Hadoop with Real-Time Analytics

Nov 1, 2011

MarkLogic Corporation today announced general availability of MarkLogic 5, the latest version of its next-generation database for unstructured information. MarkLogic 5 includes a new connector for Hadoop to enable large-scale batch processing for big data analytics on the structured, semi-structured, and unstructured data residing inside MarkLogic. Using MarkLogic for real-time analytics with Hadoop for batch processing supports companies that need real-time, secure enterprise applications that are cost-effective and high-performing. With simple drop-in installation, organizations can run MapReduce on data inside MarkLogic and take advantage of Hadoop's development and management tools, while still leveraging MarkLogic's indexes and distributed architecture for performance. The result is enhanced search, analytics, and delivery in MarkLogic, and organizations can progressively enhance data without having to remove it from the database.

MarkLogic counts 275 distinct organizations among its customer base and well over 500 live implementations, the vast majority of which are mission-critical applications, Bill Veiga, vice president of solutions marketing, MarkLogic, tells 5 Minute Briefing. For these customers in both the private and public sectors, such as JP Morgan Chase, LexisNexis, the U.S. Army, and the FAA, MarkLogic aims to solve big data problems and is designed for massive scalability, says Veiga. The new MarkLogic 5 release punctuates and builds out support for the mission-critical applications that customers are creating on the technology, he observes. Key goals of the new release are to give customers confidence at scale, help them drive enterprise big data, and help them manage the complexity that comes with it as they start tackling all kinds of real-world data, Veiga notes.

The MarkLogic Connector for Hadoop lets users combine brute-force batch processing with MarkLogic's capabilities for ad hoc, real-time analytics. With Hadoop, they can run specialized, low-level batch processing on raw data, and the Connector for Hadoop brings in MarkLogic to let them run new, previously unanticipated queries on an ongoing basis.

MarkLogic sees Hadoop as being able to support MarkLogic for various uses, Jason Hunter, deputy CTO, MarkLogic, explains. For example, an intelligence-gathering organization could collect data that runs into hundreds of petabytes without understanding exactly what is there, and then decide to investigate a particular topic in depth. In such a scenario, users would want to use MarkLogic to interact with this content, asking questions and getting answers in sub-second time, and then asking other questions and exploring the data for insights. However, Hunter explains, because the data is so large, it would probably not be cost-effective to load hundreds of petabytes into MarkLogic if they don't have to. Instead, they can load the data into Hadoop, run a Hadoop job to select the portion of the content that makes sense for real-time analytics, and load that portion into MarkLogic for interactive queries. "So you go from hundreds of petabytes down to one petabyte, or half a petabyte, do bulk load and do interactive queries against it."
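The workflow Hunter describes (batch-select a relevant subset, then bulk load it for interactive queries) can be illustrated with a minimal Python sketch. This is not the MarkLogic Connector for Hadoop API: the function names `map_filter` and `bulk_load`, the record layout, and the in-memory dictionary standing in for the database are all assumptions made for illustration.

```python
from typing import Iterable, Iterator

def map_filter(records: Iterable[dict], topic: str) -> Iterator[dict]:
    """Map-only batch step: emit just the records that mention the topic."""
    needle = topic.lower()
    for record in records:
        if needle in record["text"].lower():
            yield record

def bulk_load(store: dict, records: Iterable[dict]) -> int:
    """Hypothetical stand-in for a bulk load into the interactive database."""
    count = 0
    for record in records:
        store[record["id"]] = record
        count += 1
    return count

# Toy "raw archive"; in the scenario above this would be the large batch store.
raw_archive = [
    {"id": "a1", "text": "Shipping manifest for port A"},
    {"id": "a2", "text": "Report on pipeline sabotage"},
    {"id": "a3", "text": "Weather summary"},
    {"id": "a4", "text": "Pipeline inspection notes"},
]

# Only the selected subset reaches the interactive store.
interactive_store: dict = {}
loaded = bulk_load(interactive_store, map_filter(raw_archive, "pipeline"))
```

The point of the sketch is the division of labor: the filter pass touches every raw record once, while the much smaller interactive store is what later ad hoc queries run against.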

Among additional key features in the new release:

MarkLogic 5 provides rich media support, letting users store and manage rich media like images or video along with textual data in MarkLogic, thus consolidating infrastructure, reducing administrative costs, and speeding up development of rich media applications. Rich Media Support also enables organizations to ensure availability of their rich media assets by leveraging the high availability and disaster recovery capabilities of MarkLogic 5.

With Document Filters, users can run full text searches on more than 200 document and rich media formats. This feature enables automatic extraction of metadata and text, and eliminates the cost of manually converting legacy formats.

Supporting a tiered storage approach, MarkLogic 5 also enables organizations to boost the performance of their big data solutions by adding a solid state drive (SSD) tier between memory and disk drives. MarkLogic manages the tier automatically, making it easy to get the most out of fast, limited-capacity drives.

MarkLogic supports out-of-the-box integration with HP Operations Manager and Nagios, allowing organizations to monitor their MarkLogic cluster with the IT monitoring tool that they currently use for their existing infrastructure.

A new monitoring dashboard gives users a browser-based view of their MarkLogic cluster's instrumentation data, with real-time charts of metrics such as I/O rates and loads, request activity, and disk usage.

Database replication helps users protect their big data solutions from site-wide disasters and reduce the risk and cost of unexpected downtime.

Point-in-time recovery enables users to restore from a backup and then roll forward using the transaction log to a specific point in time, minimizing the window for lost data between the time the last backup was taken and the occurrence of a disaster.
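The idea behind point-in-time recovery can be shown with a small, generic Python sketch: start from the backup's state, then replay logged changes made after the backup up to the chosen moment. This is a conceptual illustration only, not MarkLogic's implementation; the function name `recover_to_point_in_time`, the log entry format `(timestamp, key, value)`, and the use of `None` to mark a deletion are all assumptions.

```python
from datetime import datetime

def recover_to_point_in_time(backup, backup_time, log, target_time):
    """Restore the backup, then roll forward through the log up to target_time."""
    state = dict(backup)  # start from a copy of the backed-up state
    for ts, key, value in sorted(log, key=lambda entry: entry[0]):
        if ts <= backup_time:
            continue  # already reflected in the backup
        if ts > target_time:
            break  # stop before changes made after the chosen moment
        if value is None:
            state.pop(key, None)  # assumed convention: None means delete
        else:
            state[key] = value
    return state

backup_time = datetime(2011, 11, 1, 0, 0)
backup = {"doc1": "v1"}

# Hypothetical transaction log written after the backup was taken.
log = [
    (datetime(2011, 11, 1, 1, 0), "doc2", "v1"),
    (datetime(2011, 11, 1, 2, 0), "doc1", "v2"),
    (datetime(2011, 11, 1, 3, 0), "doc2", None),  # unwanted deletion
]

# Recover to just before the unwanted deletion.
recovered = recover_to_point_in_time(
    backup, backup_time, log, datetime(2011, 11, 1, 2, 30)
)
```

In this toy run the two updates made after the backup are preserved, while the later deletion is left out because it falls after the chosen recovery time.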

To learn more about MarkLogic 5, visit the MarkLogic 5 web page. In addition to MarkLogic 5, MarkLogic has announced improvements to the MarkLogic Developer Community and MarkLogic Express, a new developer license. Details can be found on the MarkLogic Developer Web Page.
