New release will add capabilities for data integration, security, and manageability.
The next major release of MarkLogic's enterprise NoSQL database platform is available for early access now and will be generally available by the end of this year. Gary Bloom, president and CEO of the company, recently reflected on the changing database market and how new features in MarkLogic 9 address evolving requirements for data management in a big data world. "For the first time in years, the industry is going through a generational shift of database technology - and it is a pretty material shift," observed Bloom.
What are the challenges that customers are dealing with now as far as data management?
Gary Bloom: If you look at what is going on in the marketplace, there is a massive problem that customers are struggling with, and it has to do with data being in silos. What customers want is a 360-degree view of their data, and they want it to be very actionable, meaning that they want to change their business processes, change their workflows, and run their business differently based on a view of all of their data. However, the reality, partly driven by the architecture of the relational database model, which has been the primary database model for the last 30 years, is that you get anything but a unified view. What you get is lots of different people taking snapshots of the data and then doing different things with those snapshots. It is actually very hard to integrate those data silos.
How does that get resolved?
GB: The way that got solved in the relational database era was through ETL processes, where you essentially transformed everything and you put it into another relational database. The problem with the transformation process is that every time the source data changes, you have to rewrite your ETL processes and every time the user wants to ask a different question of the data you have to re-index the relational database to make the SQL statement run properly.
What do you propose?
GB: With MarkLogic’s operational and transactional enterprise NoSQL database, we have come up with an approach in which ETL essentially goes away. MarkLogic ingests all the data as-is, including structured data and unstructured data, along with all of the information about the data, the metadata.
MarkLogic then creates a universal, ask-anything index over the data, and then from there, customers go ahead and build applications. We focus on the enterprise requirements as well: trusted transactions which the industry sometimes calls “ACID” capabilities, security, and high availability and disaster recovery. We check off all the boxes for what real data centers need to run their businesses.
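To make the "ingest as-is, then index everything" idea concrete, here is a minimal Python sketch of a schema-agnostic word index over documents of different shapes. It is a toy illustration only, not MarkLogic's implementation or API; the function names (`flatten`, `build_index`, `search`) and the sample documents are hypothetical.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (path, value) pairs for every leaf in a nested dict or list."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}/{key}")
    elif isinstance(doc, list):
        for item in doc:
            yield from flatten(item, prefix)
    else:
        yield prefix, doc


def build_index(docs):
    """Map each lowercased word to the set of (doc_id, path) where it occurs."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for path, value in flatten(doc):
            for word in str(value).lower().split():
                index[word].add((doc_id, path))
    return index


def search(index, word):
    """Return sorted (doc_id, path) hits for a single word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical documents with different shapes; no shared schema is required.
docs = {
    "trade-1": {"trader": "Alice Smith", "amount": 1000, "notes": "settled late"},
    "email-7": {"from": "alice@example.com", "body": "Trade settled after Alice called"},
    "hr-42": {"employee": {"name": "Alice Smith", "dept": "Trading"}},
}

index = build_index(docs)
hits = search(index, "alice")
```

In this sketch, a document with a brand-new field can be added and indexed without changing any schema or rewriting a transformation step, which is the contrast with ETL that the answer above draws.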
How do you do that?
GB: Google does not change Google every time an organization publishes five new web pages. They just index that data in. What we have done is the exact same thing for corporate data; the big difference is that it is not just web pages, it is all your business data. Instead of taking mini snapshots of data - which, by the way, creates an enormous cybersecurity risk because it results in copies of mission-critical business data in many repositories throughout an organization - we give customers the opportunity to create a unified layer where all the data comes into the system with a unified view, and new workflows and applications can then be built on top of it. An example is Deutsche Bank, which took 30-plus trading systems and, rather than do a bunch of ETL processes and put the data back into a relational database, put a MarkLogic layer in. MarkLogic takes the data in from all the different trading systems and, once it is in MarkLogic, all the post-trade processing, including all regulatory compliance, is dealt with from a single integrated repository. Without the ETL processes, an application can be built in a fraction of the time.
How does this help?
GB: If you think about it, data scientists are spending about 80% of their time simply massaging and wrangling the data to get it into a format so they can actually do something with it. In a data warehouse, about 60% of the expense is the cost of ETL: buying that software and running the ETL processes so you can have a data warehouse. Essentially, what we do is just let the customer integrate all that data into the MarkLogic enterprise NoSQL database platform. We essentially take out the cost dimension and the time dimension, so our customers tend to build applications very rapidly.
What is the biggest problem customers are dealing with as far as big data?
GB: When all the companies started moving into this whole next-generation database market, many people thought it was all about the fact that there was all this unstructured data that could not be managed, or about the speeds and feeds of social media, networking, and Internet of Things data.
Both of those are correct. Yes, there are the speeds and feeds problems and the data variety problems - the structured, unstructured, social media, video, voice data - but there is also the problem of integrating data in silos with data spread all over the enterprise, and that is predominantly structured data. One of our customers in the healthcare field had 140 HR systems, so if someone in the organization wanted to know something about the employees, data from 140 systems had to be brought together. That customer built a repository in MarkLogic, and most of that data is structured data.
The big data challenge is not just unstructured data.
When we talk about integrating data in silos, it is all three of those data categories. It is structured data, the variety of unstructured data, and the high-volume data as well. Just like most modern architectures, we run on pretty traditional scale-out, elastic Intel architecture. We became the database underneath Obamacare, bringing together all the healthcare policies so they could have an Amazon.com-like experience for people to purchase healthcare insurance in the U.S.
MarkLogic 9 will be released later this year? How does it fit in?
GB: It complements a lot of features that we have already built such as tiered storage, semantics - capabilities that allow organizations to have these big repositories and merge that data.
What is new in the release?
GB: In MarkLogic 9, we focused on three primary themes. Giving customers more tools to integrate data; that is number one. Then, if we are going to create this master repository of all trade data, or all HR data, all financial data, or all intelligence data, then security becomes really important; that is number two. And third is that these systems get pretty big, pretty fast, and they are running in complicated environments. For example, the system might include a physical on-premise data center in the U.S. and a cloud provider in Europe, so data must be moved between these different places. Those are the three things: continuing to improve the ease with which customers can integrate data once it is in the database; making it highly secure; and giving the customer the tools to manage it.
What has been done for data integration?
GB: There are three primary pieces of the data integration strategy. One is continued evolution of our semantics capability. Version 9 brings support for conceptual relationships and semantic relationships, and it also brings in the ability to capture and query the model as you need it, which means that we are making it easier for the customer - once they have brought all the data together - to understand the relationship of the different elements of the data. The second piece of integration is the data query capability. We are dramatically improving our SQL API to allow more BI tools to work against a new generation of database as well as against the old generation of database. It is really helping people cross the bridge to the next generation. And third is data movement. Organizations are bringing data in from existing databases, as well as batch processes, messaging streams, and social media streams. All different kinds of data are coming in rapidly, and this requires a much more effective data movement capability. So what we are doing in MarkLogic 9 is making it easier to do the ingestion process to get the data into MarkLogic.
And security enhancements?
GB: We already have Common Criteria certification, which is a government certification driven by the fact that we do a lot of work in the government sector. In MarkLogic 9 we are adding two major features. One is advanced encryption. This is encryption for data at rest, but it is really transparent encryption, meaning that we are encrypting the entire database so that someone who has access to the storage medium will not be able to see the data in the database. The second thing that is really major in the security world is redaction, which allows users to hide or mask any data in the database that they don’t want somebody to see. Even a DBA or a system administrator working with a MarkLogic database can be restricted so that they can’t see the data in the database. In a healthcare application, this can protect the patients’ names but still allow someone to do analysis on all the information; in a banking application, it can restrict data access to one group of clients’ data rather than all of the company’s clients.
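The redaction idea described above, hiding or masking fields depending on who is looking, can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not MarkLogic's redaction feature or its API; the rule table `REDACTION_RULES`, the role names, and the `redact` function are all hypothetical.

```python
# Hypothetical per-role rules: which fields to mask or remove for each role.
REDACTION_RULES = {
    "analyst": {"name": "mask", "ssn": "remove"},
    "physician": {},  # sees every field unchanged
}


def redact(record, role, rules=REDACTION_RULES):
    """Return a redacted copy of `record` for `role`; the original is untouched."""
    if role not in rules:
        # Unknown roles get nothing (deny by default).
        return {}
    result = dict(record)
    for field, action in rules[role].items():
        if field not in result:
            continue
        if action == "mask":
            # Replace the value with asterisks of the same length.
            result[field] = "*" * len(str(result[field]))
        elif action == "remove":
            del result[field]
    return result


patient = {"name": "Jane Doe", "ssn": "123-45-6789", "diagnosis": "asthma", "age": 41}
analyst_view = redact(patient, "analyst")
```

Here the hypothetical analyst can still aggregate on `diagnosis` and `age` while the patient's name is masked and the identifier is dropped, mirroring the healthcare example in the answer above.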
What does MarkLogic 9 add for manageability?
GB: One of the things we are doing is introducing OpsDirector, a new user interface for the administrator over a MarkLogic environment to administer multiple clusters and multiple databases at scale. The second thing we are doing is adding rolling upgrades so we can support updating the database to new versions, putting new patches in, making changes to the database code itself without ever bringing down the production cluster. And the last thing we have added is a telemetry capability so that, if the customer opts in, MarkLogic can collect data directly from the customer. It dramatically improves the resolution of any issues and even more importantly it helps to proactively avoid and identify issues that could become problems.
What is driving these new capabilities?
GB: Really, for the first time in years, the industry is going through a generational shift of database technology - and it is a pretty material shift. Almost all the analyst firms forecast that the database market will grow dramatically over the next 5 to 10 years, but virtually all the incumbent database suppliers have flattened out in their growth. What is happening is that these are not just new technologies being introduced but collectively they represent a generational shift in the database market. For MarkLogic, the reason security, availability, and data integration are so important is that we are integrating silos of data, but we are doing it in major corporations for mission critical computing. We are driving a generational shift in a part of the market that typically moves pretty slowly to new technologies because of the high standards that are there.
Interview conducted, edited, and condensed by Joyce Wells.