A user can say: “Tell me all the countries whose currency is the euro that GE is operating in.” If you have a document in your database that talks about Italy and General Electric, but never mentions the euro or GE, you will still be able to answer that question. You or I, as human beings, would be able to correlate all those relationships, but a normal database does not do that. If the specific information is not in the database, you are not going to be able to find it. But with semantics, having that knowledge network allows you to answer queries that you could never have answered before, and that is fundamentally changing what people can do with their database.
What is different about the MarkLogic approach to semantics?
Pasqua: Where MarkLogic has a real innovation in this space is that we did not build a separate semantics system. In MarkLogic, we felt that it was very important that this knowledge network be completely tied with all the rest of your data. If you want to do a normal sort of database query of values, or a full text search, or a geospatial query, and you want to tie all that to your knowledge network, you can do all that in a single query. You don’t have to go to one system and get a certain piece of information, go to another system and get another kind of information, and then try to pull them together yourself. With MarkLogic, we fundamentally felt it was important to have all that in one place so that the information you have in your knowledge network can apply equally well to refining the results of a geospatial query as to anything else. It is that ability to combine all of that information that makes our approach really powerful.
You recently introduced updates to the semantics features.
Pasqua: In MarkLogic 8 we added some major new features and one is called inferencing. We talked about how, with semantics, the system can draw conclusions the same way that a person would pull together those facts to draw conclusions. The inferencing mechanism that we added in MarkLogic 8 makes it really easy to do that. We use an industry standard representation for our semantic information so you can connect all of the information you have in your database to the semantic web, to sites like DBpedia, and all of the information that they have. All of that can work together. And then, you can use our inferencing mechanism to run rules against your knowledge and infer new information and new facts that did not exist in any individual source, but that MarkLogic can figure out based on the facts and relationships in your data, including what you have pulled from the semantic web.
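As a rough illustration of what running rules against your knowledge means, the sketch below applies two hand-written rules to a small set of facts until no new facts appear. This is a generic forward-chaining toy in Python, not MarkLogic's inferencing mechanism or its rule format; the facts, predicate names, and function names are hypothetical.

```python
def infer(triples, rules):
    """Apply every rule repeatedly until no rule produces a new fact."""
    facts = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(facts)
        if new <= facts:  # fixed point reached: nothing new was derived
            return facts
        facts |= new


def located_in_is_transitive(facts):
    """If A is located in B and B is located in C, then A is located in C."""
    return {
        (a, "locatedIn", c)
        for a, p1, b in facts if p1 == "locatedIn"
        for b2, p2, c in facts if p2 == "locatedIn" and b2 == b
    }


def currency_follows_location(facts):
    """If A is located in a place that has currency X, then A uses X."""
    return {
        (a, "usesCurrency", cur)
        for a, p1, place in facts if p1 == "locatedIn"
        for place2, p2, cur in facts if p2 == "hasCurrency" and place2 == place
    }


# Illustrative facts, imagined as coming from different sources.
facts = {
    ("MilanOffice", "locatedIn", "Milan"),  # e.g. an internal document
    ("Milan", "locatedIn", "Italy"),        # e.g. a public dataset
    ("Italy", "hasCurrency", "euro"),       # e.g. a public dataset
}

inferred = infer(facts, [located_in_is_transitive, currency_follows_location])
print(("MilanOffice", "usesCurrency", "euro") in inferred)  # True
```

No source states that the Milan office uses the euro; the fact only exists after the rules combine the three input facts.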
What else was added in MarkLogic 8?
Pasqua: Inferencing was a major feature in MarkLogic 8. There were also other additions like new features around SPARQL, the industry standard query language. We also improved performance and we will be continuing to do so. But what I think is really cool, not that those weren’t, is that when we start to think ahead about how people are going to use semantics, there is an entirely new domain that I think people are going to focus on that is as big or bigger than what we have seen already. That is what I would refer to as the semantics of data. Everything we have been talking about so far has been the semantics of things in the real world, Italy and currencies and things like that, and tying them all together. But we can also apply semantics to the data itself and the real relationships between data elements in a system. For 35 years people have been doing that with the rigid relational database model, and that has been the dominant data model and approach the industry has had. But the industry has run up against a brick wall because that model is so rigid and inflexible, and that is one of the main things MarkLogic is meant to address: to provide more flexibility in the way that you deal with data.
And what we are starting to see customers doing, and you will start to see MarkLogic incorporating in the future, is actually using semantics and semantic relationships to provide an entirely new way for people to model their data. We are going to give you all of the advantages of being able to express rich relationships between your data with none of the rigidity, time, and effort that people deal with today in the relational model.
NoSQL was first seen as being no SQL and now seems to be more widely understood as not only SQL. Would you say there is greater appreciation now for SQL in the big data world?
Pasqua: That is something that we are seeing and it makes sense because there is 35 or 40 years’ worth of SQL knowledge out there, all this tooling, and expertise that has been built up and is available, and all this data that is locked in SQL databases. I think as you see technologies like MarkLogic expanding beyond the early adopters and into the majority of these larger enterprises, organizations are saying, okay, we understand the value that you bring to the table but you have to work with us.
MarkLogic introduced SQL support years ago in MarkLogic 6. We realized that we needed to help our customers to bridge off of their existing technology bases and help them adopt new approaches. We are going to continue to do that and help customers take relational data, SQL information, and get it into MarkLogic to efficiently support tools that today use SQL as their primary interface language.
This is part of dealing with enterprise requirements. These customers have this investment and you can’t ignore it. We are going to help our customers who have this pre-existing, legacy SQL investment and help them bridge to a new generation of technologies.
How does MarkLogic fit with Hadoop – what is the role of MarkLogic in the infrastructure?
Pasqua: I think it is finally getting clearer how all that plays out. There was confusion a few years ago about where Hadoop fits into the ecosystem and how it relates to NoSQL and other technologies. What is really shaking out now is that Hadoop is really good at what it was built to do, which is large-scale analytics queries and data analysis, and all these attempts and pushes to move it into the role of an operational system or a more broadly capable database were kind of misguided.
We are an operational database and so fundamentally, the way we think about the world is that we are out there running the operational systems, the trading systems, even the back-end systems for something like the Saturday Night Live app. And the data from MarkLogic can actually flow into a Hadoop system or deep long-running back-end Linux system that can provide new insights that are cycled back into that operational system. We want to make that linkage really tight.
What are the next steps in enterprise features for MarkLogic?
Pasqua: We are busy at work on what we are internally calling our “8-plus” release. We released MarkLogic 8 in February 2015, and in the next month or so, we are going to be introducing an interim version 8 release with some pretty cool new features, some in semantics and some in the JavaScript and JSON area, as well as some new enterprise capabilities. And then we also have MarkLogic 9 in the works for next year.
There are a few areas that we are going to focus on, starting with how we help our customer base move from their existing set of technologies onto a NoSQL base. We will be working on data integration technologies, improving our SQL support, and using semantics to create a much richer sort of data modeling environment. In big enterprises, you have to be able to manage huge installations, so we will be improving our manageability. We will be adding more features that are oriented toward privacy: redacting sensitive information, encryption, and those types of capabilities. And we are going to continue to double down on some of the existing features that customers are really interested in, like advanced geospatial capabilities, tiered storage, and so on. So there is really a broad set of capabilities that we are looking at, but they are really oriented toward helping our customers deal with the most critical issues that they are telling us they are faced with today.