Making Cloud Workloads 'Observable': Industry Leader Q&A with Sumo Logic's Bruno Kurtic

Ensuring that applications are performing optimally has always been important. However, due to the COVID-19 pandemic, many organizations have sped up their cloud plans and are now relying on new tools to ensure that their cloud-based applications are performing with the right levels of availability, performance, and security. Bruno Kurtic, founding vice president of product management and strategy at Sumo Logic, recently discussed current trends in cloud workload management and how new tools can improve continuous cloud app observability.

What does Sumo Logic provide?

Bruno Kurtic: Sumo Logic offers a continuous intelligence platform, and we cover areas of operations (ITOps, DevOps, security, analytics, security operations) as well as business intelligence. We focus on modern applications, modern infrastructure, and clouds to help our customers deliver more reliable digital services, which, ultimately, deliver a better customer experience. To us and also to our customers, reliability is three things: availability, performance, and security of digital services.

What has changed for IT companies since the COVID-19 pandemic?
BK: A lot of enterprises are now adopting the cloud faster and more seriously because they're trying to respond to new demand spikes. The pandemic is also forcing them to adopt more SaaS services that help them govern the remote workplace. And so, there is a great acceleration in the adoption of the cloud. There's an acceleration of digital business models because enterprises are seeing that's how the behavior of the buyer is shifting. And there's an accelerated adoption of SaaS services to support the internal business.

How is that impacting companies' ability to maintain control of the applications that they build and serve their customers?
BK: There is a renewed focus on reliability, scalability, and performance of systems that are generating revenue. "Observability" is the new evolutionary step in delivering reliability. Observability is really a property of the system, the property of the application itself that enables an external observer, a person who's monitoring and troubleshooting it, to be able to understand what's happening to that application in real time even if the conditions that caused it were previously unknown.

How is that accomplished?
BK: More data is needed, and deeper analytics and intelligence is required, in order to be able to go from knowing that something is wrong and that an SLA is being violated to identifying why that is happening as quickly as possible so systems can be restored, the application can be brought back, and the organization can avoid angering its user base. The observability and associated technologies with our latest product launch are focused on helping our customers capture all the data in an economical way and enabling them to perform analytics on that data to understand the deep behaviors of the application and the causes.

Why do organizations need to be proactive in managing these cloud environments?
BK: I actually believe that the cloud is as different from the data center as the data center is from a mainframe. I think it's fundamentally a whole different operating model, and the shift in how you need to manage these cloud workloads has to be total and complete. Organizations have to adopt new methodologies, new processes, new tools, and new skill sets, and teams have to be recombined into cross-functional teams, because cloud environments are ever-changing. The cloud delivers to hardware what agile development delivers to software: rapid evolution and change.

How do these new processes help?
BK: You can now adopt hardware in the way that you're building software. You can iterate on it and make it evolve with your software as quickly as you need to evolve. You need to think about how to manage these cloud environments differently. Amazon alone has close to 200 enterprise cloud services, Google has the same number, and so do Azure, Alibaba, and IBM, which means hundreds and hundreds of new types of components to manage. When you have an infrastructure and microservices environment that absolutely is going to be different in a week from what it is today, the way you think about the availability of your (let's call it) "new hardware stack" has to be much more agile than it was before.

What have you added to the platform to help customers manage their cloud environments better?
BK: We have announced a set of platform capabilities that span the breadth of what we believe observability and reliability need to cover. It has to start with the software delivery process, meaning your development and CI/CD [continuous integration/continuous delivery] pipeline. DORA [DevOps Research & Assessment LLC] put out KPIs to manage the efficiency and effectiveness of a software delivery process. Some of the KPIs are about how often you ship changes to production that break your application, and how quickly you are able to recover from those breakages.
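
As a rough illustration of the two KPIs mentioned above, the following Python sketch computes a change failure rate and a mean time to restore from a list of deployment records. It is not Sumo Logic's implementation or an official DORA formula; the `Deployment` record, its field names, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    deployed_at: datetime
    failed: bool = False                     # did this change break the application?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that broke the application."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_restore(deployments: List[Deployment]) -> Optional[timedelta]:
    """Average time from a failed deployment to restoration, or None if there were none."""
    durations = [
        d.restored_at - d.deployed_at
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Made-up sample data: four deployments, two of which failed.
start = datetime(2020, 9, 1, 12, 0)
sample = [
    Deployment(start),
    Deployment(start + timedelta(days=1), failed=True,
               restored_at=start + timedelta(days=1, minutes=30)),
    Deployment(start + timedelta(days=2)),
    Deployment(start + timedelta(days=3), failed=True,
               restored_at=start + timedelta(days=3, minutes=90)),
]

print(change_failure_rate(sample))   # share of deployments that failed
print(mean_time_to_restore(sample))  # average recovery time
```

In practice the deployment and incident timestamps would come from the CI/CD and incident-tracking tools rather than a hand-built list.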

What is next?
BK: From there, you move into production and you have to manage your application and your application components, such as databases, web servers, and others, but you can't stop there. To manage your cloud infrastructure, you have to also manage your application platform, including your Kubernetes infrastructure and microservices layer. And then it doesn't stop there either because you have to also manage the edge: the edge itself and the CDN (content delivery network) metrics, that final mile that delivers the content and your application to your customer. That is the breadth of what needs to be analyzed and, to enable this, we added two new solutions and one update.

For years, C-level executives and data architects said their strategy was multi-cloud but there was no evidence to back that up—until now.

What are they?
BK: The first one is for software delivery observability, which is essentially a solution that integrates the end-to-end software development pipeline and allows you to monitor it and manage it through the DORA metrics paradigm to help you improve that process: be more agile and cause fewer errors. That is a new solution on which we have partnered with Atlassian. We integrated all of their tool sets, as well as open source tools such as Jenkins and other vendors' tools. That's solution number one.

What are the others?
BK: Solution number two is an update to our microservices solution. We announced a Kubernetes solution at our conference last fall and we have announced an update to that solution, which extends its capabilities by introducing our tracing capability. It allows you to monitor your microservices-based application all the way from the application down to its underlying platform, through logs, metrics, and traces, deep analytics, and contextual navigation that goes from a signal all the way down to the root cause of the issue with the application. So that's the second one and it is an update for an existing solution.

And the third?
BK: In the cloud management stack, we introduced a new end-to-end AWS observability solution that essentially collects data (logs, metrics, events, and metadata) across a set of key services in AWS, such as compute, database, networking, and others. We have built into it a sophisticated, machine learning-aided anomaly detection and root cause explorer engine, which looks at the data from AWS and tries to determine the cause and effect automatically. It navigates a user from where the problem might have started to where it ended up so the user can much more effectively determine where the root cause of the issue in the application might be.
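
To give a feel for what anomaly detection on a metric stream can look like, here is a deliberately simple Python sketch. It is not the machine learning engine described above; it swaps in a basic trailing-window z-score check, and the function name, parameters, and latency values are all hypothetical.

```python
from statistics import mean, stdev
from typing import List


def find_anomalies(values: List[float], window: int = 5,
                   threshold: float = 3.0) -> List[int]:
    """Return indices of points that deviate from the mean of the preceding
    `window` points by more than `threshold` sample standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against; skip it.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up latency samples in milliseconds with one obvious spike at index 6.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101, 99]
print(find_anomalies(latency_ms))
```

A production system would also need to correlate anomalies across services to suggest cause and effect, which this sketch does not attempt.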

Are there plans to expand the observability solution to other cloud platforms?
BK: We are going to expand into other cloud providers and kind of replicate this model elsewhere. We started with AWS because it is still the dominant cloud and the majority of enterprises are there—so we wanted to start with that. We're also running on AWS, ourselves, and use these capabilities on our own platform so our own internal team can leverage it. And so it was complementary to both where our customers are and what we do as well.

Are hybrid and multi-cloud scenarios making cloud operations management more challenging? It's not just one platform anymore.
BK: Correct. In fact, we do analysis on an annual basis called the "Continuous Intelligence Report," in which we analyze the telemetry from our customers to understand how people are adopting various technologies in the cloud, how it is different this year from previous years, and also how it is different from what they do on-premise. The dominant cloud migration strategy for enterprises has been multi-cloud. All the C-level executives and architects have told me multi-cloud is their strategy.

What are the reasons?
BK: Reason number one is that nobody wants to be beholden to a single provider. Nobody likes what happened in the 1990s with Microsoft, where they dominated everything, and so now everybody thinks Amazon is the new Microsoft. They want to have a choice. And they also want to have the best technology stack. They want to be able to pick machine learning tools from Google and compute tools from Amazon, or another vendor. And these global companies also want to be able to pick the best provider for specific regions. And so this is what they've been telling us over the last five years, but we really didn't see much evidence of it actually being the case. They would say that, but then they would tell us they were starting with one cloud platform just to learn how to do it. That was the case until last year.

Have things changed?
BK: When we updated the report last year, we discovered that the single fastest growing customer segment for us is customers with multi-clouds—meaning customers actively sending us production telemetry from applications running across multiple cloud providers, which basically was growing more than 50% year-over-year. This means that last year that talk track had finally become a reality.

BK: The relationship between Kubernetes and multi-cloud is almost perfect. Essentially, customers that are strictly on-premise and not in the cloud had a single-digit percentage chance of running Kubernetes; people in a single cloud, about 20%; people in two clouds, about 40% to 50%; and people running three clouds had more than an 80% chance of running Kubernetes. It seems like Kubernetes has become that insulation between clouds and enables enterprises to actually have portable workloads across any environment, on-premise or in the cloud.

Looking ahead, what do you see?
BK: What we expect to see, and what we're banking on, is continuous acceleration in cloud adoption, with multi-cloud adoption continuing alongside Kubernetes-style, microservices-based applications, service meshes, and related microservices technologies. And I think there will also be greater reliance on modern methodologies to actually deliver reliable, secure, and performant applications. This methodology is embodied in this construct of observability and what you need to enable outside observers to see in order to be able to deliver this.

We also believe that more teams that manage reliability will continue to bring more sophisticated security into the fold because the same data you use for reliable performance and availability is used to build secure applications. This cross-functional approach to tooling, as well as to routines, is going to continue, and we will deliver more and more insights on top of the same data with a single set of tools.

Interview conducted and edited by Joyce Wells.

