Emerging tech strategist Andy Thurai walked attendees through the process of future-proofing IT Ops teams and infusing AI into their operations during his presentation at Data Summit Connect Fall 2020.

Everyone needs to future-proof their IT Ops teams, said Thurai. The march of progress and new approaches is not going to end here, and the change in the IT landscape, or IT transformation, is not going to end today. "You're moving from wherever you are to AWS, to containers, to Kubernetes, or what have you. But the changes will keep coming. There are newer things that will come that you will have to adapt to. So over the course of the last decade, the development, engineering, and release teams have done a lot of work to mature; they have DevOps and CI/CD. So that portion of pushing the changes out smoothly is somewhat mature, but surprisingly, IT operations were not part of that. They were ignored to an extent."

Align IT Ops with DevOps

In giving advice to companies, said Thurai, the very first thing he tells them to do is to align IT Ops with DevOps. "When the change gets pushed through, if it is going to affect the IT teams, they need to know that. In other words, the IT Ops teams need to be part of the DevOps culture. It's easier said than done. And a lot of companies promise that they do that. But oftentimes, I don't see them doing that. The IT Ops teams are not even aware of what changes have happened over the course of the last day, or 24 hours. And then when the change happens, they'll be scrambling at the last minute, trying to figure out what the issue is."

The Need for a Single Source of Truth

The second piece of advice Thurai said he gives to IT teams is to have a single source of truth. This doesn't mean that you have to use the same tool to monitor everything across the board, but the data from all of the tools needs to be correlated somehow. "The problem is if you don't have a single source of truth, or as I call it, a single-observed truth, then it's going to be extremely hard for you to figure out the root cause."

The MTTR (mean time to recovery or mean time to restore) consists of two components, said Thurai. One is MTTI, or mean time to identification, and the other is the time it takes to fix the problem that you identified. If your MTTI increases, your MTTR is also going to increase, said Thurai. "In other words, you can't fix what you don't know is broken. So unless you know what's broken, it's going to be extremely difficult. This is, again, assuming that if you know what's broken, you have a mechanism in place to fix that."

If you don't have that in place, you have even bigger problems, said Thurai. If you are able to get to the root cause, and if you know what to do about it, then your combined time to recovery is going to come down, he said.
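
The relationship Thurai describes, MTTR as the sum of MTTI and the time to fix, can be illustrated with a minimal Python sketch. The `Incident` record, its field names, and the sample timestamps are hypothetical and made up for illustration; they do not come from the presentation or from any particular monitoring tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record layout, assumed for this sketch only.
    started_at: datetime     # when the problem began
    identified_at: datetime  # when the root cause was identified
    resolved_at: datetime    # when service was restored


def mean_minutes(deltas):
    """Average a sequence of timedeltas, expressed in minutes."""
    return mean(d.total_seconds() / 60 for d in deltas)


def mtti(incidents):
    """Mean time to identification: start of problem to root cause found."""
    return mean_minutes(i.identified_at - i.started_at for i in incidents)


def mean_time_to_fix(incidents):
    """Mean time from identification to resolution."""
    return mean_minutes(i.resolved_at - i.identified_at for i in incidents)


def mttr(incidents):
    """Mean time to recovery: start of problem to resolution."""
    return mean_minutes(i.resolved_at - i.started_at for i in incidents)


# Made-up sample data: two incidents starting at the same reference time.
t0 = datetime(2020, 10, 20, 9, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=40), t0 + timedelta(minutes=60)),
    Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=50)),
]

identify = mtti(incidents)
fix = mean_time_to_fix(incidents)
recover = mttr(incidents)
print(f"MTTI={identify} min, fix={fix} min, MTTR={recover} min")
```

Because recovery time for each incident is identification time plus fix time, any increase in MTTI shows up directly in MTTR unless the fix time shrinks by the same amount.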

Too Much Data Means You Need AI and ML Help

"And the last thing that I advise companies is, whether we agree or not, there is too much data now, particularly with IT operations," said Thurai. "When I say too much data, it's too much volume of data, too much variety of data, and even too much unstructured data at times, from logs, from change management systems. A lot of them are structured. And in the case of logs, it may be semi-structured as well, or even unstructured. Your current analytics system may not be able to handle that. This is where the use of AI and ML will rise above the rest, for root cause analysis, pattern matching, and the correlation of incidents that have happened. Because when an incident happens, it doesn't happen in just one stack. It doesn't just happen at the application level. It doesn't just happen at the infrastructure level, or what have you. If it happens in one area, it's going to get propagated, and it's going to affect the entire application stack." Having a human look at all of these components and then try to figure out what has happened is a monumental task, said Thurai. "And, obviously, I keep telling people that AI and ML thrive on data."
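
To make the idea of correlating incidents across the stack concrete, here is a minimal Python sketch. It is not AI or ML and not a method described in the presentation; it is a simple time-window grouping, shown only to illustrate how events from different layers might be tied to one incident. The event fields, layer names, messages, and the five-minute window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def correlate(events, window=timedelta(minutes=5)):
    """Group events that occur within `window` of the previous event.

    Events are sorted by time; a new group starts whenever the gap to the
    previous event exceeds the window.
    """
    groups = []
    for event in sorted(events, key=lambda e: e["time"]):
        if groups and event["time"] - groups[-1][-1]["time"] <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


# Made-up sample events from different layers of an application stack.
events = [
    {"time": datetime(2020, 10, 20, 9, 2), "layer": "application", "msg": "timeouts"},
    {"time": datetime(2020, 10, 20, 9, 0), "layer": "infrastructure", "msg": "disk full"},
    {"time": datetime(2020, 10, 20, 9, 3), "layer": "database", "msg": "write errors"},
    {"time": datetime(2020, 10, 20, 11, 0), "layer": "application", "msg": "slow page"},
]

groups = correlate(events)
for group in groups:
    print([e["layer"] for e in group])
```

In this sample, the three events around 9:00 land in one group spanning the infrastructure, application, and database layers, while the 11:00 event forms its own group. A real system would need far richer signals than timestamps alone, which is where the volume and variety of data Thurai mentions comes in.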

