Seven Steps to Ensure Network Resilience

Over the last year, reports of network outages have become a regular occurrence. Companies including Google, Cloudflare, and YouTube have all recently reported outages that affected millions of users for hours while the issues were resolved. For many companies, downtime is expensive and damaging to their reputations. When a company’s network is unavailable, employees, customers, and the organization as a whole face the consequences.

When an outage occurs, employees’ productivity grinds to a halt. Even after the outage is resolved, employees need an average of 23 minutes to refocus on work, and the company loses money through that lost productivity. Customers who depend on an organization’s platform cannot access certain features or services, which affects both their business and personal lives. Beyond these frustrations, customers’ data security is put at risk during a network outage.

Outages can stem from a variety of factors, including human error, environmental conditions and failures of network elements, ranging from ISP carrier issues to fiber cuts and faulty cable interconnects. Additionally, as network devices become more complex and require frequent updates, they become more susceptible to bugs, exploits and cyberattacks, all of which contribute to outages.

Despite technology stacks needing more frequent updates, Information Technology Intelligence Consulting (ITIC) found that 85% of major corporations now require a minimum uptime of 99.99% for mission-critical hardware, operating systems and main line of business (LOB) applications. Not meeting these service levels can be expensive: Gartner reports that the average cost of network downtime is $5,600 per minute, which works out to well over $300,000 per hour for many enterprises.
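
To make these figures concrete, the short Python sketch below works through the arithmetic. It uses only the 99.99% uptime target and the $5,600-per-minute average quoted above; the 365-day year and the function names are assumptions made for illustration, and real costs will vary by organization.

```python
# Figures quoted in the article; the 365-day year is an assumption.
UPTIME_TARGET = 0.9999            # 99.99% uptime requirement
COST_PER_MINUTE = 5_600           # USD, average cost of downtime per minute
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(uptime, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime permitted in the period at the given uptime."""
    return (1 - uptime) * period_minutes


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE):
    """Estimated cost in USD of an outage lasting the given minutes."""
    return minutes * cost_per_minute


budget = allowed_downtime_minutes(UPTIME_TARGET)
print(f"Downtime budget at 99.99% uptime: {budget:.2f} minutes per year")
print(f"Cost of one hour of downtime: ${downtime_cost(60):,.0f}")
print(f"Cost of using the full yearly budget: ${downtime_cost(budget):,.0f}")
```

Under these assumptions, a 99.99% target leaves about 52.56 minutes of downtime per year, and a single hour of downtime at the quoted average costs $336,000.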

Organizations need a robust, secure and resilient network so that systems remain available and business continues uninterrupted when an outage occurs.

Enter: Network Resilience

Network resilience is the ability to withstand and recover from a disruption of service. Viewed as a competitive advantage for organizations across a variety of verticals, resilience measures help organizations prevent data loss and minimize damage while allowing employees to continue doing business. In case of failure, a network should be able to bounce back quickly.

When a disruption occurs, engineers cannot afford to rely on the network to manage that same network, because during an outage the management tools depend on the very network that has failed, which can result in deadlock. For example, after Google’s recent outage mentioned above, the company released a report that included details about the disruption. In the report, the company stated, “Google engineers were alerted to the failure two minutes after it began, and rapidly engaged the incident management protocols used for the most significant of production incidents. Debugging the problem was significantly hampered by failure of tools competing over the use of the now-congested network.”

Google is taking the proper steps to handle future outages. As mentioned in the report, it will be reviewing and testing its tooling and procedures to ensure they are prepared if this kind of outage occurs again. While evaluating their current processes, however, organizations can take a few additional steps to ensure full network resilience. These include not only having data backup or redundancy, but also being able to restore the network to normal capacity, sometimes even before the cause of the disruption is resolved.

However, many organizations do not consider resilience when designing and building their networks. Companies that are interested in building resilience into their networks should weigh the cost of a non-resilient network at the design stage.

Designing a resilient network is expensive and time-consuming, and the design must incorporate both short- and long-term resilience requirements. Some companies may not realize its importance and withhold financial resources. Over the long term, however, a resilient network more than pays for itself. A resilient company avoids losing thousands of dollars in revenue if, and when, the network goes down, making the upfront investment worthwhile.

Other companies might recognize the challenge of adding resilience capabilities to a network already in place, or feel that they lack the in-house resources and expertise to design a resilient network from scratch that incorporates both short- and long-term needs. Organizations that want to design and implement a resilient network can follow a set of steps to ensure its success, both now and in the future.

The Seven Steps to Network Resilience

Companies that are thinking about designing a resilient network should start by evaluating how resilient their current network is. To do so, they should measure how long it takes to resume normal business operations after a failure is resolved. Once companies understand the current status of their network, they are ready to follow the seven steps to achieve full network resilience, which are as follows:

  1. Identify weaknesses: Companies should recognize weaknesses and points of failure in the network. A good starting point is to assume that the network is currently vulnerable, and that work needs to be done to achieve resilience.
  2. Determine costs: Organizations should look at the average cost of a network outage to justify additional financial resources. Information Technology Intelligence Consulting (ITIC) reports that a single hour of downtime can cost between $1 million and $5 million.
  3. Champion network resilience efforts: By continually reminding those outside of the IT department how much money network resilience can save the organization, IT teams are more likely to succeed in building resilience into their networks.
  4. Create a network roadmap: IT teams should identify potential weaknesses and continually update their network roadmaps to understand where issues may arise, and how resilience can help organizations work through them.
  5. Protect the network: Instead of focusing only on the hardware, organizations should build resilience from core to edge. This way, the network is protected at all stages, rather than only at a few components.
  6. Deploy resilience features: Features such as strategic redundancy, alternative routes and segmentation can help organizations implement a resilient network in small steps, rather than all at once.
  7. Foster support: IT teams should create awareness of network resilience at all organizational levels to grow cooperation throughout the company.
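
As one way to approach steps 1 and 6, the Python sketch below checks a small network map for single points of failure: devices whose loss would cut some other device off from the rest of the network. The topology, the device names and the function names are all hypothetical and chosen for illustration only. The check simply removes each device in turn and tests whether the remaining devices can still reach one another, which is adequate for small maps but is not meant as a production tool.

```python
from collections import deque

# Hypothetical topology: each key is a device, each value is the set of
# devices it is directly linked to. Links are listed in both directions.
TOPOLOGY = {
    "internet": {"edge-1", "edge-2"},
    "edge-1": {"internet", "core"},
    "edge-2": {"internet", "core"},
    "core": {"edge-1", "edge-2", "access-1", "access-2"},
    "access-1": {"core"},
    "access-2": {"core"},
}


def is_connected(graph, removed):
    """Return True if every device other than `removed` can still reach
    every other remaining device (breadth-first search)."""
    nodes = [n for n in graph if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def single_points_of_failure(graph):
    """Devices whose removal splits the remaining network."""
    return sorted(n for n in graph if not is_connected(graph, n))


def with_redundant_twin(graph, device, twin):
    """Copy of the graph with `twin` linked to the same devices as `device`
    (strategic redundancy for that one device)."""
    copy = {name: set(links) for name, links in graph.items()}
    copy[twin] = set(graph[device])
    for neighbor in graph[device]:
        copy[neighbor].add(twin)
    return copy


print(single_points_of_failure(TOPOLOGY))
print(single_points_of_failure(with_redundant_twin(TOPOLOGY, "core", "core-2")))
```

In this made-up map, the two edge routers already back each other up, so the only device flagged is the single core switch. Adding a second core device with the same links removes that weakness, which illustrates how redundancy can be added one component at a time rather than all at once.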

It’s important to note that a resilient network is not only an IT issue; it is a business issue. To succeed, IT teams must get buy-in from senior leaders in departments throughout the organization, and they should treat network resilience like any other issue that affects the whole enterprise. Executives should consider, and be able to answer, the following questions about the reliability of the organization’s network:

  • What network vulnerabilities can we prevent?
  • What can we not control?
  • What is the potential cost of a network outage?
  • What can we do to manage and mitigate the risk we decide to accept?

When executives are able to answer these questions, it will be easier for IT teams to design and implement a resilient network.

As more companies continue to rely on interconnected networks, virtualized cloud services, and Internet of Things (IoT) technologies, the potential for downtime and its costs will only rise. To achieve true network resilience, companies should focus on maintaining their services, removing single points of failure and having a plan to bring the network back up and resume normal operations before an outage costs them.