Ensuring High Availability for SAP Landscapes on AWS

While the cloud can offer an organization a wide array of benefits—including a reduction in infrastructure costs, greater flexibility to deploy regionally, and more—it’s not always an organization’s first choice when it comes to building out an SAP landscape, where high availability (HA) is essential.

A cloud service provider such as AWS may offer a service level agreement (SLA) that guarantees that at least one VM in an HA failover cluster will be available 99.99% of the time, but the availability of a single VM does not translate to 99.99% availability of the mission-critical application running on it. If you’re invested in real-time analytics or building out AI services that depend on uninterrupted access to your SAP landscape, you don’t want to find yourself unable to access the data and services you need.

But there are ways to configure an SAP landscape in AWS EC2 for true HA. You simply need to know where the vulnerabilities lie and take proactive steps to address them.

First, the underlying problem with provider assurances of HA on any of the cloud platforms is that they only guarantee the availability of a VM. The SLAs don’t guarantee that the VM can access the data associated with the application that it is supposed to be supporting. They certainly don’t guarantee that an underlying cluster architecture will automatically reconfigure itself with the complex relationships and dependencies that characterize an SAP landscape. So, while it’s nice to know that at least one VM in your cluster can register a heartbeat, that’s really not the availability that your analytics depend on.
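To see why a VM-level guarantee falls short, it can help to work through the arithmetic. The sketch below assumes that the application is only reachable when every layer it depends on is up at the same time, so the layer availabilities multiply. The layer names and percentages are hypothetical figures chosen for illustration; they are not taken from any AWS or SAP SLA.

```python
# Illustrative figures only; none of these numbers come from an AWS or SAP SLA.
MINUTES_PER_YEAR = 365 * 24 * 60


def serial_availability(components):
    """Availability of a chain in which every component must be up at once."""
    result = 1.0
    for value in components.values():
        result *= value
    return result


def annual_downtime_minutes(availability):
    """Expected minutes of downtime per year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Hypothetical per-layer availabilities for a single SAP landscape.
layers = {
    "vm": 0.9999,
    "storage_access": 0.9995,
    "database": 0.9990,
    "application_services": 0.9990,
}

combined = serial_availability(layers)
print(f"combined availability:    {combined:.4%}")
print(f"VM-only downtime/year:    {annual_downtime_minutes(0.9999):.1f} min")
print(f"end-to-end downtime/year: {annual_downtime_minutes(combined):.1f} min")
```

At 99.99%, the VM alone is allowed a little under 53 minutes of downtime a year. Under these assumed figures, the end-to-end number is many times larger, which is the gap the rest of this article is concerned with closing.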

How do you deploy SAP on AWS to ensure the kind of application availability you’re really seeking? You need to plan out your failover scenarios, and there are really two parts to this. The first part involves SAP cluster failover management in the AWS environment; the second involves the orchestrated activation of your backup SAP landscape.

Failover Management in the Cloud

While SAP HANA system replication (HSR) services can ensure that the data in your SAP HANA database is effectively replicated between your primary and secondary (backup) clusters, HSR does not itself provide any cluster failover management services. If the landscape in one AWS Availability Zone (AZ) becomes non-responsive, for whatever reason, there are no built-in tools that will automate failover to a geographically distinct AZ where the backup landscape can take over the workload. You can rely on HSR to ensure that the SAP HANA data is available to the secondary landscape, but bringing that landscape online and reconfiguring the network and routing tables are manual tasks that take time and require expertise. While your IT team is working to reroute everything to the secondary landscape and start up the services in the prescribed order, your SAP system is effectively offline.

You can approach the 99.99% application availability levels you seek by deploying an SAP-certified failover cluster management tool. These are third-party tools that SAP has tested and certified to monitor the health of your SAP landscape and to automate the failover of a landscape in one AZ to another if conditions require it.

At a high level, a failover cluster management tool monitors the health of the various elements in the primary SAP landscape and initiates a failover to the secondary landscape if one or more elements in the primary landscape go offline. A clear benefit here lies in speed of response: Automated failover can occur without human intervention. There’s no time lost waiting for a system admin to respond to an alert. The failover management software can initiate failover instantly, eliminating what could be a significant (and costly) amount of downtime.
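The monitoring logic described above can be sketched in a few lines. This is a minimal illustration, not the behavior of any particular SAP-certified product: the `FailoverMonitor` class, the probe names, and the threshold of three consecutive failed checks are all assumptions made for the example. The threshold is there so that a single missed health check does not trigger a full failover.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FailoverMonitor:
    """Hypothetical monitor: counts consecutive failed probes per element."""

    probes: Dict[str, Callable[[], bool]]  # element name -> health probe
    failure_threshold: int = 3             # assumed value, tune per landscape
    failures: Dict[str, int] = field(default_factory=dict)

    def poll(self) -> bool:
        """Run every probe once; return True if failover should start."""
        for name, probe in self.probes.items():
            if probe():
                self.failures[name] = 0  # a healthy probe resets the count
            else:
                self.failures[name] = self.failures.get(name, 0) + 1
        return any(
            count >= self.failure_threshold for count in self.failures.values()
        )


# Simulated landscape: the database probe keeps failing, the app server is fine.
monitor = FailoverMonitor(
    probes={"database": lambda: False, "app_server": lambda: True}
)
decisions = [monitor.poll() for _ in range(3)]
print(decisions)  # [False, False, True]
```

In a real deployment the probes would check actual services and `poll` would run on a timer; the point here is only that the decision to fail over is made by software, with no wait for a human to acknowledge an alert.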

More sophisticated failover management tools can also provide powerful services to help you avoid needing to fail over at all. Since the majority of incidents that prompt a failover arise from application and operating system errors (rather than external catastrophes that bring down an entire data center), cluster failover management tools that can detect and proactively resolve small issues (such as a hung process) can help you prevent those issues from snowballing into a larger problem that causes some element of the SAP landscape to appear nonresponsive. While deploying a cluster management tool to automate failover between landscapes is an important component of a strategy for ensuring HA, deploying a cluster management service that includes automated application recovery services can help you pre-empt the need to fail over. If all that it takes to ensure that your analytics can interact with your SAP landscape is the automated restart of a background service, it takes far less time to do that than to fail over to a secondary cluster.
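One way to picture this "recover locally first" policy is as a simple escalation: try a bounded number of restarts, and fail over only if they do not bring the service back. The function below is a sketch under that assumption. The name `recover_or_fail_over`, its callbacks, and the limit of two restart attempts are hypothetical and do not correspond to any specific product's API.

```python
from typing import Callable


def recover_or_fail_over(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    fail_over: Callable[[], None],
    max_restarts: int = 2,  # assumed limit, for illustration
) -> str:
    """Try local restarts first; fail over only if they do not help."""
    if is_healthy():
        return "healthy"
    for _ in range(max_restarts):
        restart()
        if is_healthy():
            return "recovered_locally"
    fail_over()
    return "failed_over"


# Simulated hung background service that comes back after one restart.
state = {"healthy": False, "failed_over": False}

outcome = recover_or_fail_over(
    is_healthy=lambda: state["healthy"],
    restart=lambda: state.update(healthy=True),
    fail_over=lambda: state.update(failed_over=True),
)
print(outcome)  # recovered_locally
```

In this simulated run the secondary landscape is never touched, because the cheaper remedy worked.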

Orchestrating the Activation of a Backup SAP Landscape in the Cloud

There will be times when you need to fail over, though, which brings us to the second issue I mentioned at the top of this article: activating the secondary instance of your SAP landscape expeditiously. The challenge here is that the elements of an SAP landscape are interdependent, and services need to be started in a certain order. This is another reason why a manual restart can take as long as it does; if services are not started in the proper sequence, your landscape won’t operate properly.
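The ordering problem is essentially a dependency graph, and a valid startup sequence is a topological sort of that graph. The sketch below uses Python's standard-library `graphlib` module (Python 3.9 or later). The service names and the dependencies between them are hypothetical and heavily simplified; the real graph depends on your specific landscape.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service starts only after every service
# in its set is already running. Names and edges are illustrative only.
dependencies = {
    "database": set(),
    "central_services": {"database"},
    "enqueue_replication": {"central_services"},
    "primary_app_server": {"database", "central_services"},
    "additional_app_server": {"primary_app_server"},
}


def startup_order(deps):
    """Return one valid start sequence; raises CycleError on circular deps."""
    return list(TopologicalSorter(deps).static_order())


for step, service in enumerate(startup_order(dependencies), start=1):
    print(f"{step}. start {service}")
```

Several orderings can satisfy the same graph, so the exact sequence printed may vary; what is guaranteed is that no service is started before the services it depends on.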

Thankfully, a failover cluster manager that is “SAP application aware” can also provide you with the features required to automate the properly sequenced restart of your SAP landscape. You’ll need to configure those startup procedures to reflect the demands of your specific landscape, so make sure the failover management software you select makes that easy to do. Once you’ve deployed your failover cluster management software to automate the failover and orchestrate the activation of your landscape, you’ll be able to run your analytics and AI services with the high availability that you’re expecting.