What is High Availability?

High availability indicates a system's ability to remain resilient to known and unknown failures.

  • High availability (HA) denotes a system's ability to withstand failures and recover from them quickly while continuing to deliver uninterrupted service to customers. Such systems are reliable and contain redundant software and hardware components, which allow them to stay operational in adverse situations such as power outages and component failures. Companies typically measure high availability in terms of system uptime, expressed as a percentage of total time.
  • Achieving 100 percent uptime in any network is extremely difficult. Therefore, many companies aim for "five nines," or 99.999% availability, which is widely regarded as the gold standard for availability. Reaching this level requires careful planning and continuous system monitoring. To begin HA planning, identify and list all the mission-critical systems and applications whose failure would significantly disrupt routine business operations. Other points to consider when building a redundant site architecture include:

    • Single point of failure (SPOF): A highly available system should eliminate every single point of failure, that is, any component whose failure alone brings down the whole system. A switch or router that controls internet access for a particular floor or area of an organization's building is an example of a single point of failure. In cloud environments, SPOFs can exist in both software and hardware. Using high availability clusters is one way to eliminate SPOFs in cloud architectures.
    • High redundancy: Highly available systems remain accessible to customers with minimal downtime. Redundant software and hardware components allow such systems to shift seamlessly to a healthy resource, such as a secondary server, to provide uninterrupted service. These systems should also minimize data loss and downtime while an ongoing task is switched to a redundant component.
    • Automatic failover: A highly available system should automatically detect problems in its hardware or software and switch to a backup so that operations continue. Consider also the case where two or more applications, databases, or systems fail concurrently from the same cause: redundant site architectures need built-in capabilities to detect and remedy such common causes of failure, because duplicating components alone does not protect against them.
  • AWS offers a range of services for building highly available and reliable applications. Below are some of the popular AWS compute and storage services that help you build fault-tolerant systems:
    • Multiple availability zones: Hosting a web app on several servers or EC2 instances located in different availability zones (AZs) is one way to achieve redundancy in AWS. If one AZ goes down, traffic can shift to the instances in the remaining zones so that the app keeps running. Spreading instances across AZs also helps eliminate single points of failure.
    • Elastic Load Balancing (ELB): This AWS service handles heavy application traffic by distributing incoming requests across multiple targets, such as EC2 instances, in one or more availability zones. ELB performs health checks and routes traffic only to healthy targets, which improves the overall reliability and fault tolerance of your site. Additionally, AWS supports Auto Scaling groups, which launch new instances automatically to cope with increases in traffic volume.
    • Amazon Relational Database Service (RDS): A fault-tolerant application needs a redundant, readily accessible database. With the Multi-AZ deployment option, Amazon RDS maintains a synchronously replicated standby copy of the database in a different availability zone, with automatic failover. If the primary database fails, the standby takes over and continues serving requests, which makes building highly available database instances straightforward. AWS also offers Amazon Simple Queue Service (SQS), which can be combined with RDS to improve fault tolerance: requests destined for the database are buffered in a queue, which absorbs traffic spikes and reduces contention on the database.
    • Amazon Elastic Block Store (EBS): EBS is part of AWS's portfolio of highly available storage. If an app running on Amazon EC2 requires persistent block storage, EBS can be a good option. EBS volumes are replicated within their availability zone and can be quickly attached to a replacement instance in the same zone. With EBS snapshots, you can also back up volumes for additional safety.
    • Amazon Simple Storage Service (S3): S3 offers secure and economical object storage with built-in high availability. It is designed for eleven 9s (99.999999999%) of data durability, which it achieves by storing data redundantly across multiple devices in multiple facilities.
  • You can also build highly reliable, fault-tolerant systems using Microsoft Azure public cloud services. Below are some popular Azure services and features that help create highly available apps:
    • Availability sets: An availability set provides high availability for apps hosted on multiple VMs within a single Azure data center. It is a logical grouping of two or more VMs that Azure places on separate physical hardware, spread across fault domains (VMs sharing a common power source and network switch) and update domains (VMs rebooted together during planned maintenance), to avoid a single point of failure. Although Azure public cloud services are inherently reliable, Microsoft still recommends availability sets to make VM infrastructure more resilient to planned and unplanned downtime. Because the VMs run on different physical hosts, a hardware failure on one host affects only a subset of the VMs, and the remaining instances keep running. Availability sets have drawbacks, however: they do not protect against application-level failures, nor against the loss of an entire data center.
    • Availability zones: Availability zones protect your app not only from the failure of an individual host server but also from the failure of an entire data center. Each availability zone is a physically separate location with independent power, cooling, and networking within a single Azure region, so hosting your application across multiple zones keeps it available if one zone fails. Many Azure services are either zonal or zone-redundant. For instance, a zonal storage service keeps all replicas of your data in a single zone within the region, whereas a zone-redundant service replicates them across zones.
    • Storage redundancy: Azure lets you store application data redundantly within a single data center or across multiple availability zones, so you can meet your durability and availability requirements. If you need twelve 9s of data durability, Azure zone-redundant storage (ZRS) can be a good fit. In contrast, Azure locally redundant storage (LRS) is less durable because it keeps all replicas of your data in a single data center. A disaster affecting that data center could make your data unavailable or even lead to permanent data loss, so choose this option only if the data you store is easily recoverable.
    • Load balancing: Azure Load Balancer helps customers run highly available applications and absorb sudden spikes in traffic volume. It distributes incoming application traffic across multiple backend servers, routing only to those that pass health probes, to achieve low latency and high throughput.
    • Site recovery: If you run a website that requires high uptime and throughput, such as an eCommerce store, consider Azure Site Recovery (ASR). ASR replicates your workloads to a secondary location so they can run there when the primary data center goes down. Failing over to the replica lets you stay operational and prevent revenue losses during unexpected outages.
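
To make the five-nines target concrete, the short Python sketch below converts an availability percentage into the downtime it permits per year. It assumes a 365-day year and ignores planned maintenance windows; the function name allowed_downtime_minutes is hypothetical and chosen for illustration.

```python
# Convert an availability target (a percentage) into the maximum downtime
# it allows over a period. Assumes a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget, in minutes, for the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * period_minutes


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} min/year")
```

Running it shows that 99.9% availability still permits about 525.6 minutes (nearly 9 hours) of downtime per year, while 99.999% leaves only about 5.26 minutes.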
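
The value of removing single points of failure and adding redundancy can be estimated with standard reliability arithmetic: components in series are all required, so their availabilities multiply, while redundant components in parallel fail only when every copy fails. The sketch below applies both rules. It assumes failures are independent, an assumption that common-cause failures violate, and the availability figures for the router and server are hypothetical.

```python
from math import prod


def series_availability(components):
    """Every component must be up, as in a chain with a single point of failure."""
    return prod(components)


def parallel_availability(components):
    """At least one redundant copy must be up. Assumes independent failures."""
    return 1 - prod(1 - a for a in components)


router = 0.999   # hypothetical availability of one router
server = 0.9995  # hypothetical availability of one server

# One router in front of one server: each is a single point of failure.
single_path = series_availability([router, server])

# Two redundant routers in front of two redundant servers.
redundant_path = series_availability([
    parallel_availability([router, router]),
    parallel_availability([server, server]),
])

print(f"single path:    {single_path:.6f}")
print(f"redundant path: {redundant_path:.6f}")
```

With these illustrative numbers, the single path reaches about 99.85% availability, while duplicating both tiers raises it to roughly 99.9999%.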
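
The automatic failover behavior described above can be sketched as an active/passive group that probes the active node and promotes a healthy standby after several consecutive failed checks. This is a minimal illustration under assumed, hypothetical names (Node, FailoverGroup, failure_threshold); production systems also need fencing, replication checks, and protection against split-brain.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    healthy: bool = True  # stands in for the result of a real health check


@dataclass
class FailoverGroup:
    """Active/passive group: keep using the active node while it passes
    health checks; after `failure_threshold` consecutive failures,
    promote the first healthy standby."""
    nodes: List[Node]
    failure_threshold: int = 3
    active_index: int = 0
    consecutive_failures: int = 0

    def check(self, probe: Callable[[Node], bool]) -> Node:
        active = self.nodes[self.active_index]
        if probe(active):
            self.consecutive_failures = 0
            return active
        self.consecutive_failures += 1
        if self.consecutive_failures < self.failure_threshold:
            return active  # tolerate transient failures for now
        for i, node in enumerate(self.nodes):
            if i != self.active_index and probe(node):
                self.active_index = i  # fail over to the healthy standby
                self.consecutive_failures = 0
                return node
        raise RuntimeError("no healthy node available")


def is_healthy(node: Node) -> bool:
    return node.healthy


group = FailoverGroup([Node("primary"), Node("standby")])
group.nodes[0].healthy = False  # simulate a primary failure
for _ in range(3):
    active = group.check(is_healthy)
print(active.name)  # standby
```

Requiring several consecutive failures before switching is a common way to avoid failing over on a single transient blip; the threshold of 3 is an arbitrary choice.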
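
Load balancers such as ELB spread requests across targets and stop sending traffic to targets that fail health checks. The sketch below shows a generic round-robin balancer that skips unhealthy targets; it is not ELB's internal implementation, and the target names, which mimic instances in three availability zones, are hypothetical.

```python
from itertools import count


class RoundRobinBalancer:
    """Round-robin over registered targets, skipping any marked unhealthy."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = {t: True for t in self.targets}
        self._counter = count()  # ever-increasing request counter

    def mark(self, target, healthy):
        """Record the result of a (hypothetical) health check."""
        self.healthy[target] = healthy

    def pick(self):
        candidates = [t for t in self.targets if self.healthy[t]]
        if not candidates:
            raise RuntimeError("no healthy targets")
        return candidates[next(self._counter) % len(candidates)]


lb = RoundRobinBalancer(["web-az1", "web-az2", "web-az3"])
first = [lb.pick() for _ in range(3)]   # one request to each zone
lb.mark("web-az2", False)               # az2's target fails its health check
after = [lb.pick() for _ in range(4)]   # traffic continues on az1 and az3
print(first, after)
```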
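
Placing a queue in front of a database, as in the SQS-with-RDS pattern, is often called queue-based load leveling: producers enqueue requests at whatever rate they arrive, and a consumer drains them at a rate the database can sustain. In the sketch below an in-memory collections.deque stands in for SQS, and the per-tick capacity and arrival numbers are made up for illustration; a real deployment would use the SQS API (for example boto3's send_message and receive_message).

```python
from collections import deque


def drain(queue: deque, capacity_per_tick: int) -> list:
    """Consumer side: take at most `capacity_per_tick` queued requests."""
    batch = []
    while queue and len(batch) < capacity_per_tick:
        batch.append(queue.popleft())
    return batch


def simulate(arrivals_per_tick, capacity_per_tick):
    """Buffer bursty requests so the database never sees more than
    `capacity_per_tick` per tick. Returns per-tick processed counts and
    the queue backlog left after each tick."""
    queue = deque()
    processed, backlog = [], []
    request_id = 0
    for arrivals in arrivals_per_tick:
        for _ in range(arrivals):
            queue.append(request_id)  # producer enqueues every request
            request_id += 1
        processed.append(len(drain(queue, capacity_per_tick)))
        backlog.append(len(queue))
    return processed, backlog


processed, backlog = simulate([2, 10, 1, 0, 0], capacity_per_tick=4)
print(processed, backlog)
```

A burst of 10 requests is absorbed as a backlog of 6 and drained over the next two ticks, so the database never handles more than 4 requests per tick.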
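
Durability figures such as eleven 9s (S3) and twelve 9s (ZRS) are easier to interpret as expected losses. The sketch below treats the figure as an independent annual loss probability per object, which simplifies how providers actually define durability; the count of 10 million objects is an arbitrary example.

```python
def expected_annual_object_loss(num_objects: int, nines: int) -> float:
    """Expected objects lost per year for a durability of `nines` nines
    (e.g. 11 -> 99.999999999%), assuming each object is lost independently
    with probability 10**-nines per year."""
    annual_loss_probability = 10 ** -nines
    return num_objects * annual_loss_probability


objects = 10_000_000
for label, nines in (("eleven 9s (S3 figure)", 11), ("twelve 9s (ZRS figure)", 12)):
    loss = expected_annual_object_loss(objects, nines)
    print(f"{label}: {loss:.0e} objects/year, ~1 object every {1 / loss:,.0f} years")
```

For 10 million objects, eleven 9s corresponds to an expected loss of about one object every 10,000 years, and twelve 9s to about one every 100,000 years.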
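
Availability sets rely on spreading VMs across fault domains and update domains. The sketch below assigns VMs round-robin and shows which VMs survive the loss of one fault domain. Azure performs the real placement itself; the function names and the counts of 2 fault domains and 5 update domains are illustrative assumptions.

```python
def place_in_availability_set(vm_names, fault_domains=2, update_domains=5):
    """Spread VMs round-robin across fault domains (shared power/network)
    and update domains (rebooted together during maintenance).
    Returns {vm_name: (fault_domain, update_domain)}."""
    return {
        name: (i % fault_domains, i % update_domains)
        for i, name in enumerate(vm_names)
    }


def survivors(placement, failed_fault_domain):
    """VMs still running after one fault domain (e.g. a rack) fails."""
    return [vm for vm, (fd, _ud) in placement.items() if fd != failed_fault_domain]


placement = place_in_availability_set(["vm0", "vm1", "vm2", "vm3"])
print(survivors(placement, failed_fault_domain=0))  # ['vm1', 'vm3']
```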
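
Azure Load Balancer distributes traffic by hashing each flow's 5-tuple (source IP, source port, destination IP, destination port, protocol), so packets of the same flow reach the same backend. The sketch below mimics that idea with SHA-256 over the tuple; it is not Azure's actual hash function, and the addresses and backend names are hypothetical (the IPs come from documentation-reserved ranges).

```python
import hashlib


def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol, backends):
    """Map a flow's 5-tuple to one backend; the same flow always maps to
    the same backend as long as the backend list is unchanged."""
    if not backends:
        raise RuntimeError("no healthy backends")
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


pool = ["backend-1", "backend-2", "backend-3"]  # healthy backends only
flow = ("203.0.113.10", 50123, "198.51.100.5", 443, "TCP")
print(pick_backend(*flow, pool))
```

A stable hash such as SHA-256 is used instead of Python's built-in hash(), which is randomized per process for strings and would send the same flow to different backends across restarts.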