Single point of failure (SPOF)

In the world of distributed systems, single point of failure (SPOF) is a term you should always keep in mind.

A SPOF is a part of a system whose failure brings the whole system down. For example, if Service A sends messages to Service B through a single instance of a message queue and that queue fails, communication between Service A and Service B is completely lost. That message queue is then a single point of failure (SPOF) of the system.

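The key to removing a SPOF is “Redundancy”: run more than one instance of the component, so that another instance can take over when one fails. Oracle has a very good document that explains this point (see the first link under Useful links).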
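
To make the idea concrete for the message queue example, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular message broker works: `InMemoryQueue`, `RedundantSender`, and `QueueUnavailable` are hypothetical names invented for this post, and the queues are plain in-memory lists. The sender tries each queue instance in order, so the failure of one instance no longer cuts off Service A from Service B.

```python
class QueueUnavailable(Exception):
    """Raised when a queue instance cannot accept messages."""


class InMemoryQueue:
    """Hypothetical stand-in for one message queue instance."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.messages = []

    def send(self, message):
        if not self.healthy:
            raise QueueUnavailable(self.name)
        self.messages.append(message)


class RedundantSender:
    """Tries each queue instance in order until one accepts the message."""

    def __init__(self, queues):
        self.queues = list(queues)

    def send(self, message):
        for queue in self.queues:
            try:
                queue.send(message)
                return queue.name  # report which instance took the message
            except QueueUnavailable:
                continue  # this instance is down, try the next one
        raise QueueUnavailable("all queue instances are down")


primary = InMemoryQueue("queue-1", healthy=False)  # simulate a failed instance
standby = InMemoryQueue("queue-2")
sender = RedundantSender([primary, standby])

used = sender.send("order-created")
print(used)  # queue-2
```

With a single queue in the list, the sender fails as soon as that queue fails. With two, the system only fails when both are down at the same time. A real setup would also need the consumer side (Service B) to read from every instance, which this sketch leaves out.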

Amazon explains system “Reliability” in the reliability pillar of its Well-Architected Framework (see the second link under Useful links).

Example:

How AWS removes SPOFs in some of its services:

  • Elastic Load Balancing (ELB): There are two logical components in the Elastic Load Balancing service architecture: load balancers and a controller service. The load balancers are resources that monitor traffic and handle requests that come in through the Internet. The controller service monitors the load balancers, adds and removes capacity as needed, and verifies that load balancers are behaving properly.
  • Amazon SQS standard queues: These provide at-least-once delivery. Amazon SQS stores copies of your messages on multiple servers for redundancy and high availability. On rare occasions, one of the servers that stores a copy of a message might be unavailable when you receive or delete a message. In that case the copy on that server is not deleted, and you might receive the same message again later.
  • Amazon RDS Multi-AZ: In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups.
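
Redundancy has a cost for the consumer: with at-least-once delivery, the same message can arrive more than once, so the receiving service should be able to handle duplicates. The sketch below shows one common way to do that, by remembering the IDs of messages already processed. It is a simplified illustration and does not use the Amazon SQS API: `IdempotentConsumer` and the message IDs are hypothetical, and the set of seen IDs lives in memory, whereas a real service would need to keep it in durable storage.

```python
class IdempotentConsumer:
    """Processes each message ID at most once, ignoring redelivered copies."""

    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()  # assumption: in-memory only, lost on restart

    def receive(self, message_id, body):
        if message_id in self.seen_ids:
            return False  # duplicate delivery, skip it
        self.handler(body)
        self.seen_ids.add(message_id)  # mark as done only after handling succeeds
        return True


processed = []
consumer = IdempotentConsumer(processed.append)

# The message "m1" is delivered twice, as can happen with at-least-once delivery.
deliveries = [
    ("m1", "pay invoice 17"),
    ("m2", "ship order 9"),
    ("m1", "pay invoice 17"),
]
results = [consumer.receive(message_id, body) for message_id, body in deliveries]

print(results)    # [True, True, False]
print(processed)  # ['pay invoice 17', 'ship order 9']
```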

Useful links:

https://docs.oracle.com/cd/E19424-01/820-4806/fjdch/index.html

https://wa.aws.amazon.com/wat.pillar.reliability.en.html

This blog is part of category Distributed System

Author: aerodc

Software Engineer