In the world of distributed systems, single point of failure (SPOF) is a key term that you should always keep in mind.
A SPOF is a part of the system whose failure brings the whole system down. For example, if Service A sends messages to Service B via a single instance of a message queue, then when that queue fails, communication between Service A and Service B is completely lost. That message queue is a SPOF of the system.
The key to removing a SPOF is “Redundancy”: run more than one instance of the component so that another can take over when one fails. Oracle has a very good document that explains this point.
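To make the idea concrete, here is a minimal sketch of redundancy applied to the message queue example. It is an illustration under assumptions, not a real queue client: `InMemoryQueue`, `RedundantSender`, and `QueueDown` are hypothetical names, and a real system would also need health checks, retries, and replication between the queue instances.

```python
class QueueDown(Exception):
    """Raised when a queue instance cannot accept messages."""


class InMemoryQueue:
    """Hypothetical stand-in for one message queue instance."""

    def __init__(self, name):
        self.name = name
        self.up = True          # flip to False to simulate a failure
        self.messages = []

    def send(self, message):
        if not self.up:
            raise QueueDown(self.name)
        self.messages.append(message)


class RedundantSender:
    """Tries each queue instance in order until one accepts the message."""

    def __init__(self, queues):
        self.queues = list(queues)

    def send(self, message):
        for queue in self.queues:
            try:
                queue.send(message)
                return queue.name   # report which instance took the message
            except QueueDown:
                continue            # fail over to the next instance
        raise QueueDown("all queue instances are down")


primary = InMemoryQueue("primary")
standby = InMemoryQueue("standby")
sender = RedundantSender([primary, standby])

first = sender.send("order-1")    # accepted by the primary
primary.up = False                # simulate the primary failing
second = sender.send("order-2")   # accepted by the standby instead
```

With only one queue in the list, the failure of `primary` would stop all communication. With two, Service A keeps sending as long as at least one instance is up.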
Amazon also has an explanation of system “Reliability”.