When we want to monitor the distributed system, we usually use “percentile”. For example, P99 – that means percentile 99, we mesure the performance until 99% and we exclude the last 1% performance.
Concret example, we say a Service’s latency P99 = 100ms, that means 99% of service response time is less than 100ms.
Normally, the calculate of percentile is expensive. Because we have to take for example 100 samples and order them , find the 99th one.
For monitoring, we usually take P50, P99 and P99.9.