Percentile – Monitoring

When we want to monitor the distributed system, we usually use “percentile”. For example, P99 – that means percentile 99, we mesure the performance until 99% and we exclude the last 1% performance.

Concret example, we say a Service’s latency P99 = 100ms, that means 99% of service response time is less than 100ms.

Normally, the calculate of percentile is expensive. Because we have to take for example 100 samples and order them , find the 99th one.

For monitoring, we usually take P50, P99 and P99.9.

Here is a good example by Elastic which can help to understand the concept. And anther one for going deeper.

The links:

https://www.elastic.co/blog/averages-can-dangerous-use-percentile

https://blog.bramp.net/post/2018/01/16/measuring-percentile-latency/

Author: aerodc

Software Engineer