Percentile – Monitoring

When we monitor a distributed system, we usually use percentiles. For example, P99 is the 99th percentile: the value that 99% of the measurements fall at or below, so the worst 1% is excluded.

As a concrete example, if a service’s P99 latency is 100 ms, then 99% of its responses take 100 ms or less.

Computing a percentile is normally expensive, because we have to collect the samples (say 100 of them), sort them, and then pick the 99th one.
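The sort-then-pick procedure can be sketched in a few lines of Python. This is a minimal sketch using the nearest-rank definition of a percentile (other definitions interpolate between samples); the function name `percentile` and the sample latencies are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)                  # the expensive O(n log n) step
    rank = math.ceil(p * len(ordered) / 100)   # 1-based position in the sorted list
    return ordered[rank - 1]


# Hypothetical latencies: 100 samples of 1 ms, 2 ms, ..., 100 ms.
latencies_ms = list(range(1, 101))
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

With 100 samples, P99 is simply the 99th value in sorted order, which is what the paragraph above describes. Real monitoring systems often avoid the full sort by using approximate streaming estimators.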

For monitoring, we usually take P50, P99 and P99.9.

Here is a good example by Elastic that helps to understand the concept, and another one for going deeper.


Single point of failure – SPOF

In the world of distributed systems, single point of failure (SPOF) is a key term that you should always keep in mind.

It means that if one part of the system fails, the whole system goes down. For example, if Service A sends messages to Service B through a single instance of a message queue and that queue fails, the communication between Service A and Service B is completely lost. This message queue is then a single point of failure (SPOF) of the system.

The key solution for removing a SPOF is redundancy. Here is a very good document by Oracle that explains the point.
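To get a feel for why redundancy helps, here is a small back-of-the-envelope sketch in Python. It assumes that the redundant instances fail independently and that one healthy instance is enough to keep the system running. That assumption is optimistic, since real failures are often correlated. The function name and the 99% figure are hypothetical.

```python
def combined_availability(single, replicas):
    """Availability of `replicas` redundant instances when any single
    healthy instance is enough, assuming independent failures."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The system is down only if every replica is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


# Hypothetical message queue that is up 99% of the time.
one_queue = combined_availability(0.99, 1)
two_queues = combined_availability(0.99, 2)
```

Under these assumptions, a second queue instance turns roughly 99% availability into roughly 99.99%, because both instances would have to be down at once.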

The system “Reliability” explained by Amazon.


How AWS removes SPOFs in its own services:

  • Elastic Load Balancing (ELB): There are two logical components in the Elastic Load Balancing service architecture: load balancers and a controller service. The load balancers are resources that monitor traffic and handle requests that come in through the Internet. The controller service monitors the load balancers, adds and removes capacity as needed, and verifies that load balancers are behaving properly.
  • Amazon SQS standard queues provide at-least-once delivery: Amazon SQS stores copies of your messages on multiple servers for redundancy and high availability. On rare occasions, one of the servers that stores a copy of a message might be unavailable when you receive or delete a message.
  • Amazon RDS Multi-AZ: In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups.


This blog is part of category Distributed System

First month at Amazon – “Culture Shock”

It has been nearly a month since I started working at Amazon in Seattle. To be honest, as a software developer, I feel I have realized my years-long dream of working at a top-notch, world-class tech company.

But once aboard this big warship, you quickly find yourself educated, or shocked, by Amazon’s strong company culture, and I would like to share some of it.

  1. “Day 1” culture: each day should be treated as your first day at Amazon, which means you should always be passionate, motivated and curious.
  2. “Customer Obsession”: “We Start With the Customer and We Work Backward”. “Focus on customers over competitors”…
  3. “Empty chair”: it is said that in the early days Jeff used to put an empty chair in each meeting. That empty chair represents our customer and what he or she would say or expect.
  4. “Two pizzas” team rule: in the early days of Amazon, Jeff Bezos set a rule that every internal team should be small enough to be fed with two pizzas.
  5. Word doc over PPT: Amazon prefers Word documents over PPT, and no document should be longer than 6 pages.
  6. Amazon loves writing: we have our internal wiki tools, and you can find anything on that wiki site. We put design documents, thoughts and everything useful into written wiki pages.
  7. “You own your career”: at Amazon everyone can be a leader; at the very least you are the leader of yourself. You are given enough space and freedom to be driven by your own ideas and actions. You don’t need to wait for orders from someone else.

GitHub “Invalid username or password” Problem

When using GitHub from the command line with two-factor authentication (2FA) enabled, we may encounter an “Invalid username or password” error when we try “git push”.

The solution is to generate a personal access token and use it in place of your GitHub account password.

Here are the instructions from GitHub:

Finally, all you need to do is:

$ git clone
Username: your_username
Password: your_token

This blog is under category “others“.

Technology secrets behind Alibaba 11.11

This blog is inspired by and based on an Alicloud WeChat article.

This year, on 2019-11-11, Alibaba’s Tmall Double 11 sales event reached a turnover of 268.4 billion Chinese yuan! Orders peaked at 544,000 per second, and the data processed per day reached 970 PB! And the whole system runs on Alibaba Cloud.

According to what Alibaba’s CTO said in that article, there were four technology secrets behind this.

  • 3rd generation of the X-Dragon architecture, a technology similar to AWS Nitro.
  • OceanBase and PolarDB, Alibaba’s self-developed databases.
  • Separation of compute and storage: the storage is remote and can easily be expanded.
  • RDMA (Remote Direct Memory Access), used to access the remote storage data quickly.

We can see that, to support and accelerate such a large volume of data requests, improvements are needed on both the physical machine side and the database side. Reading and retrieving data quickly is the key.

AWS Cognito + MP JWT RBAC + Quarkus

In this blog, we will try to build Role-Based Access Control (RBAC) with Quarkus, MicroProfile JWT RBAC and AWS Cognito.

AWS Cognito issues the JWT tokens and distributes the RSA public key. Quarkus and MicroProfile are responsible for the Java server-side API endpoints.

Useful links:

Eclipse MicroProfile – JWT RBAC Security (MP-JWT)


  • Create an AWS Cognito User Pool, and then create a user and a group in this User Pool. Here we use “Cognito Groups” as “User Roles”.
  • Get the User Pool ID, e.g. "eu-central-1_xxxxx". This Cognito User Pool will be the JWT issuer, and we can find its RSA public key under ""
  • Create the endpoint with Quarkus, for example:
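// Imports for the JAX-RS and @RolesAllowed annotations are omitted here.
@Path("/orders")
public class OrderResource {

    @GET
    @RolesAllowed({"USER", "ADMIN"})
    public Response list() {
        return Response.ok(Arrays.asList("Order1", "Order2")).build();
    }
}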

Most important: the default groups claim in MP-JWT is “groups”, but the Cognito groups claim is “cognito:groups”, so we need to configure a mapping.
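To see why the mapping is needed, it helps to look at where the roles actually live in the token. The following is a language-neutral illustration in Python, not the Quarkus configuration itself: it decodes the payload of a hand-made, unsigned demo token and reads the roles from “cognito:groups” instead of “groups”. The helper names and the fake token are hypothetical, and the signature is deliberately NOT verified, so this is for inspection only and never for real authorization.

```python
import base64
import json


def decode_payload(token):
    """Decode the payload (middle part) of a JWT WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def roles_from_claims(claims, groups_claim="cognito:groups"):
    """Read the roles from the configured groups claim (empty set if absent)."""
    return set(claims.get(groups_claim, []))


def _b64url(obj):
    # Helper used only to build the fake demo token below.
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Hand-made token for illustration; a real one comes from Cognito and is signed.
fake_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"sub": "demo-user", "cognito:groups": ["USER"]}),
    "signature-not-checked",
])

roles = roles_from_claims(decode_payload(fake_token))
is_allowed = bool(roles & {"USER", "ADMIN"})   # mirrors @RolesAllowed({"USER", "ADMIN"})
```

A library that only looks at the default “groups” claim would find no roles in such a token, which is exactly the situation the mapping is meant to fix.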


Other necessary configs:




To test and get a Cognito JWT token, you can use the AWS CLI:

aws cognito-idp admin-initiate-auth --region eu-central-1 --cli-input-json file://auth.json

Then put that token in the HTTP “Authorization” header, prefixed with “Bearer ”, for example:

curl -X GET \
https://example/orders \
-H 'Authorization: Bearer YOUR_JWT_TOKEN'

There you have it: the integration of Quarkus, MP JWT and AWS Cognito. Enjoy!

This blog is under category “API“.

Tips: H1B Visa Stamping in Paris, France – Advice for Preparing Your H1B Visa in Paris, France

  • Fill in the famous DS-160 form online.
  • Book the appointment online as early as possible.
  • Arrive early on the visa stamping day, because there will be a very long queue outside the US embassy.
  • Don’t bring laptops or iPads with you, because they are not allowed; otherwise you will have to find a nearby hotel to store them temporarily.
  • Bring all your documents, even those not on the required list, for example your CV, offer letter, etc.
  • Speak clearly and in detail about your experience when they ask.
  • At the end, ask whether your visa is approved or will be checked further!
  • If your visa is unfortunately being checked, be patient and email the US embassy in Paris regularly for updates.

Lisbon Travel Tips – Lisbon Travel Guide

Lisbon is a beautiful city, well suited to a 3-4 day trip. After spending a few days here, I learned some lessons along the way. I want to write them down to help others have a better stay.

  • Lisbon Airport is a mess; you should plan to arrive 2 hours ahead of your flight.
  • Uber and Bolt work in Lisbon, and it is better to use them than a taxi. If you do take a taxi, ask the driver to use the taximeter, and have cash ready.
  • At the airport, you can buy the Lisbon Card. With it, you get free public transport, such as the tramway, and access to many museums and historical sites.
  • Always buy tickets for the top attractions (Jerónimos Monastery, for example) online to avoid long queues.
  • Most of the museums are closed on Mondays!!!
  • At the famous Pastéis de Belém near Jerónimos Monastery, you will see a very long queue, but it is only for takeaway. You can walk straight into the shop and eat at a table, which is more comfortable and easier.
  • If you want to hear Fado music, you can go to the “O Faia” restaurant. The music and food are top, but there is a minimum spend of 50 euros per person and the show only begins at 9:30 PM.
  • Be careful when eating in restaurants in the Alfama area; they may charge you service fees without telling you beforehand.
  • You can download the Lonely Planet app to help you organize your visits.
  • Finally, Lisbon is a tourist city, so there are a lot of tourists and also thieves. And don’t forget to bring comfortable sports shoes!

Probabilistic Data Structures

Context: when we are dealing with a very large data set or a data stream, we cannot keep it all in memory.

1. Bloom filter: used to query whether an item exists (membership query). A Bloom filter is a bit array of m bits initialized to 0. To add an element, feed it to k hash functions to get k array positions and set the bits at these positions to 1. To query an element, feed it to the same k hash functions to obtain k array positions. If any of the bits at these positions is 0, then the element is definitely not in the set. If the bits are all 1, then the element might be in the set.

2. HyperLogLog: used for estimating the number of distinct elements (cardinality). A HyperLogLog counter can count one billion distinct items with a typical accuracy of 2% using only 1.5 KB of memory. It is based on the bit-pattern observation that, for a stream of randomly distributed numbers, if the maximum number of leading 0 bits seen in any number x is k, then the cardinality of the stream is likely around 2^k.

3. Count-Min Sketch: used for querying the count of a single item (frequency). The basic data structure is a two-dimensional d × w array of counters, with d pairwise independent hash functions h1 … hd, each of range w.
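For the Bloom filter (item 1), the add and query steps can be sketched as follows. This is a minimal sketch, not a production implementation: it assumes the k hash functions can be simulated by salting SHA-256 with the index of the function, and it stores one byte per bit for readability. The class and method names are made up for illustration.

```python
import hashlib


class BloomFilter:
    """Bit array of m bits with k salted hash functions (illustrative sketch)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)   # one byte per bit, all initialized to 0

    def _positions(self, item):
        # Simulate k hash functions by prefixing the item with the function index.
        data = item.encode("utf-8")
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not in the set"; True means "might be in the set".
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter(m=1024, k=3)
bloom.add("apple")
seen_apple = bloom.might_contain("apple")
```

An added item always answers True, while an item that was never added usually answers False but can occasionally be a false positive.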
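For HyperLogLog (item 2), the leading-zero observation on its own can be sketched like this. Note that this is only the underlying intuition with a single counter, so the estimate is very coarse and always a power of two. The real HyperLogLog algorithm splits the hashes across many registers and combines them with a harmonic mean to reach its accuracy. The function names and the use of a truncated SHA-256 as the 32-bit hash are assumptions made for this sketch.

```python
import hashlib

HASH_BITS = 32   # width of the hash values used in this sketch


def leading_zeros(value, width=HASH_BITS):
    """Number of leading 0 bits of `value` viewed as a `width`-bit integer."""
    return width - value.bit_length()


def rough_cardinality(items):
    """Very coarse distinct-count estimate: 2^k, where k is the maximum
    number of leading 0 bits seen in any hashed item."""
    max_zeros = 0
    for item in items:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        hashed = int.from_bytes(digest[:4], "big")   # keep 32 bits of the hash
        max_zeros = max(max_zeros, leading_zeros(hashed))
    return 2 ** max_zeros


estimate = rough_cardinality(range(1000))
```

Because only the maximum is stored, duplicates never change the result, which is what makes the idea suitable for counting distinct elements in a stream.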
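For the Count-Min Sketch (item 3), the d × w table of counters can be sketched as below. As an assumption of this sketch, salted SHA-256 stands in for the d pairwise independent hash functions, and the class and method names are hypothetical.

```python
import hashlib


class CountMinSketch:
    """d rows of w counters; each row has its own hash function (illustrative sketch)."""

    def __init__(self, d, w):
        self.d = d
        self.w = w
        self.table = [[0] * w for _ in range(d)]

    def _column(self, row, item):
        # Salt the hash with the row index to get a different function per row.
        digest = hashlib.sha256(row.to_bytes(4, "big") + item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def add(self, item, count=1):
        for row in range(self.d):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(self.table[row][self._column(row, item)] for row in range(self.d))


sketch = CountMinSketch(d=4, w=256)
for word in ["a", "b", "a", "a"]:
    sketch.add(word)
count_a = sketch.estimate("a")
```

The estimate never undercounts an item; it can overcount when other items collide with it in every row.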



  • Stateless: a RESTful API is stateless. A stateless protocol does not require the server to retain session information or status about each communicating partner for the duration of multiple requests.

UDP, HTTP, and IP are stateless protocols. TCP is stateful.

  • HEAD, GET, OPTIONS and TRACE are safe methods, which means they are intended only for information retrieval and should not change the state of the server. In other words, they should not have side effects.
  • Methods PUT and DELETE are defined to be idempotent, meaning that multiple identical requests should have the same effect as a single request. Methods GET, HEAD, OPTIONS, and TRACE, being prescribed as safe, should also be idempotent.
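The difference between safe and idempotent methods can be illustrated with a tiny in-memory model in Python. This is a sketch under assumptions: `OrderStore` is a hypothetical stand-in for a REST resource collection, not a real HTTP server, and POST is included only as a contrast, since it is not idempotent.

```python
class OrderStore:
    """Tiny in-memory stand-in for a REST resource collection (hypothetical)."""

    def __init__(self):
        self.orders = {}
        self.next_id = 1

    def get(self, order_id):
        # Safe: only reads, never changes the server state.
        return self.orders.get(order_id)

    def put(self, order_id, data):
        # Idempotent: repeating the same PUT leaves the same final state.
        self.orders[order_id] = data

    def delete(self, order_id):
        # Idempotent: deleting an already deleted order changes nothing more.
        self.orders.pop(order_id, None)

    def post(self, data):
        # Not idempotent: every call creates a new order under a new id.
        order_id = self.next_id
        self.next_id += 1
        self.orders[order_id] = data
        return order_id


store = OrderStore()
store.put(100, "book")
store.put(100, "book")              # same effect as a single PUT
after_put = len(store.orders)
store.post("pen")
store.post("pen")                   # two identical POSTs create two orders
after_post = len(store.orders)
store.delete(100)
store.delete(100)                   # the second DELETE is a no-op
after_delete = len(store.orders)
```

Idempotency matters in practice because a client can safely retry a PUT or DELETE after a timeout, whereas retrying a POST may create a duplicate.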