tags : Distributed Systems, System Design, Infrastructure, Web Server, Web Performance

Basics

Load balancing is a vague term, but a common categorization follows.

DNS

Global Server Load Balancing (GSLB) and similar schemes: the DNS server answers with different IPs depending on client location, backend health, or load.

Network

Network devices send traffic to multiple next-hops via equal-cost routing (ECMP). The next-hops can be another layer of load balancers, or the end destination itself (in which case it's anycast).
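A minimal sketch of how ECMP picks a next-hop: the router hashes the flow's 5-tuple so every packet of a flow takes the same path (field values and hop names below are made up for illustration).

```python
import hashlib

def ecmp_next_hop(five_tuple: tuple, next_hops: list) -> str:
    # Hash the 5-tuple deterministically; same flow -> same next-hop.
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

hops = ["lb-a", "lb-b", "lb-c"]
flow = ("203.0.113.5", 51514, "198.51.100.7", 443, "tcp")
assert ecmp_next_hop(flow, hops) == ecmp_next_hop(flow, hops)  # deterministic
```

Real routers do this in hardware; the point is only that the choice is a pure function of the flow, so a flow never flaps between paths.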

L4

  • They look at the "layer 4" headers (IPs and ports) to make a forwarding decision per flow. These are smart: they do consistent hashing, so if one destination goes down, flows not going to that destination still reach their original endpoint. They often run health checks and can generally carry routing logic.
  • IP Virtual Server - Wikipedia
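The consistent-hashing property above can be sketched as a toy hash ring (illustrative only, not how a production L4 LB is implemented; backend IPs and the flow key are made up):

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Toy consistent-hash ring for flow-to-backend assignment."""

    def __init__(self, backends, vnodes=100):
        # Place each backend at many virtual points so load spreads evenly.
        self._ring = sorted(
            (self._hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def pick(self, flow_key):
        # A flow maps to the first virtual point at or after its hash.
        idx = bisect(self._points, self._hash(flow_key)) % len(self._ring)
        return self._ring[idx][1]

backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
flow = "203.0.113.5:51514->198.51.100.7:443/tcp"
chosen = ConsistentHashRing(backends).pick(flow)

# Remove a backend the flow was NOT mapped to: the flow keeps its endpoint.
failed = next(b for b in backends if b != chosen)
survivors = [b for b in backends if b != failed]
assert ConsistentHashRing(survivors).pick(flow) == chosen
```

Removing a backend only remaps the flows that hashed to it; everything else stays put, which is exactly the property the bullet describes.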

L7

These look higher in the network stack, typically at HTTP headers, cookies, and URLs. Frequently they'll do TLS offload. Because they see application-level data, they can apply much richer routing logic than L4 balancers.
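As a sketch of what "richer routing logic" means, here is a toy L7 decision that picks a backend pool from HTTP-level fields (the pool names and rules are invented for illustration):

```python
# Toy L7 routing decision: choose a backend pool from host and path.
def choose_pool(headers: dict, path: str) -> str:
    host = headers.get("host", "").lower()
    if host.startswith("api."):
        return "api-pool"        # API traffic gets its own servers
    if path.startswith("/static/"):
        return "static-pool"     # e.g. cache-friendly asset servers
    return "web-pool"            # default pool

assert choose_pool({"host": "api.example.com"}, "/v1/users") == "api-pool"
assert choose_pool({"host": "www.example.com"}, "/static/a.css") == "static-pool"
```

An L4 balancer cannot make these distinctions because host and path are invisible below the HTTP layer (and encrypted under TLS, which is one reason L7 balancers terminate TLS).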

Guidelines

Tips

  • Limit concurrent connections per backend server on the LB, and let excess requests queue there
  • If the app allows it, avoid sticking clients to an app server and distribute by least connections (leastconn)

How many threads/connections per machine is optimal for performance?

NOTE: I think the following is wrong. I have to come back to this later.

  • See Resource section of HTTP

TODO Doubt

  • If 1 thread per request, with
c = CPU cores
w = fraction of time spent waiting on DB/IO/APIs
x = headroom for request-to-request variability

c / (1 - w) + x

E.g. 16 cores, with 80% of the app's time spent waiting on APIs/DB: 16 / (1 - 0.8) = 80.

  • What to do with this number?
    • Set the LB queue size to about that number.
    • Set the application's thread count slightly above it, but not more.
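The sizing above (which the note flags as possibly wrong) can be written out directly; the `+2` headroom on app threads is an arbitrary illustration of "slightly above":

```python
def lb_sizing(cores: int, wait_fraction: float, headroom: int = 0) -> dict:
    # c / (1 - w) + x: each core can interleave 1/(1-w) requests when a
    # request spends fraction w blocked on IO; x is extra headroom.
    concurrency = round(cores / (1 - wait_fraction)) + headroom
    return {
        "lb_queue_size": concurrency,    # cap at the LB; queue the rest
        "app_threads": concurrency + 2,  # slightly above the cap ("+2" is arbitrary)
    }

assert lb_sizing(16, 0.8) == {"lb_queue_size": 80, "app_threads": 82}
```

This matches the worked example: 16 cores at 80% wait gives a target concurrency of 80.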