tags : Distributed Systems, System Design, Infrastructure, Web Server, Web Performance
Basics
Load balancing is a vague term, but here is a common categorization:
- Read this for load balancing the load balancers
DNS
GSLB (Global Server Load Balancing) and the like: the DNS server answers with different IPs per client, typically based on geography, health, or load.
Network
Network devices send traffic to multiple next-hops based on equal-cost routing (ECMP). The next-hops can be another layer of load balancers, or the end destination (in which case it’s anycast).
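ECMP routers typically hash a flow’s 5-tuple to pick a next-hop, so every packet of one flow takes the same path. A minimal sketch (the next-hop IPs are made up for illustration):

```python
import hashlib

NEXT_HOPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # assumed equal-cost routes

def pick_next_hop(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    # Hash the 5-tuple; all packets of a flow get the same next-hop.
    five_tuple = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}"
    digest = hashlib.md5(five_tuple.encode()).hexdigest()
    return NEXT_HOPS[int(digest, 16) % len(NEXT_HOPS)]

# Same flow always hashes to the same next-hop:
assert pick_next_hop("1.2.3.4", 5000, "9.9.9.9", 443) == \
       pick_next_hop("1.2.3.4", 5000, "9.9.9.9", 443)
```

Real routers hash in hardware, but the idea is the same: per-flow (not per-packet) stickiness via hashing.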
L4
- They look at the layer-4 headers (IPs and TCP/UDP ports) to make a decision and forward each flow. These are smart: they do consistent hashing, so if a destination goes down, flows not going to that one still go to the same endpoint. They often have health checks, etc. Basically, they can have logic.
- IP Virtual Server - Wikipedia
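The consistent-hashing property mentioned above can be sketched with a toy hash ring (backend names and vnode count are arbitrary choices, not from any particular LB):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy hash ring: removing one backend only remaps the flows that
    were on it; flows on other backends keep their endpoint."""

    def __init__(self, backends, vnodes=100):
        self.ring = []  # sorted list of (hash, backend) virtual nodes
        for b in backends:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{b}#{i}"), b))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def get(self, flow_key):
        # Walk clockwise to the first vnode at or after the flow's hash.
        idx = bisect.bisect(self.ring, (self._hash(flow_key), "")) % len(self.ring)
        return self.ring[idx][1]

    def remove(self, backend):
        self.ring = [(h, b) for h, b in self.ring if b != backend]

ring = ConsistentHashRing(["srv-a", "srv-b", "srv-c"])
before = {f"flow{i}": ring.get(f"flow{i}") for i in range(200)}
ring.remove("srv-b")
# Flows that were not on srv-b are unchanged:
assert all(ring.get(k) == v for k, v in before.items() if v != "srv-b")
```

With plain modulo hashing (`hash(flow) % n`), removing one backend would reshuffle almost every flow; the ring avoids that.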
L7
These look higher in the network stack, often at HTTP headers and the like. Frequently they’ll do TLS offload. They can do complicated request-level logic: host- and path-based routing, retries, rate limiting, and so on.
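As an illustration of that request-level logic, here is a hypothetical L7 routing table keyed on Host header and path prefix (hostnames and backend addresses are invented):

```python
# (host, path_prefix, backend_pool) -- first match wins, so more
# specific prefixes must come before the catch-all "/".
ROUTES = [
    ("api.example.com", "/v1/", ["10.0.1.1:8080", "10.0.1.2:8080"]),
    ("api.example.com", "/",    ["10.0.2.1:8080"]),
]

def route(host, path):
    """Return the backend pool for a request, or None (-> e.g. 404)."""
    for route_host, prefix, backends in ROUTES:
        if host == route_host and path.startswith(prefix):
            return backends
    return None

assert route("api.example.com", "/v1/users") == ["10.0.1.1:8080", "10.0.1.2:8080"]
```

An L4 balancer cannot do this, because the Host header and path only exist after TLS termination and HTTP parsing.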
Guidelines
Tips
- Limit concurrent connections per backend server on the LB and let excess requests queue there
- Set balancing to not stick to an app server (if the app allows it) and distribute by leastconn
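The leastconn policy just picks the backend with the fewest in-flight connections. A minimal sketch (class and method names are made up):

```python
class LeastConnBalancer:
    """Pick the backend with the fewest active connections."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}  # backend -> in-flight count

    def acquire(self):
        # Choose the least-loaded backend and count the new connection.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the connection finishes.
        self.active[backend] -= 1

lb = LeastConnBalancer(["srv-1", "srv-2"])
first = lb.acquire()   # both idle -> either server
lb.acquire()           # the other server
lb.release(first)      # first server now least loaded again
assert lb.acquire() == first
```

Unlike round-robin, this automatically sends less traffic to a backend that is slow to finish requests, which pairs well with the per-backend connection limit above.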
How many threads/connections per machine is optimum for performance?
NOTE: I think the following is wrong. I have to come back to this later.
- See Resource section of HTTP
TODO Doubt
- If 1 thread/request:
  - c = CPU cores
  - w = fraction of time spent waiting for DB/IO/APIs
  - x = buffer to account for variability of requests
  - threads ≈ c / (1 - w) + x
- Eg. 16 cores, 80% of the app’s time spent waiting for APIs/DB etc. Then 16 / (1 - 0.8) = 80.
- What to do with this number?
  - Set the queue size to about that number.
  - Set the application’s thread count slightly above that number, but not more.
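The sizing rule above as a worked calculation (the 80% wait figure and the headroom value are the assumptions from the example, not measured numbers):

```python
cores = 16
wait_fraction = 0.8   # w: fraction of request time blocked on DB/IO/APIs
headroom = 4          # x: arbitrary buffer for request variability

base = cores / (1 - wait_fraction)   # 16 / 0.2 = 80
threads = base + headroom

print(base)     # 80.0 -> rough queue size
print(threads)  # 84.0 -> thread count slightly above the queue size
```

The intuition: if each thread is only on-CPU 20% of the time, it takes ~5 threads per core to keep the CPUs busy, so the count blows up quickly as w approaches 1; that is also why the note flags this as something to double-check.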