Weighted load balancing is a core systems problem across large-scale platforms.
It shows up in traffic routing, shard assignment, experiment bucketing, and capacity planning — and it’s a common pattern tested in FAANG SQL interviews.
In this post, we’ll model capacity-aware request distribution using pure SQL, focusing on correctness, scalability, and clean logic — not brute-force tricks.
Problem Statement
You’re given a set of servers with different capacities:
| server_id | weight |
|---|---|
| S1 | 5 |
| S2 | 3 |
| S3 | 2 |
A higher weight means the server can handle more traffic.
Requirement
For every incoming request:
Traffic should be distributed proportionally to server capacity
Out of every 10 requests:
- S1 gets 5
- S2 gets 3
- S3 gets 2
- ❌ No row expansion by weight
Why Not the Naive Approach?
A common (but flawed) solution is to duplicate rows by weight:
This:
- explodes row counts
- doesn’t scale when weights are large
- is not how real systems think about routing
Instead, we’ll use ranges + modulo logic.
Key Idea (High Level)
- Convert server weights into contiguous slot ranges
- Convert each request into a repeating slot number
- Route the request to the server whose range contains that slot
This mirrors how real load balancers reason about traffic — just expressed in SQL.
Dataset (SQL Server)
Step 1: Build Server Slot Ranges (Using Window Functions)
First, we turn weights into ranges.
Logic
end_slot= running sum of weightsstart_slot=end_slot - weight + 1(math trick)
Result
| server_id | start_slot | end_slot |
|---|---|---|
| S1 | 1 | 5 |
| S2 | 6 | 8 |
| S3 | 9 | 10 |
These ranges are non-overlapping and contiguous, which is critical.
Step 2: Assign Each Request a Slot (Modulo Logic)
Next, we convert each request into a slot number from 1 to total_weight.
Formula
Why this works:
% (modulo) creates a repeating cycle- subtracting
1 aligns counting with modulo’s 0-based behavior - adding
1 shifts back to human-friendly slots
Example
| request_id | slot |
|---|---|
| 1 | 1 |
| 2 | 2 |
| ... | ... |
| 10 | 10 |
| 11 | 1 |
| 12 | 2 |
Final Query: Weighted Routing (Single CTE, Clean Version)
This is the FAANG-style final solution.
Why This Solution Is Interview-Grade
- ✅ No row expansion
- ✅ Deterministic routing
- ✅ Uses window functions correctly
- ✅ Scales to large weights
- ✅ Mirrors real-world system design
This is systems thinking expressed in SQL.
Common Interview Follow-Ups
- What happens if a server goes down?
- → Recompute ranges using only active servers.
- How do you minimize reshuffling when servers change?
- → Move to consistent hashing.
- Why not use
NTILE? - → NTILE distributes evenly, not proportionally by weight.
Final Takeaway
SQL isn’t just for reporting.
When used correctly, it’s a powerful way to model distribution, fairness, and correctness — exactly what FAANG interviews look for.
If you understand this pattern, you’re not just writing SQL —
you’re reasoning like a systems engineer.