Add Power-of-Two-Choices Peak-EWMA load balancer (p2c) #3367

Open
rajvarun77 wants to merge 1 commit into apache:master from rajvarun77:p2c-ewma-load-balancer

Conversation

@rajvarun77

Contributor

What problem does this PR solve?

Issue Number: resolve #3340

Problem Summary: brpc has no Power-of-Two-Choices load balancer, the most widely deployed tail-latency-aware policy (Envoy LEAST_REQUEST, Finagle/linkerd Peak-EWMA). Under rr, a single degraded backend in a 4-node cluster keeps receiving 25% of the traffic until someone intervenes; la's averaging window needs more than one observation to react.

What is changed and the side effects?

Changed: New p2c policy (src/brpc/policy/p2c_ewma_load_balancer.{h,cpp}).

  • Selection: each call samples two random servers and routes to the one with the lower score, peak_ewma_latency_us * (inflight + 1) / weight. p2c:choices=N widens the sample; when N ≥ cluster size, all servers are evaluated, with random tie-breaking to avoid herding.
  • Latency tracking: an upward latency spike replaces the average immediately; recovery decays over tau_ms (default 10s); a failure is punished with a latency of at least the RPC timeout.
  • Integration: uses only existing hooks (SelectServer/Feedback, DoublyBufferedData membership like rr); per-node stats are stable pointers shared by both buffers, as in la.
  • Registered in global.cpp; 10 unit tests; p2c docs in docs/{cn,en}/client.md.
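The selection rule above can be sketched in standalone C++. This is only an illustration of the described scoring and tie-breaking, not the code in this PR: `NodeStats`, `Score`, and `PickP2C` are hypothetical names, and the sketch builds an index vector per call for clarity, so it does not reproduce the O(1) cost of the real selection path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical per-node snapshot; field names are illustrative, not the PR's.
struct NodeStats {
    double peak_ewma_latency_us;  // current peak-EWMA latency estimate
    int64_t inflight;             // requests sent but not yet answered
    double weight;                // configured weight, assumed > 0
};

// Score as given in the PR description; lower is better.
inline double Score(const NodeStats& n) {
    return n.peak_ewma_latency_us * static_cast<double>(n.inflight + 1) / n.weight;
}

// Samples `choices` distinct nodes (all of them when choices >= size) and
// returns the index with the lowest score, breaking ties uniformly at random.
// Note: allocating `idx` makes this sketch O(n) per call.
template <typename Rng>
size_t PickP2C(const std::vector<NodeStats>& nodes, size_t choices, Rng& rng) {
    assert(!nodes.empty() && choices >= 1);
    const size_t n = nodes.size();
    const size_t k = choices < n ? choices : n;

    std::vector<size_t> idx(n);
    for (size_t i = 0; i < n; ++i) idx[i] = i;

    size_t best = 0;
    double best_score = 0.0;
    size_t ties = 0;  // number of candidates seen so far with the best score
    for (size_t i = 0; i < k; ++i) {
        // Partial Fisher-Yates shuffle: draw the i-th distinct candidate.
        std::uniform_int_distribution<size_t> draw(i, n - 1);
        std::swap(idx[i], idx[draw(rng)]);
        const size_t cand = idx[i];
        const double s = Score(nodes[cand]);
        if (ties == 0 || s < best_score) {
            best = cand;
            best_score = s;
            ties = 1;
        } else if (s == best_score) {
            // Reservoir-style tie-break: the new candidate wins with
            // probability 1/ties, so every tied node is equally likely.
            ++ties;
            std::uniform_int_distribution<size_t> coin(0, ties - 1);
            if (coin(rng) == 0) best = cand;
        }
    }
    return best;
}
```

With two distinct samples, the worst-scoring node in the cluster can never win a comparison, which is why a single degraded backend sheds most of its traffic even at the default of two choices.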

Benchmark (rpc_press, 4 echo backends, slow = +5ms bthread_usleep, qps=4000, 15s, 50 threads, 3 reps averaged, zero errors). Each cell is p99 / p999 latency in µs:

| scenario | la | p2c (best-of-2) | p2c:choices=4 | rr |
|---|---|---|---|---|
| all healthy | 450 / 1042 | 418 / 768 | 435 / 1136 | 567 / 12833 |
| 1 slow of 4 | 384 / 771 | 2451 / 6460 | 399 / 545 | 6440 / 6511 |
| 2 slow of 4 | 1045 / 2709 | 6458 / 6779 | 4284 / 6486 | 6474 / 6542 |

Slow-node traffic share (1 slow of 4): rr 25%, p2c 1.3%, p2c:choices=4 0.6%, la 0.0%. These shares are consistent with the numbers on #3340.

Side effects:

  • Performance effects: none on existing policies; p2c selection is O(1) at the default of two choices (two score evaluations), and p2c:choices=N evaluates N scores per selection.
  • Breaking backward compatibility: no.

Check List:

  • Compiles in CMake/Bazel/Makefile (sources and tests are picked up by the existing globs); brpc_p2c_ewma_load_balancer_unittest passes 10/10 and brpc_load_balancer_unittest passes 16/16.

cc @chenBright (thanks for the green light on #3340!) @zyearn

🤖 Generated with Claude Code

Implements the p2c load balancing policy proposed in apache#3340: each
selection samples two random servers (configurable via choices=N) and
routes to the lower peak-EWMA latency * (inflight+1) / weight score.
Upward latency spikes take effect immediately while recovery decays
over tau_ms (default 10s), so a degraded server is shed within one
observation at O(1) selection cost.
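The asymmetric update described above (spikes apply immediately, recovery decays over tau_ms) can be sketched as follows. This assumes the Finagle-style peak-EWMA formulation with an exponential decay weight exp(-dt/tau); the exact formula, the field names, and the `PeakEwma` type are assumptions for illustration, not taken from this commit. The sketch is single-threaded and ignores the synchronization a real Feedback path would need.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical single-threaded estimator; names are illustrative only.
struct PeakEwma {
    double tau_us;         // decay time constant (tau_ms converted to microseconds)
    double ewma_us = 0.0;  // current latency estimate
    int64_t last_us = 0;   // timestamp of the previous observation

    explicit PeakEwma(double tau_ms) : tau_us(tau_ms * 1000.0) {
        assert(tau_ms > 0);
    }

    // Feeds one completed request into the estimate.
    void Observe(int64_t now_us, double latency_us) {
        const double dt =
            static_cast<double>(std::max<int64_t>(now_us - last_us, 0));
        last_us = now_us;
        if (latency_us > ewma_us) {
            // Upward spike: the sample replaces the estimate at once.
            ewma_us = latency_us;
        } else {
            // Recovery: the old estimate fades with weight exp(-dt / tau).
            const double w = std::exp(-dt / tau_us);
            ewma_us = ewma_us * w + latency_us * (1.0 - w);
        }
    }

    // Failed request: punished with at least the RPC timeout.
    void ObserveFailure(int64_t now_us, double latency_us, double timeout_us) {
        Observe(now_us, std::max(latency_us, timeout_us));
    }
};
```

Under these assumptions, a node that spiked to 6000 µs and then answers in 1000 µs one full tau later is still estimated at about 2839 µs, so it regains traffic gradually rather than all at once.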

Uses only existing LoadBalancer hooks (SelectServer/Feedback,
need_feedback) with DoublyBufferedData membership like rr/la; per-node
stats are shared_ptr-owned by both buffers. Registered as "p2c" in
global.cpp. Includes unit tests (functional, weighted, exclusion,
error punishment, concurrency churn) and docs in cn/en client.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

[Feature] Add Power-of-Two-Choices Peak-EWMA load balancer (p2c) — O(1) tail-latency-aware policy

2 participants