Add Power-of-Two-Choices Peak-EWMA load balancer (p2c) #3367
Open
rajvarun77 wants to merge 1 commit into
Implements the p2c load balancing policy proposed in apache#3340: each selection samples two random servers (configurable via choices=N) and routes to the lower peak-EWMA latency * (inflight+1) / weight score. Upward latency spikes take effect immediately while recovery decays over tau_ms (default 10s), so a degraded server is shed within one observation at O(1) selection cost. Uses only existing LoadBalancer hooks (SelectServer/Feedback, need_feedback) with DoublyBufferedData membership like rr/la; per-node stats are shared_ptr-owned by both buffers. Registered as "p2c" in global.cpp. Includes unit tests (functional, weighted, exclusion, error punishment, concurrency churn) and docs in cn/en client.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
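The scoring and decay rules described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the code from this PR: `NodeStat`, `Observe`, `Score`, and `PickP2C` are hypothetical names, the exponential decay form is borrowed from the usual Finagle-style Peak-EWMA formulation, and timestamps are assumed to be monotonic microseconds. It covers only the default best-of-two case and omits feedback plumbing, exclusion, and error punishment.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical per-node state; field names are illustrative, not the PR's.
struct NodeStat {
    double ewma_us = 0.0;   // peak-EWMA latency estimate, in microseconds
    int64_t last_us = 0;    // timestamp of the previous observation
    int inflight = 0;       // requests currently outstanding on this node
    double weight = 1.0;    // static server weight

    // Peak-EWMA update: a sample above the estimate replaces it at once;
    // a lower sample pulls the estimate down with time constant tau_us.
    void Observe(int64_t now_us, double latency_us, double tau_us) {
        const double dt =
            static_cast<double>(std::max<int64_t>(0, now_us - last_us));
        last_us = now_us;
        if (latency_us > ewma_us) {
            ewma_us = latency_us;
        } else {
            const double keep = std::exp(-dt / tau_us);
            ewma_us = ewma_us * keep + latency_us * (1.0 - keep);
        }
    }

    // Lower is better: latency scaled by load, discounted by weight.
    double Score() const { return ewma_us * (inflight + 1) / weight; }
};

// Samples two distinct nodes uniformly and returns the index of the one
// with the lower score. Precondition: nodes is non-empty.
template <typename Rng>
size_t PickP2C(const std::vector<NodeStat>& nodes, Rng& rng) {
    const size_t n = nodes.size();
    if (n == 1) return 0;
    std::uniform_int_distribution<size_t> first(0, n - 1);
    std::uniform_int_distribution<size_t> rest(0, n - 2);
    const size_t a = first(rng);
    size_t b = rest(rng);
    if (b >= a) ++b;  // skip index a so the two candidates differ
    const double sa = nodes[a].Score();
    const double sb = nodes[b].Score();
    if (sa == sb) return (rng() & 1) ? a : b;  // random tie-break
    return sa < sb ? a : b;
}
```

In this sketch a caller would increment `inflight` when a request is sent, then decrement it and call `Observe` when the response arrives. Because each pick evaluates only two scores, the cost per selection does not grow with cluster size.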
What problem does this PR solve?
Issue Number: resolve #3340
Problem Summary: brpc has no Power-of-Two-Choices load balancer, the most widely deployed tail-latency-aware policy (Envoy LEAST_REQUEST, Finagle/linkerd Peak-EWMA). Under `rr`, a single degraded backend keeps receiving its full 25% share of traffic (1 of 4 backends) until humans intervene; `la`'s averaging window reacts slower than one observation.

What is changed and the side effects?
Changed: New `p2c` policy (`src/brpc/policy/p2c_ewma_load_balancer.{h,cpp}`). Each selection samples two random servers and routes to the one with the lower `peak_ewma_latency_us * (inflight + 1) / weight`. `p2c:choices=N` widens the sample; when N ≥ cluster size, all servers are evaluated, with random tie-breaking to avoid herding. Latency spikes replace the average immediately; recovery decays over `tau_ms` (default 10s); failures are punished with at least the RPC timeout. Uses only existing hooks (`SelectServer`/`Feedback`, `DoublyBufferedData` membership like `rr`); per-node stats are stable pointers shared by both buffers, as in `la`. Registered in `global.cpp`; 10 unit tests; `p2c` docs in `docs/{cn,en}/client.md`.

Benchmark (rpc_press, 4 echo backends, slow = +5ms `bthread_usleep`, qps=4000, 15s, 50 threads, 3 reps averaged, zero errors; metrics are p99/p999 in µs). Policies compared: `la`, `p2c` (best-of-2), `p2c:choices=4`, `rr`.

Slow-node traffic share (1 slow of 4): `rr` 25%, `p2c` 1.3%, `p2c:choices=4` 0.6%, `la` 0.0%, consistent with the numbers on #3340.

Side effects: `p2c` selection is O(1) (two score evaluations).

Check List: `brpc_p2c_ewma_load_balancer_unittest` passes 10/10 and `brpc_load_balancer_unittest` passes 16/16.

cc @chenBright (thanks for the green light on #3340!) @zyearn
🤖 Generated with Claude Code