Interconnecting with the UBShmTransport Based on the LD/ST Shared Memory Semantics.#3290
zchuango wants to merge 29 commits into
Pull request overview
Adds a new UBRing-based shared-memory transport mode to brpc (IPC + optional ubs-mem backend) and wires it into the Socket/Transport framework, along with docs and a performance example.
Changes:
- Introduce UBRing transport (SOCKET_MODE_UBRING) with endpoint handshake, polling, and ring manager infrastructure.
- Add shared-memory backend abstraction (POSIX IPC + ubs-mem via dlopen’d SDK stubs/headers) plus timer utilities.
- Update build/docs/examples to expose the feature and provide a basic performance harness.
Reviewed changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| src/brpc/ubshm/ubs_mem/ubshmem_stub.cpp | Adds stub implementations of ubs-mem APIs for non-ubs environments/UT. |
| src/brpc/ubshm/ubs_mem/ubs_mem.h | Introduces ubs-mem C API header used by the UBS backend integration. |
| src/brpc/ubshm/ubs_mem/ubs_mem_def.h | Defines ubs-mem types/constants used by the UBS backend integration. |
| src/brpc/ubshm/ubs_mem/declare_shm_ubs.h | Declares the dynamically loaded ubs-mem function pointer table. |
| src/brpc/ubshm/ubr_trx.h | Defines core UBR transaction structures and states. |
| src/brpc/ubshm/ubr_msg.h | Defines UBR message chunk format used by the ring transport. |
| src/brpc/ubshm/ub_ring.h | Declares UBRing read/write and lifecycle APIs used by the endpoint. |
| src/brpc/ubshm/ub_ring_manager.h | Declares global manager for UBR transactions and link bookkeeping. |
| src/brpc/ubshm/ub_ring_manager.cpp | Implements UBR transaction manager and UB event callback plumbing. |
| src/brpc/ubshm/ub_helper.h | Declares UBRing global init/availability helpers. |
| src/brpc/ubshm/ub_helper.cpp | Implements global init/fini, availability flags, and polling init. |
| src/brpc/ubshm/ub_endpoint.h | Declares UB shared-memory endpoint and polling infrastructure. |
| src/brpc/ubshm/ub_endpoint.cpp | Implements handshake, polling loop, and I/O integration with Socket/InputMessenger. |
| src/brpc/ubshm/timer/timer_mgr.h | Declares timer module used by UBS cleanup/recovery flows. |
| src/brpc/ubshm/timer/timer_mgr.cpp | Implements epoll/kqueue-based timer dispatch for UBRing subsystems. |
| src/brpc/ubshm/shm/shm_ubs.h | Declares UBS backend shared-memory operations. |
| src/brpc/ubshm/shm/shm_ubs.cpp | Implements UBS backend via dynamically loaded ubs-mem SDK. |
| src/brpc/ubshm/shm/shm_mgr.h | Declares backend-agnostic SHM manager interface. |
| src/brpc/ubshm/shm/shm_mgr.cpp | Implements SHM manager selecting IPC vs UBS backend via flag. |
| src/brpc/ubshm/shm/shm_ipc.h | Declares POSIX IPC SHM backend operations. |
| src/brpc/ubshm/shm/shm_ipc.cpp | Implements POSIX IPC SHM backend operations. |
| src/brpc/ubshm/shm/shm_def.h | Adds SHM structs/constants used across SHM backends and UBRing. |
| src/brpc/ubshm/common/thread_lock.h | Adds RAII-style mutex/spin/rwlock/semaphore guard macros. |
| src/brpc/ubshm/common/common.h | Adds common macros/types/constants used throughout UBRing code. |
| src/brpc/ubshm_transport.h | Declares UBShmTransport implementing the Transport interface. |
| src/brpc/ubshm_transport.cpp | Implements transport selection between UBRing and TCP fallback paths. |
| src/brpc/transport_factory.cpp | Wires SOCKET_MODE_UBRING into transport creation/context init. |
| src/brpc/socket.h | Adds UB endpoint/connect friend declarations for Socket integration. |
| src/brpc/socket_mode.h | Adds SOCKET_MODE_UBRING enum value. |
| src/brpc/rdma_transport.cpp | Adjusts RDMA transport’s TCP fallback member initialization (currently broken). |
| src/brpc/input_messenger.h | Adds UB endpoint friend declaration to support message processing hooks. |
| src/brpc/input_messenger.cpp | Extends RDMA-special message queuing behavior to UBRing sockets. |
| src/brpc/controller.h | Guards latency_us() against unset begin time. |
| README.md | Adds docs link for UBRing. |
| README_cn.md | Adds docs link for UBRing (CN). |
| example/ubring_performance/test.proto | Adds proto for UBRing performance test example. |
| example/ubring_performance/server.cpp | Adds UBRing-capable perf test server example. |
| example/ubring_performance/client.cpp | Adds UBRing-capable perf test client example. |
| example/ubring_performance/CMakeLists.txt | Adds standalone CMake build for the performance example. |
| docs/en/ubring.md | Documents build/run/configuration and backend selection for UBRing. |
| docs/cn/ubring.md | Chinese documentation for UBRing build/run/configuration. |
| CMakeLists.txt | Adds WITH_UBRING option and compile definition wiring. |
| g_last_time.store(0, butil::memory_order_relaxed); | ||
|
|
||
| brpc::ServerOptions options; | ||
| options.socket_mode = FLAGS_use_ubring? brpc::SOCKET_MODE_UBRING : brpc::SOCKET_MODE_TCP; |
It would be better for brpc::ServerOptions socket_mode to default to TCP mode.
This follows the code style of example/rdma_performance; switching to TCP mode by default also works fine.
| return -1; | ||
| } | ||
| ubring::GlobalUBInitializeOrDie(); | ||
| if (!ubring::InitPollingModeWithTag(bthread_self_tag())) { |
Does ubring only support polling mode?
Yes. LD/ST shared memory has this limitation, so currently only polling mode is supported. A waiting mode would require support from the OS kernel or hardware.
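To illustrate why a plain load/store channel tends to end up polling: the consumer has no file descriptor to block on, so it can only re-read an index that the producer stores. The following is a minimal single-process sketch under that assumption, not the PR's UBRing code; `SpscRing`, `TryPush`, `TryPop`, and `PollOnce` are hypothetical names, and a real ring would live in a shared mapping.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer ring; not brpc's UBRing.
template <size_t N>
struct SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
    std::atomic<uint64_t> head{0};  // next slot to read (advanced by consumer)
    std::atomic<uint64_t> tail{0};  // next slot to write (advanced by producer)
    uint32_t slots[N];

    // Producer: plain store of the payload, then a release store of tail.
    bool TryPush(uint32_t v) {
        uint64_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == N) return false;  // full
        slots[t & (N - 1)] = v;
        tail.store(t + 1, std::memory_order_release);
        return true;
    }

    // Consumer: nothing to block on, so callers have to poll this.
    bool TryPop(uint32_t* out) {
        assert(out != nullptr);
        uint64_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;  // empty
        *out = slots[h & (N - 1)];
        head.store(h + 1, std::memory_order_release);
        return true;
    }
};

// Polls up to max_spins times; returns true once a value has arrived.
template <size_t N>
bool PollOnce(SpscRing<N>& ring, uint32_t* out, int max_spins) {
    for (int i = 0; i < max_spins; ++i) {
        if (ring.TryPop(out)) return true;
    }
    return false;
}
```

A dedicated poller would call something like `PollOnce` in a loop, which is consistent with the roughly 100% client CPU utilization visible in the logs later in this thread.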
|
|
||
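| ### 2. UBS-Mem remote shared memory (ub\_shm\_type = 2) | ||
|
|
||
| This mode uses ubs-mem (Unified Block Storage Memory), an open-source remote shared-memory framework from openEuler. It supports shared-memory communication between nodes within a rack, similar to RDMA but with simpler deployment requirements. |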
Can you list the libraries that need to be used?
Okay, I'll list the required libraries later.
|
LGTM |
|
The issue has been communicated, and subsequent PR efforts will proceed in stages. |
|
There is a compilation error on macOS: |
|
Please update the CMake CI to compile UBShmTransport:
- brpc/.github/workflows/ci-linux.yml, lines 69 to 74 in 3aa5dab
- brpc/.github/workflows/ci-macos.yml, lines 34 to 38 in 3aa5dab
- brpc/.github/workflows/ci-macos.yml, lines 57 to 60 in 3aa5dab
Okay, good suggestion! I will check the CI/testing pipeline later.
|
@chenBright I have resolved the macOS compilation error and updated the CMake CI to compile UBShmTransport; please recheck it. The current CI test failure seems to be intermittent; the CI tests pass in my own repository.
|
@zchuango When I run the ubring demo on Ubuntu, the server crashes.
./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=1048576
./ubring_performance_server -use_ubring=true
|
@chenBright Is the error occurring during startup or a runtime error? Could you please provide relevant environment information, including OS and CPU details, so I can try to reproduce the problem? |
The error occurred at runtime.
Some environment information: uname -r
5.10.134-16.3.al8.x86_64
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.1 LTS
Release: 24.04
Codename: noble
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) Platinum 8469C
BIOS Model name: Intel(R) Xeon(R) Platinum 8469C CPU @ 2.6GHz
BIOS CPU family: 179
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 48
Socket(s): 2
Stepping: 8
CPU(s) scaling MHz: 82%
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 5200.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm uintr md_clear serialize tsxldtrk pconfig arch_lbr amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 4.5 MiB (96 instances)
L1i: 3 MiB (96 instances)
L2: 192 MiB (96 instances)
L3: 195 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-47,96-143
NUMA node1 CPU(s): 48-95,144-191
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Not affected
Complete runtime log:
./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=6291456
I0526 23:04:09.249178 98087 0 /workspace/cgm/brpc/src/brpc/server.cpp:1232 StartInternal] Server[DummyServerOf(./ubring_performance_client)] is serving on port=8001.
I0526 23:04:09.249319 98087 0 /workspace/cgm/brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8001 in web browser.
[Threads: 1, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
I0526 23:04:09.257395 98087 0 /workspace/cgm/brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:04:09.267279 98099 0 /workspace/cgm/brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
Avg-Latency: 0, 90th-Latency: 0, 99th-Latency: 0, 99.9th-Latency: 0, Throughput: 29.9741MB/s, QPS: 0k, Server CPU-utilization: 0%, Client CPU-utilization: 101%
[Threads: 2, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
I0526 23:04:30.303327 98099 0 /workspace/cgm/brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
Avg-Latency: 0, 90th-Latency: 0, 99th-Latency: 0, 99.9th-Latency: 0, Throughput: 0.299211MB/s, QPS: 0k, Server CPU-utilization: 0%, Client CPU-utilization: 102%
[Threads: 4, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
W0526 23:04:50.475469 98092 4294969093 /workspace/cgm/brpc/src/brpc/ubshm/ub_endpoint.cpp:385 ProcessHandshakeAtClient] Fail to get hello message from server:brpc::Socket{id=5 fd=14 addr=0.0.0.0:8002:57824} (0x564645f47910): Got EOF
W0526 23:04:50.475563 98087 0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:131 Init] RPC call failed, retrying... (3 left): [E1014]Fail to complete ubring handshake from brpc::Socket{id=5 fd=14 addr=0.0.0.0:8002:57824} (0x564645f47910): Got EOF
W0526 23:04:51.475721 98087 0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:131 Init] RPC call failed, retrying... (2 left): [E112]Not connected to 0.0.0.0:8002 yet, server_id=5
W0526 23:04:52.475883 98087 0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:131 Init] RPC call failed, retrying... (1 left): [E112]Not connected to 0.0.0.0:8002 yet, server_id=5
E0526 23:04:53.476011 98087 0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:135 Init] RPC call failed after multiple retries
./ubring_performance_server -use_ubring=true
I0526 23:00:15.982886 97452 0 /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:00:15.997779 97452 0 /brpc/src/brpc/server.cpp:1232 StartInternal] Server[test::PerfTestServiceImpl] is serving on port=8002.
I0526 23:00:15.998154 97452 0 /brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8002 in web browser.
I0526 23:00:46.670268 97457 4294969601 /brpc/src/brpc/ubshm/ub_ring.cpp:1021 UbrTrxCloseCheck] Trx close skipped, already closing, trx local name=UBRING_127.0.0.1:35304_S
I0526 23:00:46.670297 97457 4294969601 /brpc/src/brpc/ubshm/ub_ring.cpp:62 UbrTrxClose] Trx close skipped, already closing, local name=UBRING_127.0.0.1:35304_S
I0526 23:00:56.666588 97464 0 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:35304_C success.
I0526 23:00:56.666952 97464 0 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:78 IpcShmMunmap] IPC unmap shm=UBRING_127.0.0.1:35304_S length=4194304 success.
I0526 23:00:56.667327 97464 0 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:35304_C success.
E0526 23:02:17.708842 97484 4294969601 /brpc/src/brpc/ubshm/common/common.h:173 HasTimedOut] task time out 5 seconds.
W0526 23:02:17.708876 97484 4294969601 /brpc/src/brpc/ubshm/ub_ring.cpp:85 UbrTrxClose] Local shm UBRING_127.0.0.1:41854_S wait for the peer to close timed out, force cleanup.
I0526 23:02:17.709291 97484 4294969601 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:41854_C success.
I0526 23:02:17.709631 97484 4294969601 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:78 IpcShmMunmap] IPC unmap shm=UBRING_127.0.0.1:41854_S length=4194304 success.
I0526 23:02:17.709974 97484 4294969601 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:41854_C success.
[1] 97452 bus error (core dumped) ./ubring_performance_server -use_ubring=true
coredump:
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./ubring_performance_server -use_ubring=true'.
--Type <RET> for more, q to quit, c to continue without paging--
Program terminated with signal SIGBUS, Bus error.
#0 __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
warning: 228 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory
[Current thread is 1 (Thread 0x7f69dcff96c0 (LWP 19284))]
(gdb) bt
#0 __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
#1 0x0000558a87ac463b in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>)
at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2 brpc::ubring::ShmLocalCalloc (shm=shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:117
#3 0x0000558a879e925c in brpc::ubring::UBRing::UbrAllocateServerShm (this=0x7f69ac058900, remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40,
local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/ub_ring.cpp:796
#4 0x0000558a879e32e5 in brpc::ubring::UBShmEndpoint::AllocateServerResources (this=this@entry=0x7f66c4023a40,
remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40, local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90)
at /brpc/src/brpc/ubshm/ub_endpoint.cpp:712
#5 0x0000558a879e3e6b in brpc::ubring::UBShmEndpoint::ProcessHandshakeAtServer (arg=0x7f66c4023a40)
at /brpc/src/brpc/ubshm/ub_endpoint.cpp:548
#6 0x0000558a877d11b7 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at /brpc/src/bthread/task_group.cpp:388
#7 0x0000558a8786d6c1 in bthread_make_fcontext ()
#8 0x0000000000000000 in ?? () |
|
Another crash: ./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=6291456
I0526 23:17:51.313918 98707 0 /brpc/src/brpc/server.cpp:1232 StartInternal] Server[DummyServerOf(./ubring_performance_client)] is serving on port=8001.
I0526 23:17:51.314074 98707 0 /brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8001 in web browser.
[Threads: 1, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
I0526 23:17:51.321939 98707 0 /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:17:51.332043 98719 0 /brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
Avg-Latency: 0, 90th-Latency: 0, 99th-Latency: 0, 99.9th-Latency: 0, Throughput: 64.0254MB/s, QPS: 0k, Server CPU-utilization: 0%, Client CPU-utilization: 102%
[Threads: 2, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
[1] 98707 bus error (core dumped) ./ubring_performance_client -use_ubring=true -echo_attachment=true
./ubring_performance_server -use_ubring=true
I0526 23:17:49.302722 98508 0 /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:17:49.318155 98508 0 /brpc/src/brpc/server.cpp:1232 StartInternal] Server[test::PerfTestServiceImpl] is serving on port=8002.
I0526 23:17:49.318282 98508 0 /brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8002 in web browser.
W0526 23:18:15.647572 98668 8589934810 /brpc/src/brpc/ubshm/ub_endpoint.cpp:480 ProcessHandshakeAtServer] Fail to read Hello Message from client:brpc::Socket{id=234 fd=11 addr=127.0.0.1:51360:8002} (0x7f2c24025030) 127.0.0.1:51360: Got EOF
I0526 23:18:16.331656 98520 0 /brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
E0526 23:18:20.648439 98660 8589934772 /brpc/src/brpc/ubshm/common/common.h:173 HasTimedOut] task time out 5 seconds.
W0526 23:18:20.648472 98660 8589934772 /brpc/src/brpc/ubshm/ub_ring.cpp:85 UbrTrxClose] Local shm UBRING_127.0.0.1:36514_S wait for the peer to close timed out, force cleanup.
I0526 23:18:21.332054 98660 8589934772 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:36514_C success.
I0526 23:18:21.332468 98660 8589934772 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:78 IpcShmMunmap] IPC unmap shm=UBRING_127.0.0.1:36514_S length=4194304 success.
I0526 23:18:21.332848 98660 8589934772 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:36514_C success.
warning: 228 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory
[Current thread is 1 (Thread 0x7f7db0ff96c0 (LWP 98717))]
(gdb)
(gdb) bt
#0 __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
#1 0x000055df0fec824b in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>)
at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2 brpc::ubring::ShmLocalCalloc (shm=shm@entry=0x7f7d91af4e00) at /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:117
#3 0x000055df0fea20de in brpc::ubring::UBRing::ApplyAndMapLocalShm (this=this@entry=0x7f7d8c031a00,
localTrxShm=localTrxShm@entry=0x7f7d91af4e00, localName=localName@entry=0x7f7d91af4e50 "127.0.0.1:51360")
at /brpc/src/brpc/ubshm/ub_ring.cpp:911
#4 0x000055df0fea25a2 in brpc::ubring::UBRing::UbrAllocateLocalShm (this=0x7f7d8c031a00,
local_trx_shm=local_trx_shm@entry=0x7f7d91af4e00, shm_name=shm_name@entry=0x7f7d91af4e50 "127.0.0.1:51360")
at /brpc/src/brpc/ubshm/ub_ring.cpp:827
#5 0x000055df0fe9aa35 in brpc::ubring::UBShmEndpoint::AllocateClientResources (this=this@entry=0x55df11984ee0,
local_trx_shm=local_trx_shm@entry=0x7f7d91af4e00, shm_name=shm_name@entry=0x7f7d91af4e50 "127.0.0.1:51360")
at /brpc/src/brpc/ubshm/ub_endpoint.cpp:687
#6 0x000055df0fe9ae6a in brpc::ubring::UBShmEndpoint::ProcessHandshakeAtClient (arg=0x55df11984ee0)
at /brpc/src/brpc/ubshm/ub_endpoint.cpp:356
#7 0x000055df0fc09c97 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>)
at /brpc/src/bthread/task_group.cpp:388
#8 0x000055df0fde1571 in bthread_make_fcontext ()
#9 0x0000000000000000 in ?? () |
Co-authored-by: 郭业昌 <lvpengfei@MacBook-Air.local>
|
@chenBright Please try it again; I have refined some of the code logic.
|
@zchuango Could you add some unit tests for UBShmTransport? |
Yes, I am writing some test cases for UBShmTransport and plan to submit them as unit tests in the next phase. Please help review and approve this PR for merging. @chenBright
I think it's best to submit unit tests in this PR. |
chenBright left a comment
I'm using the latest code and I'm encountering the same crash as before.
| int32_t g_epollFd = -1; | ||
| std::atomic<uint32_t> g_totalTimerNum; | ||
| TimerFdCtx *g_timerFdCtxMap = NULL; | ||
| uint32_t maxSystemFd; | ||
| static pthread_t g_epollExecuteThread; | ||
| static int32_t g_timerModuleInitialized; |
- These variables need to be set with default values.
- The variable name should be snake_case.
| maxSystemFd = (uint32_t)rlim.rlim_cur; | ||
|
|
||
| if (g_timerFdCtxMap == NULL) { | ||
| g_timerFdCtxMap = (TimerFdCtx *)malloc(sizeof(TimerFdCtx) * maxSystemFd); |
g_timerFdCtxMap may consume a lot of memory.
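One way to address this concern is to key the contexts by fd in a sparse map, so memory scales with the number of live timers instead of the RLIMIT_NOFILE soft limit. The sketch below is only an illustration of that idea, not the change made in the PR; `TimerCtx` is a hypothetical stand-in for `TimerFdCtx` (whose fields are not shown in this thread), and `TimerTable` is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for the PR's TimerFdCtx; the real fields are unknown.
struct TimerCtx {
    void (*callback)(void*) = nullptr;
    void* arg = nullptr;
};

// Sparse fd -> context table: memory grows with the number of live timers,
// not with the process's file-descriptor limit.
class TimerTable {
public:
    bool Add(int fd, const TimerCtx& ctx) {
        assert(fd >= 0);
        std::lock_guard<std::mutex> guard(mu_);
        return map_.emplace(fd, ctx).second;  // false if fd is already registered
    }
    bool Remove(int fd) {
        std::lock_guard<std::mutex> guard(mu_);
        return map_.erase(fd) == 1;
    }
    bool Find(int fd, TimerCtx* out) const {
        std::lock_guard<std::mutex> guard(mu_);
        auto it = map_.find(fd);
        if (it == map_.end()) return false;
        *out = it->second;
        return true;
    }
    size_t Size() const {
        std::lock_guard<std::mutex> guard(mu_);
        return map_.size();
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<int, TimerCtx> map_;
};
```

The trade-off is a hash lookup and a lock per access instead of a direct array index, which may or may not matter on the timer path.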
| return atomic_load(&g_totalTimerNum); | ||
| } | ||
|
|
||
| void CloseTimerFd(uint32_t fd) { |
| #define LIKELY(x) __builtin_expect(!!(x), 1) | ||
| #define UNLIKELY(x) __builtin_expect(!!(x), 0) |
Use BAIDU_LIKELY and BAIDU_UNLIKELY instead.
| uint8_t inner[UBR_MSG_PAYLOAD_LEN]; | ||
| } UbrMsgPayload; | ||
|
|
||
| typedef struct __attribute__((aligned(64))) TagUbrMsgFormat { |
Use BAIDU_CACHELINE_ALIGNMENT instead.
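For readers unfamiliar with what the alignment buys, here is a small standard-C++ sketch of the same effect using `alignas`. It assumes 64-byte cache lines (matching the `aligned(64)` in the reviewed snippet) and does not use the brpc macro itself; `MsgHeader` and `IsCacheLineAligned` are hypothetical names, not the PR's `TagUbrMsgFormat`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as in the aligned(64) under review.
constexpr size_t kCacheLine = 64;

// Hypothetical message header; the PR's actual struct fields are not shown.
struct alignas(kCacheLine) MsgHeader {
    uint32_t length;
    uint32_t flags;
};

static_assert(alignof(MsgHeader) == kCacheLine, "header must be cache-line aligned");
static_assert(sizeof(MsgHeader) % kCacheLine == 0, "size is padded to the alignment");

// Returns true if p sits on a cache-line boundary.
inline bool IsCacheLineAligned(const void* p) {
    return reinterpret_cast<uintptr_t>(p) % kCacheLine == 0;
}
```

Because the size is padded to a multiple of the alignment, consecutive array elements never share a cache line, which avoids false sharing between a producer and a consumer touching adjacent slots.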
| #define LOCK_GUARD(mtxPtr) \ | ||
| pthread_mutex_t *__attribute__((cleanup(UnlockMutex))) _mtxPtr = ({ \ | ||
| pthread_mutex_lock(&(mtxPtr)); \ | ||
| &(mtxPtr); \ | ||
| }) |
Use BAIDU_SCOPED_LOCK or std::lock_guard instead.
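As a rough illustration of the suggested direction, the sketch below gets the same scope-bound unlock from `std::lock_guard` without the GCC `cleanup` attribute or statement expressions. It assumes a small adapter around `pthread_mutex_t`; `PthreadMutex` and `Counter` are hypothetical names and are not part of brpc.

```cpp
#include <cassert>
#include <mutex>
#include <pthread.h>

// Hypothetical adapter so std::lock_guard can manage a pthread_mutex_t.
class PthreadMutex {
public:
    PthreadMutex() { pthread_mutex_init(&mu_, nullptr); }
    ~PthreadMutex() { pthread_mutex_destroy(&mu_); }
    PthreadMutex(const PthreadMutex&) = delete;
    PthreadMutex& operator=(const PthreadMutex&) = delete;

    void lock() {
        int rc = pthread_mutex_lock(&mu_);
        assert(rc == 0);
        (void)rc;
    }
    void unlock() { pthread_mutex_unlock(&mu_); }
    bool try_lock() { return pthread_mutex_trylock(&mu_) == 0; }

private:
    pthread_mutex_t mu_;
};

struct Counter {
    PthreadMutex mu;
    int value = 0;

    // The guard unlocks on every exit path, including the early return.
    int IncrementIfBelow(int limit) {
        std::lock_guard<PthreadMutex> guard(mu);
        if (value >= limit) return value;
        return ++value;
    }
};
```

The same pattern would apply to the spinlock variant by wrapping `pthread_spinlock_t` in an equivalent adapter.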
| #define SPIN_LOCK_GUARD(spinLockPtr) \ | ||
| pthread_spinlock_t *__attribute__((cleanup(UnlockSpinLock))) _spinLockPtr = ({ \ | ||
| pthread_spin_lock(&(spinLockPtr)); \ | ||
| &(spinLockPtr); \ | ||
| }) |
Use BAIDU_SCOPED_LOCK or std::lock_guard instead.
| extern "C" { | ||
| #endif | ||
|
|
||
| static inline void UnlockMutex(pthread_mutex_t **mtx) |
The functions and macros defined in this file are not used; it is recommended to remove them.
|
|
||
| RETURN_CODE UbsShmInit(void) | ||
| { | ||
| // Load the function pointers from libubsm_sdk.so |
| if (UNLIKELY(CheckTrxSendPreCheck(_trx) != UBRING_OK)) { | ||
| return UBRING_ERR; | ||
| } | ||
| // 1.2 Compute the space |
Okay, no problem. I'll add it in the next couple of days. @chenBright |
Really? I haven't encountered this problem on my machine, but it's an ARM machine. I'll try running it on an x86 machine first. |
* Fix the coredump when the ubring server closes a connection
* Fix PollIn/PollOut dereferencing a freed Socket pointer
  PollIn/PollOut read the data socket through ep->_socket (a raw pointer). When the data socket is destroyed, that pointer dangles, so Socket::Address reads a garbage id and triggers SIGSEGV. Store _socket_id (SocketId) instead, use Address to obtain a reference-counted Socket, and hold that reference for the whole callback to avoid dereferencing a dangling pointer.
* Fix UBRING shm being left behind when the client exits abnormally
  When the client is killed (SIGTERM/crash/OOM), teardown does not finish and shm_unlink for localShm (_C) never runs, leaving _C files in /dev/shm. The server only munmaps remoteShm and does not unlink it (which is correct), so it cannot clean up the client's name. Now the localShm name is unlinked as soon as the handshake reaches ESTABLISHED (client and server have both confirmed that the peer has mmapped their localShm). At that point the peer already holds an mmap reference, so unlink removes only the name and does not affect communication; the process can exit at any time without leaving a file name behind.
* Address chenBright's review: use English comments and BAIDU_CACHELINE_ALIGNMENT
  - Convert all Chinese comments in ubshm to English (per chenBright's 'Please use English' on ub_endpoint.cpp:723, ub_ring.cpp:337, shm_ubs.cpp:316, and similar)
  - Replace __attribute__((aligned(64))) with BAIDU_CACHELINE_ALIGNMENT in ubr_msg.h (per chenBright's comment on ubr_msg.h:41)
  - Remove unnecessary TODO comment in ub_ring.cpp:551 (per chenBright's 'Unnecessary comments, please delete')
* Remove unused lock macros in thread_lock.h
  Per chenBright's review, the functions and macros defined in thread_lock.h are largely unused. Verified usage across ubshm:
  - LOCK_GUARD / UnlockMutex: 8 call sites in shm_ubs.cpp and ub_ring_manager.cpp, kept.
  - SPIN_LOCK_GUARD, R_LOCK_GUARD, W_LOCK_GUARD, SEMAPHORE_WAIT_GUARD, SEMAPHORE_WAIT_GUARD_WITH_CLOSE and their helper functions (UnlockSpinLock, UnlockRWLock, PostSem, PostSemWithClose): 0 call sites, removed.
* Apply chenBright's review on timer_mgr globals
  Per chenBright's review on timer_mgr.cpp:32-37:
  - Add explicit default values to uninitialized globals (g_total_timer_num=0, g_max_system_fd=0, g_epoll_execute_thread=0, g_timer_module_initialized=0)
  - Rename globals to snake_case (g_epollFd -> g_epoll_fd, g_totalTimerNum -> g_total_timer_num, g_timerFdCtxMap -> g_timer_fd_ctx_map, maxSystemFd -> g_max_system_fd, g_epollExecuteThread -> g_epoll_execute_thread, g_timerModuleInitialized -> g_timer_module_initialized)
  - maxSystemFd also gains the g_ prefix to match global naming style
  Also fix the missing std:: qualifier on atomic_fetch_sub/add/load (per chenBright's earlier comment on timer_mgr.cpp:80).
* Change CloseTimerFd fd type from uint32_t to int
  Per chenBright's review on timer_mgr.cpp:399 (uint32_t -> int). fd is a system file descriptor; POSIX APIs use int and -1 denotes an invalid fd, which uint32_t cannot represent. Changed the CloseTimerFd signature (header + definition) and removed the now-unnecessary (uint32_t) casts at the two call sites.
* Use BAIDU_LIKELY/BAIDU_UNLIKELY instead of custom __builtin_expect
  Per chenBright's review on common.h:27. Rather than redefine the macros with __builtin_expect directly, forward LIKELY/UNLIKELY to brpc's standard BAIDU_LIKELY/BAIDU_UNLIKELY (from butil/compiler_specific.h). The 122 call sites keep using LIKELY()/UNLIKELY() unchanged; only the macro bodies change, preserving semantics.
* Add unit tests for UBShmEndpoint
  Per chenBright's request to add unit tests for UBShmTransport in this PR (rather than a follow-up). Adds test/brpc_ubring_unittest.cpp with tests covering the public interface of UBShmEndpoint under the g_skip_ub_init=true mode (which skips real shared-memory/poller setup):
  - construct_and_destruct: lifecycle safety
  - is_writable_false_when_skip_init: skip-mode behavior
  - reset_is_idempotent: Reset() is safe to call repeatedly
  The file follows the brpc_*_unittest.cpp naming convention so it is auto-collected by test/CMakeLists.txt's file(GLOB). Verified: compiles, links, and all 3 tests pass (g++ 15.2, C++17, gtest, BRPC_WITH_UBRING=ON).
* Rewrite UBShmEndpoint unit tests with real coverage
  Per chenBright's feedback that the previous tests were too simple and did not cover the main methods.
  Source changes to enable testing:
  - Move HelloMessage struct declaration from ub_endpoint.cpp to ub_endpoint.h so tests can access it
  - Expose private members under #ifdef UNIT_TEST (precedent: butil/containers/stack_container.h) so tests can call AllocateClientResources without -Dprivate=public (which breaks GCC 15 + new libstdc++ <any>/<sstream>)
  Tests (9, all passing on Ubuntu 26.04 g++ 15.2 C++17 gtest):
  - HelloMessageTest (5): serialize/deserialize roundtrip, network byte order verification, uint64 max boundary, full shm_name, toString
  - UBShmEndpointTest (4): construct, real IPC shm AllocateClientResources (g_skip_ub_init=false), reset cleanup, reset idempotency
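The shm-leak fix in the commit notes above relies on a POSIX property: `shm_unlink` removes only the name, while existing mappings stay usable until they are unmapped. The sketch below demonstrates that property in a single process. It is a simplification, since in the PR the unlink is described as happening only after the peer has mapped the segment at handshake ESTABLISHED; `CreateMapAndUnlink` and `NameExists` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Creates a POSIX shm object, maps it, then unlinks the name right away.
// Returns the mapping, or nullptr on failure. `name` must start with '/'.
void* CreateMapAndUnlink(const char* name, size_t len) {
    assert(name != nullptr && name[0] == '/');
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, static_cast<off_t>(len)) != 0) {
        close(fd);
        shm_unlink(name);
        return nullptr;
    }
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);         // the mapping keeps the object alive
    shm_unlink(name);  // removes only the name; existing mappings stay valid
    return p == MAP_FAILED ? nullptr : p;
}

// True if a shm object with this name can still be opened.
bool NameExists(const char* name) {
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd < 0) return false;
    close(fd);
    return true;
}
```

After `CreateMapAndUnlink` returns, the name can no longer be opened, so a process killed at any later point leaves no name behind, yet the returned mapping can still be read and written until `munmap`.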
What problem does this PR solve?
Issue Number: #3226 #3167 #3217
Problem Summary:
After recent efforts, the UB-Ring framework has been integrated with the brpc transport framework. It currently supports high-performance, low-latency communication based on load/store (LD/ST) semantics. I am happy to contribute this to the community and look forward to feedback and reviews. @wwbmmm @chenBright
What is changed and the side effects?
Changed:
Side effects:
Performance effects: NAN
Breaking backward compatibility:
Check List: