LLM inference callback support by SteveSandersonMS · Pull Request #1689 · github/copilot-sdk

SteveSandersonMS · 2026-06-16T18:23:39Z

Summary

This PR adds SDK support for intercepting LLM inference requests and handling them in user code across all six SDK languages: Node.js, .NET, Python, Go, Rust, and Java.

Consumers register one client-global LlmRequestHandler (constructed once, no args). The runtime invokes it over JSON-RPC (llmInference.*) whenever it would otherwise issue a model-layer HTTP or WebSocket request — for both BYOK and CAPI — fully replacing the outbound call. A handler that overrides nothing is a transparent pass-through.

It includes the full feature work on this branch:

wire up LLM inference provider registration and generated RPC types
add the raw chunked callback protocol for outbound inference requests and responses
cover plain HTTP, streaming responses, cancellation (runtime- and consumer-initiated), error mapping, session-id threading, and WebSocket transport
port the feature to every SDK language with idiomatic HTTP types
keep the public callback surface to a single LlmRequestHandler model with forwarding helpers

What changed

Shared protocol and plumbing

add generated RPC / session event types needed for LLM inference callbacks across all SDK surfaces
add SDK-side registration for a process-global LLM inference provider (llmInference.setProvider)
route outbound inference requests through the callback bridge instead of requiring provider-specific hooks

Per-language ports

Each language exposes the same LlmRequestHandler model, mapped onto the most canonical HTTP representation available in that ecosystem:

Language	HTTP request/response type	WebSocket type
Node.js	`Request` / `Response` (Fetch)	per-connection handler
.NET	`HttpRequestMessage` / `HttpResponseMessage`	`ClientWebSocket`
Python	`httpx` request/response	per-connection handler
Go	`http.Request` / `http.Response`	per-connection handler
Rust	`http::Request` / `http::Response`	per-connection handler
Java	`java.net.http` `HttpRequest` / `HttpResponse`	`java.net.http.WebSocket`

All ports thread cancellation and session id through the request context, and provide a ForwardingWebSocketHandler (or equivalent) for the common mutate-and-forward case.

API shape

collapse HTTP interception to a single send hook (SendRequestAsync / sendRequest / sendHttp / language equivalents)
expose WebSocket interception through OpenWebSocketAsync / openWebSocket, returning a per-connection handler object
allow consumers to mutate, drop, duplicate, or fully replace request/response messages while keeping the common forwarding case straightforward

Usage examples

C#

using GitHub.Copilot;
using System.Net.Http;

sealed class MyHandler : LlmRequestHandler
{
    protected override async Task<HttpResponseMessage> SendRequestAsync(
        HttpRequestMessage request,
        LlmRequestContext ctx)
    {
        request.Headers.Add("X-Debug-Session", ctx.SessionId ?? "none");
        return await base.SendRequestAsync(request, ctx);
    }

    protected override Task<CopilotWebSocketHandler> OpenWebSocketAsync(LlmRequestContext ctx)
        => Task.FromResult<CopilotWebSocketHandler>(new MyForwardingSocket(ctx));
}

sealed class MyForwardingSocket : ForwardingWebSocketHandler
{
    public MyForwardingSocket(LlmRequestContext ctx)
        : base(ctx)
    {
    }

    public override Task SendRequestMessageAsync(LlmWebSocketMessage message)
    {
        var text = message.GetText().Replace("model-A", "model-B");
        return base.SendRequestMessageAsync(LlmWebSocketMessage.Text(text));
    }
}

Node.js

import {
    ForwardingWebSocketHandler,
    LlmRequestContext,
    LlmRequestHandler,
} from "@github/copilot";

class MyHandler extends LlmRequestHandler {
    protected override async sendRequest(request: Request, ctx: LlmRequestContext): Promise<Response> {
        const headers = new Headers(request.headers);
        headers.set("x-debug-session", ctx.sessionId ?? "none");

        return super.sendRequest(new Request(request, { headers }), ctx);
    }

    protected override async openWebSocket(ctx: LlmRequestContext) {
        return new MyForwardingSocket(ctx);
    }
}

class MyForwardingSocket extends ForwardingWebSocketHandler {
    override sendRequestMessage(data: string | Uint8Array) {
        if (typeof data === "string") {
            return super.sendRequestMessage(data.replace("model-A", "model-B"));
        }
        return super.sendRequestMessage(data);
    }
}

Java

import com.github.copilot.ForwardingWebSocketHandler;
import com.github.copilot.LlmRequestContext;
import com.github.copilot.LlmRequestHandler;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.io.InputStream;

final class MyHandler extends LlmRequestHandler {
    @Override
    protected HttpResponse<InputStream> sendHttp(HttpRequest request, LlmRequestContext ctx) throws Exception {
        HttpRequest mutated = HttpRequest.newBuilder(request, (n, v) -> true)
                .header("X-Debug-Session", ctx.sessionId() == null ? "none" : ctx.sessionId())
                .build();
        return super.sendHttp(mutated, ctx);
    }

    @Override
    protected CopilotWebSocketHandler openWebSocket(LlmRequestContext ctx) {
        return new ForwardingWebSocketHandler(ctx.url(), ctx.headers()) {
            @Override
            protected byte[] onSendRequestMessage(byte[] data, boolean binary) {
                String text = new String(data, java.nio.charset.StandardCharsets.UTF_8).replace("model-A", "model-B");
                return text.getBytes(java.nio.charset.StandardCharsets.UTF_8);
            }
        };
    }
}

Register the handler when constructing the client, e.g. in Java:

CopilotClient client = new CopilotClient(
    new CopilotClientOptions().setLlmInference(new LlmInferenceConfig().setHandler(new MyHandler())));

Tests

Each language adds e2e coverage (mirroring a shared reference suite) for:

callback provider registration
HTTP inference interception
streaming inference interception
error mapping
runtime-initiated and consumer-initiated cancellation
session-id threading
WebSocket callback handling
an idiomatic handler test exercising mutate-and-forward over both HTTP and WebSocket

Plus per-language unit coverage where applicable (e.g. .NET handler/adapter behavior, Node.js unit flows).

Resolves github/copilot-sdk-internal#88

github-advanced-security

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

SteveSandersonMS · 2026-06-22T10:39:41Z

+    /// request/response.
+    /// </summary>
+    public LlmRequestHandler? Handler { get; set; }
+}


Why do we have this wrapper type and not just put LlmRequestHandler directly onto the create/resume config?

SteveSandersonMS · 2026-06-22T10:40:02Z

  </PropertyGroup>

+    <ItemGroup>
+        <InternalsVisibleTo Include="GitHub.Copilot.SDK.Test" />


Is this IVT really needed?

SteveSandersonMS · 2026-06-22T10:40:18Z

+/// model-layer request.
+/// </summary>
+[Experimental(Diagnostics.Experimental)]
+public enum LlmInferenceTransport


Why is this public when everything else in here is internal? Maybe it could move to the file with all the public API.

SteveSandersonMS · 2026-06-22T10:44:03Z

+        // Return from httpRequestStart immediately (after registering state) so
+        // the runtime's RPC reply is not gated on the consumer's I/O. The actual
+        // provider work runs asynchronously.
+        _ = RunProviderAsync(llmRequest, state, sink);


Why are you doing that? What's wrong with holding the RPC reply until the consumer's I/O completes? Isn't tat necessary to achieve natural backpressure and error propagation?

Altogether I'm surprised by how much complexity there is in this file and wonder if you're trying to make it too clever. Why isn't there a significantly simpler way of adapting the RPC protocol to the public base class, for example by eliminating any separate queueing/buffering/etc?

SteveSandersonMS · 2026-06-22T10:45:03Z

+            }
+
+            return result;
+        }


Seems really odd to have an adapter like this when the source data is already in the right format. What stops us from simplifying all this?

Adds an opt-in llmInference config to CopilotClientOptions that lets SDK consumers register a callback the runtime invokes whenever it would otherwise issue an outbound non-streaming LLM HTTP request itself. v1 scope is TS-only/non-streaming, mirroring the runtime support added in github/copilot-agent-runtime. Streaming SSE and WebSocket transports are out of scope for v1 and continue to bypass the callback. - New `LlmInferenceProvider` interface with a single `onLlmRequest` method. - `createLlmInferenceAdapter` converts the provider into the wire-shape `LlmInferenceHandler` consumed by the RPC dispatcher. - Client wiring: `llmInference.setProvider` is sent on connect; per-session adapter is attached alongside the existing sessionFs hook. - New `llm_inference.e2e.test.ts` exercises the full RPC round-trip against the runtime. Resolves github/copilot-sdk-internal#88 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Matches the runtime move of `llmInference.httpRequest` out of the session-scoped client API and onto a new `clientGlobal` schema root. - Codegen emits a new `registerClientGlobalApiHandlers` alongside the existing `registerClientSessionApiHandlers`. Handlers passed to it are dispatched directly (no per-session `getHandlers` callback) and carry no implicit sessionId — sessionId, when present, is just a payload field on the call. - `CopilotClient` now constructs the LLM inference adapter once and registers it process-wide via `registerClientGlobalApiHandlers` during connection setup. The per-session `setupLlmInference` path and the `SessionConfigBase.createLlmInferenceProvider` override are removed — there is no longer any per-session notion of which provider to use. - `LlmInferenceConfig.createLlmInferenceProvider` is now `() => LlmInferenceProvider` (was `(session) => ...`). - `LlmInferenceRequest` exposes the new optional `sessionId` field so consumers can correlate requests with a runtime session when one is in scope. E2E test updated to verify the global registration works and that sessionId is populated on in-session traffic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

With the Rust runtime intercept chokepoint in place, every model-layer HTTP request - including /models and /models/session - is now dispatched through the SDK callback. Update the e2e test to: - Stub realistic responses for non-streaming model catalog and session endpoints (so the runtime can proceed past model resolution). - Hard-assert the catalog request is intercepted (no more 'either-or' fallback for the pre-rust-intercept state). Streaming inference requests still pass through to the recorded CAPI proxy; a fully-mocked end-to-end inference test will land alongside the streaming-intercept commit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Extends LlmInferenceProvider with an optional onLlmStreamRequest method that returns a response head synchronously and pushes body chunks via the provided sink. The adapter implements the generated httpStreamStart RPC method and forwards chunks back to the runtime via the typed server-RPC client (llmInference.streamChunk / streamEnd). Adds a fully-mocked e2e test (test/e2e/llm_inference_stream.e2e.test.ts) that drives a complete user->assistant turn through the callback alone: the runtime hits the callback for /models, /models/session, and the chat completion itself, the assistant text returned to the SDK consumer is the synthetic text supplied by the stub. - nodejs/src/llmInferenceProvider.ts: LlmInferenceStreamSink, onLlmStreamRequest, httpStreamStart adapter - nodejs/src/client.ts: pass a lazy server-RPC accessor into the adapter - nodejs/src/index.ts: re-export new types - nodejs/test/e2e/llm_inference_stream.e2e.test.ts: full-mock e2e - nodejs/src/generated/*, python/*, go/*, rust/*: codegen for new RPC methods - dotnet/src/Generated/*: codegen for new RPC methods Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds test/e2e/llm_inference_errors.e2e.test.ts that wires a callback whose inference handler throws a synthetic transport error and verifies the failure surfaces to the SDK consumer (the call does not hang and any error caught is non-empty). Confirms the runtime's existing retry / error reporting path handles callback-side failures the same way it handles real transport failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mirrors the runtime-side cleanup: the callback wire no longer carries providerType / endpointKind / wireApi / transport / modelId. Adapter stops forwarding the field, e2e tests filter by URL instead of metadata, and the missing LlmInferenceStreamSink / LlmInferenceStreamStartResponse re-exports in types.ts are added so index.ts type-checks cleanly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

[Phase 3] Realign the Node SDK with the runtime's new four-method chunk protocol. One unified provider callback: interface LlmInferenceProvider { onLlmRequest(req: LlmInferenceRequest): Promise<void>; } LlmInferenceRequest exposes: * url / method / headers / sessionId * requestBody: AsyncIterable<Uint8Array> // body delivered as chunks * responseBody: LlmInferenceResponseSink // start/write/end/error The sink enforces start -> 0..N writes -> exactly one of end/error and maps each call to the corresponding httpResponseStart / httpResponseChunk RPC. createLlmInferenceAdapter maintains a per-requestId state map; the generated httpRequestStart handler registers state synchronously and fires onLlmRequest in the background, so the runtime's RPC reply isn't gated on consumer I/O. The body queue iterator now latches a 'done' flag so a consumer that calls .next() again after end:true gets done back instead of blocking forever waiting for chunks the runtime will never send. Removes the previous onLlmRequest + onLlmStreamRequest split and the LlmInferenceResponse / LlmInferenceStreamSink / LlmInferenceStreamStartResponse public types. All three e2e tests rewritten against the unified callback (one of them URL-dispatches /responses -> SSE and /chat/completions -> buffered JSON; the consumer can also branch on whether the request body has stream:true). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Phase 4.1: expose an AbortSignal on the request envelope, abort it on a cancel chunk from the runtime, and map consumer-side aborts to a 499 + error{code:cancelled} response. Adds the cancellation e2e test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add an e2e test asserting that when the SDK consumer signals a terminal error via responseBody.error({ code: 'cancelled' }) the runtime surfaces it faithfully as a request failure rather than hanging. Completes the consumer->runtime direction of Phase 4.1. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Surface the new `transport` discriminator on `LlmInferenceRequest` so consumers can tell an `"http"` request (plain HTTP / SSE) from a `"websocket"` one (full-duplex: each request-body chunk is one inbound WS message, each response-body write one outbound message). The adapter threads `params.transport` through, defaulting to `"http"`. Regenerate rpc.ts against the runtime schema for the new field and add an e2e test exercising the full-duplex path: the fake model advertises `ws:/responses`, the runtime's WebSocket flag is enabled via env var, and the consumer pumps `/responses` events back per inbound message. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Friendly product-code starting point for SDK consumers who want to observe or mutate LLM inference requests/responses by overriding virtual methods on a base class. Implements LlmInferenceProvider, so an instance can be returned directly from createLlmInferenceProvider. Default behaviour is a transparent pass-through: each request is forwarded to its original URL via the WHATWG fetch global (HTTP) or WebSocket global (WebSocket), and the upstream response is streamed back unchanged. The same subclass handles both transports - onLlmRequest dispatches on req.transport. Virtual hooks: - HTTP: transformRequest, forward, transformResponse - WebSocket: forwardWebSocket, transformRequestMessage, transformResponseMessage E2e test (llm_inference_handler.e2e.test.ts) demonstrates a single TestHandler subclass servicing both an HTTP turn (single-shot title generation) and a WebSocket turn (main agent turn) against a per-test in-process http+ws upstream that speaks the real CAPI shapes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Review fixes for github/copilot-sdk-internal#88 (Node SDK side). - Honor the runtime's accepted=false ack: the response sink now aborts the provider's signal and stops emitting once the runtime drops the request (I1). - Add a staging backstop in the adapter so a body chunk that arrives before its start frame is buffered and replayed rather than silently dropped (B1). - Run the WebSocket request/response pumps concurrently and race their terminal states, so an upstream-closes-first (or runtime-cancels-first) case tears the other side down instead of hanging on a parked iterator (B2). - Buffer inbound WS frames in wrapGlobalWebSocket until onMessage is registered so the first frames of a fast upstream aren't dropped. - Collapse the dead send branch, hoist TextEncoder/TextDecoder singletons, and correct the LlmWebSocketUpstream.onClose contract doc. - Update CopilotClientOptions.llmInference docs: streaming SSE and WebSocket are intercepted, not bypassed (I6). - Add unit tests: chunk-before-start staging, accepted=false abort, WS upstream-close-first finalisation, and WS upstream-error propagation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Drives a CAPI session and a BYOK (openai/responses) session entirely through the LLM inference callback — the consumer fabricates every model-layer response, so the CAPI record/replay proxy is never the inference endpoint. Asserts each in-session inference request carries req.sessionId === session.sessionId and that the two session ids differ. The mock branches /responses on the request stream flag: BYOK turns whose config-derived model does not advertise streaming issue a buffered (non-streaming) /responses request expecting a single JSON response object, whereas the CAPI turn streams via SSE. This mirrors real upstream behaviour and confirms the callback transport faithfully delivers both shapes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mirrors the TypeScript LLM inference callback feature in the .NET SDK so consumers can observe/mutate the model-layer HTTP/WebSocket requests the runtime issues (CAPI and BYOK), with the runtime session id threaded into each callback. - scripts/codegen/csharp.ts: emit the clientGlobal handler interface + registration so Rpc.cs gains the llmInference handler surface. - LlmInferenceProvider.cs: low-level ILlmInferenceProvider API + adapter (request staging, response sink state machine) behind an internal ILlmInferenceResponseChannel seam for unit testing. - LlmRequestHandler.cs: idiomatic pass-through base class mapping to HttpRequestMessage/HttpResponseMessage and ClientWebSocket, with virtual transform/forward hooks for both transports. - Types.cs/Client.cs: wire LlmInferenceConfig into the client and register the provider on start. - Tests: factored unit-test infra (recording channel/sink, inline provider, frame builders) with adapter + handler tests, plus CAPI+BYOK e2e tests asserting the session id reaches the callback. e2e provider emits raw JSON (reflection-free STJ) and serves all model-layer traffic off-network. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Hide the redundant low-level provider interface and adapter from the public surface in both SDKs; the sole public extension point is now the LlmRequestHandler base class. Replace the LlmInferenceConfig provider factory with a direct handler instance (the provider is client-global, constructed once with no args). .NET: ILlmInferenceProvider + the LlmInferenceRequest/ResponseInit/ResponseSink DTOs become internal; LlmRequestHandler implements the interface explicitly so OnLlmRequestAsync leaves its public surface. LlmInferenceConfig.Handler replaces the Func<LlmRequestHandler> factory. TS: stop exporting LlmInferenceProvider and createLlmInferenceAdapter from index.ts; LlmInferenceConfig.handler replaces createLlmInferenceProvider. The request/sink DTOs stay exported as onLlmRequest's contract (TS lacks explicit interface implementation). E2E providers become LlmRequestHandler subclasses overriding onLlmRequest. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Collapse the HTTP callback seam to SendRequest/sendRequest, replace websocket hooks with per-connection handlers, and update tests to use the forwarding handler model. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Port the LLM inference callback feature to the Python SDK, mirroring the existing Node.js and .NET implementations. Consumers subclass `LlmRequestHandler` and override `send_request` (idiomatic httpx) for HTTP or `open_web_socket` (websockets) for the WebSocket transport; both default to transparent pass-through. Wired through `LlmInferenceConfig` on the client, registered on the `clientGlobal.llmInference` scope. Adds the low-level provider/adapter, the httpx-based handler base class, client wiring, public exports, and httpx as a core dependency. Extends the Python codegen to emit clientGlobal handler registration and regenerates the generated RPC bindings. Includes 8 e2e test files (10 tests) mirroring the Node.js suite — round trip, session-id threading (CAPI + BYOK), streaming SSE, error mapping, runtime cancel, consumer cancel, WebSocket transport, and the idiomatic handler against a real local HTTP+WebSocket upstream. All pass off-network. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mirror the existing Node/.NET/Python LLM inference callback support in the Go SDK. Consumers register an LlmInferenceProvider (or the idiomatic LlmRequestHandler over net/http + coder/websocket) via ClientOptions.LlmInference; the runtime routes every model-layer HTTP and WebSocket request through it for both CAPI and BYOK sessions. - Codegen (scripts/codegen/go.ts) now emits the clientGlobal handler registration, regenerating go/rpc/zrpc.go. - New low-level provider types + adapter (llm_inference_provider.go) and the idiomatic forwarding handler (llm_request_handler.go). - Wire LlmInferenceConfig into ClientOptions and the connect/start paths. - 8 off-network e2e scenarios mirroring the other SDKs (basic, session id, stream, errors, cancel, consumer cancel, websocket, handler). Also fixes a pre-existing Go e2e compile break (AttachmentBlob.Data became *string in the Rust contract regen baseline) that blocked the e2e package. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions

Generated by SDK Consistency Review Agent for issue #1689 · sonnet46 3.4M

github-actions · 2026-06-22T15:05:05Z

+/// override <see cref="SendRequestAsync"/> or <see cref="OpenWebSocketAsync"/>.
+/// </summary>
+[Experimental(Diagnostics.Experimental)]
+public class CopilotRequestHandler


Cross-SDK naming note: This class is named CopilotRequestHandler (along with CopilotRequestContext and ForwardingCopilotWebSocketHandler), following the Node.js convention. However, the three other SDKs added in this same PR (Go, Java) and the two pre-existing ones (Python, Rust) all use LlmRequestHandler / LlmRequestContext / ForwardingWebSocketHandler.

After this PR the naming is 4-vs-2: LlmRequestHandler (Python, Rust, Go, Java) vs CopilotRequestHandler (Node.js, .NET). If this split is intentional (grouping .NET with Node.js), that's fine, but it would be worth noting somewhere—otherwise it may be worth aligning .NET with the majority Llm* naming.

github-actions · 2026-06-22T15:05:05Z

+    /// outbound call.
+    /// </summary>
+    [Experimental(Diagnostics.Experimental)]
+    public CopilotRequestHandler? RequestHandler { get; set; }


Cross-SDK config API note: The handler is set here as a flat RequestHandler CopilotRequestHandler? property (same pattern as Node.js's requestHandler). The other four SDKs (Python, Rust, Go, Java) all wrap it in an LlmInferenceConfig config object, e.g.:

// Java options.setLlmInference(new LlmInferenceConfig().setHandler(new MyHandler()));

// Go options.LlmInference = &LlmInferenceConfig{Handler: &MyHandler{}}

# Python options.llm_inference = LlmInferenceConfig(handler=MyHandler())

After this PR the pattern is 4-vs-2: config wrapper (Python, Rust, Go, Java) vs direct property (Node.js, .NET). Developers targeting multiple SDKs will encounter a different configuration shape for the same feature. Worth a deliberate decision here.

github-actions · 2026-06-22T15:05:05Z

+     * @throws Exception
+     *             if the request could not be completed
+     */
+    protected HttpResponse<InputStream> sendHttp(HttpRequest request, LlmRequestContext ctx) throws Exception {


Minor cross-SDK naming note: This method is named sendHttp (like Rust's send_http), but Node.js and Python — both also in the LlmRequestHandler family — name the equivalent hook sendRequest / send_request. All three ultimately receive an "HTTP request" object, so the naming is semantically equivalent, but within the LlmRequestHandler group it differs between languages:

SDK HTTP hook name

Python send_request(httpx.Request, ctx)

Rust send_http(LlmHttpRequest, ctx)

Java sendHttp(HttpRequest, ctx)

Suggest clarifying whether sendHttp or sendRequest should be the canonical hook name for the LlmRequestHandler family (it won't matter for users of a single SDK, but affects multi-language teams reading docs side-by-side).

Mirror the .NET/Node simplification + terminology rename in the Go SDK: consolidate the provider/handler two-layer design into a single copilot_request_handler.go, drop the accepted:false ack plumbing and the staged backstop, emit the WebSocket 101 upgrade head eagerly (a lazy bridge deadlocks the runtime connect), and rename the public Llm* types to Copilot* (types carry the prefix; fields/methods stay succinct). The client option becomes ClientOptions.RequestHandler *CopilotRequestHandler. Generated wire types are untouched. Consolidate the e2e suite to three files (copilot_request_handler covering HTTP + WebSocket + streaming, copilot_request_session_id, and copilot_request_cancel_error with the error and runtime-cancel cases) plus a shared helpers file, replacing the nine llm_inference_* test files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mirror the .NET/Node simplification + terminology rename in the Java SDK: fold the low-level provider/adapter indirection and the response sink/init/ body DTOs into CopilotRequestHandler plus an internal exchange, drop the accepted:false ack plumbing and the staged backstop, emit the WebSocket 101 upgrade head eagerly (a lazy bridge deadlocks the runtime connect), and rename the public Llm* types to Copilot* (types carry the prefix; methods stay succinct). The client option becomes CopilotClientOptions.requestHandler of type CopilotRequestHandler; LlmInferenceConfig is removed. Generated wire types are untouched. Consolidate the e2e suite to three tests (CopilotRequestHandlerE2ETest covering HTTP + WebSocket + streaming, CopilotRequestSessionIdE2ETest, and CopilotRequestCancelErrorE2ETest with the error and runtime-cancel cases) plus CopilotRequestTestSupport, replacing the eight LlmInference*E2ETest files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mirror the .NET/Node simplification + terminology rename in the Python SDK: consolidate the provider/handler two-layer design into a single copilot/copilot_request_handler.py, drop the accepted:false ack plumbing and the staged backstop, emit the WebSocket 101 upgrade head eagerly (a lazy bridge deadlocks the runtime connect), and rename the public Llm* types to Copilot* (types carry the prefix; attributes/methods stay succinct). The client option becomes request_handler: CopilotRequestHandler. Generated wire types are untouched. Stream in-memory httpx responses (built with content=) by forwarding their buffered body, since their raw stream is already consumed and cannot be re-iterated; real streamed responses still flow through aiter_raw. Consolidate the e2e suite to three files (test_copilot_request_handler covering HTTP + WebSocket + streaming, test_copilot_request_session_id, and test_copilot_request_cancel_error with the error and runtime-cancel cases) plus a shared helpers module, replacing the eight test_llm_inference_* files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mirror the .NET/Node simplification + terminology rename in the Rust SDK: consolidate the inference/dispatch/handler modules into a single copilot_request_handler.rs with one CopilotRequestHandler trait (default methods) plus an internal exchange, drop the accepted:false ack plumbing and the staged backstop (cancellation flows via CancellationToken), emit the WebSocket 101 upgrade head eagerly (a lazy bridge deadlocks the runtime connect), and rename the public Llm* types to Copilot* (types carry the prefix; fields/methods stay succinct). The client option becomes ClientOptions.request_handler. Generated wire types are untouched. Consolidate the e2e coverage into copilot_request_handler.rs with four tests mirroring the Node scenarios (HTTP + WebSocket + streaming via one handler, session-id threading, handler errors, and runtime-driven cancel), replacing the llm_inference.rs e2e module. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

+                !inference.is_empty(),
+                "expected at least one intercepted inference request"
+            );
+            for (_, session_id) in &inference {


+                inference.len() > before,
+                "expected at least one intercepted BYOK inference request"
+            );
+            for (_, session_id) in &inference[before..] {


+                handler.inference_attempts.load(Ordering::SeqCst) > 0,
+                "expected the inference callback to be reached and raise"
+            );
+            if let Err(err) = send_result {


+            if self._upstream is not None:
+                try:
+                    await self._upstream.close()
+                except Exception:


+        task.cancel()
+        try:
+            await task
+        except (asyncio.CancelledError, Exception):


+        finally:
+            try:
+                await client.stop()
+            except Exception:


+    is_inference_url,
+    isolated_client_fixture,
+)
+from .testharness import E2ETestContext  # noqa: F401  (ctx fixture dependency)


+            # The callback throws on inference; the turn surfaces an error (or
+            # completes without an assistant message) rather than hanging.
+            await session.send_and_wait("Say OK.")
+        except BaseException:  # noqa: BLE001


+    finally:
+        try:
+            await client.stop()
+        except Exception:


+    is_inference_url,
+    isolated_client_fixture,
+)
+from .testharness import E2ETestContext  # noqa: F401  (ctx fixture dependency)


The branch had dropped the net472 `<Compile Include="..\src\Polyfills\*.cs">` from the test project, on the assumption that the SDK's internal polyfills were visible via InternalsVisibleTo. No InternalsVisibleTo to the test assembly actually exists, so the net472 test build lost its IsExternalInit definition and failed with CS0518 across every file using init-only setters and records (the .NET SDK Tests (windows-latest) job). net8.0 builds were unaffected because IsExternalInit ships in the BCL there. Restore the direct polyfill compile for net472 to match main; net472 now builds clean (0 warnings, 0 errors). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR introduces SDK-wide support for intercepting the runtime’s outbound model-layer LLM inference traffic (HTTP/SSE + WebSocket) by registering a single client-global handler over JSON-RPC (llmInference.*). It extends the shared codegen pipeline to generate client-global reverse-RPC handlers and wires language-specific handler abstractions plus E2E coverage across Node.js, .NET, Python, Go, Rust, and Java.

Changes:

Extend RPC schema/codegen to support a new clientGlobal API surface and generate registration helpers for llmInference.*.
Add per-language “request handler” APIs/adapters to forward/replace runtime model HTTP + WebSocket traffic, including cancellation and session-id threading.
Add cross-language E2E tests validating session-id propagation, cancellation/error behavior, and forwarding behavior for both HTTP and WebSocket paths.

Show a summary per file

File	Description
scripts/codegen/utils.ts	Adds `clientGlobal` support to the shared API schema shape.
scripts/codegen/typescript.ts	Generates TS client-global handler interfaces + registration function.
scripts/codegen/python.ts	Generates Python client-global handler Protocols/dataclass + registration function.
scripts/codegen/go.ts	Generates Go client-global handler interfaces + registration function.
scripts/codegen/csharp.ts	Generates C# client-global handler interfaces + registration plumbing.
rust/tests/e2e/support.rs	Adds no-snapshot E2E context and helper to start an LLM handler-wired client.
rust/tests/e2e.rs	Registers new Rust E2E module for request-handler coverage.
rust/src/types.rs	Re-exports request-handler types for the Rust public surface.
rust/src/router.rs	Routes `llmInference.*` requests via a client-global dispatcher rather than per-session routing.
rust/src/lib.rs	Adds request-handler option wiring, early router start for startup model calls, and internal handle reconstruction.
rust/Cargo.toml	Adds HTTP/WebSocket dependencies needed for the Rust forwarding/transport implementation.
python/pyproject.toml	Promotes `httpx` to a runtime dependency and adds `websockets` to dev deps for tests.
python/e2e/test_copilot_request_session_id_e2e.py	Verifies session-id threading into handler for CAPI and BYOK sessions.
python/e2e/test_copilot_request_handler_e2e.py	Validates mutate-and-forward behavior across HTTP + WebSocket via a real upstream.
python/e2e/test_copilot_request_cancel_error_e2e.py	Covers thrown-error mapping and runtime cancellation behavior via `cancel_event`.
python/e2e/_copilot_request_helpers.py	Shared synthetic model endpoint builders and isolated-client fixtures for request-handler tests.
python/copilot/generated/rpc.py	Adds generated client-global handler types and registration for `llmInference.*`.
python/copilot/client.py	Wires request-handler registration and `llmInference.setProvider` into client startup.
python/copilot/init.py	Re-exports request-handler types/helpers as part of the public Python API surface.
nodejs/test/e2e/copilot_request_session_id.e2e.test.ts	Node E2E for session-id threading through the request handler.
nodejs/test/e2e/copilot_request_cancel_error.e2e.test.ts	Node E2E for handler error propagation and runtime-driven cancellation.
nodejs/src/types.ts	Adds `requestHandler` option + exports request-handler types from the Node public surface.
nodejs/src/index.ts	Re-exports request-handler APIs from the Node entrypoint.
nodejs/src/generated/rpc.ts	Adds generated client-global handler interfaces + `registerClientGlobalApiHandlers`.
nodejs/src/client.ts	Registers client-global handlers on the connection and calls `llmInference.setProvider` when configured.
nodejs/package.json	Adds `ws` and `@types/ws` for Node E2E WebSocket upstream tests.
nodejs/package-lock.json	Locks new `ws` / `@types/ws` additions.
java/src/test/java/com/github/copilot/FakeUpstreamServer.java	Adds an in-process raw HTTP/WebSocket upstream for Java forwarding E2E.
java/src/test/java/com/github/copilot/CopilotRequestSessionIdE2ETest.java	Java E2E verifying session-id threading for CAPI + BYOK.
java/src/test/java/com/github/copilot/CopilotRequestHandlerE2ETest.java	Java E2E validating synthetic HTTP servicing and forwarding over HTTP + WebSocket.
java/src/test/java/com/github/copilot/CopilotRequestCancelErrorE2ETest.java	Java E2E for error propagation and runtime cancellation behavior.
java/src/main/java/module-info.java	Makes `java.net.http` a required module dependency (runtime usage).
java/src/main/java/com/github/copilot/rpc/CopilotClientOptions.java	Adds `requestHandler` option to configure the connection-level interception handler.
java/src/main/java/com/github/copilot/LlmWebSocketResponseBridge.java	Ensures WebSocket upgrade head is emitted before any body/terminal frames.
java/src/main/java/com/github/copilot/LlmInferenceExchange.java	Implements per-request body ingestion + strict response emission state machine.
java/src/main/java/com/github/copilot/LlmInferenceAdapter.java	Bridges `llmInference.*` reverse-RPC frames to `CopilotRequestHandler` execution.
java/src/main/java/com/github/copilot/ForwardingCopilotWebSocketHandler.java	Default Java WS forwarding implementation using `java.net.http.WebSocket`.
java/src/main/java/com/github/copilot/CopilotWebSocketMessage.java	Represents WS text/binary messages exchanged via the handler seam.
java/src/main/java/com/github/copilot/CopilotWebSocketHandler.java	Base per-connection WS handler contract for runtime↔upstream message forwarding.
java/src/main/java/com/github/copilot/CopilotWebSocketCloseStatus.java	Carries terminal close/error state for callback-owned WS connections.
java/src/main/java/com/github/copilot/CopilotRequestTransport.java	Exposes the runtime’s intended transport selection (HTTP vs WebSocket).
java/src/main/java/com/github/copilot/CopilotRequestHandler.java	Adds Java handler base class with default HTTP forwarding + WS forwarding hooks.
java/src/main/java/com/github/copilot/CopilotRequestContext.java	Provides request-id/session-id/headers/url/transport/cancellation context to handlers.
java/src/main/java/com/github/copilot/CopilotClient.java	Registers the adapter + calls `llmInference.setProvider` when handler is configured.
go/types.go	Adds `RequestHandler` to Go client options for connection-level interception.
go/rpc/zrpc.go	Adds generated client-global handler types and registration for `llmInference.*`.
go/internal/e2e/copilot_request_session_id_e2e_test.go	Go E2E for session-id threading into intercepted inference calls.
go/internal/e2e/copilot_request_helpers_test.go	Shared synthetic upstream builders for Go request-handler E2Es.
go/internal/e2e/copilot_request_handler_e2e_test.go	Go E2E validating mutate-and-forward over HTTP + WebSocket.
go/internal/e2e/copilot_request_cancel_error_e2e_test.go	Go E2E covering error propagation and runtime cancellation behavior.
go/go.sum	Adds checksum entries for new Go WebSocket dependency.
go/go.mod	Adds `github.com/coder/websocket` for Go WebSocket upstream tests.
go/client.go	Wires client-global handler registration and `LlmInference.SetProvider` when configured.
dotnet/test/E2E/CopilotRequestSessionIdE2ETests.cs	.NET E2E verifying session-id threading for CAPI + BYOK sessions.
dotnet/test/E2E/CopilotRequestE2EProvider.cs	Shared .NET test handler that fabricates model endpoint responses for off-network turns.
dotnet/test/E2E/CopilotRequestCancelErrorE2ETests.cs	.NET E2E for handler error mapping and runtime cancellation propagation.
dotnet/src/Types.cs	Adds experimental `RequestHandler` option to `CopilotClientOptions`.
dotnet/src/Generated/Rpc.cs	Adds generated types and client-global registration for `llmInference.*`.
dotnet/src/Client.cs	Registers client-global handlers and calls `LlmInference.SetProviderAsync` when configured.

Copilot's findings

Files not reviewed (2)

go/rpc/zrpc.go: Generated file
nodejs/package-lock.json: Generated file

Files reviewed: 62/69 changed files
Comments generated: 1

+
+	httpSrv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		path := strings.ToLower(strings.SplitN(r.URL.Path, "?", 2)[0])
+		_ = r.Body.Close


github-actions

Generated by SDK Consistency Review Agent for issue #1689 · sonnet46 3.1M

github-actions · 2026-06-22T17:03:01Z

+		return nil
+	}
+	return f.conn.Write(ctx, websocket.MessageText, out)
+}


Cross-SDK consistency concern: binary WebSocket frame type is lost

The Go adapter decodes binary frames correctly in routeChunk (line ~603: base64-decodes when params.Binary == true) but drops the binary/text flag before pushing into the Body channel. As a result, SendRequestMessage here always forwards as websocket.MessageText, regardless of the original wire frame type.

All other SDKs carry the binary flag through the request message path:

.NET: CopilotWebSocketMessage.IsBinary

Java: CopilotWebSocketMessage.binary()

Python: str | bytes (type encodes the distinction)

Node.js: string | Uint8Array (same)

Rust: CopilotWebSocketMessage.binary: bool

In practice, LLM inference likely only uses text frames, so this may not cause immediate issues. But if binary frames ever appear this will corrupt the upstream message silently. Consider threading the binary flag through the Body channel (e.g. as a struct { data []byte; binary bool }) so SendRequestMessage can use the correct message type.

github-actions · 2026-06-22T17:03:01Z

+type CopilotWebSocketCloseStatus struct {
+	Description string
+	Code        string
+	Err         error


Minor cross-SDK naming inconsistency: Code vs ErrorCode

All other SDKs name this field ErrorCode/errorCode/error_code:

.NET: public string? ErrorCode { get; init; }

Java: public String errorCode()

Python: error_code: str | None

Node.js: readonly errorCode?: string

Go uses the shorter Code, which is not wrong, but this creates a small inconsistency for developers navigating across SDKs. Consider renaming to ErrorCode to match the cross-SDK convention (the Err field name is fine — Err vs Error is established Go idiom).

github-actions · 2026-06-22T17:03:02Z

+        &self,
+        ctx: &CopilotRequestContext,
+        response: CopilotWebSocketResponse,
+    ) -> Result<Box<dyn CopilotWebSocketHandler>, CopilotRequestError> {


API surface difference from other SDKs: extra response parameter

Every other SDK's open_websocket / openWebSocket / OpenWebSocketAsync takes only the context:

// Node.js protected openWebSocket(ctx: CopilotRequestContext): Promise<CopilotWebSocketHandler> // .NET protected virtual Task<CopilotWebSocketHandler> OpenWebSocketAsync(CopilotRequestContext ctx) // Java protected CopilotWebSocketHandler openWebSocket(CopilotRequestContext ctx) // Python async def open_web_socket(self, ctx: CopilotRequestContext) -> CopilotWebSocketHandler // Go OpenWebSocket func(ctx *CopilotRequestContext) (CopilotWebSocketHandler, error)

Rust adds response: CopilotWebSocketResponse as a second argument, which consumers must store and use to write upstream→runtime messages.

This is architecturally necessary for Rust traits (no base-class state for a sendResponseMessage helper), so it isn't a bug. But it's worth documenting clearly in the CopilotRequestHandler trait doc comment — specifically that response must be stored in the returned handler struct and used to call response.send_message(...) in place of the sendResponseMessage base-class helper the other SDKs provide.

+
+    private static int GetFreePort()
+    {
+        var probe = new TcpListener(IPAddress.Loopback, 0);


+        catch
+        {
+            // Accept loop unwinds on listener shutdown.
+        }


+        catch
+        {
+            // Already stopped.
+        }


+            catch
+            {
+                // Already torn down.
+            }


+        catch
+        {
+            // Best-effort: the runtime tears connections down as turns complete.
+        }


+        catch
+        {
+            // Best-effort; the socket may already be closed.
+        }


+        catch (Exception ex)
+        {
+            await CloseAsync(new CopilotWebSocketCloseStatus
+            {
+                Description = ex.Message,
+                Error = ex,
+            }).ConfigureAwait(false);
+        }


+            catch
+            {
+                // Some headers are managed by the handshake; ignore rejections.
+            }


+    private readonly string _url;
+    private readonly IReadOnlyDictionary<string, IReadOnlyList<string>> _headers;
+    private WebSocket? _upstream;
+    private CancellationTokenSource? _pumpCts;


+{
+    private readonly string _url;
+    private readonly IReadOnlyDictionary<string, IReadOnlyList<string>> _headers;
+    private WebSocket? _upstream;


The codegen-check job runs `cargo +nightly fmt --all` with `rust/.rustfmt.nightly.toml` (imports_granularity = "Module") over the whole crate and fails if the tree changes. The hand-written copilot_request_handler.rs was only stable-formatted, leaving two adjacent `tokio::sync` imports unconsolidated. Merge them into the canonical nightly-fmt form. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

staticcheck (SA1019) flags http.ProtocolError as deprecated. The throwing transport only needs to return some transport-level error to exercise the error path, so use errors.New instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rustdoc's redundant_explicit_links lint (deny under -D warnings) flagged the [`CopilotRequestHandler`](crate::copilot_request_handler::CopilotRequestHandler) link, since the label already resolves to the same destination. Drop the explicit target and rely on intra-doc resolution. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The custom JsonRpc reader dispatches each incoming reverse-RPC method fire-and-forget on the thread pool, so llmInference.httpRequestStart and httpRequestChunk for the same request run concurrently. The start handler registered the exchange in _pending; the chunk handler looked it up and silently dropped the frame on a miss. When the dropped frame was the body End, the request-body drain blocked forever, the model HTTP request never completed, and the turn hung until sendAndWait's 60s idle timeout fired. Make the adapter ordering-independent: both httpRequestStart and httpRequestChunk get-or-create the exchange via _pending.GetOrAdd. A body chunk that races ahead of its start now buffers into the same exchange's channel instead of being dropped; start adopts that exchange, fills in Method/Context, and launches the handler exactly once. Locally reproduced the hang ~1-in-3 before; 20/20 green after. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jsonrpc2 spawns a goroutine per incoming request, so httpRequestStart and httpRequestChunk for the same request run concurrently. The chunk handler dropped any frame whose requestId was not yet registered by start; when the dropped frame was the body End, the request-body drain blocked forever and the turn hung until the 60s idle timeout fired. Add getOrCreateExchange so both entry points get-or-create the exchange: a body chunk that races ahead of start buffers into the shared frame queue instead of being dropped, and start adopts that exchange's context/queue. 10/10 green after. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Both handleRequestStart and handleRequestChunk now get-or-create the exchange via getOrCreateExchange, so a body chunk that races ahead of its start frame buffers into the same exchange instead of being dropped (which would hang the request-body drain). LlmInferenceExchange is constructed bare (requestId, rpcSupplier) with method set when the start frame arrives. Mirrors the .NET/Go root-cause fix for cross-SDK consistency; 5/5 e2e green. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Make the LLM-inference adapter ordering-independent, matching the .NET/Go root-cause fix. Both httpRequestStart and httpRequestChunk now get-or-create the exchange, so a body chunk (including the terminal end frame) that races ahead of its start frame buffers into the same exchange instead of being dropped — which would hang the request-body drain. CopilotRequestExchange is now constructed bare (requestId, getServerRpc) with its request context filled in via setContext when the start frame arrives. Node's vscode-jsonrpc reader currently dispatches in order so this race is not reachable today, but the get-or-create shape keeps the adapter correct regardless of dispatch order and consistent across all six SDKs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…apter Both http_request_start and http_request_chunk now get-or-create the exchange via _get_or_create, so a body chunk that races ahead of its start frame buffers into the same exchange instead of being dropped (which would hang the request-body drain). _CopilotRequestExchange is constructed bare (request_id, get_server_rpc); the request context is filled in via set_context when the start frame arrives. Mirrors the .NET/Go root-cause fix for cross-SDK consistency. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Both handle_start and handle_chunk now get-or-create the exchange via get_or_create_exchange, so a body chunk that races ahead of its start frame buffers into the same exchange's body channel instead of being dropped (which would hang the request-body drain). The request metadata moves behind a OnceLock filled by set_context when the start frame arrives, letting the exchange be created bare from either entry point. Mirrors the .NET/Go root-cause fix for cross-SDK consistency; fmt (stable+nightly) and clippy clean, 4/4 e2e green. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-22T18:12:23Z

Cross-SDK Consistency Review

Overall, this is an impressively consistent implementation across all six SDK languages — class names, option field names, and WebSocket API shapes are well-aligned. One naming inconsistency in the HTTP hook is worth flagging before merge.

⚠️ HTTP override method: `sendHttp`/`send_http` vs `sendRequest`/`send_request`/`SendRequestAsync`

Three SDKs use one semantic name, two use another:

SDK	HTTP override method
Node.js	`sendRequest(request: Request, ctx)`
.NET	`SendRequestAsync(HttpRequestMessage, ctx)`
Python	`send_request(self, request, ctx)`
Rust	`send_http(request: CopilotHttpRequest, ctx)`
Java	`sendHttp(HttpRequest request, ctx)`
Go	n/a — struct field `Transport http.RoundTripper` (idiomatic)

Rust and Java use send_http/sendHttp while the other three SDKs spell it sendRequest/SendRequestAsync/send_request. A developer porting a handler from Node.js to Java would look for sendRequest, not find it, and have to hunt for sendHttp. Aligning all five override-based SDKs on sendRequest/SendRequestAsync/send_request (or alternatively all on sendHttp/send_http) would improve discoverability across the SDK family.

i️ PR description has stale Java examples

The Java usage example in the PR description references types and method names that differ from the actual code:

PR description	Actual code
`extends LlmRequestHandler`	`extends CopilotRequestHandler`
`LlmRequestContext`	`CopilotRequestContext`
`new ForwardingWebSocketHandler(ctx.url(), ctx.headers())`	`new ForwardingCopilotWebSocketHandler(context)`
`protected byte[] onSendRequestMessage(byte[] data, boolean binary)`	`public void sendRequestMessage(CopilotWebSocketMessage message)`

This doesn't affect the implementation itself but will confuse readers. Worth updating before merge so the description matches the actual API.

✅ Everything else looks consistent

Class names: CopilotRequestHandler, CopilotRequestContext, CopilotWebSocketHandler, CopilotWebSocketCloseStatus, ForwardingCopilotWebSocketHandler — consistent across all SDKs ✅
Client option field: requestHandler/RequestHandler/request_handler — consistent across all SDKs ✅
WebSocket override: openWebSocket/OpenWebSocketAsync/open_web_socket/open_websocket — consistent (language-case only) ✅
WebSocket message type CopilotWebSocketMessage and its sendRequestMessage/sendResponseMessage hooks — consistent ✅
E2E test coverage appears symmetric across all six SDKs ✅

Generated by SDK Consistency Review Agent for issue #1689 · sonnet46 2.2M · ◷

github-actions

Generated by SDK Consistency Review Agent for issue #1689 · sonnet46 2.2M

github-actions · 2026-06-22T18:12:26Z

+     * @throws Exception
+     *             if the request could not be completed
+     */
+    protected HttpResponse<InputStream> sendHttp(HttpRequest request, CopilotRequestContext ctx) throws Exception {


Naming inconsistency with Node.js / .NET / Python: The HTTP interception hook is called sendHttp here (and send_http in Rust), but the equivalent in the other three SDKs is sendRequest / SendRequestAsync / send_request. A consumer who has read the Node.js docs will look for sendRequest and not find it.

Suggest aligning on one name across all five override-based SDKs — either rename this to sendRequest (matching the majority) or update the other three to sendHttp/send_http.

github-actions · 2026-06-22T18:12:26Z

+    /// Service one intercepted HTTP request. Default: forward to the real
+    /// upstream via [`forward_http`]. Override to mutate the request before
+    /// forwarding, mutate the response after, or replace the call entirely.
+    async fn send_http(


Naming inconsistency with Node.js / .NET / Python: The HTTP interception hook is called send_http here (and sendHttp in Java), but the equivalent in the other three SDKs is sendRequest / SendRequestAsync / send_request. Aligning on one name across all five override-based SDKs would improve cross-language discoverability.

SteveSandersonMS changed the title ~~Simplify LLM inference callback handlers~~ LLM inference callback support Jun 16, 2026

SteveSandersonMS force-pushed the stevesandersonms/llm-inference-callbacks branch from 815bbd0 to 7bc95c0 Compare June 19, 2026 15:00

This comment has been minimized.

Sign in to view

github-advanced-security AI found potential problems Jun 19, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

SteveSandersonMS commented Jun 22, 2026

View reviewed changes

stevesa and others added 18 commits June 22, 2026 14:38

Refine LLM inference callback handlers

c70adeb

Collapse the HTTP callback seam to SendRequest/sendRequest, replace websocket hooks with per-connection handlers, and update tests to use the forwarding handler model. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot reviewed Jun 22, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

SteveSandersonMS and others added 4 commits June 22, 2026 17:10

github-advanced-security AI found potential problems Jun 22, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

SteveSandersonMS marked this pull request as ready for review June 22, 2026 16:53

SteveSandersonMS requested a review from a team as a code owner June 22, 2026 16:53

Copilot AI review requested due to automatic review settings June 22, 2026 16:53

Copilot started reviewing on behalf of SteveSandersonMS June 22, 2026 16:53 View session

Copilot AI reviewed Jun 22, 2026

View reviewed changes

Comment thread go/internal/e2e/copilot_request_handler_e2e_test.go

httpSrv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {

path := strings.ToLower(strings.SplitN(r.URL.Path, "?", 2)[0])

_ = r.Body.Close

This comment has been minimized.

Sign in to view

github-actions Bot reviewed Jun 22, 2026

View reviewed changes

github-advanced-security AI found potential problems Jun 22, 2026

View reviewed changes

SteveSandersonMS and others added 9 commits June 22, 2026 19:02

SteveSandersonMS force-pushed the stevesandersonms/llm-inference-callbacks branch from b8013ef to 32f4164 Compare June 22, 2026 18:03

github-actions Bot reviewed Jun 22, 2026

View reviewed changes

SDK	HTTP hook name
Python	`send_request(httpx.Request, ctx)`
Rust	`send_http(LlmHttpRequest, ctx)`
Java	`sendHttp(HttpRequest, ctx)`

Conversation

SteveSandersonMS commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Shared protocol and plumbing

Per-language ports

API shape

Usage examples

C#

Node.js

Java

Tests

Uh oh!

This comment has been minimized.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-advanced-security AI left a comment

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

This comment has been minimized.

SteveSandersonMS Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

SteveSandersonMS Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

SteveSandersonMS Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SteveSandersonMS Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

SteveSandersonMS Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

This comment has been minimized.

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 22, 2026

Cross-SDK Consistency Review

⚠️ HTTP override method: sendHttp/send_http vs sendRequest/send_request/SendRequestAsync

SteveSandersonMS commented Jun 16, 2026 •

edited

Loading

SteveSandersonMS Jun 22, 2026 •

edited

Loading

⚠️ HTTP override method: `sendHttp`/`send_http` vs `sendRequest`/`send_request`/`SendRequestAsync`