Skip to content

LLM inference callback support#1689

Open
SteveSandersonMS wants to merge 41 commits into
mainfrom
stevesandersonms/llm-inference-callbacks
Open

LLM inference callback support#1689
SteveSandersonMS wants to merge 41 commits into
mainfrom
stevesandersonms/llm-inference-callbacks

Conversation

@SteveSandersonMS

@SteveSandersonMS SteveSandersonMS commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR adds SDK support for intercepting LLM inference requests and handling them in user code across all six SDK languages: Node.js, .NET, Python, Go, Rust, and Java.

Consumers register one client-global LlmRequestHandler (constructed once, no args). The runtime invokes it over JSON-RPC (llmInference.*) whenever it would otherwise issue a model-layer HTTP or WebSocket request — for both BYOK and CAPI — fully replacing the outbound call. A handler that overrides nothing is a transparent pass-through.

It includes the full feature work on this branch:

  • wire up LLM inference provider registration and generated RPC types
  • add the raw chunked callback protocol for outbound inference requests and responses
  • cover plain HTTP, streaming responses, cancellation (runtime- and consumer-initiated), error mapping, session-id threading, and WebSocket transport
  • port the feature to every SDK language with idiomatic HTTP types
  • keep the public callback surface to a single LlmRequestHandler model with forwarding helpers

What changed

Shared protocol and plumbing

  • add generated RPC / session event types needed for LLM inference callbacks across all SDK surfaces
  • add SDK-side registration for a process-global LLM inference provider (llmInference.setProvider)
  • route outbound inference requests through the callback bridge instead of requiring provider-specific hooks

Per-language ports

Each language exposes the same LlmRequestHandler model, mapped onto the most canonical HTTP representation available in that ecosystem:

Language HTTP request/response type WebSocket type
Node.js Request / Response (Fetch) per-connection handler
.NET HttpRequestMessage / HttpResponseMessage ClientWebSocket
Python httpx request/response per-connection handler
Go *http.Request / *http.Response per-connection handler
Rust http::Request / http::Response per-connection handler
Java java.net.http HttpRequest / HttpResponse java.net.http.WebSocket

All ports thread cancellation and session id through the request context, and provide a ForwardingWebSocketHandler (or equivalent) for the common mutate-and-forward case.

API shape

  • collapse HTTP interception to a single send hook (SendRequestAsync / sendRequest / sendHttp / language equivalents)
  • expose WebSocket interception through OpenWebSocketAsync / openWebSocket, returning a per-connection handler object
  • allow consumers to mutate, drop, duplicate, or fully replace request/response messages while keeping the common forwarding case straightforward

Usage examples

C#

using GitHub.Copilot;
using System.Net.Http;

sealed class MyHandler : LlmRequestHandler
{
    protected override async Task<HttpResponseMessage> SendRequestAsync(
        HttpRequestMessage request,
        LlmRequestContext ctx)
    {
        request.Headers.Add("X-Debug-Session", ctx.SessionId ?? "none");
        return await base.SendRequestAsync(request, ctx);
    }

    protected override Task<CopilotWebSocketHandler> OpenWebSocketAsync(LlmRequestContext ctx)
        => Task.FromResult<CopilotWebSocketHandler>(new MyForwardingSocket(ctx));
}

sealed class MyForwardingSocket : ForwardingWebSocketHandler
{
    public MyForwardingSocket(LlmRequestContext ctx)
        : base(ctx)
    {
    }

    public override Task SendRequestMessageAsync(LlmWebSocketMessage message)
    {
        var text = message.GetText().Replace("model-A", "model-B");
        return base.SendRequestMessageAsync(LlmWebSocketMessage.Text(text));
    }
}

Node.js

import {
    ForwardingWebSocketHandler,
    LlmRequestContext,
    LlmRequestHandler,
} from "@github/copilot";

class MyHandler extends LlmRequestHandler {
    protected override async sendRequest(request: Request, ctx: LlmRequestContext): Promise<Response> {
        const headers = new Headers(request.headers);
        headers.set("x-debug-session", ctx.sessionId ?? "none");

        return super.sendRequest(new Request(request, { headers }), ctx);
    }

    protected override async openWebSocket(ctx: LlmRequestContext) {
        return new MyForwardingSocket(ctx);
    }
}

class MyForwardingSocket extends ForwardingWebSocketHandler {
    override sendRequestMessage(data: string | Uint8Array) {
        if (typeof data === "string") {
            return super.sendRequestMessage(data.replace("model-A", "model-B"));
        }
        return super.sendRequestMessage(data);
    }
}

Java

import com.github.copilot.ForwardingWebSocketHandler;
import com.github.copilot.LlmRequestContext;
import com.github.copilot.LlmRequestHandler;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.io.InputStream;

final class MyHandler extends LlmRequestHandler {
    @Override
    protected HttpResponse<InputStream> sendHttp(HttpRequest request, LlmRequestContext ctx) throws Exception {
        HttpRequest mutated = HttpRequest.newBuilder(request, (n, v) -> true)
                .header("X-Debug-Session", ctx.sessionId() == null ? "none" : ctx.sessionId())
                .build();
        return super.sendHttp(mutated, ctx);
    }

    @Override
    protected CopilotWebSocketHandler openWebSocket(LlmRequestContext ctx) {
        return new ForwardingWebSocketHandler(ctx.url(), ctx.headers()) {
            @Override
            protected byte[] onSendRequestMessage(byte[] data, boolean binary) {
                String text = new String(data, java.nio.charset.StandardCharsets.UTF_8).replace("model-A", "model-B");
                return text.getBytes(java.nio.charset.StandardCharsets.UTF_8);
            }
        };
    }
}

Register the handler when constructing the client, e.g. in Java:

CopilotClient client = new CopilotClient(
    new CopilotClientOptions().setLlmInference(new LlmInferenceConfig().setHandler(new MyHandler())));

Tests

Each language adds e2e coverage (mirroring a shared reference suite) for:

  • callback provider registration
  • HTTP inference interception
  • streaming inference interception
  • error mapping
  • runtime-initiated and consumer-initiated cancellation
  • session-id threading
  • WebSocket callback handling
  • an idiomatic handler test exercising mutate-and-forward over both HTTP and WebSocket

Plus per-language unit coverage where applicable (e.g. .NET handler/adapter behavior, Node.js unit flows).

Resolves github/copilot-sdk-internal#88

@SteveSandersonMS SteveSandersonMS changed the title Simplify LLM inference callback handlers LLM inference callback support Jun 16, 2026
@SteveSandersonMS SteveSandersonMS force-pushed the stevesandersonms/llm-inference-callbacks branch from 815bbd0 to 7bc95c0 Compare June 19, 2026 15:00
@github-actions

This comment has been minimized.

Comment thread dotnet/src/LlmInferenceProvider.cs Fixed
Comment thread dotnet/src/LlmInferenceProvider.cs Fixed
Comment thread dotnet/src/LlmInferenceProvider.cs Fixed
Comment thread dotnet/src/CopilotRequestHandler.cs Fixed
Comment thread dotnet/src/CopilotRequestHandler.cs Fixed
Comment thread dotnet/src/CopilotRequestHandler.cs Fixed
Comment thread dotnet/src/CopilotRequestHandler.cs Fixed
Comment thread dotnet/src/LlmRequestHandler.cs Fixed
Comment thread dotnet/src/CopilotRequestHandler.cs Fixed

@github-advanced-security github-advanced-security AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

Comment thread dotnet/src/Types.cs Outdated
/// request/response.
/// </summary>
public LlmRequestHandler? Handler { get; set; }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this wrapper type and not just put LlmRequestHandler directly onto the create/resume config?

Comment thread dotnet/src/GitHub.Copilot.SDK.csproj Outdated
</PropertyGroup>

<ItemGroup>
<InternalsVisibleTo Include="GitHub.Copilot.SDK.Test" />

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this IVT really needed?

Comment thread dotnet/src/LlmInferenceProvider.cs Outdated
/// model-layer request.
/// </summary>
[Experimental(Diagnostics.Experimental)]
public enum LlmInferenceTransport

@SteveSandersonMS SteveSandersonMS Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this public when everything else in here is internal? Maybe it could move to the file with all the public API.

Comment thread dotnet/src/LlmInferenceProvider.cs Outdated
// Return from httpRequestStart immediately (after registering state) so
// the runtime's RPC reply is not gated on the consumer's I/O. The actual
// provider work runs asynchronously.
_ = RunProviderAsync(llmRequest, state, sink);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you doing that? What's wrong with holding the RPC reply until the consumer's I/O completes? Isn't tat necessary to achieve natural backpressure and error propagation?

Altogether I'm surprised by how much complexity there is in this file and wonder if you're trying to make it too clever. Why isn't there a significantly simpler way of adapting the RPC protocol to the public base class, for example by eliminating any separate queueing/buffering/etc?

Comment thread dotnet/src/LlmInferenceProvider.cs Outdated
}

return result;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems really odd to have an adapter like this when the source data is already in the right format. What stops us from simplifying all this?

stevesa and others added 18 commits June 22, 2026 14:38
Adds an opt-in llmInference config to CopilotClientOptions that lets SDK consumers register a callback the runtime invokes whenever it would otherwise issue an outbound non-streaming LLM HTTP request itself.

v1 scope is TS-only/non-streaming, mirroring the runtime support added in github/copilot-agent-runtime. Streaming SSE and WebSocket transports are out of scope for v1 and continue to bypass the callback.

- New `LlmInferenceProvider` interface with a single `onLlmRequest` method.

- `createLlmInferenceAdapter` converts the provider into the wire-shape `LlmInferenceHandler` consumed by the RPC dispatcher.

- Client wiring: `llmInference.setProvider` is sent on connect; per-session adapter is attached alongside the existing sessionFs hook.

- New `llm_inference.e2e.test.ts` exercises the full RPC round-trip against the runtime.

Resolves github/copilot-sdk-internal#88

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Matches the runtime move of `llmInference.httpRequest` out of the
session-scoped client API and onto a new `clientGlobal` schema root.

- Codegen emits a new `registerClientGlobalApiHandlers` alongside the
  existing `registerClientSessionApiHandlers`. Handlers passed to it
  are dispatched directly (no per-session `getHandlers` callback) and
  carry no implicit sessionId — sessionId, when present, is just a
  payload field on the call.
- `CopilotClient` now constructs the LLM inference adapter once and
  registers it process-wide via `registerClientGlobalApiHandlers`
  during connection setup. The per-session `setupLlmInference` path
  and the `SessionConfigBase.createLlmInferenceProvider` override are
  removed — there is no longer any per-session notion of which provider
  to use.
- `LlmInferenceConfig.createLlmInferenceProvider` is now `() =>
  LlmInferenceProvider` (was `(session) => ...`).
- `LlmInferenceRequest` exposes the new optional `sessionId` field
  so consumers can correlate requests with a runtime session when one
  is in scope.

E2E test updated to verify the global registration works and that
sessionId is populated on in-session traffic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
With the Rust runtime intercept chokepoint in place, every model-layer
HTTP request - including /models and /models/session - is now dispatched
through the SDK callback. Update the e2e test to:

- Stub realistic responses for non-streaming model catalog and session
  endpoints (so the runtime can proceed past model resolution).
- Hard-assert the catalog request is intercepted (no more 'either-or'
  fallback for the pre-rust-intercept state).

Streaming inference requests still pass through to the recorded CAPI
proxy; a fully-mocked end-to-end inference test will land alongside
the streaming-intercept commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extends LlmInferenceProvider with an optional onLlmStreamRequest method
that returns a response head synchronously and pushes body chunks via the
provided sink. The adapter implements the generated httpStreamStart RPC
method and forwards chunks back to the runtime via the typed server-RPC
client (llmInference.streamChunk / streamEnd).

Adds a fully-mocked e2e test (test/e2e/llm_inference_stream.e2e.test.ts)
that drives a complete user->assistant turn through the callback alone:
the runtime hits the callback for /models, /models/session, and the chat
completion itself, the assistant text returned to the SDK consumer is the
synthetic text supplied by the stub.

- nodejs/src/llmInferenceProvider.ts: LlmInferenceStreamSink, onLlmStreamRequest, httpStreamStart adapter
- nodejs/src/client.ts: pass a lazy server-RPC accessor into the adapter
- nodejs/src/index.ts: re-export new types
- nodejs/test/e2e/llm_inference_stream.e2e.test.ts: full-mock e2e
- nodejs/src/generated/*, python/*, go/*, rust/*: codegen for new RPC methods
- dotnet/src/Generated/*: codegen for new RPC methods

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds test/e2e/llm_inference_errors.e2e.test.ts that wires a callback whose
inference handler throws a synthetic transport error and verifies the
failure surfaces to the SDK consumer (the call does not hang and any error
caught is non-empty). Confirms the runtime's existing retry / error
reporting path handles callback-side failures the same way it handles real
transport failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirrors the runtime-side cleanup: the callback wire no longer carries providerType / endpointKind / wireApi / transport / modelId. Adapter stops forwarding the field, e2e tests filter by URL instead of metadata, and the missing LlmInferenceStreamSink / LlmInferenceStreamStartResponse re-exports in types.ts are added so index.ts type-checks cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
[Phase 3] Realign the Node SDK with the runtime's new four-method chunk
protocol. One unified provider callback:

  interface LlmInferenceProvider {
    onLlmRequest(req: LlmInferenceRequest): Promise<void>;
  }

LlmInferenceRequest exposes:
  * url / method / headers / sessionId
  * requestBody: AsyncIterable<Uint8Array>     // body delivered as chunks
  * responseBody: LlmInferenceResponseSink     // start/write/end/error

The sink enforces start -> 0..N writes -> exactly one of end/error and
maps each call to the corresponding httpResponseStart / httpResponseChunk
RPC. createLlmInferenceAdapter maintains a per-requestId state map; the
generated httpRequestStart handler registers state synchronously and
fires onLlmRequest in the background, so the runtime's RPC reply isn't
gated on consumer I/O.

The body queue iterator now latches a 'done' flag so a consumer that
calls .next() again after end:true gets done back instead of blocking
forever waiting for chunks the runtime will never send.

Removes the previous onLlmRequest + onLlmStreamRequest split and the
LlmInferenceResponse / LlmInferenceStreamSink /
LlmInferenceStreamStartResponse public types. All three e2e tests
rewritten against the unified callback (one of them URL-dispatches
/responses -> SSE and /chat/completions -> buffered JSON; the consumer
can also branch on whether the request body has stream:true).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 4.1: expose an AbortSignal on the request envelope, abort it on a
cancel chunk from the runtime, and map consumer-side aborts to a 499 +
error{code:cancelled} response. Adds the cancellation e2e test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an e2e test asserting that when the SDK consumer signals a terminal
error via responseBody.error({ code: 'cancelled' }) the runtime surfaces
it faithfully as a request failure rather than hanging. Completes the
consumer->runtime direction of Phase 4.1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Surface the new `transport` discriminator on `LlmInferenceRequest` so
consumers can tell an `"http"` request (plain HTTP / SSE) from a
`"websocket"` one (full-duplex: each request-body chunk is one inbound
WS message, each response-body write one outbound message). The adapter
threads `params.transport` through, defaulting to `"http"`.

Regenerate rpc.ts against the runtime schema for the new field and add
an e2e test exercising the full-duplex path: the fake model advertises
`ws:/responses`, the runtime's WebSocket flag is enabled via env var,
and the consumer pumps `/responses` events back per inbound message.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Friendly product-code starting point for SDK consumers who want to
observe or mutate LLM inference requests/responses by overriding virtual
methods on a base class. Implements LlmInferenceProvider, so an instance
can be returned directly from createLlmInferenceProvider.

Default behaviour is a transparent pass-through: each request is
forwarded to its original URL via the WHATWG fetch global (HTTP) or
WebSocket global (WebSocket), and the upstream response is streamed back
unchanged. The same subclass handles both transports - onLlmRequest
dispatches on req.transport.

Virtual hooks:
- HTTP: transformRequest, forward, transformResponse
- WebSocket: forwardWebSocket, transformRequestMessage,
  transformResponseMessage

E2e test (llm_inference_handler.e2e.test.ts) demonstrates a single
TestHandler subclass servicing both an HTTP turn (single-shot title
generation) and a WebSocket turn (main agent turn) against a per-test
in-process http+ws upstream that speaks the real CAPI shapes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review fixes for github/copilot-sdk-internal#88 (Node SDK side).

- Honor the runtime's accepted=false ack: the response sink now aborts the
  provider's signal and stops emitting once the runtime drops the request (I1).
- Add a staging backstop in the adapter so a body chunk that arrives before
  its start frame is buffered and replayed rather than silently dropped (B1).
- Run the WebSocket request/response pumps concurrently and race their
  terminal states, so an upstream-closes-first (or runtime-cancels-first)
  case tears the other side down instead of hanging on a parked iterator (B2).
- Buffer inbound WS frames in wrapGlobalWebSocket until onMessage is
  registered so the first frames of a fast upstream aren't dropped.
- Collapse the dead send branch, hoist TextEncoder/TextDecoder singletons,
  and correct the LlmWebSocketUpstream.onClose contract doc.
- Update CopilotClientOptions.llmInference docs: streaming SSE and WebSocket
  are intercepted, not bypassed (I6).
- Add unit tests: chunk-before-start staging, accepted=false abort, WS
  upstream-close-first finalisation, and WS upstream-error propagation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drives a CAPI session and a BYOK (openai/responses) session entirely
through the LLM inference callback — the consumer fabricates every
model-layer response, so the CAPI record/replay proxy is never the
inference endpoint. Asserts each in-session inference request carries
req.sessionId === session.sessionId and that the two session ids differ.

The mock branches /responses on the request stream flag: BYOK turns whose
config-derived model does not advertise streaming issue a buffered
(non-streaming) /responses request expecting a single JSON response
object, whereas the CAPI turn streams via SSE. This mirrors real upstream
behaviour and confirms the callback transport faithfully delivers both
shapes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirrors the TypeScript LLM inference callback feature in the .NET SDK so
consumers can observe/mutate the model-layer HTTP/WebSocket requests the
runtime issues (CAPI and BYOK), with the runtime session id threaded into
each callback.

- scripts/codegen/csharp.ts: emit the clientGlobal handler interface +
  registration so Rpc.cs gains the llmInference handler surface.
- LlmInferenceProvider.cs: low-level ILlmInferenceProvider API + adapter
  (request staging, response sink state machine) behind an internal
  ILlmInferenceResponseChannel seam for unit testing.
- LlmRequestHandler.cs: idiomatic pass-through base class mapping to
  HttpRequestMessage/HttpResponseMessage and ClientWebSocket, with virtual
  transform/forward hooks for both transports.
- Types.cs/Client.cs: wire LlmInferenceConfig into the client and register
  the provider on start.
- Tests: factored unit-test infra (recording channel/sink, inline provider,
  frame builders) with adapter + handler tests, plus CAPI+BYOK e2e tests
  asserting the session id reaches the callback. e2e provider emits raw JSON
  (reflection-free STJ) and serves all model-layer traffic off-network.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Hide the redundant low-level provider interface and adapter from the public
surface in both SDKs; the sole public extension point is now the
LlmRequestHandler base class. Replace the LlmInferenceConfig provider factory
with a direct handler instance (the provider is client-global, constructed
once with no args).

.NET: ILlmInferenceProvider + the LlmInferenceRequest/ResponseInit/ResponseSink
DTOs become internal; LlmRequestHandler implements the interface explicitly so
OnLlmRequestAsync leaves its public surface. LlmInferenceConfig.Handler replaces
the Func<LlmRequestHandler> factory.

TS: stop exporting LlmInferenceProvider and createLlmInferenceAdapter from
index.ts; LlmInferenceConfig.handler replaces createLlmInferenceProvider. The
request/sink DTOs stay exported as onLlmRequest's contract (TS lacks explicit
interface implementation). E2E providers become LlmRequestHandler subclasses
overriding onLlmRequest.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Collapse the HTTP callback seam to SendRequest/sendRequest, replace websocket hooks with per-connection handlers, and update tests to use the forwarding handler model.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Port the LLM inference callback feature to the Python SDK, mirroring the
existing Node.js and .NET implementations. Consumers subclass
`LlmRequestHandler` and override `send_request` (idiomatic httpx) for
HTTP or `open_web_socket` (websockets) for the WebSocket transport; both
default to transparent pass-through. Wired through `LlmInferenceConfig` on
the client, registered on the `clientGlobal.llmInference` scope.

Adds the low-level provider/adapter, the httpx-based handler base class,
client wiring, public exports, and httpx as a core dependency. Extends the
Python codegen to emit clientGlobal handler registration and regenerates the
generated RPC bindings.

Includes 8 e2e test files (10 tests) mirroring the Node.js suite — round
trip, session-id threading (CAPI + BYOK), streaming SSE, error mapping,
runtime cancel, consumer cancel, WebSocket transport, and the idiomatic
handler against a real local HTTP+WebSocket upstream. All pass off-network.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the existing Node/.NET/Python LLM inference callback support in the
Go SDK. Consumers register an LlmInferenceProvider (or the idiomatic
LlmRequestHandler over net/http + coder/websocket) via
ClientOptions.LlmInference; the runtime routes every model-layer HTTP and
WebSocket request through it for both CAPI and BYOK sessions.

- Codegen (scripts/codegen/go.ts) now emits the clientGlobal handler
  registration, regenerating go/rpc/zrpc.go.
- New low-level provider types + adapter (llm_inference_provider.go) and the
  idiomatic forwarding handler (llm_request_handler.go).
- Wire LlmInferenceConfig into ClientOptions and the connect/start paths.
- 8 off-network e2e scenarios mirroring the other SDKs (basic, session id,
  stream, errors, cancel, consumer cancel, websocket, handler).

Also fixes a pre-existing Go e2e compile break (AttachmentBlob.Data became
*string in the Rust contract regen baseline) that blocked the e2e package.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated by SDK Consistency Review Agent for issue #1689 · sonnet46 3.4M

/// override <see cref="SendRequestAsync"/> or <see cref="OpenWebSocketAsync"/>.
/// </summary>
[Experimental(Diagnostics.Experimental)]
public class CopilotRequestHandler

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-SDK naming note: This class is named CopilotRequestHandler (along with CopilotRequestContext and ForwardingCopilotWebSocketHandler), following the Node.js convention. However, the three other SDKs added in this same PR (Go, Java) and the two pre-existing ones (Python, Rust) all use LlmRequestHandler / LlmRequestContext / ForwardingWebSocketHandler.

After this PR the naming is 4-vs-2: LlmRequestHandler (Python, Rust, Go, Java) vs CopilotRequestHandler (Node.js, .NET). If this split is intentional (grouping .NET with Node.js), that's fine, but it would be worth noting somewhere—otherwise it may be worth aligning .NET with the majority Llm* naming.

Comment thread dotnet/src/Types.cs
/// outbound call.
/// </summary>
[Experimental(Diagnostics.Experimental)]
public CopilotRequestHandler? RequestHandler { get; set; }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-SDK config API note: The handler is set here as a flat RequestHandler CopilotRequestHandler? property (same pattern as Node.js's requestHandler). The other four SDKs (Python, Rust, Go, Java) all wrap it in an LlmInferenceConfig config object, e.g.:

// Java
options.setLlmInference(new LlmInferenceConfig().setHandler(new MyHandler()));
// Go
options.LlmInference = &LlmInferenceConfig{Handler: &MyHandler{}}
# Python
options.llm_inference = LlmInferenceConfig(handler=MyHandler())

After this PR the pattern is 4-vs-2: config wrapper (Python, Rust, Go, Java) vs direct property (Node.js, .NET). Developers targeting multiple SDKs will encounter a different configuration shape for the same feature. Worth a deliberate decision here.

* @throws Exception
* if the request could not be completed
*/
protected HttpResponse<InputStream> sendHttp(HttpRequest request, LlmRequestContext ctx) throws Exception {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor cross-SDK naming note: This method is named sendHttp (like Rust's send_http), but Node.js and Python — both also in the LlmRequestHandler family — name the equivalent hook sendRequest / send_request. All three ultimately receive an "HTTP request" object, so the naming is semantically equivalent, but within the LlmRequestHandler group it differs between languages:

SDK HTTP hook name
Python send_request(httpx.Request, ctx)
Rust send_http(LlmHttpRequest, ctx)
Java sendHttp(HttpRequest, ctx)

Suggest clarifying whether sendHttp or sendRequest should be the canonical hook name for the LlmRequestHandler family (it won't matter for users of a single SDK, but affects multi-language teams reading docs side-by-side).

@github-actions

This comment has been minimized.

SteveSandersonMS and others added 4 commits June 22, 2026 17:10
Mirror the .NET/Node simplification + terminology rename in the Go SDK:
consolidate the provider/handler two-layer design into a single
copilot_request_handler.go, drop the accepted:false ack plumbing and the
staged backstop, emit the WebSocket 101 upgrade head eagerly (a lazy bridge
deadlocks the runtime connect), and rename the public Llm* types to Copilot*
(types carry the prefix; fields/methods stay succinct). The client option
becomes ClientOptions.RequestHandler *CopilotRequestHandler. Generated wire
types are untouched.

Consolidate the e2e suite to three files (copilot_request_handler covering
HTTP + WebSocket + streaming, copilot_request_session_id, and
copilot_request_cancel_error with the error and runtime-cancel cases) plus a
shared helpers file, replacing the nine llm_inference_* test files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the .NET/Node simplification + terminology rename in the Java SDK:
fold the low-level provider/adapter indirection and the response sink/init/
body DTOs into CopilotRequestHandler plus an internal exchange, drop the
accepted:false ack plumbing and the staged backstop, emit the WebSocket 101
upgrade head eagerly (a lazy bridge deadlocks the runtime connect), and rename
the public Llm* types to Copilot* (types carry the prefix; methods stay
succinct). The client option becomes CopilotClientOptions.requestHandler of
type CopilotRequestHandler; LlmInferenceConfig is removed. Generated wire
types are untouched.

Consolidate the e2e suite to three tests (CopilotRequestHandlerE2ETest
covering HTTP + WebSocket + streaming, CopilotRequestSessionIdE2ETest, and
CopilotRequestCancelErrorE2ETest with the error and runtime-cancel cases)
plus CopilotRequestTestSupport, replacing the eight LlmInference*E2ETest files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the .NET/Node simplification + terminology rename in the Python SDK:
consolidate the provider/handler two-layer design into a single
copilot/copilot_request_handler.py, drop the accepted:false ack plumbing and
the staged backstop, emit the WebSocket 101 upgrade head eagerly (a lazy
bridge deadlocks the runtime connect), and rename the public Llm* types to
Copilot* (types carry the prefix; attributes/methods stay succinct). The
client option becomes request_handler: CopilotRequestHandler. Generated wire
types are untouched.

Stream in-memory httpx responses (built with content=) by forwarding their
buffered body, since their raw stream is already consumed and cannot be
re-iterated; real streamed responses still flow through aiter_raw.

Consolidate the e2e suite to three files (test_copilot_request_handler
covering HTTP + WebSocket + streaming, test_copilot_request_session_id, and
test_copilot_request_cancel_error with the error and runtime-cancel cases)
plus a shared helpers module, replacing the eight test_llm_inference_* files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the .NET/Node simplification + terminology rename in the Rust SDK:
consolidate the inference/dispatch/handler modules into a single
copilot_request_handler.rs with one CopilotRequestHandler trait (default
methods) plus an internal exchange, drop the accepted:false ack plumbing and
the staged backstop (cancellation flows via CancellationToken), emit the
WebSocket 101 upgrade head eagerly (a lazy bridge deadlocks the runtime
connect), and rename the public Llm* types to Copilot* (types carry the
prefix; fields/methods stay succinct). The client option becomes
ClientOptions.request_handler. Generated wire types are untouched.

Consolidate the e2e coverage into copilot_request_handler.rs with four tests
mirroring the Node scenarios (HTTP + WebSocket + streaming via one handler,
session-id threading, handler errors, and runtime-driven cancel), replacing
the llm_inference.rs e2e module.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
!inference.is_empty(),
"expected at least one intercepted inference request"
);
for (_, session_id) in &inference {
inference.len() > before,
"expected at least one intercepted BYOK inference request"
);
for (_, session_id) in &inference[before..] {
handler.inference_attempts.load(Ordering::SeqCst) > 0,
"expected the inference callback to be reached and raise"
);
if let Err(err) = send_result {
if self._upstream is not None:
try:
await self._upstream.close()
except Exception:
task.cancel()
try:
await task
except (asyncio.CancelledError, Exception):
finally:
try:
await client.stop()
except Exception:
is_inference_url,
isolated_client_fixture,
)
from .testharness import E2ETestContext # noqa: F401 (ctx fixture dependency)
# The callback throws on inference; the turn surfaces an error (or
# completes without an assistant message) rather than hanging.
await session.send_and_wait("Say OK.")
except BaseException: # noqa: BLE001
finally:
try:
await client.stop()
except Exception:
is_inference_url,
isolated_client_fixture,
)
from .testharness import E2ETestContext # noqa: F401 (ctx fixture dependency)
@github-actions

This comment has been minimized.

The branch had dropped the net472 `<Compile Include="..\src\Polyfills\*.cs">`
from the test project, on the assumption that the SDK's internal polyfills were
visible via InternalsVisibleTo. No InternalsVisibleTo to the test assembly
actually exists, so the net472 test build lost its IsExternalInit definition and
failed with CS0518 across every file using init-only setters and records (the
.NET SDK Tests (windows-latest) job). net8.0 builds were unaffected because
IsExternalInit ships in the BCL there.

Restore the direct polyfill compile for net472 to match main; net472 now builds
clean (0 warnings, 0 errors).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@SteveSandersonMS SteveSandersonMS marked this pull request as ready for review June 22, 2026 16:53
@SteveSandersonMS SteveSandersonMS requested a review from a team as a code owner June 22, 2026 16:53
Copilot AI review requested due to automatic review settings June 22, 2026 16:53

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces SDK-wide support for intercepting the runtime’s outbound model-layer LLM inference traffic (HTTP/SSE + WebSocket) by registering a single client-global handler over JSON-RPC (llmInference.*). It extends the shared codegen pipeline to generate client-global reverse-RPC handlers and wires language-specific handler abstractions plus E2E coverage across Node.js, .NET, Python, Go, Rust, and Java.

Changes:

  • Extend RPC schema/codegen to support a new clientGlobal API surface and generate registration helpers for llmInference.*.
  • Add per-language “request handler” APIs/adapters to forward/replace runtime model HTTP + WebSocket traffic, including cancellation and session-id threading.
  • Add cross-language E2E tests validating session-id propagation, cancellation/error behavior, and forwarding behavior for both HTTP and WebSocket paths.
Show a summary per file
File Description
scripts/codegen/utils.ts Adds clientGlobal support to the shared API schema shape.
scripts/codegen/typescript.ts Generates TS client-global handler interfaces + registration function.
scripts/codegen/python.ts Generates Python client-global handler Protocols/dataclass + registration function.
scripts/codegen/go.ts Generates Go client-global handler interfaces + registration function.
scripts/codegen/csharp.ts Generates C# client-global handler interfaces + registration plumbing.
rust/tests/e2e/support.rs Adds no-snapshot E2E context and helper to start an LLM handler-wired client.
rust/tests/e2e.rs Registers new Rust E2E module for request-handler coverage.
rust/src/types.rs Re-exports request-handler types for the Rust public surface.
rust/src/router.rs Routes llmInference.* requests via a client-global dispatcher rather than per-session routing.
rust/src/lib.rs Adds request-handler option wiring, early router start for startup model calls, and internal handle reconstruction.
rust/Cargo.toml Adds HTTP/WebSocket dependencies needed for the Rust forwarding/transport implementation.
python/pyproject.toml Promotes httpx to a runtime dependency and adds websockets to dev deps for tests.
python/e2e/test_copilot_request_session_id_e2e.py Verifies session-id threading into handler for CAPI and BYOK sessions.
python/e2e/test_copilot_request_handler_e2e.py Validates mutate-and-forward behavior across HTTP + WebSocket via a real upstream.
python/e2e/test_copilot_request_cancel_error_e2e.py Covers thrown-error mapping and runtime cancellation behavior via cancel_event.
python/e2e/_copilot_request_helpers.py Shared synthetic model endpoint builders and isolated-client fixtures for request-handler tests.
python/copilot/generated/rpc.py Adds generated client-global handler types and registration for llmInference.*.
python/copilot/client.py Wires request-handler registration and llmInference.setProvider into client startup.
python/copilot/init.py Re-exports request-handler types/helpers as part of the public Python API surface.
nodejs/test/e2e/copilot_request_session_id.e2e.test.ts Node E2E for session-id threading through the request handler.
nodejs/test/e2e/copilot_request_cancel_error.e2e.test.ts Node E2E for handler error propagation and runtime-driven cancellation.
nodejs/src/types.ts Adds requestHandler option + exports request-handler types from the Node public surface.
nodejs/src/index.ts Re-exports request-handler APIs from the Node entrypoint.
nodejs/src/generated/rpc.ts Adds generated client-global handler interfaces + registerClientGlobalApiHandlers.
nodejs/src/client.ts Registers client-global handlers on the connection and calls llmInference.setProvider when configured.
nodejs/package.json Adds ws and @types/ws for Node E2E WebSocket upstream tests.
nodejs/package-lock.json Locks new ws / @types/ws additions.
java/src/test/java/com/github/copilot/FakeUpstreamServer.java Adds an in-process raw HTTP/WebSocket upstream for Java forwarding E2E.
java/src/test/java/com/github/copilot/CopilotRequestSessionIdE2ETest.java Java E2E verifying session-id threading for CAPI + BYOK.
java/src/test/java/com/github/copilot/CopilotRequestHandlerE2ETest.java Java E2E validating synthetic HTTP servicing and forwarding over HTTP + WebSocket.
java/src/test/java/com/github/copilot/CopilotRequestCancelErrorE2ETest.java Java E2E for error propagation and runtime cancellation behavior.
java/src/main/java/module-info.java Makes java.net.http a required module dependency (runtime usage).
java/src/main/java/com/github/copilot/rpc/CopilotClientOptions.java Adds requestHandler option to configure the connection-level interception handler.
java/src/main/java/com/github/copilot/LlmWebSocketResponseBridge.java Ensures WebSocket upgrade head is emitted before any body/terminal frames.
java/src/main/java/com/github/copilot/LlmInferenceExchange.java Implements per-request body ingestion + strict response emission state machine.
java/src/main/java/com/github/copilot/LlmInferenceAdapter.java Bridges llmInference.* reverse-RPC frames to CopilotRequestHandler execution.
java/src/main/java/com/github/copilot/ForwardingCopilotWebSocketHandler.java Default Java WS forwarding implementation using java.net.http.WebSocket.
java/src/main/java/com/github/copilot/CopilotWebSocketMessage.java Represents WS text/binary messages exchanged via the handler seam.
java/src/main/java/com/github/copilot/CopilotWebSocketHandler.java Base per-connection WS handler contract for runtime↔upstream message forwarding.
java/src/main/java/com/github/copilot/CopilotWebSocketCloseStatus.java Carries terminal close/error state for callback-owned WS connections.
java/src/main/java/com/github/copilot/CopilotRequestTransport.java Exposes the runtime’s intended transport selection (HTTP vs WebSocket).
java/src/main/java/com/github/copilot/CopilotRequestHandler.java Adds Java handler base class with default HTTP forwarding + WS forwarding hooks.
java/src/main/java/com/github/copilot/CopilotRequestContext.java Provides request-id/session-id/headers/url/transport/cancellation context to handlers.
java/src/main/java/com/github/copilot/CopilotClient.java Registers the adapter + calls llmInference.setProvider when handler is configured.
go/types.go Adds RequestHandler to Go client options for connection-level interception.
go/rpc/zrpc.go Adds generated client-global handler types and registration for llmInference.*.
go/internal/e2e/copilot_request_session_id_e2e_test.go Go E2E for session-id threading into intercepted inference calls.
go/internal/e2e/copilot_request_helpers_test.go Shared synthetic upstream builders for Go request-handler E2Es.
go/internal/e2e/copilot_request_handler_e2e_test.go Go E2E validating mutate-and-forward over HTTP + WebSocket.
go/internal/e2e/copilot_request_cancel_error_e2e_test.go Go E2E covering error propagation and runtime cancellation behavior.
go/go.sum Adds checksum entries for new Go WebSocket dependency.
go/go.mod Adds github.com/coder/websocket for Go WebSocket upstream tests.
go/client.go Wires client-global handler registration and LlmInference.SetProvider when configured.
dotnet/test/E2E/CopilotRequestSessionIdE2ETests.cs .NET E2E verifying session-id threading for CAPI + BYOK sessions.
dotnet/test/E2E/CopilotRequestE2EProvider.cs Shared .NET test handler that fabricates model endpoint responses for off-network turns.
dotnet/test/E2E/CopilotRequestCancelErrorE2ETests.cs .NET E2E for handler error mapping and runtime cancellation propagation.
dotnet/src/Types.cs Adds experimental RequestHandler option to CopilotClientOptions.
dotnet/src/Generated/Rpc.cs Adds generated types and client-global registration for llmInference.*.
dotnet/src/Client.cs Registers client-global handlers and calls LlmInference.SetProviderAsync when configured.

Copilot's findings

Files not reviewed (2)
  • go/rpc/zrpc.go: Generated file
  • nodejs/package-lock.json: Generated file
  • Files reviewed: 62/69 changed files
  • Comments generated: 1


httpSrv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
path := strings.ToLower(strings.SplitN(r.URL.Path, "?", 2)[0])
_ = r.Body.Close
@github-actions

This comment has been minimized.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated by SDK Consistency Review Agent for issue #1689 · sonnet46 3.1M

return nil
}
return f.conn.Write(ctx, websocket.MessageText, out)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-SDK consistency concern: binary WebSocket frame type is lost

The Go adapter decodes binary frames correctly in routeChunk (line ~603: base64-decodes when params.Binary == true) but drops the binary/text flag before pushing into the Body channel. As a result, SendRequestMessage here always forwards as websocket.MessageText, regardless of the original wire frame type.

All other SDKs carry the binary flag through the request message path:

  • .NET: CopilotWebSocketMessage.IsBinary
  • Java: CopilotWebSocketMessage.binary()
  • Python: str | bytes (type encodes the distinction)
  • Node.js: string | Uint8Array (same)
  • Rust: CopilotWebSocketMessage.binary: bool

In practice, LLM inference likely only uses text frames, so this may not cause immediate issues. But if binary frames ever appear this will corrupt the upstream message silently. Consider threading the binary flag through the Body channel (e.g. as a struct { data []byte; binary bool }) so SendRequestMessage can use the correct message type.

type CopilotWebSocketCloseStatus struct {
Description string
Code string
Err error

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor cross-SDK naming inconsistency: Code vs ErrorCode

All other SDKs name this field ErrorCode/errorCode/error_code:

  • .NET: public string? ErrorCode { get; init; }
  • Java: public String errorCode()
  • Python: error_code: str | None
  • Node.js: readonly errorCode?: string

Go uses the shorter Code, which is not wrong, but this creates a small inconsistency for developers navigating across SDKs. Consider renaming to ErrorCode to match the cross-SDK convention (the Err field name is fine — Err vs Error is established Go idiom).

&self,
ctx: &CopilotRequestContext,
response: CopilotWebSocketResponse,
) -> Result<Box<dyn CopilotWebSocketHandler>, CopilotRequestError> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API surface difference from other SDKs: extra response parameter

Every other SDK's open_websocket / openWebSocket / OpenWebSocketAsync takes only the context:

// Node.js
protected openWebSocket(ctx: CopilotRequestContext): Promise<CopilotWebSocketHandler>
// .NET
protected virtual Task<CopilotWebSocketHandler> OpenWebSocketAsync(CopilotRequestContext ctx)
// Java
protected CopilotWebSocketHandler openWebSocket(CopilotRequestContext ctx)
// Python
async def open_web_socket(self, ctx: CopilotRequestContext) -> CopilotWebSocketHandler
// Go
OpenWebSocket func(ctx *CopilotRequestContext) (CopilotWebSocketHandler, error)

Rust adds response: CopilotWebSocketResponse as a second argument, which consumers must store and use to write upstream→runtime messages.

This is architecturally necessary for Rust traits (no base-class state for a sendResponseMessage helper), so it isn't a bug. But it's worth documenting clearly in the CopilotRequestHandler trait doc comment — specifically that response must be stored in the returned handler struct and used to call response.send_message(...) in place of the sendResponseMessage base-class helper the other SDKs provide.


private static int GetFreePort()
{
var probe = new TcpListener(IPAddress.Loopback, 0);
Comment on lines +386 to +389
catch
{
// Accept loop unwinds on listener shutdown.
}
Comment on lines +377 to +380
catch
{
// Already stopped.
}
Comment on lines +253 to +256
catch
{
// Already torn down.
}
Comment on lines +214 to +217
catch
{
// Best-effort: the runtime tears connections down as turns complete.
}
Comment on lines +378 to +381
catch
{
// Best-effort; the socket may already be closed.
}
Comment on lines +327 to +334
catch (Exception ex)
{
await CloseAsync(new CopilotWebSocketCloseStatus
{
Description = ex.Message,
Error = ex,
}).ConfigureAwait(false);
}
Comment on lines +237 to +240
catch
{
// Some headers are managed by the handshake; ignore rejections.
}
private readonly string _url;
private readonly IReadOnlyDictionary<string, IReadOnlyList<string>> _headers;
private WebSocket? _upstream;
private CancellationTokenSource? _pumpCts;
{
private readonly string _url;
private readonly IReadOnlyDictionary<string, IReadOnlyList<string>> _headers;
private WebSocket? _upstream;
SteveSandersonMS and others added 9 commits June 22, 2026 19:02
The codegen-check job runs `cargo +nightly fmt --all` with
`rust/.rustfmt.nightly.toml` (imports_granularity = "Module") over the whole
crate and fails if the tree changes. The hand-written copilot_request_handler.rs
was only stable-formatted, leaving two adjacent `tokio::sync` imports
unconsolidated. Merge them into the canonical nightly-fmt form.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
staticcheck (SA1019) flags http.ProtocolError as deprecated. The throwing
transport only needs to return some transport-level error to exercise the
error path, so use errors.New instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rustdoc's redundant_explicit_links lint (deny under -D warnings) flagged the
[`CopilotRequestHandler`](crate::copilot_request_handler::CopilotRequestHandler)
link, since the label already resolves to the same destination. Drop the
explicit target and rely on intra-doc resolution.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The custom JsonRpc reader dispatches each incoming reverse-RPC method
fire-and-forget on the thread pool, so llmInference.httpRequestStart and
httpRequestChunk for the same request run concurrently. The start handler
registered the exchange in _pending; the chunk handler looked it up and
silently dropped the frame on a miss. When the dropped frame was the body
End, the request-body drain blocked forever, the model HTTP request never
completed, and the turn hung until sendAndWait's 60s idle timeout fired.

Make the adapter ordering-independent: both httpRequestStart and
httpRequestChunk get-or-create the exchange via _pending.GetOrAdd. A body
chunk that races ahead of its start now buffers into the same exchange's
channel instead of being dropped; start adopts that exchange, fills in
Method/Context, and launches the handler exactly once. Locally reproduced
the hang ~1-in-3 before; 20/20 green after.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jsonrpc2 spawns a goroutine per incoming request, so httpRequestStart and
httpRequestChunk for the same request run concurrently. The chunk handler
dropped any frame whose requestId was not yet registered by start; when the
dropped frame was the body End, the request-body drain blocked forever and
the turn hung until the 60s idle timeout fired.

Add getOrCreateExchange so both entry points get-or-create the exchange: a
body chunk that races ahead of start buffers into the shared frame queue
instead of being dropped, and start adopts that exchange's context/queue.
10/10 green after.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Both handleRequestStart and handleRequestChunk now get-or-create the
exchange via getOrCreateExchange, so a body chunk that races ahead of its
start frame buffers into the same exchange instead of being dropped (which
would hang the request-body drain). LlmInferenceExchange is constructed bare
(requestId, rpcSupplier) with method set when the start frame arrives.
Mirrors the .NET/Go root-cause fix for cross-SDK consistency; 5/5 e2e green.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make the LLM-inference adapter ordering-independent, matching the .NET/Go
root-cause fix. Both httpRequestStart and httpRequestChunk now get-or-create
the exchange, so a body chunk (including the terminal end frame) that races
ahead of its start frame buffers into the same exchange instead of being
dropped — which would hang the request-body drain. CopilotRequestExchange is
now constructed bare (requestId, getServerRpc) with its request context
filled in via setContext when the start frame arrives.

Node's vscode-jsonrpc reader currently dispatches in order so this race is
not reachable today, but the get-or-create shape keeps the adapter correct
regardless of dispatch order and consistent across all six SDKs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…apter

Both http_request_start and http_request_chunk now get-or-create the
exchange via _get_or_create, so a body chunk that races ahead of its start
frame buffers into the same exchange instead of being dropped (which would
hang the request-body drain). _CopilotRequestExchange is constructed bare
(request_id, get_server_rpc); the request context is filled in via
set_context when the start frame arrives. Mirrors the .NET/Go root-cause fix
for cross-SDK consistency.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Both handle_start and handle_chunk now get-or-create the exchange via
get_or_create_exchange, so a body chunk that races ahead of its start frame
buffers into the same exchange's body channel instead of being dropped
(which would hang the request-body drain). The request metadata moves behind
a OnceLock filled by set_context when the start frame arrives, letting the
exchange be created bare from either entry point. Mirrors the .NET/Go
root-cause fix for cross-SDK consistency; fmt (stable+nightly) and clippy
clean, 4/4 e2e green.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@SteveSandersonMS SteveSandersonMS force-pushed the stevesandersonms/llm-inference-callbacks branch from b8013ef to 32f4164 Compare June 22, 2026 18:03
@github-actions

Copy link
Copy Markdown
Contributor

Cross-SDK Consistency Review

Overall, this is an impressively consistent implementation across all six SDK languages — class names, option field names, and WebSocket API shapes are well-aligned. One naming inconsistency in the HTTP hook is worth flagging before merge.

⚠️ HTTP override method: sendHttp/send_http vs sendRequest/send_request/SendRequestAsync

Three SDKs use one semantic name, two use another:

SDK HTTP override method
Node.js sendRequest(request: Request, ctx)
.NET SendRequestAsync(HttpRequestMessage, ctx)
Python send_request(self, request, ctx)
Rust send_http(request: CopilotHttpRequest, ctx)
Java sendHttp(HttpRequest request, ctx)
Go n/a — struct field Transport http.RoundTripper (idiomatic)

Rust and Java use send_http/sendHttp while the other three SDKs spell it sendRequest/SendRequestAsync/send_request. A developer porting a handler from Node.js to Java would look for sendRequest, not find it, and have to hunt for sendHttp. Aligning all five override-based SDKs on sendRequest/SendRequestAsync/send_request (or alternatively all on sendHttp/send_http) would improve discoverability across the SDK family.

i️ PR description has stale Java examples

The Java usage example in the PR description references types and method names that differ from the actual code:

PR description Actual code
extends LlmRequestHandler extends CopilotRequestHandler
LlmRequestContext CopilotRequestContext
new ForwardingWebSocketHandler(ctx.url(), ctx.headers()) new ForwardingCopilotWebSocketHandler(context)
protected byte[] onSendRequestMessage(byte[] data, boolean binary) public void sendRequestMessage(CopilotWebSocketMessage message)

This doesn't affect the implementation itself but will confuse readers. Worth updating before merge so the description matches the actual API.

✅ Everything else looks consistent

  • Class names: CopilotRequestHandler, CopilotRequestContext, CopilotWebSocketHandler, CopilotWebSocketCloseStatus, ForwardingCopilotWebSocketHandler — consistent across all SDKs ✅
  • Client option field: requestHandler/RequestHandler/request_handler — consistent across all SDKs ✅
  • WebSocket override: openWebSocket/OpenWebSocketAsync/open_web_socket/open_websocket — consistent (language-case only) ✅
  • WebSocket message type CopilotWebSocketMessage and its sendRequestMessage/sendResponseMessage hooks — consistent ✅
  • E2E test coverage appears symmetric across all six SDKs ✅

Generated by SDK Consistency Review Agent for issue #1689 · sonnet46 2.2M ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated by SDK Consistency Review Agent for issue #1689 · sonnet46 2.2M

* @throws Exception
* if the request could not be completed
*/
protected HttpResponse<InputStream> sendHttp(HttpRequest request, CopilotRequestContext ctx) throws Exception {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming inconsistency with Node.js / .NET / Python: The HTTP interception hook is called sendHttp here (and send_http in Rust), but the equivalent in the other three SDKs is sendRequest / SendRequestAsync / send_request. A consumer who has read the Node.js docs will look for sendRequest and not find it.

Suggest aligning on one name across all five override-based SDKs — either rename this to sendRequest (matching the majority) or update the other three to sendHttp/send_http.

/// Service one intercepted HTTP request. Default: forward to the real
/// upstream via [`forward_http`]. Override to mutate the request before
/// forwarding, mutate the response after, or replace the call entirely.
async fn send_http(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming inconsistency with Node.js / .NET / Python: The HTTP interception hook is called send_http here (and sendHttp in Java), but the equivalent in the other three SDKs is sendRequest / SendRequestAsync / send_request. Aligning on one name across all five override-based SDKs would improve cross-language discoverability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants