feat: CON-1639 HTTPS outcalls pay-as-you-go and dark launch budget trackers #10519

Open
eichhorl wants to merge 33 commits into master from eichhorl/dark-launch-tracker

eichhorl commented Jun 19, 2026

Contributor

Background

Currently, HTTPS outcalls are charged upfront based on a max_response_bytes parameter. This is done by subtracting the full cost from the caller's payment. The remaining cycles are stored in the request context and refunded once a response is delivered.

Instead, we want to introduce pay-as-you-go pricing, which charges cycles whenever resources are consumed. This happens in three stages:

  1. Base cost. This fee is charged upfront for every request.
  2. Per-replica cost. The cycles remaining after the base cost is charged are split evenly between the participating replicas. Each replica consumes some of its allowance as the HTTP request is processed. In the end, the remaining cycles are gossiped as part of refund shares.
  3. Consensus cost. This fee is charged for including the aggregated HTTP response in a block. The cost is covered by the sum of the refund shares that were included in the aggregated response. Cycles remaining after the consensus cost is charged are refunded to the user asynchronously.
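
The arithmetic of the three stages can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names are hypothetical, and dropping the remainder of the even split is an assumption that the description does not confirm.

```rust
/// Stages 1 and 2: charge the base cost, then split what is left evenly
/// between the participating replicas.
/// Returns `None` if the payment does not cover the base cost.
/// Assumption: the remainder of the integer division is simply dropped.
pub fn per_replica_allowance(payment: u128, base_cost: u128, replicas: u128) -> Option<u128> {
    if replicas == 0 {
        return None;
    }
    let remaining = payment.checked_sub(base_cost)?;
    Some(remaining / replicas)
}

/// Stage 3: the refund shares included in the aggregated response are summed,
/// the consensus cost is charged against that sum, and the rest is refunded.
/// Returns `None` if the shares do not cover the consensus cost.
pub fn user_refund(refund_shares: &[u128], consensus_cost: u128) -> Option<u128> {
    let total: u128 = refund_shares.iter().sum();
    total.checked_sub(consensus_cost)
}
```

For example, a payment of 1,000 cycles with a base cost of 100 on 13 replicas leaves each replica an allowance of 69 cycles under these assumptions.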

The per-replica cost (stage 2) is calculated by the "budget tracker". This struct is instantiated with the per-replica cycles allowance whenever the HTTP adapter starts processing a new request. As the response is downloaded, transformed and gossiped, the tracker charges the consumed cycles against the initial allowance. If at any point the remaining allowance does not cover an outstanding charge, an error is returned and a reject response is gossiped. If the initial allowance covers all charges, the remaining cycles are refunded. To this end, the budget tracker creates a payment receipt, which is gossiped alongside the response.
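
The tracker lifecycle described above might look roughly like this. It is a simplified sketch: `SketchTracker`, `BudgetError` and `Receipt` are hypothetical names, not the `BudgetTracker` trait or `PricingError` type from the pricing crate.

```rust
/// Hypothetical error returned when a charge is not covered.
#[derive(Debug, PartialEq)]
pub enum BudgetError {
    OutOfCycles { needed: u128, remaining: u128 },
}

/// Hypothetical payment receipt carrying the refund for this replica.
#[derive(Debug, PartialEq)]
pub struct Receipt {
    pub refund: u128,
}

/// Tracks spending against a fixed per-replica allowance.
pub struct SketchTracker {
    allowance: u128,
    spent: u128,
}

impl SketchTracker {
    pub fn new(allowance: u128) -> Self {
        Self { allowance, spent: 0 }
    }

    /// Charges `amount`; fails without charging if the remaining allowance
    /// does not cover it.
    pub fn charge(&mut self, amount: u128) -> Result<(), BudgetError> {
        let remaining = self.allowance - self.spent;
        if amount > remaining {
            return Err(BudgetError::OutOfCycles { needed: amount, remaining });
        }
        self.spent += amount;
        Ok(())
    }

    /// Whatever was not spent is reported as the refund.
    pub fn receipt(&self) -> Receipt {
        Receipt { refund: self.allowance - self.spent }
    }
}
```

In this sketch a failed charge leaves the tracker untouched; whether the real tracker records the failed amount is not stated in the description.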

Before this PR, only one budget tracker existed (LegacyTracker), which computes no per-replica cost and refunds nothing.

Proposed Changes

This PR introduces the "pay-as-you-go" budget tracker, whose purpose is to calculate the per-replica cost for outcalls using the "pay-as-you-go" pricing (note that such outcalls do not exist yet). This is done by implementing the per-replica part of the pricing formula defined here (internal). To do this, we additionally pass the subnet's size and cycle cost schedule to the HTTP adapter. This is needed to calculate the correct pricing.

"Pay-as-you-go" pricing explicitly charges for the number of bytes gossiped in the case of flexible and non-replicated outcalls (where the whole response is gossiped). To do this, we move the truncation of oversized reject messages into the adapter, so that the correct payload length is charged.

Additionally, we implement and start to use a DarkLaunchTracker. This tracker calculates both the real (legacy, i.e. 0) and the new (pay-as-you-go) per-replica cost. In the end, only the "real" refund is gossiped. However, this allows us to compare both trackers and observe whenever the pay-as-you-go tracker returns an out-of-cycles error while the legacy tracker succeeds. Such an event indicates that the outcall would not be covered by enough cycles under the new pricing. In this case, the canister ID is logged and a metric is incremented.
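
The side-by-side idea can be sketched as below. All names here (`Tracker`, `Free`, `Metered`, `DarkLaunch`) are hypothetical stand-ins, and a plain counter replaces the log line and Prometheus metric; the once-per-request flag mirrors the `error_reported` field visible in the review excerpt further down.

```rust
/// Hypothetical minimal tracker interface.
pub trait Tracker {
    fn charge(&mut self, amount: u128) -> Result<(), String>;
    fn refund(&self) -> u128;
}

/// Legacy-like tracker: charges nothing and refunds nothing.
pub struct Free;

impl Tracker for Free {
    fn charge(&mut self, _amount: u128) -> Result<(), String> {
        Ok(())
    }
    fn refund(&self) -> u128 {
        0
    }
}

/// Pay-as-you-go-like tracker: charges against a remaining allowance.
pub struct Metered {
    pub remaining: u128,
}

impl Tracker for Metered {
    fn charge(&mut self, amount: u128) -> Result<(), String> {
        if amount > self.remaining {
            return Err("out of cycles".to_string());
        }
        self.remaining -= amount;
        Ok(())
    }
    fn refund(&self) -> u128 {
        self.remaining
    }
}

/// Runs both trackers, but only the real one decides the outcome.
pub struct DarkLaunch<R, S> {
    real: R,
    shadow: S,
    reported: bool,
    /// Stand-in for the metric: requests where only the shadow tracker failed.
    pub shadow_only_failures: u64,
}

impl<R: Tracker, S: Tracker> DarkLaunch<R, S> {
    pub fn new(real: R, shadow: S) -> Self {
        Self { real, shadow, reported: false, shadow_only_failures: 0 }
    }

    pub fn charge(&mut self, amount: u128) -> Result<(), String> {
        let real = self.real.charge(amount);
        let shadow = self.shadow.charge(amount);
        // Report at most once per request, and only when the shadow tracker
        // fails while the real tracker succeeds.
        if real.is_ok() && shadow.is_err() && !self.reported {
            self.reported = true;
            self.shadow_only_failures += 1;
        }
        real
    }

    /// Only the real tracker's refund is ever exposed.
    pub fn refund(&self) -> u128 {
        self.real.refund()
    }
}
```

The shadow result never escapes `charge`, which is what keeps the legacy flow observably unchanged in this sketch.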

The legacy charging flow should be unchanged by this PR.

eichhorl added the CI_ALL_BAZEL_TARGETS label (Runs all bazel targets) Jun 26, 2026
eichhorl changed the title from "Draft: HTTPS outcalls dark launch tracker" to "feat: CON-1639 HTTPS outcalls pay-as-you-go and dark launch budget trackers" Jun 26, 2026
github-actions Bot added the feat label Jun 26, 2026

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces the pay-as-you-go per-replica pricing tracker for HTTPS outcalls and a dark-launch tracker that computes the pay-as-you-go result side-by-side with the legacy tracker, emitting metrics/logs on divergences while keeping the legacy externally observable behavior unchanged. It also threads subnet pricing inputs (subnet size + cycles cost schedule) to the HTTP adapter path so the pricing logic can compute correct costs.

Changes:

  • Add PayAsYouGoTracker and DarkLaunchTracker, plus pricing metrics and a PricingFactory that selects trackers by PricingVersion.
  • Extend CanisterHttpRequest with subnet_size and cost_schedule, populated by consensus/pocket-ic and consumed by the adapter client.
  • Move reject-message truncation to the adapter client and add a gossip-usage accounting step before generating the per-replica receipt.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| rs/types/types/src/canister_http.rs | Adds subnet pricing inputs to CanisterHttpRequest and centralizes the reject-message size limit constant. |
| rs/types/cycles/src/cycles_cost_schedule.rs | Adds Hash derivation to support hashing cost schedule values carried in requests. |
| rs/pocket_ic_server/src/pocket_ic.rs | Populates new CanisterHttpRequest pricing fields in PocketIC request construction. |
| rs/https_outcalls/pricing/src/payg.rs | Implements pay-as-you-go per-replica accounting with tests. |
| rs/https_outcalls/pricing/src/metrics.rs | Introduces dark-launch metrics (total evaluated + incompatible by step/replication). |
| rs/https_outcalls/pricing/src/lib.rs | Adds PricingFactory (with metrics/logger), extends BudgetTracker with gossip accounting, wires dark-launch vs payg selection. |
| rs/https_outcalls/pricing/src/legacy.rs | Updates legacy tracker to satisfy new BudgetTracker API (no-op gossip step). |
| rs/https_outcalls/pricing/src/dark_launch.rs | Implements side-by-side "real vs shadow" tracker with divergence logging + metrics and tests. |
| rs/https_outcalls/pricing/Cargo.toml | Adds dependencies required for logging/metrics and cost schedule types. |
| rs/https_outcalls/pricing/BUILD.bazel | Adds Bazel deps for new pricing crate dependencies. |
| rs/https_outcalls/consensus/src/pool_manager.rs | Fetches subnet pricing inputs from registry and passes them to the HTTP adapter request. |
| rs/https_outcalls/consensus/Cargo.toml | Removes ic-utils dependency no longer needed after moving truncation logic. |
| rs/https_outcalls/consensus/BUILD.bazel | Removes //rs/utils Bazel dep accordingly. |
| rs/https_outcalls/client/src/client.rs | Instantiates pricing factory, truncates oversized rejects, and accounts for gossip usage before receipt creation. |
| rs/https_outcalls/client/Cargo.toml | Adds ic-utils dependency for StrEllipsize. |
| rs/https_outcalls/client/BUILD.bazel | Adds //rs/utils dependency for the client crate and tests. |
| Cargo.lock | Updates lockfile for new/added dependencies. |

Comment thread rs/https_outcalls/consensus/src/pool_manager.rs
Comment thread rs/https_outcalls/client/src/client.rs
eichhorl marked this pull request as ready for review June 26, 2026 13:53
eichhorl requested a review from a team as a code owner June 26, 2026 13:53
zeropath-ai Bot commented Jun 26, 2026

No security or compliance issues detected. Reviewed everything up to 81c0f81.

Security Overview
Detected Code Changes
Change type: Configuration changes
► Cargo.lock
    Add ic-utils 0.9.0 dependency
    Remove ic-utils 0.9.0 dependency
    Add dependencies for ic-https-outcalls-pricing

Change type: Enhancement
► rs/https_outcalls/client/BUILD.bazel
    Add //rs/utils dependency
► rs/https_outcalls/client/Cargo.toml
    Add ic-utils dependency
► rs/https_outcalls/client/src/client.rs
    Introduce PricingFactory and integrate with budgeting and metrics
    Add support for replication kind in metrics
    Truncate oversized reject messages and add ellipsize for safe string handling
    Introduce CountBytes trait
    Update CanisterHttpResponseContent with MAXIMUM_CANISTER_HTTP_ERROR_MESSAGE_BYTES
    Use StrEllipsize for truncating reject messages
► rs/https_outcalls/client/src/metrics.rs
    Add LABEL_REPLICATION for metrics
► rs/https_outcalls/consensus/BUILD.bazel
    Remove //rs/utils dependency
► rs/https_outcalls/consensus/Cargo.toml
    Remove ic-utils dependency
► rs/https_outcalls/consensus/src/pool_manager.rs
    Introduce pricing_inputs function to read subnet size and cost schedule from registry
    Pass subnet size and cost schedule to http_adapter_shim
    Remove MAXIMUM_ALLOWED_ERROR_MESSAGE_BYTES constant and use MAXIMUM_CANISTER_HTTP_ERROR_MESSAGE_BYTES
► rs/https_outcalls/pricing/BUILD.bazel
    Add dependencies for ic_https_outcalls_pricing
► rs/https_outcalls/pricing/Cargo.toml
    Add ic-logger, ic-metrics, ic-types-cycles dependencies
► rs/https_outcalls/pricing/src/dark_launch.rs
    Implement DarkLaunchTracker for comparing real and shadow pricing results
► rs/https_outcalls/pricing/src/legacy.rs
    Use MAX_RESPONSE_TIME constant

Change type: Test
► rs/https_outcalls/client/src/client.rs
    Add test for oversized reject message truncation
► rs/https_outcalls/consensus/src/pool_manager.rs
    Remove tests related to oversized reject message pruning and multi-byte character handling

Comment thread rs/https_outcalls/client/src/client.rs Outdated

// A multi-byte message whose 1200 bytes exceed the 1024-byte limit, with
// emoji straddling the truncation boundary to exercise char safety.
let oversized_message = "😀".repeat(300);

Contributor

I don't think this crosses the boundary of 1024 chars.

Contributor Author

Why not? There is an assert just below saying that it is

Contributor

What I mean is that the emoji is encoded using 4 bytes and the byte limit of 1024 bytes is an integer multiple of that, so we don't exercise the case of an emoji crossing the byte limit. (In that case, truncating to exactly 1024 bytes would cut the emoji's encoding and the blob would no longer be a valid string.)
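
The alignment issue can be reproduced with the standard library alone. The helper below is a hypothetical prefix truncation, not the `StrEllipsize` implementation from `ic-utils` that the PR uses; it only shows which inputs actually put a character across the 1024-byte limit.

```rust
/// Hypothetical helper: the longest prefix of `s` that is at most
/// `max_bytes` long and ends on a UTF-8 character boundary.
pub fn truncate_to_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// 300 four-byte emoji: byte 1024 is a character boundary (1024 = 256 * 4),
/// so a naive byte cut would happen to be safe for this input.
pub fn aligned_message() -> String {
    "😀".repeat(300)
}

/// A one-byte prefix shifts every emoji by one byte, so byte 1024 now falls
/// inside an emoji and the truncation has to back off to byte 1021.
pub fn straddling_message() -> String {
    format!("a{}", "😀".repeat(300))
}
```

With the shifted input, slicing at byte 1024 directly would panic in Rust, which is the case the test in question would need to cover.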

Comment thread rs/https_outcalls/consensus/src/pool_manager.rs Outdated
Comment thread rs/types/types/src/canister_http.rs Outdated
use ic_metrics::MetricsRegistry;
use prometheus::IntCounterVec;

/// Label identifying the accounting step at which the shadow tracker diverged

Contributor

"diverged" (or "disagreed" below) sounds like they don't agree on the actual charged amount which is expected; I'd rephrase this so that it is clear that we refer to the step at which the shadow tracker reported an error while the legacy tracker succeeded

Contributor Author

Good point: b8df347

Contributor

There are still occurrences of "divergence" and "disagreed" that I'd suggest to clean up, too.

/// Charges `amount` against the budget. Returns an error if the total spent
/// now exceeds the available allowance.
fn charge(&mut self, amount: u128) -> Result<(), PricingError> {
// A free cost schedule means the subnet charges nothing for resources.

Contributor

I think we should still track the cost so that canister metrics could be updated based on the actual work done. Canister metrics are used on "free" subnets for cost accounting in user space.
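
One way to read this suggestion is to separate "cycles charged" from "work consumed", so that a free schedule skips the charge but still records usage. The sketch below is only an illustration of that idea; `Schedule` and `UsageTracker` are hypothetical and do not correspond to types in the PR.

```rust
/// Hypothetical stand-in for the subnet's cost schedule.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Schedule {
    Normal,
    Free,
}

/// Tracks both what was charged and what was consumed.
pub struct UsageTracker {
    schedule: Schedule,
    allowance: u128,
    charged: u128,
    consumed: u128,
}

impl UsageTracker {
    pub fn new(schedule: Schedule, allowance: u128) -> Self {
        Self { schedule, allowance, charged: 0, consumed: 0 }
    }

    pub fn charge(&mut self, amount: u128) -> Result<(), &'static str> {
        if self.schedule == Schedule::Normal {
            // Only a normal schedule can run out of cycles.
            if amount > self.allowance - self.charged {
                return Err("out of cycles");
            }
            self.charged += amount;
        }
        // Usage is recorded for both schedules, so metrics reflect real work.
        self.consumed = self.consumed.saturating_add(amount);
        Ok(())
    }

    pub fn charged(&self) -> u128 {
        self.charged
    }

    pub fn consumed(&self) -> u128 {
        self.consumed
    }
}
```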

Comment thread rs/https_outcalls/pricing/src/payg.rs Outdated
Comment thread rs/https_outcalls/pricing/src/payg.rs Outdated
Comment thread rs/https_outcalls/pricing/src/payg.rs
Comment thread rs/https_outcalls/pricing/src/payg.rs Outdated
pub struct PayAsYouGoTracker {
/// Number of nodes (`N`) on the subnet.
subnet_size: NumberOfNodes,
/// Whether this responses to this outcalls are gossiped (only flexible and non-replicated).

Contributor

Suggested change
/// Whether this responses to this outcalls are gossiped (only flexible and non-replicated).
/// Whether responses to this outcalls are gossiped (only flexible and non-replicated).

self.error_reported = true;
self.metrics
.shadow_incompatible_total
.with_label_values(&[step, self.replication.as_str()])

Contributor

Would it make sense to add a label to distinguish which one (real vs shadow) resulted in insufficient cycles? Such that you could easily query how many requests succeeded with the real but not with the shadow, which is what we are mostly interested in IIUC.

Contributor

Concerning reporting once per request, wouldn't we be interested to know if the trackers diverge at different steps?
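
Both questions in this thread amount to adding dimensions to the counter. A rough sketch of what that could look like, using a plain `HashMap` in place of the Prometheus `IntCounterVec`; the `Failed` label and the step names are hypothetical:

```rust
use std::collections::HashMap;

/// Hypothetical label value: which tracker ran out of cycles.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Failed {
    RealOnly,
    ShadowOnly,
    Both,
}

/// Maps the two outcomes of one accounting step to a label, if any failed.
pub fn classify(real_ok: bool, shadow_ok: bool) -> Option<Failed> {
    match (real_ok, shadow_ok) {
        (true, true) => None,
        (true, false) => Some(Failed::ShadowOnly),
        (false, true) => Some(Failed::RealOnly),
        (false, false) => Some(Failed::Both),
    }
}

/// Counters keyed by (step, failed), standing in for a labeled metric.
#[derive(Default)]
pub struct Counters(HashMap<(&'static str, Failed), u64>);

impl Counters {
    /// Records one step; counting per step (rather than once per request)
    /// keeps failures at different steps distinguishable.
    pub fn observe(&mut self, step: &'static str, real_ok: bool, shadow_ok: bool) {
        if let Some(failed) = classify(real_ok, shadow_ok) {
            *self.0.entry((step, failed)).or_insert(0) += 1;
        }
    }

    pub fn get(&self, step: &'static str, failed: Failed) -> u64 {
        self.0.get(&(step, failed)).copied().unwrap_or(0)
    }
}
```

Querying for `ShadowOnly` then gives exactly the requests that succeed today but would fail under the new pricing.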

// + 50 * transformed_response_bytes_i * N + transform_instructions_i / 13
const PER_DOWNLOADED_BYTE_FEE: u128 = 50;
const PER_RESPONSE_MS_FEE: u128 = 300;
const TRANSFORM_INSTRUCTION_DIVISOR: u128 = 13;

Contributor

Just making sure, this doesn't need to be dynamic based on the actual size of the subnet, right?
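
For reference, the two terms visible in the excerpt above can be evaluated as follows. This is deliberately partial: the rest of the formula is not shown in this thread and is not reconstructed here, and the constant name `PER_TRANSFORMED_BYTE_FEE` for the factor 50 is an assumption. It only illustrates that the subnet size `N` enters as a runtime multiplier while the fee constants themselves stay fixed.

```rust
/// Assumed name for the factor 50 in the visible term.
const PER_TRANSFORMED_BYTE_FEE: u128 = 50;
/// Divisor from the excerpt.
const TRANSFORM_INSTRUCTION_DIVISOR: u128 = 13;

/// Evaluates only the two terms visible in the excerpt:
/// 50 * transformed_response_bytes_i * N + transform_instructions_i / 13
pub fn visible_terms(
    transformed_response_bytes: u128,
    transform_instructions: u128,
    subnet_size: u128,
) -> u128 {
    PER_TRANSFORMED_BYTE_FEE * transformed_response_bytes * subnet_size
        + transform_instructions / TRANSFORM_INSTRUCTION_DIVISOR
}
```

Under these assumptions, the same 1000-byte transformed response costs more on a 34-node subnet than on a 13-node one without any constant changing.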


4 participants