
Defer udf 3 #171022

Draft
ZhouXing19 wants to merge 9 commits into cockroachdb:master from ZhouXing19:defer-udf-3

Conversation

@ZhouXing19

Collaborator

For CI only

ZhouXing19 and others added 5 commits May 27, 2026 15:51
Extract the body-building loop from `buildRoutine` into a standalone
method `buildSQLRoutineBodyStmts`. No behavioral change.

A follow-up commit will add a deferred-build path that skips this call
and instead captures the ASTs for later building at execution time.

Release note: None

Add a `RoutineBodyBuilder` interface in `memo`, modeled after
`PostQueryBuilder`, to defer building of SQL routine body statements
to execution time. Add a `BodyBuilder` field to `UDFDefinition`.

Implement `sqlRoutineBodyBuilder` in `optbuilder/routine.go`, which
captures metadata at plan time (parameter types, privilege context,
statement tree snapshot, ASTs) and builds body RelExprs in a fresh
Builder at execution time, following the `buildTriggerCascadeHelper`
pattern used for FK cascades and AFTER triggers.
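A minimal sketch of the deferred-build shape described above, for illustration only. The `BodyBuilder` interface, its `Build` method, the string-based "plans", and the `bodyPlans` helper are all assumptions made for this example; the real `RoutineBodyBuilder` signature is not shown in this commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// BodyBuilder is a hypothetical stand-in for the RoutineBodyBuilder
// interface; the real method set is not given in the commit message.
type BodyBuilder interface {
	Build() ([]string, error)
}

// deferredBuilder captures its inputs at plan time (here only the
// statement text) and produces plans when Build is called later.
type deferredBuilder struct {
	stmts  []string
	builds int // counts how often the body was actually built
}

func (b *deferredBuilder) Build() ([]string, error) {
	b.builds++
	plans := make([]string, 0, len(b.stmts))
	for _, s := range b.stmts {
		plans = append(plans, "plan("+s+")")
	}
	return plans, nil
}

// UDFDefinition carries either an eagerly built Body or a BodyBuilder.
type UDFDefinition struct {
	Body        []string
	BodyBuilder BodyBuilder
}

// bodyPlans returns the eager body when present and otherwise builds
// the body on demand. The nil checks mirror the "nil-Body guard" idea.
func bodyPlans(def *UDFDefinition) ([]string, error) {
	if def.Body != nil {
		return def.Body, nil
	}
	if def.BodyBuilder == nil {
		return nil, errors.New("routine has neither a body nor a body builder")
	}
	return def.BodyBuilder.Build()
}

func main() {
	b := &deferredBuilder{stmts: []string{"INSERT INTO t VALUES (1)"}}
	def := &UDFDefinition{BodyBuilder: b}
	fmt.Println("builds at plan time:", b.builds)
	plans, err := bodyPlans(def)
	fmt.Println(plans, err, "builds at execution time:", b.builds)
}
```

The point of the sketch is only that nothing is built until a consumer asks for the body, and that consumers must tolerate a nil `Body`.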

Add `GetInitFnForDeferredRoutine` to `statementTree`. Unlike
`GetInitFnForPostQuery`, which excludes the current stack level, this
captures ALL levels. The difference is that post-queries are children
of the current-level mutation (e.g. a cascade triggered by a DELETE),
while deferred routines are siblings of it. For example:

    UPDATE t SET x = my_udf();
    -- my_udf() body: INSERT INTO t VALUES (1)

Both the outer UPDATE and the UDF body mutate `t`. Without capturing
the current level, the deferred body would see an empty statement tree
and miss the conflict.
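The difference between the two snapshots can be illustrated with a toy model. This is a sketch under assumptions: the `level` and `statementTree` types, the `snapshot` method, and the `conflicts` helper are invented for this example and do not reflect the real `statementTree` implementation.

```go
package main

import "fmt"

// level records which tables are mutated at one nesting depth.
type level map[string]bool

// statementTree is a toy stack of levels; the last entry is the
// current level.
type statementTree struct {
	stack []level
}

// snapshot copies the stack. With includeCurrent=false the current
// level is dropped (the post-query behavior described above); with
// includeCurrent=true every level is kept (the deferred-routine case).
func (t *statementTree) snapshot(includeCurrent bool) []level {
	n := len(t.stack)
	if !includeCurrent && n > 0 {
		n--
	}
	out := make([]level, 0, n)
	for _, l := range t.stack[:n] {
		cp := make(level, len(l))
		for table, mutated := range l {
			cp[table] = mutated
		}
		out = append(out, cp)
	}
	return out
}

// conflicts reports whether any captured level already mutates table.
func conflicts(snap []level, table string) bool {
	for _, l := range snap {
		if l[table] {
			return true
		}
	}
	return false
}

func main() {
	// The outer UPDATE mutates t at the current level.
	tree := &statementTree{stack: []level{{"t": true}}}
	fmt.Println("excluding current level:", conflicts(tree.snapshot(false), "t"))
	fmt.Println("including current level:", conflicts(tree.snapshot(true), "t"))
}
```

In this model the UDF body's INSERT into `t` is flagged only when the current level is part of the snapshot.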

Add nil-Body guards across execbuilder, memo formatter, and norm
factory so existing code tolerates a deferred-build `UDFDefinition`.

Nothing uses these yet — no behavioral change.

Release note: None

With deferred UDF body optimization, body statements are not built into
RelExprs at plan time. The execution layer needs to know whether a
routine can mutate before execution to choose between LeafTxn and
RootTxn (via PlanFlagContainsMutation).

Fix this by computing the CanMutate property at CREATE FUNCTION time
from the optimizer's transitive Relational().CanMutate logical property
(which covers direct DML, mutations in CTEs/subqueries, and nested
mutating UDF calls) and persisting it on the function descriptor. At
query time, the persisted value is read through the Overload and
UDFDefinition and used to set PlanFlagContainsMutation without needing
to build the body.

The descriptor field uses a three-way enum (UNKNOWN_CAN_MUTATE,
CAN_MUTATE, CANNOT_MUTATE) rather than a bool. The zero value
UNKNOWN_CAN_MUTATE means "not yet determined" and causes consumers to
fall back to inspecting the eagerly-built body RelExprs. This handles
pre-existing function descriptors created before this field was
introduced without requiring a migration: they naturally have the zero
value, which triggers the correct fallback behavior. Functions created
or replaced after the version gate is active get CAN_MUTATE or
CANNOT_MUTATE, allowing consumers to skip the body inspection.
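A sketch of how a consumer might read the three-way value, assuming Go-style constant names and a caller-supplied fallback. `CanMutate`, `UnknownCanMutate`, `CanMutateYes`, `CannotMutate`, and `containsMutation` are illustrative names, not the generated descriptor API.

```go
package main

import "fmt"

// CanMutate models the three-way descriptor field. The zero value is
// deliberately "unknown" so that descriptors written before the field
// existed need no migration.
type CanMutate int

const (
	UnknownCanMutate CanMutate = iota // UNKNOWN_CAN_MUTATE
	CanMutateYes                      // CAN_MUTATE
	CannotMutate                      // CANNOT_MUTATE
)

// containsMutation trusts a persisted answer when there is one and only
// falls back to inspecting the eagerly built body when the value is
// unknown.
func containsMutation(persisted CanMutate, inspectBody func() bool) bool {
	switch persisted {
	case CanMutateYes:
		return true
	case CannotMutate:
		return false
	default:
		return inspectBody()
	}
}

func main() {
	var legacy CanMutate // zero value, as on a pre-existing descriptor
	fmt.Println(containsMutation(legacy, func() bool { return true }))
	fmt.Println(containsMutation(CannotMutate, func() bool {
		panic("body must not be inspected when the value is known")
	}))
}
```

A plain bool could not distinguish "known not to mutate" from "never computed", which is why the zero value is reserved for the unknown case.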

For anonymous routines (DO blocks and trigger functions), CanMutate is
derived directly from the body expression at build time, since these
have no descriptor.

The version gate on writing CanMutate is needed for rollback safety:
if the field were written before finalization and the cluster rolled
back, old binaries would not reset it during CREATE OR REPLACE,
leaving stale values that could cause correctness issues after
re-upgrade.
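The write-side gating could look roughly like the following. This is a hedged sketch: `descriptor`, `setCanMutate`, and the boolean `gateActive` parameter are hypothetical stand-ins for the function descriptor and the cluster version check.

```go
package main

import "fmt"

type CanMutate int

const (
	UnknownCanMutate CanMutate = iota
	CanMutateYes
	CannotMutate
)

// descriptor is a toy function descriptor with only the field of
// interest.
type descriptor struct {
	CanMutate CanMutate
}

// setCanMutate persists the computed property only once the version
// gate is active. Before that the field is left untouched, so it stays
// at its zero value and readers keep using the fallback path.
func setCanMutate(desc *descriptor, gateActive, bodyMutates bool) {
	if !gateActive {
		return
	}
	if bodyMutates {
		desc.CanMutate = CanMutateYes
	} else {
		desc.CanMutate = CannotMutate
	}
}

func main() {
	d := &descriptor{}
	setCanMutate(d, false, true) // CREATE FUNCTION before finalization
	fmt.Println(d.CanMutate == UnknownCanMutate)
	setCanMutate(d, true, true) // CREATE OR REPLACE after finalization
	fmt.Println(d.CanMutate == CanMutateYes)
}
```

Because nothing is written before finalization, a rolled-back cluster never holds a value that an old binary would fail to reset.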

Release note: None

When a function is replaced via CREATE OR REPLACE and becomes mutating,
propagate CAN_MUTATE transitively to all caller functions via the
DependedOnBy back-references. Only the non-mutating → mutating direction
is propagated; the reverse is left conservative (callers keep CAN_MUTATE)
because determining true non-mutating status would require re-analyzing
each caller's entire body to ensure no child routine mutates at all.
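The one-directional propagation can be sketched as a breadth-first walk over the back-references. The `fn` struct, the string function names, and `propagateCanMutate` are assumptions for illustration; real descriptors use IDs and a richer `DependedOnBy` structure.

```go
package main

import "fmt"

// fn is a toy function descriptor: whether it can mutate, and which
// functions call it (the DependedOnBy back-references).
type fn struct {
	canMutate    bool
	dependedOnBy []string
}

// propagateCanMutate marks every transitive caller of the replaced
// function as mutating and returns the callers whose flag changed, in
// visit order. It never clears a flag, matching the conservative
// handling of the mutating -> non-mutating direction.
func propagateCanMutate(fns map[string]*fn, replaced string) []string {
	var updated []string
	seen := map[string]bool{replaced: true}
	queue := []string{replaced}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		f, ok := fns[cur]
		if !ok {
			continue
		}
		for _, caller := range f.dependedOnBy {
			if seen[caller] {
				continue
			}
			seen[caller] = true
			c, ok := fns[caller]
			if !ok {
				continue
			}
			if !c.canMutate {
				c.canMutate = true
				updated = append(updated, caller)
			}
			queue = append(queue, caller)
		}
	}
	return updated
}

func main() {
	// c calls b, b calls a; d is unrelated.
	fns := map[string]*fn{
		"a": {canMutate: true, dependedOnBy: []string{"b"}},
		"b": {dependedOnBy: []string{"c"}},
		"c": {},
		"d": {},
	}
	fmt.Println(propagateCanMutate(fns, "a"), fns["d"].canMutate)
}
```

The `seen` set keeps the walk finite even if the back-references form a cycle.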

Release note: None

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@trunk-io

trunk-io Bot commented May 27, 2026

Contributor

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

@blathers-crl

blathers-crl Bot commented May 27, 2026

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity

Member

This change is Reviewable

ZhouXing19 and others added 4 commits May 27, 2026 13:27
Enable deferred body building for SQL routines: body RelExprs are now
built at execution time rather than plan time.

Two cases still require eager build:
- AnyTuple return type (RECORD without OUT params), because the actual
  return type must be inferred from the body.
- Inlineable UDFs (single-statement, non-volatile, non-set-returning),
  because expression indexes and partial index predicates depend on the
  inlined body at plan time. Without this, CREATE INDEX on an IMMUTABLE
  UDF expression would fail. This restriction is overly conservative
  for regular DML queries — cockroachdb#169459 tracks loosening it to only force
  eager build in contexts that actually require plan-time inlining.
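The two eager-build cases reduce to a small predicate. This is a sketch only: the `routine` struct and its field names are hypothetical, and the real check in the optbuilder may consult more state.

```go
package main

import "fmt"

// routine holds just the properties the two eager-build cases depend on.
type routine struct {
	returnsAnyTuple bool // RECORD return type without OUT params
	numStmts        int
	volatile        bool
	setReturning    bool
}

// inlineable follows the definition in the commit message:
// single-statement, non-volatile, non-set-returning.
func inlineable(r routine) bool {
	return r.numStmts == 1 && !r.volatile && !r.setReturning
}

// mustBuildEagerly reports whether the body has to be built at plan
// time; every other routine can defer to execution time.
func mustBuildEagerly(r routine) bool {
	return r.returnsAnyTuple || inlineable(r)
}

func main() {
	fmt.Println(mustBuildEagerly(routine{numStmts: 1}))                 // inlineable
	fmt.Println(mustBuildEagerly(routine{numStmts: 1, volatile: true})) // deferred
	fmt.Println(mustBuildEagerly(routine{numStmts: 3, returnsAnyTuple: true}))
}
```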

EXPLAIN respects the deferred execution flow: rather than forcing eager
build, a BuildDeferredBody callback on ExprFmtCtx builds deferred
bodies during formatting, showing the full plan structure inline. For
EXPLAIN (OPT, ENV), table refs from deferred body memos are unioned
into the outer metadata so schemas and stats are collected.

A side effect of deferred build is that privilege checks now match
PostgreSQL: EXECUTE on the function is checked before SELECT on tables
referenced in the body (previously reversed because eager build resolved
table refs first).

Release note (performance improvement): SQL routine (UDF/procedure) body
statements are now built at execution time rather than plan time.

…bodies

When SQL routine body building is deferred to execution time, the
plan-time memo lacks body RelExprs and table references. This causes
EXPLAIN ANALYZE (DEBUG) bundles to miss optimizer detail and table
stats/schema for tables referenced inside deferred routines.

This commit propagates execution-time metadata back to the bundle
collector:

- Add DeferredRoutineOptPlans and DeferredRoutineTableRefs fields to
  eval.Context, initialized when bundle collection is active.
- After deferred body building in buildRoutinePlanGenerator, capture the
  formatted optimizer plan (opt-vv level with redaction markers) and all
  table references from the execution-time memo.
- In the bundle collector, emit opt-vv-deferred-<func>.txt files and
  union deferred table refs with plan-time metadata for stats/schema
  collection.
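The collector side of the steps above might look like this sketch. Only the `opt-vv-deferred-<func>.txt` file name pattern comes from the commit message; `deferredPlanFiles`, `unionTableRefs`, and the use of plain strings for plans and table references are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// deferredPlanFiles maps each captured plan to a bundle file name of
// the form opt-vv-deferred-<func>.txt.
func deferredPlanFiles(plans map[string]string) map[string]string {
	files := make(map[string]string, len(plans))
	for fn, plan := range plans {
		files["opt-vv-deferred-"+fn+".txt"] = plan
	}
	return files
}

// unionTableRefs merges plan-time and execution-time table references,
// dropping duplicates, so stats and schemas are collected once per
// table. The result is sorted for stable output.
func unionTableRefs(planTime, deferred []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, refs := range [][]string{planTime, deferred} {
		for _, r := range refs {
			if !seen[r] {
				seen[r] = true
				out = append(out, r)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	files := deferredPlanFiles(map[string]string{"my_udf": "insert t ..."})
	fmt.Println(files)
	fmt.Println(unionTableRefs([]string{"t"}, []string{"u", "t"}))
}
```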

Note: EXPLAIN ANALYZE already uses deferred build with no special
handling needed. The conn_executor intercepts the ExplainAnalyze AST
before the optbuilder runs, strips the EXPLAIN ANALYZE wrapper, and
passes the inner statement through the normal build path where deferred
build is active. Output is generated after execution by walking the
explain.Plan tree (not the memo), so deferred bodies are transparent.

Release note: None

With deferred routine body building, volatile UDF bodies are not built
at plan time — test output previously showed `body (deferred)` with raw
AST text instead of the full RelExpr plan. This created a test coverage
gap for deferred routine body plans.

Set the `BuildDeferredBody` callback in `OptTester.FormatExpr` so that
deferred bodies are built during formatting and tests show full plan
structure inline. The callback builds the body into the outer memo's
factory so column IDs are globally unique across the outer query and
all UDF bodies. This follows the same pattern used for post-query
(cascade/trigger) test formatting in `OptTester.PostQueries`, which
also passes the outer factory to `Build()` for the same reason. Note
that production code (both EXPLAIN and normal execution) correctly uses
a fresh memo since the outer memo may be cached or shared.

Also move `checkExpectedRules` from `postProcess` into a new
`FormatAndCheck` method that runs after formatting, so that rules fired
during deferred body building (e.g. `NormalizeArrayFlattenToAgg`) are
tracked in `appliedRules` before `expect=`/`expect-not=` are checked.

Test data changes fall into two categories:

1. Column ID renumbering: deferred UDF bodies previously showed body-
   memo column IDs starting from :1 (which could collide with outer
   query columns). Now that bodies are built into the outer memo,
   body column IDs continue from where the outer memo left off,
   producing globally unique IDs.

2. Outer query column renumbering: with eager build, body columns were
   allocated before some outer query columns, affecting the outer
   column numbering. With deferred build, the outer query columns are
   allocated first (body isn't built yet), so outer columns may get
   lower IDs than before.
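Both renumbering effects follow from where column IDs are allocated, which a toy allocator makes concrete. The `memo` type and `addColumns` method here are stand-ins invented for this example, not the optimizer's metadata API.

```go
package main

import "fmt"

// memo is a toy column-ID allocator: IDs start at 1 and increase.
type memo struct {
	lastCol int
}

// addColumns allocates n fresh column IDs from this memo.
func (m *memo) addColumns(n int) []int {
	ids := make([]int, n)
	for i := range ids {
		m.lastCol++
		ids[i] = m.lastCol
	}
	return ids
}

func main() {
	// Category 1: body built in a fresh memo vs. in the outer memo.
	outer := &memo{}
	outerCols := outer.addColumns(3)
	fresh := &memo{}
	freshBody := fresh.addColumns(2)  // restarts at 1, overlaps outerCols
	sharedBody := outer.addColumns(2) // continues after outerCols
	fmt.Println(outerCols, freshBody, sharedBody)

	// Category 2: with eager build the body allocated first, pushing
	// the outer query's columns to higher IDs.
	eager := &memo{}
	eagerBody := eager.addColumns(2)
	eagerOuter := eager.addColumns(3)
	fmt.Println(eagerBody, eagerOuter)
}
```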

Release note: None

The udf_mutations subtest in logprops/udf relied on the opt test
catalog providing a correct CanMutate value on function overloads.
With deferred UDF body building, the test catalog can no longer
derive this from the body RelExprs, and the test catalog doesn't
persist CanMutate (it bypasses the optbuilder's buildCreateFunction).

Move the test to a logic test where the production pipeline (DSC
with CanMutate on the descriptor) handles it correctly.

Epic: none
Release note: None

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>