Defer udf 3 #171022
Draft
ZhouXing19 wants to merge 9 commits into
Extract the body-building loop from `buildRoutine` into a standalone method `buildSQLRoutineBodyStmts`. No behavioral change. A follow-up commit will add a deferred-build path that skips this call and instead captures the ASTs for later building at execution time.

Release note: None

Add a `RoutineBodyBuilder` interface in `memo`, modeled after
`PostQueryBuilder`, to defer building of SQL routine body statements
to execution time. Add a `BodyBuilder` field to `UDFDefinition`.
Implement `sqlRoutineBodyBuilder` in `optbuilder/routine.go`, which
captures metadata at plan time (parameter types, privilege context,
statement tree snapshot, ASTs) and builds body RelExprs in a fresh
Builder at execution time, following the `buildTriggerCascadeHelper`
pattern used for FK cascades and AFTER triggers.
Add `GetInitFnForDeferredRoutine` to `statementTree`. Unlike
`GetInitFnForPostQuery` which excludes the current stack level, this
captures ALL levels. The difference is that post-queries are children
of the current-level mutation (e.g. a cascade triggered by a DELETE),
while deferred routines are siblings. For example:
    UPDATE t SET x = my_udf();
    -- my_udf() body: INSERT INTO t VALUES (1)
Both the outer UPDATE and the UDF body mutate `t`. Without capturing
the current level, the deferred body would see an empty statement tree
and miss the conflict.
Add nil-Body guards across execbuilder, memo formatter, and norm
factory so existing code tolerates a deferred-build `UDFDefinition`.
Nothing uses these yet — no behavioral change.
Release note: None
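
To make the difference between the two capture modes concrete, here is a toy model in Go. It is only a sketch: the `statementTree` type below, its `levels` field, and the `SnapshotForPostQuery` / `SnapshotForDeferredRoutine` / `Mutates` methods are hypothetical simplifications, not the optbuilder's actual types or the real `GetInitFnForPostQuery` / `GetInitFnForDeferredRoutine` signatures. It assumes a level is just a set of mutated table names.

```go
package main

// statementTree is a toy stand-in for the optbuilder's statement tree: a
// stack of levels, each recording the tables mutated at that level.
// (Hypothetical simplification; the real structure tracks more detail.)
type statementTree struct {
	levels []map[string]bool
}

// Push opens a new level for a statement that is about to be built.
func (st *statementTree) Push() {
	st.levels = append(st.levels, map[string]bool{})
}

// AddMutation records a mutation of table at the current (innermost) level.
func (st *statementTree) AddMutation(table string) {
	if len(st.levels) == 0 {
		st.Push()
	}
	st.levels[len(st.levels)-1][table] = true
}

// snapshot deep-copies the first n levels so later changes to st are not
// visible through the copy.
func (st *statementTree) snapshot(n int) *statementTree {
	if n < 0 {
		n = 0
	}
	out := &statementTree{levels: make([]map[string]bool, 0, n)}
	for _, lvl := range st.levels[:n] {
		cp := make(map[string]bool, len(lvl))
		for t := range lvl {
			cp[t] = true
		}
		out.levels = append(out.levels, cp)
	}
	return out
}

// SnapshotForPostQuery excludes the current level: a post-query is a child
// of the current-level mutation.
func (st *statementTree) SnapshotForPostQuery() *statementTree {
	return st.snapshot(len(st.levels) - 1)
}

// SnapshotForDeferredRoutine captures ALL levels: a deferred routine body is
// a sibling of the current-level mutation and must see it.
func (st *statementTree) SnapshotForDeferredRoutine() *statementTree {
	return st.snapshot(len(st.levels))
}

// Mutates reports whether any captured level mutates table.
func (st *statementTree) Mutates(table string) bool {
	for _, lvl := range st.levels {
		if lvl[table] {
			return true
		}
	}
	return false
}
```

In this model, after the outer UPDATE records a mutation of `t` at the current level, only the deferred-routine snapshot reports that `t` is mutated; the post-query snapshot is empty, which is the missed conflict described above.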
With deferred UDF body optimization, body statements are not built into RelExprs at plan time. The execution layer needs to know whether a routine can mutate before execution to choose between LeafTxn and RootTxn (via PlanFlagContainsMutation).

Fix this by computing the CanMutate property at CREATE FUNCTION time from the optimizer's transitive Relational().CanMutate logical property (which covers direct DML, mutations in CTEs/subqueries, and nested mutating UDF calls) and persisting it on the function descriptor. At query time, the persisted value is read through the Overload and UDFDefinition and used to set PlanFlagContainsMutation without needing to build the body.

The descriptor field uses a three-way enum (UNKNOWN_CAN_MUTATE, CAN_MUTATE, CANNOT_MUTATE) rather than a bool. The zero value UNKNOWN_CAN_MUTATE means "not yet determined" and causes consumers to fall back to inspecting the eagerly-built body RelExprs. This handles pre-existing function descriptors created before this field was introduced without requiring a migration: they naturally have the zero value, which triggers the correct fallback behavior. Functions created or replaced after the version gate is active get CAN_MUTATE or CANNOT_MUTATE, allowing consumers to skip the body inspection.

For anonymous routines (DO blocks and trigger functions), CanMutate is derived directly from the body expression at build time, since these have no descriptor.

The version gate on writing CanMutate is needed for rollback safety: if the field were written before finalization and the cluster rolled back, old binaries would not reset it during CREATE OR REPLACE, leaving stale values that could cause correctness issues after re-upgrade.

Release note: None

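
The three-way enum and its fallback can be sketched as follows. This is an illustrative sketch only: the Go names `MutateState`, `UnknownCanMutate`, `CanMutate`, `CannotMutate`, and `routineCanMutate` are hypothetical stand-ins for the descriptor enum values named above, and `inspectBody` stands in for whatever inspects the eagerly-built body RelExprs.

```go
package main

// MutateState mirrors the three-way descriptor enum described above.
// The Go identifiers are hypothetical; only the three states come from
// the commit message.
type MutateState int

const (
	// UnknownCanMutate is the zero value, so descriptors written before
	// the field existed decode to it without a migration.
	UnknownCanMutate MutateState = iota
	CanMutate
	CannotMutate
)

// routineCanMutate returns whether a routine may mutate. It trusts a
// persisted CanMutate/CannotMutate value and only calls inspectBody (an
// assumed callback that examines the eagerly-built body) when the
// persisted value is unknown.
func routineCanMutate(persisted MutateState, inspectBody func() bool) bool {
	switch persisted {
	case CanMutate:
		return true
	case CannotMutate:
		return false
	default:
		return inspectBody()
	}
}
```

A plain bool could not distinguish "determined to be non-mutating" from "never computed"; making the unknown state the zero value is what lets old descriptors take the fallback path.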
When a function is replaced via CREATE OR REPLACE and becomes mutating, propagate CAN_MUTATE transitively to all caller functions via the DependedOnBy back-references. Only the non-mutating → mutating direction is propagated; the reverse is left conservative (callers keep CAN_MUTATE) because determining true non-mutating status would require re-analyzing each caller's entire body to ensure no child routine mutates at all.

Release note: None

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

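
The one-directional propagation can be pictured as a breadth-first walk over back-references. The sketch below is an assumption-laden illustration, not the PR's code: `funcDesc` and `propagateCanMutate` are hypothetical, function IDs are plain ints, the three-way enum is collapsed to a bool for brevity, and `DependedOnBy` is assumed to list only calling functions.

```go
package main

// funcDesc is a hypothetical, minimal function descriptor.
type funcDesc struct {
	CanMutate    bool  // true once the function is known to mutate
	DependedOnBy []int // IDs of functions that call this one
}

// propagateCanMutate marks every transitive caller of the function with ID
// start as mutating. It returns the IDs whose flag changed, in BFS order.
// The visited set guards against cycles in the back-reference graph.
func propagateCanMutate(descs map[int]*funcDesc, start int) []int {
	var changed []int
	visited := map[int]bool{start: true}
	queue := []int{start}
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		d, ok := descs[id]
		if !ok {
			continue
		}
		for _, caller := range d.DependedOnBy {
			if visited[caller] {
				continue
			}
			visited[caller] = true
			if c, ok := descs[caller]; ok && !c.CanMutate {
				c.CanMutate = true
				changed = append(changed, caller)
			}
			queue = append(queue, caller)
		}
	}
	return changed
}
```

Nothing in this walk ever clears a flag, which matches the conservative choice above: clearing would require re-analyzing each caller's whole body rather than following a single edge.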
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Enable deferred body building for SQL routines: body RelExprs are now built at execution time rather than plan time.

Two cases still require eager build:
- AnyTuple return type (RECORD without OUT params), because the actual return type must be inferred from the body.
- Inlineable UDFs (single-statement, non-volatile, non-set-returning), because expression indexes and partial index predicates depend on the inlined body at plan time. Without this, CREATE INDEX on an IMMUTABLE UDF expression would fail. This restriction is overly conservative for regular DML queries; cockroachdb#169459 tracks loosening it to only force eager build in contexts that actually require plan-time inlining.

EXPLAIN respects the deferred execution flow: rather than forcing eager build, a BuildDeferredBody callback on ExprFmtCtx builds deferred bodies during formatting, showing the full plan structure inline. For EXPLAIN (OPT, ENV), table refs from deferred body memos are unioned into the outer metadata so schemas and stats are collected.

A side effect of deferred build is that privilege checks now match PostgreSQL: EXECUTE on the function is checked before SELECT on tables referenced in the body (previously reversed because eager build resolved table refs first).

Release note (performance improvement): SQL routine (UDF/procedure) body statements are now built at execution time rather than plan time.

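
The two eager-build cases can be read as a single predicate. The following Go sketch encodes only the conditions listed above; `routineInfo`, `inlineable`, and `mustBuildEagerly` are hypothetical names, and the optimizer's real inlining rules may check more than these three properties.

```go
package main

// routineInfo holds the properties the commit message names as relevant.
// The type and its fields are hypothetical.
type routineInfo struct {
	ReturnsAnyTuple bool // RECORD return type without OUT params
	NumBodyStmts    int
	Volatile        bool
	SetReturning    bool
}

// inlineable reports whether the routine matches the description above:
// single-statement, non-volatile, and non-set-returning.
func inlineable(r routineInfo) bool {
	return r.NumBodyStmts == 1 && !r.Volatile && !r.SetReturning
}

// mustBuildEagerly reports whether the body still has to be built at plan
// time: either the return type must be inferred from the body, or the body
// may need to be inlined at plan time.
func mustBuildEagerly(r routineInfo) bool {
	return r.ReturnsAnyTuple || inlineable(r)
}
```

Under this reading, a volatile single-statement UDF is deferred, while an immutable single-statement scalar UDF is still built eagerly.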
…bodies

When SQL routine body building is deferred to execution time, the plan-time memo lacks body RelExprs and table references. This causes EXPLAIN ANALYZE (DEBUG) bundles to miss optimizer detail and table stats/schema for tables referenced inside deferred routines.

This commit propagates execution-time metadata back to the bundle collector:
- Add DeferredRoutineOptPlans and DeferredRoutineTableRefs fields to eval.Context, initialized when bundle collection is active.
- After deferred body building in buildRoutinePlanGenerator, capture the formatted optimizer plan (opt-vv level with redaction markers) and all table references from the execution-time memo.
- In the bundle collector, emit opt-vv-deferred-<func>.txt files and union deferred table refs with plan-time metadata for stats/schema collection.

Note: EXPLAIN ANALYZE already uses deferred build with no special handling needed. The conn_executor intercepts the ExplainAnalyze AST before the optbuilder runs, strips the EXPLAIN ANALYZE wrapper, and passes the inner statement through the normal build path where deferred build is active. Output is generated after execution by walking the explain.Plan tree (not the memo), so deferred bodies are transparent.

Release note: None

With deferred routine body building, volatile UDF bodies are not built at plan time; test output previously showed `body (deferred)` with raw AST text instead of the full RelExpr plan. This created a test coverage gap for deferred routine body plans.

Set the `BuildDeferredBody` callback in `OptTester.FormatExpr` so that deferred bodies are built during formatting and tests show full plan structure inline. The callback builds the body into the outer memo's factory so column IDs are globally unique across the outer query and all UDF bodies. This follows the same pattern used for post-query (cascade/trigger) test formatting in `OptTester.PostQueries`, which also passes the outer factory to `Build()` for the same reason. Note that production code (both EXPLAIN and normal execution) correctly uses a fresh memo since the outer memo may be cached or shared.

Also move `checkExpectedRules` from `postProcess` into a new `FormatAndCheck` method that runs after formatting, so that rules fired during deferred body building (e.g. `NormalizeArrayFlattenToAgg`) are tracked in `appliedRules` before `expect=`/`expect-not=` are checked.

Test data changes fall into two categories:
1. Column ID renumbering: deferred UDF bodies previously showed body-memo column IDs starting from :1 (which could collide with outer query columns). Now that bodies are built into the outer memo, body column IDs continue from where the outer memo left off, producing globally unique IDs.
2. Outer query column renumbering: with eager build, body columns were allocated before some outer query columns, affecting the outer column numbering. With deferred build, the outer query columns are allocated first (body isn't built yet), so outer columns may get lower IDs than before.

Release note: None

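
The column ID renumbering follows from where the body's columns are allocated. A minimal sketch, assuming column IDs are handed out by a simple per-memo counter (the `metadata` type, `AddColumn`, and `buildBody` below are hypothetical and far simpler than the optimizer's metadata):

```go
package main

// metadata is a toy per-memo column allocator. IDs start at 1, matching the
// ":1" numbering mentioned above. (Hypothetical simplification.)
type metadata struct {
	numCols int
}

// AddColumn allocates and returns the next column ID.
func (m *metadata) AddColumn() int {
	m.numCols++
	return m.numCols
}

// buildBody pretends to build a routine body that needs n columns and
// returns the IDs it allocated from md.
func buildBody(md *metadata, n int) []int {
	ids := make([]int, 0, n)
	for i := 0; i < n; i++ {
		ids = append(ids, md.AddColumn())
	}
	return ids
}
```

If the outer query has already allocated two columns, building a two-column body against a fresh `metadata` yields IDs 1 and 2 again, whereas building it against the outer `metadata` yields 3 and 4. That is the difference between the old and new test output in category 1.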
The udf_mutations subtest in logprops/udf relied on the opt test catalog providing a correct CanMutate value on function overloads. With deferred UDF body building, the test catalog can no longer derive this from the body RelExprs, and the test catalog doesn't persist CanMutate (it bypasses the optbuilder's buildCreateFunction).

Move the test to a logic test where the production pipeline (DSC with CanMutate on the descriptor) handles it correctly.

Epic: none

Release note: None

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

For CI only