Skip to content

Fix: detect mutating keywords inside CTEs in read-only mode#292

Closed
tianzhou wants to merge 12 commits into
mainfrom
fix/issue-271-cte-readonly-bypass
Closed

Fix: detect mutating keywords inside CTEs in read-only mode#292
tianzhou wants to merge 12 commits into
mainfrom
fix/issue-271-cte-readonly-bypass

Conversation

@tianzhou

Copy link
Copy Markdown
Member

Summary

  • Read-only SQL validation only checked the first keyword, allowing WITH cte AS (UPDATE/INSERT/DELETE ...) SELECT ... to bypass restrictions
  • Now scans the full statement (after stripping comments and string literals) for DML/DDL keywords in WITH/SELECT/EXPLAIN contexts
  • Dialect-aware REPLACE detection: matches REPLACE INTO statement form for MySQL/MariaDB/SQLite without false-positiving on REPLACE() function calls or identifiers

Changes

  • src/utils/allowed-keywords.ts:
    • Added mutating keyword scan for WITH statements (INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, TRUNCATE, MERGE, GRANT, REVOKE, RENAME)
    • Added REPLACE INTO detection for MySQL/MariaDB/SQLite via per-dialect pattern map
    • Added SELECT ... INTO detection for both SELECT and WITH statements
    • Added EXPLAIN ANALYZE validation — recursively checks the inner statement
    • Handles EXPLAIN (ANALYZE false/off) and EXPLAIN ANALYZE VERBOSE correctly
    • Removed standalone ANALYZE from allowed keywords (it updates statistics)
    • Extracted checkReadOnly() to avoid re-running stripCommentsAndStrings on recursive calls
  • src/utils/__tests__/allowed-keywords.test.ts: Added 40+ test cases covering CTE bypass, REPLACE handling, SELECT INTO, EXPLAIN ANALYZE, and edge cases

Closes #271

Test plan

  • All 56 allowed-keywords unit tests pass
  • Pure SELECT CTEs still allowed
  • REPLACE() function calls not falsely rejected
  • Mutating keywords in string literals/comments not falsely rejected
  • No regressions in other unit tests

🤖 Generated with Claude Code

tianzhou and others added 11 commits March 28, 2026 06:59
The read-only SQL check only examined the first keyword, so queries like
`WITH updated AS (UPDATE ...) SELECT ...` bypassed the restriction.
Now scans the full statement (after stripping comments and strings) for
DML/DDL keywords.

Closes #271

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
REPLACE followed by ( is a string function call (e.g. SELECT REPLACE(...)),
not a mutating REPLACE INTO statement. Use negative lookahead to avoid
false positives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Only scan for mutating keywords when first keyword is WITH (fixes
  false rejection of SHOW CREATE TABLE in MySQL/MariaDB)
- Add SELECT ... INTO detection for Postgres/MySQL/SQL Server
- EXPLAIN with DML is allowed since it doesn't execute the statement
- Add 6 more tests for SHOW CREATE, EXPLAIN, and SELECT INTO

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n CTEs

- WITH statements now also check for SELECT ... INTO
- EXPLAIN ANALYZE with DML is rejected (Postgres executes the statement)
- EXPLAIN without ANALYZE remains allowed with DML (plan only)
- Fixed regex word boundary for parenthesized EXPLAIN (ANALYZE) form
- Added tests for REPLACE() inside CTEs, WITH...SELECT INTO,
  EXPLAIN ANALYZE DELETE, EXPLAIN (ANALYZE) DELETE

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reuse isReadOnlySQL recursively on the statement after EXPLAIN ANALYZE,
so SELECT INTO and other write-capable forms are also caught.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip optional VERBOSE keyword after ANALYZE so the recursive
isReadOnlySQL check receives the actual inner statement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
REPLACE INTO is a MySQL/MariaDB-specific statement. In Postgres/SQLite/
SQL Server, REPLACE is only a function, so it should not trigger the
mutating keyword check. This avoids false rejections when 'replace'
appears as a CTE name or alias in non-MySQL dialects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove ANALYZE from allowed first keywords since it updates statistics
  (writes to system catalogs) in Postgres/MySQL/SQLite
- EXPLAIN (ANALYZE false/off/0) does not execute, so allow it with DML
- Add MySQL/MariaDB REPLACE() function tests to ensure no false positives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous regex `replace(?!\s*\()` rejected any standalone word
`replace` that wasn't a function call, causing false positives for
identifiers named `replace` (e.g. CTE aliases). Now only matches
`REPLACE [LOW_PRIORITY|DELAYED] INTO` which is the actual MySQL
REPLACE statement syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SQLite also supports REPLACE INTO, so the mutating keyword scan
needs to detect it for SQLite connectors, not just MySQL/MariaDB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Merge mutatingPatternSQLite into shared mutatingPatternWithReplace
- Replace getMutatingPattern() with Record<ConnectorType, RegExp> lookup
- Use match(/\S+/) instead of split(/\s+/)[0] to avoid full array alloc
- Use exec() + slice() instead of test() + replace() to avoid double regex
- Extract checkReadOnly() to skip re-running stripCommentsAndStrings on
  recursive EXPLAIN ANALYZE calls
- Unify selectIntoPattern branches for select/with

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 28, 2026 14:26

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens DBHub’s read-only SQL enforcement by preventing data-modifying statements from being hidden inside otherwise “allowed” top-level statements (notably WITH ... CTEs and EXPLAIN ANALYZE ...).

Changes:

  • Extend isReadOnlySQL() to scan WITH statements for mutating keywords (and block SELECT ... INTO cases).
  • Add dialect-specific detection for REPLACE INTO in MySQL/MariaDB/SQLite.
  • Add recursive validation for EXPLAIN ANALYZE and expand unit tests for the new read-only checks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/utils/allowed-keywords.ts Adds deeper read-only validation for CTEs / SELECT INTO / EXPLAIN ANALYZE, plus mutating-keyword scanning and MySQL-family REPLACE handling.
src/utils/__tests__/allowed-keywords.test.ts Adds extensive coverage for CTE bypasses, REPLACE handling, SELECT INTO, and EXPLAIN ANALYZE edge cases.

Comment thread src/utils/allowed-keywords.ts Outdated
Comment on lines +42 to +47
* Extended pattern for dialects that support REPLACE INTO (MySQL/MariaDB/SQLite).
* Only matches `REPLACE INTO` (with optional LOW_PRIORITY/DELAYED), not
* REPLACE() function calls or identifiers named `replace`.
*/
const mutatingPatternWithReplace = new RegExp(
`\\b(?:${mutatingKeywords.join("|")}|replace\\s+(?:(?:low_priority|delayed)\\s+)?into)\\b`,

Copilot AI Mar 28, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mutatingPatternWithReplace only matches REPLACE ... INTO, but in MySQL/MariaDB the INTO keyword is optional for REPLACE statements (e.g. REPLACE tbl_name SET ...). As a result, a mutating statement can still bypass readonly mode when wrapped in a WITH statement.

Adjust the dialect pattern to also detect REPLACE statements without INTO (while still avoiding false positives on REPLACE() function calls, e.g. by rejecting replace when it’s not immediately followed by ().

Suggested change
* Extended pattern for dialects that support REPLACE INTO (MySQL/MariaDB/SQLite).
* Only matches `REPLACE INTO` (with optional LOW_PRIORITY/DELAYED), not
* REPLACE() function calls or identifiers named `replace`.
*/
const mutatingPatternWithReplace = new RegExp(
`\\b(?:${mutatingKeywords.join("|")}|replace\\s+(?:(?:low_priority|delayed)\\s+)?into)\\b`,
* Extended pattern for dialects that support REPLACE (MySQL/MariaDB/SQLite).
* Matches REPLACE statements (including those without INTO) but not REPLACE()
* function calls, by rejecting `replace` when it is immediately followed by `(`.
*/
const mutatingPatternWithReplace = new RegExp(
`\\b(?:${mutatingKeywords.join("|")}|replace(?!\\s*\\()))\\b`,

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 87f75fe. Changed pattern to replace(?!\s*\() to catch all REPLACE statement forms (including REPLACE tbl SET ... without INTO). Accepts the minor false positive on identifiers named replace in MySQL/MariaDB/SQLite as a security tradeoff.

Comment on lines +131 to +149
it("should reject REPLACE INTO as a mutating statement", () => {
const sql = "REPLACE INTO users (id, name) VALUES (1, 'test')";
expect(isReadOnlySQL(sql, "mysql")).toBe(false);
});

it("should allow a CTE named 'replace' in Postgres", () => {
const sql = "WITH replace AS (SELECT 1) SELECT * FROM replace";
expect(isReadOnlySQL(sql, "postgres")).toBe(true);
});

it("should allow a CTE named 'replace' in MySQL", () => {
const sql = "WITH replace AS (SELECT 1) SELECT * FROM replace";
expect(isReadOnlySQL(sql, "mysql")).toBe(true);
});

it("should reject REPLACE (non-function) inside WITH in MySQL", () => {
const sql = "WITH cte AS (SELECT 1) REPLACE INTO users VALUES (1, 'test')";
expect(isReadOnlySQL(sql, "mysql")).toBe(false);
});

Copilot AI Mar 28, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s test coverage for REPLACE INTO ..., but no test that asserts readonly rejection for the valid MySQL/MariaDB form REPLACE tbl_name SET ... (where INTO is optional). Adding such a test would prevent regressions and would have caught the current bypass in the pattern matcher.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 87f75fe. Added tests for REPLACE tbl SET ... (both standalone and inside WITH).

MySQL/MariaDB allow REPLACE without INTO (e.g. REPLACE tbl SET ...).
Changed pattern from matching only REPLACE INTO to matching any
REPLACE not followed by ( to catch all REPLACE statement forms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tianzhou tianzhou closed this Mar 28, 2026
@tianzhou tianzhou deleted the fix/issue-271-cte-readonly-bypass branch June 9, 2026 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Execute update in CTE in postgres read-only

2 participants