Bound HTTP transaction lifetime#3484
Merged
Merged
Conversation
Contributor
|
VOTE +1 |
1 similar comment
Contributor
|
VOTE +1 |
The idle timeout was armed on request arrival rather than when the transaction went idle, so a single operation running longer than the timeout tripped it mid-execution, contradicting the documented promise that active transactions are unaffected. A long operation should be bounded by evaluationTimeout; the idle timer should only reclaim abandoned transactions. The per-transaction executor is now a ThreadPoolExecutor(1,1) whose before/afterExecute hooks suspend the idle timer while work runs and re-arm it only once the worker parks with an empty queue. This gives a reliable running-vs-idle signal without wrapping submitted tasks, which would break the evaluation-timeout interrupt that relies on cancelling the real FutureTask. transactionTimeout is renamed to idleTransactionTimeout to reflect its actual meaning (renamed outright as the feature is unreleased), and now honors 0 as "disabled" to match its documentation. Assisted-by: Claude Code:claude-opus-4-8
The idle timeout only reclaims transactions that go quiet; a client could still hold a transaction (and its dedicated worker thread and concurrency slot) open indefinitely with a single long operation or a keep-alive drip. maxTransactionLifetime bounds total transaction age regardless of activity: when it fires it interrupts the running operation and rolls the transaction back, so the in-flight client gets a transaction-timeout (504) rather than a misleading evaluation-timeout error. Rather than validate timeout configuration and fail begins (or silently override a client's timeoutMs) when bounds are disabled, the server ships sane defaults instead: idle reclamation at 1 minute and a lifetime cap at 10 minutes. A transaction is bounded out of the box, disabling the bounds is a deliberate operator choice, and a per-request timeoutMs is always honored as sent rather than second-guessed on the client's behalf. Assisted-by: Claude Code:claude-opus-4-8
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bound HTTP transaction lifetime: suspend idle timer while busy + add
maxTransactionLifetimeWhat changed
Two related changes to how Gremlin Server bounds the lifetime of an HTTP transaction (each open transaction owns a dedicated worker thread and a
maxConcurrentTransactionsslot, so an unbounded transaction is a real resourceleak).
Idle timer suspends while busy. The
transactionTimeoutidle timer was armed on request arrival, so a single operation running longer than the timeout would trip it mid-execution and roll back a perfectly healthytransaction. It now arms only when the transaction goes genuinely idle (no operation running or queued) and is suspended while work is in flight — a long operation is bounded by
evaluationTimeout, not the idle timer. Thesetting is renamed
transactionTimeout→idleTransactionTimeoutto reflect what it actually means, and0now correctly disables it (the docs always claimed this; the code never honored it).New
maxTransactionLifetimeabsolute cap. A new setting that bounds total transaction age regardless of activity, closing the gap where a client holds a transaction open indefinitely via one long operation orkeep-alive drips. When it fires it interrupts the running operation and rolls the transaction back, and the in-flight client receives a transaction-timeout (504) rather than a misleading "increase evaluationTimeout" error.
Why
The committed idle-timer behavior acted more like a per request timeout instead of the described idle timeout and there was no ceiling on transaction lifetime at all. Together these give three composable bounds —
per-operation (
evaluationTimeout), between-operations (idleTransactionTimeout), and whole-transaction (maxTransactionLifetime) — mirroring PostgreSQL'sstatement_timeout/idle_in_transaction_session_timeout/transaction_timeout.Notably, the server does not validate these settings or reject begins when bounds are disabled. Instead it ships sane defaults (idle 1 min, lifetime 10 min): a transaction is bounded out of the box, disabling the bounds
is a deliberate operator choice, and a client's per-request
timeoutMsis always honored as sent rather than second-guessed.Review guide
UnmanagedTransaction— the executor swap. The single-thread executor is now aThreadPoolExecutor(1,1)subclass purely to exposebeforeExecute/afterExecute+ the queue, which drive the suspend-while-busy logic.The key invariant: submitted tasks must not be wrapped —
submit()returns the sameFutureTaskso the eval-timeout / capcancel(true)interrupts the real work.maybeScheduleIdleTimerre-checksacceptingafter arming (so a concurrentclose()can't be raced into re-arming a dying transaction); the in-flight op is tracked as a singleimmutable
Running(future, context)pair so the cap never flags one operation'sContextwhile interrupting another's future.UnmanagedTransaction(it must see the executor hooks); the lifetime cap is scheduled/cancelled byTransactionManager(a fixed schedule tied toregistry membership). The cap is armed after
putIfAbsentso it can never fire into an unregistered transaction and leak a thread;destroy()cancels it on every close path.close()ordering is still load-bearing —manager.destroy()beforeexecutor.shutdown(), gracefulshutdown()(notshutdownNow()). The cap path reuses this exact path; verify it's unchanged.TransactionExceptionvia aContext.closedByLifetimeCapflag set before the interrupt; ordinary eval timeout still → 500. Both the eval-timeout writer andformErrorResponseMessagearecap-aware so the code is correct regardless of which thread writes the response first.
ManualScheduledExecutorService(noThread.sleepflakiness); integration tests assert the guarantee actually made (transaction reclaimed /subsequent 404), since timing can't reliably catch the mid-op interrupt.
VOTE +1