Skip to content

perf(task-detail): cache cloud diff stats by content identity#2804

Merged
tatoalo merged 2 commits into
mainfrom
perf/cache-cloud-diff-stats
Jun 23, 2026
Merged

perf(task-detail): cache cloud diff stats by content identity#2804
tatoalo merged 2 commits into
mainfrom
perf/cache-cloud-diff-stats

Conversation

@pauldambra

Copy link
Copy Markdown
Member

Problem

Opening a large cloud task froze the renderer (100% CPU, no loaders — synchronous work). extractCloudToolChangedFiles line-diffs every changed file via getDiffStats, and it re-runs whenever the session events array changes identity. A big session log loads/streams in chunks, so every changed file was re-diffed on every chunk — O(files × updates).

Found by loading a production database (766 workspaces, 5.9 GB of logs, biggest single conversation 449 MB / 203k events) into a dev instance and CPU-profiling task loads over CDP. getDiffStats was the single biggest frame — 28% / ~2,088ms on the worst load.

Change

The diff content objects are stable across these recomputes (they're the session events' own objects), so cache getDiffStats results in a WeakMap keyed on the diff object. Repeated recomputes become O(new diffs) instead of O(all files). Output is unchanged — the cache only avoids recompute.

Measurement

Same CDP profile, before/after:

  • getDiffStats: ~2,088ms (28% of the worst load) → ~0 (dropped out of the hot path entirely).
  • A representative large cloud task: 4,101ms → 1,282ms blocked main-thread time on load.

A parameterised test pins the diff-stat values (added/removed across new/modified/removal) and that reusing the same diff object returns identical results.

Scope

This is the cloud-conversation half of the freeze. The sidebar half (re-rendering all task rows on navigation) is a separate PR (memoizing TaskRow). Residual cost on the most extreme 449 MB task is now diffuse (log parsing, file-search indexing) with no single dominant frame — a possible future follow-up is incrementalizing buildCloudEventSummary, but the dominant freeze is resolved.

🤖 Generated with Claude Code

Opening a large cloud task froze the renderer. `extractCloudToolChangedFiles`
line-diffs every changed file via `getDiffStats`, and it re-runs whenever the
session `events` array changes - which during load/stream of a big log happens
on every chunk. So every changed file was re-diffed on every update: O(files x
updates), the dominant cost (28% / ~2s) when opening the worst tasks.

The diff content objects are stable across these recomputes (they are the
session events' own objects), so cache `getDiffStats` by diff-object identity
in a WeakMap. Repeated recomputes become O(new diffs) instead of O(all files).

Measured on a production-scale dataset (CPU profile over CDP, task-load blocked
main-thread time): `getDiffStats` dropped from ~2088ms / 28% of the worst load
to ~0 (out of the hot path); a representative large cloud task fell 4101ms ->
1282ms blocked. Results are unchanged - the cache only avoids recompute - and a
parameterised test pins the diff-stat values plus same-object reuse.

This is the cloud-conversation half of the sidebar/cloud freeze pair; the
sidebar half is addressed separately by memoizing the task-list rows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown

React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit d46ed41.

@greptile-apps

greptile-apps Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Comments Outside Diff (1)

  1. packages/core/src/task-detail/cloudToolChanges.test.ts, line 112-130 (link)

    P2 Cache test only proves idempotency, not caching

    The test named "returns identical stats when the same diff object is reused (cache)" verifies that two calls with the same content reference return equal linesAdded/linesRemoved values. That assertion would pass even if the WeakMap were removed and cachedDiffStats delegated directly to getDiffStats every time — both calls produce the same result regardless. To actually guard the caching contract, the test would need to spy on getDiffStats and assert it is called exactly once across the two extractCloudToolChangedFiles invocations.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: packages/core/src/task-detail/cloudToolChanges.test.ts
    Line: 112-130
    
    Comment:
    **Cache test only proves idempotency, not caching**
    
    The test named `"returns identical stats when the same diff object is reused (cache)"` verifies that two calls with the same `content` reference return equal `linesAdded`/`linesRemoved` values. That assertion would pass even if the `WeakMap` were removed and `cachedDiffStats` delegated directly to `getDiffStats` every time — both calls produce the same result regardless. To actually guard the caching contract, the test would need to spy on `getDiffStats` and assert it is called exactly once across the two `extractCloudToolChangedFiles` invocations.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/core/src/task-detail/cloudToolChanges.test.ts:112-130
**Cache test only proves idempotency, not caching**

The test named `"returns identical stats when the same diff object is reused (cache)"` verifies that two calls with the same `content` reference return equal `linesAdded`/`linesRemoved` values. That assertion would pass even if the `WeakMap` were removed and `cachedDiffStats` delegated directly to `getDiffStats` every time — both calls produce the same result regardless. To actually guard the caching contract, the test would need to spy on `getDiffStats` and assert it is called exactly once across the two `extractCloudToolChangedFiles` invocations.

Reviews (1): Last reviewed commit: "perf(task-detail): cache cloud diff stat..." | Re-trigger Greptile

Review (paul + xp) noted the previous "cache" test asserted value-equality,
which passes even with the WeakMap removed (getDiffStats is pure) - it tested
purity, not memoization. Export cachedDiffStats and assert a cache hit returns
the same instance (and a distinct-but-equal diff object recomputes), so the
test now fails if the cache regresses.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pauldambra

Copy link
Copy Markdown
Member Author

Note

🤖 Automated comment by QA Swarm — not written by a human

Clean-room review: qa-team, paul-reviewer, xp-reviewer, security-audit — all on Opus.

Verdict: ✅ APPROVE (unanimous)

All four approved; this is a clean, correct, well-targeted memoization.

Verified (the load-bearing correctness question)

The cache key — the diff content object — is sound. All four reviewers independently traced it: getDiffStats reads only oldText/newText, which live on the key object itself; the session store appends events (sessionStore push, never mutates in place), so already-seen diff objects keep identity across the buildCloudEventSummary recomputes — the WeakMap hits, and a cached value can never go stale relative to its key. WeakMap is the right structure (self-evicting, no leak). Behaviour is value-identical; only recompute is avoided. security-audit: no cross-session contamination, no findings.

Actioned (convergent nit, paul + xp) — fixed in d46ed41

The original cache test asserted value-equality, which passes even with the WeakMap removed (since getDiffStats is pure) — it tested purity, not the cache. Exported cachedDiffStats and rewrote the test to assert a cache hit returns the same instance (and a distinct-but-equal diff object recomputes), so it now fails if the cache regresses. This also pins qa-team's shared-return-reference observation.

Non-blocking notes (left as-is)

  • Module-level WeakMap is a process-global, but it's a pure memoization table (value derived from key), not mutable shared state — safe; reviewers agreed leaving it.
  • xp/qa-team noted the deeper fix is incrementalizing buildCloudEventSummary (so the whole summary isn't rebuilt each update) — a possible follow-up; the cache is a correct, contained win for now.
Reviewer Verdict
🔍 qa-team APPROVE, risk NONE. Cache key sound, no leak, behaviour preserved.
👤 paul High trust, ship it. Verified the key holds; assert instance identity (done).
📐 xp Correctly diagnosed + built; tighten the cache test (done).
🛡 security-audit No findings.

Automated by QA Swarm — not a human review

@pauldambra pauldambra added the Stamphog This will request an autostamp by stamphog on small changes label Jun 20, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WeakMap-based memoization of diff stats by object identity — leak-safe, correct, well-tested. No architectural violations or risk to production.

@pauldambra pauldambra requested a review from a team June 20, 2026 19:59
@tatoalo tatoalo merged commit 0d9885a into main Jun 23, 2026
26 checks passed
@tatoalo tatoalo deleted the perf/cache-cloud-diff-stats branch June 23, 2026 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Stamphog This will request an autostamp by stamphog on small changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants