perf(session-replay): Reduce capture stutters #7851
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## fix/replay-video-assembly #7851 +/- ##
===============================================================
+ Coverage 87.633% 87.678% +0.044%
===============================================================
Files 559 562 +3
Lines 32289 32626 +337
Branches 13223 13394 +171
===============================================================
+ Hits 28296 28606 +310
- Misses 3944 3971 +27
Partials 49 49
... and 3 files with indirect coverage changes
I quickly glanced over the changes and while I believe it to be viable (with the adoption of the existing
My main concern is that we are no longer taking screenshots at a one-second interval; instead, the interval can be longer than that (captures less than a second apart are debounced). In that case we need to look into two topics:
Follow-up pushed in
This implements option 1 from Phil’s note: the video writer now holds the last captured bitmap across missing constant-FPS frame slots, including the tail of the requested replay window, so deferred/backed-off captures no longer compress replay time. I also fixed video duration calculation for frame rates above 1 FPS.
For Noah’s run-loop comment, the capture path skips tracking/interactive run-loop modes via the current run-loop mode check; this is covered by
Local verification:
Additional production-readiness evidence for Phil’s timing concern:
I would not use the Simulator timing values as proof of the performance improvement; they are not comparable to the physical-device before/after benchmark in the PR description. This only verifies integration/build/run behavior for the current branch. Physical-device benchmarking remains the right validation for final performance confidence.
Pushed 6c8b177 to tighten skipped-frame extrapolation: sparse capture timestamps now use ceil when mapped into constant-FPS slots, so a fractional gap such as 2.4 s is held through the next frame slot instead of rounding down and shortening the replay. Covered by `testProcessFrames_WhenVideoEndHasFractionalGap_ShouldNotCompressDuration`; the replay-focused Catalyst suite passed locally (68 tests).
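To make the rounding change concrete, here is a minimal sketch of mapping a capture gap into constant-FPS slots with ceil. The helper names `slotsToHold` and `heldDuration` are hypothetical and are not taken from the PR; only the ceil-versus-round-down behavior comes from the comment above.

```swift
import Foundation

/// Hypothetical helper: number of constant-FPS frame slots a captured frame
/// occupies when the next capture (or the window end) is `gap` seconds later.
/// Rounding up means a fractional gap is held through the next slot.
func slotsToHold(gap: TimeInterval, frameRate: Int) -> Int {
    guard gap > 0, frameRate > 0 else { return 0 }
    return Int((gap * Double(frameRate)).rounded(.up))
}

/// Hypothetical helper: replay time covered by `slots` frame slots.
func heldDuration(slots: Int, frameRate: Int) -> TimeInterval {
    guard frameRate > 0 else { return 0 }
    return TimeInterval(slots) / TimeInterval(frameRate)
}

// A 2.4 s gap at 1 FPS: ceil keeps 3 slots, rounding down would keep only 2.
let ceilSlots = slotsToHold(gap: 2.4, frameRate: 1)
let flooredSlots = Int((2.4 * 1.0).rounded(.down))
print(ceilSlots, flooredSlots, heldDuration(slots: ceilSlots, frameRate: 1))
```

Dividing by the frame rate in `heldDuration` also reflects the duration fix for frame rates above 1 FPS mentioned earlier in the thread, under the assumption that duration is derived from the slot count.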
Performance metrics 🚀
Startup times
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 4832942 | 1230.43 ms | 1265.88 ms | 35.45 ms |
| f98aa1c | 1229.22 ms | 1259.48 ms | 30.26 ms |
| f17e4d9 | 1224.52 ms | 1255.33 ms | 30.81 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 4832942 | 24.14 KiB | 1.18 MiB | 1.16 MiB |
| f98aa1c | 24.14 KiB | 1.18 MiB | 1.16 MiB |
| f17e4d9 | 24.14 KiB | 1.18 MiB | 1.16 MiB |
Previous results on branch: perf-session-replay-capture-backoff
Startup times
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 14db94d | 1229.34 ms | 1261.52 ms | 32.18 ms |
| c28cabb | 1228.83 ms | 1255.35 ms | 26.52 ms |
| 31d64de | 1224.31 ms | 1257.52 ms | 33.21 ms |
| 9933251 | 1233.05 ms | 1263.65 ms | 30.60 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 14db94d | 24.14 KiB | 1.19 MiB | 1.17 MiB |
| c28cabb | 24.14 KiB | 1.19 MiB | 1.17 MiB |
| 31d64de | 24.14 KiB | 1.19 MiB | 1.16 MiB |
| 9933251 | 24.14 KiB | 1.19 MiB | 1.16 MiB |
Move activity scanning onto SentryViewSubtreeTraversal so capture guard checks share the same excluded-subtree handling as view capture. Keep excluded subtrees from triggering interaction backoff during replay capture.
Use an enum with static helpers for SentrySessionReplayCaptureGuard and remove the unused stored instance from SentrySessionReplay.
Replace the local nanoseconds-per-second constant with NSEC_PER_SEC when calculating screenshot capture duration.
Promote buffer replays to full session replay on the main thread so lastScreenshotAt remains main-thread confined. Add coverage for captureReplay from a background queue.
Rotate the capture scheduler token on every observer start so a delayed stop from an older observer generation cannot remove the active observer. Guard full-session state with the existing replay lock so background replay capture and main run-loop capture read a consistent mode.
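The token rotation described in this commit can be sketched as follows. `CaptureSchedulerSketch` and its members are hypothetical names for illustration, and the sketch assumes every call happens on the main thread, so it carries no locking of its own.

```swift
import Foundation

/// Hypothetical scheduler sketch; names are not taken from the PR.
/// Assumes all calls are made on the main thread.
final class CaptureSchedulerSketch {
    /// The token of the active observer generation, or nil when stopped.
    private var activeToken: UUID?

    /// The token itself doubles as the running flag.
    var isRunning: Bool { activeToken != nil }

    /// Starts a new observer generation and returns its fresh token.
    func start() -> UUID {
        let token = UUID()
        activeToken = token
        return token
    }

    /// Stops only when `token` still identifies the active generation,
    /// so a delayed stop from an older generation is a no-op.
    @discardableResult
    func stop(token: UUID) -> Bool {
        guard activeToken == token else { return false }
        activeToken = nil
        return true
    }
}

let scheduler = CaptureSchedulerSketch()
let oldToken = scheduler.start()
let newToken = scheduler.start()          // rotates the token
let staleStop = scheduler.stop(token: oldToken)
let stillRunning = scheduler.isRunning
let currentStop = scheduler.stop(token: newToken)
```

In this sketch the stale stop returns `false` and leaves the scheduler running; only the stop that carries the current token clears it.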
Guard resume starts with a scheduler generation so a later pause can invalidate work queued to the main thread. Run scheduler start and stop on the main thread synchronously to keep observer installation and removal ordered.
Compute excluded view class patterns once per subtree traversal instead of rebuilding the set for every visited view.
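As an illustration of hoisting the pattern set out of the per-view work, the sketch below builds the set once and reuses it for every visited node. `Node` is a hypothetical stand-in for a view, and `CameraUIPreview` is an invented type name; the real traversal walks `UIView` subtrees.

```swift
import Foundation

/// Hypothetical stand-in for a view; identified by its type description.
final class Node {
    let typeName: String
    let children: [Node]
    init(_ typeName: String, children: [Node] = []) {
        self.typeName = typeName
        self.children = children
    }
}

/// Returns the type names of all visited nodes, skipping excluded subtrees.
func visitedTypeNames(root: Node, excludedPatterns: [String]) -> [String] {
    // Built once per traversal instead of once per visited node.
    let patterns = Set(excludedPatterns)
    var result: [String] = []

    func visit(_ node: Node) {
        // An excluded node prunes its whole subtree.
        if patterns.contains(where: { node.typeName.contains($0) }) { return }
        result.append(node.typeName)
        for child in node.children { visit(child) }
    }

    visit(root)
    return result
}

let tree = Node("Root", children: [
    Node("Label"),
    Node("CameraUIPreview", children: [Node("Inner")])
])
let visited = visitedTypeNames(root: tree, excludedPatterns: ["CameraUI"])
```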
Avoid synchronously hopping to the main thread from SentrySessionReplay deinit, while keeping normal pause and resume scheduler stops ordered.
This reverts commit eb9eb9e. Keep deinit cleanup consistent with the explicit session replay lifecycle paths.
Do not move the next screenshot deadline while full-session replay is paused. This lets replay capture immediately after resume when the configured interval already elapsed during the pause.
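A minimal sketch of the deadline behavior described here, assuming time is passed in as seconds on a monotonic clock. `CaptureDeadlineSketch` is a hypothetical type, not the PR's implementation.

```swift
import Foundation

/// Hypothetical deadline tracker; names are not taken from the PR.
struct CaptureDeadlineSketch {
    let interval: TimeInterval
    var nextDeadline: TimeInterval
    var isPaused = false

    /// Returns true when a screenshot should be taken at `now`.
    mutating func shouldCapture(now: TimeInterval) -> Bool {
        // While paused, return early and leave the deadline untouched.
        guard !isPaused else { return false }
        guard now >= nextDeadline else { return false }
        nextDeadline = now + interval
        return true
    }
}

var deadline = CaptureDeadlineSketch(interval: 1, nextDeadline: 1)
deadline.isPaused = true
let capturedWhilePaused = deadline.shouldCapture(now: 5)
let deadlineAfterPause = deadline.nextDeadline   // still 1
deadline.isPaused = false
let capturedAfterResume = deadline.shouldCapture(now: 5)
```

Because the deadline stayed at its pre-pause value, the first check after resume captures immediately once the interval has already elapsed.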
Return whether the run-loop observer was installed and clear replay scheduler state when observer creation fails. This avoids reporting replay as running when no capture callback can fire.
Use the capture scheduler token itself to represent running state and remove the duplicate boolean. Keep observer creation failure as a local scheduler no-op, matching the existing HangTracker approach.
Keep the original explanation for using type descriptions in the shared view subtree traversal helper.
Extract the default CameraUI subtree exclusion pattern into a named constant.
Force-pushed from 942a18b to 5f74b10
```swift
let resumeDate = dateProvider.date()
let schedulerGeneration = lock.synchronized { () -> Int? in
    if _isFullSession && isSessionPaused {
        return nil
    }
```
Bug: The lifecycle resume() method incorrectly checks the network-related isSessionPaused flag, preventing session replay from restarting when the app foregrounds while offline.
Severity: HIGH
Suggested Fix
Decouple the lifecycle resume logic from the network pause state. Remove the if _isFullSession && isSessionPaused check from the resume() method. This will allow the capture scheduler to restart when the app is foregrounded, regardless of the current network connectivity status, ensuring the system is ready to capture once online.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: Sources/Swift/Integrations/SessionReplay/SentrySessionReplay.swift#L177-L181
Potential issue: A race condition between app lifecycle and network connectivity events
can cause session replay to stop recording. If the app goes offline, is backgrounded,
and then foregrounded while still offline, the capture scheduler fails to restart. This
occurs because the lifecycle `resume()` method incorrectly checks the `isSessionPaused`
flag (which is set due to network status) and returns early. As a result, no session
replay data is captured until the device regains connectivity, which may happen much
later or not at all, leading to data loss.
Reduce Session Replay capture stutters by moving screenshot scheduling off display refresh callbacks and onto run loop activity.
The screenshot path is unchanged, so visual fidelity stays on the existing rendering, masking, scale, and video pipeline. The change only adjusts when we ask for screenshots. Session Replay now uses a run loop observer instead of a display link, captures after the run loop has processed UI work, treats tracking-mode run loop activity as interaction work, and keeps adaptive backoff for non-interaction captures that are themselves slow.
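For readers unfamiliar with run loop observers, the following is a rough sketch of the scheduling idea, not the PR's code. It assumes the observer fires on the `beforeWaiting` activity and that tracking mode is detected by comparing the mode's raw value to `"UITrackingRunLoopMode"`, which keeps the sketch Foundation-only. `installCaptureObserver` and `isInteractionMode` are hypothetical names.

```swift
import Foundation

/// Assumption: raw value of UIKit's tracking run loop mode.
let trackingModeName = "UITrackingRunLoopMode"

/// Hypothetical classifier: tracking-mode activity counts as interaction work.
func isInteractionMode(_ mode: RunLoop.Mode?) -> Bool {
    return mode?.rawValue == trackingModeName
}

/// Installs an observer on the main run loop that fires before the loop
/// goes to sleep, that is, after the current pass has processed its work.
/// Returns nil when observer creation fails.
func installCaptureObserver(onIdle: @escaping () -> Void) -> CFRunLoopObserver? {
    let observer: CFRunLoopObserver? = CFRunLoopObserverCreateWithHandler(
        kCFAllocatorDefault,
        CFRunLoopActivity.beforeWaiting.rawValue,
        true,   // repeats
        0       // order
    ) { _, _ in
        // Skip capture while the user is interacting (e.g. scrolling).
        if isInteractionMode(RunLoop.current.currentMode) { return }
        onIdle()
    }
    if let observer = observer {
        CFRunLoopAddObserver(CFRunLoopGetMain(), observer, CFRunLoopMode.commonModes)
    }
    return observer
}
```

A caller would keep the returned observer and later pass it to `CFRunLoopRemoveObserver` with the same run loop and mode to stop scheduling.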
Stacked on #8041, which carries the video-assembly fixes (frame-slot rendering, retained previous-frame anchor, half-open segment windows, empty-segment dropping) that this scheduler relies on once capture intervals stretch past the configured frame interval. Land #8041 first; this PR then retargets `main` automatically.
Because the run loop observer, unlike a display link, keeps firing while the app is backgrounded, resume handling is reworked alongside the scheduler: session-mode (network) pause/resume is split from lifecycle pause/resume, and the integration tracks the application pause state so a connectivity reconnect while backgrounded clears the network-pause flag without restarting capture.
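The split between network pause and lifecycle pause can be sketched with two independent flags. `ReplayPauseStateSketch` and its method names are hypothetical; the sketch only models the reconnect-while-backgrounded case described above and takes no position on what foregrounding should restart.

```swift
/// Hypothetical state sketch; names are not taken from the PR.
struct ReplayPauseStateSketch {
    /// Set by reachability changes (session-mode pause).
    var isNetworkPaused = false
    /// Set by background/foreground transitions (lifecycle pause).
    var isAppPaused = false

    mutating func networkLost() { isNetworkPaused = true }
    mutating func appBackgrounded() { isAppPaused = true }
    mutating func appForegrounded() { isAppPaused = false }

    /// Clears the network-pause flag and reports whether the capture
    /// scheduler should be restarted now. While the app is backgrounded
    /// the flag is cleared but capture is not restarted.
    mutating func networkReconnected() -> Bool {
        isNetworkPaused = false
        return !isAppPaused
    }
}

var pauseState = ReplayPauseStateSketch()
pauseState.networkLost()
pauseState.appBackgrounded()
let restartWhileBackgrounded = pauseState.networkReconnected()
let networkFlagAfterReconnect = pauseState.isNetworkPaused
```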
Suggested review order, by concern:
1. `SentrySessionReplay.captureFrameIfNeeded` (stage order is documented in its headerdoc) plus the `CapturePacing` constants and adaptive-interval logic.
2. `lock` property.
3. `resume()` vs `resumeSessionMode(restartCaptureScheduler:)` and the reachability guard in `SentrySessionReplayIntegration`.
`SentrySessionReplayCaptureGuard` (new file) is a stateless view-hierarchy activity detector extracted for cohesion. The display-link era `newFrame(_:)` test entry point is gone; tests drive capture through the typed `captureFrameForTesting()` hook.
Latest physical-device comparison from the benchmark app on iPhone 15 running iOS 26.5, using Release builds:
This PR:
Sample App Replay: https://sentry-sdks.sentry.io/explore/replays/8593836f355243c8863222a12493cb21
Benchmark App Replay: https://sentry-sdks.sentry.io/explore/replays/acc9cfbc692e4d2788b262da0e1fbd80
Main:
Sample App Replay: https://sentry-sdks.sentry.io/explore/replays/2c737a44a1404cf0928a07c4e7d43ff8
Benchmark App Replay: https://sentry-sdks.sentry.io/explore/replays/e4891955c9bf4c539d111ebf268a11df
Closes #6885