
perf(session-replay): Reduce capture stutters #7851

Open
romtsn wants to merge 33 commits into fix/replay-video-assembly from perf-session-replay-capture-backoff

romtsn (Member) commented Apr 29, 2026

Reduce Session Replay capture stutters by moving screenshot scheduling off display refresh callbacks and onto run loop activity.

The screenshot path is unchanged, so visual fidelity still relies on the existing rendering, masking, scale, and video pipeline. The change only adjusts when we ask for screenshots: Session Replay now uses a run loop observer instead of a display link, captures after the run loop has processed UI work, treats tracking-mode run loop activity as interaction work, and keeps adaptive backoff for non-interaction captures that are themselves slow.
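To make the scheduling change concrete, here is a minimal, platform-neutral sketch of the decision a run loop observer callback could make. It assumes the callback reports the run loop activity and current mode as plain values. On Apple platforms these would come from a CFRunLoopObserver registered for `.beforeWaiting` and from `RunLoop.current.currentMode`. `CaptureDecider`, `RunLoopActivity`, and the debounce rule are hypothetical illustrations, not the SDK's types. The mode strings are assumed to match the raw values of `UITrackingRunLoopMode` and `kCFRunLoopDefaultMode`.

```swift
// Hypothetical sketch: decide whether a run loop observer callback should capture.
enum RunLoopActivity {
    case beforeWaiting   // run loop finished processing events and is about to sleep
    case afterWaiting    // run loop just woke up
    case other
}

struct CaptureDecider {
    // Assumed raw value of UIKit's tracking mode (active while the user scrolls).
    static let trackingMode = "UITrackingRunLoopMode"

    let frameInterval: Double            // seconds between captures, e.g. 1.0 for 1 FPS
    private(set) var nextDeadline = 0.0  // earliest time the next capture may happen

    init(frameInterval: Double) {
        self.frameInterval = frameInterval
    }

    mutating func shouldCapture(activity: RunLoopActivity, mode: String, now: Double) -> Bool {
        // Capture only after UI work for this run loop iteration has been processed.
        guard activity == .beforeWaiting else { return false }
        // Tracking mode means an interaction is in progress; do not compete with it.
        guard mode != CaptureDecider.trackingMode else { return false }
        // Debounce: never capture more often than the configured interval.
        guard now >= nextDeadline else { return false }
        nextDeadline = now + frameInterval
        return true
    }
}

var decider = CaptureDecider(frameInterval: 1.0)
print(decider.shouldCapture(activity: .beforeWaiting, mode: "kCFRunLoopDefaultMode", now: 0.0))     // true
print(decider.shouldCapture(activity: .beforeWaiting, mode: "kCFRunLoopDefaultMode", now: 0.5))     // false: debounced
print(decider.shouldCapture(activity: .beforeWaiting, mode: CaptureDecider.trackingMode, now: 1.2)) // false: scrolling
print(decider.shouldCapture(activity: .beforeWaiting, mode: "kCFRunLoopDefaultMode", now: 1.3))     // true
```

In this sketch, the deadline moves only when a capture actually happens. A capture skipped during scrolling therefore fires on the first idle iteration after the interval has elapsed, which is one way capture intervals can stretch past the configured frame interval.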

Stacked on #8041, which carries the video-assembly fixes (frame-slot rendering, retained previous-frame anchor, half-open segment windows, empty-segment dropping) that this scheduler relies on once capture intervals stretch past the configured frame interval. Land #8041 first; this PR will then be retargeted to main automatically.

Because the run loop observer, unlike a display link, keeps firing while the app is backgrounded, resume handling is reworked alongside the scheduler. Session-mode (network) pause/resume is split from lifecycle pause/resume. The integration also tracks the application pause state, so a connectivity reconnect while backgrounded clears the network-pause flag without restarting capture.
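One way to picture the split is to keep a separate flag per domain and derive whether the capture scheduler should run from both. With that approach, the order in which lifecycle and reachability events arrive does not matter. The sketch below is a hypothetical model: `ReplayPauseState` and its methods are illustrative names, not the integration's API, and threading is omitted.

```swift
// Hypothetical model of the two pause domains; threading and the real scheduler are omitted.
struct ReplayPauseState {
    private(set) var isAppPaused = false      // lifecycle domain: app backgrounded
    private(set) var isNetworkPaused = false  // session-mode domain: connectivity lost
    private(set) var schedulerStarts = 0      // how often capture was (re)started

    // The scheduler should run only when neither domain has paused replay.
    var shouldRunScheduler: Bool { !isAppPaused && !isNetworkPaused }

    mutating func setAppPaused(_ paused: Bool) {
        let wasRunning = shouldRunScheduler
        isAppPaused = paused
        countRestart(wasRunning: wasRunning)
    }

    mutating func setNetworkPaused(_ paused: Bool) {
        let wasRunning = shouldRunScheduler
        isNetworkPaused = paused
        countRestart(wasRunning: wasRunning)
    }

    // Restart capture only on a stopped -> running transition.
    private mutating func countRestart(wasRunning: Bool) {
        if !wasRunning && shouldRunScheduler { schedulerStarts += 1 }
    }
}

var state = ReplayPauseState()
state.setAppPaused(true)       // backgrounded: scheduler stops
state.setNetworkPaused(true)   // connectivity lost while backgrounded
state.setNetworkPaused(false)  // reconnect while backgrounded: flag clears, no restart
print(state.shouldRunScheduler, state.schedulerStarts)  // false 0
state.setAppPaused(false)      // foregrounded: scheduler restarts once
print(state.shouldRunScheduler, state.schedulerStarts)  // true 1
```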

Suggested review order, by concern:

  1. Pacing policy: SentrySessionReplay.captureFrameIfNeeded (stage order is documented in its headerdoc) plus the CapturePacing constants and adaptive-interval logic.
  2. Scheduler mechanics and segment bookkeeping: run loop observer install/teardown, pending/pause segment handling. The two-domain threading model (lock-guarded vs. main-thread-confined pacing state) is documented on the lock property.
  3. Resume paths: resume() vs. resumeSessionMode(restartCaptureScheduler:) and the reachability guard in SentrySessionReplayIntegration.
  4. SentrySessionReplayCaptureGuard (new file) is a stateless view-hierarchy activity detector extracted for cohesion. The display-link-era newFrame(_:) test entry point is gone; tests drive capture through the typed captureFrameForTesting() hook.
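As a rough illustration of adaptive backoff for slow non-interaction captures, the sketch below doubles the capture interval after a capture exceeds a time budget. It resets the interval once captures are fast again. The `PacingSketch` values, the doubling factor, and the reset-to-base rule are assumptions for illustration only. The PR's actual CapturePacing constants and adaptive-interval logic live around `SentrySessionReplay.captureFrameIfNeeded` and are not reproduced here.

```swift
// Hypothetical pacing values for illustration only (not the PR's CapturePacing constants).
enum PacingSketch {
    static let slowCaptureBudget = 0.016   // seconds; roughly one 60 Hz frame
    static let backoffFactor = 2.0
    static let maxInterval = 5.0           // seconds
}

/// Next interval for a non-interaction capture, given how long the last capture took.
func nextCaptureInterval(base: Double, current: Double, lastCaptureDuration: Double) -> Double {
    // Fast capture: return to the configured interval.
    guard lastCaptureDuration > PacingSketch.slowCaptureBudget else { return base }
    // Slow capture: back off geometrically, never below base and never above the cap.
    return min(PacingSketch.maxInterval, max(base, current * PacingSketch.backoffFactor))
}

var interval = 1.0
for duration in [0.030, 0.030, 0.030, 0.030, 0.005] {
    interval = nextCaptureInterval(base: 1.0, current: interval, lastCaptureDuration: duration)
    print(interval)
}
// prints 2.0, 4.0, 5.0, 5.0, 1.0 (one per line)
```

Resetting straight to the base interval restores replay fidelity as soon as capture is cheap again. A gradual decay would be the more conservative alternative.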

Latest physical-device comparison from the benchmark app on iPhone 15 running iOS 26.5, using Release builds:

| Metric | Main | This PR | Delta |
|---|---|---|---|
| Average FPS | 55.1 | 58.4 | +3.3 |
| Sampled frames | 1,654 | 1,752 | +98 |
| Slow frames | 4.1% (67) | 1.4% (25) | -2.7pp (-42) |
| Frozen frames | 0 | 0 | 0 |
| p50 frame | 16.6ms | 16.6ms | 0ms |
| p90 frame | 16.6ms | 16.6ms | 0ms |
| p99 frame | 74.9ms | 25.7ms | -49.2ms |
| Max frame | 107.1ms | 82.8ms | -24.3ms |

This PR:

[Image: Session Replay physical-device benchmark results for this PR: 58.4 FPS, 1.4% slow frames, p99 25.7ms]

Sample App Replay: https://sentry-sdks.sentry.io/explore/replays/8593836f355243c8863222a12493cb21
Benchmark App Replay: https://sentry-sdks.sentry.io/explore/replays/acc9cfbc692e4d2788b262da0e1fbd80

Main:

[Image: Session Replay physical-device benchmark results on main: 55.1 FPS, 4.1% slow frames, p99 74.9ms]

Sample App Replay: https://sentry-sdks.sentry.io/explore/replays/2c737a44a1404cf0928a07c4e7d43ff8
Benchmark App Replay: https://sentry-sdks.sentry.io/explore/replays/e4891955c9bf4c539d111ebf268a11df

Closes #6885

codecov bot commented Apr 29, 2026
Codecov Report

❌ Patch coverage is 92.62473% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.678%. Comparing base (4ddd96e) to head (e55e247).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...tegrations/SessionReplay/SentrySessionReplay.swift | 91.262% | 27 Missing ⚠️ |
| ...Tools/ViewCapture/SentryViewSubtreeTraversal.swift | 85.365% | 6 Missing ⚠️ |
| ...essionReplay/SentrySessionReplayCaptureGuard.swift | 94.736% | 1 Missing ⚠️ |

Additional details and impacted files
Additional details and impacted files


@@                       Coverage Diff                       @@
##           fix/replay-video-assembly     #7851       +/-   ##
===============================================================
+ Coverage                     87.633%   87.678%   +0.044%     
===============================================================
  Files                            559       562        +3     
  Lines                          32289     32626      +337     
  Branches                       13223     13394      +171     
===============================================================
+ Hits                           28296     28606      +310     
- Misses                          3944      3971       +27     
  Partials                          49        49               
| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...ntryTestUtils/Sources/TestDisplayLinkWrapper.swift | 88.235% <ø> (-0.806%) | ⬇️ |
| ...rces/Swift/Core/Protocol/SentryRedactOptions.swift | 100.000% <ø> (ø) | |
| ...Core/Tools/ViewCapture/SentryUIRedactBuilder.swift | 96.640% <100.000%> (-0.204%) | ⬇️ |
| ...rations/Performance/SentryDisplayLinkWrapper.swift | 100.000% <ø> (ø) | |
| ...egrations/Screenshot/SentryScreenshotOptions.swift | 92.156% <ø> (ø) | |
| ...tegrations/SessionReplay/SentryReplayOptions.swift | 97.175% <ø> (ø) | |
| ...SessionReplay/SentrySessionReplayIntegration.swift | 90.064% <100.000%> (+0.328%) | ⬆️ |
| ...y/SentrySessionReplayRunLoopCaptureScheduler.swift | 100.000% <100.000%> (ø) | |
| Sources/Swift/SentryDependencyContainer.swift | 96.888% <100.000%> (+0.013%) | ⬆️ |
| ...t/SentrySwiftUI/Preview/PreviewRedactOptions.swift | 100.000% <ø> (ø) | |

... and 3 more

... and 3 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ddd96e...e55e247.

philprime (Member) commented:

I quickly glanced over the changes. While I believe the approach is viable (with the adoption of the existing HangTracker.swift), we need a full end-to-end verification. That means running sample apps with a lot of tracked interactions (e.g., lots of scrolling) and comparing them to sample apps without any UI activity.

My main concern is that we no longer take screenshots at a fixed one-second interval; instead, the interval can be longer than that (anything shorter than a second is debounced).

In that case, we need to look into two topics:

  1. Extrapolating screenshots to reach the one-second interval (for 1 FPS). That is, if the time between screenshot A and screenshot B is more than one second, we need to duplicate screenshot A.
  2. Dynamic frame rates in MPEG containers. That is, if screenshots are taken at variable intervals longer than one second, we don't duplicate or extrapolate; instead, the container records that the next frame comes after, e.g., 2.4 seconds.
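For comparison, option 2 can be sketched as computing a variable display duration for each screenshot instead of duplicating frames. `TimedFrame` and `variableRateTimeline` are hypothetical helpers. An actual MP4 writer would express these durations as per-sample presentation timestamps. In the end, the PR implemented option 1 rather than this approach.

```swift
// Hypothetical sketch of option 2: one entry per screenshot with a variable duration.
struct TimedFrame {
    let index: Int                 // which captured screenshot
    let presentationTime: Double   // seconds from segment start
    let duration: Double           // seconds the frame stays on screen
}

func variableRateTimeline(captureTimes: [Double], segmentStart: Double, segmentEnd: Double) -> [TimedFrame] {
    var frames: [TimedFrame] = []
    for (i, time) in captureTimes.enumerated() {
        // A frame lasts until the next capture, or until the end of the segment.
        let next = i + 1 < captureTimes.count ? captureTimes[i + 1] : segmentEnd
        frames.append(TimedFrame(index: i, presentationTime: time - segmentStart, duration: next - time))
    }
    return frames
}

let timeline = variableRateTimeline(captureTimes: [0.0, 1.0, 3.5], segmentStart: 0.0, segmentEnd: 5.0)
print(timeline.map(\.duration))  // [1.0, 2.5, 1.5]
```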

romtsn (Member, Author) commented Jun 9, 2026

Follow-up pushed in 1a2e9444b.

This implements option 1 from Phil’s note: the video writer now holds the last captured bitmap across missing constant-FPS frame slots, including the tail of the requested replay window, so deferred/backed-off captures no longer compress replay time. I also fixed video duration calculation for frame rates above 1 FPS.

For Noah’s run-loop comment, the capture path skips tracking/interactive run-loop modes via the current run-loop mode check; this is covered by testCaptureRunLoopObserver_whenRunLoopIsTracking_shouldNotCapture.

Local verification: make test-catalyst ONLY_TESTING=SentrySessionReplayTests,SentryOnDemandReplayTests,SentryVideoFrameProcessorTests passes 67 tests, 0 failures.

romtsn (Member, Author) commented Jun 9, 2026

Additional production-readiness evidence for Phil’s timing concern:

  • Cross-checked Android: sentry-java already implements the same extrapolation one layer above SimpleVideoEncoder, in ReplayCache.createVideoOf. It walks the segment timeline at 1000 / frameRate intervals and reuses the last known frame when no new screenshot exists for that slot. SimpleMp4FrameMuxer then writes constant-FPS presentation timestamps by output frame index.
  • Cocoa now mirrors that model in SentryVideoFrameProcessor: we still write a constant-FPS MP4, but we hold the last bitmap across skipped output slots instead of letting sparse captures compress video time.
  • Temp E2E smoke build: copied the benchmark app to /tmp, pointed its SPM dependency at /Users/romtsn/Workspace/sentry-cocoa, set the PR DSN, and built Release for iOS 26.4.1 Simulator. Xcode resolved Sentry from the local checkout and xcodebuild succeeded.
  • Temp E2E smoke run: launched the benchmark on iPhone 17 / iOS 26.4.1 Simulator with replay on and replay off automation. Both 15s scrolling runs completed, persisted results, and reported 0 frozen frames.

I would not use the Simulator timing values as proof of the performance improvement; they are not comparable to the physical-device before/after benchmark in the PR description. This only verifies integration/build/run behavior for the current branch. Physical-device benchmarking remains the right validation for final performance confidence.

romtsn (Member, Author) commented Jun 9, 2026

Pushed 6c8b177 to tighten skipped-frame extrapolation: sparse capture timestamps now use ceil when mapped into constant-FPS slots, so a fractional gap such as 2.4s is held through the next frame slot instead of rounding down and shortening the replay. Covered by testProcessFrames_WhenVideoEndHasFractionalGap_ShouldNotCompressDuration; replay-focused Catalyst suite passed locally (68 tests).
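The difference between rounding down and `ceil` is easiest to see with integer milliseconds. The hypothetical helper `slotsCovering` counts how many constant-FPS slots a gap occupies under each rule. For a 2.4 s gap at 1 FPS, rounding down yields 2 slots (2 s of video), while `ceil` yields 3, so the held frame covers the whole gap.

```swift
// Hypothetical helper: how many constant-FPS slots a capture gap occupies.
// Integer milliseconds avoid floating-point rounding at exact slot boundaries.
func slotsCovering(gapMs: Int, frameRate: Int) -> (roundedDown: Int, ceiled: Int) {
    precondition(gapMs >= 0 && frameRate > 0 && frameRate <= 1000)
    let stepMs = 1000 / frameRate
    let roundedDown = gapMs / stepMs              // truncating division
    let ceiled = (gapMs + stepMs - 1) / stepMs    // integer ceil
    return (roundedDown: roundedDown, ceiled: ceiled)
}

let gap = slotsCovering(gapMs: 2400, frameRate: 1)
print(gap.roundedDown, gap.ceiled)  // 2 3
```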

romtsn marked this pull request as ready for review June 10, 2026 14:22
romtsn added the ready-to-merge label (Use this label to trigger all PR workflows) Jun 10, 2026
sentry bot commented Jun 10, 2026
📲 Install Builds

iOS

| App Name | App ID | Version | Configuration |
|---|---|---|---|
| SDK-Size | io.sentry.sample.SDK-Size | 9.18.0 (1) | Release |

⚙️ sentry-cocoa Build Distribution Settings

github-actions bot (Contributor) commented Jun 10, 2026

Performance metrics 🚀

| | Plain | With Sentry | Diff |
|---|---|---|---|
| Startup time | 1221.98 ms | 1264.43 ms | 42.45 ms |
| Size | 24.14 KiB | 1.19 MiB | 1.17 MiB |

Baseline results on branch: fix/replay-video-assembly

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 4832942 | 1230.43 ms | 1265.88 ms | 35.45 ms |
| f98aa1c | 1229.22 ms | 1259.48 ms | 30.26 ms |
| f17e4d9 | 1224.52 ms | 1255.33 ms | 30.81 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 4832942 | 24.14 KiB | 1.18 MiB | 1.16 MiB |
| f98aa1c | 24.14 KiB | 1.18 MiB | 1.16 MiB |
| f17e4d9 | 24.14 KiB | 1.18 MiB | 1.16 MiB |

Previous results on branch: perf-session-replay-capture-backoff

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 14db94d | 1229.34 ms | 1261.52 ms | 32.18 ms |
| c28cabb | 1228.83 ms | 1255.35 ms | 26.52 ms |
| 31d64de | 1224.31 ms | 1257.52 ms | 33.21 ms |
| 9933251 | 1233.05 ms | 1263.65 ms | 30.60 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 14db94d | 24.14 KiB | 1.19 MiB | 1.17 MiB |
| c28cabb | 24.14 KiB | 1.19 MiB | 1.17 MiB |
| 31d64de | 24.14 KiB | 1.19 MiB | 1.16 MiB |
| 9933251 | 24.14 KiB | 1.19 MiB | 1.16 MiB |

romtsn added 27 commits June 19, 2026 15:48
Move activity scanning onto SentryViewSubtreeTraversal so capture guard checks share the same excluded-subtree handling as view capture. Keep excluded subtrees from triggering interaction backoff during replay capture.
Use an enum with static helpers for SentrySessionReplayCaptureGuard and remove the unused stored instance from SentrySessionReplay.
Replace the local nanoseconds-per-second constant with NSEC_PER_SEC when calculating screenshot capture duration.
Promote buffer replays to full session replay on the main thread so lastScreenshotAt remains main-thread confined. Add coverage for captureReplay from a background queue.
Rotate the capture scheduler token on every observer start so a delayed stop from an older observer generation cannot remove the active observer.

Guard full-session state with the existing replay lock so background replay capture and main run-loop capture read a consistent mode.
Guard resume starts with a scheduler generation so a later pause can invalidate work queued to the main thread.
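The token-rotation and generation-guard commits follow a common pattern. Every start gets a fresh generation, and any delayed stop or queued resume checks that its generation is still current before acting. Below is a minimal, single-threaded sketch using a hypothetical `CaptureSchedulerToken` type. The real scheduler additionally relies on its lock and main-thread confinement, which are omitted here.

```swift
// Hypothetical, single-threaded sketch of generation tokens for a capture scheduler.
final class CaptureSchedulerToken {
    private(set) var generation = 0
    private(set) var isRunning = false

    /// Every start rotates the generation; callers keep the returned value.
    func start() -> Int {
        generation += 1
        isRunning = true
        return generation
    }

    /// A stop only applies to the generation that is currently running.
    @discardableResult
    func stop(generation token: Int) -> Bool {
        guard isRunning, token == generation else { return false }
        isRunning = false
        return true
    }

    /// A pause invalidates every outstanding generation, including queued resumes.
    func invalidate() {
        generation += 1
        isRunning = false
    }

    func isCurrent(_ token: Int) -> Bool {
        token == generation
    }
}

let scheduler = CaptureSchedulerToken()
let older = scheduler.start()
let newer = scheduler.start()                // restart rotates the token
print(scheduler.stop(generation: older))     // false: stale stop is ignored
print(scheduler.stop(generation: newer))     // true

let queuedResume = scheduler.generation      // captured before hopping to the main thread
scheduler.invalidate()                       // a pause lands before the queued work runs
print(scheduler.isCurrent(queuedResume))     // false: the queued resume bails out
```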

Run scheduler start and stop on the main thread synchronously to keep observer installation and removal ordered.
Compute excluded view class patterns once per subtree traversal instead of rebuilding the set for every visited view.
Avoid synchronously hopping to the main thread from SentrySessionReplay deinit, while keeping normal pause and resume scheduler stops ordered.
This reverts commit eb9eb9e.

Keep deinit cleanup consistent with the explicit session replay lifecycle paths.
Do not move the next screenshot deadline while full-session replay is paused. This lets replay capture immediately after resume when the configured interval already elapsed during the pause.
Return whether the run-loop observer was installed and clear replay scheduler state when observer creation fails. This avoids reporting replay as running when no capture callback can fire.
Use the capture scheduler token itself to represent running state and remove the duplicate boolean. Keep observer creation failure as a local scheduler no-op, matching the existing HangTracker approach.
Keep the original explanation for using type descriptions in the shared view subtree traversal helper.
Extract the default CameraUI subtree exclusion pattern into a named constant.
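A stateless activity detector with excluded subtrees might look like the sketch below. `ViewNode` stands in for `UIView` so the example runs without UIKit. `isAnimating` is a placeholder for whatever activity signal the guard actually checks. The `"CameraUI"` pattern is also an assumption: the commit names a CameraUI exclusion constant but not its value. Following the commit notes, the detector is an enum with static helpers and builds the exclusion set once per traversal.

```swift
import Foundation

// Hypothetical stand-in for UIView so the sketch runs without UIKit.
struct ViewNode {
    let typeName: String
    let isAnimating: Bool       // placeholder activity signal
    var children: [ViewNode] = []
}

// Stateless detector: an enum with static helpers, no stored state.
enum ActivityGuardSketch {
    // Assumed default exclusion pattern; the real constant's value is not shown in this PR text.
    static let defaultExcludedPatterns = ["CameraUI"]

    static func hasActivity(in root: ViewNode, userExcluded: [String] = []) -> Bool {
        // Build the exclusion set once per traversal, not once per visited view.
        let excluded = Set(defaultExcludedPatterns + userExcluded)
        return visit(root, excluded: excluded)
    }

    private static func visit(_ node: ViewNode, excluded: Set<String>) -> Bool {
        // Excluded subtrees are skipped entirely, so they cannot trigger interaction backoff.
        if excluded.contains(where: { node.typeName.contains($0) }) { return false }
        if node.isAnimating { return true }
        return node.children.contains { visit($0, excluded: excluded) }
    }
}

let screen = ViewNode(typeName: "UIView", isAnimating: false, children: [
    ViewNode(typeName: "UILabel", isAnimating: false),
    ViewNode(typeName: "CameraUI.PreviewView", isAnimating: true),  // excluded subtree
])
print(ActivityGuardSketch.hasActivity(in: screen))  // false
```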
romtsn force-pushed the perf-session-replay-capture-backoff branch from 942a18b to 5f74b10 June 19, 2026 13:49
Comment on lines +177 to +181
let resumeDate = dateProvider.date()
let schedulerGeneration = lock.synchronized { () -> Int? in
    if _isFullSession && isSessionPaused {
        return nil
    }


Bug: The lifecycle resume() method incorrectly checks the network-related isSessionPaused flag, preventing session replay from restarting when the app foregrounds while offline.
Severity: HIGH

Suggested Fix

Decouple the lifecycle resume logic from the network pause state. Remove the if _isFullSession && isSessionPaused check from the resume() method. This will allow the capture scheduler to restart when the app is foregrounded, regardless of the current network connectivity status, ensuring the system is ready to capture once online.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: Sources/Swift/Integrations/SessionReplay/SentrySessionReplay.swift#L177-L181

Potential issue: A race condition between app lifecycle and network connectivity events can cause session replay to stop recording. If the app goes offline, is backgrounded, and then foregrounded while still offline, the capture scheduler fails to restart. This occurs because the lifecycle `resume()` method incorrectly checks the `isSessionPaused` flag (which is set due to network status) and returns early. As a result, no session replay data is captured until the device regains connectivity, which may happen much later or not at all, leading to data loss.
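The reported sequence can be reduced to a few lines. The hypothetical `ResumeModel` below borrows the flag names from the comment (`_isFullSession` becomes `isFullSession`), but it is not the real `resume()`. It only shows why a lifecycle resume that bails out on the network flag leaves the scheduler stopped, and what the suggested decoupling changes.

```swift
// Hypothetical reduction of the reported race; not the SDK's resume() implementation.
struct ResumeModel {
    var isFullSession = true
    var isSessionPaused = false   // set by reachability (network) pauses
    var schedulerRunning = true

    /// Lifecycle resume as described in the report: bails out while network-paused.
    mutating func resumeCoupled() {
        if isFullSession && isSessionPaused { return }
        schedulerRunning = true
    }

    /// Suggested fix: lifecycle resume restarts the scheduler regardless of the network flag.
    mutating func resumeDecoupled() {
        schedulerRunning = true
    }
}

var coupled = ResumeModel()
coupled.isSessionPaused = true     // 1. device goes offline
coupled.schedulerRunning = false   // 2. app is backgrounded (lifecycle pause)
coupled.resumeCoupled()            // 3. app is foregrounded while still offline
print(coupled.schedulerRunning)    // false

var decoupled = ResumeModel()
decoupled.isSessionPaused = true
decoupled.schedulerRunning = false
decoupled.resumeDecoupled()
print(decoupled.schedulerRunning)  // true
```

With the decoupled variant, whether screenshots are actually taken while the network flag is set remains a separate decision that the capture path would still need to make.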


Labels

ready-to-merge Use this label to trigger all PR workflows


Development

Successfully merging this pull request may close these issues.

Low overhead session replay with runloop observers

3 participants