Episode Recorder#810
Conversation
Record every value drawn by an enabled variation's sampler so downstream sensitivity-analysis tooling has the input factors that produced each episode. This adds a sample-observer layer the recorder builds on: SamplerBase gains add_listener/remove_listener and a sample() template method that notifies listeners around the concrete _sample(); VariationBase gains add_sample_listener/remove_sample_listener, re-binding subscriptions onto the sampler rebuilt by apply_cfg so they survive cfg swaps. ArenaEnvBuilder constructs a VariationRecorder, attaches it after Hydra overrides but before any sampling, and exposes it on env.unwrapped. The recorder is stashed on the builder rather than the env cfg because the configclass __post_init__ deep-copies its attributes, which would orphan the listener closures. Signed-off-by: alex <amillane@nvidia.com>
…riations_recording
…riations_recording
…riations_recording
…ecording_per_episode
Move the inline RecordVideo / CameraObsVideoRecorder wrapping out of policy_runner and eval_runner into a single video_recording module (VideoRecordingCfg + wrap_env_for_video), so the gym-wrapper plumbing lives in one place instead of being duplicated across the two runners. Signed-off-by: alex <amillane@nvidia.com>
- Move video_recording.py to isaaclab_arena/utils/ and drop its module docstring. - Rename VideoRecordingCfg fields and the corresponding CLI flags: --video -> --record_viewport_video, --camera_video -> --record_camera_video, --video_dir -> --video_base_dir. - Write videos into a reverse-dated run subdirectory (timestamped_run_dir), shared across all jobs in an eval run, mirroring Isaac Lab's log layout. - Guard _resolve_video_length when both num_steps and num_episodes are None. - Tidy wrap_env_for_video docstring args. Signed-off-by: alex <amillane@nvidia.com>
Use underscore-only flag names (--record_viewport_video, --record_camera_video, --video_base_dir) to match the convention used by every other CLI argument. Signed-off-by: alex <amillane@nvidia.com>
…_results_2 Signed-off-by: alex <amillane@nvidia.com>
There was a problem hiding this comment.
🤖 Isaac Lab-Arena Review Bot
Summary
Introduces an EpisodeRecorderManager (Isaac Lab ManagerBase + pluggable EpisodeRecorderTermCfg terms), centralizes per-env episode-index tracking on the env, and reworks VariationRecord to attribute draws per (env_id, episode). The index/attribution logic is consistent end-to-end — draws are advanced before reset events fire (tagged with the episode they begin), records are captured before super()._reset_idx (so the finished episode is still intact), and the camera recorder reads get_episode_index()-1 so video episode numbers stay in lockstep with the JSONL records. Old VariationRecord rendering helpers were removed cleanly with no remaining callers. Nicely covered by tests. Two minor improvements below; nothing blocking.
Findings
🔵 Improvement: common_terms.py:29 — the per-episode timestamp uses a naive local-time value; UTC is unambiguous for provenance across machines/timezones.
🔵 Improvement: eval_runner.py:328 — per-episode JSONL is now written unconditionally on every run (no flag gate). Worth confirming this always-on default is intended.
Test Coverage
Good. test_episode_recorder.py exercises core terms (with deterministic success/failure per env), variation recording, and a custom term, using the inner/outer run_simulation_app_function pattern with deferred imports — consistent with the unmarked (Phase 1) sim tests in test_variation_recorder.py. The variation-recorder unit tests cover per-(env, episode) attribution, build-time-applies-to-all, and the duplicate-draw assert.
Verdict
Ship it
Greptile SummaryThis PR introduces an
Confidence Score: 3/5The episode recording path writes incorrect data before any simulation steps run, which would corrupt evaluation JSONL output on every run. The isaaclab_arena/environments/isaaclab_arena_manager_based_env.py (_reset_idx override and _first_reset guard) and isaaclab_arena/variations/variation_recorder.py (on_sample closure episode-index attribution during init reset). Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Caller as policy_runner / eval_runner
participant Env as IsaacLabArenaManagerBasedRLEnv
participant ER as EpisodeRecorderManager
participant VR as VariationRecorder
participant Disk as JSONL File
Caller->>Env: make_registered() [__init__]
Note over Env: _first_reset = True
Env->>VR: bind_env(self)
Env->>Env: super().__init__() → _reset_idx([all])
Note over Env: _first_reset guard → skip record, set False
Caller->>ER: set_output_path(path) [creates/clears file]
Caller->>Env: env.reset() ← ManagerBasedRLEnv.reset() calls _reset_idx
Note over Env: _first_reset=False ⚠️ phantom record fires
Env->>ER: record_pre_reset([all envs])
ER->>Disk: append phantom zero-length records
Env->>Env: _advance_episode_indices (counters → 1)
loop Each step() with termination
Env->>Env: _reset_idx(terminated_envs)
Env->>ER: record_pre_reset(env_ids)
ER->>ER: call term funcs (core + variations + custom)
ER->>Disk: append real episode record
Env->>VR: on_sample callback (variation draw)
VR->>VR: record_runtime_sample(env_id, episode_idx)
Env->>Env: _advance_episode_indices
end
Caller->>Caller: build_report(output_dir) → index.html
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant Caller as policy_runner / eval_runner
participant Env as IsaacLabArenaManagerBasedRLEnv
participant ER as EpisodeRecorderManager
participant VR as VariationRecorder
participant Disk as JSONL File
Caller->>Env: make_registered() [__init__]
Note over Env: _first_reset = True
Env->>VR: bind_env(self)
Env->>Env: super().__init__() → _reset_idx([all])
Note over Env: _first_reset guard → skip record, set False
Caller->>ER: set_output_path(path) [creates/clears file]
Caller->>Env: env.reset() ← ManagerBasedRLEnv.reset() calls _reset_idx
Note over Env: _first_reset=False ⚠️ phantom record fires
Env->>ER: record_pre_reset([all envs])
ER->>Disk: append phantom zero-length records
Env->>Env: _advance_episode_indices (counters → 1)
loop Each step() with termination
Env->>Env: _reset_idx(terminated_envs)
Env->>ER: record_pre_reset(env_ids)
ER->>ER: call term funcs (core + variations + custom)
ER->>Disk: append real episode record
Env->>VR: on_sample callback (variation draw)
VR->>VR: record_runtime_sample(env_id, episode_idx)
Env->>Env: _advance_episode_indices
end
Caller->>Caller: build_report(output_dir) → index.html
Reviews (5): Last reviewed commit: "Merge branch 'main' into alex/feature/re..." | Re-trigger Greptile |
There was a problem hiding this comment.
🤖 Isaac Lab-Arena Review Bot
Summary
The EpisodeRecorderManager design is clean and extensible: terms compose nicely, the per-env episode-index ownership lives on the env, and the variation-recording attribution (advance-before-super so reset-mode draws are tagged with the episode they begin) is carefully reasoned and well-tested. The inner/outer sim-test pattern is used correctly across the three new tests. There is, however, one blocking bug in the eval CLI, plus a couple of scope/design questions worth resolving before merge.
Design, Boundaries & Scope
Scope of the diff. This PR is titled Episode Recorder, but the diff against the base branch also adds isaaclab_arena/progress_tracking/ (~700 lines + 762-line test), isaaclab_arena_examples/.../review_gui/ (~900 lines + 336-line test), and the intent_compiler/spec_io rework — none of which are on alex/feature/evaluation_visualization. Is that intended to land through this PR, or are these meant to be reviewed/merged as separate stack entries? As one 71-file diff it is hard to review the recorder on its own merits. If these are genuinely separate features, splitting them out would make each reviewable.
Always-on recording with no opt-out. ArenaEnvBuilder unconditionally composes the core + variations terms, and _reset_idx fires record_pre_reset on every reset for every env — so every Arena env (teleop, datagen, training, not just eval) now pays the per-reset cost and accumulates a dict per finished episode in _records, with no flag to disable it and no bound within an env's lifetime. For eval this is bounded and the point; for other env uses it is pure overhead. Worth a thought: should episode recording be opt-in (or at least the always-appended buffer be bounded/disable-able) so it doesn't silently change the default path for non-eval workflows?
Findings
See the inline comment for the blocking issue.
Test Coverage
Good. The three new tests (test_core_terms, test_variations_recorded, test_custom_term) cover the built-in terms, variation attribution, and custom terms, with substantive assertions (per-env episode_in_env increment, deterministic success/failure via fixed poses, exact variation sample values, custom-field derivation). They correctly use the inner/outer run_simulation_app_function pattern; since that helper runs in-process, Phase 1 (no marker) is the right placement, consistent with the existing test_variation_recorder sim tests. The new test_eval_runner_graph_spec_with_variation is correctly marked with_subprocess. No gap to flag — though note that none of these exercise the duplicate-argument CLI path below, which is why it slipped through.
Verdict
Minor fixes needed (the CLI duplicate is a hard blocker, but it's a one-line fix).
cvolkcvolk
left a comment
There was a problem hiding this comment.
Thanks for adding this! Lgtm. Only minor comments
There was a problem hiding this comment.
🤖 Isaac Lab-Arena Review Bot
Summary
Adds an EpisodeRecorderManager (term-based, mirroring the metrics manager) that accumulates per-episode records and writes them as JSONL, wires it through the env/builder/runners, threads env_ids through the sampler → variation-recorder so each draw is attributed to its (env, episode), and surfaces the records in the HTML report. The design is clean and well-tested. One critical issue: the eval-runner CLI now registers two arguments twice, which crashes argument parsing; plus one design question on always-on recording.
Design, Boundaries & Scope
_compose_episode_recorders_cfg wires the core + variations terms in unconditionally, so every env builds an EpisodeRecorderManager and record_pre_reset appends a record per env on every reset — yet only the eval/policy runners ever call write(), and _records is never cleared. For long-running non-eval paths (RL training, teleop, interactive) this accumulates one dict per env per episode for the lifetime of the env, never persisted and never freed. Worth considering whether recording should be opt-in (a cfg flag, or attached only when an output path is configured), or at least clear _records after write(). Raised inline.
Findings
See inline comments.
Test Coverage
New test_episode_recorder.py covers the core terms, variation recording, and a custom term via the inner/outer run_simulation_app_function pattern; since it runs in-process, headless, without cameras, it correctly lands in Phase 1 (matching the sibling HDR test in test_variation_recorder.py). The variation-recorder tests are extended for the new per-(env, episode) attribution, and the camera-recorder tests are updated for the centralized episode index. Coverage is solid.
Verdict
Minor fixes needed — the duplicate CLI argument block is a hard startup crash and must be removed; the always-on recording question is worth a look.
There was a problem hiding this comment.
🤖 Isaac Lab-Arena Review Bot
Summary
This PR adds an EpisodeRecorderManager (manager + term pattern, mirroring metrics) that accumulates per-episode records and writes them as JSONL, surfaces them in the evaluation report, and renames the --video_base_dir CLI flag to --output_base_dir. The term-based design is clean and consistent with existing Arena conventions, and the test coverage is solid (core/variation/custom terms, plus the per-(env, episode) VariationRecord keying). Two things need attention before merge: a duplicate CLI argument that crashes the eval runner at startup, and the recorder being always-on for every env (including training) with an unbounded, never-flushed buffer.
Design, Boundaries & Scope
The recorder is wired in unconditionally: ArenaEnvBuilder always composes the core + variations terms, load_managers always constructs the manager, and _reset_idx calls record_pre_reset on every reset. Because IsaacLabArenaManagerBasedRLEnv is the shared eval and training env, every workflow now appends one dict per env per episode to _records for the life of the env — and write() never clears the buffer, with no cap. In eval this is bounded (short runs, written per rebuild), but a long RL training run resets millions of times and never calls write(), so the buffer grows without bound for data nobody consumes. This trips the conservative-defaults / lean-by-default invariants. Could recording be opt-in (or the buffer be flushed/capped) so the cost is paid only where the records are actually written out? Raised inline on record_pre_reset.
Findings
See inline comments.
Test Coverage
Good. test_episode_recorder.py follows the inner/outer run_simulation_app_function pattern with deferred imports and returns True; it exercises core terms (with deterministic per-env success/failure), variation recording, and a custom term. No camera/subprocess marker is needed — consistent with other run_simulation_app_function tests (e.g. test_run_time_variations.py), so it lands in Phase 1 correctly. The VariationRecord per-(env, episode) keying and the double-draw assertion are also covered.
Verdict
Minor fixes needed — the duplicate CLI arg is a hard blocker (eval runner won't start); the always-on unbounded buffer is worth resolving or consciously deferring.
There was a problem hiding this comment.
🤖 Isaac Lab-Arena Review Bot
Summary
Adds an EpisodeRecorderManager (the manager-with-terms pattern) that accumulates per-episode metadata + variation samples, persists them as JSONL on request, and surfaces them in the HTML evaluation report; reworks VariationRecord to track draws per (env, episode); and renames the --video_base_dir CLI flag to --output_base_dir. The design is clean and the new sim tests (core / variations / custom-term, via the inner/outer pattern) are solid. Two things stand out: the recorder does its full per-episode work on every reset even when nothing is being written, and a few hunks look out of scope.
Findings
🟡 Warning: recording/episode_recorder_manager.py:88 — record_pre_reset runs every term + json.dumps for every env on every reset, but records are only ever written to disk (no in-memory consumer). When no output path is set — the default for training, teleop, and any non-recording rollout, where this manager is still constructed and _reset_idx still calls it — that's wasted work on everyone's hot path. Suggest short-circuiting when no path is set.
🟡 Warning: isaaclab_arena_environments/robolab/*_linked.yaml — the added episode_length_s: 20.0 (4 robolab configs) changes the runtime episode length of production environments and looks unrelated to the episode recorder. Intended as part of this PR, or a stray change for its own MR?
🔵 Improvement: recording/episode_recorder_manager.py:134 — stale write() reference (no such method); same dangling mention in report.py.
Heads-up (not a code finding): the --video_base_dir → --output_base_dir rename, with default /eval/videos → /eval/output, is a breaking change to a public CLI flag. Docs were updated in this PR, so this is just a note for downstream users/scripts.
Test Coverage
test_episode_recorder.py exercises the core term, variation recording, and a custom term, each through run_simulation_app_function with the inner/outer pattern (inner _test_* returns True, outer asserts). Since run_simulation_app_function runs an in-process persistent sim app and these use no cameras, leaving them unmarked (Phase 1) matches the convention used by test_close_door / test_object_pose_randomization. The camera- and variation-recorder tests were updated to the centralized episode index. Coverage looks good.
Verdict
Minor fixes needed
There was a problem hiding this comment.
🤖 Isaac Lab-Arena Review Bot
Summary
This PR adds a manager-based EpisodeRecorderManager that streams per-episode metadata (core fields + variation samples + user-registered terms) to JSONL, wires it into the env via a _reset_idx override, and surfaces the records in the HTML evaluation report. The per-episode-index bookkeeping and its careful pre/post-super()._reset_idx ordering (record the just-finished episode, then advance, then let reset events tag the new episode) is well thought through and the variation attribution lines up cleanly with the video filenames. The design is solid; my notes below are a couple of small things plus one question about default-path cost.
Design, Boundaries & Scope
- Recording fires on every reset for every env, even when nothing is being written.
IsaacLabArenaManagerBasedRLEnv._reset_idxalways callsrecord_pre_reset, and the core/variations terms are always composed onto every RL env. With nooutput_pathset, every term still runs and every record isjson.dumps-validated per env per reset before being discarded by the_append_recordno-op. Since this is the same env class used for RL training (frequent resets, largenum_envs), this adds work to the default path for everyone for no benefit. See the inline suggestion to short-circuit when no output path is configured. (Conservative-defaults / lean-by-default invariant.) - Out-of-scope config churn? Four
isaaclab_arena_environments/robolab/*_linked.yamlfiles change here: three addepisode_length_s: 20.0andbutter_raisin_box_grey_bin_linked.yamlmerely reorders an existingepisode_length_sline (pure churn). These don't seem related to the episode recorder — were they intended for this PR, or should they be split out?
Findings
🟡 Warning: isaaclab_arena/recording/episode_recorder_manager.py — episode recorder terms run unconditionally per reset even when no output path is set (see inline; adds avoidable per-reset overhead to the default/training path).
🔵 Improvement: isaaclab_arena/visualization/report.py:23 — stale reference to a EpisodeRecorderManager.write method that doesn't exist.
Test Coverage
Good coverage: test_episode_recorder.py adds three sim tests (core terms, variation recording, custom term) using the inner/outer run_simulation_app_function pattern with deferred imports, and asserts substantive things (per-env episode_in_env increments, deterministic success/failure per env, exact key schemas, variation values). test_variation_recorder.py is updated for the new per-(env, episode) VariationRecord API including the duplicate-draw assertion and build-time-applies-to-all semantics. No camera/subprocess markers are needed (matches the existing test_hdr_... test in the same file). Nothing missing.
Verdict
Minor fixes needed
Summary
Introduce an
EpisodeRecorderManagerthat records per-episode data.Detailed description
envthat accumulates per-episode data and optionally writes it to disk.EpisodeRecorderTermCfgsto record their custom data.Note
IsaacLabManagerBasedRLEnvNot done
run_in_chunks. Will be handled in later MRs.