Skip to content

SensitivityDataset: read episode_results.jsonl directly#811

Merged
cvolkcvolk merged 14 commits into
mainfrom
cvolk/refactor/sensitivity_episode_results_reader
Jun 24, 2026
Merged

SensitivityDataset: read episode_results.jsonl directly#811
cvolkcvolk merged 14 commits into
mainfrom
cvolk/refactor/sensitivity_episode_results_reader

Conversation

@cvolkcvolk

@cvolkcvolk cvolkcvolk commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Summary

Sensitivity analysis reads recorder output directly

Detailed description

  • Why: the per-episode recorder now emits an episode_results.jsonl carrying the realized variation draws, so the old hand-written factors.yaml schema plus bespoke nested episode_summary.jsonl pair is redundant.
  • Reader: added SensitivityDataset.from_episode_results, which discovers the factors from each row's variations block — a number becomes continuous, a numeric vector splits into one factor per component, a string becomes categorical (choices learned from the data). generate_report takes a single --episode_results input, with --factors to analyze a subset.
  • Simpler data model: removed the factors.yaml path (from_files), the FactorSchema wrapper, and FactorSpec.dim. It is now just FactorSpec (one factor = one theta column) and SensitivityDataset (factors + theta + x). Continuous-only sweeps fit NPE, mixed sweeps fit MNPE automatically.
  • Robustness: factors that never varied are dropped (a constant categorical otherwise crashed the MNPE fit); an all-constant input errors clearly.

… FactorSchema

- Add SensitivityDataset.from_episode_results: discover the factor schema from the
  recorder's per-episode variation draws (number -> continuous, numeric vector ->
  one scalar factor per component, string -> categorical), building theta/x directly.
- Make generate_report single-mode (--episode_results); remove the factors.yaml /
  episode_summary path (from_files, FactorSchema.from_yaml, arena_env_args helpers).
- Remove the FactorSchema dataclass: SensitivityDataset now holds factors:
  list[FactorSpec] and owns factor_columns; consumers use dataset.factors.
- Replace the two from_files tests with episode_results equivalents covering the
  continuous-first layout, integer-coded categoricals, and inferred ranges.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
- Remove the vestigial FactorSpec.dim field: vector variations are split into
  scalar factors in from_episode_results, so a factor always spans one theta column.
- Simplify factor_columns to one-column-per-factor (continuous-first) and the
  constructor's width check to len(factors); drop the total_factor_dim arithmetic.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
- Vector variation components are named with an opaque positional suffix (key[0], ...);
  note the follow-up to emit semantic names (e.g. camera x_right/y_down/z_forward) at the recorder.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
arena-review-bot[bot]

This comment was marked as outdated.

…ests

- Drop factors that took a single value across all episodes (constant continuous or
  one-choice categorical) instead of warn-and-keep; a constant categorical otherwise
  crashed MNPE's mixed-density transform during fit. Assert if nothing varied.
- Update concept_sensitivity_analysis.rst to the single --episode_results flow with
  discovered schema; remove the stale factors.yaml / episode_summary docs and CLI.
- Add tests: constant-factor dropping, all-constant raises, bool->categorical,
  inconsistent factor set across rows, non-numeric vector component.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>

@arena-review-bot arena-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab-Arena Review Bot

Summary

Clean, well-scoped refactor: the sensitivity analyzer now discovers its factor schema directly from the recorder's episode_results.jsonl variations block, removing the hand-authored factors.yaml indirection, the FactorSchema wrapper, and the vestigial FactorSpec.dim. The schema-from-data inference (number → continuous, numeric vector → one factor per component, string/bool → categorical) is sensibly chosen and the constant-factor drop is a real correctness guard (a constant continuous factor would give a zero-width range and a constant categorical crashes MNPE). Tests cover the new paths well, including the negative cases. This is the kind of removal-and-simplify change the codebase benefits from.

The CLI/generate_report/from_files signature change is a breaking change to this entry point, but it's the stated purpose of the PR, the docs are updated in lockstep, and the recorder that emits this format isn't shipped yet — so downstream blast radius is minimal.

Findings

🔵 Improvement: dataset.py:315 — outcome values get no type validation or row context, unlike the careful asserts on the variations side. A malformed recorder field (e.g. a string outcome) would raise a bare ValueError/TypeError with no pointer to the offending row. Worth a small guarded conversion mirroring _flatten_variation_value so a bad outcome fails as legibly as a bad variation does.

Test Coverage

Good. The two from_files tests are replaced with from_episode_results equivalents covering vector-splitting, mixed continuous/categorical with inferred ranges/choices, constant-factor dropping, all-constant raising, bool→categorical, inconsistent factor sets, and non-numeric vector components. These are pure-CPU analysis tests (no sim), so the inner/outer run_simulation_app_function pattern correctly doesn't apply.

Verdict

Ship it

Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated
@cvolkcvolk cvolkcvolk marked this pull request as ready for review June 23, 2026 09:50
@greptile-apps

greptile-apps Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR replaces the two-file input schema (factors.yaml + episode_summary.jsonl) with a single episode_results.jsonl whose factor schema is auto-discovered from each row's variations block. The old hand-written FactorSchema wrapper and FactorSpec.dim field are removed; every factor is now exactly one theta column.

  • New episode_results_reader.py auto-discovers continuous/categorical factors, drops constant factors (preventing estimator crashes), warns on unevenly-sampled categoricals, and validates the recorder contract across all rows.
  • dataset.py simplified: FactorSchema and from_files are removed; SensitivityDataset is now a pure in-memory container wrapping a list[FactorSpec] plus the two tensors; FactorSpec.range is now a flat (low, high) tuple instead of a list of pairs.
  • Tests rewritten to cover the new reader's schema-discovery paths, constant-factor dropping, imbalance warnings, and error handling for malformed rows.

Confidence Score: 5/5

Safe to merge — the refactoring is self-contained, all call sites are updated consistently, and the test suite comprehensively covers the new reader paths.

The schema-discovery logic is correct end-to-end: vector expansion, constant-factor dropping, type-consistency checks across rows, and the null/non-dict variations guard all behave as designed. The range representation change (list-of-pairs → flat tuple) is applied uniformly across dataset.py, analyzer.py, plotting.py, and the synthetic test fixture. Issues flagged in previous review threads — null variations crashing the pre-check, double min/max traversal, and the interleaved syntax error — are all resolved in the current code. No regressions or new defects identified.

No files require special attention.

Important Files Changed

Filename Overview
isaaclab_arena/analysis/sensitivity/episode_results_reader.py New file: reads episode_results.jsonl, discovers factors, drops constants, warns on imbalanced categoricals; null-variations and syntax issues from previous review threads are addressed.
isaaclab_arena/analysis/sensitivity/dataset.py FactorSchema removed; SensitivityDataset now holds a plain list[FactorSpec]; FactorSpec.range simplified from list[tuple] to a single (low, high) tuple; factor_columns recomputed inline.
isaaclab_arena/analysis/sensitivity/analyzer.py Adapts to the new flat range tuple: range[0]/range[1] replaces range[0][0]/range[0][1]; dataset.schema.factors replaced by dataset.factors throughout.
isaaclab_arena/tests/test_sensitivity_analysis.py Tests fully rewritten around dataset_from_episode_results; covers schema discovery, vector expansion, constant dropping, bool categoricals, imbalance warnings, and error paths.
isaaclab_arena/tests/sensitivity_synthetic.py FactorSchema removed; FactorSpec.range now takes the raw tuple directly; SensitivityDataset constructed with a plain factor list.
isaaclab_arena/analysis/sensitivity/generate_report.py CLI updated: --factors_yaml and --episode_summary replaced by --episode_results and optional --factors; delegates to dataset_from_episode_results.
isaaclab_arena/analysis/sensitivity/plotting.py Minimal adaptation: factor.range[0] unpacking replaced by factor.range; dataset.schema.factors replaced by dataset.factors.
docs/pages/concepts/policy/concept_sensitivity_analysis.rst Docs updated to reflect single-file input, schema discovery, and new CLI flags; accurately matches the new implementation.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant User
    participant generate_report
    participant episode_results_reader
    participant SensitivityDataset
    participant SensitivityAnalyzer
    participant plot_marginals

    User->>generate_report: episode_results.jsonl + outcome_names + factor_names
    generate_report->>episode_results_reader: dataset_from_episode_results()
    episode_results_reader->>episode_results_reader: _read_rows() — parse JSONL
    episode_results_reader->>episode_results_reader: _discover_factor_values() — flatten vectors, validate rows
    episode_results_reader->>episode_results_reader: _build_factor_columns() — drop constants, infer ranges
    episode_results_reader->>episode_results_reader: _build_outcome_columns()
    episode_results_reader->>SensitivityDataset: SensitivityDataset(factors, theta, x)
    SensitivityDataset-->>generate_report: dataset
    generate_report->>SensitivityAnalyzer: SensitivityAnalyzer(dataset)
    generate_report->>SensitivityAnalyzer: fit()
    generate_report->>SensitivityAnalyzer: sample_posterior(observation)
    SensitivityAnalyzer-->>generate_report: posterior samples
    generate_report->>plot_marginals: plot_marginals(samples, dataset, observation)
    plot_marginals-->>User: figure saved to output_path
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant User
    participant generate_report
    participant episode_results_reader
    participant SensitivityDataset
    participant SensitivityAnalyzer
    participant plot_marginals

    User->>generate_report: episode_results.jsonl + outcome_names + factor_names
    generate_report->>episode_results_reader: dataset_from_episode_results()
    episode_results_reader->>episode_results_reader: _read_rows() — parse JSONL
    episode_results_reader->>episode_results_reader: _discover_factor_values() — flatten vectors, validate rows
    episode_results_reader->>episode_results_reader: _build_factor_columns() — drop constants, infer ranges
    episode_results_reader->>episode_results_reader: _build_outcome_columns()
    episode_results_reader->>SensitivityDataset: SensitivityDataset(factors, theta, x)
    SensitivityDataset-->>generate_report: dataset
    generate_report->>SensitivityAnalyzer: SensitivityAnalyzer(dataset)
    generate_report->>SensitivityAnalyzer: fit()
    generate_report->>SensitivityAnalyzer: sample_posterior(observation)
    SensitivityAnalyzer-->>generate_report: posterior samples
    generate_report->>plot_marginals: plot_marginals(samples, dataset, observation)
    plot_marginals-->>User: figure saved to output_path
Loading

Reviews (8): Last reviewed commit: "Sensitivity: fix the --factors pre-check..." | Re-trigger Greptile

Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated
Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated
cvolkcvolk and others added 3 commits June 23, 2026 12:01
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- from_episode_results / generate_report take an optional factor_names allowlist
  (--factors); a vector variation is selected by its base name and keeps all components.
  Unset analyzes every recorded variation (unchanged default). Unknown names raise.
- Add tests for subset selection and unknown-name rejection; document --factors.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
- Drop reStructuredText double-ticks and semicolons from the docstrings and comments we
  added; remove restated default values and estimator-internal asides.
- Keep each docstring to what the function does rather than the surrounding flow.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>

@arena-review-bot arena-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab-Arena Review Bot

Summary

This drops the hand-authored factors.yaml / episode_summary.jsonl indirection in favor of discovering the factor schema directly from the recorder's per-episode variations block, and collapses FactorSchema + FactorSpec.dim into a flat factors: list[FactorSpec] where every factor is exactly one theta column. The simplification is well-motivated and the new discovery/validation path is thorough — inconsistent factor sets, non-numeric vector components, empty/constant inputs, and unknown --factors names all fail loudly with clear messages, and the tests cover each of those branches plus the vector-split and bool→categorical cases. Nice, well-scoped cleanup.

Findings

🔵 Improvement: dataset.py:192 — type inference from JSON value type means an integer-coded categorical would be silently treated as continuous. See inline note — likely fine given the recorder emits strings for categoricals, but worth confirming/documenting the assumption.

Test Coverage

These are CPU-only analysis tests (no Isaac Sim), so the inner/outer run_simulation_app_function pattern correctly does not apply. Coverage is good: the new from_episode_results path is exercised across vector-split, mixed continuous/categorical, constant-drop, all-constant-raises, bool→categorical, inconsistent-factor-set, non-numeric-component, subset selection, and unknown-name rejection. The two removed from_files tests are replaced 1:1 with episode-results equivalents.

Verdict

Ship it

Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated
@cvolkcvolk cvolkcvolk changed the title Sensitivity: read episode_results.jsonl directly SensitivityDataset: read episode_results.jsonl directly Jun 23, 2026
…r assumption

- The posterior is taken relative to how the sweep drew the factors, so an unevenly sampled
  categorical leaks its sampling frequency into the result (a no-effect factor stops reading
  flat). Warn when the most-sampled choice exceeds the least by 1.5x or more.
- Document in the concept page that factors should be drawn from the prior (uniform per
  continuous range, equal episodes per categorical choice), and that balancing the draws is
  the clean fix.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>

@arena-review-bot arena-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab-Arena Review Bot

Summary

This refactor has SensitivityDataset read the recorder's episode_results.jsonl directly and discover the factor schema from each row's variations block, retiring the hand-authored factors.yaml, the FactorSchema wrapper, from_files, and FactorSpec.dim. It's a genuine net simplification — fewer types, one input path, one column per factor — and the discovery rules (number→continuous, numeric vector→one factor per component, string/bool→categorical), constant-factor dropping, and the uneven-categorical warning are all backed by focused tests. Discovery, validation, type-consistency, and the public generate_report signature all check out, and no other module references the removed symbols, so nothing downstream breaks.

Findings

🔵 Improvement: dataset.py:212_build_dataset_from_episode_rows carries a lot at once (subset validation → per-row discovery/validation → constant-drop → theta/x assembly). Optional polish: see inline.

Test Coverage

Strong. The new cases cover vector-split, mixed continuous/categorical, constant-factor dropping, the all-constant raise, bool→categorical, inconsistent factor sets across rows, non-numeric vector components, subset selection, and unknown-name rejection. All are pure-Python (no sim), so they correctly land in Phase 1 and don't need the inner/outer run_simulation_app_function pattern. The two retired from_files tests are replaced with from_episode_results equivalents.

Verdict

Ship it

Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated

@alexmillane alexmillane left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

Only real comment is the request for a e2e smoke test.

Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated
Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated
Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated
Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated
Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated
Comment thread isaaclab_arena/analysis/sensitivity/dataset.py Outdated
Comment thread isaaclab_arena/tests/test_sensitivity_analysis.py
…view)

- Define theta (the per-episode factor matrix, one column per factor) in the FactorSpec and
  SensitivityDataset docstrings, so "column of theta" reads on its own.
- FactorSpec.range is now a single (low, high) tuple, not a list of pairs — the list was
  vestigial from the removed per-dim handling. Updates analyzer, plotting, reader, and tests.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
- Move from_episode_results and its helpers to a new episode_results_reader.py module, exposed
  as dataset_from_episode_results(); dataset.py is now a pure in-memory container that no longer
  knows the recorder's on-disk format.
- Break the reader into small composable helpers (_read_rows, _discover_factor_values,
  _build_factor_columns, _warn_if_unevenly_sampled, _build_outcome_columns); the entry point is
  a short composition that returns a SensitivityDataset directly.
- Update generate_report, the analyzer error message, and the tests to the new entry point.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>

@arena-review-bot arena-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab-Arena Review Bot

Summary

This replaces the hand-authored factors.yaml + episode_summary.jsonl pair with direct discovery from a single episode_results.jsonl, and moves the on-disk reader into its own episode_results_reader.py so dataset.py stays a pure in-memory container. The data model is genuinely simpler (one factor = one theta column, range inferred from data), the layering is clean (analysis stays sim-agnostic, the reader is the only thing that knows the recorder format), and the test suite covers the new paths well — vector splitting, mixed continuous/categorical, bool-as-categorical, constant dropping, inconsistent factor sets, subset selection, and unknown-name rejection. Only a couple of minor nits below.

Findings

🔵 Improvement: episode_results_reader.py:233 — the 1.5 imbalance threshold is a behavior-controlling magic number; pulling it into a named module constant would make it discoverable and tunable.
🔵 Improvement: episode_results_reader.py:223_warn_if_unevenly_sampled is only ever called with categorical string values, so the parameter type could be the more precise list[str].

Test Coverage

Thorough. The new dataset_from_episode_results paths are well exercised (including the negative cases: inconsistent factor set, non-numeric vector component, all-constant, unknown factor name). These are CPU-only analysis tests with no sim dependency, so the inner/outer run_simulation_app_function pattern correctly does not apply. The synthetic-data and from_filesfrom_episode_results migrations keep the existing recovery tests intact.

Verdict

Ship it

Comment thread isaaclab_arena/analysis/sensitivity/episode_results_reader.py Outdated
Comment thread isaaclab_arena/analysis/sensitivity/episode_results_reader.py
- Guard that a row's variations block is a dict, so a null/non-object value fails with a clear
  message instead of an AttributeError.
- Assert each outcome value is numeric or boolean with row-and-path context, matching the
  variation-side error legibility instead of a bare cast error.
- Note in the docstring that categoricals must be recorded as strings or bools, not integer
  codes (an integer is read as a continuous factor).
- Lift the imbalance-warning threshold to a named _IMBALANCE_WARN_RATIO constant.
- Add guard tests for the non-dict variations and non-numeric outcome cases.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>

@arena-review-bot arena-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab-Arena Review Bot

Summary

This is a welcome simplification: the sensitivity toolbox now discovers its factor schema directly from the recorder's episode_results.jsonl (dataset_from_episode_results), dropping the hand-authored factors.yaml, the FactorSchema wrapper, and the vestigial FactorSpec.dim. dataset.py becomes a pure in-memory container and the on-disk format lives in one well-decomposed reader module. The net is ~200 fewer lines, the layering reads cleanly (pure-Python, sim-agnostic analysis), and the new behavior is well covered by tests. Two minor robustness/consistency nits below; neither blocks.

Findings

See inline comments.

Test Coverage

Good. The new reader is exercised across vector-splitting, mixed continuous/categorical discovery, constant-factor dropping, all-constant rejection, bool→categorical, inconsistent factor sets, non-numeric vector components, subset selection, unknown-name rejection, non-dict variations, and non-numeric outcomes. These are pure-CPU analysis tests (no SimulationApp), so they correctly stay in Phase 1 and don't need the inner/outer simulation pattern. The replaced from_files tests were ported to episode_results equivalents.

Verdict

Ship it

Comment thread isaaclab_arena/analysis/sensitivity/episode_results_reader.py Outdated
Comment thread isaaclab_arena/analysis/sensitivity/episode_results_reader.py
Comment thread isaaclab_arena/analysis/sensitivity/episode_results_reader.py
cvolkcvolk and others added 2 commits June 24, 2026 16:56
Co-authored-by: arena-review-bot[bot] <290456231+arena-review-bot[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

@arena-review-bot arena-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab-Arena Review Bot

Summary

This is a clean simplification: the sensitivity toolbox now discovers its factor schema straight from the recorder's episode_results.jsonl, dropping the hand-authored factors.yaml / FactorSchema / FactorSpec.dim machinery. The reader is well-placed in its own module so dataset.py stays a pure in-memory container, and the new behavior (constant-factor dropping, imbalance warning, subset selection) is well covered by tests.

There is one hard blocker: a duplicated line introduced by the last (suggestion) commit is a SyntaxError, so the module currently can't import. Once that's removed the PR is in good shape.

Findings

🔴 Critical: episode_results_reader.py:124-125 — see inline comment. A duplicated if selected is not None: is a SyntaxError; nothing in the module imports, so every new test and generate_report is broken. One-line fix.

Test Coverage

Strong. The two old from_files tests are replaced with episode-results equivalents and the new paths are well exercised — vector splitting, mixed continuous/categorical, constant-factor dropping, all-constant raise, bool→categorical, inconsistent factor set, non-numeric vector component / outcome, non-dict variations, subset selection, and unknown-name rejection. These all live in Phase 1 (no sim), which is correct for a CPU-only analysis layer. Note that the syntax error means none of them can currently run — they'll pass once the duplicate line is dropped.

Verdict

Minor fixes needed — sound refactor blocked only by the one-line syntax error.

Comment thread isaaclab_arena/analysis/sensitivity/episode_results_reader.py Outdated

@arena-review-bot arena-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab-Arena Review Bot

Summary

This PR drops the hand-authored factors.yaml + episode_summary.jsonl pair in favour of discovering factors directly from a single episode_results.jsonl, and removes FactorSchema / FactorSpec.dim to leave a leaner FactorSpec + SensitivityDataset. The direction is good and the data-model simplification is welcome — but the central new module does not parse, so it can't merge as-is.

Findings

🔴 Critical: episode_results_reader.py:123-132 — the factor_names validation block is malformed (duplicated if selected is not None: headers, an unclosed assert, the missing check never raised). python -m py_compile fails with IndentationError: expected an indented block after 'if' statement on line 124. The module can't be imported, which takes down generate_report.py and every test in test_sensitivity_analysis.py (all import dataset_from_episode_results). This looks like a botched merge/edit. Reconstructed fix in the inline suggestion — it also restores the actual unknown-factor assertion that test_from_episode_results_rejects_unknown_factor_name depends on.

🔵 Improvement: episode_results_reader.py:134factor_kinds carries the bare strings "continuous"/"categorical" while the FactorType enum (same module) already models exactly this. Could it use dict[str, FactorType] and FactorType.CATEGORICAL/FactorType.CONTINUOUS, in line with the typed-discriminator convention?

Test Coverage

Good coverage of the new reader: vector splitting, mixed continuous/categorical, constant-factor drop, all-constant error, bool-as-categorical, inconsistent factor set, non-numeric vector/outcome, and the subset path. These are pure-Python (no sim), so they belong in Phase 1 — no inner/outer harness needed. Note that none of these currently run: the IndentationError fails collection for the whole file. Once the syntax is fixed they should pass.

Verdict

Needs rework — the new module doesn't parse; fix the syntax error (and confirm the suite runs green) before this can land.

Comment thread isaaclab_arena/analysis/sensitivity/episode_results_reader.py
Comment thread isaaclab_arena/analysis/sensitivity/episode_results_reader.py
Comment thread isaaclab_arena/analysis/sensitivity/episode_results_reader.py Outdated
…uggestions

- Two applied GitHub suggestions interleaved three `if selected is not None:` blocks with an
  unclosed assert, so the module no longer parsed (IndentationError) — breaking every import.
- Restore a single pre-check that guards row 0's variations is a dict (clear message instead of
  a TypeError on null), and re-adds the dropped `assert not missing` for unknown factor names.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
@cvolkcvolk cvolkcvolk enabled auto-merge (squash) June 24, 2026 15:12

@arena-review-bot arena-review-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab-Arena Review Bot

Summary

This collapses the sensitivity data model from a hand-authored factors.yaml + nested episode_summary.jsonl pair down to a single episode_results.jsonl whose factor schema is discovered from each row's variations block. The new episode_results_reader.py cleanly isolates all on-disk-format knowledge so dataset.py stays a pure in-memory container, and the net effect is a large simplification (≈238 deletions). The discovery, vector-splitting, constant-dropping, and error paths are all covered by focused tests. Reads well and the layering is sound — nothing blocking.

One thing worth noting (not a code finding): this is a deliberate breaking change to the sensitivity CLI/API — --factors_yaml/--episode_summary become --episode_results, and SensitivityDataset.from_files/FactorSchema/FactorSpec.dim are gone. Since the producing recorder hasn't shipped yet (per the .. todo::) and every in-repo caller is updated in this PR, there are no external consumers to surprise — flagging only for the record.

Findings

No issues found. A few things I checked and was happy with: the continuous-first column layout stays consistent across _build_factor_columns, dataset.factor_columns, and the analyzer's _normalize/_normalized_prior slicing; constant factors are dropped before the MNPE fit (the documented crash) and an all-constant input asserts loudly; bool is correctly intercepted before the int/float branch and kept categorical; and sorted(set(...)) choices make the integer coding deterministic/reproducible.

Test Coverage

Thorough. The new tests exercise vector-splitting, mixed continuous/categorical discovery, constant-drop, all-constant raise, bool-as-categorical, inconsistent factor sets, non-numeric vector components, the --factors subset path, unknown-factor and non-dict-variations errors, and non-numeric outcomes — including asserting the MNPE vs NPE selection. These are pure-CPU analysis tests with no sim dependency, so the inner/outer run_simulation_app_function pattern correctly does not apply, and with no camera/subprocess use they land in Phase 1 as intended.

Verdict

Ship it

@cvolkcvolk cvolkcvolk merged commit e495af9 into main Jun 24, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants