isaac-sim · cvolk · Jun 24, 2026 · Jun 23, 2026
@@ -10,7 +10,7 @@ rate) and renders one figure summarising which factor values are associated with
 Two distinct ideas are at work. *Joint* means all factors are modelled together rather than
 one at a time, which is what captures interactions and confounds (see the next section).
 *Posterior* means the result is conditioned on the outcome: starting from the prior — the
-factor values the sweep actually drew, uniform over the declared ranges — it reweights them
+factor values the sweep actually drew, uniform over their observed ranges — it reweights them
 by how often each led to the chosen outcome. So the figure answers *given success, which
 factor values were in play?*, not merely *how were the factors distributed in the sweep?*
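
As a toy illustration of the reweighting described above, the posterior over a single categorical factor can be obtained by counting. This is only a sketch: the toolbox itself uses sbi's neural estimators, and the episode data and the `prior` / `posterior_given_success` helpers below are hypothetical.

```python
from collections import Counter

# Hypothetical sweep: (table_material, success) per episode, 4 episodes per material.
episodes = [
    ("oak", True), ("oak", True), ("oak", True), ("oak", False),
    ("walnut", True), ("walnut", False), ("walnut", False), ("walnut", False),
    ("bamboo", False), ("bamboo", False), ("bamboo", False), ("bamboo", False),
]


def prior(episodes):
    """How the sweep drew the factor, ignoring outcomes."""
    counts = Counter(value for value, _ in episodes)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def posterior_given_success(episodes):
    """P(value | success): keep only successful episodes, then renormalize.

    Assumes at least one successful episode (otherwise the total is zero).
    """
    counts = Counter(value for value, success in episodes if success)
    total = sum(counts.values())
    return {value: counts.get(value, 0) / total for value in prior(episodes)}


print(prior(episodes))                    # how the factors were distributed in the sweep
print(posterior_given_success(episodes))  # which values were in play, given success
```

The prior is flat here (each material was drawn equally often), so any structure in the second dictionary comes from the outcome rather than from the sampling.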
 
@@ -39,54 +39,49 @@ How it works
 The toolbox is a thin analysis layer over `sbi <https://sbi.readthedocs.io>`_'s
 neural posterior estimators. The flow is:
 
-1. **Per-episode input.** The analysis reads an ``episode_summary.jsonl`` — one row per
-   episode, holding that episode's factor values and outcomes.
-2. **Schema.** A ``factors.yaml`` declares the *factors* — which ``arena_env_args`` columns
-   were varied and whether each is continuous or categorical, plus the continuous ranges
-   that were swept (so the analyzer's prior matches the simulation). It does **not** list
-   outcomes — *which* outcome to condition on is chosen at analysis time, not saved here.
-3. **Inference.** ``SensitivityAnalyzer`` loads the pair, trains an estimator on the full
-   ``(theta, x)`` jointly — sbi's terms for the factor values (``theta``) and the per-episode
-   outcomes (``x``) — and samples the joint posterior conditioned on a chosen observation
-   (by default, success).
+1. **Per-episode input.** The analysis reads a single ``episode_results.jsonl`` — one row per
+   episode, holding that episode's recorded variation draws and outcomes.
+2. **Schema discovery.** The factors are discovered from the data: each entry in a row's
+   ``variations`` block becomes a factor — a number is continuous, a numeric vector splits into
+   one continuous factor per component, and a string is categorical (its choices are the labels
+   observed across the sweep). Continuous ranges are taken from the data's min/max. There is no
+   schema file to author; *which* outcome to condition on is chosen at analysis time.
+3. **Inference.** ``SensitivityAnalyzer`` trains an estimator on the full ``(theta, x)`` jointly
+   — sbi's terms for the factor values (``theta``) and the per-episode outcomes (``x``) — and
+   samples the joint posterior conditioned on a chosen observation (by default, success).
 4. **Report.** A probability density curve for each continuous factor and a probability bar
    chart for each categorical factor.
 
 .. todo::
 
-   The eval-runner writer (``episode_writer``) that emits ``episode_summary.jsonl`` during
-   evaluation is not part of this version — it lands in a follow-up. For now, run the analysis
-   on synthetic data (see below) or on a JSONL produced externally.
+   The per-episode recorder that emits ``episode_results.jsonl`` during evaluation lands in a
+   follow-up. For now, run the analysis on synthetic data (see below) or on a JSONL produced
+   externally.
 
-Inputs
-------
+Input
+-----
 
-**factors.yaml** declares only the factors that were varied (and the continuous ranges that
-were swept). Outcomes are not declared here — they're selected at analysis time (see below):
-
-.. code-block:: yaml
-
-   factors:
-     light_intensity:
-       type: continuous
-       range: [[0.0, 5000.0]]   # the swept range; inferred from the data's min/max if omitted
-     table_material:
-       type: categorical
-       choices: [oak, walnut, bamboo]
-
-**episode_summary.jsonl** holds one JSON object per episode. It carries every measured
-outcome; the analysis picks which one(s) to condition on:
+The analysis reads a single ``episode_results.jsonl`` written by the per-episode recorder —
+one JSON object per episode. Each row's ``variations`` block holds the sampled factor draws,
+and the top-level fields named by ``--outcome`` hold the outcomes (any other top-level fields
+are ignored):
 
 .. code-block:: json
 
-   {"job_name": "pi0_sweep", "episode_idx": 0,
-    "arena_env_args": {"light_intensity": 3200.0, "table_material": "oak"},
-    "outcomes": {"success": 1}}
+   {"job_name": "pi0_sweep", "episode_in_env": 0, "success": true,
+    "variations": {"light_intensity": 3200.0, "table_material": "oak",
+                   "wrist_camera": [0.01, -0.02, 0.0]}}
+
+The factor schema is discovered from these values, so there is no separate schema file: a
+number becomes a continuous factor, a numeric vector splits into one continuous factor per
+component (named ``key[0]``, ``key[1]``, …), and a string becomes a categorical factor whose
+choices are the labels observed across the sweep. A factor that took a single value across
+all episodes carries no information and is dropped.
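
The discovery rules in the paragraph above can be sketched in a few lines. This is an illustrative approximation, not the toolbox's implementation: `discover_factors` and the dictionary layout it returns are hypothetical, and sorting the categorical choices is an assumption made here for determinism.

```python
import json


def discover_factors(rows):
    """Infer a factor schema from each row's `variations` block (illustrative only)."""
    columns = {}  # factor name -> list of observed scalar values
    for row in rows:
        for key, value in row["variations"].items():
            if isinstance(value, list):
                # A numeric vector splits into one factor per component.
                for i, component in enumerate(value):
                    columns.setdefault(f"{key}[{i}]", []).append(component)
            else:
                columns.setdefault(key, []).append(value)

    factors = {}
    for name, values in columns.items():
        if len(set(values)) < 2:
            continue  # a single observed value carries no information
        if all(isinstance(v, str) for v in values):
            factors[name] = {"type": "categorical", "choices": sorted(set(values))}
        else:
            factors[name] = {"type": "continuous", "range": (min(values), max(values))}
    return factors


# Two hypothetical rows in the episode_results.jsonl layout shown above.
lines = [
    '{"success": true, "variations": {"light_intensity": 3200.0,'
    ' "table_material": "oak", "wrist_camera": [0.01, -0.02, 0.0]}}',
    '{"success": false, "variations": {"light_intensity": 500.0,'
    ' "table_material": "walnut", "wrist_camera": [0.03, 0.02, 0.0]}}',
]
rows = [json.loads(line) for line in lines]
factors = discover_factors(rows)
for name, spec in factors.items():
    print(name, spec)
```

In this sample `wrist_camera[2]` is 0.0 in both rows, so it is dropped, while the other two components become separate continuous factors.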
 
 Choice of estimator
 -------------------
 
-``SensitivityAnalyzer`` picks the estimator from the schema automatically:
+``SensitivityAnalyzer`` picks the estimator from the discovered factors automatically:
 
 .. list-table::
    :header-rows: 1
@@ -111,22 +106,22 @@ conditions on success (1).
 Running a report
 ----------------
 
-Point the report generator at a ``(factors.yaml, episode_summary.jsonl)`` pair. The output
-format follows the file extension (``.png``, ``.pdf``, …); reports are written under
-``eval/`` by default.
+Point the report generator at an ``episode_results.jsonl``. The output format follows the
+file extension (``.png``, ``.pdf``, …); reports are written under ``eval/`` by default.
 
 .. code-block:: bash
 
    python -m isaaclab_arena.analysis.sensitivity.generate_report \
-     --factors_yaml factors.yaml \
-     --episode_summary episode_summary.jsonl \
+     --episode_results episode_results.jsonl \
      --outcome success \
      --output eval/sensitivity_report.png
 
-``--outcome`` selects which per-episode outcome(s) to condition on (keys in the rows'
-``outcomes`` block); it defaults to ``success``. Pass ``--observation`` to set the value
-per outcome — since outcomes are binary, use ``1`` for success or ``0`` for failure; it
-defaults to ``1`` (success).
+``--outcome`` selects which per-episode outcome(s) to condition on (top-level field(s) in
+each row); it defaults to ``success``. Pass ``--observation`` to set the value per outcome —
+since outcomes are binary, use ``1`` for success or ``0`` for failure; it defaults to ``1``
+(success). ``--factors`` restricts the analysis to a subset of the recorded variations (by
+their ``variations``-block names; a vector variation keeps all its components); by default
+every recorded variation is analyzed.
 
 Trying it on synthetic data
 ---------------------------
@@ -162,12 +157,18 @@ Current scope
 
 - Outcomes are treated as **binary** (0/1). Conditioning defaults to success; a continuous
   outcome is rejected with a clear error rather than silently averaged.
-- Continuous **vector** factors (``dim > 1``) are reserved for a future extension. The likely
-  approach is to record scalar reductions (e.g. a norm or distance-to-reference) alongside the
-  raw vector, so a pose or RGB factor becomes one or more analysable scalar columns.
+- A **vector** variation draw (e.g. a camera pose offset) is split into one scalar factor per
+  component (``key[0]``, ``key[1]``, …), each analysed independently. Components are named by
+  position; semantic names (e.g. a camera's lateral vs. depth axis) are a future extension.
+- **Factors should be drawn from the prior** the analyzer assumes — uniform over each
+  continuous range, and an equal number of episodes per categorical choice. The posterior is
+  taken relative to how the sweep drew the factors, so uneven sampling leaks in: a factor with
+  no real effect comes out flat only if it was sampled flat; otherwise its posterior tracks the
+  sampling frequency. The analyzer warns when a categorical is sampled unevenly, but the clean
+  fix is to balance the draws in the sweep.
 - The estimators run on CPU and do not require Isaac Sim, so a report can be generated
   anywhere the evaluation JSONL is available.
-- The analysis assumes the ``episode_summary.jsonl`` is a single coherent slice — one
+- The analysis assumes the ``episode_results.jsonl`` is a single coherent slice — one
   policy, task, and embodiment. **TODO:** add a filter (in the spirit of robolab's
   ``--filter-policy`` / ``--filter-task``) to select that slice from a larger JSONL,
   rather than relying on the caller to pre-filter it.
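
The bullet above on drawing factors from the prior can be made concrete with a small counting example. It is a sketch under stated assumptions: `imbalance_ratio` and `posterior_of_inert_factor` are hypothetical helpers, not the analyzer's actual warning logic, and the factor is assumed to have no effect at all (every episode succeeds).

```python
from collections import Counter


def imbalance_ratio(values):
    """Ratio of the most- to least-sampled choice; 1.0 means perfectly balanced."""
    counts = Counter(values)
    return max(counts.values()) / min(counts.values())


def posterior_of_inert_factor(values):
    """Posterior given success for a factor that does not affect the outcome.

    Every episode is treated as a success, so conditioning on success keeps
    all episodes and the result is simply the sampling frequency.
    """
    counts = Counter(values)
    return {choice: n / len(values) for choice, n in counts.items()}


balanced = ["oak", "walnut", "bamboo"] * 10
skewed = ["oak"] * 20 + ["walnut"] * 5 + ["bamboo"] * 5

for name, draws in [("balanced", balanced), ("skewed", skewed)]:
    print(name, imbalance_ratio(draws), posterior_of_inert_factor(draws))
```

With balanced draws the inert factor comes out flat; with skewed draws it appears to favour `oak` purely because `oak` was sampled more often.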
@@ -35,19 +35,19 @@ class SensitivityAnalyzer:
     def __init__(self, dataset: SensitivityDataset):
         self.dataset = dataset
         self.posterior = None
-        continuous_factors = [factor for factor in dataset.schema.factors if factor.type == "continuous"]
+        continuous_factors = [factor for factor in dataset.factors if factor.type == "continuous"]
         # theta is laid out continuous-first then categorical — built that way by
-        # SensitivityDataset and defined by FactorSchema.factor_columns — so the leading
+        # SensitivityDataset and defined by its factor_columns — so the leading
         # self._num_continuous columns are the continuous factors that _normalize/_denormalize slice.
         self._num_continuous = len(continuous_factors)
         for factor in continuous_factors:
             assert factor.range is not None, (
-                f"Continuous factor {factor.name!r} has no range to normalize against. Declare a"
-                " range in factors.yaml, or build the dataset via from_files()/from_file() so the"
-                " range is inferred from the data before constructing the analyzer."
+                f"Continuous factor {factor.name!r} has no range to normalize against. Set a range on"
+                " the FactorSpec, or build the dataset via dataset_from_episode_results() so the range is"
+                " inferred from the data before constructing the analyzer."
             )
-        self._continuous_low = torch.tensor([factor.range[0][0] for factor in continuous_factors])
-        self._continuous_high = torch.tensor([factor.range[0][1] for factor in continuous_factors])
+        self._continuous_low = torch.tensor([factor.range[0] for factor in continuous_factors])
+        self._continuous_high = torch.tensor([factor.range[1] for factor in continuous_factors])
 
     def _select_inference_class(self):
         """Choose the sbi inference class for this schema.
@@ -61,7 +61,7 @@ def _normalized_prior(self):
         """Uniform prior matching the normalized theta: continuous dims [0, 1], categoricals [0, k-1]."""
         low_bounds = [0.0] * self._num_continuous
         high_bounds = [1.0] * self._num_continuous
-        for factor in self.dataset.schema.factors:
+        for factor in self.dataset.factors:
             if factor.type == "categorical":
                 low_bounds.append(0.0)
                 high_bounds.append(float(len(factor.choices) - 1))
@@ -98,7 +98,7 @@ def sample_posterior(self, observation: torch.Tensor | None = None, num_samples:
         """Sample the joint posterior over all factors at observation.
 
         Defaults to the dataset's default observation (condition on success). Returns a
-        (num_samples, total_factor_dim) tensor laid out like theta — continuous columns first
+        (num_samples, num_factors) tensor laid out like theta — continuous columns first
         (in original, denormalized units), then integer-coded categorical columns.
         """
         assert self.posterior is not None, "Call fit() before sampling the posterior"
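
The continuous-first layout of `theta` and the low/high bounds in the hunks above suggest a min-max scaling of the leading columns. The sketch below assumes that is what `_normalize` / `_denormalize` do; it uses plain Python lists instead of torch tensors, and the `normalize` / `denormalize` functions are hypothetical stand-ins rather than the module's code.

```python
def normalize(theta_row, low, high):
    """Map the leading continuous columns to [0, 1]; leave categorical codes untouched."""
    n = len(low)
    scaled = [(v - lo) / (hi - lo) for v, lo, hi in zip(theta_row[:n], low, high)]
    return scaled + list(theta_row[n:])


def denormalize(theta_row, low, high):
    """Inverse of normalize: restore original units for the continuous columns."""
    n = len(low)
    restored = [lo + v * (hi - lo) for v, lo, hi in zip(theta_row[:n], low, high)]
    return restored + list(theta_row[n:])


# One continuous factor (light_intensity), then one integer-coded categorical.
low, high = [0.0], [5000.0]
row = [3200.0, 2]

unit = normalize(row, low, high)
back = denormalize(unit, low, high)
print(unit)
print(back)
```

Only the first `len(low)` columns are rescaled, which is why the continuous-first ordering matters: the categorical codes pass through unchanged in both directions.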