diff --git a/docs/pages/concepts/policy/concept_sensitivity_analysis.rst b/docs/pages/concepts/policy/concept_sensitivity_analysis.rst
index 82793670f8..9cb30b882c 100644
--- a/docs/pages/concepts/policy/concept_sensitivity_analysis.rst
+++ b/docs/pages/concepts/policy/concept_sensitivity_analysis.rst
@@ -10,7 +10,7 @@ rate) and renders one figure summarising which factor values are associated with
 Two distinct ideas are at work. *Joint* means all factors are modelled together rather than
 one at a time, which is what captures interactions and confounds (see the next section).
 *Posterior* means the result is conditioned on the outcome: starting from the prior — the
-factor values the sweep actually drew, uniform over the declared ranges — it reweights them
+factor values the sweep actually drew, uniform over their observed ranges — it reweights them
 by how often each led to the chosen outcome. So the figure answers *given success, which
 factor values were in play?*, not merely *how were the factors distributed in the sweep?*
 
@@ -39,54 +39,49 @@ How it works
 The toolbox is a thin analysis layer over `sbi <https://sbi.readthedocs.io>`_'s
 neural posterior estimators. The flow is:
 
-1. **Per-episode input.** The analysis reads an ``episode_summary.jsonl`` — one row per
-   episode, holding that episode's factor values and outcomes.
-2. **Schema.** A ``factors.yaml`` declares the *factors* — which ``arena_env_args`` columns
-   were varied and whether each is continuous or categorical, plus the continuous ranges
-   that were swept (so the analyzer's prior matches the simulation). It does **not** list
-   outcomes — *which* outcome to condition on is chosen at analysis time, not saved here.
-3. **Inference.** ``SensitivityAnalyzer`` loads the pair, trains an estimator on the full
-   ``(theta, x)`` jointly — sbi's terms for the factor values (``theta``) and the per-episode
-   outcomes (``x``) — and samples the joint posterior conditioned on a chosen observation
-   (by default, success).
+1. **Per-episode input.** The analysis reads a single ``episode_results.jsonl`` — one row per
+   episode, holding that episode's recorded variation draws and outcomes.
+2. **Schema discovery.** The factors are discovered from the data: each entry in a row's
+   ``variations`` block becomes a factor — a number is continuous, a numeric vector splits into
+   one continuous factor per component, and a string is categorical (its choices are the labels
+   observed across the sweep). Continuous ranges are taken from the data's min/max. There is no
+   schema file to author; *which* outcome to condition on is chosen at analysis time.
+3. **Inference.** ``SensitivityAnalyzer`` trains an estimator on the full ``(theta, x)`` jointly
+   — sbi's terms for the factor values (``theta``) and the per-episode outcomes (``x``) — and
+   samples the joint posterior conditioned on a chosen observation (by default, success).
 4. **Report.** A probability density curve for each continuous factor and a probability bar
    chart for each categorical factor.
 
 .. todo::
 
-   The eval-runner writer (``episode_writer``) that emits ``episode_summary.jsonl`` during
-   evaluation is not part of this version — it lands in a follow-up. For now, run the analysis
-   on synthetic data (see below) or on a JSONL produced externally.
+   The per-episode recorder that emits ``episode_results.jsonl`` during evaluation lands in a
+   follow-up. For now, run the analysis on synthetic data (see below) or on a JSONL produced
+   externally.
 
-Inputs
-------
+Input
+-----
 
-**factors.yaml** declares only the factors that were varied (and the continuous ranges that
-were swept). Outcomes are not declared here — they're selected at analysis time (see below):
-
-.. code-block:: yaml
-
-   factors:
-     light_intensity:
-       type: continuous
-       range: [[0.0, 5000.0]]   # the swept range; inferred from the data's min/max if omitted
-     table_material:
-       type: categorical
-       choices: [oak, walnut, bamboo]
-
-**episode_summary.jsonl** holds one JSON object per episode. It carries every measured
-outcome; the analysis picks which one(s) to condition on:
+The analysis reads a single ``episode_results.jsonl`` written by the per-episode recorder —
+one JSON object per episode. Each row's ``variations`` block holds the sampled factor draws,
+and the top-level fields named by ``--outcome`` hold the outcomes (any other top-level fields
+are ignored):
 
 .. code-block:: json
 
-   {"job_name": "pi0_sweep", "episode_idx": 0,
-    "arena_env_args": {"light_intensity": 3200.0, "table_material": "oak"},
-    "outcomes": {"success": 1}}
+   {"job_name": "pi0_sweep", "episode_in_env": 0, "success": true,
+    "variations": {"light_intensity": 3200.0, "table_material": "oak",
+                   "wrist_camera": [0.01, -0.02, 0.0]}}
+
+The factor schema is discovered from these values, so there is no separate schema file: a
+number becomes a continuous factor, a numeric vector splits into one continuous factor per
+component (named ``key[0]``, ``key[1]``, …), and a string becomes a categorical factor whose
+choices are the labels observed across the sweep. A factor that took a single value across
+all episodes carries no information and is dropped.
 
 Choice of estimator
 -------------------
 
-``SensitivityAnalyzer`` picks the estimator from the schema automatically:
+``SensitivityAnalyzer`` picks the estimator from the discovered factors automatically:
 
 .. list-table::
    :header-rows: 1
@@ -111,22 +106,22 @@ conditions on success (1).
 Running a report
 ----------------
 
-Point the report generator at a ``(factors.yaml, episode_summary.jsonl)`` pair. The output
-format follows the file extension (``.png``, ``.pdf``, …); reports are written under
-``eval/`` by default.
+Point the report generator at an ``episode_results.jsonl``. The output format follows the
+file extension (``.png``, ``.pdf``, …); reports are written under ``eval/`` by default.
 
 .. code-block:: bash
 
    python -m isaaclab_arena.analysis.sensitivity.generate_report \
-     --factors_yaml factors.yaml \
-     --episode_summary episode_summary.jsonl \
+     --episode_results episode_results.jsonl \
      --outcome success \
      --output eval/sensitivity_report.png
 
-``--outcome`` selects which per-episode outcome(s) to condition on (keys in the rows'
-``outcomes`` block); it defaults to ``success``. Pass ``--observation`` to set the value
-per outcome — since outcomes are binary, use ``1`` for success or ``0`` for failure; it
-defaults to ``1`` (success).
+``--outcome`` selects which per-episode outcome(s) to condition on (top-level field(s) in
+each row); it defaults to ``success``. Pass ``--observation`` to set the value per outcome —
+since outcomes are binary, use ``1`` for success or ``0`` for failure; it defaults to ``1``
+(success). ``--factors`` restricts the analysis to a subset of the recorded variations (by
+their ``variations``-block names; a vector variation keeps all its components); by default
+every recorded variation is analyzed.
 
 Trying it on synthetic data
 ---------------------------
@@ -162,12 +157,18 @@ Current scope
 
 - Outcomes are treated as **binary** (0/1). Conditioning defaults to success; a continuous
   outcome is rejected with a clear error rather than silently averaged.
-- Continuous **vector** factors (``dim > 1``) are reserved for a future extension. The likely
-  approach is to record scalar reductions (e.g. a norm or distance-to-reference) alongside the
-  raw vector, so a pose or RGB factor becomes one or more analysable scalar columns.
+- A **vector** variation draw (e.g. a camera pose offset) is split into one scalar factor per
+  component (``key[0]``, ``key[1]``, …), each analyzed independently. Components are named by
+  position; semantic names (e.g. a camera's lateral vs. depth axis) are a future extension.
+- **Factors should be drawn from the prior** the analyzer assumes — uniform over each
+  continuous range, and an equal number of episodes per categorical choice. The posterior is
+  taken relative to how the sweep drew the factors, so uneven sampling leaks in: a factor with
+  no real effect comes out flat only if it was sampled flat, otherwise its posterior tracks the
+  sampling frequency. The analyzer warns when a categorical is sampled unevenly, but the clean
+  fix is to balance the draws in the sweep.
 - The estimators run on CPU and do not require Isaac Sim, so a report can be generated
   anywhere the evaluation JSONL is available.
-- The analysis assumes the ``episode_summary.jsonl`` is a single coherent slice — one
+- The analysis assumes the ``episode_results.jsonl`` is a single coherent slice — one
   policy, task, and embodiment. **TODO:** add a filter (in the spirit of robolab's
   ``--filter-policy`` / ``--filter-task``) to select that slice from a larger JSONL,
   rather than relying on the caller to pre-filter it.
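Note on the schema-discovery rule documented above (number to continuous, vector to one factor per component, string to categorical, constant factors dropped). A minimal sketch of that rule in plain Python follows. It is not the reader's implementation: flatten_draw and discover_factors are hypothetical names, the sorted ordering of choices is an assumption, and bool draws and row validation are left out.

```python
from typing import Any


def flatten_draw(key: str, value: Any) -> list[tuple[str, Any]]:
    """Split one recorded draw into (factor_name, scalar) pairs (hypothetical helper)."""
    if isinstance(value, (list, tuple)):
        # A numeric vector contributes one scalar factor per component: key[0], key[1], ...
        return [(f"{key}[{i}]", component) for i, component in enumerate(value)]
    return [(key, value)]


def discover_factors(rows: list[dict]) -> dict[str, dict]:
    """Infer a factor schema from the rows' variations blocks (hypothetical helper)."""
    observed: dict[str, list] = {}
    for row in rows:
        for key, value in row["variations"].items():
            for name, scalar in flatten_draw(key, value):
                observed.setdefault(name, []).append(scalar)

    schema: dict[str, dict] = {}
    for name, values in observed.items():
        if len(set(values)) < 2:
            continue  # a single value across all episodes carries no information
        if all(isinstance(v, str) for v in values):
            # Assumption: choices are the observed labels, sorted for a stable encoding.
            schema[name] = {"type": "categorical", "choices": sorted(set(values))}
        else:
            numbers = [float(v) for v in values]
            schema[name] = {"type": "continuous", "range": (min(numbers), max(numbers))}
    return schema


rows = [
    {"success": True, "variations": {"light_intensity": 3200.0, "table_material": "oak",
                                     "wrist_camera": [0.01, -0.02, 0.0]}},
    {"success": False, "variations": {"light_intensity": 1500.0, "table_material": "walnut",
                                      "wrist_camera": [0.03, -0.02, 0.0]}},
]
schema = discover_factors(rows)
```

In this two-row example only wrist_camera[0] survives from the vector draw, because the other two components never change across the rows.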
diff --git a/isaaclab_arena/analysis/sensitivity/analyzer.py b/isaaclab_arena/analysis/sensitivity/analyzer.py
index cca176797a..6ec0946512 100644
--- a/isaaclab_arena/analysis/sensitivity/analyzer.py
+++ b/isaaclab_arena/analysis/sensitivity/analyzer.py
@@ -35,19 +35,19 @@ class SensitivityAnalyzer:
     def __init__(self, dataset: SensitivityDataset):
         self.dataset = dataset
         self.posterior = None
-        continuous_factors = [factor for factor in dataset.schema.factors if factor.type == "continuous"]
+        continuous_factors = [factor for factor in dataset.factors if factor.type == "continuous"]
         # theta is laid out continuous-first then categorical — built that way by
-        # SensitivityDataset and defined by FactorSchema.factor_columns — so the leading
+        # SensitivityDataset and defined by its factor_columns — so the leading
         # self._num_continuous columns are the continuous factors that _normalize/_denormalize slice.
         self._num_continuous = len(continuous_factors)
         for factor in continuous_factors:
             assert factor.range is not None, (
-                f"Continuous factor {factor.name!r} has no range to normalize against. Declare a"
-                " range in factors.yaml, or build the dataset via from_files()/from_file() so the"
-                " range is inferred from the data before constructing the analyzer."
+                f"Continuous factor {factor.name!r} has no range to normalize against. Set a range on"
+                " the FactorSpec, or build the dataset via dataset_from_episode_results() so the range is"
+                " inferred from the data before constructing the analyzer."
             )
-        self._continuous_low = torch.tensor([factor.range[0][0] for factor in continuous_factors])
-        self._continuous_high = torch.tensor([factor.range[0][1] for factor in continuous_factors])
+        self._continuous_low = torch.tensor([factor.range[0] for factor in continuous_factors])
+        self._continuous_high = torch.tensor([factor.range[1] for factor in continuous_factors])
 
     def _select_inference_class(self):
         """Choose the sbi inference class for this schema.
@@ -61,7 +61,7 @@ def _normalized_prior(self):
         """Uniform prior matching the normalized theta: continuous dims [0, 1], categoricals [0, k-1]."""
         low_bounds = [0.0] * self._num_continuous
         high_bounds = [1.0] * self._num_continuous
-        for factor in self.dataset.schema.factors:
+        for factor in self.dataset.factors:
             if factor.type == "categorical":
                 low_bounds.append(0.0)
                 high_bounds.append(float(len(factor.choices) - 1))
@@ -98,7 +98,7 @@ def sample_posterior(self, observation: torch.Tensor | None = None, num_samples:
         """Sample the joint posterior over all factors at observation.
 
         Defaults to the dataset's default observation (condition on success). Returns a
-        (num_samples, total_factor_dim) tensor laid out like theta — continuous columns first
+        (num_samples, num_factors) tensor laid out like theta — continuous columns first
         (in original, denormalized units), then integer-coded categorical columns.
         """
         assert self.posterior is not None, "Call fit() before sampling the posterior"
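Note on the analyzer change above: with the range now a single (low, high) tuple per factor, the normalized prior is a box with [0, 1] on each continuous column and [0, k-1] on each categorical column. The sketch below illustrates those bounds in plain Python. The min-max formula in normalize and denormalize is an assumption (the diff does not show those method bodies), and all three function names are hypothetical.

```python
def normalize(value: float, low: float, high: float) -> float:
    """Map a continuous draw from [low, high] onto [0, 1] (assumed min-max scaling)."""
    return (value - low) / (high - low)


def denormalize(unit_value: float, low: float, high: float) -> float:
    """Inverse of normalize: map a [0, 1] sample back to original units."""
    return low + unit_value * (high - low)


def prior_bounds(num_continuous: int, categorical_sizes: list[int]) -> tuple[list[float], list[float]]:
    """Lower and upper bounds of the uniform prior, continuous columns first."""
    low = [0.0] * num_continuous + [0.0] * len(categorical_sizes)
    # A categorical with k choices is integer-coded 0..k-1, so its upper bound is k - 1.
    high = [1.0] * num_continuous + [float(k - 1) for k in categorical_sizes]
    return low, high
```

For two continuous factors and one three-way categorical, prior_bounds(2, [3]) gives lower bounds of all zeros and upper bounds of 1, 1 and 2.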
diff --git a/isaaclab_arena/analysis/sensitivity/dataset.py b/isaaclab_arena/analysis/sensitivity/dataset.py
index c4bac0a610..5ac893c953 100644
--- a/isaaclab_arena/analysis/sensitivity/dataset.py
+++ b/isaaclab_arena/analysis/sensitivity/dataset.py
@@ -5,12 +5,9 @@
 
 from __future__ import annotations
 
-import json
 import torch
-import yaml
 from dataclasses import dataclass
 from enum import Enum
-from pathlib import Path
 
 
 class FactorType(str, Enum):
@@ -22,174 +19,72 @@ class FactorType(str, Enum):
 
 @dataclass
 class FactorSpec:
-    """One factor's schema as declared in factors.yaml.
+    """One varied input — a lighting level, a camera-offset axis, a background choice, and so on.
 
-    Continuous factors carry a range (one [low, high] pair per dim); categorical
-    factors carry choices (a list of string labels, integer-encoded by index in theta).
+    Each factor occupies one column of the dataset's factor matrix theta (see SensitivityDataset).
+    A continuous factor carries a range, the (low, high) it was swept over. A categorical factor
+    carries choices, the string labels it took, integer-encoded by their index in that column.
     """
 
     name: str
     type: FactorType
-    dim: int = 1
-    range: list[tuple[float, float]] | None = None  # one (low, high) pair per dim, continuous only
+    range: tuple[float, float] | None = None  # (low, high), continuous only
     choices: list[str] | None = None  # categorical only
 
     def __post_init__(self) -> None:
         # Accept the raw string form (from YAML / callers) and normalize to the enum.
         self.type = FactorType(self.type)
-        # Normalize each (low, high) pair to a tuple (YAML/JSON deliver them as lists).
+        # JSON/YAML deliver the range as a list; normalize it to a tuple.
         if self.range is not None:
-            self.range = [tuple(pair) for pair in self.range]
-
-
-@dataclass
-class FactorSchema:
-    """Parsed factors.yaml — the list of factors that were varied.
-
-    The schema describes what *can* vary (continuous vs categorical, range/choices), not the
-    values taken in any given episode. Outcomes are not part of the schema; which outcome to
-    condition on is chosen at analysis time.
-    """
-
-    factors: list[FactorSpec]
-
-    @classmethod
-    def from_yaml(cls, path: str | Path) -> FactorSchema:
-        """Load a factors.yaml from disk into a typed FactorSchema.
-
-        The YAML has one top-level block, factors (one entry per varied input). Each factor's
-        type must be continuous or categorical.
-        """
-        # TODO: add a robolab-style filter (e.g. select rows by policy/task/embodiment) so a
-        # single episode_summary.jsonl can be sliced to one coherent (policy, task, embodiment)
-        # before analysis, instead of assuming the caller pre-filtered it.
-        with open(path, encoding="utf-8") as yaml_file:
-            yaml_data = yaml.safe_load(yaml_file)
-        assert isinstance(yaml_data, dict), f"factors.yaml at {path} must be a mapping at top level"
-        assert "factors" in yaml_data, f"factors.yaml at {path} is missing top-level `factors:` block"
-
-        factors: list[FactorSpec] = []
-        for factor_name, factor_block in yaml_data["factors"].items():
-            assert "type" in factor_block, (
-                f"factors.yaml at {path} factor {factor_name!r} is missing required `type:` field"
-                " (expected 'continuous' or 'categorical')"
-            )
-            factor_type = factor_block["type"]
-            assert factor_type in ("continuous", "categorical"), (
-                f"factors.yaml at {path} factor {factor_name!r} has unknown type {factor_type!r};"
-                " expected 'continuous' or 'categorical'"
-            )
-            factors.append(
-                FactorSpec(
-                    name=factor_name,
-                    type=factor_type,
-                    dim=factor_block.get("dim", 1),
-                    range=factor_block.get("range"),
-                    choices=factor_block.get("choices"),
-                )
-            )
-
-        return cls(factors=factors)
-
-    @property
-    def total_factor_dim(self) -> int:
-        """Total width of theta — sum of dim over continuous factors plus 1 per categorical."""
-        return sum(factor.dim if factor.type == "continuous" else 1 for factor in self.factors)
-
-    @property
-    def factor_columns(self) -> dict[str, slice]:
-        """Map factor name → its column slice in theta.
-
-        Continuous factors occupy the leading columns (dim each), then each categorical
-        factor occupies one trailing column. This continuous-first layout is what sbi's
-        mixed density estimator expects.
-        """
-        continuous_factors = [factor for factor in self.factors if factor.type == "continuous"]
-        categorical_factors = [factor for factor in self.factors if factor.type == "categorical"]
-        column_slices: dict[str, slice] = {}
-        column_index = 0
-        for factor in continuous_factors + categorical_factors:
-            column_width = factor.dim if factor.type == "continuous" else 1
-            column_slices[factor.name] = slice(column_index, column_index + column_width)
-            column_index += column_width
-        return column_slices
+            self.range = tuple(self.range)
 
 
 class SensitivityDataset:
-    """A FactorSchema paired with its per-episode theta (factors) and x (outcomes).
+    """The varied factors paired with their per-episode values (theta) and outcomes (x).
 
-    The object is a pure container: it holds the schema and the two tensors, and exposes
-    the prior and column layouts an analyzer consumes. It can be built two ways:
-
-      - from_files — parse a factors.yaml / episode_summary.jsonl pair
-        (the path eval runs take).
-      - the constructor — wrap in-memory tensors directly (what a synthetic simulator or
-        a unit test takes). The tensors must already be in the layout factor_columns
-        describes: continuous columns first, then one integer-coded column per categorical.
+    theta is the factor matrix: one row per episode, one column per factor — continuous factors
+    in the leading columns, then one integer-coded column per categorical factor. x is the
+    matching outcome matrix, one row per episode and one column per outcome. The object is a pure
+    in-memory container (the factor list plus the two tensors) and exposes the column layout an
+    analyzer reads.
     """
 
     def __init__(
         self,
-        schema: FactorSchema,
+        factors: list[FactorSpec],
         theta: torch.Tensor,
         x: torch.Tensor,
         outcome_names: list[str] | tuple[str, ...] = ("success",),
     ):
-        """Wrap an in-memory schema plus its theta / x tensors, validating shapes.
+        """Wrap an in-memory factor list plus its theta / x tensors, validating shapes.
 
         Args:
-            schema: The parsed factor schema. Continuous factors must carry a range;
-                categorical factors must carry choices.
-            theta: (num_episodes, total_factor_dim) factor matrix, continuous-first.
+            factors: The varied factors, one per theta column. A continuous factor must carry a
+                range, a categorical factor must carry choices.
+            theta: (num_episodes, num_factors) factor matrix, continuous-first.
             x: (num_episodes, num_outcomes) outcome matrix.
             outcome_names: Name of each outcome column in x, in order (used for plot labels).
         """
         assert theta.ndim == 2 and x.ndim == 2, f"theta and x must be 2D; got {theta.shape} and {x.shape}"
         assert theta.shape[0] == x.shape[0], f"theta/x row counts disagree: {theta.shape[0]} vs {x.shape[0]}"
         assert theta.shape[0] > 0, "Dataset is empty (no episodes)"
-        assert (
-            theta.shape[1] == schema.total_factor_dim
-        ), f"theta has {theta.shape[1]} columns but schema declares {schema.total_factor_dim} factor dims"
+        assert theta.shape[1] == len(
+            factors
+        ), f"theta has {theta.shape[1]} columns but there are {len(factors)} factor(s) (one column each)"
         assert x.shape[1] == len(
             outcome_names
         ), f"x has {x.shape[1]} columns but {len(outcome_names)} outcome name(s) were given"
-        self.schema = schema
+        self.factors = factors
         self.outcome_names = list(outcome_names)
         self._theta = theta
         self._x = x
 
-    @classmethod
-    def from_files(
-        cls,
-        factors_yaml: str | Path,
-        jsonl_path: str | Path,
-        outcome_names: list[str] | tuple[str, ...] = ("success",),
-    ) -> SensitivityDataset:
-        """Build a dataset from a factors.yaml schema and an episode_summary.jsonl.
-
-        Parses and validates both, infers any missing continuous range from the data, and
-        assembles the theta / x tensors in the layout the analyzers expect. ``outcome_names``
-        selects which per-episode outcome columns to condition on (the analysis-time query).
-        """
-        schema = FactorSchema.from_yaml(factors_yaml)
-
-        jsonl_text = Path(jsonl_path).read_text(encoding="utf-8")
-        rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
-        assert len(rows) > 0, f"Empty episode_summary.jsonl at {jsonl_path}"
-
-        _validate_rows(schema, rows, outcome_names, jsonl_path)
-        _infer_missing_factor_ranges(schema, rows)
-
-        theta = _build_factor_tensor(schema, rows)
-        x = _build_outcome_tensor(rows, outcome_names)
-        return cls(schema, theta, x, outcome_names)
-
     @property
     def theta(self) -> torch.Tensor:
-        """(num_episodes, total_factor_dim) matrix of factor values, one row per episode.
+        """(num_episodes, num_factors) matrix of factor values, one row per episode.
 
-        This is the "input" sbi infers a posterior over. Column layout is given by
-        factor_columns — continuous factors first, then categoricals (integer-coded).
+        The column layout is given by factor_columns, continuous factors first then categoricals
+        (integer-coded).
         """
         return self._theta
 
@@ -197,8 +92,7 @@ def theta(self) -> torch.Tensor:
     def x(self) -> torch.Tensor:
         """(num_episodes, num_outcomes) matrix of outcome values, one row per episode.
 
-        This is what the analyzer conditions queries on — "what factor values were consistent
-        with observing these outcomes?". Columns are named by ``outcome_names``.
+        Columns are named by outcome_names. These are the values a query conditions on.
         """
         return self._x
 
@@ -209,15 +103,19 @@ def num_episodes(self) -> int:
 
     @property
     def factor_columns(self) -> dict[str, slice]:
-        """Map factor name → its column slice in theta. Same as schema.factor_columns."""
-        return self.schema.factor_columns
+        """Map each factor name to its single-column slice in theta.
+
+        Continuous factors take the leading columns, then categoricals. Each factor is one column.
+        """
+        continuous = [factor for factor in self.factors if factor.type == "continuous"]
+        categorical = [factor for factor in self.factors if factor.type == "categorical"]
+        return {factor.name: slice(index, index + 1) for index, factor in enumerate(continuous + categorical)}
 
     def default_observation(self) -> torch.Tensor:
-        """The default outcome vector to condition a query on: success (1) for every outcome.
+        """The outcome vector a query conditions on by default: success (1) for every outcome.
 
-        Outcomes are binary (0/1) in the current scope, so the natural default query is
-        "what produced success?". Asserts the outcomes are binary, so adding a continuous
-        outcome later fails loudly here instead of silently conditioning on a meaningless value.
+        Outcomes are binary (0/1), so the natural query is what produced success. The assertion
+        keeps a continuous outcome from being used here silently.
         """
         is_binary = set(self._x.flatten().tolist()).issubset({0.0, 1.0})
         assert is_binary, "default_observation assumes binary (0/1) outcomes; pass an explicit observation otherwise."
@@ -225,103 +123,5 @@ def default_observation(self) -> torch.Tensor:
 
     @property
     def has_categorical_factors(self) -> bool:
-        """True iff the schema declares at least one categorical factor."""
-        return any(factor.type == "categorical" for factor in self.schema.factors)
-
-
-def _validate_rows(
-    schema: FactorSchema, rows: list[dict], outcome_names: list[str] | tuple[str, ...], jsonl_path: str | Path
-) -> None:
-    """Assert every JSONL row carries the declared factor keys and the requested outcome keys.
-
-    The declared names need only be a subset of each row's arena_env_args / outcomes;
-    extra keys are ignored. Raises pointing at the first offending row.
-    """
-    expected_factor_names = {factor.name for factor in schema.factors}
-    expected_outcome_names = set(outcome_names)
-    for row_index, row in enumerate(rows):
-        assert (
-            "arena_env_args" in row and "outcomes" in row
-        ), f"Row {row_index} of {jsonl_path} missing arena_env_args/outcomes block"
-        missing_factor_names = expected_factor_names - set(row["arena_env_args"].keys())
-        assert not missing_factor_names, (
-            f"Row {row_index} of {jsonl_path} is missing factor(s) "
-            f"{sorted(missing_factor_names)} from its arena_env_args block; "
-            f"factors.yaml declares: {sorted(expected_factor_names)}"
-        )
-        missing_outcome_names = expected_outcome_names - set(row["outcomes"].keys())
-        assert (
-            not missing_outcome_names
-        ), f"Row {row_index} of {jsonl_path} missing outcomes {sorted(missing_outcome_names)}"
-
-
-def _infer_missing_factor_ranges(schema: FactorSchema, rows: list[dict]) -> None:
-    """Fill any continuous factor's missing range from the observed min/max.
-
-    A range declared in factors.yaml takes precedence and is left untouched.
-    """
-    for factor in schema.factors:
-        if factor.type != "continuous" or factor.range is not None:
-            continue
-        if factor.dim != 1:
-            raise NotImplementedError(
-                "Range inference for vector factors (dim > 1) is not implemented;"
-                f" factor {factor.name!r} has dim={factor.dim}"
-            )
-        observed_values = [float(row["arena_env_args"][factor.name]) for row in rows]
-        factor.range = [(min(observed_values), max(observed_values))]
-
-
-def _build_factor_tensor(schema: FactorSchema, rows: list[dict]) -> torch.Tensor:
-    """Assemble the per-episode factor matrix theta.
-
-    Continuous columns first (one per dim), then one column per categorical factor with its
-    value integer-coded as a float32 index into FactorSpec.choices.
-    """
-    continuous_factors = [factor for factor in schema.factors if factor.type == "continuous"]
-    categorical_factors = [factor for factor in schema.factors if factor.type == "categorical"]
-
-    factor_columns: list[torch.Tensor] = []
-
-    # Continuous columns come first (sbi MNPE convention).
-    for factor in continuous_factors:
-        if factor.dim != 1:
-            raise NotImplementedError(
-                "Vector continuous factors (dim > 1) are not yet supported;"
-                f" factor {factor.name!r} has dim={factor.dim}"
-            )
-        raw_values = [float(row["arena_env_args"][factor.name]) for row in rows]
-        factor_column = torch.tensor(raw_values, dtype=torch.float32).unsqueeze(1)
-        factor_columns.append(factor_column)
-
-    # Categorical columns: integer-code each string value as its index in FactorSpec.choices.
-    for factor in categorical_factors:
-        assert (
-            factor.choices is not None and len(factor.choices) > 0
-        ), f"Categorical factor {factor.name!r} has no `choices:` block in factors.yaml"
-        choice_to_code = {choice: code for code, choice in enumerate(factor.choices)}
-        category_codes: list[int] = []
-        for row_index, row in enumerate(rows):
-            value = row["arena_env_args"][factor.name]
-            assert (
-                value in choice_to_code
-            ), f"Row {row_index} factor {factor.name!r} has value {value!r} not in declared choices {factor.choices}"
-            category_codes.append(choice_to_code[value])
-        factor_column = torch.tensor(category_codes, dtype=torch.float32).unsqueeze(1)
-        factor_columns.append(factor_column)
-
-    if factor_columns:
-        return torch.cat(factor_columns, dim=1)
-    return torch.zeros((len(rows), 0), dtype=torch.float32)
-
-
-def _build_outcome_tensor(rows: list[dict], outcome_names: list[str] | tuple[str, ...]) -> torch.Tensor:
-    """Assemble the per-episode outcome matrix x (one column per requested outcome).
-
-    Each outcome value is cast to float; bool outcomes become 0.0/1.0.
-    """
-    outcome_column_tensors = [
-        torch.tensor([float(row["outcomes"][name]) for row in rows], dtype=torch.float32).unsqueeze(1)
-        for name in outcome_names
-    ]
-    return torch.cat(outcome_column_tensors, dim=1)
+        """True iff at least one factor is categorical."""
+        return any(factor.type == "categorical" for factor in self.factors)
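Note on the new factor_columns above: the constructor checks only that theta has one column per factor, so the caller is responsible for the continuous-first column order. The standalone sketch below mirrors the property's logic on a stand-in class (Factor is hypothetical, not the real FactorSpec) to show how the layout is independent of the order of the factors list.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    """Stand-in for FactorSpec with only the fields the layout needs (hypothetical)."""

    name: str
    type: str  # "continuous" or "categorical"


def factor_columns(factors: list[Factor]) -> dict[str, slice]:
    """Map each factor name to its single-column slice, continuous factors first."""
    continuous = [factor for factor in factors if factor.type == "continuous"]
    categorical = [factor for factor in factors if factor.type == "categorical"]
    return {factor.name: slice(index, index + 1) for index, factor in enumerate(continuous + categorical)}


factors = [
    Factor("table_material", "categorical"),
    Factor("light_intensity", "continuous"),
    Factor("wrist_camera[0]", "continuous"),
]
columns = factor_columns(factors)
```

Here table_material is listed first but maps to the last column, so a theta built in list order would be misread even though the column-count assertion passes.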
diff --git a/isaaclab_arena/analysis/sensitivity/episode_results_reader.py b/isaaclab_arena/analysis/sensitivity/episode_results_reader.py
new file mode 100644
index 0000000000..e4ba7b1a77
--- /dev/null
+++ b/isaaclab_arena/analysis/sensitivity/episode_results_reader.py
@@ -0,0 +1,268 @@
+# Copyright (c) 2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+"""Read an episode_results.jsonl (the per-episode recorder's output) into a SensitivityDataset.
+
+This module is the only place that knows the recorder's on-disk format, so dataset.py stays a
+pure in-memory container.
+"""
+
+from __future__ import annotations
+
+import json
+import torch
+from pathlib import Path
+from typing import Any
+
+from isaaclab_arena.analysis.sensitivity.dataset import FactorSpec, FactorType, SensitivityDataset
+
+_IMBALANCE_WARN_RATIO = 1.5
+"""Warn when a categorical's most-sampled choice exceeds its least-sampled one by at least this factor."""
+
+
+def dataset_from_episode_results(
+    jsonl_path: str | Path,
+    outcome_names: list[str] | tuple[str, ...] = ("success",),
+    factor_names: list[str] | tuple[str, ...] | None = None,
+) -> SensitivityDataset:
+    """Build a SensitivityDataset from an episode_results.jsonl, discovering the factors from the data.
+
+    Each line is one episode. The variations block holds the sampled factor draws, and the
+    top-level fields named by outcome_names hold the outcomes. Other top-level fields are ignored.
+    A number becomes a continuous factor, a numeric vector becomes one continuous factor per
+    component (named key[i]), and a string becomes a categorical factor over its observed labels.
+
+    Example line (wrapped here for readability), with one vector factor and one string factor:
+
+        {"success": true,
+         "variations": {"wrist_camera": [0.01, -0.02, 0.0], "hdr_image": "sunset"}}
+
+    Args:
+        jsonl_path: Path to the episode_results.jsonl, one JSON object per line.
+        outcome_names: Top-level field(s) per line to use as outcomes.
+        factor_names: Which recorded variations to analyze, by their variations-block name. A
+            vector is selected by its base name and keeps every component. None analyzes all.
+
+    Returns:
+        A SensitivityDataset whose theta uses the continuous-first layout, with one x column per outcome.
+    """
+    rows = _read_rows(jsonl_path)
+    factor_kinds, factor_values, factor_order = _discover_factor_values(rows, outcome_names, jsonl_path, factor_names)
+    factors, theta = _build_factor_columns(factor_kinds, factor_values, factor_order, jsonl_path)
+    x = _build_outcome_columns(rows, outcome_names, jsonl_path)
+    return SensitivityDataset(factors, theta, x, outcome_names)
+
+
+def _read_rows(jsonl_path: str | Path) -> list[dict]:
+    """Parse the JSONL file into a non-empty list of episode records."""
+    jsonl_text = Path(jsonl_path).read_text(encoding="utf-8")
+    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
+    assert len(rows) > 0, f"Empty episode_results.jsonl at {jsonl_path}"
+    return rows
+
+
+def _flatten_variation_value(
+    key: str, value: Any, row_index: int, jsonl_path: str | Path
+) -> list[tuple[str, float | str]]:
+    """Turn one recorded variation draw into (factor_name, scalar) pairs.
+
+    A numeric vector becomes one pair per component, each named key[i]. A bare number or string
+    becomes a single pair under key. A bool is treated as a categorical label rather than a 0/1
+    number.
+
+    Args:
+        key: The variation name, asset.variation.
+        value: The recorded draw for one episode.
+        row_index: Source row index, used in error messages.
+        jsonl_path: Source path, used in error messages.
+
+    Returns:
+        The (factor_name, scalar) pairs this draw contributes.
+    """
+    assert isinstance(value, (bool, int, float, str, list, tuple)), (
+        f"Variation {key!r} in row {row_index} of {jsonl_path} has unsupported value type "
+        f"{type(value).__name__}: {value!r}. Expected a number, bool, string, or numeric vector."
+    )
+    # bool is an int subclass, so check it before int/float and keep it categorical.
+    if isinstance(value, bool):
+        return [(key, str(value))]
+    if isinstance(value, (int, float)):
+        return [(key, float(value))]
+    if isinstance(value, str):
+        return [(key, value)]
+    # list / tuple → one continuous scalar factor per component.
+    # TODO(cvolk): components are named with an opaque positional suffix (key[0], key[1], ...),
+    # so plots can't tell e.g. a camera's lateral axis from its depth axis. Follow-up PR: have
+    # the recorder emit semantic component names (e.g. camera ROS frame x_right/y_down/z_forward)
+    # rather than a bare vector, so the labels flow through this generic reader unchanged.
+    assert len(value) > 0, f"Variation {key!r} in row {row_index} of {jsonl_path} is an empty list."
+    pairs: list[tuple[str, float | str]] = []
+    for component_index, component in enumerate(value):
+        assert isinstance(component, (int, float)) and not isinstance(component, bool), (
+            f"Variation {key!r} in row {row_index} of {jsonl_path} is a vector with a non-numeric "
+            f"component at index {component_index}: {component!r}. Vector variations must be all-numeric."
+        )
+        pairs.append((f"{key}[{component_index}]", float(component)))
+    return pairs
+
+
+def _discover_factor_values(
+    rows: list[dict],
+    outcome_names: list[str] | tuple[str, ...],
+    jsonl_path: str | Path,
+    factor_names: list[str] | tuple[str, ...] | None,
+) -> tuple[dict[str, str], dict[str, list[float | str]], list[str]]:
+    """Scan the rows into per-factor value lists, checking the recorder contract.
+
+    Flattens each row's variation draws (see _flatten_variation_value), keeps only the requested
+    factor_names if given, and asserts every episode records the same factors and the requested
+    outcomes. Returns the factor kinds, the per-row values, and the first-seen order.
+    """
+    selected = set(factor_names) if factor_names is not None else None
+    if selected is not None:
+        first_variations = rows[0].get("variations")
+        assert isinstance(
+            first_variations, dict
+        ), f"Row 0 of {jsonl_path} has no 'variations' block (or it is not a JSON object)."
+        available = set(first_variations)
+        missing = selected - available
+        assert not missing, (
+            f"Requested factor(s) {sorted(missing)} not found in {jsonl_path}; "
+            f"available variations: {sorted(available)}."
+        )
+
+    factor_kinds: dict[str, str] = {}  # factor name → "continuous" | "categorical"
+    factor_values: dict[str, list[float | str]] = {}  # factor name → per-row value, in row order
+    factor_order: list[str] = []  # factor names in first-seen order, for a stable schema
+
+    for row_index, row in enumerate(rows):
+        assert "variations" in row and isinstance(row["variations"], dict), (
+            f"Row {row_index} of {jsonl_path} has no 'variations' block (or it is not a JSON object); "
+            "episode_results rows must carry recorded variation draws."
+        )
+        seen_in_row: set[str] = set()
+        for key, value in row["variations"].items():
+            if selected is not None and key not in selected:
+                continue
+            for factor_name, scalar in _flatten_variation_value(key, value, row_index, jsonl_path):
+                kind = "categorical" if isinstance(scalar, str) else "continuous"
+                if factor_name not in factor_kinds:
+                    assert row_index == 0, (
+                        f"Factor {factor_name!r} first appears in row {row_index} of {jsonl_path}; "
+                        "every episode must record the same variations."
+                    )
+                    factor_kinds[factor_name] = kind
+                    factor_values[factor_name] = []
+                    factor_order.append(factor_name)
+                assert factor_kinds[factor_name] == kind, (
+                    f"Factor {factor_name!r} is {factor_kinds[factor_name]} in earlier rows but {kind} "
+                    f"in row {row_index} of {jsonl_path}; a variation must keep a single type."
+                )
+                factor_values[factor_name].append(scalar)
+                seen_in_row.add(factor_name)
+
+        missing_in_row = [name for name in factor_order if name not in seen_in_row]
+        assert not missing_in_row, (
+            f"Row {row_index} of {jsonl_path} is missing factor(s) {sorted(missing_in_row)}; "
+            "every episode must record the same variations."
+        )
+        for name in outcome_names:
+            assert name in row, (
+                f"Row {row_index} of {jsonl_path} is missing outcome field {name!r} "
+                f"(requested outcomes: {list(outcome_names)})."
+            )
+
+    assert factor_order, f"No factors discovered in {jsonl_path}: every row's 'variations' block was empty."
+    return factor_kinds, factor_values, factor_order
+
+
+def _build_factor_columns(
+    factor_kinds: dict[str, str],
+    factor_values: dict[str, list[float | str]],
+    factor_order: list[str],
+    jsonl_path: str | Path,
+) -> tuple[list[FactorSpec], torch.Tensor]:
+    """Turn the discovered per-factor values into the factor specs and the theta matrix.
+
+    Continuous factors lead theta, then categoricals (integer-coded). A factor that took a single
+    value is dropped (it carries no information, and a constant categorical breaks the estimator
+    fit), and an all-constant input raises.
+    """
+    continuous_names = [name for name in factor_order if factor_kinds[name] == "continuous"]
+    categorical_names = [name for name in factor_order if factor_kinds[name] == "categorical"]
+
+    factors: list[FactorSpec] = []
+    columns: list[torch.Tensor] = []
+    dropped: list[str] = []
+    for name in continuous_names:
+        values = factor_values[name]
+        lo, hi = min(values), max(values)
+        if lo == hi:
+            dropped.append(name)
+            continue
+        factors.append(FactorSpec(name=name, type=FactorType.CONTINUOUS, range=(lo, hi)))
+        columns.append(torch.tensor(values, dtype=torch.float32).unsqueeze(1))
+    for name in categorical_names:
+        choices = sorted(set(factor_values[name]))
+        if len(choices) == 1:
+            dropped.append(name)
+            continue
+        _warn_if_unevenly_sampled(name, factor_values[name], choices)
+        code_of = {choice: code for code, choice in enumerate(choices)}
+        factors.append(FactorSpec(name=name, type=FactorType.CATEGORICAL, choices=choices))
+        columns.append(
+            torch.tensor([code_of[value] for value in factor_values[name]], dtype=torch.float32).unsqueeze(1)
+        )
+
+    if dropped:
+        print(
+            f"[INFO] Dropped {len(dropped)} constant factor(s) (single value across all episodes): {sorted(dropped)}."
+        )
+    assert factors, (
+        f"All discovered factors in {jsonl_path} are constant (each took a single value across all "
+        "episodes). Nothing to analyze. Vary at least one factor."
+    )
+    return factors, torch.cat(columns, dim=1)
+
+
+def _warn_if_unevenly_sampled(name: str, values: list[float | str], choices: list[str]) -> None:
+    """Warn when a categorical's choices were sampled unevenly, since that biases its posterior.
+
+    The analysis assumes factors were drawn from the uniform prior. Uneven draw counts per choice
+    leak into the posterior (a no-effect factor then tracks its sampling frequency), so warn once
+    the imbalance reaches _IMBALANCE_WARN_RATIO.
+    """
+    counts: dict[str, int] = {}
+    for value in values:
+        counts[value] = counts.get(value, 0) + 1
+    if max(counts.values()) >= _IMBALANCE_WARN_RATIO * min(counts.values()):
+        ordered_counts = {choice: counts[choice] for choice in choices}
+        print(
+            f"[WARNING] Categorical factor {name!r} was sampled unevenly across its choices "
+            f"({ordered_counts}). Its posterior reflects this sampling frequency, not only its effect "
+            "on the outcome. Balance the draws per choice for an unbiased result."
+        )
+
+
+def _build_outcome_columns(
+    rows: list[dict], outcome_names: list[str] | tuple[str, ...], jsonl_path: str | Path
+) -> torch.Tensor:
+    """Stack the requested top-level outcome fields into the x matrix, one column per outcome.
+
+    Asserts each outcome value is numeric or boolean, so a stray non-numeric outcome fails with
+    the same row-and-path context as a bad variation rather than a bare cast error.
+    """
+    columns: list[torch.Tensor] = []
+    for name in outcome_names:
+        values: list[float] = []
+        for row_index, row in enumerate(rows):
+            value = row[name]
+            assert isinstance(value, (bool, int, float)), (
+                f"Outcome {name!r} in row {row_index} of {jsonl_path} is {type(value).__name__} {value!r}; "
+                "outcomes must be numeric or boolean."
+            )
+            values.append(float(value))
+        columns.append(torch.tensor(values, dtype=torch.float32).unsqueeze(1))
+    return torch.cat(columns, dim=1)
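Reviewer note: the flattening and continuous-first layout implemented by the reader above can be illustrated with a minimal standalone sketch. It is not the module's code: it uses plain Python lists instead of torch tensors, skips the validation asserts and the imbalance warning, and the helper names `flatten_draw` and `continuous_first_columns` are hypothetical. It only assumes the row shape shown in the docstring example (a top-level outcome plus a `variations` block).

```python
from typing import Any


def flatten_draw(key: str, value: Any) -> list[tuple[str, Any]]:
    """Turn one recorded draw into (factor_name, scalar) pairs (hypothetical helper)."""
    if isinstance(value, bool):  # bool is an int subclass, so test it before int/float
        return [(key, str(value))]
    if isinstance(value, (int, float)):
        return [(key, float(value))]
    if isinstance(value, str):
        return [(key, value)]
    # list / tuple: one continuous factor per component, named key[i]
    return [(f"{key}[{i}]", float(component)) for i, component in enumerate(value)]


def continuous_first_columns(rows: list[dict]) -> dict[str, list[float]]:
    """Collect per-factor columns: continuous first, then integer-coded categoricals."""
    values: dict[str, list[Any]] = {}
    for row in rows:
        for key, value in row["variations"].items():
            for name, scalar in flatten_draw(key, value):
                values.setdefault(name, []).append(scalar)

    columns: dict[str, list[float]] = {}
    # Continuous factors lead; a factor that never varied is dropped.
    for name, column in values.items():
        if not isinstance(column[0], str) and min(column) != max(column):
            columns[name] = column
    # Categoricals follow, coded by index into the sorted observed labels.
    for name, column in values.items():
        if isinstance(column[0], str):
            choices = sorted(set(column))
            if len(choices) > 1:
                columns[name] = [float(choices.index(label)) for label in column]
    return columns


rows = [
    {"success": True, "variations": {"hdr_image": "sunset", "wrist_camera": [0.01, -0.02, 0.0]}},
    {"success": False, "variations": {"hdr_image": "studio", "wrist_camera": [0.03, 0.01, 0.0]}},
]
columns = continuous_first_columns(rows)
print(list(columns))
```

In this sketch `wrist_camera[2]` is dropped because it is constant across both rows, and `hdr_image` moves behind the continuous columns even though it was recorded first.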
diff --git a/isaaclab_arena/analysis/sensitivity/generate_report.py b/isaaclab_arena/analysis/sensitivity/generate_report.py
index a746ceb3a2..33d4094105 100644
--- a/isaaclab_arena/analysis/sensitivity/generate_report.py
+++ b/isaaclab_arena/analysis/sensitivity/generate_report.py
@@ -11,32 +11,32 @@
 from pathlib import Path
 
 from isaaclab_arena.analysis.sensitivity.analyzer import SensitivityAnalyzer
-from isaaclab_arena.analysis.sensitivity.dataset import SensitivityDataset
+from isaaclab_arena.analysis.sensitivity.episode_results_reader import dataset_from_episode_results
 from isaaclab_arena.analysis.sensitivity.plotting import plot_marginals
 
 
 def generate_report(
-    factors_yaml_path: str | Path,
-    jsonl_path: str | Path,
+    episode_results_path: str | Path,
     output_path: str | Path,
     outcome_names: list[str] | tuple[str, ...] = ("success",),
+    factor_names: list[str] | tuple[str, ...] | None = None,
     observation: list[float] | None = None,
     seed: int | None = 0,
 ) -> Path:
-    """Build a sensitivity report from a factors.yaml / episode_summary.jsonl pair.
+    """Load an episode_results.jsonl, fit a SensitivityAnalyzer, and save a posterior-marginals figure.
 
-    Loads the data, fits a SensitivityAnalyzer, and saves a single posterior-marginals
-    figure. The output format follows the output_path extension (.png, .pdf, …).
+    The factor schema is discovered from the recorder's per-episode variation draws. The output
+    format follows the output_path extension (.png, .pdf, …).
 
     Args:
-        factors_yaml_path: Schema file declaring the factors.
-        jsonl_path: episode_summary.jsonl produced by eval_runner.
+        episode_results_path: episode_results.jsonl produced by the per-episode recorder.
         output_path: Destination figure file (parent dirs created if absent).
         outcome_names: Which per-episode outcome(s) to condition on.
-        observation: Outcome values to condition on, one per outcome name. Defaults to
-            conditioning on success (1) for every (binary) outcome.
-        seed: Seed for torch's global RNG, set once before fitting so the estimator training
-            and posterior sampling are reproducible. Pass ``None`` to leave the RNG untouched.
+        factor_names: Which recorded variations to analyze. None analyzes all of them.
+        observation: Outcome values to condition on, one per outcome name. None conditions on
+            success (1) for every binary outcome.
+        seed: Seed for torch's global RNG so a report is reproducible. Pass None to leave the
+            RNG untouched.
 
     Returns:
         The resolved output path.
@@ -46,7 +46,7 @@ def generate_report(
     if seed is not None:
         torch.manual_seed(seed)
 
-    dataset = SensitivityDataset.from_files(Path(factors_yaml_path), Path(jsonl_path), outcome_names)
+    dataset = dataset_from_episode_results(episode_results_path, outcome_names, factor_names)
     analyzer = SensitivityAnalyzer(dataset)
     analyzer.fit()
 
@@ -64,26 +64,38 @@ def generate_report(
 def main():
     parser = argparse.ArgumentParser(
         description=(
-            "Build a sensitivity report (one posterior-marginal panel per factor) from a "
-            "(factors.yaml, episode_summary.jsonl) pair. Output format follows the --output extension."
+            "Build a sensitivity report (one posterior-marginal panel per factor) from an "
+            "episode_results.jsonl. Output format follows the --output extension."
         )
     )
-    parser.add_argument("--factors_yaml", type=str, required=True, help="Path to factors.yaml.")
     parser.add_argument(
-        "--episode_summary", type=str, required=True, help="Path to episode_summary.jsonl produced by eval_runner."
+        "--episode_results",
+        type=str,
+        required=True,
+        help="Path to episode_results.jsonl produced by the per-episode recorder.",
     )
     parser.add_argument(
         "--output",
         type=str,
         default="eval/sensitivity_report.png",
-        help="Output figure file; format follows the extension (.png, .pdf, …). Default: eval/sensitivity_report.png.",
+        help="Output figure file. Format follows the extension (.png, .pdf, …). Default: eval/sensitivity_report.png.",
     )
     parser.add_argument(
         "--outcome",
         type=str,
         nargs="+",
         default=["success"],
-        help="Which per-episode outcome(s) to condition on (keys in the rows' outcomes block). Default: success.",
+        help="Which per-episode outcome(s) to condition on (top-level field(s) in each row). Default: success.",
+    )
+    parser.add_argument(
+        "--factors",
+        type=str,
+        nargs="+",
+        default=None,
+        help=(
+            "Which recorded variations to analyze (keys in each row's variations block; a vector "
+            "variation keeps all its components). Default: all recorded variations."
+        ),
     )
     parser.add_argument(
         "--observation",
@@ -99,15 +111,15 @@ def main():
         "--seed",
         type=int,
         default=0,
-        help="Seed for torch's global RNG, so estimator training + sampling are reproducible. Default: 0.",
+        help="Seed for torch's global RNG so a report is reproducible. Default: 0.",
     )
     args = parser.parse_args()
 
     generate_report(
-        args.factors_yaml,
-        args.episode_summary,
+        args.episode_results,
         args.output,
         outcome_names=args.outcome,
+        factor_names=args.factors,
         observation=args.observation,
         seed=args.seed,
     )
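Reviewer note: a small sketch of how the new `--factors` flag parses, assuming only the flag definitions visible in this diff. The `--observation` flag is left out because its definition is elided by the hunk boundary, and the path and variation names passed below are hypothetical.

```python
import argparse

# Mirror of the flags shown in the diff (help text omitted for brevity).
parser = argparse.ArgumentParser()
parser.add_argument("--episode_results", type=str, required=True)
parser.add_argument("--output", type=str, default="eval/sensitivity_report.png")
parser.add_argument("--outcome", type=str, nargs="+", default=["success"])
parser.add_argument("--factors", type=str, nargs="+", default=None)
parser.add_argument("--seed", type=int, default=0)

# Hypothetical path and variation names, for illustration only.
args = parser.parse_args(
    ["--episode_results", "eval/episode_results.jsonl", "--factors", "dome.light_intensity", "dome.hdr_image"]
)
print(args.factors)

# Omitting --factors leaves it as None, which generate_report treats as "analyze all".
defaults = parser.parse_args(["--episode_results", "eval/episode_results.jsonl"])
print(defaults.factors, defaults.outcome)
```

Because `nargs="+"` collects space-separated values into a list, a vector variation is selected by its base name alone, and the reader then keeps every component.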
diff --git a/isaaclab_arena/analysis/sensitivity/plotting.py b/isaaclab_arena/analysis/sensitivity/plotting.py
index 73a4961e7b..5dd0ef2cbf 100644
--- a/isaaclab_arena/analysis/sensitivity/plotting.py
+++ b/isaaclab_arena/analysis/sensitivity/plotting.py
@@ -35,7 +35,7 @@ def plot_marginals(
     for categorical ones, wrapped into a grid.
 
     Args:
-        samples: ``(num_samples, total_factor_dim)`` posterior draws in the dataset's factor
+        samples: ``(num_samples, num_factors)`` posterior draws in the dataset's factor
             layout (continuous-first, original units), e.g. from ``SensitivityAnalyzer.sample_posterior``.
         dataset: The dataset, for the factor schema and column layout.
         observation: The outcome vector the samples were conditioned on (shown in the title).
@@ -46,7 +46,7 @@ def plot_marginals(
         The matplotlib Figure.
     """
     samples = samples.cpu().numpy()
-    factors = dataset.schema.factors
+    factors = dataset.factors
     # Wrap panels into a grid (at most 3 columns) so many factors stay readable.
     num_columns = min(3, len(factors))
     num_rows = math.ceil(len(factors) / num_columns)
@@ -86,7 +86,7 @@ def _draw_continuous_marginal(ax, factor: FactorSpec, factor_samples: np.ndarray
     than a binned histogram. Falls back to a single line at the mean when the samples have
     no spread (KDE bandwidth is then undefined).
     """
-    range_low, range_high = factor.range[0]
+    range_low, range_high = factor.range
     sample_mean = float(np.mean(factor_samples))
     if float(np.std(factor_samples)) >= 1e-9:
         grid = np.linspace(range_low, range_high, 200)
diff --git a/isaaclab_arena/tests/sensitivity_synthetic.py b/isaaclab_arena/tests/sensitivity_synthetic.py
index 056b6ef50f..b2b4392be1 100644
--- a/isaaclab_arena/tests/sensitivity_synthetic.py
+++ b/isaaclab_arena/tests/sensitivity_synthetic.py
@@ -28,7 +28,7 @@
 from dataclasses import dataclass
 
 from isaaclab_arena.analysis.sensitivity.analyzer import SensitivityAnalyzer
-from isaaclab_arena.analysis.sensitivity.dataset import FactorSchema, FactorSpec, SensitivityDataset
+from isaaclab_arena.analysis.sensitivity.dataset import FactorSpec, SensitivityDataset
 from isaaclab_arena.analysis.sensitivity.plotting import plot_marginals
 
 
@@ -50,7 +50,7 @@ def logit(self, values: torch.Tensor) -> torch.Tensor:
         return self.weight * normalized
 
     def spec(self) -> FactorSpec:
-        return FactorSpec(name=self.name, type="continuous", range=[list(self.value_range)])
+        return FactorSpec(name=self.name, type="continuous", range=self.value_range)
 
     def column(self, values: torch.Tensor) -> torch.Tensor:
         return values
@@ -105,10 +105,10 @@ def _build_dataset(
     SensitivityDataset.factor_columns expects.
     """
     ordered = sorted(factors_and_columns, key=lambda pair: isinstance(pair[0], _CategoricalFactor))
-    schema = FactorSchema(factors=[factor.spec() for factor, _ in ordered])
+    factors = [factor.spec() for factor, _ in ordered]
     theta = torch.stack([factor.column(values) for factor, values in ordered], dim=1)
     # outcome_names defaults to ("success",), matching the single binary outcome built here.
-    return SensitivityDataset(schema, theta, success.unsqueeze(1))
+    return SensitivityDataset(factors, theta, success.unsqueeze(1))
 
 
 def make_continuous_dataset(seed: int, num_episodes: int = 2000) -> SensitivityDataset:
diff --git a/isaaclab_arena/tests/test_sensitivity_analysis.py b/isaaclab_arena/tests/test_sensitivity_analysis.py
index cf6d50a799..b18692a862 100644
--- a/isaaclab_arena/tests/test_sensitivity_analysis.py
+++ b/isaaclab_arena/tests/test_sensitivity_analysis.py
@@ -18,8 +18,10 @@
 import numpy as np
 import torch
 
+import pytest
+
 from isaaclab_arena.analysis.sensitivity.analyzer import SensitivityAnalyzer
-from isaaclab_arena.analysis.sensitivity.dataset import SensitivityDataset
+from isaaclab_arena.analysis.sensitivity.episode_results_reader import dataset_from_episode_results
 from isaaclab_arena.tests.sensitivity_synthetic import (
     CAMERA_DISTANCE,
     GRASP_OFFSET,
@@ -89,64 +91,213 @@ def test_npe_recovers_two_continuous_effects():
 
 
 def _write_jsonl(path, rows: list[dict]) -> None:
-    """Write one JSON object per line to ``path``."""
+    """Write one JSON object per line to path."""
     path.write_text("\n".join(json.dumps(row) for row in rows) + "\n", encoding="utf-8")
 
 
-def test_from_files_parses_mixed_schema_and_builds_tensors(tmp_path):
-    """from_files parses a factors.yaml + episode_summary.jsonl into the expected theta / x layout."""
-    factors_yaml = tmp_path / "factors.yaml"
-    factors_yaml.write_text(
-        "factors:\n"
-        "  light_intensity:\n"
-        "    type: continuous\n"
-        "    range: [[0.0, 1000.0]]\n"
-        "  pick_up_object:\n"
-        "    type: categorical\n"
-        "    choices: [cube, can]\n",
-        encoding="utf-8",
+def test_from_episode_results_splits_vector_variation_into_scalar_factors(tmp_path):
+    """dataset_from_episode_results discovers one continuous factor per component of a vector variation draw."""
+    jsonl = tmp_path / "episode_results.jsonl"
+    _write_jsonl(
+        jsonl,
+        [
+            {"success": True, "variations": {"droid.camera_extrinsics_wrist_camera": [0.001, -0.004, 0.002]}},
+            {"success": False, "variations": {"droid.camera_extrinsics_wrist_camera": [0.003, 0.001, -0.005]}},
+        ],
+    )
+
+    dataset = dataset_from_episode_results(jsonl, outcome_names=["success"])
+
+    # A 3-vector draw becomes three continuous factors, named with a per-component suffix.
+    factors_by_name = {factor.name: factor for factor in dataset.factors}
+    expected_names = [f"droid.camera_extrinsics_wrist_camera[{i}]" for i in range(3)]
+    assert [factor.name for factor in dataset.factors] == expected_names
+    assert all(factors_by_name[name].type == "continuous" for name in expected_names)
+
+    assert dataset.theta.shape == (2, 3)
+    assert dataset.x.shape == (2, 1)
+    assert dataset.theta[:, 0].tolist() == pytest.approx([0.001, 0.003])  # first component, both episodes (float32)
+    assert dataset.x[:, 0].tolist() == [1.0, 0.0]  # success bool → 1.0 / 0.0
+
+
+def test_from_episode_results_discovers_mixed_continuous_and_categorical(tmp_path):
+    """A numeric and a string variation become a continuous and a categorical factor (choices observed)."""
+    jsonl = tmp_path / "episode_results.jsonl"
+    _write_jsonl(
+        jsonl,
+        [
+            {"success": True, "variations": {"dome.light_intensity": 250.0, "dome.hdr_image": "studio"}},
+            {"success": False, "variations": {"dome.light_intensity": 750.0, "dome.hdr_image": "sunset"}},
+            {"success": True, "variations": {"dome.light_intensity": 500.0, "dome.hdr_image": "studio"}},
+        ],
+    )
+
+    dataset = dataset_from_episode_results(jsonl, outcome_names=["success"])
+
+    factors_by_name = {factor.name: factor for factor in dataset.factors}
+    assert factors_by_name["dome.light_intensity"].type == "continuous"
+    assert factors_by_name["dome.hdr_image"].type == "categorical"
+    assert factors_by_name["dome.hdr_image"].choices == ["studio", "sunset"]  # sorted observed labels
+    # A continuous factor's range is inferred as [min, max] of the observed values.
+    assert factors_by_name["dome.light_intensity"].range == (250.0, 750.0)
+
+    # Continuous-first layout; categorical integer-coded by its index into the discovered choices.
+    assert dataset.factor_columns == {"dome.light_intensity": slice(0, 1), "dome.hdr_image": slice(1, 2)}
+    assert dataset.theta[:, 0].tolist() == [250.0, 750.0, 500.0]  # continuous column, in row order
+    assert dataset.theta[:, 1].tolist() == [0.0, 1.0, 0.0]  # studio -> 0, sunset -> 1
+    assert dataset.x[:, 0].tolist() == [1.0, 0.0, 1.0]  # success bool → 1.0 / 0.0
+    # A categorical factor selects MNPE; a continuous-only schema would select NPE.
+    assert SensitivityAnalyzer(dataset)._select_inference_class().__name__ == "MNPE"
+
+
+def test_from_episode_results_drops_constant_factors(tmp_path):
+    """A factor that took a single value across all episodes is dropped; varying factors survive."""
+    jsonl = tmp_path / "episode_results.jsonl"
+    _write_jsonl(
+        jsonl,
+        [
+            {"success": True, "variations": {"light_intensity": 250.0, "always_5": 5.0, "hdr": "only_one"}},
+            {"success": False, "variations": {"light_intensity": 750.0, "always_5": 5.0, "hdr": "only_one"}},
+        ],
+    )
+
+    dataset = dataset_from_episode_results(jsonl, outcome_names=["success"])
+
+    # The constant continuous (always_5) and constant categorical (hdr) are dropped; only the varying one remains.
+    assert [factor.name for factor in dataset.factors] == ["light_intensity"]
+    assert dataset.theta.shape == (2, 1)
+
+
+def test_from_episode_results_warns_on_imbalanced_categorical(tmp_path, capsys):
+    """An unevenly sampled categorical warns, since its posterior would track the sampling frequency."""
+    jsonl = tmp_path / "episode_results.jsonl"
+    _write_jsonl(
+        jsonl,
+        [
+            {"success": True, "variations": {"hdr": "a"}},
+            {"success": False, "variations": {"hdr": "a"}},
+            {"success": True, "variations": {"hdr": "a"}},
+            {"success": False, "variations": {"hdr": "b"}},  # a:b sampled 3:1
+        ],
+    )
+
+    dataset_from_episode_results(jsonl, outcome_names=["success"])
+
+    assert "sampled unevenly" in capsys.readouterr().out
+
+
+def test_from_episode_results_raises_when_all_factors_constant(tmp_path):
+    """If every factor took a single value there is nothing to analyze, so building raises."""
+    jsonl = tmp_path / "episode_results.jsonl"
+    _write_jsonl(
+        jsonl,
+        [
+            {"success": True, "variations": {"hdr": "only_one"}},
+            {"success": False, "variations": {"hdr": "only_one"}},
+        ],
+    )
+
+    with pytest.raises(AssertionError, match="constant"):
+        dataset_from_episode_results(jsonl, outcome_names=["success"])
+
+
+def test_from_episode_results_treats_bool_variation_as_categorical(tmp_path):
+    """A boolean variation draw becomes a categorical factor labelled by str(value)."""
+    jsonl = tmp_path / "episode_results.jsonl"
+    _write_jsonl(
+        jsonl,
+        [
+            {"success": True, "variations": {"distractor_present": True}},
+            {"success": False, "variations": {"distractor_present": False}},
+        ],
     )
-    jsonl = tmp_path / "episode_summary.jsonl"
+
+    dataset = dataset_from_episode_results(jsonl, outcome_names=["success"])
+
+    factor = dataset.factors[0]
+    assert factor.type == "categorical"
+    assert factor.choices == ["False", "True"]  # sorted str labels
+    assert dataset.theta[:, 0].tolist() == [1.0, 0.0]  # "True" -> 1, "False" -> 0 (index in sorted choices)
+
+
+def test_from_episode_results_rejects_inconsistent_factor_set(tmp_path):
+    """Every episode must record the same variations, so a row with a different factor set raises."""
+    jsonl = tmp_path / "episode_results.jsonl"
     _write_jsonl(
         jsonl,
         [
-            {"arena_env_args": {"light_intensity": 250.0, "pick_up_object": "cube"}, "outcomes": {"success": 1}},
-            {"arena_env_args": {"light_intensity": 750.0, "pick_up_object": "can"}, "outcomes": {"success": 0}},
-            {"arena_env_args": {"light_intensity": 500.0, "pick_up_object": "cube"}, "outcomes": {"success": 1}},
+            {"success": True, "variations": {"light_intensity": 250.0}},
+            {"success": False, "variations": {"light_intensity": 750.0, "extra": 1.0}},  # new factor mid-stream
         ],
     )
 
-    dataset = SensitivityDataset.from_files(factors_yaml, jsonl, outcome_names=["success"])
-
-    # Schema parsed with the declared structure.
-    factors_by_name = {factor.name: factor for factor in dataset.schema.factors}
-    assert factors_by_name["light_intensity"].type == "continuous"
-    assert factors_by_name["light_intensity"].range == [(0.0, 1000.0)]
-    assert factors_by_name["pick_up_object"].type == "categorical"
-    assert factors_by_name["pick_up_object"].choices == ["cube", "can"]
-
-    # Continuous-first theta layout; categorical integer-coded by its index into choices.
-    assert dataset.theta.shape == (3, 2)
-    assert dataset.x.shape == (3, 1)
-    assert dataset.factor_columns == {"light_intensity": slice(0, 1), "pick_up_object": slice(1, 2)}
-    assert dataset.theta[:, 0].tolist() == [250.0, 750.0, 500.0]
-    assert dataset.theta[:, 1].tolist() == [0.0, 1.0, 0.0]  # cube -> 0, can -> 1
-    assert dataset.x[:, 0].tolist() == [1.0, 0.0, 1.0]
-
-
-def test_from_files_infers_missing_continuous_range(tmp_path):
-    """A continuous factor with no declared range gets [min, max] inferred from the observed values."""
-    factors_yaml = tmp_path / "factors.yaml"
-    factors_yaml.write_text("factors:\n  light_intensity:\n    type: continuous\n", encoding="utf-8")
-    jsonl = tmp_path / "episode_summary.jsonl"
+    with pytest.raises(AssertionError, match="same variations"):
+        dataset_from_episode_results(jsonl, outcome_names=["success"])
+
+
+def test_from_episode_results_rejects_non_numeric_vector_component(tmp_path):
+    """A vector variation with a non-numeric component is rejected."""
+    jsonl = tmp_path / "episode_results.jsonl"
+    _write_jsonl(
+        jsonl,
+        [{"success": True, "variations": {"pose": [0.1, "oops", 0.2]}}],
+    )
+
+    with pytest.raises(AssertionError, match="non-numeric"):
+        dataset_from_episode_results(jsonl, outcome_names=["success"])
+
+
+def test_from_episode_results_selects_factor_subset(tmp_path):
+    """factor_names restricts the analysis to the named variations; a vector keeps all components."""
+    jsonl = tmp_path / "episode_results.jsonl"
+    _write_jsonl(
+        jsonl,
+        [
+            {"success": True, "variations": {"light_intensity": 250.0, "hdr": "studio", "wrist": [0.1, 0.2]}},
+            {"success": False, "variations": {"light_intensity": 750.0, "hdr": "sunset", "wrist": [0.3, 0.4]}},
+        ],
+    )
+
+    dataset = dataset_from_episode_results(jsonl, outcome_names=["success"], factor_names=["light_intensity", "wrist"])
+
+    # hdr is excluded; the selected vector is still split into one factor per component.
+    assert [factor.name for factor in dataset.factors] == ["light_intensity", "wrist[0]", "wrist[1]"]
+
+
+def test_from_episode_results_rejects_unknown_factor_name(tmp_path):
+    """Requesting a factor that wasn't recorded raises with the available names listed."""
+    jsonl = tmp_path / "episode_results.jsonl"
     _write_jsonl(
         jsonl,
         [
-            {"arena_env_args": {"light_intensity": 30.0}, "outcomes": {"success": 0}},
-            {"arena_env_args": {"light_intensity": 90.0}, "outcomes": {"success": 1}},
+            {"success": True, "variations": {"light_intensity": 250.0}},
+            {"success": False, "variations": {"light_intensity": 750.0}},
         ],
     )
 
-    dataset = SensitivityDataset.from_files(factors_yaml, jsonl, outcome_names=["success"])
+    with pytest.raises(AssertionError, match="not found"):
+        dataset_from_episode_results(jsonl, outcome_names=["success"], factor_names=["nonexistent"])
+
+
+def test_from_episode_results_rejects_non_dict_variations(tmp_path):
+    """A null / non-object variations block fails clearly rather than as an AttributeError."""
+    jsonl = tmp_path / "episode_results.jsonl"
+    _write_jsonl(jsonl, [{"success": True, "variations": None}])
+
+    with pytest.raises(AssertionError, match="not a JSON object"):
+        dataset_from_episode_results(jsonl, outcome_names=["success"])
+
+
+def test_from_episode_results_rejects_non_numeric_outcome(tmp_path):
+    """A non-numeric outcome value fails with row context, not a bare cast error."""
+    jsonl = tmp_path / "episode_results.jsonl"
+    _write_jsonl(
+        jsonl,
+        [
+            {"success": "yes", "variations": {"light_intensity": 250.0}},
+            {"success": "no", "variations": {"light_intensity": 750.0}},
+        ],
+    )
 
-    assert dataset.schema.factors[0].range == [(30.0, 90.0)]
+    with pytest.raises(AssertionError, match="must be numeric or boolean"):
+        dataset_from_episode_results(jsonl, outcome_names=["success"])