diff --git a/docs/pages/concepts/policy/concept_sensitivity_analysis.rst b/docs/pages/concepts/policy/concept_sensitivity_analysis.rst index 82793670f8..9cb30b882c 100644 --- a/docs/pages/concepts/policy/concept_sensitivity_analysis.rst +++ b/docs/pages/concepts/policy/concept_sensitivity_analysis.rst @@ -10,7 +10,7 @@ rate) and renders one figure summarising which factor values are associated with Two distinct ideas are at work. *Joint* means all factors are modelled together rather than one at a time, which is what captures interactions and confounds (see the next section). *Posterior* means the result is conditioned on the outcome: starting from the prior — the -factor values the sweep actually drew, uniform over the declared ranges — it reweights them +factor values the sweep actually drew, uniform over their observed ranges — it reweights them by how often each led to the chosen outcome. So the figure answers *given success, which factor values were in play?*, not merely *how were the factors distributed in the sweep?* @@ -39,54 +39,49 @@ How it works The toolbox is a thin analysis layer over `sbi `_'s neural posterior estimators. The flow is: -1. **Per-episode input.** The analysis reads an ``episode_summary.jsonl`` — one row per - episode, holding that episode's factor values and outcomes. -2. **Schema.** A ``factors.yaml`` declares the *factors* — which ``arena_env_args`` columns - were varied and whether each is continuous or categorical, plus the continuous ranges - that were swept (so the analyzer's prior matches the simulation). It does **not** list - outcomes — *which* outcome to condition on is chosen at analysis time, not saved here. -3. **Inference.** ``SensitivityAnalyzer`` loads the pair, trains an estimator on the full - ``(theta, x)`` jointly — sbi's terms for the factor values (``theta``) and the per-episode - outcomes (``x``) — and samples the joint posterior conditioned on a chosen observation - (by default, success). +1. 
**Per-episode input.** The analysis reads a single ``episode_results.jsonl`` — one row per + episode, holding that episode's recorded variation draws and outcomes. +2. **Schema discovery.** The factors are discovered from the data: each entry in a row's + ``variations`` block becomes a factor — a number is continuous, a numeric vector splits into + one continuous factor per component, and a string is categorical (its choices are the labels + observed across the sweep). Continuous ranges are taken from the data's min/max. There is no + schema file to author; *which* outcome to condition on is chosen at analysis time. +3. **Inference.** ``SensitivityAnalyzer`` trains an estimator on the full ``(theta, x)`` jointly + — sbi's terms for the factor values (``theta``) and the per-episode outcomes (``x``) — and + samples the joint posterior conditioned on a chosen observation (by default, success). 4. **Report.** A probability density curve for each continuous factor and a probability bar chart for each categorical factor. .. todo:: - The eval-runner writer (``episode_writer``) that emits ``episode_summary.jsonl`` during - evaluation is not part of this version — it lands in a follow-up. For now, run the analysis - on synthetic data (see below) or on a JSONL produced externally. + The per-episode recorder that emits ``episode_results.jsonl`` during evaluation lands in a + follow-up. For now, run the analysis on synthetic data (see below) or on a JSONL produced + externally. -Inputs ------- +Input +----- -**factors.yaml** declares only the factors that were varied (and the continuous ranges that -were swept). Outcomes are not declared here — they're selected at analysis time (see below): - -.. 
code-block:: yaml - - factors: - light_intensity: - type: continuous - range: [[0.0, 5000.0]] # the swept range; inferred from the data's min/max if omitted - table_material: - type: categorical - choices: [oak, walnut, bamboo] - -**episode_summary.jsonl** holds one JSON object per episode. It carries every measured -outcome; the analysis picks which one(s) to condition on: +The analysis reads a single ``episode_results.jsonl`` written by the per-episode recorder — +one JSON object per episode. Each row's ``variations`` block holds the sampled factor draws, +and the top-level fields named by ``--outcome`` hold the outcomes (any other top-level fields +are ignored): .. code-block:: json - {"job_name": "pi0_sweep", "episode_idx": 0, - "arena_env_args": {"light_intensity": 3200.0, "table_material": "oak"}, - "outcomes": {"success": 1}} + {"job_name": "pi0_sweep", "episode_in_env": 0, "success": true, + "variations": {"light_intensity": 3200.0, "table_material": "oak", + "wrist_camera": [0.01, -0.02, 0.0]}} + +The factor schema is discovered from these values, so there is no separate schema file: a +number becomes a continuous factor, a numeric vector splits into one continuous factor per +component (named ``key[0]``, ``key[1]``, …), and a string becomes a categorical factor whose +choices are the labels observed across the sweep. A factor that took a single value across +all episodes carries no information and is dropped. Choice of estimator ------------------- -``SensitivityAnalyzer`` picks the estimator from the schema automatically: +``SensitivityAnalyzer`` picks the estimator from the discovered factors automatically: .. list-table:: :header-rows: 1 @@ -111,22 +106,22 @@ conditions on success (1). Running a report ---------------- -Point the report generator at a ``(factors.yaml, episode_summary.jsonl)`` pair. The output -format follows the file extension (``.png``, ``.pdf``, …); reports are written under -``eval/`` by default. 
+Point the report generator at an ``episode_results.jsonl``. The output format follows the +file extension (``.png``, ``.pdf``, …); reports are written under ``eval/`` by default. .. code-block:: bash python -m isaaclab_arena.analysis.sensitivity.generate_report \ - --factors_yaml factors.yaml \ - --episode_summary episode_summary.jsonl \ + --episode_results episode_results.jsonl \ --outcome success \ --output eval/sensitivity_report.png -``--outcome`` selects which per-episode outcome(s) to condition on (keys in the rows' -``outcomes`` block); it defaults to ``success``. Pass ``--observation`` to set the value -per outcome — since outcomes are binary, use ``1`` for success or ``0`` for failure; it -defaults to ``1`` (success). +``--outcome`` selects which per-episode outcome(s) to condition on (top-level field(s) in +each row); it defaults to ``success``. Pass ``--observation`` to set the value per outcome — +since outcomes are binary, use ``1`` for success or ``0`` for failure; it defaults to ``1`` +(success). ``--factors`` restricts the analysis to a subset of the recorded variations (by +their ``variations``-block names; a vector variation keeps all its components); by default +every recorded variation is analyzed. Trying it on synthetic data --------------------------- @@ -162,12 +157,18 @@ Current scope - Outcomes are treated as **binary** (0/1). Conditioning defaults to success; a continuous outcome is rejected with a clear error rather than silently averaged. -- Continuous **vector** factors (``dim > 1``) are reserved for a future extension. The likely - approach is to record scalar reductions (e.g. a norm or distance-to-reference) alongside the - raw vector, so a pose or RGB factor becomes one or more analysable scalar columns. +- A **vector** variation draw (e.g. a camera pose offset) is split into one scalar factor per + component (``key[0]``, ``key[1]``, …), each analysed independently. Components are named by + position; semantic names (e.g. 
a camera's lateral vs. depth axis) are a future extension. +- **Factors should be drawn from the prior** the analyzer assumes — uniform over each + continuous range, and an equal number of episodes per categorical choice. The posterior is + taken relative to how the sweep drew the factors, so uneven sampling leaks in: a factor with + no real effect comes out flat only if it was sampled flat, otherwise its posterior tracks the + sampling frequency. The analyzer warns when a categorical is sampled unevenly, but the clean + fix is to balance the draws in the sweep. - The estimators run on CPU and do not require Isaac Sim, so a report can be generated anywhere the evaluation JSONL is available. -- The analysis assumes the ``episode_summary.jsonl`` is a single coherent slice — one +- The analysis assumes the ``episode_results.jsonl`` is a single coherent slice — one policy, task, and embodiment. **TODO:** add a filter (in the spirit of robolab's ``--filter-policy`` / ``--filter-task``) to select that slice from a larger JSONL, rather than relying on the caller to pre-filter it. 
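The row format and the balanced-sampling guidance in the documentation changes above can be illustrated with a minimal sketch. It builds rows with a top-level ``success`` field and a ``variations`` block, drawing the continuous factor uniformly and giving every categorical choice the same number of episodes. The helper names ``make_synthetic_rows`` and ``to_jsonl``, the ``[0, 5000]`` range, and the success rule are hypothetical and exist only for illustration; they are not part of the toolbox or of this change.

```python
import json
import random

# Labels borrowed from the docs example; any set of strings would do.
MATERIALS = ["oak", "walnut", "bamboo"]


def make_synthetic_rows(episodes_per_choice=4, seed=0):
    """Hypothetical helper: build balanced rows in the documented row shape."""
    rng = random.Random(seed)
    rows = []
    for material in MATERIALS:
        # Same number of episodes per categorical choice (balanced draws).
        for _ in range(episodes_per_choice):
            # Continuous factor drawn uniformly over an assumed range.
            light = rng.uniform(0.0, 5000.0)
            rows.append({
                # Made-up success rule, only so the outcome depends on a factor.
                "success": light > 2500.0,
                "variations": {"light_intensity": light, "table_material": material},
            })
    return rows


def to_jsonl(rows):
    """Serialize rows as JSONL: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)
```

Writing the result of ``to_jsonl(make_synthetic_rows())`` to a file would give an input that could be passed via ``--episode_results``, assuming the reader accepts exactly the fields described in the documentation above.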
diff --git a/isaaclab_arena/analysis/sensitivity/analyzer.py b/isaaclab_arena/analysis/sensitivity/analyzer.py index cca176797a..6ec0946512 100644 --- a/isaaclab_arena/analysis/sensitivity/analyzer.py +++ b/isaaclab_arena/analysis/sensitivity/analyzer.py @@ -35,19 +35,19 @@ class SensitivityAnalyzer: def __init__(self, dataset: SensitivityDataset): self.dataset = dataset self.posterior = None - continuous_factors = [factor for factor in dataset.schema.factors if factor.type == "continuous"] + continuous_factors = [factor for factor in dataset.factors if factor.type == "continuous"] # theta is laid out continuous-first then categorical — built that way by - # SensitivityDataset and defined by FactorSchema.factor_columns — so the leading + # SensitivityDataset and defined by its factor_columns — so the leading # self._num_continuous columns are the continuous factors that _normalize/_denormalize slice. self._num_continuous = len(continuous_factors) for factor in continuous_factors: assert factor.range is not None, ( - f"Continuous factor {factor.name!r} has no range to normalize against. Declare a" - " range in factors.yaml, or build the dataset via from_files()/from_file() so the" - " range is inferred from the data before constructing the analyzer." + f"Continuous factor {factor.name!r} has no range to normalize against. Set a range on" + " the FactorSpec, or build the dataset via dataset_from_episode_results() so the range is" + " inferred from the data before constructing the analyzer." ) - self._continuous_low = torch.tensor([factor.range[0][0] for factor in continuous_factors]) - self._continuous_high = torch.tensor([factor.range[0][1] for factor in continuous_factors]) + self._continuous_low = torch.tensor([factor.range[0] for factor in continuous_factors]) + self._continuous_high = torch.tensor([factor.range[1] for factor in continuous_factors]) def _select_inference_class(self): """Choose the sbi inference class for this schema. 
@@ -61,7 +61,7 @@ def _normalized_prior(self): """Uniform prior matching the normalized theta: continuous dims [0, 1], categoricals [0, k-1].""" low_bounds = [0.0] * self._num_continuous high_bounds = [1.0] * self._num_continuous - for factor in self.dataset.schema.factors: + for factor in self.dataset.factors: if factor.type == "categorical": low_bounds.append(0.0) high_bounds.append(float(len(factor.choices) - 1)) @@ -98,7 +98,7 @@ def sample_posterior(self, observation: torch.Tensor | None = None, num_samples: """Sample the joint posterior over all factors at observation. Defaults to the dataset's default observation (condition on success). Returns a - (num_samples, total_factor_dim) tensor laid out like theta — continuous columns first + (num_samples, num_factors) tensor laid out like theta — continuous columns first (in original, denormalized units), then integer-coded categorical columns. """ assert self.posterior is not None, "Call fit() before sampling the posterior" diff --git a/isaaclab_arena/analysis/sensitivity/dataset.py b/isaaclab_arena/analysis/sensitivity/dataset.py index c4bac0a610..5ac893c953 100644 --- a/isaaclab_arena/analysis/sensitivity/dataset.py +++ b/isaaclab_arena/analysis/sensitivity/dataset.py @@ -5,12 +5,9 @@ from __future__ import annotations -import json import torch -import yaml from dataclasses import dataclass from enum import Enum -from pathlib import Path class FactorType(str, Enum): @@ -22,174 +19,72 @@ class FactorType(str, Enum): @dataclass class FactorSpec: - """One factor's schema as declared in factors.yaml. + """One varied input — a lighting level, a camera-offset axis, a background choice, and so on. - Continuous factors carry a range (one [low, high] pair per dim); categorical - factors carry choices (a list of string labels, integer-encoded by index in theta). + Each factor occupies one column of the dataset's factor matrix theta (see SensitivityDataset). 
+ A continuous factor carries a range, the (low, high) it was swept over. A categorical factor + carries choices, the string labels it took, integer-encoded by their index in that column. """ name: str type: FactorType - dim: int = 1 - range: list[tuple[float, float]] | None = None # one (low, high) pair per dim, continuous only + range: tuple[float, float] | None = None # (low, high), continuous only choices: list[str] | None = None # categorical only def __post_init__(self) -> None: # Accept the raw string form (from YAML / callers) and normalize to the enum. self.type = FactorType(self.type) - # Normalize each (low, high) pair to a tuple (YAML/JSON deliver them as lists). + # JSON/YAML deliver the range as a list; normalize it to a tuple. if self.range is not None: - self.range = [tuple(pair) for pair in self.range] - - -@dataclass -class FactorSchema: - """Parsed factors.yaml — the list of factors that were varied. - - The schema describes what *can* vary (continuous vs categorical, range/choices), not the - values taken in any given episode. Outcomes are not part of the schema; which outcome to - condition on is chosen at analysis time. - """ - - factors: list[FactorSpec] - - @classmethod - def from_yaml(cls, path: str | Path) -> FactorSchema: - """Load a factors.yaml from disk into a typed FactorSchema. - - The YAML has one top-level block, factors (one entry per varied input). Each factor's - type must be continuous or categorical. - """ - # TODO: add a robolab-style filter (e.g. select rows by policy/task/embodiment) so a - # single episode_summary.jsonl can be sliced to one coherent (policy, task, embodiment) - # before analysis, instead of assuming the caller pre-filtered it. 
- with open(path, encoding="utf-8") as yaml_file: - yaml_data = yaml.safe_load(yaml_file) - assert isinstance(yaml_data, dict), f"factors.yaml at {path} must be a mapping at top level" - assert "factors" in yaml_data, f"factors.yaml at {path} is missing top-level `factors:` block" - - factors: list[FactorSpec] = [] - for factor_name, factor_block in yaml_data["factors"].items(): - assert "type" in factor_block, ( - f"factors.yaml at {path} factor {factor_name!r} is missing required `type:` field" - " (expected 'continuous' or 'categorical')" - ) - factor_type = factor_block["type"] - assert factor_type in ("continuous", "categorical"), ( - f"factors.yaml at {path} factor {factor_name!r} has unknown type {factor_type!r};" - " expected 'continuous' or 'categorical'" - ) - factors.append( - FactorSpec( - name=factor_name, - type=factor_type, - dim=factor_block.get("dim", 1), - range=factor_block.get("range"), - choices=factor_block.get("choices"), - ) - ) - - return cls(factors=factors) - - @property - def total_factor_dim(self) -> int: - """Total width of theta — sum of dim over continuous factors plus 1 per categorical.""" - return sum(factor.dim if factor.type == "continuous" else 1 for factor in self.factors) - - @property - def factor_columns(self) -> dict[str, slice]: - """Map factor name → its column slice in theta. - - Continuous factors occupy the leading columns (dim each), then each categorical - factor occupies one trailing column. This continuous-first layout is what sbi's - mixed density estimator expects. 
- """ - continuous_factors = [factor for factor in self.factors if factor.type == "continuous"] - categorical_factors = [factor for factor in self.factors if factor.type == "categorical"] - column_slices: dict[str, slice] = {} - column_index = 0 - for factor in continuous_factors + categorical_factors: - column_width = factor.dim if factor.type == "continuous" else 1 - column_slices[factor.name] = slice(column_index, column_index + column_width) - column_index += column_width - return column_slices + self.range = tuple(self.range) class SensitivityDataset: - """A FactorSchema paired with its per-episode theta (factors) and x (outcomes). + """The varied factors paired with their per-episode values (theta) and outcomes (x). - The object is a pure container: it holds the schema and the two tensors, and exposes - the prior and column layouts an analyzer consumes. It can be built two ways: - - - from_files — parse a factors.yaml / episode_summary.jsonl pair - (the path eval runs take). - - the constructor — wrap in-memory tensors directly (what a synthetic simulator or - a unit test takes). The tensors must already be in the layout factor_columns - describes: continuous columns first, then one integer-coded column per categorical. + theta is the factor matrix: one row per episode, one column per factor — continuous factors + in the leading columns, then one integer-coded column per categorical factor. x is the + matching outcome matrix, one row per episode and one column per outcome. The object is a pure + in-memory container (the factor list plus the two tensors) and exposes the column layout an + analyzer reads. """ def __init__( self, - schema: FactorSchema, + factors: list[FactorSpec], theta: torch.Tensor, x: torch.Tensor, outcome_names: list[str] | tuple[str, ...] = ("success",), ): - """Wrap an in-memory schema plus its theta / x tensors, validating shapes. + """Wrap an in-memory factor list plus its theta / x tensors, validating shapes. 
Args: - schema: The parsed factor schema. Continuous factors must carry a range; - categorical factors must carry choices. - theta: (num_episodes, total_factor_dim) factor matrix, continuous-first. + factors: The varied factors, one per theta column. A continuous factor must carry a + range, a categorical factor must carry choices. + theta: (num_episodes, num_factors) factor matrix, continuous-first. x: (num_episodes, num_outcomes) outcome matrix. outcome_names: Name of each outcome column in x, in order (used for plot labels). """ assert theta.ndim == 2 and x.ndim == 2, f"theta and x must be 2D; got {theta.shape} and {x.shape}" assert theta.shape[0] == x.shape[0], f"theta/x row counts disagree: {theta.shape[0]} vs {x.shape[0]}" assert theta.shape[0] > 0, "Dataset is empty (no episodes)" - assert ( - theta.shape[1] == schema.total_factor_dim - ), f"theta has {theta.shape[1]} columns but schema declares {schema.total_factor_dim} factor dims" + assert theta.shape[1] == len( + factors + ), f"theta has {theta.shape[1]} columns but there are {len(factors)} factor(s) (one column each)" assert x.shape[1] == len( outcome_names ), f"x has {x.shape[1]} columns but {len(outcome_names)} outcome name(s) were given" - self.schema = schema + self.factors = factors self.outcome_names = list(outcome_names) self._theta = theta self._x = x - @classmethod - def from_files( - cls, - factors_yaml: str | Path, - jsonl_path: str | Path, - outcome_names: list[str] | tuple[str, ...] = ("success",), - ) -> SensitivityDataset: - """Build a dataset from a factors.yaml schema and an episode_summary.jsonl. - - Parses and validates both, infers any missing continuous range from the data, and - assembles the theta / x tensors in the layout the analyzers expect. ``outcome_names`` - selects which per-episode outcome columns to condition on (the analysis-time query). 
- """ - schema = FactorSchema.from_yaml(factors_yaml) - - jsonl_text = Path(jsonl_path).read_text(encoding="utf-8") - rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()] - assert len(rows) > 0, f"Empty episode_summary.jsonl at {jsonl_path}" - - _validate_rows(schema, rows, outcome_names, jsonl_path) - _infer_missing_factor_ranges(schema, rows) - - theta = _build_factor_tensor(schema, rows) - x = _build_outcome_tensor(rows, outcome_names) - return cls(schema, theta, x, outcome_names) - @property def theta(self) -> torch.Tensor: - """(num_episodes, total_factor_dim) matrix of factor values, one row per episode. + """(num_episodes, num_factors) matrix of factor values, one row per episode. - This is the "input" sbi infers a posterior over. Column layout is given by - factor_columns — continuous factors first, then categoricals (integer-coded). + The column layout is given by factor_columns, continuous factors first then categoricals + (integer-coded). """ return self._theta @@ -197,8 +92,7 @@ def theta(self) -> torch.Tensor: def x(self) -> torch.Tensor: """(num_episodes, num_outcomes) matrix of outcome values, one row per episode. - This is what the analyzer conditions queries on — "what factor values were consistent - with observing these outcomes?". Columns are named by ``outcome_names``. + Columns are named by outcome_names. These are the values a query conditions on. """ return self._x @@ -209,15 +103,19 @@ def num_episodes(self) -> int: @property def factor_columns(self) -> dict[str, slice]: - """Map factor name → its column slice in theta. Same as schema.factor_columns.""" - return self.schema.factor_columns + """Map each factor name to its single-column slice in theta. + + Continuous factors take the leading columns, then categoricals. Each factor is one column. 
+ """ + continuous = [factor for factor in self.factors if factor.type == "continuous"] + categorical = [factor for factor in self.factors if factor.type == "categorical"] + return {factor.name: slice(index, index + 1) for index, factor in enumerate(continuous + categorical)} def default_observation(self) -> torch.Tensor: - """The default outcome vector to condition a query on: success (1) for every outcome. + """The outcome vector a query conditions on by default: success (1) for every outcome. - Outcomes are binary (0/1) in the current scope, so the natural default query is - "what produced success?". Asserts the outcomes are binary, so adding a continuous - outcome later fails loudly here instead of silently conditioning on a meaningless value. + Outcomes are binary (0/1), so the natural query is what produced success. The assertion + keeps a continuous outcome from being used here silently. """ is_binary = set(self._x.flatten().tolist()).issubset({0.0, 1.0}) assert is_binary, "default_observation assumes binary (0/1) outcomes; pass an explicit observation otherwise." @@ -225,103 +123,5 @@ def default_observation(self) -> torch.Tensor: @property def has_categorical_factors(self) -> bool: - """True iff the schema declares at least one categorical factor.""" - return any(factor.type == "categorical" for factor in self.schema.factors) - - -def _validate_rows( - schema: FactorSchema, rows: list[dict], outcome_names: list[str] | tuple[str, ...], jsonl_path: str | Path -) -> None: - """Assert every JSONL row carries the declared factor keys and the requested outcome keys. - - The declared names need only be a subset of each row's arena_env_args / outcomes; - extra keys are ignored. Raises pointing at the first offending row. 
- """ - expected_factor_names = {factor.name for factor in schema.factors} - expected_outcome_names = set(outcome_names) - for row_index, row in enumerate(rows): - assert ( - "arena_env_args" in row and "outcomes" in row - ), f"Row {row_index} of {jsonl_path} missing arena_env_args/outcomes block" - missing_factor_names = expected_factor_names - set(row["arena_env_args"].keys()) - assert not missing_factor_names, ( - f"Row {row_index} of {jsonl_path} is missing factor(s) " - f"{sorted(missing_factor_names)} from its arena_env_args block; " - f"factors.yaml declares: {sorted(expected_factor_names)}" - ) - missing_outcome_names = expected_outcome_names - set(row["outcomes"].keys()) - assert ( - not missing_outcome_names - ), f"Row {row_index} of {jsonl_path} missing outcomes {sorted(missing_outcome_names)}" - - -def _infer_missing_factor_ranges(schema: FactorSchema, rows: list[dict]) -> None: - """Fill any continuous factor's missing range from the observed min/max. - - A range declared in factors.yaml takes precedence and is left untouched. - """ - for factor in schema.factors: - if factor.type != "continuous" or factor.range is not None: - continue - if factor.dim != 1: - raise NotImplementedError( - "Range inference for vector factors (dim > 1) is not implemented;" - f" factor {factor.name!r} has dim={factor.dim}" - ) - observed_values = [float(row["arena_env_args"][factor.name]) for row in rows] - factor.range = [(min(observed_values), max(observed_values))] - - -def _build_factor_tensor(schema: FactorSchema, rows: list[dict]) -> torch.Tensor: - """Assemble the per-episode factor matrix theta. - - Continuous columns first (one per dim), then one column per categorical factor with its - value integer-coded as a float32 index into FactorSpec.choices. 
- """ - continuous_factors = [factor for factor in schema.factors if factor.type == "continuous"] - categorical_factors = [factor for factor in schema.factors if factor.type == "categorical"] - - factor_columns: list[torch.Tensor] = [] - - # Continuous columns come first (sbi MNPE convention). - for factor in continuous_factors: - if factor.dim != 1: - raise NotImplementedError( - "Vector continuous factors (dim > 1) are not yet supported;" - f" factor {factor.name!r} has dim={factor.dim}" - ) - raw_values = [float(row["arena_env_args"][factor.name]) for row in rows] - factor_column = torch.tensor(raw_values, dtype=torch.float32).unsqueeze(1) - factor_columns.append(factor_column) - - # Categorical columns: integer-code each string value as its index in FactorSpec.choices. - for factor in categorical_factors: - assert ( - factor.choices is not None and len(factor.choices) > 0 - ), f"Categorical factor {factor.name!r} has no `choices:` block in factors.yaml" - choice_to_code = {choice: code for code, choice in enumerate(factor.choices)} - category_codes: list[int] = [] - for row_index, row in enumerate(rows): - value = row["arena_env_args"][factor.name] - assert ( - value in choice_to_code - ), f"Row {row_index} factor {factor.name!r} has value {value!r} not in declared choices {factor.choices}" - category_codes.append(choice_to_code[value]) - factor_column = torch.tensor(category_codes, dtype=torch.float32).unsqueeze(1) - factor_columns.append(factor_column) - - if factor_columns: - return torch.cat(factor_columns, dim=1) - return torch.zeros((len(rows), 0), dtype=torch.float32) - - -def _build_outcome_tensor(rows: list[dict], outcome_names: list[str] | tuple[str, ...]) -> torch.Tensor: - """Assemble the per-episode outcome matrix x (one column per requested outcome). - - Each outcome value is cast to float; bool outcomes become 0.0/1.0. 
- """ - outcome_column_tensors = [ - torch.tensor([float(row["outcomes"][name]) for row in rows], dtype=torch.float32).unsqueeze(1) - for name in outcome_names - ] - return torch.cat(outcome_column_tensors, dim=1) + """True iff at least one factor is categorical.""" + return any(factor.type == "categorical" for factor in self.factors) diff --git a/isaaclab_arena/analysis/sensitivity/episode_results_reader.py b/isaaclab_arena/analysis/sensitivity/episode_results_reader.py new file mode 100644 index 0000000000..e4ba7b1a77 --- /dev/null +++ b/isaaclab_arena/analysis/sensitivity/episode_results_reader.py @@ -0,0 +1,268 @@ +# Copyright (c) 2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: Apache-2.0 + +"""Read an episode_results.jsonl (the per-episode recorder's output) into a SensitivityDataset. + +This module is the only place that knows the recorder's on-disk format, so dataset.py stays a +pure in-memory container. +""" + +from __future__ import annotations + +import json +import torch +from pathlib import Path +from typing import Any + +from isaaclab_arena.analysis.sensitivity.dataset import FactorSpec, FactorType, SensitivityDataset + +_IMBALANCE_WARN_RATIO = 1.5 +"""Warn when a categorical's most-sampled choice exceeds its least-sampled one by at least this factor.""" + + +def dataset_from_episode_results( + jsonl_path: str | Path, + outcome_names: list[str] | tuple[str, ...] = ("success",), + factor_names: list[str] | tuple[str, ...] | None = None, +) -> SensitivityDataset: + """Build a SensitivityDataset from an episode_results.jsonl, discovering the factors from the data. + + Each line is one episode. The variations block holds the sampled factor draws, and the + top-level fields named by outcome_names hold the outcomes. Other top-level fields are ignored. 
+ A number becomes a continuous factor, a numeric vector becomes one continuous factor per + component (named key[i]), and a string becomes a categorical factor over its observed labels. + + Example line, one vector and one string factor: + + {"success": true, + "variations": {"wrist_camera": [0.01, -0.02, 0.0], "hdr_image": "sunset"}} + + Args: + jsonl_path: Path to the episode_results.jsonl, one JSON object per line. + outcome_names: Top-level field(s) per line to use as outcomes. + factor_names: Which recorded variations to analyze, by their variations-block name. A + vector is selected by its base name and keeps every component. None analyzes all. + + Returns: + A SensitivityDataset whose theta / x use the continuous-first layout the analyzers read. + """ + rows = _read_rows(jsonl_path) + factor_kinds, factor_values, factor_order = _discover_factor_values(rows, outcome_names, jsonl_path, factor_names) + factors, theta = _build_factor_columns(factor_kinds, factor_values, factor_order, jsonl_path) + x = _build_outcome_columns(rows, outcome_names, jsonl_path) + return SensitivityDataset(factors, theta, x, outcome_names) + + +def _read_rows(jsonl_path: str | Path) -> list[dict]: + """Parse the JSONL file into a non-empty list of episode records.""" + jsonl_text = Path(jsonl_path).read_text(encoding="utf-8") + rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()] + assert len(rows) > 0, f"Empty episode_results.jsonl at {jsonl_path}" + return rows + + +def _flatten_variation_value( + key: str, value: Any, row_index: int, jsonl_path: str | Path +) -> list[tuple[str, float | str]]: + """Turn one recorded variation draw into (factor_name, scalar) pairs. + + A numeric vector becomes one pair per component, each named key[i]. A bare number or string + becomes a single pair under key. A bool is treated as a categorical label rather than a 0/1 + number. + + Args: + key: The variation name, asset.variation. + value: The recorded draw for one episode. 
+ row_index: Source row index, used in error messages. + jsonl_path: Source path, used in error messages. + + Returns: + The (factor_name, scalar) pairs this draw contributes. + """ + assert isinstance(value, (bool, int, float, str, list, tuple)), ( + f"Variation {key!r} in row {row_index} of {jsonl_path} has unsupported value type " + f"{type(value).__name__}: {value!r}. Expected a number, string, or numeric vector." + ) + # bool is an int subclass, so check it before int/float and keep it categorical. + if isinstance(value, bool): + return [(key, str(value))] + if isinstance(value, (int, float)): + return [(key, float(value))] + if isinstance(value, str): + return [(key, value)] + # list / tuple → one continuous scalar factor per component. + # TODO(cvolk): components are named with an opaque positional suffix (key[0], key[1], ...), + # so plots can't tell e.g. a camera's lateral axis from its depth axis. Follow-up PR: have + # the recorder emit semantic component names (e.g. camera ROS frame x_right/y_down/z_forward) + # rather than a bare vector, so the labels flow through this generic reader unchanged. + assert len(value) > 0, f"Variation {key!r} in row {row_index} of {jsonl_path} is an empty list." + pairs: list[tuple[str, float | str]] = [] + for component_index, component in enumerate(value): + assert isinstance(component, (int, float)) and not isinstance(component, bool), ( + f"Variation {key!r} in row {row_index} of {jsonl_path} is a vector with a non-numeric " + f"component at index {component_index}: {component!r}. Vector variations must be all-numeric." + ) + pairs.append((f"{key}[{component_index}]", float(component))) + return pairs + + +def _discover_factor_values( + rows: list[dict], + outcome_names: list[str] | tuple[str, ...], + jsonl_path: str | Path, + factor_names: list[str] | tuple[str, ...] 
| None, +) -> tuple[dict[str, str], dict[str, list[float | str]], list[str]]: + """Scan the rows into per-factor value lists, checking the recorder contract. + + Flattens each row's variation draws (see _flatten_variation_value), keeps only the requested + factor_names if given, and asserts every episode records the same factors and the requested + outcomes. Returns the factor kinds, the per-row values, and the first-seen order. + """ + selected = set(factor_names) if factor_names is not None else None + if selected is not None: + first_variations = rows[0].get("variations") + assert isinstance( + first_variations, dict + ), f"Row 0 of {jsonl_path} has no 'variations' block (or it is not a JSON object)." + available = set(first_variations) + missing = selected - available + assert not missing, ( + f"Requested factor(s) {sorted(missing)} not found in {jsonl_path}; " + f"available variations: {sorted(available)}." + ) + + factor_kinds: dict[str, str] = {} # factor name → "continuous" | "categorical" + factor_values: dict[str, list[float | str]] = {} # factor name → per-row value, in row order + factor_order: list[str] = [] # factor names in first-seen order, for a stable schema + + for row_index, row in enumerate(rows): + assert "variations" in row and isinstance(row["variations"], dict), ( + f"Row {row_index} of {jsonl_path} has no 'variations' block (or it is not a JSON object); " + "episode_results rows must carry recorded variation draws." + ) + seen_in_row: set[str] = set() + for key, value in row["variations"].items(): + if selected is not None and key not in selected: + continue + for factor_name, scalar in _flatten_variation_value(key, value, row_index, jsonl_path): + kind = "categorical" if isinstance(scalar, str) else "continuous" + if factor_name not in factor_kinds: + assert row_index == 0, ( + f"Factor {factor_name!r} first appears in row {row_index} of {jsonl_path}; " + "every episode must record the same variations." 
+ ) + factor_kinds[factor_name] = kind + factor_values[factor_name] = [] + factor_order.append(factor_name) + assert factor_kinds[factor_name] == kind, ( + f"Factor {factor_name!r} is {factor_kinds[factor_name]} in earlier rows but {kind} " + f"in row {row_index} of {jsonl_path}; a variation must keep a single type." + ) + factor_values[factor_name].append(scalar) + seen_in_row.add(factor_name) + + missing_in_row = [name for name in factor_order if name not in seen_in_row] + assert not missing_in_row, ( + f"Row {row_index} of {jsonl_path} is missing factor(s) {sorted(missing_in_row)}; " + "every episode must record the same variations." + ) + for name in outcome_names: + assert name in row, ( + f"Row {row_index} of {jsonl_path} is missing outcome field {name!r} " + f"(requested outcomes: {list(outcome_names)})." + ) + + assert factor_order, f"No factors discovered in {jsonl_path}: every row's 'variations' block was empty." + return factor_kinds, factor_values, factor_order + + +def _build_factor_columns( + factor_kinds: dict[str, str], + factor_values: dict[str, list[float | str]], + factor_order: list[str], + jsonl_path: str | Path, +) -> tuple[list[FactorSpec], torch.Tensor]: + """Turn the discovered per-factor values into the factor specs and the theta matrix. + + Continuous factors lead theta, then categoricals (integer-coded). A factor that took a single + value is dropped (it carries no information, and a constant categorical breaks the estimator + fit), and an all-constant input raises. 
+ """ + continuous_names = [name for name in factor_order if factor_kinds[name] == "continuous"] + categorical_names = [name for name in factor_order if factor_kinds[name] == "categorical"] + + factors: list[FactorSpec] = [] + columns: list[torch.Tensor] = [] + dropped: list[str] = [] + for name in continuous_names: + values = factor_values[name] + lo, hi = min(values), max(values) + if lo == hi: + dropped.append(name) + continue + factors.append(FactorSpec(name=name, type=FactorType.CONTINUOUS, range=(lo, hi))) + columns.append(torch.tensor(values, dtype=torch.float32).unsqueeze(1)) + for name in categorical_names: + choices = sorted(set(factor_values[name])) + if len(choices) == 1: + dropped.append(name) + continue + _warn_if_unevenly_sampled(name, factor_values[name], choices) + code_of = {choice: code for code, choice in enumerate(choices)} + factors.append(FactorSpec(name=name, type=FactorType.CATEGORICAL, choices=choices)) + columns.append( + torch.tensor([code_of[value] for value in factor_values[name]], dtype=torch.float32).unsqueeze(1) + ) + + if dropped: + print( + f"[INFO] Dropped {len(dropped)} constant factor(s) (single value across all episodes): {sorted(dropped)}." + ) + assert factors, ( + f"All discovered factors in {jsonl_path} are constant (each took a single value across all " + "episodes). Nothing to analyze. Vary at least one factor." + ) + return factors, torch.cat(columns, dim=1) + + +def _warn_if_unevenly_sampled(name: str, values: list[float | str], choices: list[str]) -> None: + """Warn when a categorical's choices were sampled unevenly, since that biases its posterior. + + The analysis assumes factors were drawn from the uniform prior. Uneven draw counts per choice + leak into the posterior (a no-effect factor then tracks its sampling frequency), so warn once + the imbalance reaches _IMBALANCE_WARN_RATIO. 
+ """ + counts: dict[str, int] = {} + for value in values: + counts[value] = counts.get(value, 0) + 1 + if max(counts.values()) >= _IMBALANCE_WARN_RATIO * min(counts.values()): + ordered_counts = {choice: counts[choice] for choice in choices} + print( + f"[WARNING] Categorical factor {name!r} was sampled unevenly across its choices " + f"({ordered_counts}). Its posterior reflects this sampling frequency, not only its effect " + "on the outcome. Balance the draws per choice for an unbiased result." + ) + + +def _build_outcome_columns( + rows: list[dict], outcome_names: list[str] | tuple[str, ...], jsonl_path: str | Path +) -> torch.Tensor: + """Stack the requested top-level outcome fields into the x matrix, one column per outcome. + + Asserts each outcome value is numeric or boolean, so a stray non-numeric outcome fails with + the same row-and-path context as a bad variation rather than a bare cast error. + """ + columns: list[torch.Tensor] = [] + for name in outcome_names: + values: list[float] = [] + for row_index, row in enumerate(rows): + value = row[name] + assert isinstance(value, (bool, int, float)), ( + f"Outcome {name!r} in row {row_index} of {jsonl_path} is {type(value).__name__} {value!r}; " + "outcomes must be numeric or boolean." 
+ ) + values.append(float(value)) + columns.append(torch.tensor(values, dtype=torch.float32).unsqueeze(1)) + return torch.cat(columns, dim=1) diff --git a/isaaclab_arena/analysis/sensitivity/generate_report.py b/isaaclab_arena/analysis/sensitivity/generate_report.py index a746ceb3a2..33d4094105 100644 --- a/isaaclab_arena/analysis/sensitivity/generate_report.py +++ b/isaaclab_arena/analysis/sensitivity/generate_report.py @@ -11,32 +11,32 @@ from pathlib import Path from isaaclab_arena.analysis.sensitivity.analyzer import SensitivityAnalyzer -from isaaclab_arena.analysis.sensitivity.dataset import SensitivityDataset +from isaaclab_arena.analysis.sensitivity.episode_results_reader import dataset_from_episode_results from isaaclab_arena.analysis.sensitivity.plotting import plot_marginals def generate_report( - factors_yaml_path: str | Path, - jsonl_path: str | Path, + episode_results_path: str | Path, output_path: str | Path, outcome_names: list[str] | tuple[str, ...] = ("success",), + factor_names: list[str] | tuple[str, ...] | None = None, observation: list[float] | None = None, seed: int | None = 0, ) -> Path: - """Build a sensitivity report from a factors.yaml / episode_summary.jsonl pair. + """Load an episode_results.jsonl, fit a SensitivityAnalyzer, and save a posterior-marginals figure. - Loads the data, fits a SensitivityAnalyzer, and saves a single posterior-marginals - figure. The output format follows the output_path extension (.png, .pdf, …). + The factor schema is discovered from the recorder's per-episode variation draws. The output + format follows the output_path extension (.png, .pdf, …). Args: - factors_yaml_path: Schema file declaring the factors. - jsonl_path: episode_summary.jsonl produced by eval_runner. + episode_results_path: episode_results.jsonl produced by the per-episode recorder. output_path: Destination figure file (parent dirs created if absent). outcome_names: Which per-episode outcome(s) to condition on. 
- observation: Outcome values to condition on, one per outcome name. Defaults to - conditioning on success (1) for every (binary) outcome. - seed: Seed for torch's global RNG, set once before fitting so the estimator training - and posterior sampling are reproducible. Pass ``None`` to leave the RNG untouched. + factor_names: Which recorded variations to analyze. None analyzes all of them. + observation: Outcome values to condition on, one per outcome name. None conditions on + success (1) for every binary outcome. + seed: Seed for torch's global RNG so a report is reproducible. Pass None to leave the + RNG untouched. Returns: The resolved output path. @@ -46,7 +46,7 @@ def generate_report( if seed is not None: torch.manual_seed(seed) - dataset = SensitivityDataset.from_files(Path(factors_yaml_path), Path(jsonl_path), outcome_names) + dataset = dataset_from_episode_results(episode_results_path, outcome_names, factor_names) analyzer = SensitivityAnalyzer(dataset) analyzer.fit() @@ -64,26 +64,38 @@ def generate_report( def main(): parser = argparse.ArgumentParser( description=( - "Build a sensitivity report (one posterior-marginal panel per factor) from a " - "(factors.yaml, episode_summary.jsonl) pair. Output format follows the --output extension." + "Build a sensitivity report (one posterior-marginal panel per factor) from an " + "episode_results.jsonl. Output format follows the --output extension." ) ) - parser.add_argument("--factors_yaml", type=str, required=True, help="Path to factors.yaml.") parser.add_argument( - "--episode_summary", type=str, required=True, help="Path to episode_summary.jsonl produced by eval_runner." + "--episode_results", + type=str, + required=True, + help="Path to episode_results.jsonl produced by the per-episode recorder.", ) parser.add_argument( "--output", type=str, default="eval/sensitivity_report.png", - help="Output figure file; format follows the extension (.png, .pdf, …). 
Default: eval/sensitivity_report.png.", + help="Output figure file. Format follows the extension (.png, .pdf, …). Default: eval/sensitivity_report.png.", ) parser.add_argument( "--outcome", type=str, nargs="+", default=["success"], - help="Which per-episode outcome(s) to condition on (keys in the rows' outcomes block). Default: success.", + help="Which per-episode outcome(s) to condition on (top-level field(s) in each row). Default: success.", + ) + parser.add_argument( + "--factors", + type=str, + nargs="+", + default=None, + help=( + "Which recorded variations to analyze (keys in each row's variations block; a vector " + "variation keeps all its components). Default: all recorded variations." + ), ) parser.add_argument( "--observation", @@ -99,15 +111,15 @@ def main(): "--seed", type=int, default=0, - help="Seed for torch's global RNG, so estimator training + sampling are reproducible. Default: 0.", + help="Seed for torch's global RNG so a report is reproducible. Default: 0.", ) args = parser.parse_args() generate_report( - args.factors_yaml, - args.episode_summary, + args.episode_results, args.output, outcome_names=args.outcome, + factor_names=args.factors, observation=args.observation, seed=args.seed, ) diff --git a/isaaclab_arena/analysis/sensitivity/plotting.py b/isaaclab_arena/analysis/sensitivity/plotting.py index 73a4961e7b..5dd0ef2cbf 100644 --- a/isaaclab_arena/analysis/sensitivity/plotting.py +++ b/isaaclab_arena/analysis/sensitivity/plotting.py @@ -35,7 +35,7 @@ def plot_marginals( for categorical ones, wrapped into a grid. Args: - samples: ``(num_samples, total_factor_dim)`` posterior draws in the dataset's factor + samples: ``(num_samples, num_factors)`` posterior draws in the dataset's factor layout (continuous-first, original units), e.g. from ``SensitivityAnalyzer.sample_posterior``. dataset: The dataset, for the factor schema and column layout. observation: The outcome vector the samples were conditioned on (shown in the title). 
@@ -46,7 +46,7 @@ def plot_marginals( The matplotlib Figure. """ samples = samples.cpu().numpy() - factors = dataset.schema.factors + factors = dataset.factors # Wrap panels into a grid (at most 3 columns) so many factors stay readable. num_columns = min(3, len(factors)) num_rows = math.ceil(len(factors) / num_columns) @@ -86,7 +86,7 @@ def _draw_continuous_marginal(ax, factor: FactorSpec, factor_samples: np.ndarray than a binned histogram. Falls back to a single line at the mean when the samples have no spread (KDE bandwidth is then undefined). """ - range_low, range_high = factor.range[0] + range_low, range_high = factor.range sample_mean = float(np.mean(factor_samples)) if float(np.std(factor_samples)) >= 1e-9: grid = np.linspace(range_low, range_high, 200) diff --git a/isaaclab_arena/tests/sensitivity_synthetic.py b/isaaclab_arena/tests/sensitivity_synthetic.py index 056b6ef50f..b2b4392be1 100644 --- a/isaaclab_arena/tests/sensitivity_synthetic.py +++ b/isaaclab_arena/tests/sensitivity_synthetic.py @@ -28,7 +28,7 @@ from dataclasses import dataclass from isaaclab_arena.analysis.sensitivity.analyzer import SensitivityAnalyzer -from isaaclab_arena.analysis.sensitivity.dataset import FactorSchema, FactorSpec, SensitivityDataset +from isaaclab_arena.analysis.sensitivity.dataset import FactorSpec, SensitivityDataset from isaaclab_arena.analysis.sensitivity.plotting import plot_marginals @@ -50,7 +50,7 @@ def logit(self, values: torch.Tensor) -> torch.Tensor: return self.weight * normalized def spec(self) -> FactorSpec: - return FactorSpec(name=self.name, type="continuous", range=[list(self.value_range)]) + return FactorSpec(name=self.name, type="continuous", range=self.value_range) def column(self, values: torch.Tensor) -> torch.Tensor: return values @@ -105,10 +105,10 @@ def _build_dataset( SensitivityDataset.factor_columns expects. 
""" ordered = sorted(factors_and_columns, key=lambda pair: isinstance(pair[0], _CategoricalFactor)) - schema = FactorSchema(factors=[factor.spec() for factor, _ in ordered]) + factors = [factor.spec() for factor, _ in ordered] theta = torch.stack([factor.column(values) for factor, values in ordered], dim=1) # outcome_names defaults to ("success",), matching the single binary outcome built here. - return SensitivityDataset(schema, theta, success.unsqueeze(1)) + return SensitivityDataset(factors, theta, success.unsqueeze(1)) def make_continuous_dataset(seed: int, num_episodes: int = 2000) -> SensitivityDataset: diff --git a/isaaclab_arena/tests/test_sensitivity_analysis.py b/isaaclab_arena/tests/test_sensitivity_analysis.py index cf6d50a799..b18692a862 100644 --- a/isaaclab_arena/tests/test_sensitivity_analysis.py +++ b/isaaclab_arena/tests/test_sensitivity_analysis.py @@ -18,8 +18,10 @@ import numpy as np import torch +import pytest + from isaaclab_arena.analysis.sensitivity.analyzer import SensitivityAnalyzer -from isaaclab_arena.analysis.sensitivity.dataset import SensitivityDataset +from isaaclab_arena.analysis.sensitivity.episode_results_reader import dataset_from_episode_results from isaaclab_arena.tests.sensitivity_synthetic import ( CAMERA_DISTANCE, GRASP_OFFSET, @@ -89,64 +91,213 @@ def test_npe_recovers_two_continuous_effects(): def _write_jsonl(path, rows: list[dict]) -> None: - """Write one JSON object per line to ``path``.""" + """Write one JSON object per line to path.""" path.write_text("\n".join(json.dumps(row) for row in rows) + "\n", encoding="utf-8") -def test_from_files_parses_mixed_schema_and_builds_tensors(tmp_path): - """from_files parses a factors.yaml + episode_summary.jsonl into the expected theta / x layout.""" - factors_yaml = tmp_path / "factors.yaml" - factors_yaml.write_text( - "factors:\n" - " light_intensity:\n" - " type: continuous\n" - " range: [[0.0, 1000.0]]\n" - " pick_up_object:\n" - " type: categorical\n" - " choices: [cube, 
can]\n", - encoding="utf-8", +def test_from_episode_results_splits_vector_variation_into_scalar_factors(tmp_path): + """dataset_from_episode_results discovers a continuous factor per component of a vector variation draw.""" + jsonl = tmp_path / "episode_results.jsonl" + _write_jsonl( + jsonl, + [ + {"success": True, "variations": {"droid.camera_extrinsics_wrist_camera": [0.001, -0.004, 0.002]}}, + {"success": False, "variations": {"droid.camera_extrinsics_wrist_camera": [0.003, 0.001, -0.005]}}, + ], + ) + + dataset = dataset_from_episode_results(jsonl, outcome_names=["success"]) + + # A 3-vector draw becomes three continuous factors, named with a per-component suffix. + factors_by_name = {factor.name: factor for factor in dataset.factors} + expected_names = [f"droid.camera_extrinsics_wrist_camera[{i}]" for i in range(3)] + assert [factor.name for factor in dataset.factors] == expected_names + assert all(factors_by_name[name].type == "continuous" for name in expected_names) + + assert dataset.theta.shape == (2, 3) + assert dataset.x.shape == (2, 1) + assert dataset.theta[:, 0].tolist() == pytest.approx([0.001, 0.003]) # first component, both episodes (float32) + assert dataset.x[:, 0].tolist() == [1.0, 0.0] # success bool → 1.0 / 0.0 + + +def test_from_episode_results_discovers_mixed_continuous_and_categorical(tmp_path): + """A numeric and a string variation become a continuous and a categorical factor (choices observed).""" + jsonl = tmp_path / "episode_results.jsonl" + _write_jsonl( + jsonl, + [ + {"success": True, "variations": {"dome.light_intensity": 250.0, "dome.hdr_image": "studio"}}, + {"success": False, "variations": {"dome.light_intensity": 750.0, "dome.hdr_image": "sunset"}}, + {"success": True, "variations": {"dome.light_intensity": 500.0, "dome.hdr_image": "studio"}}, + ], + ) + + dataset = dataset_from_episode_results(jsonl, outcome_names=["success"]) + + factors_by_name = {factor.name: factor for factor in dataset.factors} + assert 
factors_by_name["dome.light_intensity"].type == "continuous" + assert factors_by_name["dome.hdr_image"].type == "categorical" + assert factors_by_name["dome.hdr_image"].choices == ["studio", "sunset"] # sorted observed labels + # A continuous factor's range is inferred as [min, max] of the observed values. + assert factors_by_name["dome.light_intensity"].range == (250.0, 750.0) + + # Continuous-first layout; categorical integer-coded by its index into the discovered choices. + assert dataset.factor_columns == {"dome.light_intensity": slice(0, 1), "dome.hdr_image": slice(1, 2)} + assert dataset.theta[:, 0].tolist() == [250.0, 750.0, 500.0] # continuous column, in row order + assert dataset.theta[:, 1].tolist() == [0.0, 1.0, 0.0] # studio -> 0, sunset -> 1 + assert dataset.x[:, 0].tolist() == [1.0, 0.0, 1.0] # success bool → 1.0 / 0.0 + # A categorical factor selects MNPE; a continuous-only schema would select NPE. + assert SensitivityAnalyzer(dataset)._select_inference_class().__name__ == "MNPE" + + +def test_from_episode_results_drops_constant_factors(tmp_path): + """A factor that took a single value across all episodes is dropped, varying factors survive.""" + jsonl = tmp_path / "episode_results.jsonl" + _write_jsonl( + jsonl, + [ + {"success": True, "variations": {"light_intensity": 250.0, "always_5": 5.0, "hdr": "only_one"}}, + {"success": False, "variations": {"light_intensity": 750.0, "always_5": 5.0, "hdr": "only_one"}}, + ], + ) + + dataset = dataset_from_episode_results(jsonl, outcome_names=["success"]) + + # The constant continuous (always_5) and constant categorical (hdr) are dropped; only the varying one remains. 
+ assert [factor.name for factor in dataset.factors] == ["light_intensity"] + assert dataset.theta.shape == (2, 1) + + +def test_from_episode_results_warns_on_imbalanced_categorical(tmp_path, capsys): + """An unevenly sampled categorical warns, since its posterior would track the sampling frequency.""" + jsonl = tmp_path / "episode_results.jsonl" + _write_jsonl( + jsonl, + [ + {"success": True, "variations": {"hdr": "a"}}, + {"success": False, "variations": {"hdr": "a"}}, + {"success": True, "variations": {"hdr": "a"}}, + {"success": False, "variations": {"hdr": "b"}}, # a:b sampled 3:1 + ], + ) + + dataset_from_episode_results(jsonl, outcome_names=["success"]) + + assert "sampled unevenly" in capsys.readouterr().out + + +def test_from_episode_results_raises_when_all_factors_constant(tmp_path): + """If every factor took a single value there is nothing to analyze, so building raises.""" + jsonl = tmp_path / "episode_results.jsonl" + _write_jsonl( + jsonl, + [ + {"success": True, "variations": {"hdr": "only_one"}}, + {"success": False, "variations": {"hdr": "only_one"}}, + ], + ) + + with pytest.raises(AssertionError, match="constant"): + dataset_from_episode_results(jsonl, outcome_names=["success"]) + + +def test_from_episode_results_treats_bool_variation_as_categorical(tmp_path): + """A boolean variation draw becomes a categorical factor labelled by str(value).""" + jsonl = tmp_path / "episode_results.jsonl" + _write_jsonl( + jsonl, + [ + {"success": True, "variations": {"distractor_present": True}}, + {"success": False, "variations": {"distractor_present": False}}, + ], ) - jsonl = tmp_path / "episode_summary.jsonl" + + dataset = dataset_from_episode_results(jsonl, outcome_names=["success"]) + + factor = dataset.factors[0] + assert factor.type == "categorical" + assert factor.choices == ["False", "True"] # sorted str labels + assert dataset.theta[:, 0].tolist() == [1.0, 0.0] # "True" -> 1, "False" -> 0 (index in sorted choices) + + +def 
test_from_episode_results_rejects_inconsistent_factor_set(tmp_path): + """Every episode must record the same variations, so a row with a different factor set raises.""" + jsonl = tmp_path / "episode_results.jsonl" _write_jsonl( jsonl, [ - {"arena_env_args": {"light_intensity": 250.0, "pick_up_object": "cube"}, "outcomes": {"success": 1}}, - {"arena_env_args": {"light_intensity": 750.0, "pick_up_object": "can"}, "outcomes": {"success": 0}}, - {"arena_env_args": {"light_intensity": 500.0, "pick_up_object": "cube"}, "outcomes": {"success": 1}}, + {"success": True, "variations": {"light_intensity": 250.0}}, + {"success": False, "variations": {"light_intensity": 750.0, "extra": 1.0}}, # new factor mid-stream ], ) - dataset = SensitivityDataset.from_files(factors_yaml, jsonl, outcome_names=["success"]) - - # Schema parsed with the declared structure. - factors_by_name = {factor.name: factor for factor in dataset.schema.factors} - assert factors_by_name["light_intensity"].type == "continuous" - assert factors_by_name["light_intensity"].range == [(0.0, 1000.0)] - assert factors_by_name["pick_up_object"].type == "categorical" - assert factors_by_name["pick_up_object"].choices == ["cube", "can"] - - # Continuous-first theta layout; categorical integer-coded by its index into choices. 
- assert dataset.theta.shape == (3, 2) - assert dataset.x.shape == (3, 1) - assert dataset.factor_columns == {"light_intensity": slice(0, 1), "pick_up_object": slice(1, 2)} - assert dataset.theta[:, 0].tolist() == [250.0, 750.0, 500.0] - assert dataset.theta[:, 1].tolist() == [0.0, 1.0, 0.0] # cube -> 0, can -> 1 - assert dataset.x[:, 0].tolist() == [1.0, 0.0, 1.0] - - -def test_from_files_infers_missing_continuous_range(tmp_path): - """A continuous factor with no declared range gets [min, max] inferred from the observed values.""" - factors_yaml = tmp_path / "factors.yaml" - factors_yaml.write_text("factors:\n light_intensity:\n type: continuous\n", encoding="utf-8") - jsonl = tmp_path / "episode_summary.jsonl" + with pytest.raises(AssertionError, match="same variations"): + dataset_from_episode_results(jsonl, outcome_names=["success"]) + + +def test_from_episode_results_rejects_non_numeric_vector_component(tmp_path): + """A vector variation with a non-numeric component is rejected.""" + jsonl = tmp_path / "episode_results.jsonl" + _write_jsonl( + jsonl, + [{"success": True, "variations": {"pose": [0.1, "oops", 0.2]}}], + ) + + with pytest.raises(AssertionError, match="non-numeric"): + dataset_from_episode_results(jsonl, outcome_names=["success"]) + + +def test_from_episode_results_selects_factor_subset(tmp_path): + """factor_names restricts the analysis to the named variations, a vector keeps all components.""" + jsonl = tmp_path / "episode_results.jsonl" + _write_jsonl( + jsonl, + [ + {"success": True, "variations": {"light_intensity": 250.0, "hdr": "studio", "wrist": [0.1, 0.2]}}, + {"success": False, "variations": {"light_intensity": 750.0, "hdr": "sunset", "wrist": [0.3, 0.4]}}, + ], + ) + + dataset = dataset_from_episode_results(jsonl, outcome_names=["success"], factor_names=["light_intensity", "wrist"]) + + # hdr is excluded; the selected vector is still split into one factor per component. 
+ assert [factor.name for factor in dataset.factors] == ["light_intensity", "wrist[0]", "wrist[1]"] + + +def test_from_episode_results_rejects_unknown_factor_name(tmp_path): + """Requesting a factor that wasn't recorded raises with the available names listed.""" + jsonl = tmp_path / "episode_results.jsonl" _write_jsonl( jsonl, [ - {"arena_env_args": {"light_intensity": 30.0}, "outcomes": {"success": 0}}, - {"arena_env_args": {"light_intensity": 90.0}, "outcomes": {"success": 1}}, + {"success": True, "variations": {"light_intensity": 250.0}}, + {"success": False, "variations": {"light_intensity": 750.0}}, ], ) - dataset = SensitivityDataset.from_files(factors_yaml, jsonl, outcome_names=["success"]) + with pytest.raises(AssertionError, match="not found"): + dataset_from_episode_results(jsonl, outcome_names=["success"], factor_names=["nonexistent"]) + + +def test_from_episode_results_rejects_non_dict_variations(tmp_path): + """A null / non-object variations block fails clearly rather than as an AttributeError.""" + jsonl = tmp_path / "episode_results.jsonl" + _write_jsonl(jsonl, [{"success": True, "variations": None}]) + + with pytest.raises(AssertionError, match="not a JSON object"): + dataset_from_episode_results(jsonl, outcome_names=["success"]) + + +def test_from_episode_results_rejects_non_numeric_outcome(tmp_path): + """A non-numeric outcome value fails with row context, not a bare cast error.""" + jsonl = tmp_path / "episode_results.jsonl" + _write_jsonl( + jsonl, + [ + {"success": "yes", "variations": {"light_intensity": 250.0}}, + {"success": "no", "variations": {"light_intensity": 750.0}}, + ], + ) - assert dataset.schema.factors[0].range == [(30.0, 90.0)] + with pytest.raises(AssertionError, match="must be numeric or boolean"): + dataset_from_episode_results(jsonl, outcome_names=["success"])
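Reviewer note: the discovery rule these tests pin down can be sketched without torch or the package itself. The snippet below is a minimal illustration, assuming only the behaviour described in the `_flatten_variation_value` docstring (number to continuous, numeric vector to one continuous factor per component, string or bool to categorical). `flatten_draw` is a hypothetical name used here for illustration and is not part of `isaaclab_arena`. It also omits the reader's type and consistency assertions.

```python
import json
from typing import Any, Union


def flatten_draw(key: str, value: Any) -> list[tuple[str, Union[float, str]]]:
    """Hypothetical stand-in for the reader's flattening rule (illustration only)."""
    # bool is an int subclass, so test it first and keep it as a categorical label.
    if isinstance(value, bool):
        return [(key, str(value))]
    # A bare number is a single continuous factor.
    if isinstance(value, (int, float)):
        return [(key, float(value))]
    # A string is a single categorical factor.
    if isinstance(value, str):
        return [(key, value)]
    # A list / tuple contributes one continuous factor per component, named key[i].
    return [(f"{key}[{i}]", float(component)) for i, component in enumerate(value)]


# The example row from the reader's docstring: one vector and one string variation.
line = '{"success": true, "variations": {"wrist_camera": [0.01, -0.02, 0.0], "hdr_image": "sunset"}}'
row = json.loads(line)

pairs = [pair for key, value in row["variations"].items() for pair in flatten_draw(key, value)]
# A str scalar marks a categorical factor; anything else is continuous.
kinds = {name: "categorical" if isinstance(scalar, str) else "continuous" for name, scalar in pairs}
print(kinds)
```

Checking for `bool` before `int` matters because `isinstance(True, int)` is true in Python. Without that ordering, a boolean variation would be silently coded as a 0/1 continuous factor rather than a two-choice categorical one.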