Sensitivity: factor-importance and corner/joint report plots#826
Sensitivity: factor-importance and corner/joint report plots#826cvolkcvolk wants to merge 1 commit into
Conversation
…e static report - marginals.py: shared marginal-density/probability estimators plus factor importance (total-variation distance from the uniform prior, normalized to [0,1]) - plotting.py: plot_importance (tornado ranking) and plot_corner with type-aware pairwise joints (continuous heatmaps, continuous-categorical violins, categorical heatmaps) - generate_report: render marginals + importance + corner, selectable via --plots - synthetic demo emits all three plots Signed-off-by: Clemens Volk <cvolk@nvidia.com>
There was a problem hiding this comment.
🤖 Isaac Lab-Arena Review Bot
Summary
This factors the marginal-density / categorical-probability estimators out of plotting.py into a pure-numpy marginals.py, adds a factor-importance tornado and a type-aware corner plot, and lets generate_report / the synthetic demo render a selectable set of plots. The split is a clean dedup, the importance math (TV-from-prior, normalized to [0,1]) checks out, and the layering stays analysis-only — no sim or env coupling. A couple of small cleanups below.
Heads-up (not a blocker): generate_report changes its return type from Path to dict[str, Path]. That's fine for alpha analysis tooling, but worth noting for any caller that used the returned path.
Findings
🟡 Warning: marginals.py:107 — the new importance scoring is pure, deterministic, CPU-only logic with subtle cross-type normalization, but has no unit test. See the inline comment.
🔵 Improvement: tests/sensitivity_synthetic.py:195 — the plot-dispatch + output-naming convention is duplicated from generate_report. See inline.
🔵 Improvement: marginals.py:9 — docstring references an "interactive app" that isn't in the repo. See inline.
Test Coverage
test_sensitivity_analysis.py is a solid end-to-end suite, but the new factor_importance / factor_importances path is untested — and it's the easiest thing here to unit-test (pure numpy, no sim, no fit needed if you hand it synthetic samples). A small test asserting a collapsed posterior scores ≈1 and a uniform-looking one scores ≈0, for both a continuous and a categorical factor, would lock in the [0,1] contract the ranking relies on. The plotting functions themselves are reasonably left untested (they only render).
Verdict
Minor fixes needed
| return total_variation / ceiling if ceiling > 0 else 0.0 | ||
|
|
||
|
|
||
| def factor_importances(samples: torch.Tensor, dataset: SensitivityDataset) -> list[tuple[str, float]]: |
There was a problem hiding this comment.
Could we add a unit test for the importance scoring? It's the one new piece here that's pure, deterministic, and CPU-only, yet the normalization is non-obvious: continuous uses raw TV with a point-mass shortcut to 1.0, while categorical rescales by the 1 - 1/k ceiling so both types share the [0,1] scale. test_sensitivity_analysis.py is the natural home — feed factor_importance synthetic samples directly (no fit needed) and assert a collapsed posterior scores ≈1 and a near-uniform one ≈0, for one continuous and one categorical factor. That pins the cross-type comparability the tornado ranking depends on.
|
|
||
| # Render all three report plots from the one posterior sample, each to a name-suffixed file. | ||
| output = Path(args.output) | ||
| for name, render in (("marginals", plot_marginals), ("importance", plot_importance), ("corner", plot_corner)): |
There was a problem hiding this comment.
This repeats the renderer mapping and the "marginals keeps the stem, others get a _<name> sibling" naming rule from generate_report (_PLOT_RENDERERS + the suffix loop). If the suffix convention changes in one place the other diverges silently. Could the render-named-plots-to-suffixed-siblings step live as one shared helper in plotting.py (taking samples, dataset, observation, output_path, plots) that both generate_report and this demo call?
| """Pure (numpy-only) estimators over posterior samples: marginal densities and factor importance. | ||
|
|
||
| These functions take already-drawn posterior samples and return summary arrays. They run no | ||
| inference and do no plotting, so the static report and the interactive app share one source of |
There was a problem hiding this comment.
The docstring justifies the split by sharing "one source of truth" with "the interactive app", but I don't see such an app in the repo. The dedup with plotting.py already justifies the module on its own — could we drop the interactive-app clause (or scope it to the static report) so the rationale matches what's actually here?
Summary
Importance + corner plots in the report
Detailed description