17 commits
a35e16e
feat(cli): add `stats` subcommand — per-model React Doctor leaderboar…
aidenybai Jun 22, 2026
7b0e7b1
fix(cli): keep stats spinner responsive during session discovery
aidenybai Jun 22, 2026
fe9f110
feat(cli): show only the top 5 models in the stats leaderboard
aidenybai Jun 22, 2026
8769924
refactor(stats): deduplicate transcript coercion helpers
aidenybai Jun 22, 2026
755b8aa
test(stats): guard cursor adapter test behind node:sqlite availability
aidenybai Jun 22, 2026
721470d
fix(stats): make cursor DB + reconstruct tests pass on Windows
aidenybai Jun 22, 2026
f15f640
fix(stats): correct reconstruction fidelity and skip bucketing (Bugbot)
aidenybai Jun 22, 2026
0d13c86
refactor(stats): stream JSONL transcripts via node:readline
aidenybai Jun 22, 2026
dad2a5c
fix(stats): address review feedback (score correctness, JSON errors, …
aidenybai Jun 22, 2026
9a20e3d
fix(stats): drop unfaithful StrReplace edits instead of linting stale…
aidenybai Jun 22, 2026
f6b2f03
fix(stats): weight scores by productive sessions, not dead ones
aidenybai Jun 22, 2026
3f50df7
chore(stats): bump changeset to patch
aidenybai Jun 22, 2026
509f229
fix(ci): publish deslop-js to pkg.pr.new so previews install
aidenybai Jun 22, 2026
f26f960
feat(stats): scan every Cursor store (Nightly GUI + CLI agent) and de…
rayhanadev Jun 23, 2026
ac04d51
feat(stats): trace stats runs in Sentry (cli.stats + per-model leader…
rayhanadev Jun 23, 2026
db52fc6
feat(stats): report leaderboard rows to /api/stats + render the commu…
rayhanadev Jun 23, 2026
83a9210
refactor(stats): send /api/stats payload as plain JSON, not gzip
rayhanadev Jun 23, 2026
14 changes: 14 additions & 0 deletions .changeset/stats-agent-leaderboard.md
@@ -0,0 +1,14 @@
---
"react-doctor": minor
---

Add a `react-doctor stats` subcommand — a per-model code-quality leaderboard built from local AI agent chat history.

`stats` reads local agent history: Claude Code (`~/.claude`) and Codex (`~/.codex`) transcripts, plus the Cursor composer database. It reconstructs the file content each model actually wrote (Claude post-edit snapshots, Cursor full post-edit file snapshots, Codex `apply_patch` envelopes), lints that content with the existing engine, and ranks models and providers by their React Doctor score and diagnostics-per-file. The goal is to answer "which agent/model writes the cleanest React code in my repo?"

- Only the React code each model wrote is scored. Reconstructed files are filtered to actual React (JSX/TSX, `use client`/`use server` directives, or a React-ecosystem import) before linting, so a model's plain backend/util/config files don't pad its file count or dilute its diagnostics-per-file. A scan that errors, is skipped, or whose lint phase fails is dropped rather than counted as zero-diagnostic "clean" code, so un-lintable output can't inflate a model's score.
- Ranking is by a confidence-weighted score, not the raw score: each group's score is regressed toward the global mean by its evidence, so a model with a handful of clean files can't top the board on a tiny sample. Files are the dominant signal; sessions only lightly discount the file weight (many files from one session are one correlated sample) and never below a floor.
- Cursor attribution reads the canonical composer database (`state.vscdb`) directly, so each session carries its real model (e.g. `claude-opus-4-8`, `gpt-5.5`, `composer-2`) and an exact post-edit snapshot of every edited file — the model-less agent-transcript JSONL files are no longer used. Attribution falls back to `unknown` only for chats left on the "Auto" model.
- Default scope is the current repository (sessions whose cwd or edits touch the repo root); `--global` ranks across every repo on the machine. `--since`, `--limit`, and `--provider` bound the work.
- `--json` emits a structured leaderboard (`{ schemaVersion, scope, models, providers, best, worst, … }`); the terminal output shows the top models and per-tool tables with a single score bar (the confidence-weighted score) and a best/worst callout.
- Coverage is honest about its limits: Codex shell-based edits are not faithfully reconstructable (surfaced as skipped), the Cursor composer database requires `node:sqlite` (Node 22.13+) and covers GUI agent sessions (not cursor-agent CLI runs), and the score requires network access.
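
The confidence-weighted ranking described in the bullets above can be pictured as a shrinkage estimate. The sketch below is illustrative only: `shrinkScore`, `evidenceWeight`, `PRIOR_WEIGHT`, `SESSION_DISCOUNT_FLOOR`, and the exact weighting formula are assumptions made for this example, not the implementation in this PR.

```typescript
// Hypothetical sketch of a confidence-weighted score. Every name and constant
// here is an assumption for illustration, not code from this PR.
interface GroupEvidence {
  score: number; // raw 0-100 score for the group
  files: number; // React files scored for the group
  sessions: number; // sessions those files came from
}

// Assumed: how many "pseudo-files" of evidence stand behind the global mean.
const PRIOR_WEIGHT = 20;
// Assumed: the session discount never cuts the file weight below this factor.
const SESSION_DISCOUNT_FLOOR = 0.5;

const evidenceWeight = ({ files, sessions }: GroupEvidence): number => {
  if (files <= 0 || sessions <= 0) return 0;
  // Many files from one session are correlated, so discount them lightly.
  const sessionsPerFile = Math.min(1, sessions / files);
  const discount = Math.max(SESSION_DISCOUNT_FLOOR, Math.sqrt(sessionsPerFile));
  return files * discount;
};

// Regress the raw score toward the global mean in proportion to the evidence.
const shrinkScore = (group: GroupEvidence, globalMean: number): number => {
  const weight = evidenceWeight(group);
  return (weight * group.score + PRIOR_WEIGHT * globalMean) / (weight + PRIOR_WEIGHT);
};

const globalMean = 70;
const tinySample = shrinkScore({ score: 100, files: 2, sessions: 1 }, globalMean);
const largeSample = shrinkScore({ score: 90, files: 200, sessions: 40 }, globalMean);

console.log(tinySample.toFixed(1), largeSample.toFixed(1));
```

With these assumed constants, a perfect raw score on two files stays close to the global mean, while a slightly lower raw score backed by 200 files keeps most of its value and ranks higher.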
10 changes: 10 additions & 0 deletions packages/core/src/highlighter.ts
@@ -1,12 +1,21 @@
import pc from "picocolors";

// picocolors only ships the 16-color palette, so orange (Claude's brand) is a
// 256-color escape built by hand. Honors color-disabled by returning the input.
const ORANGE_ANSI_CODE = 208;
const makeOrange =
(enabled: boolean): ((input: string | number) => string) =>
(input) =>
enabled ? `\u001b[38;5;${ORANGE_ANSI_CODE}m${input}\u001b[39m` : String(input);

export const highlighter = {
error: pc.red,
warn: pc.yellow,
info: pc.cyan,
success: pc.green,
dim: pc.dim,
gray: pc.gray,
orange: makeOrange(pc.isColorSupported),
bold: pc.bold,
};

@@ -27,5 +36,6 @@ export const setColorEnabled = (enabled: boolean): void => {
highlighter.success = colors.green;
highlighter.dim = colors.dim;
highlighter.gray = colors.gray;
highlighter.orange = makeOrange(enabled);
highlighter.bold = colors.bold;
};
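
To make the hand-built escape concrete, here is a small standalone sketch that mirrors the `makeOrange` helper from the diff and prints the raw string it produces. It assumes a terminal that understands the 256-color SGR sequence (`38;5;<n>` to set the foreground, `39` to reset it); nothing here depends on `picocolors`.

```typescript
// Standalone sketch mirroring the diff's helper, for illustration only.
const ORANGE_ANSI_CODE = 208;

const makeOrange =
  (enabled: boolean) =>
  (input: string | number): string =>
    // `38;5;208` selects palette entry 208; `39` restores the default foreground.
    enabled ? `\u001b[38;5;${ORANGE_ANSI_CODE}m${input}\u001b[39m` : String(input);

const colored = makeOrange(true)("Claude");
const plain = makeOrange(false)(42);

// JSON.stringify makes the otherwise invisible escape bytes readable.
console.log(JSON.stringify(colored)); // "\u001b[38;5;208mClaude\u001b[39m"
console.log(JSON.stringify(plain)); // "42"
```

Resetting with `39` rather than `0` only restores the foreground color, so an enclosing `bold` or `dim` style stays in effect.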
135 changes: 135 additions & 0 deletions packages/react-doctor/src/cli/commands/stats.ts
@@ -0,0 +1,135 @@
import * as path from "node:path";
import { resolveScanTarget, type ReactDoctorConfig } from "@react-doctor/core";
import { aggregateStats } from "../../stats/aggregate-stats.js";
import { STATS_DEFAULT_SESSION_LIMIT } from "../../stats/constants.js";
import { discoverSessions } from "../../stats/discover-sessions.js";
import { renderStatsReport } from "../../stats/render-stats.js";
import { runStatsScan } from "../../stats/run-stats-scan.js";
import type { StatsProvider, StatsReport, StatsScopeOptions } from "../../stats/types.js";
import { METRIC } from "../utils/constants.js";
import { enableJsonMode } from "../utils/json-mode.js";
import { recordCount } from "../utils/record-metric.js";
import { spinner } from "../utils/spinner.js";

export interface StatsFlags {
global?: boolean;
since?: string;
limit?: string;
provider?: string;
json?: boolean;
cwd?: string;
}

const VALID_PROVIDERS = new Set<string>(["claude", "codex", "cursor"]);

const isStatsProvider = (value: string): value is StatsProvider => VALID_PROVIDERS.has(value);

const parseProvider = (value: string | undefined): StatsProvider | undefined => {
if (value === undefined) return undefined;
if (!isStatsProvider(value)) {
throw new Error(`Unknown provider "${value}". Expected one of: claude, codex, cursor.`);
}
return value;
};

const parseSince = (value: string | undefined): Date | undefined => {
if (value === undefined) return undefined;
const parsed = new Date(value);
if (Number.isNaN(parsed.getTime())) {
throw new Error(`Invalid --since date "${value}". Use e.g. 2026-06-01.`);
}
return parsed;
};

const parseLimit = (value: string | undefined): number => {
if (value === undefined) return STATS_DEFAULT_SESSION_LIMIT;
const parsed = Number.parseInt(value, 10);
if (!Number.isFinite(parsed) || parsed <= 0) {
throw new Error(`Invalid --limit "${value}". Use a positive integer, e.g. 200.`);
}
return parsed;
};

const resolveTarget = async (
directory: string,
): Promise<{ root: string; userConfig: ReactDoctorConfig | null }> => {
try {
const target = await resolveScanTarget(directory);
return { root: target.resolvedDirectory, userConfig: target.userConfig };
} catch {
return { root: path.resolve(directory), userConfig: null };
}
};

export const statsAction = async (flags: StatsFlags): Promise<void> => {
const directory = flags.cwd ?? process.cwd();
// Register JSON mode up front so any throw (flag parsing, scan, or score API
// failure) is emitted as a structured JSON error by the top-level handler
// instead of plain text — and so incidental logs (e.g. a score-API warning)
// never corrupt the report on stdout.
if (flags.json) enableJsonMode({ compact: false, directory });
const scope: StatsScopeOptions = {
global: flags.global ?? false,
since: parseSince(flags.since),
limit: parseLimit(flags.limit),
provider: parseProvider(flags.provider),
};

const { root, userConfig } = await resolveTarget(directory);

// ora renders to stderr; suppress it in JSON mode so the run stays quiet.
const progress = flags.json ? null : spinner("Looking through your agent history…").start();
let report: StatsReport;
let providerCount: number;
try {
const sessions = await discoverSessions(root, scope, (foundCount) =>
progress?.update(`Looking through your agent history… (${foundCount} found)`),
);
progress?.update("Checking the code each agent wrote…");
const results = await runStatsScan(sessions, scope.global ? null : root, {
onProgress: (completedCount, totalCount) =>
progress?.update(`Checking the code each agent wrote… (${completedCount}/${totalCount})`),
});
progress?.update("Scoring…");
const aggregated = await aggregateStats(results, userConfig);
providerCount = aggregated.providers.length;

report = {
scope: scope.global ? "global" : "repo",
directory: root,
models: aggregated.models,
providers: aggregated.providers,
best: aggregated.best,
worst: aggregated.worst,
sessionsAnalyzed: results.length,
sessionsRanked: results.filter((result) => result.filesScanned > 0).length,
sessionsNonReact: results.filter(
(result) => result.filesScanned === 0 && result.reconstructedFiles > 0,
).length,
sessionsUnreconstructable: results.filter(
(result) =>
result.filesScanned === 0 &&
result.reconstructedFiles === 0 &&
result.unreconstructable > 0,
).length,
generatedAt: new Date().toISOString(),
};
progress?.succeed("Done.");
} finally {
progress?.stop();
}

recordCount(METRIC.statsRun, 1, {
scope: report.scope,
sessions: report.sessionsAnalyzed,
providers: providerCount,
provider: scope.provider ?? "all",
});

if (flags.json) {
process.stdout.write(`${JSON.stringify({ schemaVersion: 1, ...report }, null, 2)}\n`);
return;
}

process.stdout.write(`${renderStatsReport(report)}\n`);
};
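
The three session counters in the report (`sessionsRanked`, `sessionsNonReact`, `sessionsUnreconstructable`) partition results by mutually exclusive conditions. The sketch below restates those predicates as a single classifier. The `SessionResult` shape is inferred from the fields this revision of the diff reads, and `bucketOf`, `countBuckets`, and the `"empty"` bucket are hypothetical names introduced for illustration.

```typescript
// Sketch only: field names come from the diff, everything else is assumed.
interface SessionResult {
  filesScanned: number; // React files that were actually linted
  reconstructedFiles: number; // files rebuilt from the transcript
  unreconstructable: number; // edits that could not be rebuilt faithfully
}

type Bucket = "ranked" | "nonReact" | "unreconstructable" | "empty";

const bucketOf = (result: SessionResult): Bucket => {
  if (result.filesScanned > 0) return "ranked";
  // Files were rebuilt but none survived the React filter.
  if (result.reconstructedFiles > 0) return "nonReact";
  // Nothing was rebuilt, and at least one edit was unfaithful.
  if (result.unreconstructable > 0) return "unreconstructable";
  return "empty";
};

const countBuckets = (results: SessionResult[]): Record<Bucket, number> => {
  const counts: Record<Bucket, number> = { ranked: 0, nonReact: 0, unreconstructable: 0, empty: 0 };
  for (const result of results) counts[bucketOf(result)] += 1;
  return counts;
};

const counts = countBuckets([
  { filesScanned: 3, reconstructedFiles: 5, unreconstructable: 1 },
  { filesScanned: 0, reconstructedFiles: 2, unreconstructable: 0 },
  { filesScanned: 0, reconstructedFiles: 0, unreconstructable: 4 },
  { filesScanned: 0, reconstructedFiles: 0, unreconstructable: 0 },
]);

console.log(counts);
```

Because the checks run in order, each session lands in exactly one bucket, which matches the `filesScanned === 0 && …` guards in the filters above.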
52 changes: 50 additions & 2 deletions packages/react-doctor/src/cli/index.ts
@@ -13,6 +13,7 @@ import {
rulesSetAction,
rulesUnignoreTagAction,
} from "./commands/rules.js";
import { statsAction } from "./commands/stats.js";
import { versionAction } from "./commands/version.js";
import { whyAction } from "./commands/why.js";
import { applyColorPreference } from "./utils/apply-color-preference.js";
@@ -80,8 +81,12 @@ ${formatExampleLines([
])}

${highlighter.dim("Configuration:")}
-Add a ${highlighter.info("doctor.config.ts")} (or .js/.mjs/.json — or a ${highlighter.info('"reactDoctor"')} key in your package.json) in the project root.
-Use ${highlighter.info("react-doctor rules")} to list, explain, and configure rules. CLI flags always override config values.
+Add a ${highlighter.info("doctor.config.ts")} (or .js/.mjs/.json — or a ${highlighter.info(
+'"reactDoctor"',
+)} key in your package.json) in the project root.
+Use ${highlighter.info(
+"react-doctor rules",
+)} to list, explain, and configure rules. CLI flags always override config values.

${highlighter.dim("Feedback & bug reports:")}
${highlighter.info(`${CANONICAL_GITHUB_URL}/issues`)}
@@ -103,6 +108,31 @@ ${highlighter.dim("Learn more:")}
${highlighter.info(CANONICAL_GITHUB_URL)}
`;

const renderStatsHelpEpilog = (): string => `
${highlighter.dim("Examples:")}
${formatExampleLines([
["react-doctor stats", "rank agents on sessions that touched this repo"],
["react-doctor stats --global", "rank across every repository on this machine"],
["react-doctor stats --provider claude", "only Claude Code sessions"],
["react-doctor stats --since 2026-06-01", "only recent sessions"],
["react-doctor stats --json", "machine-readable leaderboard"],
])}

${highlighter.dim("How it works:")}
Reads local agent history (Claude Code + Codex transcripts, the Cursor
composer database), reconstructs the code each model wrote, lints it, and
ranks models + providers by score.

${highlighter.dim("Caveats:")}
Codex shell-based edits aren't reconstructable (partial coverage). Cursor uses
the GUI composer database (cursor-agent CLI transcripts are not included), and
attribution falls back to "unknown" only for chats left on "Auto". The score
requires network access.

${highlighter.dim("Learn more:")}
${highlighter.info(CANONICAL_GITHUB_URL)}
`;

const collectCategoryOption = (value: string, previousValues: string[] | undefined): string[] => [
...(previousValues ?? []),
value,
@@ -227,6 +257,24 @@ program
.option("--no-color", "disable colored output (also honors NO_COLOR)")
.action(versionAction);

program
.command("stats")
.description("Rank agents/models by the React Doctor health of the code they wrote")
.option("--global", "include sessions from every repository (default: this repo only)")
.option("--since <date>", "only sessions modified on or after this date (e.g. 2026-06-01)")
.option("--limit <n>", "max sessions to analyze, newest first (default: 200)")
.option("--provider <name>", "only one source: claude, codex, or cursor")
.option("--json", "output a structured JSON leaderboard")
.option("-c, --cwd <cwd>", "working directory", process.cwd())
.option("--color", "force colored output")
.option("--no-color", "disable colored output (also honors NO_COLOR)")
.addHelpText("after", renderStatsHelpEpilog)
// stats redeclares --json/--cwd/--color, but the root program also exposes
// them as globals (e.g. --json for the default inspect command). Merge via
// optsWithGlobals() so a flag works whether it lands before or after the
// subcommand.
.action((_options, command) => statsAction(command.optsWithGlobals()));

const rules = program
.command("rules")
.description("List, explain, and configure which React Doctor rules run");
3 changes: 3 additions & 0 deletions packages/react-doctor/src/cli/utils/constants.ts
@@ -181,6 +181,9 @@ export const METRIC = {
installDependency: "install.dependency",
rulesChanged: "rules.changed",
rulesQueried: "rules.queried",
// `react-doctor stats`: one counter per run (adoption), with the providers
// discovered and the number of agent sessions scored as attributes.
statsRun: "stats.run",
// Editor language server (`react-doctor experimental-lsp`). Each workspace
// scan burst is one wide-event span (op `lsp.scan`) plus these metrics.
lspSessionStarted: "lsp.session.started",
10 changes: 10 additions & 0 deletions packages/react-doctor/src/cli/utils/strip-unknown-cli-flags.ts
@@ -99,12 +99,22 @@ const WHY_FLAG_SPEC: CliFlagSpec = {
shortOptionsWithRequiredValues: new Set(["-c"]),
};

// `stats` takes no positionals — just the scope/output options.
const STATS_FLAG_SPEC: CliFlagSpec = {
longOptionsWithoutValues: new Set(["--color", "--global", "--help", "--json", "--no-color"]),
longOptionsWithRequiredValues: new Set(["--cwd", "--limit", "--provider", "--since"]),
longOptionsWithOptionalValues: new Set(),
shortOptionsWithoutValues: new Set(["-h"]),
shortOptionsWithRequiredValues: new Set(["-c"]),
};

const COMMAND_FLAG_SPECS = new Map<string, CliFlagSpec>([
["install", INSTALL_FLAG_SPEC],
["setup", INSTALL_FLAG_SPEC],
["version", VERSION_FLAG_SPEC],
["rules", RULES_FLAG_SPEC],
["why", WHY_FLAG_SPEC],
["stats", STATS_FLAG_SPEC],

Stats strips trailing no-score

Medium Severity

For `react-doctor stats --no-score` (or `--no-telemetry`), the pre-Commander flag stripper drops those globals when they appear after `stats`, so `statsAction` still treats telemetry as on and calls the score and `/api/stats` endpoints even though Sentry already opted out via raw `process.argv`.


Reviewed by Cursor Bugbot for commit db52fc6.

]);

const isFlagLike = (argument: string): boolean => argument.startsWith("-") && argument !== "-";
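
The Bugbot finding above is easier to follow with a toy model of a spec-driven stripper. The sketch below is hypothetical: the real `strip-unknown-cli-flags.ts` internals are not shown in this diff, so `stripUnknownFlags`, the reduced `FlagSpec` shape, and the pass-through of non-flag tokens are all assumptions. It only illustrates how a flag missing from the `stats` spec, such as `--no-score`, could disappear before Commander sees it.

```typescript
// Hypothetical sketch; not the actual stripper from this repository.
interface FlagSpec {
  longOptionsWithoutValues: Set<string>;
  longOptionsWithRequiredValues: Set<string>;
}

// Same long options as STATS_FLAG_SPEC in the diff (short options omitted).
const STATS_SPEC: FlagSpec = {
  longOptionsWithoutValues: new Set(["--color", "--global", "--help", "--json", "--no-color"]),
  longOptionsWithRequiredValues: new Set(["--cwd", "--limit", "--provider", "--since"]),
};

const stripUnknownFlags = (args: string[], spec: FlagSpec): string[] => {
  const kept: string[] = [];
  let expectValue = false;
  for (const argument of args) {
    if (expectValue) {
      // Token following `--limit` etc. is that option's value.
      kept.push(argument);
      expectValue = false;
      continue;
    }
    const isFlag = argument.startsWith("-") && argument !== "-";
    if (!isFlag) {
      kept.push(argument);
      continue;
    }
    const equalsIndex = argument.indexOf("=");
    const name = equalsIndex === -1 ? argument : argument.slice(0, equalsIndex);
    if (spec.longOptionsWithoutValues.has(name)) {
      kept.push(argument);
    } else if (spec.longOptionsWithRequiredValues.has(name)) {
      kept.push(argument);
      expectValue = equalsIndex === -1;
    }
    // Any other flag (for example `--no-score`) is silently dropped here.
  }
  return kept;
};

const result = stripUnknownFlags(["stats", "--limit", "50", "--no-score", "--json"], STATS_SPEC);
console.log(result);
```

Under these assumptions `--no-score` never reaches the action, which is the behavior the review comment describes. Whether the fix belongs in the spec or in how globals are read is not settled by this diff.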