Promote bucket benchmarking from a plan to a runnable baseline
Constraint: The cap48/cap64 reversal means strategy guidance can no longer rely on a single overall subset result
Rejected: Keep bucket benchmarking as a doc-only next step | The repo now needs an executable baseline so later sessions can measure scale/style divergence directly
Confidence: high
Scope-risk: moderate
Directive: Treat ab_smoke_bucketed.py as the canonical seed for style-aware evaluation, and expand bucket definitions before revisiting global default-strategy claims
Tested: Verified acr-engine/scripts/ab_smoke_bucketed.py passes py_compile; verified first bucket prefix_000_a produced bucket_report.json with hybrid 4/1.0/1.0 and high_energy 3/1.0/1.0; verified second bucket execution is in progress
Not-tested: Full multi-bucket report.json completion, richer bucket definitions, and bucket-level aggregate conclusions
Showing 6 changed files with 259 additions and 2 deletions
acr-engine/scripts/ab_smoke_bucketed.py
0 → 100755