Lock the cap32 result and harden the hybrid default recommendation
Persist the larger 32-track benchmark showing hybrid strongly outperforming high_energy, so the default strategy decision rests on multiple larger real-data checkpoints instead of a single subset.

Constraint: Only documentation changes are allowed because benchmark artifacts stay outside version control
Rejected: Keep the default recommendation tentative after cap32 | The 24-track and 32-track capped benchmarks now agree on hybrid superiority
Confidence: high
Scope-risk: narrow
Directive: Use cap24 and cap32 together as the current strongest strategy evidence until a broader multi-style benchmark supersedes them
Tested: Verified /tmp/ab_smoke_seg_cap32_top2/report.json; verified high_energy eval.json; verified docs now record hybrid=20/0.95/1.0 and high_energy=20/0.5/1.0
Not-tested: Wider style-balanced benchmark beyond the FMA top-two subsets
Showing 3 changed files with 43 additions and 7 deletions