Commit 55974514 559745141e5fd163cc92c574b509c0af32f38f9d by cnb.bofCdSsphPA

Record the proven offline smoke so the handoff reflects executable evidence

Constraint: Limit this checkpoint to documentation updates backed by already-collected local evidence
Rejected: Leave the smoke result only in transient chat output | The next session needs the proof captured in repo-native handoff files
Confidence: high
Scope-risk: narrow
Directive: Keep treating the offline smoke as an integration proof, not as a substitute for real business-data validation
Tested: Rechecked 183 relative links and documented the successful offline smoke summary already verified locally
Not-tested: No new code path executed in this documentation-only checkpoint
1 parent 7eff944b
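## 2026-06-02 Checkpoint: business-export offline smoke run passed
Completed:
- Ran `acr-engine/scripts/business_export_offline_smoke.py` for real.
- Confirmed that the full chain runs end to end: business export sample -> manifest-ready JSONL -> project manifest -> `train.py --dry-run`.
Verification results: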
- `input_rows=5`
- `output_rows=5`
- roles=`reference/query/excluded`
- buckets=`lossless_reference_core/short_video_hook/demo_variation_pool`
- `catalog_refs=2`
- `train_queries=1`
- `test_queries=1`
- `val_queries=0`
- `dry_run_passed=true`
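The fields above lend themselves to a mechanical check. The following is a minimal sketch, assuming the smoke summary is available as a flat dict with exactly these keys. The `check_smoke_summary` helper, the dict layout, and the rule that row counts must match are all hypothetical; none of them is taken from `business_export_offline_smoke.py`.

```python
def check_smoke_summary(summary: dict) -> list[str]:
    """Return a list of problems; an empty list means the summary looks healthy.

    The rules below are illustrative assumptions, not the script's own checks.
    """
    problems = []
    # Assumption: normalization should neither drop nor duplicate rows.
    if summary["input_rows"] != summary["output_rows"]:
        problems.append("row count changed during normalization")
    # Assumption: a usable manifest needs at least one catalog reference.
    if summary["catalog_refs"] < 1:
        problems.append("no reference entries in the catalog")
    total_queries = (
        summary["train_queries"] + summary["test_queries"] + summary["val_queries"]
    )
    if total_queries < 1:
        problems.append("no query entries in any split")
    if not summary["dry_run_passed"]:
        problems.append("train.py --dry-run did not pass")
    return problems


# Values copied from the verification results recorded in this checkpoint.
summary = {
    "input_rows": 5,
    "output_rows": 5,
    "catalog_refs": 2,
    "train_queries": 1,
    "test_queries": 1,
    "val_queries": 0,
    "dry_run_passed": True,
}

problems = check_smoke_summary(summary)
print(problems if problems else "summary OK")
```

A check of this kind could be rerun unchanged once real business export data replaces the sample, which keeps the comparison between sessions explicit.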
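Conclusions:
- The business-export offline adaptation chain now has real, runnable evidence behind it, rather than being only a collection of templates and scripts.
- The next session can swap in real business export data directly and continue along the same chain.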
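To illustrate the middle of the chain described above (export rows -> manifest-ready JSONL -> project manifest), here is a minimal sketch. Every field name (`audio_path`, `role`, `bucket`, `split`), both helper functions, and the sample rows are hypothetical; the real script and manifest schema are not shown in this checkpoint and may differ.

```python
import json

# Role names as listed in the verification results.
ALLOWED_ROLES = {"reference", "query", "excluded"}


def to_manifest_jsonl(rows):
    """Serialize export rows as one JSON object per line, rejecting unknown roles."""
    lines = []
    for row in rows:
        if row["role"] not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {row['role']!r}")
        lines.append(json.dumps(row, ensure_ascii=False, sort_keys=True))
    return "\n".join(lines)


def build_project_manifest(jsonl_text):
    """Group JSONL records into a catalog plus per-split query lists.

    Rows with role "excluded" are dropped (an assumption of this sketch).
    """
    manifest = {"catalog": [], "train": [], "val": [], "test": []}
    for line in jsonl_text.splitlines():
        record = json.loads(line)
        if record["role"] == "reference":
            manifest["catalog"].append(record)
        elif record["role"] == "query":
            manifest[record["split"]].append(record)
    return manifest


# Placeholder rows, invented for illustration only.
rows = [
    {"audio_path": "a.wav", "role": "reference", "bucket": "lossless_reference_core"},
    {"audio_path": "b.wav", "role": "reference", "bucket": "lossless_reference_core"},
    {"audio_path": "c.wav", "role": "query", "bucket": "short_video_hook", "split": "train"},
    {"audio_path": "d.wav", "role": "query", "bucket": "demo_variation_pool", "split": "test"},
    {"audio_path": "e.wav", "role": "excluded", "bucket": "demo_variation_pool"},
]

manifest = build_project_manifest(to_manifest_jsonl(rows))
counts = {name: len(items) for name, items in manifest.items()}
print(counts)
```

The placeholder rows are chosen so that the resulting counts line up with the recorded `catalog_refs`, `train_queries`, `test_queries`, and `val_queries`; how the real sample distributes its five rows is not stated here.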
## 2026-06-02 Checkpoint: project manifest adapter script delivered
Completed:
......
......@@ -99,3 +99,7 @@ cd /workspace/acr-engine
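2. Keep adding cap64 multi-seed runs instead of retaining only a single seed.
3. Keep reducing `hybrid` variance under the bucket baseline instead of locking in a global default strategy too early.
4. Keep the per-stage rhythm of "doc update -> changelog -> commit -> push".
- Added `acr-engine/scripts/business_export_offline_smoke.py` and obtained fresh end-to-end offline smoke evidence.
- Confirmed the chain: business export sample -> normalization -> project manifest -> `train.py --dry-run`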
......
......@@ -74,3 +74,16 @@ test -f /tmp/ab_smoke_seg_cap48_top2_seed999/report.json && cat /tmp/ab_smoke_se
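- `hybrid`: `mean_top1=1.0, mean_num_queries=4.0`
- `high_energy`: `mean_top1=1.0, mean_num_queries=3.5`
- This means the bucket baseline can already serve as the minimal engineering foundation for later work on "explaining why winners diverge across subsets".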
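Figures such as `mean_top1` and `mean_num_queries` can be produced by averaging per-seed results for each strategy. A minimal sketch follows, assuming each seed yields a record with `strategy`, `top1`, and `num_queries` fields. That layout, the `aggregate_by_strategy` helper, and the sample records are hypothetical and do not reproduce the contents of the actual `report.json` files.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_strategy(reports):
    """Average top-1 accuracy and query count per strategy across seeds."""
    grouped = defaultdict(list)
    for report in reports:
        grouped[report["strategy"]].append(report)
    return {
        strategy: {
            "mean_top1": mean(r["top1"] for r in runs),
            "mean_num_queries": mean(r["num_queries"] for r in runs),
        }
        for strategy, runs in grouped.items()
    }


# Placeholder per-seed records, invented for illustration only.
reports = [
    {"strategy": "hybrid", "seed": 1, "top1": 1.0, "num_queries": 4},
    {"strategy": "hybrid", "seed": 2, "top1": 1.0, "num_queries": 4},
    {"strategy": "high_energy", "seed": 1, "top1": 1.0, "num_queries": 3},
    {"strategy": "high_energy", "seed": 2, "top1": 1.0, "num_queries": 4},
]

baseline = aggregate_by_strategy(reports)
for strategy, stats in baseline.items():
    print(strategy, stats)
```

Reporting `mean_num_queries` next to `mean_top1` makes it visible when two strategies were scored on different numbers of queries, which matters when comparing winners across subsets.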
## Latest real-run evidence
- New script: `acr-engine/scripts/business_export_offline_smoke.py`
- Ran successfully on real, readable local audio:
- business export sample -> normalization -> project manifest -> `train.py --dry-run`
- Key results:
- `catalog_refs=2`
- `train_queries=1`
- `test_queries=1`
- `val_queries=0`
- `dry_run_passed=true`
......