Commit 55974514 559745141e5fd163cc92c574b509c0af32f38f9d by cnb.bofCdSsphPA

Record the proven offline smoke so the handoff reflects executable evidence

Constraint: Limit this checkpoint to documentation updates backed by already-collected local evidence
Rejected: Leave the smoke result only in transient chat output | The next session needs the proof captured in repo-native handoff files
Confidence: high
Scope-risk: narrow
Directive: Keep treating the offline smoke as an integration proof, not as a substitute for real business-data validation
Tested: Rechecked 183 relative links and documented the successful offline smoke summary already verified locally
Not-tested: No new code path executed in this documentation-only checkpoint
1 parent 7eff944b
## 2026-06-02 Checkpoint: business-export offline smoke run passed

Completed:
- Actually ran `acr-engine/scripts/business_export_offline_smoke.py`.
- Confirmed that the full chain runs end to end: business export sample -> manifest-ready JSONL -> project manifest -> `train.py --dry-run`.
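
The chain above can be sketched in miniature. This is only an illustration under stated assumptions, not the contents of `business_export_offline_smoke.py`: the record fields (`id`, `role`, `split`) and the helpers `write_jsonl`, `read_jsonl`, and `summarize_manifest` are hypothetical, and the rows are made up.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line (the manifest-ready JSONL step)."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the JSONL file back, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def summarize_manifest(records):
    """Count references and per-split queries (hypothetical field names)."""
    roles = Counter(r["role"] for r in records)
    splits = Counter(r.get("split") for r in records if r["role"] == "query")
    return {
        "output_rows": len(records),
        "catalog_refs": roles["reference"],
        "train_queries": splits["train"],
        "test_queries": splits["test"],
        "val_queries": splits["val"],
    }


# Illustrative rows only; these are not the real sample data.
rows = [
    {"id": "a", "role": "reference"},
    {"id": "b", "role": "query", "split": "train"},
    {"id": "c", "role": "excluded"},
]
with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "manifest_ready.jsonl"
    write_jsonl(rows, jsonl_path)
    summary = summarize_manifest(read_jsonl(jsonl_path))
print(summary)
```

Rows with the `excluded` role still count toward `output_rows` here but feed neither the catalog nor any query split; whether the real script counts them the same way is an assumption.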

Verification results:
- `input_rows=5`
- `output_rows=5`
- roles=`reference/query/excluded`
- buckets=`lossless_reference_core/short_video_hook/demo_variation_pool`
- `catalog_refs=2`
- `train_queries=1`
- `test_queries=1`
- `val_queries=0`
- `dry_run_passed=true`
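
A summary like this can be sanity-checked mechanically. The sketch below assumes the values are available as a plain dict; `check_smoke_summary` is a hypothetical helper, and the specific rules (row counts must match, at least one reference and one query) are assumptions about what "healthy" means, not rules taken from the smoke script.

```python
def check_smoke_summary(summary):
    """Return a list of problems; an empty list means the summary looks sane."""
    problems = []
    # Assumption: normalization should neither drop nor add rows.
    if summary["input_rows"] != summary["output_rows"]:
        problems.append("row count changed between input and output")
    if summary["catalog_refs"] < 1:
        problems.append("no catalog references")
    total_queries = (
        summary["train_queries"]
        + summary["test_queries"]
        + summary["val_queries"]
    )
    if total_queries < 1:
        problems.append("no queries in any split")
    if not summary["dry_run_passed"]:
        problems.append("dry run failed")
    return problems


# Values copied from the verification results recorded in this checkpoint.
recorded = {
    "input_rows": 5,
    "output_rows": 5,
    "catalog_refs": 2,
    "train_queries": 1,
    "test_queries": 1,
    "val_queries": 0,
    "dry_run_passed": True,
}
print(check_smoke_summary(recorded))  # prints []
```

An empty validation split is deliberately not flagged, since the recorded run passed with `val_queries=0`.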
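
Conclusions:
- The business-export offline adaptation chain now has real, runnable evidence behind it, rather than being only a collection of templates and scripts.
- The next session can swap in real business export data directly and continue along the same chain.
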
## 2026-06-02 Checkpoint: project manifest adapter script delivered

Completed:
......
@@ -99,3 +99,7 @@ cd /workspace/acr-engine
2. Keep adding cap64 multi-seed runs instead of retaining only a single seed.
3. Keep reducing `hybrid` variance on top of the bucket baseline instead of locking in a global default strategy too early.
4. Keep the stage cadence of "docs update -> changelog -> commit -> push".


- Added `acr-engine/scripts/business_export_offline_smoke.py` and obtained fresh evidence from an end-to-end offline smoke run.
- Confirmed the chain: business export sample -> normalization -> project manifest -> `train.py --dry-run`
......
@@ -74,3 +74,16 @@ test -f /tmp/ab_smoke_seg_cap48_top2_seed999/report.json && cat /tmp/ab_smoke_se
- `hybrid`: `mean_top1=1.0, mean_num_queries=4.0`
- `high_energy`: `mean_top1=1.0, mean_num_queries=3.5`
- This means the bucket baseline can already serve as the minimal engineering basis for later explaining why different subsets diverge on their winner.
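
Per-strategy means such as `mean_top1` and `mean_num_queries` are presumably averages over several runs. A minimal sketch of that aggregation follows; the per-run record shape (`strategy`, `top1`, `num_queries`) and the function `aggregate_runs` are hypothetical, and the input values are made up for illustration rather than taken from the actual reports.

```python
from statistics import mean


def aggregate_runs(runs):
    """Group per-run results by strategy and average each metric."""
    by_strategy = {}
    for run in runs:
        by_strategy.setdefault(run["strategy"], []).append(run)
    return {
        strategy: {
            "mean_top1": mean(r["top1"] for r in group),
            "mean_num_queries": mean(r["num_queries"] for r in group),
        }
        for strategy, group in by_strategy.items()
    }


# Made-up per-run values, NOT the real per-seed reports.
example_runs = [
    {"strategy": "high_energy", "top1": 1.0, "num_queries": 3},
    {"strategy": "high_energy", "top1": 1.0, "num_queries": 4},
]
aggregated = aggregate_runs(example_runs)
print(aggregated)
```

Keeping the per-run records around, rather than only the means, makes it easier to see how much a strategy fluctuates across seeds.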


## Latest run evidence

- New script: `acr-engine/scripts/business_export_offline_smoke.py`
- Ran end to end locally on real, readable audio:
  - business export sample -> normalization -> project manifest -> `train.py --dry-run`
- Key results:
  - `catalog_refs=2`
  - `train_queries=1`
  - `test_queries=1`
  - `val_queries=0`
  - `dry_run_passed=true`
......