Commit 08379e5636315c8ed958706c7288c302e8237d47 by cnb.bofCdSsphPA

Promote hybrid to the default strategy using the stronger cap24 evidence

Persist the larger real-FMA benchmark result, in which hybrid outperforms high_energy on top-1 accuracy (1.0 vs 0.8125 over 16 queries, with top-k tied at 1.0), so the project recommendation can converge on one default instead of an unresolved tie.

Constraint: Docs-only change, because benchmark outputs remain outside version control
Rejected: Keep treating hybrid and high_energy as co-equal defaults | The larger 24-track capped benchmark now separates them clearly
Confidence: high
Scope-risk: narrow
Directive: Use cap24 top-two as the current strongest public evidence until a larger capped benchmark supersedes it
Tested: Verified /tmp/ab_smoke_seg_cap24_top2/report.json; verified high_energy eval.json; verified docs now state hybrid=16/1.0/1.0 and high_energy=16/0.8125/1.0
Not-tested: Broader strategy comparison beyond hybrid vs high_energy on the 24-track subset
1 parent 48a5957a
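The commit does not spell out the rule for promoting one strategy to default. The sketch below assumes three things: strategies are ranked by `top1`, `topk` breaks ties, and no default is named when the top two are still tied. `StrategyResult`, `rank_strategies` and `pick_default` are hypothetical names. The numbers are the cap24 `num_queries/top1/topk` results from the Tested line.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyResult:
    """One strategy's benchmark row (hypothetical container)."""

    name: str
    num_queries: int
    top1: float
    topk: float


def rank_strategies(results: list[StrategyResult]) -> list[StrategyResult]:
    """Sort best-first by top1, using topk as the tie-breaker (assumed rule)."""
    return sorted(results, key=lambda r: (r.top1, r.topk), reverse=True)


def pick_default(results: list[StrategyResult]) -> str | None:
    """Return the single best strategy, or None when the top two are tied."""
    ranked = rank_strategies(results)
    if not ranked:
        return None
    if len(ranked) > 1 and (ranked[0].top1, ranked[0].topk) == (ranked[1].top1, ranked[1].topk):
        return None  # unresolved tie: no single default can be named
    return ranked[0].name


if __name__ == "__main__":
    cap24 = [
        StrategyResult("high_energy", num_queries=16, top1=0.8125, topk=1.0),
        StrategyResult("hybrid", num_queries=16, top1=1.0, topk=1.0),
    ]
    print([r.name for r in rank_strategies(cap24)])  # → ['hybrid', 'high_energy']
    print(pick_default(cap24))  # → hybrid
```

Returning `None` on a tie models the "unresolved tie" the commit message describes. With the cap24 numbers, only the `top1` gap separates the two strategies, because `topk` is 1.0 for both.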
@@ -2,6 +2,25 @@
 
 ## 2026-06-02
 
+### Stage: Wrap up the cap24 top2 real-FMA comparison and confirm the default strategy
+
+Completed:
+- Read `/tmp/ab_smoke_seg_cap24_top2/report.json`
+- Read `/tmp/ab_smoke_seg_cap24_top2/high_energy/fma_reports_smoke/eval.json`
+- Updated:
+- [open-dataset-workflow.md](./open-dataset-workflow.md)
+- [session-handoff.md](./session-handoff.md)
+- [CHANGELOG.md](./CHANGELOG.md)
+
+Final results (subset=24, `max_test_queries=16`):
+- `hybrid`: `num_queries=16`, `top1=1.0`, `topk=1.0`
+- `high_energy`: `num_queries=16`, `top1=0.8125`, `topk=1.0`
+
+Conclusions:
+- cap24 is more discriminative than cap16; `hybrid` no longer merely ties with `high_energy`
+- The default training/query strategy should now be fixed explicitly to `hybrid`
+- `high_energy` is better suited as a supplementary baseline, or as the second-choice strategy for data skewed toward high-energy sections
+
 ### Stage: Launch the larger cap24 top2 real-FMA comparison and record the first result
 
 Completed:
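The `top1` and `topk` values in these results are hit rates over `num_queries` test queries. With 16 queries, `top1=0.8125` therefore means 13 of 16 queries ranked the correct track first (13 / 16 = 0.8125). The sketch below makes two assumptions. First, `top1` counts a query when the correct track is ranked first. Second, `topk` counts a query when the correct track appears among the first `k` results. This commit shows neither the exact definitions nor the `k` used by `evaluate.py`. `hit_rates` and the sample data are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def hit_rates(rankings: Sequence[Sequence[str]], expected: Sequence[str], k: int = 5) -> dict:
    """Compute num_queries, top1 and topk hit rates for a batch of queries.

    Hypothetical helper: rankings[i] is the ranked list of candidate track IDs
    returned for query i, and expected[i] is that query's ground-truth track ID.
    """
    if len(rankings) != len(expected):
        raise ValueError("rankings and expected must have the same length")
    n = len(expected)
    # A query counts toward top1 only if the correct track is ranked first.
    top1_hits = sum(
        1 for ranked, truth in zip(rankings, expected) if len(ranked) > 0 and ranked[0] == truth
    )
    # A query counts toward topk if the correct track is within the first k results.
    topk_hits = sum(1 for ranked, truth in zip(rankings, expected) if truth in ranked[:k])
    return {
        "num_queries": n,
        "top1": top1_hits / n if n else 0.0,
        "topk": topk_hits / n if n else 0.0,
    }


if __name__ == "__main__":
    expected = [f"track{i}" for i in range(16)]
    # Hypothetical data: 13 queries hit at rank 1, the last 3 only at rank 2.
    rankings = [[t, "other"] for t in expected[:13]] + [["other", t] for t in expected[13:]]
    print(hit_rates(rankings, expected, k=2))
    # → {'num_queries': 16, 'top1': 0.8125, 'topk': 1.0}
```

In this example, the three rank-2 hits lower `top1` but still count toward `topk`. That is the same pattern as the `high_energy` row.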
@@ -161,6 +161,20 @@ flowchart LR
 - `high_energy` is currently the strongest co-equal alternative, suited to data weighted toward main / high-energy sections
 - `beat_aware` suits styles with strong, regular beats, but was slightly weaker on this round's FMA subset
 - `repeated_section_aware` on its own is less stable than the hybrid strategy
+
+### Update: larger cap24 top2 comparison (subset=24, `max_test_queries=16`)
+
+On a larger real FMA subset, only the top two strategies were kept for further comparison:
+
+| Rank | Strategy | num_queries | top1 | topk |
+|---:|---|---:|---:|---:|
+| 1 | `hybrid` | 16 | 1.0 | 1.0 |
+| 2 | `high_energy` | 16 | 0.8125 | 1.0 |
+
+This round is more discriminative than cap16, which shows:
+- `hybrid` is not just "tied with `high_energy`"
+- On the larger real subset, `hybrid` is the more stable strategy
+- The current default recommendation should converge explicitly on **`hybrid`**
 /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json
 /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local
 ```
@@ -401,7 +401,7 @@ cd /workspace/acr-engine
 | Strategy | subset | max_test_queries | top1 | topk | Status |
 |---|---:|---:|---:|---:|---|
 | `hybrid` | 24 | 16 | 1.0 | 1.0 | Done |
-| `high_energy` | 24 | 16 | - | - | Training |
+| `high_energy` | 24 | 16 | 0.8125 | 1.0 | Done |
 
 Recovery check command:
 
@@ -409,9 +409,10 @@ cd /workspace/acr-engine
 pgrep -af 'ab_smoke_seg_cap24_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap24_top2|evaluate.py --data /tmp/ab_smoke_seg_cap24_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap24_top2'
 ```
 
-If `report.json` has not been generated yet, wait for these first:
-- `/tmp/ab_smoke_seg_cap24_top2/high_energy/fma_reports_smoke/eval.json`
-- `/tmp/ab_smoke_seg_cap24_top2/report.json`
+cap24 top2 final conclusions:
+- `hybrid`: `16 / 1.0 / 1.0`
+- `high_energy`: `16 / 0.8125 / 1.0`
+- This result is more conclusive than cap16: **the default strategy should now be fixed explicitly to `hybrid`**
 - `b766c74` Make open-dataset manifests trainable end to end
 - `fa23144` Add a single-page open dataset workflow for training prep
 - `af33be3` Condense docs and add manifest validation before training
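The final-conclusion lines use the compact `num_queries / top1 / topk` form. The sketch below builds those lines from per-strategy `eval.json` files. It also lists any strategy whose file has not been written yet, for example while it is still training. It makes two assumptions. First, each `eval.json` is a JSON object with top-level `num_queries`, `top1` and `topk` keys. Second, every strategy writes to `<root>/<strategy>/fma_reports_smoke/eval.json`. Only the `high_energy` path appears in this commit, so the key names and the `hybrid` location are unconfirmed. `summarize` is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path

# Root directory taken from this commit; the per-strategy layout below is assumed.
DEFAULT_ROOT = Path("/tmp/ab_smoke_seg_cap24_top2")


def summarize(root: Path, strategies: list[str]) -> tuple[list[str], list[str]]:
    """Return (summary lines, strategies whose eval.json does not exist yet)."""
    lines: list[str] = []
    missing: list[str] = []
    for name in strategies:
        path = root / name / "fma_reports_smoke" / "eval.json"
        if not path.is_file():
            missing.append(name)  # run not finished, or results written elsewhere
            continue
        metrics = json.loads(path.read_text(encoding="utf-8"))
        # Assumed key names; adjust to the real eval.json schema.
        lines.append(
            f"- `{name}`: `{metrics['num_queries']} / {metrics['top1']} / {metrics['topk']}`"
        )
    return lines, missing


if __name__ == "__main__":
    found, pending = summarize(DEFAULT_ROOT, ["hybrid", "high_energy"])
    print("\n".join(found))
    if pending:
        print("still waiting for:", ", ".join(pending))
```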