Promote hybrid to the default strategy using the stronger cap24 evidence
Persist the larger real-FMA benchmark result showing hybrid clearly outperforming high_energy, so the project recommendation can converge on one default instead of an unresolved tie.

Constraint: Only docs change because benchmark outputs remain outside version control
Rejected: Keep treating hybrid and high_energy as co-equal defaults | The larger 24-track capped benchmark now separates them clearly
Confidence: high
Scope-risk: narrow
Directive: Use cap24 top-two as the current strongest public evidence until a larger capped benchmark supersedes it
Tested: Verified /tmp/ab_smoke_seg_cap24_top2/report.json; verified high_energy eval.json; verified docs now state hybrid=16/1.0/1.0 and high_energy=16/0.8125/1.0
Not-tested: Broader strategy comparison beyond hybrid vs high_energy on the 24-track subset
Showing 3 changed files with 38 additions and 4 deletions
| ... | @@ -2,6 +2,25 @@ | ... | @@ -2,6 +2,25 @@ |
| 2 | 2 | ||
| 3 | ## 2026-06-02 | 3 | ## 2026-06-02 |
| 4 | 4 | ||
| 5 | ### Stage: Wrap up the cap24 top2 real-FMA comparison and confirm the default strategy | ||
| 6 | |||
| 7 | Completed: | ||
| 8 | - Read `/tmp/ab_smoke_seg_cap24_top2/report.json` | ||
| 9 | - Read `/tmp/ab_smoke_seg_cap24_top2/high_energy/fma_reports_smoke/eval.json` | ||
| 10 | - Updated: | ||
| 11 | - [open-dataset-workflow.md](./open-dataset-workflow.md) | ||
| 12 | - [session-handoff.md](./session-handoff.md) | ||
| 13 | - [CHANGELOG.md](./CHANGELOG.md) | ||
| 14 | |||
| 15 | Final results (subset=24, `max_test_queries=16`): | ||
| 16 | - `hybrid`: `num_queries=16`, `top1=1.0`, `topk=1.0` | ||
| 17 | - `high_energy`: `num_queries=16`, `top1=0.8125`, `topk=1.0` | ||
| 18 | |||
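| 19 | Conclusions: | ||
| 20 | - cap24 is more discriminative than cap16; `hybrid` no longer merely ties with `high_energy` | ||
| 21 | - The current default training / query strategy should be explicitly fixed to `hybrid` | ||
| 22 | - `high_energy` is better suited as a supplementary comparison, or as the second choice for data skewed toward high-energy regions | ||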
| 23 | |||
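| 5 | ### Stage: Launch the larger cap24 top2 real-FMA comparison and record the first result | 24 | ### Stage: Launch the larger cap24 top2 real-FMA comparison and record the first result |
| 6 | 25 | ||
| 7 | Completed: | 26 | Completed: | ... | ... |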
| ... | @@ -161,6 +161,20 @@ flowchart LR | ... | @@ -161,6 +161,20 @@ flowchart LR |
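| 161 | - `high_energy` is currently the strongest tied runner-up, suited to data skewed toward main sections / high-energy regions | 161 | - `high_energy` is currently the strongest tied runner-up, suited to data skewed toward main sections / high-energy regions |
| 162 | - `beat_aware` suits styles with strong, regular beats, but is slightly weaker on this FMA subset | 162 | - `beat_aware` suits styles with strong, regular beats, but is slightly weaker on this FMA subset |
| 163 | - `repeated_section_aware` used alone is less stable than the hybrid strategy | 163 | - `repeated_section_aware` used alone is less stable than the hybrid strategy |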
| 164 | |||
| 165 | ### Update: larger cap24 top2 comparison (subset=24, `max_test_queries=16`) | ||
| 166 | |||
| 167 | On the larger real FMA subset, only the top two strategies were kept for further comparison: | ||
| 168 | |||
| 169 | | Rank | Strategy | num_queries | top1 | topk | | ||
| 170 | |---:|---|---:|---:|---:| | ||
| 171 | | 1 | `hybrid` | 16 | 1.0 | 1.0 | | ||
| 172 | | 2 | `high_energy` | 16 | 0.8125 | 1.0 | | ||
| 173 | |||
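| 174 | This round is more discriminative than cap16, which shows: | ||
| 175 | - `hybrid` does not merely "tie with `high_energy`" | ||
| 176 | - On the larger real subset, `hybrid` is more stable | ||
| 177 | - The current default recommendation should converge explicitly on **`hybrid`** | ||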
| 164 | /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json | 178 | /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json |
| 165 | /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local | 179 | /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local |
| 166 | ``` | 180 | ``` | ... | ... |
| ... | @@ -401,7 +401,7 @@ cd /workspace/acr-engine | ... | @@ -401,7 +401,7 @@ cd /workspace/acr-engine |
| 401 | | Strategy | subset | max_test_queries | top1 | topk | Status | | 401 | | Strategy | subset | max_test_queries | top1 | topk | Status | |
| 402 | |---|---:|---:|---:|---:|---| | 402 | |---|---:|---:|---:|---:|---| |
| 403 | | `hybrid` | 24 | 16 | 1.0 | 1.0 | Completed | | 403 | | `hybrid` | 24 | 16 | 1.0 | 1.0 | Completed | |
| 404 | | `high_energy` | 24 | 16 | - | - | Training | | 404 | | `high_energy` | 24 | 16 | 0.8125 | 1.0 | Completed | |
| 405 | 405 | ||
| 406 | Recovery check command: | 406 | Recovery check command: |
| 407 | 407 | ||
| ... | @@ -409,9 +409,10 @@ cd /workspace/acr-engine | ... | @@ -409,9 +409,10 @@ cd /workspace/acr-engine |
| 409 | pgrep -af 'ab_smoke_seg_cap24_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap24_top2|evaluate.py --data /tmp/ab_smoke_seg_cap24_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap24_top2' | 409 | pgrep -af 'ab_smoke_seg_cap24_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap24_top2|evaluate.py --data /tmp/ab_smoke_seg_cap24_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap24_top2' |
| 410 | ``` | 410 | ``` |
| 411 | 411 | ||
| 412 | If `report.json` has not been generated yet, wait first for: | 412 | cap24 top2 final conclusion: |
| 413 | - `/tmp/ab_smoke_seg_cap24_top2/high_energy/fma_reports_smoke/eval.json` | 413 | - `hybrid`: `16 / 1.0 / 1.0` |
| 414 | - `/tmp/ab_smoke_seg_cap24_top2/report.json` | 414 | - `high_energy`: `16 / 0.8125 / 1.0` |
| | | 415 | - This result is more telling than cap16: **the current default strategy should be explicitly fixed to `hybrid`** |
| 415 | - `b766c74` Make open-dataset manifests trainable end to end | 416 | - `b766c74` Make open-dataset manifests trainable end to end |
| 416 | - `fa23144` Add a single-page open dataset workflow for training prep | 417 | - `fa23144` Add a single-page open dataset workflow for training prep |
| 417 | - `af33be3` Condense docs and add manifest validation before training | 418 | - `af33be3` Condense docs and add manifest validation before training | ... | ... |
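
To make the size of the gap concrete, the reported rates can be converted back into query counts. The sketch below is illustrative only: the `results` dict layout and the helper names `top1_hits` and `pick_default` are assumptions made for this example, not the actual schema of `report.json` or any code in the repository. It uses only the figures quoted in the diff (`num_queries=16`, `top1`, `topk`).

```python
# Figures quoted in the diff above. The dict layout is hypothetical and is
# NOT the real report.json schema.
results = {
    "hybrid": {"num_queries": 16, "top1": 1.0, "topk": 1.0},
    "high_energy": {"num_queries": 16, "top1": 0.8125, "topk": 1.0},
}


def top1_hits(entry):
    """Convert a top-1 rate back into a whole number of correct queries."""
    return round(entry["top1"] * entry["num_queries"])


def pick_default(results):
    """Rank strategies by top1, then topk, and return the best name."""
    return max(
        results,
        key=lambda name: (results[name]["top1"], results[name]["topk"]),
    )


hits = {name: top1_hits(entry) for name, entry in results.items()}
default = pick_default(results)

print(hits)     # top-1 hits per strategy out of 16 queries
print(default)  # strategy with the best (top1, topk) pair
```

Under these assumptions, `high_energy` places the correct track first on 13 of 16 queries while `hybrid` does so on all 16, and both retrieve the correct track within top-k every time. With only 16 queries the difference is three queries, which is consistent with the commit's directive to treat cap24 as the strongest evidence only until a larger capped benchmark supersedes it.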