pin down the hard-case gap after the first real-path closure

Constraint: Handoff must distinguish clean real-path evidence from hard-case evidence without staging temporary evaluation artifacts
Rejected: Keep scaling clean-only FMA smoke first | Fresh evidence shows the next highest-yield work is hard-case top1 improvement
Confidence: high
Scope-risk: narrow
Directive: Treat humming_like and confused as the primary optimization targets before investing more cycles in larger clean-only smoke runs
Tested: Audited manifest type coverage, verified synthetic_v2 hard-case evaluate results, and updated handoff/changelog docs with the gap analysis
Not-tested: Post-optimization hard-case improvements on real open-data derived hard cases
Showing 5 changed files with 117 additions and 33 deletions
| ... | @@ -74,24 +74,20 @@ | ... | @@ -74,24 +74,20 @@ |
| 74 | 74 | ||
| 75 | ## 5.5 Latest real FMA / chromaprint run status (2026-06-02) | 75 | ## 5.5 Latest real FMA / chromaprint run status (2026-06-02) |
| 76 | 76 | ||
| 77 | ### Current latest snapshot (15:40 UTC) | 77 | ### Current latest snapshot (15:43 UTC) |
| 78 | 78 | ||
| 79 | - Remote sync baseline: `9371e94` (before this update) | 79 | - Remote sync baseline: `81704ac` (before this update) |
| 80 | - Most important new evidence: **the fixed real-path 200-ref rerun has produced its first explicit evaluate metrics**. | 80 | - Most important new evidence: **the hard-case gap has now been quantified**. |
| 81 | - index path: `/tmp/fma_realpath_small_rerun_index2` | 81 | - real-path clean loop: `num_queries=35`, `top1=0.8571`, `topk=1.0` |
| 82 | - eval path: `/tmp/fma_realpath_small_rerun_eval/eval_top50.json` | 82 | - synthetic hard-case smoke: `num_queries=16`, `top1=0.6875`, `topk=1.0` |
| 83 | - Current results: | 83 | - `humming_like: n=4, top1=0.25, topk=1.0` |
| 84 | - `num_queries=35` | 84 | - `confused: n=1, top1=0.0, topk=1.0` |
| 85 | - `top1=0.8571` | 85 | - Key explanation: the real-path FMA external smoke manifest currently contains only `clean` queries; `humming_like` / `confused` have to be validated through a hard-case set such as `data/synthetic_v2`. |
| 86 | - `topk=1.0` | 86 | - Takeaway: the engineering pipeline is now stable enough; the largest gain next is not in clean smoke but in improving hard-case top1. |
| 87 | - `by_type.clean: n=35, top1=0.8571, topk=1.0` | ||
| 88 | - Note on query count: of the `235` overlap test items, only `35` are non-`reference` queries, so `--max-queries 50` ended up evaluating `35`. | ||
| 89 | - Takeaway: there is now a complete, reusable real-path smoke evidence chain: | ||
| 90 | `chromaprint complete -> reference complete -> evaluate complete` | ||
| 91 | - Next events worth committing: | 87 | - Next events worth committing: |
| 92 | 1. Evaluation with more queries / a larger reference set | 88 | 1. Improved metrics after hard-case optimization |
| 93 | 2. `confused` / `humming_like` / hard negative metrics | 89 | 2. A hard-case generation/annotation scheme landed on real open data |
| 94 | 3. Results on data combinations closer to commercial scenarios | 90 | 3. Retest results with larger-scale query/reference sets |
| 95 | 91 | ||
| 96 | 92 | ||
| 97 | ## 6. High-risk notes | 93 | ## 6. High-risk notes | ... | ... |
| 1 | ## 2026-06-02 15:43 UTC / hard-case gap verified after the first real-path closure | ||
| 2 | |||
| 3 | - After the first real-path `build-index -> evaluate` loop closed, ran an additional smoke on the existing `synthetic_v2` hard-case set to verify where optimization should focus next | ||
| 4 | - fresh evidence (`2026-06-02 15:43:17 UTC`): | ||
| 5 | - Eval file: `/tmp/synthetic_v2_eval_v6_top16.json` | ||
| 6 | - Combination: `data/synthetic_v2` + `data/models_v6/best_model.pt` + `data/index_v6/reference` | ||
| 7 | - Results: `num_queries=16`, `top1=0.6875`, `topk=1.0` | ||
| 8 | - by_type: | ||
| 9 | - `clean`: `n=7`, `top1=1.0`, `topk=1.0` | ||
| 10 | - `augmented`: `n=4`, `top1=0.75`, `topk=1.0` | ||
| 11 | - `humming_like`: `n=4`, `top1=0.25`, `topk=1.0` | ||
| 12 | - `confused`: `n=1`, `top1=0.0`, `topk=1.0` | ||
| 13 | - Key explanation: | ||
| 14 | - The current real-path FMA external smoke manifest contains only `clean` queries, with no `confused` / `humming_like` | ||
| 15 | - The real-path evaluation can therefore only demonstrate the clean loop; it is not enough to demonstrate hard-case robustness | ||
| 16 | - Conclusion: the clearest optimization direction has converged on improving top1 for `humming_like` / `confused`, rather than repeating clean smoke | ||
| 17 | |||
| 1 | ## 2026-06-02 15:40 UTC / real-path 200-ref rerun closed the first explicit evaluate loop | 18 | ## 2026-06-02 15:40 UTC / real-path 200-ref rerun closed the first explicit evaluate loop |
| 2 | 19 | ||
| 3 | - On top of the completed `200-ref` real-path index, ran an additional explicit `evaluate.py` smoke | 20 | - On top of the completed `200-ref` real-path index, ran an additional explicit `evaluate.py` smoke | ... | ... |
| ... | @@ -192,3 +192,32 @@ | ... | @@ -192,3 +192,32 @@ |
| 192 | 192 | ||
| 193 | - This is no longer just a successful index build: the first real-path `build-index -> evaluate` loop evidence is now in hand. | 193 | - This is no longer just a successful index build: the first real-path `build-index -> evaluate` loop evidence is now in hand. |
| 194 | - The next round should shift focus to larger-scale evaluation and hard case / confusion evaluation. | 194 | - The next round should shift focus to larger-scale evaluation and hard case / confusion evaluation. |
| 195 | |||
| 196 | ## Additional deliverables in this update (2026-06-02 15:43 UTC) | ||
| 197 | |||
| 198 | ### New run evidence | ||
| 199 | |||
| 200 | | Category | Content | | ||
| 201 | |---|---| | ||
| 202 | | hard-case smoke | explicit evaluation of `synthetic_v2 + models_v6 + index_v6` completed | | ||
| 203 | | Overall | `num_queries=16`, `top1=0.6875`, `topk=1.0` | | ||
| 204 | | hard case | `humming_like top1=0.25`, `confused top1=0.0` | | ||
| 205 | | Conclusion | The current gap clearly lies in hard-case top1, not in clean/topk | | ||
| 206 | |||
| 207 | ### Most important fresh evidence right now | ||
| 208 | |||
| 209 | - Observed at: `2026-06-02 15:43:17 UTC` | ||
| 210 | - Result file: `/tmp/synthetic_v2_eval_v6_top16.json` | ||
| 211 | - Evaluation results: | ||
| 212 | - `top1=0.6875` | ||
| 213 | - `topk=1.0` | ||
| 214 | - `humming_like: n=4, top1=0.25, topk=1.0` | ||
| 215 | - `confused: n=1, top1=0.0, topk=1.0` | ||
| 216 | - Manifest audit results: | ||
| 217 | - real-path FMA external smoke has only `clean` queries | ||
| 218 | - only synthetic_v2 includes `augmented` / `humming_like` / `confused` | ||
| 219 | |||
| 220 | ### Conclusion | ||
| 221 | |||
| 222 | - We now know not only that "the system runs end to end" but also "what to optimize first": hard-case top1. | ||
| 223 | - The more valuable next round is to work on the input layer, slicing, confusion augmentation, and hard negative optimization around `humming_like` / `confused`. | ... | ... |
| 1 | ## Additional update to this delivery package (2026-06-02 15:43 UTC) | ||
| 2 | |||
| 3 | ### Delivery conclusion | ||
| 4 | |||
| 5 | The latest milestone has moved from "real-path clean loop runs end to end" to **the hard-case gap has been quantified**: | ||
| 6 | - Remote baseline is currently: `81704ac` (before this update) | ||
| 7 | - real-path FMA smoke has shown that the `clean` loop runs end to end | ||
| 8 | - synthetic hard-case smoke has shown that the main gap is top1 on `humming_like` / `confused` | ||
| 9 | - The next phase should therefore not repeat clean smoke but focus on hard-case robustness optimization | ||
| 10 | |||
| 11 | ### Latest facts | ||
| 12 | |||
| 13 | #### hard-case smoke results | ||
| 14 | - Observed at: `2026-06-02 15:43:17 UTC` | ||
| 15 | - Combination: `data/synthetic_v2` + `data/models_v6/best_model.pt` + `data/index_v6/reference` | ||
| 16 | - Result file: `/tmp/synthetic_v2_eval_v6_top16.json` | ||
| 17 | - Evaluation results: | ||
| 18 | - `num_queries=16` | ||
| 19 | - `top1=0.6875` | ||
| 20 | - `topk=1.0` | ||
| 21 | - `clean: n=7, top1=1.0, topk=1.0` | ||
| 22 | - `augmented: n=4, top1=0.75, topk=1.0` | ||
| 23 | - `humming_like: n=4, top1=0.25, topk=1.0` | ||
| 24 | - `confused: n=1, top1=0.0, topk=1.0` | ||
| 25 | |||
| 26 | #### Key explanation | ||
| 27 | - The real-path FMA external smoke manifest currently contains only `clean` queries: | ||
| 28 | - external test = `1613 clean` | ||
| 29 | - rerun overlap test = `35 clean` | ||
| 30 | - The only ready-made evaluation set in the repository that provides `humming_like` / `confused` is `data/synthetic_v2`. | ||
| 31 | |||
| 32 | ### Current assessment | ||
| 33 | |||
| 34 | - The real-path loop is already sufficient to show that the engineering pipeline runs. | ||
| 35 | - The highest-yield targets for the next phase have converged on: | ||
| 36 | 1. improving `humming_like` top1; | ||
| 37 | 2. improving `confused` top1; | ||
| 38 | 3. bringing hard-case generation/annotation into the real open-data evaluation chain. | ||
| 39 | |||
| 40 | --- | ||
| 41 | |||
| 1 | ## Additional update to this delivery package (2026-06-02 15:40 UTC) | 42 | ## Additional update to this delivery package (2026-06-02 15:40 UTC) |
| 2 | 43 | ||
| 3 | ### Delivery conclusion | 44 | ### Delivery conclusion | ... | ... |
| ... | @@ -5,27 +5,28 @@ | ... | @@ -5,27 +5,28 @@ |
| 5 | 5 | ||
| 6 | ## One-page summary | 6 | ## One-page summary |
| 7 | 7 | ||
| 8 | ### Latest delivery snapshot (2026-06-02 15:40 UTC) | 8 | ### Latest delivery snapshot (2026-06-02 15:43 UTC) |
| 9 | 9 | ||
| 10 | - Current remote sync baseline: `9371e94` (before this update) | 10 | - Current remote sync baseline: `81704ac` (before this update) |
| 11 | - Most important new fact: **the fixed real-path 200-ref rerun has produced its first explicit evaluate metrics** | 11 | - Most important new fact: **the hard-case gap has now been quantified** |
| 12 | - index path: `/tmp/fma_realpath_small_rerun_index2` | 12 | - real-path clean loop results: |
| 13 | - eval path: `/tmp/fma_realpath_small_rerun_eval/eval_top50.json` | ||
| 14 | - Current core results: | ||
| 15 | - `num_queries=35` | 13 | - `num_queries=35` |
| 16 | - `top1=0.8571` | 14 | - `top1=0.8571` |
| 17 | - `topk=1.0` | 15 | - `topk=1.0` |
| 18 | - `by_type.clean: n=35, top1=0.8571, topk=1.0` | 16 | - synthetic hard-case smoke results: |
| 19 | - Note on query count: | 17 | - `num_queries=16` |
| 20 | - overlap test items = `235` | 18 | - `top1=0.6875` |
| 21 | - non-`reference` queries = `35` | 19 | - `topk=1.0` |
| 22 | - so `--max-queries 50` actually evaluated `35` | 20 | - `humming_like: n=4, top1=0.25, topk=1.0` |
| 23 | - Conclusion: the first complete real-path loop is now in place: | 21 | - `confused: n=1, top1=0.0, topk=1.0` |
| 24 | `chromaprint -> reference -> evaluate` | 22 | - Key explanation: |
| 23 | - real-path FMA external smoke currently has only `clean` queries | ||
| 24 | - `humming_like` / `confused` can currently be validated only through a hard-case set such as `data/synthetic_v2` | ||
| 25 | - Conclusion: the next phase should not keep repeating clean smoke; it should prioritize improving top1 on hard cases. | ||
| 25 | - Top priorities for the new session: | 26 | - Top priorities for the new session: |
| 26 | 1. Scale up query / reference size | 27 | 1. Optimize the input layer and slicing around `humming_like` / `confused` |
| 27 | 2. Add hard case (`confused` / `humming_like`) evaluation | 28 | 2. Design a hard-case generation/annotation chain on real open data |
| 28 | 3. Then decide whether to move to a larger FMA subset or the full set | 29 | 3. Then retest with larger query / reference scale |
| 29 | 30 | ||
| 30 | ### Latest observability fix (2026-06-02 15:18 UTC) | 31 | ### Latest observability fix (2026-06-02 15:18 UTC) |
| 31 | 32 | ... | ... |
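
The per-type figures in the hard-case smoke can be cross-checked against the reported overall `top1=0.6875`. The sketch below is only an illustration: the dictionary layout and the helper names `overall_top1` and `misses_by_type` are assumptions made for this example, not the actual output schema or API of `evaluate.py`. The numbers are copied from the `synthetic_v2` results in the diff.

```python
# Per-type results copied from the synthetic_v2 hard-case smoke.
# The dict layout is hypothetical; it is not the real evaluate.py schema.
BY_TYPE = {
    "clean": {"n": 7, "top1": 1.0},
    "augmented": {"n": 4, "top1": 0.75},
    "humming_like": {"n": 4, "top1": 0.25},
    "confused": {"n": 1, "top1": 0.0},
}


def overall_top1(by_type):
    """Return (query-weighted top1, total query count) across all types."""
    total = sum(bucket["n"] for bucket in by_type.values())
    hits = sum(bucket["n"] * bucket["top1"] for bucket in by_type.values())
    return hits / total, total


def misses_by_type(by_type):
    """Return how many queries of each type missed at rank 1."""
    return {
        name: round(bucket["n"] * (1.0 - bucket["top1"]))
        for name, bucket in by_type.items()
    }


if __name__ == "__main__":
    top1, total = overall_top1(BY_TYPE)
    print(f"num_queries={total}, top1={top1}")
    print(misses_by_type(BY_TYPE))
```

Under these assumptions the weighted average reproduces the reported overall value, and four of the five rank-1 misses fall in `humming_like` and `confused`, which matches the commit's directive. With `n=4` and `n=1` the per-type rates are coarse, so a single query changes them substantially.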