capture the first real-path index-to-evaluate closure

Constraint: Delivery state must reflect fresh evaluate evidence without staging temporary eval assets
Rejected: Wait for larger-scale or hard-case metrics | The first explicit evaluate closure is already a meaningful milestone and restart-safe handoff point
Confidence: high
Scope-risk: narrow
Directive: Reuse /tmp/fma_realpath_small_rerun_index2 and /tmp/fma_realpath_small_rerun_eval as the next validation baseline before scaling up
Tested: Verified eval_top50.json at num_queries 35 with top1 0.8571 and topk 1.0, confirmed query-count explanation, and updated handoff/changelog docs
Not-tested: Larger query caps, hard-case buckets, and full-scale FMA evaluate runs
Showing 5 changed files with 123 additions and 46 deletions
| ... | @@ -74,31 +74,24 @@ | ... | @@ -74,31 +74,24 @@ |
| 74 | 74 | ||
| 75 | ## 5.5 Latest real FMA / chromaprint run state (2026-06-02) | 75 | ## 5.5 Latest real FMA / chromaprint run state (2026-06-02) |
| 76 | 76 | ||
| 77 | ### Current latest snapshot (15:35 UTC) | 77 | ### Current latest snapshot (15:40 UTC) |
| 78 | 78 | ||
| 79 | - Remote sync baseline: `41c4d7c` (before this update) | 79 | - Remote sync baseline: `9371e94` (before this update) |
| 80 | - Most important new evidence: **the fixed real-path 200-ref rerun has fully produced the final reference index**. | 80 | - Most important new evidence: **the fixed real-path 200-ref rerun has produced its first explicit evaluate metrics**. |
| 81 | - Output directory: `/tmp/fma_realpath_small_rerun_index2` | 81 | - index path: `/tmp/fma_realpath_small_rerun_index2` |
| 82 | - chromaprint complete: | 82 | - eval path: `/tmp/fma_realpath_small_rerun_eval/eval_top50.json` |
| 83 | - `status=complete` | 83 | - Current results: |
| 84 | - `refs_done=200/200` | 84 | - `num_queries=35` |
| 85 | - `skipped_refs=0` | 85 | - `top1=0.8571` |
| 86 | - `hashes=57577` | 86 | - `topk=1.0` |
| 87 | - `postings=187446` | 87 | - `by_type.clean: n=35, top1=0.8571, topk=1.0` |
| 88 | - reference complete: | 88 | - Note on query count: of the `235` overlap test items, only `35` are non-`reference` queries, so `--max-queries 50` ends up evaluating `35`. |
| 89 | - `status=complete` | 89 | - This shows that a complete, reusable real-path smoke evidence chain now exists: |
| 90 | - `refs_done=200/200` | 90 | `chromaprint complete -> reference complete -> evaluate complete` |
| 91 | - `windows_done=2068` | ||
| 92 | - `embedding_shape=[2068, 192]` | ||
| 93 | - `skipped_refs=0` | ||
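| 94 | - Final artifacts now present: | ||
| 95 | - `reference_embs.npy` | ||
| 96 | - `reference_ids.npy` | ||
| 97 | - This shows that, after the `flush=True` and bad-audio skip-tolerance fixes, the real-path rerun has passed completely through both core index-building stages. | ||
| 98 | - Next events worth committing: | 91 | - Next events worth committing: |
| 99 | 1. `evaluate.py` starts, or an explicit evaluate smoke completes | 92 | 1. Evaluation with more queries / a larger reference set |
| 100 | 2. identify / retrieval metrics are produced | 93 | 2. `confused` / `humming_like` / hard negative metrics |
| 101 | 3. Or results from a new larger-sample / full rerun | 94 | 3. Results on data mixes closer to the commercial scenario |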
| 102 | 95 | ||
| 103 | 96 | ||
| 104 | ## 6. High-risk notes | 97 | ## 6. High-risk notes | ... | ... |
| 1 | ## 2026-06-02 15:40 UTC / real-path 200-ref rerun closed the first explicit evaluate loop | ||
| 2 | |||
| 3 | - Building on the completed `200-ref` real-path index, ran an additional explicit `evaluate.py` smoke | ||
| 4 | - First located and fixed an evaluation-environment problem: | ||
| 5 | - The initial failure was caused by `/tmp/fma_realpath_small_rerun_eval/audio/...` being missing | ||
| 6 | - Filled the gap with a symlink: `/tmp/fma_realpath_small_rerun_eval/audio -> /workspace/acr-engine/data/external_smoke/fma/audio` | ||
| 7 | - fresh evidence (`2026-06-02 15:40:30 UTC`): | ||
| 8 | - `eval_top50.json` written to disk: `/tmp/fma_realpath_small_rerun_eval/eval_top50.json` | ||
| 9 | - Results: `num_queries=35`, `top1=0.8571`, `topk=1.0` | ||
| 10 | - `by_type.clean`: `n=35`, `top1=0.8571`, `topk=1.0` | ||
| 11 | - Additional verification: | ||
| 12 | - overlap test items = `235` | ||
| 13 | - of which non-`reference` queries = `35` | ||
| 14 | - So even with `--max-queries 50`, only `35` queries are ultimately evaluated | ||
| 15 | - Conclusion: the first real-path `build-index -> evaluate` closed-loop evidence is now in hand | ||
| 16 | - Next key milestones: | ||
| 17 | 1. Scale to more queries or a larger reference set | ||
| 18 | 2. Introduce hard-case evaluation such as `confused` / `humming_like` | ||
| 19 | |||
| 1 | ## 2026-06-02 15:35 UTC / real-path 200-ref rerun finished reference index | 20 | ## 2026-06-02 15:35 UTC / real-path 200-ref rerun finished reference index |
| 2 | 21 | ||
| 3 | - fixed real-path 200 reference rerun: `/tmp/fma_realpath_small_rerun_index2` has completed the reference/embedding stage | 22 | - fixed real-path 200 reference rerun: `/tmp/fma_realpath_small_rerun_index2` has completed the reference/embedding stage | ... | ... |
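The changelog entry above fixes the first evaluate failure by symlinking the missing `audio` directory into the eval path. A minimal sketch of an idempotent version of that step is shown here. The helper name `ensure_audio_symlink` and its replace-stale-link behavior are assumptions for illustration, not something the commit itself contains.

```python
from pathlib import Path


def ensure_audio_symlink(link: Path, target: Path) -> Path:
    """Make `link` a symlink to the directory `target` (hypothetical helper).

    Safe to call repeatedly: an existing correct link is left alone, a stale
    link is replaced, and a real file or directory at `link` is never removed.
    """
    if not target.is_dir():
        raise FileNotFoundError(f"symlink target does not exist: {target}")
    if link.is_symlink():
        if link.resolve() == target.resolve():
            return link  # already points at the right place
        link.unlink()  # stale or broken link: drop it and recreate below
    elif link.exists():
        raise FileExistsError(f"refusing to replace a real path: {link}")
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target, target_is_directory=True)
    return link
```

With the paths from the changelog, the call would be `ensure_audio_symlink(Path("/tmp/fma_realpath_small_rerun_eval/audio"), Path("/workspace/acr-engine/data/external_smoke/fma/audio"))`.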
| ... | @@ -162,3 +162,33 @@ | ... | @@ -162,3 +162,33 @@ |
| 162 | 162 | ||
| 163 | - Confirmed so far: the fixed real-path rerun not only reaches the reference stage but also fully produces the final embedding index. | 163 | - Confirmed so far: the fixed real-path rerun not only reaches the reference stage but also fully produces the final embedding index. |
| 164 | - The highest-value next work is to check whether the evaluation chain connects automatically and, if needed, to run an explicit evaluate smoke. | 164 | - The highest-value next work is to check whether the evaluation chain connects automatically and, if needed, to run an explicit evaluate smoke. |
| 165 | |||
| 166 | ## Additional delivery in this update (2026-06-02 15:40 UTC) | ||
| 167 | |||
| 168 | ### New run evidence | ||
| 169 | |||
| 170 | | Category | Content | | ||
| 171 | |---|---| | ||
| 172 | | evaluate | explicit `evaluate.py` smoke completed | | ||
| 173 | | query scale | `num_queries=35` (all non-reference queries in the overlap) | | ||
| 174 | | Metrics | `top1=0.8571`, `topk=1.0` | | ||
| 175 | | by_type | `clean: n=35, top1=0.8571, topk=1.0` | | ||
| 176 | |||
| 177 | ### Most important fresh evidence right now | ||
| 178 | |||
| 179 | - Observed at: `2026-06-02 15:40:30 UTC` | ||
| 180 | - Result file: `/tmp/fma_realpath_small_rerun_eval/eval_top50.json` | ||
| 181 | - Evaluation results: | ||
| 182 | - `split=test` | ||
| 183 | - `num_queries=35` | ||
| 184 | - `top1=0.8571` | ||
| 185 | - `topk=1.0` | ||
| 186 | - Note on query count: | ||
| 187 | - overlap test items = `235` | ||
| 188 | - non-`reference` queries = `35` | ||
| 189 | - So `--max-queries 50` actually evaluates `35` queries | ||
| 190 | |||
| 191 | ### Conclusion | ||
| 192 | |||
| 193 | - This is no longer just a successful index build: the first real-path `build-index -> evaluate` closed-loop evidence is now in hand. | ||
| 194 | - The next round should focus on larger evaluation scale and hard case / confusion evaluation. | ... | ... |
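The query-count note above reduces to simple arithmetic: the cap from `--max-queries` only matters when more non-reference items are available than the cap. The sketch below reproduces that reasoning on synthetic data. The `role` field and both function names are hypothetical, and the hit-count helper assumes `top1` is a plain fraction of queries whose first result is correct, which the source does not state.

```python
def effective_query_count(items, max_queries):
    """Queries actually evaluated: non-reference items, capped at max_queries."""
    queries = [item for item in items if item["role"] != "reference"]
    return min(len(queries), max_queries)


def implied_hits(rate, num_queries):
    """Hit count implied by a rounded rate, assuming rate = hits / num_queries."""
    return round(rate * num_queries)


# Synthetic stand-in for the 235 overlap test items: 200 references + 35 queries.
overlap_items = [{"role": "reference"}] * 200 + [{"role": "query"}] * 35

evaluated = effective_query_count(overlap_items, max_queries=50)
top1_hits = implied_hits(0.8571, evaluated)
print(evaluated, top1_hits)
```

Under those assumptions the reported `top1=0.8571` at `n=35` corresponds to 30 of 35 queries, since `30 / 35` rounds to `0.8571`.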
| 1 | ## Additional update to this delivery package (2026-06-02 15:40 UTC) | ||
| 2 | |||
| 3 | ### Delivery conclusion | ||
| 4 | |||
| 5 | The latest milestone has moved from "reference index complete" to **the fixed real-path 200-ref rerun has produced its first explicit evaluate metrics**: | ||
| 6 | - Remote baseline is currently: `9371e94` (before this update) | ||
| 7 | - real-path `200-ref` index fully complete | ||
| 8 | - explicit `evaluate.py` smoke completed | ||
| 9 | - First results so far: `top1=0.8571`, `topk=1.0`, `num_queries=35` | ||
| 10 | - The main line has therefore moved from "can the index run end to end" to the "evaluation quality and hard case expansion" stage | ||
| 11 | |||
| 12 | ### Latest facts | ||
| 13 | |||
| 14 | #### evaluate smoke path | ||
| 15 | - Observed at: `2026-06-02 15:40:30 UTC` | ||
| 16 | - Result file: `/tmp/fma_realpath_small_rerun_eval/eval_top50.json` | ||
| 17 | - Evaluation results: | ||
| 18 | - `split=test` | ||
| 19 | - `num_queries=35` | ||
| 20 | - `top1=0.8571` | ||
| 21 | - `topk=1.0` | ||
| 22 | - `by_type.clean`: `n=35`, `top1=0.8571`, `topk=1.0` | ||
| 23 | - Where the query count comes from: | ||
| 24 | - overlap between the 200-ref catalog and the existing external smoke test = `235` items | ||
| 25 | - of which non-`reference` queries = `35` | ||
| 26 | - So `--max-queries 50` actually evaluates only `35` queries | ||
| 27 | |||
| 28 | ### Current assessment | ||
| 29 | |||
| 30 | - A complete, reusable real-path smoke evidence chain now exists: | ||
| 31 | `chromaprint complete -> reference complete -> evaluate complete` | ||
| 32 | - What is more worth doing in the next stage: | ||
| 33 | 1. Increase the number of evaluation queries and the reference scale; | ||
| 34 | 2. Introduce `confused` / `humming_like` / hard negative evaluation. | ||
| 35 | |||
| 36 | --- | ||
| 37 | |||
| 38 | ## Additional update to this delivery package (2026-06-02 15:35 UTC) | 38 | ## Additional update to this delivery package (2026-06-02 15:35 UTC) |
| 2 | 39 | ||
| 3 | ### Delivery conclusion | 40 | ### Delivery conclusion | ... | ... |
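Because the commit directive asks for this eval output to be reused as the next validation baseline, a small check against the recorded numbers may be useful before scaling up. The sketch below assumes `eval_top50.json` is a flat JSON object with the keys quoted in the docs (`num_queries`, `top1`, `topk`, `by_type`); the real file layout is not shown in this diff, and `check_eval_report` is a hypothetical helper. It also assumes the `by_type` buckets partition the queries.

```python
import json
import math


def check_eval_report(report, *, num_queries, top1, topk, tol=1e-4):
    """Return a list of mismatches between a report dict and expected values."""
    problems = []
    if report.get("num_queries") != num_queries:
        problems.append(f"num_queries: {report.get('num_queries')} != {num_queries}")
    for key, want in (("top1", top1), ("topk", topk)):
        got = report.get(key)
        if got is None or not math.isclose(got, want, abs_tol=tol):
            problems.append(f"{key}: {got} != {want}")
    # Assumption: every query lands in exactly one by_type bucket.
    buckets = report.get("by_type", {})
    if buckets and sum(b.get("n", 0) for b in buckets.values()) != report.get("num_queries"):
        problems.append("by_type bucket sizes do not sum to num_queries")
    return problems


# Inline sample mirroring the values recorded in the docs above.
sample = json.loads(
    '{"split": "test", "num_queries": 35, "top1": 0.8571, "topk": 1.0,'
    ' "by_type": {"clean": {"n": 35, "top1": 0.8571, "topk": 1.0}}}'
)
print(check_eval_report(sample, num_queries=35, top1=0.8571, topk=1.0))
```

To check the real file instead of the inline sample, load it with `json.load(open("/tmp/fma_realpath_small_rerun_eval/eval_top50.json"))` and pass the result to the same function.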
| ... | @@ -5,29 +5,27 @@ | ... | @@ -5,29 +5,27 @@ |
| 5 | 5 | ||
| 6 | ## One-page summary | 6 | ## One-page summary |
| 7 | 7 | ||
| 8 | ### Latest delivery snapshot (2026-06-02 15:35 UTC) | 8 | ### Latest delivery snapshot (2026-06-02 15:40 UTC) |
| 9 | 9 | ||
| 10 | - Current remote sync baseline: `41c4d7c` (before this update) | 10 | - Current remote sync baseline: `9371e94` (before this update) |
| 11 | - Most important new fact: **the fixed real-path 200-ref rerun has fully produced the final reference index** | 11 | - Most important new fact: **the fixed real-path 200-ref rerun has produced its first explicit evaluate metrics** |
| 12 | - Output directory: `/tmp/fma_realpath_small_rerun_index2` | 12 | - index path: `/tmp/fma_realpath_small_rerun_index2` |
| 13 | - chromaprint stage: | 13 | - eval path: `/tmp/fma_realpath_small_rerun_eval/eval_top50.json` |
| 14 | - `status=complete` | 14 | - Current core results: |
| 15 | - `refs_done=200/200` | 15 | - `num_queries=35` |
| 16 | - `skipped_refs=0` | 16 | - `top1=0.8571` |
| 17 | - reference stage: | 17 | - `topk=1.0` |
| 18 | - `status=complete` | 18 | - `by_type.clean: n=35, top1=0.8571, topk=1.0` |
| 19 | - `refs_done=200/200` | 19 | - Note on query count: |
| 20 | - `windows_done=2068` | 20 | - overlap test items = `235` |
| 21 | - `embedding_shape=[2068, 192]` | 21 | - non-`reference` queries = `35` |
| 22 | - `skipped_refs=0` | 22 | - So `--max-queries 50` actually evaluates `35` queries |
| 23 | - Final artifacts now present: | 23 | - Conclusion: the first complete real-path closed loop is now in place: |
| 24 | - `reference_embs.npy` | 24 | `chromaprint -> reference -> evaluate` |
| 25 | - `reference_ids.npy` | ||
| 26 | - Conclusion: the fixed real-path rerun has fully crossed the two core stages `chromaprint -> reference`; the next priority is verifying that the evaluation chain connects. | ||
| 27 | - Top priorities for a new session: | 25 | - Top priorities for a new session: |
| 28 | 1. Check whether follow-up evaluate / identify evidence already exists | 26 | 1. Scale up the query / reference set |
| 29 | 2. If not, run an explicit evaluate smoke on this completed index | 27 | 2. Add hard case (`confused` / `humming_like`) evaluation |
| 30 | 3. Then decide whether to scale to a larger sample / full FMA | 28 | 3. Then decide whether to push to a larger FMA subset or the full set |
| 31 | 29 | ||
| 32 | ### Latest observability fix (2026-06-02 15:18 UTC) | 30 | ### Latest observability fix (2026-06-02 15:18 UTC) |
| 33 | 31 | ... | ... |