capture the first real-path post-fix reference checkpoint

Constraint: Handoff must reflect fresh observable evidence before restart and avoid staging data artifacts
Rejected: Wait for full reference completion | User asked for immediate delivery package and current checkpoint is already a meaningful stage transition
Confidence: high
Scope-risk: narrow
Directive: Treat session 19709 and /tmp/fma_realpath_small_rerun_index2 as the primary continuation path until final reference artifacts or a new traceback appear
Tested: Verified chromaprint 200/200 complete, reference_progress.json 25/200 checkpoint, partial reference numpy artifacts, and updated handoff/changelog files
Not-tested: Full reference completion and downstream evaluate stage on the active rerun
Showing 5 changed files with 147 additions and 40 deletions
| ... | @@ -74,26 +74,33 @@ | ... | @@ -74,26 +74,33 @@ |
| 74 | 74 | ||
| 75 | ## 5.5 Latest real FMA / chromaprint run state (2026-06-02) | 75 | ## 5.5 Latest real FMA / chromaprint run state (2026-06-02) |
| 76 | 76 | ||
| 77 | ### Current snapshot (15:09 UTC) | 77 | ### Current snapshot (15:29 UTC) |
| 78 | 78 | ||
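| 79 | - Remote sync baseline: `cdf33bb` | 79 | - Remote sync baseline: `707449b` |
| 80 | - We are no longer in the "process still running" stage; instead: | 80 | - The most important new evidence is no longer the old abnormal exit of the observable run, but: **the fixed real-path 200-ref rerun has entered the reference stage**. |
| 81 | - observable `PID=431703` has exited | 81 | - Foreground session: `19709` |
| 82 | - legacy `PID=424691` has exited | 82 | - Output directory: `/tmp/fma_realpath_small_rerun_index2` |
| 83 | - observable now leaves only: | 83 | - chromaprint has completed: |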
| 84 | - `/tmp/chroma_index_observable_smoke/chromaprint.pkl` | 84 | - `status=complete` |
| 85 | - `/tmp/chroma_index_observable_smoke/chromaprint_progress.json` | 85 | - `refs_done=200/200` |
| 86 | - Last progress stopped at: | 86 | - `skipped_refs=0` |
| 87 | - `hashes=57577` | ||
| 88 | - `postings=187446` | ||
| 89 | - reference has started and completed its first checkpoint: | ||
| 87 | - `status=building` | 90 | - `status=building` |
| 88 | - `refs_done=4420/8000` | 91 | - `refs_done=25/200` |
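| 89 | - No `reference_*` output or `evaluate.py` run has appeared yet. | 92 | - `windows_done=256` |
| 90 | - The next round of work must therefore switch to **investigating the abnormal build-index exit**, rather than continuing to treat it as a purely linear slow task. | 93 | - `skipped_refs=0` |
| 91 | - Completed a low-risk fix: key `print()` calls now pass `flush=True`, and a tiny-sample `RC=1` failure repro verified that logs/tracebacks reach disk in real time; the `0 bytes` log black box no longer occurs. | 94 | - Now present: |
| 92 | - Completed a high-value fault-tolerance fix: bad MP3s / missing audio are skipped in the chromaprint/reference stages, and a minimal `1 good + 1 bad` repro verified `RC=0` and successful `reference_*` output. | 95 | - `reference_progress.json` |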
| 96 | - `reference_embs.partial.npy` | ||
| 97 | - `reference_ids.partial.npy` | ||
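| 98 | - This shows that after the `flush=True` + bad-audio skip-tolerance fixes, the real-path rerun has crossed the `chromaprint -> reference` boundary. | ||
| 93 | - Next events worth committing: | 99 | - Next events worth committing: |
| 94 | 1. Find clear failure evidence / the exit cause | 100 | 1. `reference_embs.npy` / `reference_ids.npy` fully written |
| 95 | 2. Reproduce on a small sample and add logging | 101 | 2. `evaluate.py` starts, or the full evaluation begins |
| 96 | 3. After the fix, rerun until `reference_*` or `evaluate.py` is reached | 102 | 3. Or a new, clear traceback / failure evidence appears |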
| 103 | |||
| 97 | 104 | ||
| 98 | ## 6. High-risk caveats | 105 | ## 6. High-risk caveats |
| 99 | 106 | ... | ... |
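The milestones listed above depend on whether the partial reference artifacts have been promoted to final files. The following is a minimal Python sketch of that check. It assumes only the four file names quoted in this handoff, and it assumes the final outputs land in the same output directory as the partials. `reference_stage_state` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# File names as quoted in this handoff; the directory layout is an assumption.
PARTIAL = ("reference_embs.partial.npy", "reference_ids.partial.npy")
FINAL = ("reference_embs.npy", "reference_ids.npy")


def reference_stage_state(out_dir: str) -> str:
    """Classify reference artifacts in out_dir as 'final', 'partial', or 'none'.

    'final' requires both final files; 'partial' means at least one partial
    file exists while the final pair is still incomplete.
    """
    root = Path(out_dir)
    if all((root / name).is_file() for name in FINAL):
        return "final"
    if any((root / name).is_file() for name in PARTIAL):
        return "partial"
    return "none"


if __name__ == "__main__":
    print(reference_stage_state("/tmp/fma_realpath_small_rerun_index2"))
```

The function only checks whether files exist. It does not confirm that the two final arrays agree in length, which would need a follow-up load with numpy.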
| 1 | ## 2026-06-02 15:29 UTC / real-path 200-ref rerun crossed into reference stage | ||
| 2 | |||
| 3 | - Continued tracking the real-path 200-reference rerun on the fixed code: `/tmp/fma_realpath_small_rerun_index2` | ||
| 4 | - fresh evidence (`2026-06-02 15:29:17 UTC`): | ||
| 5 | - `chromaprint_progress.json` => `status=complete`, `refs_done=200/200`, `skipped_refs=0` | ||
| 6 | - `chromaprint.pkl` written to disk (`2266212 bytes`) | ||
| 7 | - `reference_progress.json` has appeared, currently `status=building` | ||
| 8 | - The reference stage has completed its first checkpoint: `refs_done=25/200`, `windows_done=256`, `skipped_refs=0` | ||
| 9 | - Partial artifacts have appeared: | ||
| 10 | - `reference_embs.partial.npy` | ||
| 11 | - `reference_ids.partial.npy` | ||
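| 12 | - Conclusion: the main pipeline has clearly crossed the `chromaprint -> reference` boundary; the earlier "stuck at chromaprint with no downstream artifacts" state no longer applies to this fixed rerun | ||
| 13 | - Next key milestones: | ||
| 14 | 1. `reference_embs.npy` / `reference_ids.npy` fully written | ||
| 15 | 2. Or a new, clear traceback / failure evidence is captured | ||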
| 16 | |||
| 1 | ## 2026-06-02 15:22 UTC / bad-mp3 skip tolerance verified | 17 | ## 2026-06-02 15:22 UTC / bad-mp3 skip tolerance verified |
| 2 | 18 | ||
| 3 | - Added per-file fault tolerance to the reference index-building loops in `chromaprint_matcher.py` and `ecapa_embedder.py`: | 19 | - Added per-file fault tolerance to the reference index-building loops in `chromaprint_matcher.py` and `ecapa_embedder.py`: | ... | ... |
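This handoff reports progress as fields of `chromaprint_progress.json` and `reference_progress.json`. The sketch below prints a one-line summary of either file. It rests on several assumptions:

- The files are flat JSON objects.
- `refs_done` is stored as an integer.
- The caller passes in the total (200 for this rerun), because the key that holds it, if any, is not shown here.

`summarize_progress` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_progress(path: str, total_refs: int) -> str:
    """Build a one-line summary of a *_progress.json checkpoint.

    Only keys quoted in this handoff are read. total_refs is supplied by the
    caller because the JSON key for the total (if any) is not documented here.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    parts = [
        f"status={data.get('status')}",
        f"refs_done={data.get('refs_done')}/{total_refs}",
    ]
    # Stage-specific counters: include only those the file actually contains.
    for key in ("windows_done", "hashes", "postings", "skipped_refs"):
        if key in data:
            parts.append(f"{key}={data[key]}")
    return " ".join(parts)
```

For this rerun, the call would be `summarize_progress("/tmp/fma_realpath_small_rerun_index2/reference_progress.json", 200)`.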
| ... | @@ -94,3 +94,38 @@ | ... | @@ -94,3 +94,38 @@ |
| 94 | 94 | ||
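| 95 | - Verified: a single bad MP3 no longer brings down the whole `build-index` run. | 95 | - Verified: a single bad MP3 no longer brings down the whole `build-index` run. |
| 96 | - The next round should return to a real-path repro to confirm whether the main problem is indeed triggered by a bad MP3. | 96 | - The next round should return to a real-path repro to confirm whether the main problem is indeed triggered by a bad MP3. |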
| 97 | |||
| 98 | ## Additional delivery (2026-06-02 15:29 UTC) | ||
| 99 | |||
| 100 | ### New run evidence | ||
| 101 | |||
| 102 | | Category | Content | | ||
| 103 | |---|---| | ||
| 104 | | rerun | fixed real-path 200-ref rerun still running in the foreground: `session 19709` | | ||
| 105 | | chromaprint | `200/200` complete, `skipped_refs=0` | | ||
| 106 | | reference | entered the embedding/reference stage and completed the `25/200` checkpoint | | ||
| 107 | | Artifacts | written to disk: `reference_progress.json`, `reference_embs.partial.npy`, `reference_ids.partial.npy` | | ||
| 108 | |||
| 109 | ### Most important fresh evidence | ||
| 110 | |||
| 111 | - Observed at: `2026-06-02 15:29:17 UTC` | ||
| 112 | - Output directory: `/tmp/fma_realpath_small_rerun_index2` | ||
| 113 | - `chromaprint_progress.json`: | ||
| 114 | - `status=complete` | ||
| 115 | - `refs_done=200/200` | ||
| 116 | - `hashes=57577` | ||
| 117 | - `postings=187446` | ||
| 118 | - `skipped_refs=0` | ||
| 119 | - `reference_progress.json`: | ||
| 120 | - `status=building` | ||
| 121 | - `refs_done=25/200` | ||
| 122 | - `windows_done=256` | ||
| 123 | - `skipped_refs=0` | ||
| 124 | - Present: | ||
| 125 | - `reference_embs.partial.npy` | ||
| 126 | - `reference_ids.partial.npy` | ||
| 127 | |||
| 128 | ### Conclusion | ||
| 129 | |||
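| 130 | - This fixed rerun demonstrates that, after the fixes, the real-path sample no longer stalls in the chromaprint stage. | ||
| 131 | - The most valuable next step is now to keep watching for `reference_*` completion or to capture new, clear failure evidence. | ... | ... |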
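The per-file fault tolerance described in this handoff skips a bad MP3 or missing file instead of aborting `build-index`. That behavior follows a common try/except-per-item pattern, illustrated in the sketch below. It shows the pattern only: `build_reference_index` and `embed_fn` are hypothetical stand-ins, not the actual loops in `chromaprint_matcher.py` or `ecapa_embedder.py`.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def build_reference_index(
    paths: Iterable[str],
    embed_fn: Callable[[str], object],
) -> Tuple[Dict[str, object], List[Tuple[str, str]]]:
    """Apply embed_fn to every reference path, skipping files that fail.

    Returns (results, skipped). skipped holds (path, error) pairs, so the
    caller can report skipped_refs=len(skipped) instead of aborting the run.
    """
    results: Dict[str, object] = {}
    skipped: List[Tuple[str, str]] = []
    for path in paths:
        try:
            results[path] = embed_fn(path)
        except Exception as exc:  # one bad MP3 / missing file must not kill the loop
            skipped.append((path, f"{type(exc).__name__}: {exc}"))
            # flush=True so the skip is visible in a redirected log immediately
            print(f"[skip] {path}: {exc}", flush=True)
    return results, skipped
```

With one good and one bad input, the function returns one result and one skipped entry, which mirrors the `1 good + 1 bad` repro described earlier. The broad `except Exception` is deliberate for a best-effort indexing loop, but it can also hide genuine bugs. Logging each skip keeps those bugs visible.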
| 1 | ## Delivery package update (2026-06-02 15:29 UTC) | ||
| 2 | |||
| 3 | ### Delivery conclusion | ||
| 4 | |||
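| 5 | The latest milestone is not a new failure; instead, **the fixed real-path 200-ref rerun has clearly entered the reference/embedding stage**: | ||
| 6 | - Current remote baseline: `707449b` | ||
| 7 | - chromaprint fully complete: `200/200` | ||
| 8 | - The reference stage has written its first checkpoint: `25/200` | ||
| 9 | - `reference_progress.json` and partial numpy artifacts have appeared | ||
| 10 | - The next session should therefore no longer treat this rerun as being in the old "stuck at chromaprint with no downstream files" state | ||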
| 11 | |||
| 12 | ### Latest facts | ||
| 13 | |||
| 14 | #### fixed real-path rerun path | ||
| 15 | - Foreground session: `19709` | ||
| 16 | - Observed at: `2026-06-02 15:29:17 UTC` | ||
| 17 | - Output directory: `/tmp/fma_realpath_small_rerun_index2` | ||
| 18 | - `chromaprint_progress.json`: | ||
| 19 | - `status=complete` | ||
| 20 | - `refs_done=200 / 200` | ||
| 21 | - `hashes=57577` | ||
| 22 | - `postings=187446` | ||
| 23 | - `skipped_refs=0` | ||
| 24 | - `reference_progress.json`: | ||
| 25 | - `status=building` | ||
| 26 | - `refs_done=25 / 200` | ||
| 27 | - `windows_done=256` | ||
| 28 | - `elapsed_sec=52.567` | ||
| 29 | - `eta_sec=367.967` | ||
| 30 | - `skipped_refs=0` | ||
| 31 | - Now present: | ||
| 32 | - `reference_embs.partial.npy` | ||
| 33 | - `reference_ids.partial.npy` | ||
| 34 | |||
| 35 | ### Current assessment | ||
| 36 | |||
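| 37 | - After the `flush=True` and bad-audio skip-tolerance fixes, the real-path rerun has crossed the `chromaprint -> reference` stage boundary. | ||
| 38 | - The top priority is no longer re-proving that chromaprint finished, but watching whether the reference stage: | ||
| 39 | 1. fully writes `reference_embs.npy` / `reference_ids.npy` to disk; or | ||
| 40 | 2. surfaces a new, clear traceback / failure evidence. | ||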
| 41 | |||
| 42 | ### Suggested takeover order for a new session | ||
| 43 | |||
| 44 | 1. First read the new snapshot at the top of [./session-handoff.md](./session-handoff.md) | ||
| 45 | 2. Read the latest output of foreground `session 19709` | ||
| 46 | 3. Check whether `/tmp/fma_realpath_small_rerun_index2/` has moved from partial to final artifacts | ||
| 47 | |||
| 48 | --- | ||
| 49 | |||
| 1 | # Delivery Handoff / 2026-06-02 | 50 | # Delivery Handoff / 2026-06-02 |
| 2 | 51 | ||
| 3 | ## Delivery package (2026-06-02 15:09 UTC) | 52 | ## Delivery package (2026-06-02 15:09 UTC) | ... | ... |
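The reported `elapsed_sec=52.567` and `eta_sec=367.967` at `refs_done=25 / 200` are consistent with a simple linear extrapolation: the average time per finished ref multiplied by the refs remaining, that is 52.567 / 25 × 175 ≈ 367.97 s. Whether the pipeline computes its ETA exactly this way is an assumption. The small gap to 367.967 may come from rounding of the logged `elapsed_sec`. `linear_eta` is a hypothetical helper.

```python
def linear_eta(elapsed_sec: float, refs_done: int, refs_total: int) -> float:
    """Estimate remaining seconds, assuming each remaining ref costs the average so far."""
    if refs_done <= 0:
        raise ValueError("need at least one finished ref to extrapolate")
    return elapsed_sec / refs_done * (refs_total - refs_done)


if __name__ == "__main__":
    # Values from the 15:29:17 UTC reference_progress.json snapshot.
    print(round(linear_eta(52.567, 25, 200), 3))
```

A linear ETA assumes uniform per-ref cost. If reference tracks vary widely in length, windows processed would probably be a better progress unit than refs processed.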
| ... | @@ -5,31 +5,31 @@ | ... | @@ -5,31 +5,31 @@ |
| 5 | 5 | ||
| 6 | ## One-page summary | 6 | ## One-page summary |
| 7 | 7 | ||
| 8 | ### Latest delivery snapshot (2026-06-02 15:09 UTC) | 8 | ### Latest delivery snapshot (2026-06-02 15:29 UTC) |
| 9 | 9 | ||
| 10 | - Current remote sync baseline: `cdf33bb` | 10 | - Current remote sync baseline: `707449b` |
| 11 | - Most important new fact: **both `build-index` processes have exited** without reaching `reference_*` / `evaluate.py` | 11 | - Most important new fact: **the fixed real-path 200-ref rerun has clearly entered the reference/embedding stage** |
| 12 | - observable path: | 12 | - Foreground session: `19709` |
| 13 | - Original PID: `431703` | 13 | - Output directory: `/tmp/fma_realpath_small_rerun_index2` |
| 14 | - Current `ps -p 431703`: no live process | 14 | - chromaprint stage: |
| 15 | - Directory contains only: `chromaprint.pkl`, `chromaprint_progress.json` | 15 | - `status=complete` |
| 16 | - Last status: `status=building`, `refs_done=4420/8000` | 16 | - `refs_done=200/200` |
| 17 | - legacy full-FMA path: | 17 | - `skipped_refs=0` |
| 18 | - Original PID: `424691` | 18 | - `chromaprint.pkl=2266212 bytes` |
| 19 | - Current `ps -p 424691`: no live process | 19 | - reference stage: |
| 20 | - Directory still contains only `/tmp/fma_real_smoke_stopcheck/fma_index_smoke` | 20 | - `reference_progress.json` has appeared |
| 21 | - Not yet present: | 21 | - `status=building` |
| 22 | - `reference_progress.json` | 22 | - `refs_done=25/200` |
| 23 | - `windows_done=256` | ||
| 24 | - `skipped_refs=0` | ||
| 25 | - Now present: | ||
| 23 | - `reference_embs.partial.npy` | 26 | - `reference_embs.partial.npy` |
| 24 | - `reference_ids.partial.npy` | 27 | - `reference_ids.partial.npy` |
| 25 | - `reference_embs.npy` | 28 | - Conclusion: after the fixes, the real-path rerun has crossed the `chromaprint -> reference` boundary; the next key milestone is final `reference_*` artifacts or new, clear failure evidence. |
| 26 | - `reference_ids.npy` | ||
| 27 | - `evaluate.py` | ||
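| 28 | - Conclusion: the current phase has switched from "keep watching in-flight progress" to "investigate why `build-index` exited abnormally". | ||
| 29 | - First priorities for a new session: | 29 | - First priorities for a new session: |
| 30 | 1. Review the exit path of `run_demo.py build-index` | 30 | 1. Keep reading the latest output of `session 19709` |
| 31 | 2. Look for silent failure / OOM / shell termination evidence | 31 | 2. Check whether the partials have turned into `reference_embs.npy` / `reference_ids.npy` |
| 32 | 3. Reproduce the failure on a small sample and add logging | 32 | 3. If it fails, record the traceback and start the next round of fixes |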
| 33 | 33 | ||
| 34 | ### Latest observability fix (2026-06-02 15:18 UTC) | 34 | ### Latest observability fix (2026-06-02 15:18 UTC) |
| 35 | 35 | ... | ... |
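The `flush=True` observability fix addresses a standard Python behavior. When stdout is redirected to a file, Python block-buffers it, so a process that dies abruptly (for example, one killed by the OOM killer) can leave a 0-byte log. The sketch below shows the effect of `flush=True` by reading the log through a second handle while the writer is still open. `log_line` is a hypothetical helper, not the project's code.

```python
import os
import tempfile


def log_line(msg: str, stream) -> None:
    """Write one progress line and push it to the OS immediately."""
    print(msg, file=stream, flush=True)


if __name__ == "__main__":
    fd, log_path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    with open(log_path, "w", encoding="utf-8") as log:
        log_line("refs_done=25/200", log)
        # Read through a separate handle while the writer is still open:
        # because of flush=True, the line is already visible on disk.
        with open(log_path, encoding="utf-8") as reader:
            print(repr(reader.read()))
    os.remove(log_path)
```

Running the interpreter with `python -u`, or setting `PYTHONUNBUFFERED=1`, has a similar effect on stdout/stderr without touching individual `print()` calls.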