checkpoint the first end-to-end dual-axis smoke result

Constraint: The handoff must record the fresh dual-axis metric outcome without staging temporary smoke artifacts
Rejected: Keep tuning weights before checkpointing | The first end-to-end dual-axis result is already a meaningful evidence point and restart-safe boundary
Confidence: high
Scope-risk: narrow
Directive: Continue with finer-grained dual-axis weight search, targeting humming_like recovery while preserving confused gains
Tested: Verified dual-axis smoke completed train, build-index, and evaluate with top1 0.5 / topk 0.9 and updated handoff/changelog docs
Not-tested: Improved dual-axis weight combinations beyond this first balanced trial

Commit 9c3f182afc37734dd5920b756f2822cf26e89b3a authored 2026-06-02 23:57:16 +0800 by cnb.bofCdSsphPA
Showing 5 changed files with 103 additions and 30 deletions
AGENT.md
docs/CHANGELOG.md
docs/changelist-2026-06-02.md
docs/delivery-handoff-2026-06-02.md
docs/session-handoff.md
--- a/AGENT.md
+++ b/AGENT.md
@@ -74,22 +74,16 @@

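 ## 5.5 Latest real FMA / chromaprint runtime state (2026-06-02)

-### Current latest snapshot (15:47 UTC)
-
-- Remote sync baseline: `7812b58` (before this update)
-- Most important new evidence: **dual-axis hard-case weighting is now parameterized in code**.
-- Current tunable entry points:
-  - `training.sample_type_weights`
-  - `training.pair_type_weights`
-- Fresh verification:
-  - `py_compile` passes
-  - `train.py --dry-run` passes
-  - Custom-weight instantiation check passes
-- Implication: the next round can go straight to weight-search experiments without first changing the dataset or training-framework structure.
+### Current latest snapshot (15:56 UTC)
+
+- Remote sync baseline: `6279850` (before this update)
+- Most important new evidence: **the dual-axis smoke run completed its first end-to-end evaluation, but the current combination did not improve humming_like**.
+- Result: `top1=0.5`, `topk=0.9`, `humming_like=0.0`, `confused=0.25`
+- Implication: the dual-axis entry points work, but the current weight combination is not a better solution.
 - Next events worth committing:
-  1. Results of the first dual-axis weight experiment
-  2. A combination that improves `humming_like` without regressing `confused`
-  3. Dual-track regression verification results
+  1. Results of a finer-grained dual-axis weight search
+  2. A combination where `humming_like` recovers and `confused` does not drop
+  3. Improved dual-track regression verification results


 ## 6. High-risk notes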
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
+## 2026-06-02 15:56 UTC / dual-axis smoke completed first end-to-end eval
+
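+- Ran one end-to-end smoke pass with the new dual-axis configuration: `train -> build-index -> evaluate`
+- Fresh evidence (`2026-06-02 15:56:02 UTC`):
+  - Training output: `/tmp/dualaxis_smoke/models/best_model.pt`
+  - Index output: `/tmp/dualaxis_smoke/index/`
+  - Evaluation output: `/tmp/dualaxis_smoke/eval.json`
+- Results:
+  - `num_queries=20`
+  - `top1=0.5`
+  - `topk=0.9`
+  - `clean top1=0.875`
+  - `humming_like top1=0.0`
+  - `confused top1=0.25`
+- Compared with the current baseline:
+  - Worse than `v6`'s `humming_like=0.25`
+  - On par with `v6`'s `confused=0.25`
+- Conclusion:
+  - The dual-axis parameterization now runs through the full pipeline
+  - This set of weights did not improve `humming_like`, so follow-up work should continue with a finer-grained dual-axis search rather than accept the current combination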
+
 ## 2026-06-02 15:47 UTC / dual-axis hard-case weighting is now configurable in code

 - The hard-case sampling weights and pair loss weights in `SongPairDataset` are now configuration-driven instead of hard-coded
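
The baseline comparison in the 15:56 UTC changelog entry can be expressed as a small check. This is a minimal sketch, not code from the repository: the function names and the "no category regresses" rule are assumptions for illustration, while the numbers are the `v6` and smoke-run top-1 values quoted in that entry.

```python
# Top-1 values for the v6 baseline, as quoted in the changelog entry.
V6_BASELINE = {"humming_like": 0.25, "confused": 0.25}

# Top-1 values from the dual-axis smoke run in the same entry.
SMOKE_RESULT = {"humming_like": 0.0, "confused": 0.25}


def compare_to_baseline(result, baseline, tol=1e-9):
    """Label each category 'better', 'equal', or 'worse' relative to baseline."""
    verdict = {}
    for category, base in baseline.items():
        delta = result[category] - base
        if delta > tol:
            verdict[category] = "better"
        elif delta < -tol:
            verdict[category] = "worse"
        else:
            verdict[category] = "equal"
    return verdict


def is_acceptable(result, baseline):
    """Hypothetical gate: accept only if no tracked category regresses."""
    return "worse" not in compare_to_baseline(result, baseline).values()


verdict = compare_to_baseline(SMOKE_RESULT, V6_BASELINE)
print(verdict)
print(is_acceptable(SMOKE_RESULT, V6_BASELINE))
```

Under this rule the smoke run is rejected because `humming_like` falls below the `v6` value while `confused` only ties it, which matches the entry's conclusion that the current combination should not be accepted as is.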
--- a/docs/changelist-2026-06-02.md
+++ b/docs/changelist-2026-06-02.md
@@ -297,3 +297,29 @@

 - dual-axis hard-case weighting has moved from a "design proposal" to something that can be tuned directly in code.
 - The next round can run minimal experiments directly around `sample_type_weights` and `pair_type_weights`.
+
+## Additional delivery in this update (2026-06-02 15:56 UTC)
+
+### New runtime evidence
+
+| Category | Details |
+|---|---|
+| dual-axis smoke | `train -> build-index -> evaluate` ran end to end |
+| Training output | `/tmp/dualaxis_smoke/models/best_model.pt` |
+| Index output | `/tmp/dualaxis_smoke/index/` |
+| Evaluation output | `/tmp/dualaxis_smoke/eval.json` |
+| Results | `top1=0.5`, `topk=0.9` |
+| hard-case | `humming_like=0.0`, `confused=0.25` |
+
+### Most important fresh evidence
+
+- `num_queries=20`
+- `clean: n=8, top1=0.875, topk=1.0`
+- `augmented: n=4, top1=0.5, topk=0.75`
+- `humming_like: n=4, top1=0.0, topk=0.75`
+- `confused: n=4, top1=0.25, topk=1.0`
+
+### Conclusion
+
+- This dual-axis configuration shows that the "configurable experiment pipeline" works end to end.
+- It did not improve `humming_like`, so the follow-up search needs to be finer: `sample_type_weights` and `pair_type_weights` should be searched at a finer value granularity.
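
As a consistency check on the per-category numbers in this changelist, the overall `top1` and `topk` should equal the query-weighted mean of the category values. The sketch below assumes the four categories partition the 20 queries, which their `n` values suggest but the commit does not state. The `aggregate` helper is illustrative, not part of the repository.

```python
# Per-category results as reported in the changelist: (n, top1, topk).
per_category = {
    "clean": (8, 0.875, 1.0),
    "augmented": (4, 0.5, 0.75),
    "humming_like": (4, 0.0, 0.75),
    "confused": (4, 0.25, 1.0),
}


def aggregate(per_category):
    """Return (total queries, query-weighted top1, query-weighted topk)."""
    total = sum(n for n, _, _ in per_category.values())
    top1 = sum(n * t1 for n, t1, _ in per_category.values()) / total
    topk = sum(n * tk for n, _, tk in per_category.values()) / total
    return total, top1, topk


num_queries, top1, topk = aggregate(per_category)
print(num_queries, top1, topk)
```

The weighted means reproduce `num_queries=20`, `top1=0.5`, and `topk=0.9`. With `n=4` per hard-case category, a single query moves that category's `top1` by 0.25, so differences at this scale are worth re-checking on a larger query set before drawing conclusions.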
--- a/docs/delivery-handoff-2026-06-02.md
+++ b/docs/delivery-handoff-2026-06-02.md
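+## Additional update to this delivery package (2026-06-02 15:56 UTC)
+
+### Delivery conclusion
+
+The latest milestone has advanced from "dual-axis parameterization complete" to **first end-to-end evaluation of the dual-axis smoke run complete**:
+- Remote baseline is currently `6279850` (before this update)
+- Training, index building, and evaluation all ran through
+- This set of weights did not improve `humming_like`, so a finer-grained search comes next
+
+### Latest facts
+
+#### dual-axis smoke results
+- Observed at: `2026-06-02 15:56:02 UTC`
+- Result file: `/tmp/dualaxis_smoke/eval.json`
+- Evaluation results:
+  - `num_queries=20`
+  - `top1=0.5`
+  - `topk=0.9`
+  - `clean=0.875`
+  - `augmented=0.5`
+  - `humming_like=0.0`
+  - `confused=0.25`
+
+### Current assessment
+
+- The dual-axis entry points are usable, but the current trial combination is not a better solution.
+- The next phase should move to a finer-grained weight search rather than scaling up directly.
+
+---
+
 ## Additional update to this delivery package (2026-06-02 15:47 UTC)

 ### Delivery conclusion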
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -5,22 +5,24 @@

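 ## One-page summary

-### Latest delivery snapshot (2026-06-02 15:47 UTC)
-
-- Current remote sync baseline: `7812b58` (before this update)
-- Most important new fact: **dual-axis hard-case weighting is now parameterized in code**
-- New tunable entry points:
-  - `training.sample_type_weights`
-  - `training.pair_type_weights`
-- Fresh verification:
-  - `py_compile` passes
-  - `train.py --dry-run` passes
-  - Custom-weight instantiation check passes
-- Conclusion: the next round does not need code-structure changes first; minimal tuning experiments can start directly.
+### Latest delivery snapshot (2026-06-02 15:56 UTC)
+
+- Current remote sync baseline: `6279850` (before this update)
+- Most important new fact: **the dual-axis smoke run completed its first end-to-end evaluation, but the current combination did not improve humming_like**
+- Results:
+  - `num_queries=20`
+  - `top1=0.5`
+  - `topk=0.9`
+  - `humming_like=0.0`
+  - `confused=0.25`
+- Conclusion:
+  - The dual-axis entry points are usable
+  - The current weight combination is not a better solution
+  - The next round should run a finer-grained weight search
 - Top priorities for a new session:
-  1. Search dual-axis weight combinations on the `v6` main baseline
-  2. Prioritize raising `humming_like top1` without losing `confused top1`
-  3. Re-test on both tracks: real-path clean + synthetic hard-case
+  1. Continue searching `sample_type_weights` / `pair_type_weights`
+  2. Aim to bring `humming_like` back to at least the `v6` level without losing `confused`
+  3. Then re-test on both tracks: real-path clean + synthetic hard-case

 ### Latest observability fix (2026-06-02 15:18 UTC)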
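
The first session priority, a finer-grained search over `sample_type_weights` and `pair_type_weights`, could start from an enumerated grid. The sketch below is only an illustration under stated assumptions: the commit does not show the config schema, so the inner keys (`humming_like`, `confused`) and the grid values are hypothetical, and `pair_type_weights` may in practice be keyed by pair type rather than by these labels.

```python
from itertools import product

# Assumed keys, borrowed from the hard-case labels in the eval report.
HARD_CASE_TYPES = ("humming_like", "confused")

# Hypothetical grid; the commit does not state which values were tried.
WEIGHT_GRID = (1.0, 1.5, 2.0)


def candidate_configs(grid=WEIGHT_GRID, types=HARD_CASE_TYPES):
    """Yield one config dict per combination of sample and pair weights."""
    n = len(types)
    for combo in product(grid, repeat=2 * n):
        sample_w = dict(zip(types, combo[:n]))
        pair_w = dict(zip(types, combo[n:]))
        yield {
            "training": {
                "sample_type_weights": sample_w,
                "pair_type_weights": pair_w,
            }
        }


configs = list(candidate_configs())
print(len(configs))
print(configs[0])
```

Three values on four weights give 81 candidates, each of which would need its own `train -> build-index -> evaluate` pass. If that is too expensive, fixing one axis while sweeping the other is a cheaper first step.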