Commit 2a6e8e15 2a6e8e1558a8af0f813452c85b2fa943ddfa02f6 by cnb.bofCdSsphPA

Extend live FMA smoke handoff with later epoch evidence

Preserve a newer restart checkpoint so the next session inherits up-to-date proof that the real FMA smoke continues progressing inside Epoch 1 without yet saving a model or entering downstream stages.

Constraint: Verification is still limited to live runtime evidence because Epoch 1 has not completed
Rejected: Keep the prior 18:22 checkpoint only | would leave the handoff one monitoring cycle behind reality
Confidence: high
Scope-risk: narrow
Directive: Continue monitoring until the first saved model file or stage transition appears before changing status conclusions
Tested: ps on PID 311629; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics
1 parent ba49a6ae
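## 2026-06-02 Real FMA smoke fresh evidence: 19:12 checkpoint
Completed:
- Re-checked the runtime state of the real FMA smoke and confirmed that `train.py` elapsed time has advanced to 19:12.
- Updated `docs/session-handoff.md` and `docs/changelist-2026-06-02.md` to sync the later live evidence.
Verification results:
- `ps -p 311629 -o pid,etime,%cpu,%mem,cmd` => `ELAPSED=19:12`
- Still no new processes related to `build-index` / `evaluate`
- `validate-splits /tmp/fma_real_smoke_stopcheck/fma/manifests` => `ok=true`
- `fma_models_smoke/` still contains only the directory itself
Conclusions:
- The full real FMA smoke is still progressing within the epoch, with no sign of interruption.
- As of this checkpoint, no first model file and no evidence of a downstream stage transition has appeared.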
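
To compare checkpoints numerically rather than by eye, the `ELAPSED` value reported by `ps` can be converted to seconds. The following is a minimal sketch, not part of the repository: the helper names `parse_etime` and `read_elapsed_seconds` are hypothetical, and it assumes the standard `ps` elapsed format `[[dd-]hh:]mm:ss`.

```python
import subprocess


def parse_etime(etime: str) -> int:
    """Convert a ps ELAPSED value ([[dd-]hh:]mm:ss) into seconds."""
    days = 0
    rest = etime.strip()
    if "-" in rest:
        # Long-running processes report a leading "dd-" day count.
        day_part, rest = rest.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in rest.split(":")]
    # Left-pad so that parts is always [hours, minutes, seconds].
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def read_elapsed_seconds(pid: int):
    """Return elapsed seconds for pid, or None if the process is gone."""
    # "etime=" asks ps for the elapsed column with an empty header.
    result = subprocess.run(
        ["ps", "-p", str(pid), "-o", "etime="],
        capture_output=True,
        text=True,
    )
    out = result.stdout.strip()
    if result.returncode != 0 or not out:
        return None
    return parse_etime(out)
```

With this sketch, `parse_etime("19:12")` gives 1152 seconds and `parse_etime("18:22")` gives 1102 seconds, so the two checkpoints recorded here are 50 seconds apart.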
## 2026-06-02 Real FMA smoke fresh evidence: 18:22 checkpoint
Completed:
......
......@@ -168,3 +168,11 @@ cd /workspace/acr-engine
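- Latest live evidence has advanced to: `train.py ELAPSED=18:22`
- Still no model file, and no switch to `build-index/evaluate`
- The manifest re-check continues to pass; the statistics are unchanged.
## 12:16 UTC time-progress addendum
- Latest live evidence has advanced to: `train.py ELAPSED=19:12`
- Current CPU / memory observation: `%CPU≈614`, `%MEM≈10.6`
- Still no model file, and no switch to `build-index/evaluate`
- The manifest re-check continues to pass; the statistics are unchanged.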
......
......@@ -167,6 +167,28 @@
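- The run is still only progressing inside the first epoch.
- As of 12:15 UTC, there is still no first model file and no evidence of the later retrieval/evaluation stages.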
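### Another later round of fresh evidence (2026-06-02 12:16 UTC)
- The real FMA smoke has continued to advance to:
  - `train.py ELAPSED=19:12`
  - `%CPU≈614`
  - `%MEM≈10.6`
- The process structure still shows no stage transition:
  - `PID=311494`: `external_adapters.py smoke-local fma ...`
  - `PID=311629`: `train.py --data /tmp/fma_real_smoke_stopcheck/fma/manifests ...`
  - Still no new processes related to `build-index` / `evaluate`.
- `fma_models_smoke/` still contains only the directory itself, with no model files.
- The manifest re-check still passes:
  - `ok=true`
  - `catalog_references=8000`
  - `train_queries=6401`
  - `test_queries=1593`
  - `val_queries=0`
This indicates:
- The run is still in the continuous training phase inside the first epoch.
- As of 12:16 UTC, there is still no first model file and no evidence of the downstream retrieval/evaluation stages.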
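
The two conditions that would change the status conclusion (a first model file, or a downstream stage process) can be checked mechanically. The following is a rough sketch under stated assumptions, not the project's own tooling: all function names are hypothetical, and it assumes that a downstream stage is recognizable by `build-index` or `evaluate` appearing in a process command line, which is a naive substring match.

```python
from pathlib import Path

# Assumption: downstream stages appear under these names in command lines.
DOWNSTREAM_STAGES = ("build-index", "evaluate")


def list_model_files(model_dir):
    """Return every regular file under model_dir (empty if none or missing)."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    # Directories are skipped, so a directory-only tree yields an empty list.
    return sorted(p for p in root.rglob("*") if p.is_file())


def downstream_stage_started(cmdlines):
    """True if any command line mentions a downstream stage name."""
    return any(stage in cmd for cmd in cmdlines for stage in DOWNSTREAM_STAGES)


def status_may_change(model_dir, cmdlines):
    """True once a model file exists or a downstream stage is running."""
    return bool(list_model_files(model_dir)) or downstream_stage_started(cmdlines)
```

For the state recorded above (an empty `fma_models_smoke/` directory and only the `external_adapters.py` and `train.py` command lines), `status_may_change` would return `False`.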
### First-priority action after restart
1. First check whether the real FMA smoke has completed:
......