Commit ba49a6ae855bc1cde72a1d06920e7a70237cb4eb by cnb.bofCdSsphPA

Capture another live FMA smoke progress checkpoint

Keep the restart artifacts synchronized with the newest observed elapsed time so the next session can see that the real FMA smoke is still advancing and has not yet reached the model-save or evaluation stage.

Constraint: Training remains inside Epoch 1, so verification is limited to live runtime evidence
Rejected: Stop at the prior 17:07 checkpoint | would leave handoff docs behind the latest verified state
Confidence: high
Scope-risk: narrow
Directive: Continue monitoring until the first saved model file or stage transition appears
Tested: ps on PID 311629; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics
1 parent fc9e3bce
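## 2026-06-02 Real FMA smoke fresh evidence 18:22 checkpoint
Completed:
- Rechecked the runtime state of the real FMA smoke and confirmed that the `train.py` elapsed time has advanced to 18:22.
- Updated `docs/session-handoff.md` and `docs/changelist-2026-06-02.md` to sync the newer live evidence.
Verification results: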
- `ps -p 311629 -o pid,etime,%cpu,%mem,cmd` => `ELAPSED=18:22`
- Still no new processes related to `build-index/evaluate`
- `validate-splits /tmp/fma_real_smoke_stopcheck/fma/manifests` => `ok=true`
- `fma_models_smoke/` still contains only the directory itself
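`ps` prints `etime` in the form `[[DD-]hh:]mm:ss`, so `ELAPSED=18:22` means 18 minutes 22 seconds since `train.py` started, not 18 hours. A minimal sketch for converting the recorded checkpoint values to seconds so consecutive checkpoints can be compared; the helper `etime_to_seconds` is hypothetical and not part of the repository.

```python
import re

# ps (procps) formats etime as [[DD-]hh:]mm:ss; hours appear only when nonzero
# and days only when at least one day has elapsed.
_ETIME = re.compile(r"^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$")


def etime_to_seconds(etime: str) -> int:
    """Convert a ps etime string such as '18:22' or '1-02:03:04' to seconds."""
    match = _ETIME.match(etime.strip())
    if match is None:
        raise ValueError(f"unrecognized etime: {etime!r}")
    # Missing day/hour groups come back as None and count as zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Checkpoint values recorded in this log.
previous = etime_to_seconds("17:07")  # 1027 s
latest = etime_to_seconds("18:22")    # 1102 s
print(f"advanced {latest - previous} s between checkpoints")  # advanced 75 s between checkpoints
```

On procps-ng, `ps -o etimes=` reports the same elapsed time directly in seconds, which avoids the parsing step entirely.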
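Conclusion:
- The full real FMA smoke is still progressing within the epoch, with no sign of interruption.
- As of this checkpoint, neither a first model file nor any evidence of a downstream stage transition has appeared.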
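The conclusion rests on two observations: no process whose command line mentions `build-index` or `evaluate`, and no regular file under `fma_models_smoke/`. A Linux-only sketch of the same check, reading `/proc/<pid>/cmdline` instead of calling `ps`. The helper names are hypothetical, and the plain substring match on the stage names is an assumption, since the real `build-index`/`evaluate` command lines are not recorded in this log (an unrelated argument containing `evaluate` would also match).

```python
import os
from pathlib import Path

# Stage names taken from this log; matching them as substrings is an assumption.
DOWNSTREAM_MARKERS = ("build-index", "evaluate")


def running_cmdlines(proc_root: str = "/proc") -> list[str]:
    """Return the command line of every visible process (Linux /proc layout)."""
    cmdlines = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        try:
            raw = Path(proc_root, entry, "cmdline").read_bytes()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited meanwhile or is not readable
        # Arguments are NUL-separated; kernel threads have an empty cmdline.
        args = raw.decode(errors="replace").split("\0")
        cmdline = " ".join(arg for arg in args if arg)
        if cmdline:
            cmdlines.append(cmdline)
    return cmdlines


def downstream_started(cmdlines: list[str]) -> bool:
    """True if any command line mentions one of the downstream stage names."""
    return any(marker in cmd for cmd in cmdlines for marker in DOWNSTREAM_MARKERS)


def model_files(model_dir: str) -> list[str]:
    """List regular files under model_dir; the directory itself is not counted."""
    return sorted(str(path) for path in Path(model_dir).rglob("*") if path.is_file())
```

For example, `downstream_started(running_cmdlines())` and `model_files("/tmp/fma_real_smoke_stopcheck/fma_models_smoke")` roughly mirror the `ps` and `find` checks listed under `Tested:`; an empty model list together with `False` corresponds to the state recorded at this checkpoint.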
## 2026-06-02 Real FMA smoke fresh evidence 17:07 checkpoint
Completed:
......
......
@@ -161,3 +161,10 @@ cd /workspace/acr-engine
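- The latest live evidence has advanced to: `train.py ELAPSED=17:07`
- Still no model file, and no switch to `build-index/evaluate`
- The manifest validation result is unchanged and still passes.
## 12:15 UTC time-progress addendum
- The latest live evidence has advanced to: `train.py ELAPSED=18:22`
- Still no model file, and no switch to `build-index/evaluate`
- The manifest recheck still passes, with unchanged statistics.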
......
......
@@ -145,6 +145,28 @@
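- The current smoke is still progressing steadily within the first epoch.
- As of 12:14 UTC, it has not yet reached the point of saving the first model file or the downstream retrieval/evaluation stages.
### Further fresh evidence (2026-06-02 12:15 UTC)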
- The real FMA smoke has continued to advance to:
  - `train.py ELAPSED=18:22`
  - `%CPU≈615`
  - `%MEM≈10.5`
- The current process structure still shows no stage transition:
  - `PID=311494`: `external_adapters.py smoke-local fma ...`
  - `PID=311629`: `train.py --data /tmp/fma_real_smoke_stopcheck/fma/manifests ...`
- Still no new processes related to `build-index` / `evaluate`.
- `fma_models_smoke/` still contains only the directory itself, with no model files.
- The manifest recheck still passes:
  - `ok=true`
  - `catalog_references=8000`
  - `train_queries=6401`
  - `test_queries=1593`
  - `val_queries=0`
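Each checkpoint re-runs `validate-splits` and confirms that the manifest statistics have not drifted. A sketch of that comparison, assuming the result can be reduced to the `key=value` pairs quoted above; the actual output format of `validate-splits` is not shown in this log, so `parse_key_values`, `manifest_drift`, and the `BASELINE` constant are illustrative only.

```python
# Values recorded at the 12:15 UTC checkpoint in this log (assumed baseline).
BASELINE = {
    "ok": "true",
    "catalog_references": "8000",
    "train_queries": "6401",
    "test_queries": "1593",
    "val_queries": "0",
}


def parse_key_values(text: str) -> dict[str, str]:
    """Parse whitespace-separated `key=value` tokens; tokens without '=' are ignored."""
    pairs = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            pairs[key] = value
    return pairs


def manifest_drift(text: str, baseline: dict[str, str] = BASELINE) -> dict[str, tuple]:
    """Return {key: (expected, observed)} for every baseline key whose value changed."""
    observed = parse_key_values(text)
    return {
        key: (expected, observed.get(key))
        for key, expected in baseline.items()
        if observed.get(key) != expected
    }


sample = "ok=true catalog_references=8000 train_queries=6401 test_queries=1593 val_queries=0"
print(manifest_drift(sample))  # {}
```

An empty result corresponds to "the manifest recheck still passes, with unchanged statistics"; a missing key shows up as `(expected, None)`.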
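This indicates:
- The run is still only progressing within epoch 1.
- As of 12:15 UTC, there is still no first model file and no evidence of the subsequent retrieval/evaluation stages.
### First-priority actions after restart
1. First, check whether the real FMA smoke has completed: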
......