Capture longest-window FMA smoke progress evidence
Persist a wider observation checkpoint so restart docs show continued forward motion across a 180-second window while the real FMA smoke remains inside Epoch 1.

Constraint: Verification is still limited to runtime evidence and manifest revalidation because Epoch 1 has not completed
Rejected: Stop at the 120-second checkpoint | would miss stronger evidence from the longer observation window
Confidence: high
Scope-risk: narrow
Directive: Keep monitoring until the first saved model file or transition into build-index/evaluate appears
Tested: ps on PID 311629 after 180s wait; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics
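The checks listed under `Tested:` could be approximated in a small script. The sketch below is illustrative only and is not part of this commit: `parse_etime`, `made_progress`, and `has_model_files` are hypothetical helper names, and it assumes the `ELAPSED` column uses the usual `ps` layout `[[DD-]HH:]MM:SS`.

```python
import re
from pathlib import Path

# Assumed ps ELAPSED layout: [[DD-]HH:]MM:SS (for example "31:47" or "1-02:03:04").
_ETIME = re.compile(r"^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$")


def parse_etime(etime: str) -> int:
    """Convert a ps ELAPSED string to a number of seconds."""
    match = _ETIME.match(etime.strip())
    if match is None:
        raise ValueError(f"unrecognized ELAPSED value: {etime!r}")
    # Missing day/hour fields are treated as zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def made_progress(previous: str, current: str) -> bool:
    """Return True if the elapsed time grew between two checkpoints."""
    return parse_etime(current) > parse_etime(previous)


def has_model_files(model_dir: str) -> bool:
    """Return True if any regular file exists under model_dir (hypothetical check,
    roughly what running `find` on the directory would reveal)."""
    return any(p.is_file() for p in Path(model_dir).rglob("*"))
```

With the values recorded in this commit, `made_progress("27:54", "31:47")` is true, and `has_model_files` would be expected to stay false until the first model file is saved.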
Showing 3 changed files with 46 additions and 0 deletions
| 1 | ## 2026-06-02 Real FMA smoke fresh evidence 31:47 checkpoint | ||
| 2 | |||
| 3 | Completed: | ||
| 4 | - Rechecked the real FMA smoke after an additional window of about 180 seconds and confirmed that the `train.py` elapsed time had advanced to 31:47. | ||
| 5 | - Updated `docs/session-handoff.md` and `docs/changelist-2026-06-02.md` to sync the live evidence from the longer observation window. | ||
| 6 | |||
| 7 | Verification results: | ||
| 8 | - `ps -p 311629 -o pid,etime,%cpu,%mem,cmd` => `ELAPSED=31:47` | ||
| 9 | - Still no new processes related to `build-index/evaluate` | ||
| 10 | - `validate-splits /tmp/fma_real_smoke_stopcheck/fma/manifests` => `ok=true` | ||
| 11 | - `fma_models_smoke/` still contains only the directory itself | ||
| 12 | |||
| 13 | Conclusions: | ||
| 14 | - The real full-dataset FMA smoke is still advancing under the longer observation window, with no sign of interruption. | ||
| 15 | - As of this point, there is still no first model file and no evidence of a switch to a downstream stage. | ||
| 16 | |||
| 1 | ## 2026-06-02 Real FMA smoke fresh evidence 27:54 checkpoint | 17 | ## 2026-06-02 Real FMA smoke fresh evidence 27:54 checkpoint |
| 2 | 18 | ||
| 3 | Completed: | 19 | Completed: | ... | ... |
| ... | @@ -216,3 +216,11 @@ cd /workspace/acr-engine | ... | @@ -216,3 +216,11 @@ cd /workspace/acr-engine |
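| 216 | - Current CPU / memory observation: `%CPU≈615`, `%MEM≈11.2`. | 216 | - Current CPU / memory observation: `%CPU≈615`, `%MEM≈11.2`. |
| 217 | - After an additional 120-second wait, there is still no model file and no switch to `build-index/evaluate`. | 217 | - After an additional 120-second wait, there is still no model file and no switch to `build-index/evaluate`. |
| 218 | - Manifest recheck continues to pass; the counts are unchanged. | 218 | - Manifest recheck continues to pass; the counts are unchanged. |
| 219 | |||
| 220 | |||
| 221 | ## 12:29 UTC (180-second window) time-progress addendum | ||
| 222 | |||
| 223 | - Latest live evidence has advanced to: `train.py ELAPSED=31:47`. | ||
| 224 | - Current CPU / memory observation: `%CPU≈615`, `%MEM≈11.0`. | ||
| 225 | - After an additional 180-second wait, there is still no model file and no switch to `build-index/evaluate`. | ||
| 226 | - Manifest recheck continues to pass; the counts are unchanged. | ... | ... |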
| ... | @@ -299,6 +299,28 @@ | ... | @@ -299,6 +299,28 @@ |
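| 299 | - Under the longer observation window, training is still genuinely advancing rather than showing spurious activity or hanging. | 299 | - Under the longer observation window, training is still genuinely advancing rather than showing spurious activity or hanging. |
| 300 | - As of 12:25 UTC, there is still no first model file and no evidence of the downstream retrieval/evaluation stages. | 300 | - As of 12:25 UTC, there is still no first model file and no evidence of the downstream retrieval/evaluation stages. |
| 301 | 301 | ||
| 302 | ### Fresh evidence after the 180-second window (2026-06-02 12:29 UTC) | ||
| 303 | |||
| 304 | - After the longer observation window of about 180 seconds, the real FMA smoke has advanced to: | ||
| 305 | - `train.py ELAPSED=31:47` | ||
| 306 | - `%CPU≈615` | ||
| 307 | - `%MEM≈11.0` | ||
| 308 | - The current process structure still shows no stage switch: | ||
| 309 | - `PID=311494`: `external_adapters.py smoke-local fma ...` | ||
| 310 | - `PID=311629`: `train.py --data /tmp/fma_real_smoke_stopcheck/fma/manifests ...` | ||
| 311 | - Still no new processes related to `build-index` / `evaluate`. | ||
| 312 | - `fma_models_smoke/` still contains only the directory itself, with no model files. | ||
| 313 | - Manifest recheck still passes: | ||
| 314 | - `ok=true` | ||
| 315 | - `catalog_references=8000` | ||
| 316 | - `train_queries=6401` | ||
| 317 | - `test_queries=1593` | ||
| 318 | - `val_queries=0` | ||
| 319 | |||
| 320 | This shows that: | ||
| 321 | - Under the longer observation window, training is still genuinely advancing rather than showing spurious activity or hanging. | ||
| 322 | - As of 12:29 UTC, there is still no first model file and no evidence of the downstream retrieval/evaluation stages. | ||
| 323 | |||
| 302 | ### First-priority actions after restart | 324 | ### First-priority actions after restart |
| 303 | 325 | ||
| 304 | 1. First check whether the real FMA smoke has finished: | 326 | 1. First check whether the real FMA smoke has finished: | ... | ... |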