Make the FMA waiter durable enough for a real multi-hour transfer
Constraint: The real dataset download lasts far longer than the waiter's original three-cycle lifetime, so the handoff process must survive unattended.
Rejected: Repeatedly restarting a short-lived waiter by hand | That is fragile and defeats the point of automation.
Confidence: high
Scope-risk: narrow
Directive: Keep the waiter long-lived by default and preserve progress logs so future sessions can see active polling immediately.
Tested: Diagnosed the original max-cycles behavior, ran a short two-cycle verification showing archive growth, then relaunched the long-lived waiter and confirmed a live process plus log output.
Not-tested: The completed handoff path from full archive to extraction has not fired yet because the download is still in progress.
Showing 2 changed files with 226 additions and 5 deletions
| ... | @@ -20,21 +20,42 @@ def inspect() -> dict: | ... | @@ -20,21 +20,42 @@ def inspect() -> dict: |
| 20 | def main(): | 20 | def main(): |
| 21 | parser = argparse.ArgumentParser() | 21 | parser = argparse.ArgumentParser() |
| 22 | parser.add_argument("--interval", type=float, default=30.0) | 22 | parser.add_argument("--interval", type=float, default=30.0) |
| 23 | parser.add_argument("--max-cycles", type=int, default=3) | 23 | parser.add_argument( |
| 24 | "--max-cycles", | ||
| 25 | type=int, | ||
| 26 | default=0, | ||
| 27 | help="Number of polling cycles before exiting; 0 means wait indefinitely.", | ||
| 28 | ) | ||
| 24 | args = parser.parse_args() | 29 | args = parser.parse_args() |
| 25 | 30 | ||
| 26 | snapshots = [] | 31 | snapshots = [] |
| 27 | for _ in range(args.max_cycles): | 32 | cycle = 0 |
| 33 | while True: | ||
| 34 | cycle += 1 | ||
| 28 | snap = inspect() | 35 | snap = inspect() |
| 29 | snapshots.append(snap) | 36 | snapshots.append(snap) |
| 37 | print( | ||
| 38 | json.dumps( | ||
| 39 | { | ||
| 40 | "status": "polling", | ||
| 41 | "cycle": cycle, | ||
| 42 | "archive_size": snap.get("archive_size", 0), | ||
| 43 | "archive_bytes_expected": snap.get("archive_bytes_expected", 0), | ||
| 44 | "archive_progress_percent": snap.get("archive_progress_percent", 0.0), | ||
| 45 | }, | ||
| 46 | ensure_ascii=False, | ||
| 47 | ), | ||
| 48 | flush=True, | ||
| 49 | ) | ||
| 30 | if snap.get("archive_size", 0) >= snap.get("archive_bytes_expected", 0): | 50 | if snap.get("archive_size", 0) >= snap.get("archive_bytes_expected", 0): |
| 31 | result = json.loads(subprocess.check_output(POST, text=True)) | 51 | result = json.loads(subprocess.check_output(POST, text=True)) |
| 32 | print(json.dumps({"status": "completed", "snapshots": snapshots, "postdownload": result}, indent=2, ensure_ascii=False)) | 52 | print(json.dumps({"status": "completed", "snapshots": snapshots, "postdownload": result}, indent=2, ensure_ascii=False), flush=True) |
| 53 | return | ||
| 54 | if args.max_cycles and cycle >= args.max_cycles: | ||
| 55 | print(json.dumps({"status": "waiting", "snapshots": snapshots}, indent=2, ensure_ascii=False), flush=True) | ||
| 33 | return | 56 | return |
| 34 | time.sleep(args.interval) | 57 | time.sleep(args.interval) |
| 35 | 58 | ||
| 36 | print(json.dumps({"status": "waiting", "snapshots": snapshots}, indent=2, ensure_ascii=False)) | ||
| 37 | |||
| 38 | 59 | ||
| 39 | if __name__ == "__main__": | 60 | if __name__ == "__main__": |
| 40 | main() | 61 | main() | ... | ... |
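The termination rules in the new loop (complete when the archive reaches its expected size, stop early only when `--max-cycles` is non-zero and reached, otherwise keep polling) can be checked in isolation. The following is a minimal sketch, not the code from this commit: `wait_until_complete` is a hypothetical helper, `inspect` and `sleep` are injected so the loop runs without a real download, and the `expected > 0` guard is an added assumption that an unknown expected size should not count as completion.

```python
import itertools
import time
from typing import Callable


def wait_until_complete(
    inspect: Callable[[], dict],
    max_cycles: int = 0,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll inspect() until the archive is complete or max_cycles is reached.

    max_cycles == 0 means poll indefinitely, mirroring the new default.
    """
    for cycle in itertools.count(1):
        snap = inspect()
        size = snap.get("archive_size", 0)
        expected = snap.get("archive_bytes_expected", 0)
        # Assumption: a missing or zero expected size is "unknown", not "done".
        if expected > 0 and size >= expected:
            return {"status": "completed", "cycles": cycle, "last": snap}
        # A non-zero max_cycles bounds the wait; zero never triggers this branch.
        if max_cycles and cycle >= max_cycles:
            return {"status": "waiting", "cycles": cycle, "last": snap}
        sleep(interval)
```

This sketch returns only the most recent snapshot instead of accumulating every snapshot, which keeps memory flat during a multi-hour wait; whether the full history is needed downstream is not stated in the commit.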
| ... | @@ -2,6 +2,31 @@ | ... | @@ -2,6 +2,31 @@ |
| 2 | 2 | ||
| 3 | ## 2026-06-02 | 3 | ## 2026-06-02 |
| 4 | 4 | ||
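| 5 | ### Stage: Fix for the real FMA waiter lifetime defect | ||
| 6 | ||
| 7 | Completed: | ||
| 8 | - Diagnosed why `wait_for_fma_and_prepare.py` dropped out | ||
| 9 | - Confirmed the original implementation defaulted to `max-cycles=3` and exited on its own after about 90 seconds | ||
| 10 | - Fixed: `max-cycles=0` now waits indefinitely, and every polling cycle emits a progress log line | ||
| 11 | - Relaunched the long-lived waiter and verified that the log began producing polling output | ||
| 12 | ||
| 13 | Verification results: | ||
| 14 | - On re-checking, download progress had reached: | ||
| 15 | - `archive_size=3886596096` | ||
| 16 | - `archive_progress_percent=50.6094` | ||
| 17 | - After the fix, ran: | ||
| 18 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 0.1 --max-cycles 2` | ||
| 19 | Returned two `polling` cycles and confirmed the byte count kept growing: | ||
| 20 | - `3977117696 -> 3977314304` | ||
| 21 | - `51.7881% -> 51.7907%` | ||
| 22 | - The long-lived waiter has been relaunched: | ||
| 23 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 30` | ||
| 24 | - The log file `.omx_wait_for_fma.log` has started receiving polling JSON | ||
| 25 | ||
| 26 | Conclusion: | ||
| 27 | - The waiter's earlier dropout was not a download failure; the script's lifetime was simply designed too short | ||
| 28 | - The real FMA pipeline can now stay resident long-term, waiting automatically and then running post-processing | ||
| 29 | ||
| 5 | ### Stage: Recovery of the dropped real FMA guard pipeline | 30 | ### Stage: Recovery of the dropped real FMA guard pipeline |
| 6 | 31 | ||
| 7 | Completed: | 32 | Completed: |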
| ... | @@ -969,6 +994,31 @@ | ... | @@ -969,6 +994,31 @@ |
| 969 | 994 | ||
| 970 | ## 2026-06-02 | 995 | ## 2026-06-02 |
| 971 | 996 | ||
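| 997 | ### Stage: Fix for the real FMA waiter lifetime defect | ||
| 998 | ||
| 999 | Completed: | ||
| 1000 | - Diagnosed why `wait_for_fma_and_prepare.py` dropped out | ||
| 1001 | - Confirmed the original implementation defaulted to `max-cycles=3` and exited on its own after about 90 seconds | ||
| 1002 | - Fixed: `max-cycles=0` now waits indefinitely, and every polling cycle emits a progress log line | ||
| 1003 | - Relaunched the long-lived waiter and verified that the log began producing polling output | ||
| 1004 | ||
| 1005 | Verification results: | ||
| 1006 | - On re-checking, download progress had reached: | ||
| 1007 | - `archive_size=3886596096` | ||
| 1008 | - `archive_progress_percent=50.6094` | ||
| 1009 | - After the fix, ran: | ||
| 1010 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 0.1 --max-cycles 2` | ||
| 1011 | Returned two `polling` cycles and confirmed the byte count kept growing: | ||
| 1012 | - `3977117696 -> 3977314304` | ||
| 1013 | - `51.7881% -> 51.7907%` | ||
| 1014 | - The long-lived waiter has been relaunched: | ||
| 1015 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 30` | ||
| 1016 | - The log file `.omx_wait_for_fma.log` has started receiving polling JSON | ||
| 1017 | ||
| 1018 | Conclusion: | ||
| 1019 | - The waiter's earlier dropout was not a download failure; the script's lifetime was simply designed too short | ||
| 1020 | - The real FMA pipeline can now stay resident long-term, waiting automatically and then running post-processing | ||
| 1021 | ||
| 972 | ### Stage: Recovery of the dropped real FMA guard pipeline | 1022 | ### Stage: Recovery of the dropped real FMA guard pipeline |
| 973 | 1023 | ||
| 974 | Completed: | 1024 | Completed: |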
| ... | @@ -1146,6 +1196,31 @@ | ... | @@ -1146,6 +1196,31 @@ |
| 1146 | 1196 | ||
| 1147 | ## 2026-06-02 | 1197 | ## 2026-06-02 |
| 1148 | 1198 | ||
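| 1199 | ### Stage: Fix for the real FMA waiter lifetime defect | ||
| 1200 | ||
| 1201 | Completed: | ||
| 1202 | - Diagnosed why `wait_for_fma_and_prepare.py` dropped out | ||
| 1203 | - Confirmed the original implementation defaulted to `max-cycles=3` and exited on its own after about 90 seconds | ||
| 1204 | - Fixed: `max-cycles=0` now waits indefinitely, and every polling cycle emits a progress log line | ||
| 1205 | - Relaunched the long-lived waiter and verified that the log began producing polling output | ||
| 1206 | ||
| 1207 | Verification results: | ||
| 1208 | - On re-checking, download progress had reached: | ||
| 1209 | - `archive_size=3886596096` | ||
| 1210 | - `archive_progress_percent=50.6094` | ||
| 1211 | - After the fix, ran: | ||
| 1212 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 0.1 --max-cycles 2` | ||
| 1213 | Returned two `polling` cycles and confirmed the byte count kept growing: | ||
| 1214 | - `3977117696 -> 3977314304` | ||
| 1215 | - `51.7881% -> 51.7907%` | ||
| 1216 | - The long-lived waiter has been relaunched: | ||
| 1217 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 30` | ||
| 1218 | - The log file `.omx_wait_for_fma.log` has started receiving polling JSON | ||
| 1219 | ||
| 1220 | Conclusion: | ||
| 1221 | - The waiter's earlier dropout was not a download failure; the script's lifetime was simply designed too short | ||
| 1222 | - The real FMA pipeline can now stay resident long-term, waiting automatically and then running post-processing | ||
| 1223 | ||
| 1149 | ### Stage: Recovery of the dropped real FMA guard pipeline | 1224 | ### Stage: Recovery of the dropped real FMA guard pipeline |
| 1150 | 1225 | ||
| 1151 | Completed: | 1226 | Completed: |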
| ... | @@ -1333,6 +1408,31 @@ | ... | @@ -1333,6 +1408,31 @@ |
| 1333 | 1408 | ||
| 1334 | ## 2026-06-02 | 1409 | ## 2026-06-02 |
| 1335 | 1410 | ||
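| 1411 | ### Stage: Fix for the real FMA waiter lifetime defect | ||
| 1412 | ||
| 1413 | Completed: | ||
| 1414 | - Diagnosed why `wait_for_fma_and_prepare.py` dropped out | ||
| 1415 | - Confirmed the original implementation defaulted to `max-cycles=3` and exited on its own after about 90 seconds | ||
| 1416 | - Fixed: `max-cycles=0` now waits indefinitely, and every polling cycle emits a progress log line | ||
| 1417 | - Relaunched the long-lived waiter and verified that the log began producing polling output | ||
| 1418 | ||
| 1419 | Verification results: | ||
| 1420 | - On re-checking, download progress had reached: | ||
| 1421 | - `archive_size=3886596096` | ||
| 1422 | - `archive_progress_percent=50.6094` | ||
| 1423 | - After the fix, ran: | ||
| 1424 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 0.1 --max-cycles 2` | ||
| 1425 | Returned two `polling` cycles and confirmed the byte count kept growing: | ||
| 1426 | - `3977117696 -> 3977314304` | ||
| 1427 | - `51.7881% -> 51.7907%` | ||
| 1428 | - The long-lived waiter has been relaunched: | ||
| 1429 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 30` | ||
| 1430 | - The log file `.omx_wait_for_fma.log` has started receiving polling JSON | ||
| 1431 | ||
| 1432 | Conclusion: | ||
| 1433 | - The waiter's earlier dropout was not a download failure; the script's lifetime was simply designed too short | ||
| 1434 | - The real FMA pipeline can now stay resident long-term, waiting automatically and then running post-processing | ||
| 1435 | ||
| 1336 | ### Stage: Recovery of the dropped real FMA guard pipeline | 1436 | ### Stage: Recovery of the dropped real FMA guard pipeline |
| 1337 | 1437 | ||
| 1338 | Completed: | 1438 | Completed: |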
| ... | @@ -1510,6 +1610,31 @@ | ... | @@ -1510,6 +1610,31 @@ |
| 1510 | 1610 | ||
| 1511 | ## 2026-06-02 | 1611 | ## 2026-06-02 |
| 1512 | 1612 | ||
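| 1613 | ### Stage: Fix for the real FMA waiter lifetime defect | ||
| 1614 | ||
| 1615 | Completed: | ||
| 1616 | - Diagnosed why `wait_for_fma_and_prepare.py` dropped out | ||
| 1617 | - Confirmed the original implementation defaulted to `max-cycles=3` and exited on its own after about 90 seconds | ||
| 1618 | - Fixed: `max-cycles=0` now waits indefinitely, and every polling cycle emits a progress log line | ||
| 1619 | - Relaunched the long-lived waiter and verified that the log began producing polling output | ||
| 1620 | ||
| 1621 | Verification results: | ||
| 1622 | - On re-checking, download progress had reached: | ||
| 1623 | - `archive_size=3886596096` | ||
| 1624 | - `archive_progress_percent=50.6094` | ||
| 1625 | - After the fix, ran: | ||
| 1626 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 0.1 --max-cycles 2` | ||
| 1627 | Returned two `polling` cycles and confirmed the byte count kept growing: | ||
| 1628 | - `3977117696 -> 3977314304` | ||
| 1629 | - `51.7881% -> 51.7907%` | ||
| 1630 | - The long-lived waiter has been relaunched: | ||
| 1631 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 30` | ||
| 1632 | - The log file `.omx_wait_for_fma.log` has started receiving polling JSON | ||
| 1633 | ||
| 1634 | Conclusion: | ||
| 1635 | - The waiter's earlier dropout was not a download failure; the script's lifetime was simply designed too short | ||
| 1636 | - The real FMA pipeline can now stay resident long-term, waiting automatically and then running post-processing | ||
| 1637 | ||
| 1513 | ### Stage: Recovery of the dropped real FMA guard pipeline | 1638 | ### Stage: Recovery of the dropped real FMA guard pipeline |
| 1514 | 1639 | ||
| 1515 | Completed: | 1640 | Completed: |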
| ... | @@ -1685,6 +1810,31 @@ | ... | @@ -1685,6 +1810,31 @@ |
| 1685 | 1810 | ||
| 1686 | ## 2026-06-02 | 1811 | ## 2026-06-02 |
| 1687 | 1812 | ||
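| 1813 | ### Stage: Fix for the real FMA waiter lifetime defect | ||
| 1814 | ||
| 1815 | Completed: | ||
| 1816 | - Diagnosed why `wait_for_fma_and_prepare.py` dropped out | ||
| 1817 | - Confirmed the original implementation defaulted to `max-cycles=3` and exited on its own after about 90 seconds | ||
| 1818 | - Fixed: `max-cycles=0` now waits indefinitely, and every polling cycle emits a progress log line | ||
| 1819 | - Relaunched the long-lived waiter and verified that the log began producing polling output | ||
| 1820 | ||
| 1821 | Verification results: | ||
| 1822 | - On re-checking, download progress had reached: | ||
| 1823 | - `archive_size=3886596096` | ||
| 1824 | - `archive_progress_percent=50.6094` | ||
| 1825 | - After the fix, ran: | ||
| 1826 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 0.1 --max-cycles 2` | ||
| 1827 | Returned two `polling` cycles and confirmed the byte count kept growing: | ||
| 1828 | - `3977117696 -> 3977314304` | ||
| 1829 | - `51.7881% -> 51.7907%` | ||
| 1830 | - The long-lived waiter has been relaunched: | ||
| 1831 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 30` | ||
| 1832 | - The log file `.omx_wait_for_fma.log` has started receiving polling JSON | ||
| 1833 | ||
| 1834 | Conclusion: | ||
| 1835 | - The waiter's earlier dropout was not a download failure; the script's lifetime was simply designed too short | ||
| 1836 | - The real FMA pipeline can now stay resident long-term, waiting automatically and then running post-processing | ||
| 1837 | ||
| 1688 | ### Stage: Recovery of the dropped real FMA guard pipeline | 1838 | ### Stage: Recovery of the dropped real FMA guard pipeline |
| 1689 | 1839 | ||
| 1690 | Completed: | 1840 | Completed: |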
| ... | @@ -1858,6 +2008,31 @@ | ... | @@ -1858,6 +2008,31 @@ |
| 1858 | 2008 | ||
| 1859 | ## 2026-06-02 | 2009 | ## 2026-06-02 |
| 1860 | 2010 | ||
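| 2011 | ### Stage: Fix for the real FMA waiter lifetime defect | ||
| 2012 | ||
| 2013 | Completed: | ||
| 2014 | - Diagnosed why `wait_for_fma_and_prepare.py` dropped out | ||
| 2015 | - Confirmed the original implementation defaulted to `max-cycles=3` and exited on its own after about 90 seconds | ||
| 2016 | - Fixed: `max-cycles=0` now waits indefinitely, and every polling cycle emits a progress log line | ||
| 2017 | - Relaunched the long-lived waiter and verified that the log began producing polling output | ||
| 2018 | ||
| 2019 | Verification results: | ||
| 2020 | - On re-checking, download progress had reached: | ||
| 2021 | - `archive_size=3886596096` | ||
| 2022 | - `archive_progress_percent=50.6094` | ||
| 2023 | - After the fix, ran: | ||
| 2024 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 0.1 --max-cycles 2` | ||
| 2025 | Returned two `polling` cycles and confirmed the byte count kept growing: | ||
| 2026 | - `3977117696 -> 3977314304` | ||
| 2027 | - `51.7881% -> 51.7907%` | ||
| 2028 | - The long-lived waiter has been relaunched: | ||
| 2029 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 30` | ||
| 2030 | - The log file `.omx_wait_for_fma.log` has started receiving polling JSON | ||
| 2031 | ||
| 2032 | Conclusion: | ||
| 2033 | - The waiter's earlier dropout was not a download failure; the script's lifetime was simply designed too short | ||
| 2034 | - The real FMA pipeline can now stay resident long-term, waiting automatically and then running post-processing | ||
| 2035 | ||
| 1861 | ### Stage: Recovery of the dropped real FMA guard pipeline | 2036 | ### Stage: Recovery of the dropped real FMA guard pipeline |
| 1862 | 2037 | ||
| 1863 | Completed: | 2038 | Completed: |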
| ... | @@ -2036,6 +2211,31 @@ | ... | @@ -2036,6 +2211,31 @@ |
| 2036 | 2211 | ||
| 2037 | ## 2026-06-02 | 2212 | ## 2026-06-02 |
| 2038 | 2213 | ||
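| 2214 | ### Stage: Fix for the real FMA waiter lifetime defect | ||
| 2215 | ||
| 2216 | Completed: | ||
| 2217 | - Diagnosed why `wait_for_fma_and_prepare.py` dropped out | ||
| 2218 | - Confirmed the original implementation defaulted to `max-cycles=3` and exited on its own after about 90 seconds | ||
| 2219 | - Fixed: `max-cycles=0` now waits indefinitely, and every polling cycle emits a progress log line | ||
| 2220 | - Relaunched the long-lived waiter and verified that the log began producing polling output | ||
| 2221 | ||
| 2222 | Verification results: | ||
| 2223 | - On re-checking, download progress had reached: | ||
| 2224 | - `archive_size=3886596096` | ||
| 2225 | - `archive_progress_percent=50.6094` | ||
| 2226 | - After the fix, ran: | ||
| 2227 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 0.1 --max-cycles 2` | ||
| 2228 | Returned two `polling` cycles and confirmed the byte count kept growing: | ||
| 2229 | - `3977117696 -> 3977314304` | ||
| 2230 | - `51.7881% -> 51.7907%` | ||
| 2231 | - The long-lived waiter has been relaunched: | ||
| 2232 | - `/usr/local/miniconda3/bin/python scripts/wait_for_fma_and_prepare.py --interval 30` | ||
| 2233 | - The log file `.omx_wait_for_fma.log` has started receiving polling JSON | ||
| 2234 | ||
| 2235 | Conclusion: | ||
| 2236 | - The waiter's earlier dropout was not a download failure; the script's lifetime was simply designed too short | ||
| 2237 | - The real FMA pipeline can now stay resident long-term, waiting automatically and then running post-processing | ||
| 2238 | ||
| 2039 | ### Stage: Recovery of the dropped real FMA guard pipeline | 2239 | ### Stage: Recovery of the dropped real FMA guard pipeline |
| 2040 | 2240 | ||
| 2041 | Completed: | 2241 | Completed: | ... | ... |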
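Because each polling cycle prints one compact JSON object per line, a later session can recover the most recent progress from the log without restarting anything. The sketch below is illustrative only: `latest_polling_record` is a hypothetical helper, and it assumes `.omx_wait_for_fma.log` holds the waiter's stdout. The final `completed` or `waiting` payload is pretty-printed across several lines, so its fragments are skipped instead of parsed.

```python
import json
from typing import Iterable, Optional


def latest_polling_record(lines: Iterable[str]) -> Optional[dict]:
    """Return the last single-line record whose status is "polling", if any."""
    latest = None
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # blank lines and fragments of the indented final payload
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a lone "{" opening the indented final payload
        if isinstance(record, dict) and record.get("status") == "polling":
            latest = record
    return latest
```

A caller might open the log with `open(".omx_wait_for_fma.log", encoding="utf-8")`, pass the file object as `lines`, and read `cycle`, `archive_size`, and `archive_progress_percent` from the returned record.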