Recover stalled real-dataset transfer with a durable background resume path

Constraint: Long FMA archive downloads cannot rely on fragile foreground execution if Ralph-style work must continue across sessions
Rejected: Keep manually reissuing foreground download commands after stalls | Increases interruption risk and weakens resumability evidence
Confidence: high
Scope-risk: narrow
Directive: Prefer prepare_fma_archive.py bg-download for future large archive recovery so PID and log evidence remain standardized
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py bg-download; /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py inspect; tail -n 40 /tmp/fma_modelscope_download.log
Not-tested: Full archive completion, extraction, and real-data smoke remain pending
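
The directive relies on the PID and log path that `bg-download` reports. A later session might check that evidence roughly as follows. This is a minimal sketch assuming a POSIX host; `pid_is_running` and `tail_log` are hypothetical helpers and are not part of prepare_fma_archive.py.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def tail_log(log_path: Path, lines: int = 40) -> list[str]:
    """Return the last `lines` lines of the log, or [] if the log is missing."""
    if not log_path.exists():
        return []
    text = log_path.read_text(errors="replace")
    # splitlines() also splits on the carriage returns curl uses for progress
    return text.splitlines()[-lines:]
```

Calling `tail_log(Path("/tmp/fma_modelscope_download.log"))` mirrors the manual `tail -n 40` check listed under Tested. A PID check alone cannot confirm that the process is still the original curl, since PIDs can be reused, so it is best read together with the log.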

Commit 83a3f89f5fe7f8fe3c81f2a13e2e1a2604069838 authored 2026-06-02 13:43:23 +0800 by cnb.bofCdSsphPA
Showing 2 changed files with 53 additions and 0 deletions
acr-engine/scripts/prepare_fma_archive.py
docs/CHANGELOG.md
--- a/acr-engine/scripts/prepare_fma_archive.py
+++ b/acr-engine/scripts/prepare_fma_archive.py
@@ -37,6 +37,28 @@ def download(resume: bool = True) -> dict:
    }


+def bg_download(log_path: Path, resume: bool = True) -> dict:
+    ARCHIVE_PATH.parent.mkdir(parents=True, exist_ok=True)
+    log_path.parent.mkdir(parents=True, exist_ok=True)
+    cmd = ["nohup", "curl", "-L"]
+    if resume:
+        cmd += ["--continue-at", "-"]
+    cmd += ["--output", str(ARCHIVE_PATH), FMA_SMALL_URL]
+    shell_cmd = " ".join(cmd) + f" >> {log_path} 2>&1 & echo $!"
+    proc = subprocess.run(["bash", "-lc", shell_cmd], text=True, capture_output=True)
+    pid = proc.stdout.strip()
+    return {
+        "action": "bg-download",
+        "returncode": proc.returncode,
+        "pid": pid,
+        "log_path": str(log_path.resolve()),
+        "archive_path": str(ARCHIVE_PATH.resolve()),
+        "archive_exists": ARCHIVE_PATH.exists(),
+        "archive_size": ARCHIVE_PATH.stat().st_size if ARCHIVE_PATH.exists() else 0,
+        "stderr_tail": proc.stderr[-1200:],
+    }
+
+
 def inspect() -> dict:
    archive_exists = ARCHIVE_PATH.exists()
    extract_exists = EXTRACT_DIR.exists()
@@ -95,6 +117,10 @@ def main():
    p = sub.add_parser("download")
    p.add_argument("--no-resume", action="store_true")

+    p = sub.add_parser("bg-download")
+    p.add_argument("--no-resume", action="store_true")
+    p.add_argument("--log-path", default="/tmp/fma_modelscope_download.log")
+
    sub.add_parser("inspect")

    p = sub.add_parser("extract")
@@ -103,6 +129,8 @@ def main():
    args = parser.parse_args()
    if args.cmd == "download":
        result = download(resume=not args.no_resume)
+    elif args.cmd == "bg-download":
+        result = bg_download(Path(args.log_path), resume=not args.no_resume)
    elif args.cmd == "inspect":
        result = inspect()
    elif args.cmd == "extract":
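
Review note: `bg_download` joins the command into a single shell string, so an archive or log path containing spaces or shell metacharacters would be split or interpreted by bash. One possible alternative is to skip the shell entirely. The sketch below assumes a POSIX host and swaps `nohup` for `start_new_session=True`, which is intended to detach the child from the controlling terminal. `build_curl_cmd` and `launch_detached` are hypothetical names; the curl flags are the ones already used in the diff.

```python
import subprocess
from pathlib import Path


def build_curl_cmd(url: str, archive_path: Path, resume: bool = True) -> list[str]:
    """Build the curl argument list; no shell quoting is needed for a list."""
    cmd = ["curl", "-L"]
    if resume:
        cmd += ["--continue-at", "-"]  # let curl resume from the current file size
    cmd += ["--output", str(archive_path), url]
    return cmd


def launch_detached(cmd: list[str], log_path: Path) -> int:
    """Start cmd in its own session, append its output to log_path, return the PID."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # assumption: stands in for nohup on POSIX
        )
    # The parent's handle is closed here; the child keeps its inherited copy.
    return proc.pid
```

Because the PID comes from `Popen` as an integer rather than from parsing `echo $!`, the returned dict could carry it without string handling.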
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -231,6 +231,31 @@



+
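+### Stage: FMA background resume recovery
+
+Completed:
+- Added `bg-download` to [acr-engine/scripts/prepare_fma_archive.py](../acr-engine/scripts/prepare_fma_archive.py)
+- Used `nohup curl` with a log file to make background resumption of large downloads more robust
+- After detecting that the download had stalled, switched to the new background recovery path and relaunched the ModelScope download under it
+
+Verification:
+- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py bg-download` succeeded
+- Returned values:
+  - `returncode=0`
+  - `pid=47175`
+  - `log_path=/tmp/fma_modelscope_download.log`
+- Results after re-running inspect:
+  - `archive_size` grew from `61550592` to `71835648`
+  - `archive_progress_percent=0.9354`
+- Log verification:
+  - `Resuming transfer from byte position 61550592`
+  - Current throughput has reached the MB/s range
+
+Conclusion:
+- Downloading the real FMA data no longer depends on a fragile one-shot foreground command
+- The download is back in a sustainable background-resume state, which makes it easier for later sessions to pick up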
+
 ### Stage: FMA download progress visualization

 Completed: