Commit 83a3f89f 83a3f89f5fe7f8fe3c81f2a13e2e1a2604069838 by cnb.bofCdSsphPA

Recover stalled real-dataset transfer with a durable background resume path

Constraint: Long FMA archive downloads cannot rely on fragile foreground execution if Ralph-style work must continue across sessions
Rejected: Keep manually reissuing foreground download commands after stalls | Increases interruption risk and weakens resumability evidence
Confidence: high
Scope-risk: narrow
Directive: Prefer prepare_fma_archive.py bg-download for future large archive recovery so PID and log evidence remain standardized
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py bg-download; /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py inspect; tail -n 40 /tmp/fma_modelscope_download.log
Not-tested: Full archive completion, extraction, and real-data smoke remain pending
1 parent 730d9b90
......@@ -37,6 +37,28 @@ def download(resume: bool = True) -> dict:
    }
def bg_download(log_path: Path, resume: bool = True) -> dict:
    ARCHIVE_PATH.parent.mkdir(parents=True, exist_ok=True)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    cmd = ["nohup", "curl", "-L"]
    if resume:
        cmd += ["--continue-at", "-"]
    cmd += ["--output", str(ARCHIVE_PATH), FMA_SMALL_URL]
    shell_cmd = " ".join(cmd) + f" >> {log_path} 2>&1 & echo $!"
    proc = subprocess.run(["bash", "-lc", shell_cmd], text=True, capture_output=True)
    pid = proc.stdout.strip()
    return {
        "action": "bg-download",
        "returncode": proc.returncode,
        "pid": pid,
        "log_path": str(log_path.resolve()),
        "archive_path": str(ARCHIVE_PATH.resolve()),
        "archive_exists": ARCHIVE_PATH.exists(),
        "archive_size": ARCHIVE_PATH.stat().st_size if ARCHIVE_PATH.exists() else 0,
        "stderr_tail": proc.stderr[-1200:],
    }
def inspect() -> dict:
    archive_exists = ARCHIVE_PATH.exists()
    extract_exists = EXTRACT_DIR.exists()
......@@ -95,6 +117,10 @@ def main():
    p = sub.add_parser("download")
    p.add_argument("--no-resume", action="store_true")
    p = sub.add_parser("bg-download")
    p.add_argument("--no-resume", action="store_true")
    p.add_argument("--log-path", default="/tmp/fma_modelscope_download.log")
    sub.add_parser("inspect")
    p = sub.add_parser("extract")
......@@ -103,6 +129,8 @@ def main():
    args = parser.parse_args()
    if args.cmd == "download":
        result = download(resume=not args.no_resume)
    elif args.cmd == "bg-download":
        result = bg_download(Path(args.log_path), resume=not args.no_resume)
    elif args.cmd == "inspect":
        result = inspect()
    elif args.cmd == "extract":
......
......@@ -231,6 +231,31 @@
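### Stage: FMA background resume recovery
Completed:
- [acr-engine/scripts/prepare_fma_archive.py](../acr-engine/scripts/prepare_fma_archive.py) gains a new `bg-download` subcommand
- Uses `nohup curl` plus a log file to make background resumption of large file transfers more robust
- After the download was found to have stalled, switched to the new background recovery path and relaunched the ModelScope download under it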
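
The `nohup curl` plus log file approach can be sketched as below. This is a minimal illustration, not the script's actual code: `build_bg_curl_command` is a hypothetical helper, and the URL and paths in the usage line are placeholders. It quotes each argument with `shlex` so that paths containing spaces or shell metacharacters would survive the trip through `bash -lc`.

```python
import shlex
from pathlib import Path


def build_bg_curl_command(url: str, archive_path: Path, log_path: Path, resume: bool = True) -> str:
    """Build a shell command that starts curl in the background and echoes its PID.

    Hypothetical helper for illustration only.
    """
    cmd = ["nohup", "curl", "-L"]
    if resume:
        # "--continue-at -" asks curl to work out the resume offset from the existing file.
        cmd += ["--continue-at", "-"]
    cmd += ["--output", str(archive_path), url]
    # shlex.join quotes every argument; the log path is quoted separately
    # because it sits outside the argument list, after the redirection.
    return f"{shlex.join(cmd)} >> {shlex.quote(str(log_path))} 2>&1 & echo $!"


# Placeholder URL and paths, assumed for the example.
print(build_bg_curl_command(
    "https://example.com/fma_small.zip",
    Path("/tmp/fma data/fma_small.zip"),
    Path("/tmp/fma_download.log"),
))
```

The trailing `echo $!` prints the PID of the backgrounded job, which is what lets the caller record it alongside the log path.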
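Verification:
- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py bg-download` succeeded
- Current return values:
  - `returncode=0`
  - `pid=47175`
  - `log_path=/tmp/fma_modelscope_download.log`
- Result after re-running `inspect`:
  - `archive_size` grew from `61550592` to `71835648`
  - `archive_progress_percent=0.9354`
- Log verification:
  - `Resuming transfer from byte position 61550592`
  - Current throughput has reached the MB/s range
Conclusion:
- The real FMA dataset download no longer depends on a fragile one-shot foreground command
- The transfer is now back in a sustainable background-resume state, which makes it easier for later sessions to pick up where this one left off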
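
A later session picking up this transfer might first confirm that the recorded PID is still running and that the archive is still growing. The sketch below shows one way to do that on a POSIX system. `pid_alive` and `archive_growth` are hypothetical helpers, not part of `prepare_fma_archive.py`, and the PID and archive path would come from the `bg-download` and `inspect` output.

```python
import os
import time
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def archive_growth(path: Path, interval: float = 5.0) -> int:
    """Return how many bytes the file grew by over `interval` seconds."""
    before = path.stat().st_size if path.exists() else 0
    time.sleep(interval)
    after = path.stat().st_size if path.exists() else 0
    return after - before
```

A live PID combined with zero growth over a reasonable interval would suggest the transfer has stalled again; a dead PID with an incomplete archive would suggest re-running `bg-download`.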
### Stage: FMA download progress visualization
Completed:
......