Commit 46b9d8d4 46b9d8d40b465cffc6d512639e3fe33de04779da by cnb.bofCdSsphPA

Reduce the handoff gap between archive completion and real-data readiness

Constraint: Once the large FMA archive finishes, future sessions should not need to manually stitch extraction and readiness checks together
Rejected: Leave post-download steps as manual shell sequences | Increases delay and error risk at the most valuable transition point
Confidence: high
Scope-risk: narrow
Directive: Keep fma_postdownload_ready.py as the canonical first command after archive completion before attempting real-data smoke runs
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/fma_postdownload_ready.py; /usr/local/miniconda3/bin/python acr-engine/scripts/fma_postdownload_ready.py
Not-tested: Successful extract and readiness on the full archive remain pending completion of the download
1 parent 44bbfcb5
#!/usr/bin/env python3
"""Post-download automation for the real FMA small archive.
Runs extract + readiness checks once the archive is complete.
If the archive is incomplete, exits with a structured blocked result.
"""
from __future__ import annotations
import json
import subprocess
from pathlib import Path
PYTHON = "/usr/local/miniconda3/bin/python"
EXPECTED_BYTES = 7679594875
ARCHIVE = Path("data/raw/fma_small.zip")
EXTRACT_DIR = Path("data/raw/fma_small_audio")
def run_json(cmd: list[str]) -> dict:
    # Run a helper command and parse its stdout as JSON.
    out = subprocess.check_output(cmd, text=True)
    return json.loads(out)


def main():
    # All paths are relative, so run this from the acr-engine directory.
    archive_exists = ARCHIVE.exists()
    archive_size = ARCHIVE.stat().st_size if archive_exists else 0
    # A missing or undersized archive means the download has not finished.
    if (not archive_exists) or archive_size < EXPECTED_BYTES:
        print(json.dumps({
            "status": "blocked",
            "reason": "archive_not_complete",
            "archive_exists": archive_exists,
            "archive_size": archive_size,
            "expected_bytes": EXPECTED_BYTES,
            "progress_percent": round((archive_size / EXPECTED_BYTES) * 100, 4) if archive_exists else 0.0,
        }, indent=2, ensure_ascii=False))
        return
    # Archive is complete: extract it, then run the readiness and inspection checks.
    extract = run_json([PYTHON, "scripts/prepare_fma_archive.py", "extract"])
    ready = run_json([PYTHON, "src/data/external_adapters.py", "check-local-ready", "fma", str(EXTRACT_DIR), "--eval-ratio", "0.2", "--query-duration", "8.0"])
    inspect = run_json([PYTHON, "src/data/external_adapters.py", "inspect-local", "fma", str(EXTRACT_DIR), "--eval-ratio", "0.2", "--query-duration", "8.0"])
    print(json.dumps({
        "status": "ok",
        "extract": extract,
        "ready": ready,
        "inspect": inspect,
    }, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()
......@@ -236,6 +236,30 @@
### Stage: Automatic readiness after the FMA download completes
Completed:
- Added [acr-engine/scripts/fma_postdownload_ready.py](../acr-engine/scripts/fma_postdownload_ready.py)
- Once the full FMA archive has finished downloading, the script automatically runs:
  - `extract`
  - `check-local-ready`
  - `inspect-local`
- Linked the script from [docs/open-dataset-workflow.md](./open-dataset-workflow.md) and [docs/session-handoff.md](./session-handoff.md)
Verification results:
- `/usr/local/miniconda3/bin/python -m py_compile scripts/fma_postdownload_ready.py` succeeded
- `/usr/local/miniconda3/bin/python scripts/fma_postdownload_ready.py` ran successfully and returned a structured result
- Current result:
  - `status=blocked`
  - `reason=archive_not_complete`
  - `archive_size=1631584256`
  - `progress_percent=21.2457`
Conclusion:
- Once the real FMA download completes, the repository can move into "extract + local readiness check" with a single command
- Later sessions no longer need to chain these steps together by hand
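
The reported `progress_percent` can be reproduced from the two byte counts above. The following is a minimal sketch that mirrors the rounding expression in the script; the helper name `progress_percent` is introduced here only for illustration and is not part of the repository.

```python
EXPECTED_BYTES = 7679594875  # expected archive size used by the script


def progress_percent(archive_size: int, expected_bytes: int = EXPECTED_BYTES) -> float:
    """Return download progress as a percentage rounded to four decimals."""
    return round((archive_size / expected_bytes) * 100, 4)


# Archive size taken from the verification result above.
print(progress_percent(1631584256))  # prints 21.2457
```

The script only compares the file size against `EXPECTED_BYTES`, so this figure describes bytes on disk and says nothing about whether the archive contents are intact.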
### Stage: pgvector bulk load plan template
Completed:
......
......@@ -131,6 +131,22 @@ flowchart LR
---
### Single preparation command after the FMA download completes
```bash
cd acr-engine
/usr/local/miniconda3/bin/python scripts/fma_postdownload_ready.py
```
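When the archive is complete, the script automatically runs:
1. `extract`
2. `check-local-ready`
3. `inspect-local`
If the archive has not finished downloading, it prints a structured `blocked` result with the reason `archive_not_complete`.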
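
A session that calls the script will usually want to branch on the JSON it prints. The sketch below shows one possible way to do that, assuming the output has already been parsed into a dict. The function `next_action` and its wording are hypothetical; only the `status`, `reason`, and `progress_percent` keys come from the script itself.

```python
import json


def next_action(result: dict) -> str:
    """Map the script's parsed JSON result to a short next-step hint (hypothetical helper)."""
    status = result.get("status")
    if status == "blocked":
        # These keys mirror the blocked payload printed by the script.
        return (
            f"wait: {result.get('reason')} "
            f"({result.get('progress_percent', 0.0)}% of expected size)"
        )
    if status == "ok":
        # "ok" only means the three steps ran; their payloads still need review.
        return "proceed: checks ran, review the 'ready' and 'inspect' payloads"
    return f"stop: unexpected status {status!r}"


# Example payload shaped like the blocked result described above.
blocked = json.loads(
    '{"status": "blocked", "reason": "archive_not_complete", "progress_percent": 21.2457}'
)
print(next_action(blocked))
```

Keeping the decision logic separate from the subprocess call makes it straightforward to test against sample payloads without needing the archive on disk.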
## Sources
- See [dataset-spec.md](./dataset-spec.md)
- See [dataset-sources-and-licensing.md](./dataset-sources-and-licensing.md)
......
......@@ -317,3 +317,5 @@
- [README.md](./README.md)
- [open-dataset-workflow.md](./open-dataset-workflow.md)
- [CHANGELOG.md](./CHANGELOG.md)
- Once the FMA download completes, run directly: [acr-engine/scripts/fma_postdownload_ready.py](../acr-engine/scripts/fma_postdownload_ready.py)
......