Reduce the handoff gap between archive completion and real-data readiness
Constraint: Once the large FMA archive finishes, future sessions should not need to manually stitch extraction and readiness checks together
Rejected: Leave post-download steps as manual shell sequences | Increases delay and error risk at the most valuable transition point
Confidence: high
Scope-risk: narrow
Directive: Keep fma_postdownload_ready.py as the canonical first command after archive completion before attempting real-data smoke runs
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/fma_postdownload_ready.py; /usr/local/miniconda3/bin/python acr-engine/scripts/fma_postdownload_ready.py
Not-tested: Successful extraction and readiness checks on the full archive remain pending until the download completes
Showing 4 changed files with 94 additions and 0 deletions
acr-engine/scripts/fma_postdownload_ready.py
0 → 100755
#!/usr/bin/env python3
"""Post-download automation for the real FMA small archive.

Runs extract + readiness checks once the archive is complete.
If the archive is incomplete, exits with a structured blocked result.
"""

from __future__ import annotations

import json
import subprocess
from pathlib import Path

PYTHON = "/usr/local/miniconda3/bin/python"
EXPECTED_BYTES = 7679594875
ARCHIVE = Path("data/raw/fma_small.zip")
EXTRACT_DIR = Path("data/raw/fma_small_audio")


def run_json(cmd: list[str]) -> dict:
    out = subprocess.check_output(cmd, text=True)
    return json.loads(out)


def main():
    archive_exists = ARCHIVE.exists()
    archive_size = ARCHIVE.stat().st_size if archive_exists else 0
    if (not archive_exists) or archive_size < EXPECTED_BYTES:
        print(json.dumps({
            "status": "blocked",
            "reason": "archive_not_complete",
            "archive_exists": archive_exists,
            "archive_size": archive_size,
            "expected_bytes": EXPECTED_BYTES,
            "progress_percent": round((archive_size / EXPECTED_BYTES) * 100, 4) if archive_exists else 0.0,
        }, indent=2, ensure_ascii=False))
        return

    extract = run_json([PYTHON, "scripts/prepare_fma_archive.py", "extract"])
    ready = run_json([PYTHON, "src/data/external_adapters.py", "check-local-ready", "fma", str(EXTRACT_DIR), "--eval-ratio", "0.2", "--query-duration", "8.0"])
    inspect = run_json([PYTHON, "src/data/external_adapters.py", "inspect-local", "fma", str(EXTRACT_DIR), "--eval-ratio", "0.2", "--query-duration", "8.0"])

    print(json.dumps({
        "status": "ok",
        "extract": extract,
        "ready": ready,
        "inspect": inspect,
    }, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()
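As committed, the script resolves `data/raw/fma_small.zip`, `scripts/prepare_fma_archive.py`, and `src/data/external_adapters.py` against the current working directory, so it assumes it is started from `acr-engine` (as the workflow doc does with `cd acr-engine`); the `Tested:` trailer instead invokes it as `acr-engine/scripts/fma_postdownload_ready.py`, where those relative paths point elsewhere. Separately, `subprocess.check_output` raises `CalledProcessError` when a sub-step exits non-zero and `json.loads` raises on non-JSON output, so those failures surface as tracebacks rather than structured results. The following is a minimal sketch of one possible hardening, not part of this commit; the helper names `engine_root` and `run_json_safe` and the `failed` status are hypothetical.

```python
from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path


def engine_root(script_path: str | Path) -> Path:
    """Return the acr-engine directory, assuming the script lives in acr-engine/scripts/."""
    return Path(script_path).resolve().parents[1]


def run_json_safe(cmd: list[str], cwd: Path) -> dict:
    """Run a sub-step that prints JSON on stdout.

    Returns the parsed JSON on success, or a structured failure dict
    (hypothetical "failed" status) instead of raising.
    """
    try:
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as exc:
        # Keep only the tail of stderr so the summary stays readable.
        return {
            "status": "failed",
            "reason": "subprocess_error",
            "cmd": cmd,
            "returncode": exc.returncode,
            "stderr_tail": (exc.stderr or "")[-2000:],
        }
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        return {
            "status": "failed",
            "reason": "invalid_json",
            "cmd": cmd,
            "stdout_tail": proc.stdout[-2000:],
        }


if __name__ == "__main__":
    # Demo: a sub-step that exits non-zero yields a structured result, not a traceback.
    demo = run_json_safe([sys.executable, "-c", "raise SystemExit(3)"], cwd=Path.cwd())
    print(json.dumps(demo, indent=2))
```

Inside the script, `root = engine_root(__file__)` could then replace the cwd-relative paths, for example `root / "data/raw/fma_small.zip"` and `run_json_safe([PYTHON, str(root / "scripts/prepare_fma_archive.py"), "extract"], cwd=root)`, so the command behaves the same from the repository root or from `acr-engine`.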
@@ -236,6 +236,30 @@
 
 
 
+
+### Stage: Automatic readiness after the FMA download completes
+
+Completed:
+- Added [acr-engine/scripts/fma_postdownload_ready.py](../acr-engine/scripts/fma_postdownload_ready.py)
+- Once the full FMA archive is complete, the script automatically runs:
+  - `extract`
+  - `check-local-ready`
+  - `inspect-local`
+- Referenced the script from [docs/open-dataset-workflow.md](./open-dataset-workflow.md) and [docs/session-handoff.md](./session-handoff.md)
+
+Verification:
+- `/usr/local/miniconda3/bin/python -m py_compile scripts/fma_postdownload_ready.py` succeeded
+- `/usr/local/miniconda3/bin/python scripts/fma_postdownload_ready.py` returned a structured result
+- Current result:
+  - `status=blocked`
+  - `reason=archive_not_complete`
+  - `archive_size=1631584256`
+  - `progress_percent=21.2457`
+
+Conclusion:
+- Once the real FMA download completes, the repository can go straight to "extract + local readiness check" with a single command
+- Later sessions no longer need to chain these steps together by hand
+
 ### Stage: pgvector bulk load plan template
 
 Completed:
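The blocked branch of the script is plain arithmetic, so the reported numbers can be checked by hand: with `EXPECTED_BYTES = 7679594875` and `archive_size=1631584256`, progress is 1631584256 / 7679594875 × 100 ≈ 21.2457 after rounding to four decimals. The sketch below factors that branch into a pure function so the same check can be unit-tested without a ~7.7 GB file on disk; `blocked_payload` is a hypothetical name that does not exist in the commit.

```python
from __future__ import annotations

# Same constant as in fma_postdownload_ready.py.
EXPECTED_BYTES = 7679594875


def blocked_payload(archive_exists: bool, archive_size: int,
                    expected_bytes: int = EXPECTED_BYTES) -> dict | None:
    """Return the "blocked" result the script would print, or None when the archive is complete.

    Mirrors the condition in main(): blocked if the file is missing or smaller than expected.
    """
    if archive_exists and archive_size >= expected_bytes:
        return None
    return {
        "status": "blocked",
        "reason": "archive_not_complete",
        "archive_exists": archive_exists,
        "archive_size": archive_size,
        "expected_bytes": expected_bytes,
        "progress_percent": round(archive_size / expected_bytes * 100, 4) if archive_exists else 0.0,
    }


if __name__ == "__main__":
    # Partial-download size reported in the changelog entry.
    print(blocked_payload(True, 1631584256)["progress_percent"])  # 21.2457
```

Keeping the size check separate from the subprocess calls means only `main()` touches the filesystem, which is what makes the reported `progress_percent` reproducible in a test.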
@@ -131,6 +131,22 @@ flowchart LR
 
 ---
 
+
+### Single preparation command after the FMA download completes
+
+```bash
+cd acr-engine
+/usr/local/miniconda3/bin/python scripts/fma_postdownload_ready.py
+```
+
+When the archive is complete, this script automatically runs:
+
+1. `extract`
+2. `check-local-ready`
+3. `inspect-local`
+
+If the archive has not finished downloading, it prints a structured result with `status: blocked` and `reason: archive_not_complete`.
+
 ## Sources
 - See [dataset-spec.md](./dataset-spec.md)
 - See [dataset-sources-and-licensing.md](./dataset-sources-and-licensing.md)
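Because the script always prints a single JSON object, a later session can branch on its `status` field instead of reading the output by eye. A minimal sketch, assuming only the two statuses the script emits (`blocked` and `ok`); the function name `next_step` and the action labels are hypothetical. Note that `ok` only means the three sub-commands ran and printed JSON: whether the data is actually ready lives in the nested `ready` object, whose schema this commit does not show, so the sketch passes it through unparsed.

```python
from __future__ import annotations

import json


def next_step(raw_output: str) -> tuple[str, dict]:
    """Map the script's JSON output to a coarse next action (hypothetical labels)."""
    result = json.loads(raw_output)
    status = result.get("status")
    if status == "blocked":
        # Download still in progress (or archive missing): report progress and wait.
        return "wait", {"progress_percent": result.get("progress_percent", 0.0)}
    if status == "ok":
        # Sub-steps ran; hand the readiness report to the next check unchanged.
        return "review_readiness", result.get("ready", {})
    return "investigate", result


if __name__ == "__main__":
    sample = '{"status": "blocked", "reason": "archive_not_complete", "progress_percent": 21.2457}'
    print(next_step(sample))  # ('wait', {'progress_percent': 21.2457})
```

In practice `raw_output` would come from running the single command above, for example `subprocess.check_output(["/usr/local/miniconda3/bin/python", "scripts/fma_postdownload_ready.py"], cwd="acr-engine", text=True)`.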
@@ -317,3 +317,5 @@
 - [README.md](./README.md)
 - [open-dataset-workflow.md](./open-dataset-workflow.md)
 - [CHANGELOG.md](./CHANGELOG.md)
+
+- After the FMA download completes, run directly: [acr-engine/scripts/fma_postdownload_ready.py](../acr-engine/scripts/fma_postdownload_ready.py)