Commit 7ea9b1d020663d259f3eeb968f396c216f937e70 by cnb.bofCdSsphPA

Separate local tooling issues from upstream FMA URL breakage

Constraint: Real-data progress requires proving whether failures come from our environment or from changed upstream access paths
Rejected: Keep treating the fetch blocker as a missing-tool problem | Would misdirect future debugging after yt-dlp module support was verified
Confidence: high
Scope-risk: narrow
Directive: Do not retry the historical FMA page URLs unless a fresh source confirms they work again; pivot to official archives or stable mirrors instead
Tested: which yt-dlp || true; /usr/local/miniconda3/bin/python -m yt_dlp --version; /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/fetch_fma_subset.py; /usr/local/miniconda3/bin/python acr-engine/scripts/fetch_fma_subset.py --report acr-engine/reports/fma_fetch_subset_report.json
Not-tested: Successful real FMA download still pending a valid upstream archive or mirror URL
1 parent b32e002b
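The Directive requires fresh confirmation before anyone retries the historical page URLs. A status probe is one cheap way to get that confirmation. The sketch below makes several assumptions:

- A `HEAD` request is representative of what yt-dlp will see. This may not hold: some servers answer `HEAD` differently from `GET`, and yt-dlp downloads the full page.
- `url_status` and `historical_urls_back` are hypothetical helpers. They are not part of `fetch_fma_subset.py`.
- Network failures (`urllib.error.URLError` other than HTTP errors) are deliberately left to propagate.

```python
import urllib.error
import urllib.request
from typing import Iterable


def url_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, including 4xx/5xx codes."""
    # Assumption: HEAD is an acceptable stand-in for the full page fetch yt-dlp performs.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urlopen raises on 4xx/5xx; the code itself is the signal we want.
        return exc.code


def historical_urls_back(urls: Iterable[str], timeout: float = 10.0) -> bool:
    """True only if every probed URL answers 200; anything else keeps the directive in force."""
    return all(url_status(url, timeout) == 200 for url in urls)
```

For example, calling `historical_urls_back(["https://freemusicarchive.org/music/track/2"])` should return `True` before the bounded fetch is re-enabled against the old page paths.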
@@ -2,105 +2,156 @@
   "output_dir": "/workspace/acr-engine/data/raw/fma_small_audio",
   "requested": 8,
   "downloaded": 0,
-  "existing": 0,
-  "failures": [
-    {
-      "track_id": 2,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000002.mp3"
-    },
-    {
-      "track_id": 5,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000005.mp3"
-    },
-    {
-      "track_id": 10,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000010.mp3"
-    },
-    {
-      "track_id": 20,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000020.mp3"
-    },
-    {
-      "track_id": 26,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000026.mp3"
-    },
-    {
-      "track_id": 30,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000030.mp3"
-    },
-    {
-      "track_id": 46,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000046.mp3"
-    },
-    {
-      "track_id": 48,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000048.mp3"
-    }
+  "failed": 8,
+  "ytdlp_cmd": [
+    "/usr/local/miniconda3/bin/python",
+    "-m",
+    "yt_dlp"
   ],
   "results": [
     {
       "track_id": 2,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000002.mp3"
+      "url": "https://freemusicarchive.org/music/track/2",
+      "status": "failed",
+      "returncode": 1,
+      "stdout": "[generic] Extracting URL: https://freemusicarchive.org/music/track/2\n[generic] 2: Downloading webpage\n",
+      "stderr": "ERROR: [generic] Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)\n",
+      "command": [
+        "/usr/local/miniconda3/bin/python",
+        "-m",
+        "yt_dlp",
+        "--no-playlist",
+        "-o",
+        "data/raw/fma_small_audio/%(id)s.%(ext)s",
+        "--no-overwrites",
+        "https://freemusicarchive.org/music/track/2"
+      ]
     },
     {
       "track_id": 5,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000005.mp3"
+      "url": "https://freemusicarchive.org/music/track/5",
+      "status": "failed",
+      "returncode": 1,
+      "stdout": "[generic] Extracting URL: https://freemusicarchive.org/music/track/5\n[generic] 5: Downloading webpage\n",
+      "stderr": "ERROR: [generic] Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)\n",
+      "command": [
+        "/usr/local/miniconda3/bin/python",
+        "-m",
+        "yt_dlp",
+        "--no-playlist",
+        "-o",
+        "data/raw/fma_small_audio/%(id)s.%(ext)s",
+        "--no-overwrites",
+        "https://freemusicarchive.org/music/track/5"
+      ]
     },
     {
       "track_id": 10,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000010.mp3"
+      "url": "https://freemusicarchive.org/music/track/10",
+      "status": "failed",
+      "returncode": 1,
+      "stdout": "[generic] Extracting URL: https://freemusicarchive.org/music/track/10\n[generic] 10: Downloading webpage\n",
+      "stderr": "ERROR: [generic] Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)\n",
+      "command": [
+        "/usr/local/miniconda3/bin/python",
+        "-m",
+        "yt_dlp",
+        "--no-playlist",
+        "-o",
+        "data/raw/fma_small_audio/%(id)s.%(ext)s",
+        "--no-overwrites",
+        "https://freemusicarchive.org/music/track/10"
+      ]
     },
     {
       "track_id": 20,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000020.mp3"
+      "url": "https://freemusicarchive.org/music/track/20",
+      "status": "failed",
+      "returncode": 1,
+      "stdout": "[generic] Extracting URL: https://freemusicarchive.org/music/track/20\n[generic] 20: Downloading webpage\n",
+      "stderr": "ERROR: [generic] Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)\n",
+      "command": [
+        "/usr/local/miniconda3/bin/python",
+        "-m",
+        "yt_dlp",
+        "--no-playlist",
+        "-o",
+        "data/raw/fma_small_audio/%(id)s.%(ext)s",
+        "--no-overwrites",
+        "https://freemusicarchive.org/music/track/20"
+      ]
     },
     {
       "track_id": 26,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000026.mp3"
+      "url": "https://freemusicarchive.org/music/track/26",
+      "status": "failed",
+      "returncode": 1,
+      "stdout": "[generic] Extracting URL: https://freemusicarchive.org/music/track/26\n[generic] 26: Downloading webpage\n",
+      "stderr": "ERROR: [generic] Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)\n",
+      "command": [
+        "/usr/local/miniconda3/bin/python",
+        "-m",
+        "yt_dlp",
+        "--no-playlist",
+        "-o",
+        "data/raw/fma_small_audio/%(id)s.%(ext)s",
+        "--no-overwrites",
+        "https://freemusicarchive.org/music/track/26"
+      ]
     },
     {
       "track_id": 30,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000030.mp3"
+      "url": "https://freemusicarchive.org/music/track/30",
+      "status": "failed",
+      "returncode": 1,
+      "stdout": "[generic] Extracting URL: https://freemusicarchive.org/music/track/30\n[generic] 30: Downloading webpage\n",
+      "stderr": "ERROR: [generic] Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)\n",
+      "command": [
+        "/usr/local/miniconda3/bin/python",
+        "-m",
+        "yt_dlp",
+        "--no-playlist",
+        "-o",
+        "data/raw/fma_small_audio/%(id)s.%(ext)s",
+        "--no-overwrites",
+        "https://freemusicarchive.org/music/track/30"
+      ]
     },
     {
       "track_id": 46,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000046.mp3"
+      "url": "https://freemusicarchive.org/music/track/46",
+      "status": "failed",
+      "returncode": 1,
+      "stdout": "[generic] Extracting URL: https://freemusicarchive.org/music/track/46\n[generic] 46: Downloading webpage\n",
+      "stderr": "ERROR: [generic] Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)\n",
+      "command": [
+        "/usr/local/miniconda3/bin/python",
+        "-m",
+        "yt_dlp",
+        "--no-playlist",
+        "-o",
+        "data/raw/fma_small_audio/%(id)s.%(ext)s",
+        "--no-overwrites",
+        "https://freemusicarchive.org/music/track/46"
+      ]
     },
     {
       "track_id": 48,
-      "status": "http_error",
-      "code": 403,
-      "url": "https://files.freemusicarchive.org/storage-freemusicarchive-org/music/000/000048.mp3"
+      "url": "https://freemusicarchive.org/music/track/48",
+      "status": "failed",
+      "returncode": 1,
+      "stdout": "[generic] Extracting URL: https://freemusicarchive.org/music/track/48\n[generic] 48: Downloading webpage\n",
+      "stderr": "ERROR: [generic] Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)\n",
+      "command": [
+        "/usr/local/miniconda3/bin/python",
+        "-m",
+        "yt_dlp",
+        "--no-playlist",
+        "-o",
+        "data/raw/fma_small_audio/%(id)s.%(ext)s",
+        "--no-overwrites",
+        "https://freemusicarchive.org/music/track/48"
+      ]
     }
   ]
 }
\ No newline at end of file
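A small triage pass over a report like the one above can keep the two failure classes apart. It buckets each entry by where its error originates. This is a sketch under two assumptions:

- yt-dlp surfaces server errors in `stderr` as `HTTP Error <code>`, as in the captured output.
- The older direct-link report format stores the status in an integer `code` field.

The function names (`classify_result`, `triage_report`) and the bucket labels are hypothetical.

```python
import re
from collections import Counter

# Assumption: yt-dlp prints server-side failures as "HTTP Error <code>: ...",
# matching the stderr captured in the report.
HTTP_ERROR_RE = re.compile(r"HTTP Error (\d{3})")


def classify_result(result: dict) -> str:
    """Bucket one "results" entry (hypothetical bucket names)."""
    if result.get("status") == "downloaded":
        return "ok"
    # Older report format: the direct-link fetch stored the HTTP status in "code".
    if isinstance(result.get("code"), int):
        return f"upstream_http_{result['code']}"
    # Newer report format: the HTTP status only appears in yt-dlp's stderr.
    match = HTTP_ERROR_RE.search(result.get("stderr") or "")
    if match:
        return f"upstream_http_{match.group(1)}"
    # No HTTP status at all: missing tool, bad flag, or another local problem.
    return "local_or_unknown"


def triage_report(report: dict) -> Counter:
    """Count buckets across a report's "results" list."""
    return Counter(classify_result(entry) for entry in report.get("results", []))
```

Applied to the new report above, all eight entries land in `upstream_http_404`. The previous direct-link entries would all land in `upstream_http_403`. Neither bucket points at the local toolchain.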
@@ -11,24 +11,29 @@ from pathlib import Path
 DEFAULT_TRACK_IDS = [2, 5, 10, 20, 26, 30, 46, 48]
 FMA_TRACK_URL = "https://freemusicarchive.org/music/track/{track_id}"
+PYTHON = "/usr/local/miniconda3/bin/python"
-def ensure_ytdlp() -> str:
-    path = shutil.which("yt-dlp")
-    if not path:
+def ensure_ytdlp_cmd() -> list[str]:
+    shell_entry = shutil.which("yt-dlp")
+    if shell_entry:
+        return [shell_entry]
+    probe = subprocess.run([PYTHON, "-m", "yt_dlp", "--version"], text=True, capture_output=True)
+    if probe.returncode == 0:
+        return [PYTHON, "-m", "yt_dlp"]
     raise SystemExit(json.dumps({
         "status": "blocked",
         "reason": "yt_dlp_missing",
         "recommendation": "Install yt-dlp or provide local FMA audio manually into data/raw/fma_small_audio",
+        "stderr": probe.stderr[-1200:],
     }, indent=2, ensure_ascii=False))
-    return path
-def fetch_one(track_id: int, output_dir: Path, ytdlp: str, overwrite: bool = False) -> dict:
+def fetch_one(track_id: int, output_dir: Path, ytdlp_cmd: list[str], overwrite: bool = False) -> dict:
     outtmpl = str(output_dir / "%(id)s.%(ext)s")
     url = FMA_TRACK_URL.format(track_id=track_id)
     cmd = [
-        ytdlp,
+        *ytdlp_cmd,
         "--no-playlist",
         "-o", outtmpl,
     ]
@@ -43,6 +48,7 @@ def fetch_one(track_id: int, output_dir: Path, ytdlp: str, overwrite: bool = Fal
         "returncode": proc.returncode,
         "stdout": proc.stdout[-1200:],
         "stderr": proc.stderr[-1200:],
+        "command": cmd,
     }
@@ -56,14 +62,15 @@ def main():
     output_dir = Path(args.output_dir)
     output_dir.mkdir(parents=True, exist_ok=True)
-    ytdlp = ensure_ytdlp()
+    ytdlp_cmd = ensure_ytdlp_cmd()
-    results = [fetch_one(track_id, output_dir, ytdlp, overwrite=args.overwrite) for track_id in args.track_ids]
+    results = [fetch_one(track_id, output_dir, ytdlp_cmd, overwrite=args.overwrite) for track_id in args.track_ids]
     summary = {
         "output_dir": str(output_dir.resolve()),
         "requested": len(args.track_ids),
         "downloaded": sum(1 for x in results if x["status"] == "downloaded"),
         "failed": sum(1 for x in results if x["status"] != "downloaded"),
+        "ytdlp_cmd": ytdlp_cmd,
         "results": results,
     }
     text = json.dumps(summary, indent=2, ensure_ascii=False)
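The resolution order above can be isolated into small functions. So can the argument layout that the report records under `"command"`. Isolated this way, both can be tested without yt-dlp installed. This sketch rests on a few assumptions and hypothetical names:

- The interpreter defaults to `sys.executable` rather than the hard-coded miniconda path.
- A missing interpreter is treated as "module unavailable" instead of raising.
- `--no-overwrites` is appended only when `overwrite` is false. That part of `fetch_one` is not shown in the diff and is inferred from the recorded command.
- `resolve_ytdlp_cmd`, `module_available` and `build_cmd` are hypothetical names.

```python
import shutil
import subprocess
import sys
from typing import Callable, List, Optional


def module_available(python: str) -> bool:
    """Return True if `<python> -m yt_dlp --version` exits with status 0."""
    try:
        probe = subprocess.run(
            [python, "-m", "yt_dlp", "--version"], text=True, capture_output=True
        )
    except OSError:
        # The interpreter path does not exist or is not executable.
        return False
    return probe.returncode == 0


def resolve_ytdlp_cmd(
    python: str = sys.executable,  # Assumption: running interpreter, not a fixed path.
    which: Callable[[str], Optional[str]] = shutil.which,
    module_ok: Callable[[str], bool] = module_available,
) -> List[str]:
    """Prefer a `yt-dlp` executable on PATH, then fall back to `python -m yt_dlp`."""
    shell_entry = which("yt-dlp")
    if shell_entry:
        return [shell_entry]
    if module_ok(python):
        return [python, "-m", "yt_dlp"]
    raise SystemExit("yt-dlp is neither on PATH nor importable by " + python)


def build_cmd(ytdlp_cmd: List[str], outtmpl: str, url: str, overwrite: bool = False) -> List[str]:
    """Assemble the argument list stored in the report's "command" field."""
    cmd = [*ytdlp_cmd, "--no-playlist", "-o", outtmpl]
    # Assumption: inferred from the recorded command; not shown in the diff.
    if not overwrite:
        cmd.append("--no-overwrites")
    cmd.append(url)
    return cmd
```

Consider `build_cmd(["/usr/local/miniconda3/bin/python", "-m", "yt_dlp"], "data/raw/fma_small_audio/%(id)s.%(ext)s", "https://freemusicarchive.org/music/track/2")`. It reproduces the `command` list recorded for track 2. Injecting `which` and `module_ok` lets the fallback path be checked on a machine where yt-dlp is absent.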
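@@ -224,6 +224,30 @@
+### Stage: FMA downloader module-invocation fix
+Completed:
+- Fixed the `yt-dlp` detection in [acr-engine/scripts/fetch_fma_subset.py](../acr-engine/scripts/fetch_fma_subset.py)
+  - Switched from a shell-only executable lookup to trying, in order:
+    - the `yt-dlp` executable
+    - the `/usr/local/miniconda3/bin/python -m yt_dlp` module invocation
+- Re-ran the bounded real-data FMA download check
+Verification results:
+- `which yt-dlp` still returns nothing
+- `/usr/local/miniconda3/bin/python -m yt_dlp --version` succeeds and reports version `2026.03.17`
+- `/usr/local/miniconda3/bin/python scripts/fetch_fma_subset.py --report reports/fma_fetch_subset_report.json` runs to completion
+- Current results:
+  - `ytdlp_cmd=["/usr/local/miniconda3/bin/python", "-m", "yt_dlp"]`
+  - 8/8 requests fail
+  - The failure cause has narrowed from "missing tool" to `freemusicarchive.org/music/track/<id>` returning `HTTP 404`
+Conclusions:
+- The download script's module-invocation problem is fixed
+- The real blocker is no longer the local environment; the historical FMA page URL path is no longer available
+- The next step is to switch to the official full archive or a stable mirror instead of retrying the old page URLs
 ### Stage: FMA real-download scaffolding
 Completed: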
@@ -276,7 +276,7 @@
 - [docs/session-handoff.md](./session-handoff.md)
 - [docs/current-capability-map.md](./current-capability-map.md)
 - [acr-engine/FIRST_RUN_CHECKLIST.md](../acr-engine/FIRST_RUN_CHECKLIST.md)
-- The FMA real-subset download scaffold exists: [acr-engine/scripts/fetch_fma_subset.py](../acr-engine/scripts/fetch_fma_subset.py); the latest check showed the old direct links returning `403`, and the current environment lacks `yt-dlp`
+- The FMA real-subset download scaffold exists: [acr-engine/scripts/fetch_fma_subset.py](../acr-engine/scripts/fetch_fma_subset.py); the latest check showed the old direct links returning `403`; the script has since been fixed to invoke `python -m yt_dlp`, but the historical page-level URLs now return `404`
 - Run [acr-engine/scripts/status_snapshot.py](../acr-engine/scripts/status_snapshot.py)
 - Or read the latest on-disk snapshot directly: `acr-engine/.omx/latest_status_snapshot.json`
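The stage conclusion points at the official full archive instead of per-track URLs. Once such an archive is on disk, pulling the bounded subset out of it needs no network access. This sketch makes the following assumptions:

- The archive is a zip whose members follow the FMA dataset layout `fma_small/<first three digits>/<six-digit zero-padded id>.mp3`. This is the same `000/000002.mp3` pattern as the old direct links in the report above.
- The function names (`fma_member_name`, `extract_subset`) and the output file naming are hypothetical.
- Not every ID in `DEFAULT_TRACK_IDS` is guaranteed to be in a given subset, so absent members are reported rather than treated as errors.

```python
import shutil
import zipfile
from pathlib import Path
from typing import Dict, Iterable, List


def fma_member_name(track_id: int, prefix: str = "fma_small") -> str:
    """Archive path for a track, e.g. 2 -> 'fma_small/000/000002.mp3' (assumed layout)."""
    tid = f"{track_id:06d}"
    return f"{prefix}/{tid[:3]}/{tid}.mp3"


def extract_subset(
    archive: Path, track_ids: Iterable[int], output_dir: Path, prefix: str = "fma_small"
) -> Dict[str, List[int]]:
    """Copy the requested tracks out of a local zip and report which IDs were absent."""
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted: List[int] = []
    missing: List[int] = []
    with zipfile.ZipFile(archive) as zf:
        members = set(zf.namelist())
        for track_id in track_ids:
            member = fma_member_name(track_id, prefix)
            if member not in members:
                missing.append(track_id)
                continue
            # Assumption: hypothetical naming; the yt-dlp path used "%(id)s.%(ext)s".
            target = output_dir / f"{track_id:06d}.mp3"
            # Stream the member to disk instead of reading it fully into memory.
            with zf.open(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(track_id)
    return {"extracted": extracted, "missing": missing}
```

The returned dict has the same shape as the per-track fetch summary. This keeps a missing subset member distinguishable from a download failure.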