Commit d2218523 d22185234ed7a636101d364b2d614cd6c803669d by cnb.bofCdSsphPA

Add explicit drop zones for real open-music corpora

Constraint: Replacing the synthetic stand-in with real FMA or MTG-Jamendo data should not require users to infer directory structure
Rejected: Leave only generic workflow text | Still forces users to guess where local audio should live before smoke runs
Confidence: high
Scope-risk: narrow
Directive: Keep future real-corpus onboarding anchored to data/raw drop zones and smoke-local commands
Tested: filesystem existence checks for acr-engine/data/raw/fma_small_audio, acr-engine/data/raw/mtg_jamendo_audio, acr-engine/data/raw/README.md, docs/README.md, docs/open-dataset-workflow.md, acr-engine/data/external_ingested/README.md
Not-tested: Real downloaded audio placed into the new drop zones
1 parent eee15aca
......@@ -13,6 +13,9 @@ Examples:
- [data/raw/fma_small_audio/](../raw/fma_small_audio/)
- [data/raw/mtg_jamendo_audio/](../raw/mtg_jamendo_audio/)
Drop-zone details:
- [data/raw/README.md](../raw/README.md)
### 2. Generate manifests through the adapter entrypoint
Optional pre-check:
```bash
......
# Local Open-Music Drop Zones
Put real downloaded open-music audio files here before running the one-shot smoke flow.
## Recommended folders
- `data/raw/fma_small_audio/`
- `data/raw/mtg_jamendo_audio/`
## Next command
For FMA:
```bash
/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2
```
For MTG-Jamendo:
```bash
/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local mtg_jamendo data/raw/mtg_jamendo_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2
```
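Before launching either command, it may help to confirm that the chosen drop zone actually contains audio. The following is a minimal sketch, not part of `external_adapters.py`: the helper name `count_audio_files` and the extension set are assumptions made for illustration, and the adapter may accept a different set of formats.

```python
from pathlib import Path

# Assumption: illustrative extensions only; the real adapter may accept others.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}


def count_audio_files(drop_zone: str) -> int:
    """Count files under drop_zone (recursively) whose suffix looks like audio.

    Hypothetical helper: returns 0 when the directory does not exist, so the
    caller can treat "missing" and "empty" the same way.
    """
    root = Path(drop_zone)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )
```

A result of `0` for `data/raw/fma_small_audio` or `data/raw/mtg_jamendo_audio` would suggest that no real audio has been placed in the drop zone yet.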
......@@ -144,6 +144,26 @@
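- Swapping `input_dir` is now enough to run the full smoke flow against a real local FMA / MTG-Jamendo directory
- This significantly lowers the cost of onboarding and validating real open datasets
### Stage: Drop-zone directory template for real open data
Completed:
- Added [acr-engine/data/raw/README.md](../acr-engine/data/raw/README.md)
- Created local drop-zone directories for open data:
- [acr-engine/data/raw/fma_small_audio/](../acr-engine/data/raw/fma_small_audio/)
- [acr-engine/data/raw/mtg_jamendo_audio/](../acr-engine/data/raw/mtg_jamendo_audio/)
- Linked these directory entries from the open-dataset workflow and the main docs index
Verification results:
- Local directories created:
- `data/raw/`
- `data/raw/fma_small_audio/`
- `data/raw/mtg_jamendo_audio/`
- `data/raw/README.md` now contains a ready-to-run template for the next smoke command
Conclusion:
- Real open data now only needs to be placed in a clearly designated directory
- Swapping in real local FMA / MTG-Jamendo audio later no longer requires guessing the directory structure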
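The filesystem existence checks behind these verification results could be reproduced with a sketch along the following lines. The path list is copied from the `Tested:` line of the commit message; the helper name `missing_paths` and the idea of passing the repository root as an argument are assumptions for illustration, not the check that was actually run.

```python
from pathlib import Path
from typing import List

# Paths from the commit's "Tested:" line, relative to the repository root.
EXPECTED_PATHS = [
    "acr-engine/data/raw/fma_small_audio",
    "acr-engine/data/raw/mtg_jamendo_audio",
    "acr-engine/data/raw/README.md",
    "docs/README.md",
    "docs/open-dataset-workflow.md",
    "acr-engine/data/external_ingested/README.md",
]


def missing_paths(repo_root: str) -> List[str]:
    """Return the expected paths that do not exist under repo_root.

    Hypothetical helper: an empty list means every drop zone and doc is present.
    """
    root = Path(repo_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

An empty return value corresponds to the "local directories created" result above; it says nothing about whether real audio has been placed in the drop zones.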
### Stage: confused targeted optimization v6 (sample-level weighting)
Completed:
......
......@@ -63,6 +63,10 @@ flowchart TD
- [Data sources and onboarding](./dataset-sources-and-licensing.md)
- [Industrial benchmark spec](./industrial-benchmark-spec.md)
Quick-start entry points:
- [Open-dataset workflow](./open-dataset-workflow.md)
- [Local open-data drop zones](../acr-engine/data/raw/README.md)
### C. Services and engineering
- [Service API](./service-api.md)
- [Changelog](./CHANGELOG.md)
......
......@@ -69,6 +69,11 @@ flowchart LR
/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2
```
For where to place the real directories, see:
- [acr-engine/data/raw/README.md](../acr-engine/data/raw/README.md)
- [acr-engine/data/raw/fma_small_audio/](../acr-engine/data/raw/fma_small_audio/)
- [acr-engine/data/raw/mtg_jamendo_audio/](../acr-engine/data/raw/mtg_jamendo_audio/)
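Because the two smoke commands differ only in the dataset name and its drop zone, the mapping can be written down once. The sketch below is illustrative only: `DROP_ZONES` and `build_smoke_command` are hypothetical names, the flags and values are copied from the command shown in this document, and it assumes the command is run from the `acr-engine/` directory.

```python
from typing import List

# Drop zones introduced by this change, keyed by the dataset argument of smoke-local.
DROP_ZONES = {
    "fma": "data/raw/fma_small_audio",
    "mtg_jamendo": "data/raw/mtg_jamendo_audio",
}


def build_smoke_command(
    dataset: str,
    python_bin: str = "/usr/local/miniconda3/bin/python",
) -> List[str]:
    """Build the smoke-local argv for a dataset (hypothetical helper)."""
    if dataset not in DROP_ZONES:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        python_bin,
        "src/data/external_adapters.py",
        "smoke-local",
        dataset,
        DROP_ZONES[dataset],  # input directory, i.e. the drop zone
        "--output-root", "data/external_smoke",
        "--eval-ratio", "0.2",
        "--query-duration", "8.0",
        "--train-epochs", "1",
        "--batch-size", "2",
    ]
```

The returned list could be passed to `subprocess.run` or joined with spaces to reproduce the documented command line.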
---
## 4. Output artifacts
......