Commit d2218523 d22185234ed7a636101d364b2d614cd6c803669d by cnb.bofCdSsphPA

Add explicit drop zones for real open-music corpora

Constraint: Replacing the synthetic stand-in with real FMA or MTG-Jamendo data should not require users to infer directory structure
Rejected: Leave only generic workflow text | Still forces users to guess where local audio should live before smoke runs
Confidence: high
Scope-risk: narrow
Directive: Keep future real-corpus onboarding anchored to data/raw drop zones and smoke-local commands
Tested: filesystem existence checks for acr-engine/data/raw/fma_small_audio, acr-engine/data/raw/mtg_jamendo_audio, acr-engine/data/raw/README.md, docs/README.md, docs/open-dataset-workflow.md, acr-engine/data/external_ingested/README.md
Not-tested: Real downloaded audio placed into the new drop zones
1 parent eee15aca
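The existence checks listed under `Tested:` could be reproduced with a short script along these lines. This is a minimal sketch, not the check the author actually ran: the `missing_paths` helper name is hypothetical, and the paths are assumed to be relative to the repository root.

```python
from pathlib import Path

# Paths copied from the Tested: trailer of this commit.
# Assumption: they are relative to the repository root.
EXPECTED_PATHS = [
    "acr-engine/data/raw/fma_small_audio",
    "acr-engine/data/raw/mtg_jamendo_audio",
    "acr-engine/data/raw/README.md",
    "docs/README.md",
    "docs/open-dataset-workflow.md",
    "acr-engine/data/external_ingested/README.md",
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root.

    Hypothetical helper for illustration; it only checks existence,
    not whether a path is a file or a directory.
    """
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_paths(".")` from the repository root returns an empty list when every drop zone and README is in place.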
@@ -13,6 +13,9 @@ Examples:
13 - [data/raw/fma_small_audio/](../raw/fma_small_audio/)
14 - [data/raw/mtg_jamendo_audio/](../raw/mtg_jamendo_audio/)
15
16 Drop-zone details:
17 - [data/raw/README.md](../raw/README.md)
18
19 ### 2. Generate manifests through the adapter entrypoint
20 Optional pre-check:
21 ```bash
......
1 # Local Open-Music Drop Zones
2
3 Put real downloaded open-music audio files here before running the one-shot smoke flow.
4
5 ## Recommended folders
6 - `data/raw/fma_small_audio/`
7 - `data/raw/mtg_jamendo_audio/`
8
9 ## Next command
10 For FMA:
11 ```bash
12 /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2
13 ```
14
15 For MTG-Jamendo:
16 ```bash
17 /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local mtg_jamendo data/raw/mtg_jamendo_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2
18 ```
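Before running either command, it may help to confirm that the drop zone actually holds audio. The sketch below counts audio files under a folder and assembles the same `smoke-local` argument list shown above. The helper names, the extension set, and the use of `sys.executable` in place of the hard-coded interpreter path are assumptions for illustration; none of them come from `external_adapters.py`.

```python
import sys
from pathlib import Path

# Assumed extension set; adjust to whatever the adapter actually accepts.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}

# Drop zones named in this README, keyed by the dataset argument of smoke-local.
DROP_ZONES = {
    "fma": "data/raw/fma_small_audio",
    "mtg_jamendo": "data/raw/mtg_jamendo_audio",
}


def count_audio_files(folder):
    """Count files under folder (recursively) whose suffix looks like audio."""
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(
        1 for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def smoke_local_command(dataset, python=sys.executable):
    """Build the smoke-local argv with the flag values used in this README."""
    return [
        python, "src/data/external_adapters.py", "smoke-local",
        dataset, DROP_ZONES[dataset],
        "--output-root", "data/external_smoke",
        "--eval-ratio", "0.2",
        "--query-duration", "8.0",
        "--train-epochs", "1",
        "--batch-size", "2",
    ]
```

If `count_audio_files(DROP_ZONES["fma"])` returns 0, the folder is still empty. Otherwise the list from `smoke_local_command("fma")` can be passed to `subprocess.run`, assuming the working directory is the one the relative paths above expect.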
@@ -144,6 +144,26 @@
144 - Replacing `input_dir` is now enough to run the full smoke flow against a real local FMA / MTG-Jamendo directory
145 - This significantly lowers the cost of onboarding and validating real open datasets
146
147 ### Stage: Directory template for real open-data drop zones
148
149 Completed:
150 - Added [acr-engine/data/raw/README.md](../acr-engine/data/raw/README.md)
151 - Created local open-data drop-zone directories:
152   - [acr-engine/data/raw/fma_small_audio/](../acr-engine/data/raw/fma_small_audio/)
153   - [acr-engine/data/raw/mtg_jamendo_audio/](../acr-engine/data/raw/mtg_jamendo_audio/)
154 - Linked these directories from the open-dataset workflow and the docs index
155
156 Verification results:
157 - Local directories created:
158   - `data/raw/`
159   - `data/raw/fma_small_audio/`
160   - `data/raw/mtg_jamendo_audio/`
161 - `data/raw/README.md` now contains a ready-to-run template for the next smoke command
162
163 Conclusion:
164 - Real open data now only needs to be placed in a clearly defined directory
165 - Swapping in real local FMA / MTG-Jamendo audio later no longer requires guessing the directory structure
166
167 ### Stage: confused targeted optimization v6 (sample-level weighting)
168
169 Completed:
......
@@ -63,6 +63,10 @@ flowchart TD
63 - [Dataset sources and onboarding](./dataset-sources-and-licensing.md)
64 - [Industrial benchmark spec](./industrial-benchmark-spec.md)
65
66 Quick-start entry points:
67 - [Open-dataset workflow](./open-dataset-workflow.md)
68 - [Local open-data drop zones](../acr-engine/data/raw/README.md)
69
70 ### C. Service and engineering
71 - [Service API](./service-api.md)
72 - [Changelog](./CHANGELOG.md)
......
@@ -69,6 +69,11 @@ flowchart LR
69 /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2
70 ```
71
72 For where to place real data directories, see:
73 - [acr-engine/data/raw/README.md](../acr-engine/data/raw/README.md)
74 - [acr-engine/data/raw/fma_small_audio/](../acr-engine/data/raw/fma_small_audio/)
75 - [acr-engine/data/raw/mtg_jamendo_audio/](../acr-engine/data/raw/mtg_jamendo_audio/)
76
77 ---
78
79 ## 4. Output artifacts
......