Commit fa231444 fa2314445d2af3ac0518ae6139dcdfa2a31b29e9 by cnb.bofCdSsphPA

Add a single-page open dataset workflow for training prep

Constraint: Open-dataset onboarding needed one short executable path instead of scattered instructions across many docs
Rejected: Leave ingestion knowledge split across multiple pages only | Raises setup friction before real FMA or MTG-Jamendo training
Confidence: high
Scope-risk: narrow
Directive: Use the single-page workflow as the default operator path before adding more open-dataset sources
Tested: /usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-local fma data/synthetic_v2/songs --eval-ratio 0.2 --query-duration 5.0; /usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma data/synthetic_v2/songs --output-root data/external_ingested/synthetic_as_open --eval-ratio 0.2 --query-duration 5.0; /usr/local/miniconda3/bin/python src/data/external_adapters.py validate-local fma data/external_ingested/synthetic_as_open/fma/manifests
Not-tested: Real FMA or MTG-Jamendo local download directories
1 parent af33be35
1 [
2 {
3 "song_id": "fma_00000",
4 "audio_path": "songs/song_0000.wav",
5 "duration": 15.0,
6 "type": "reference",
7 "source_dataset": "fma"
8 },
9 {
10 "song_id": "fma_00001",
11 "audio_path": "songs/song_0001.wav",
12 "duration": 15.0,
13 "type": "reference",
14 "source_dataset": "fma"
15 },
16 {
17 "song_id": "fma_00002",
18 "audio_path": "songs/song_0002.wav",
19 "duration": 15.0,
20 "type": "reference",
21 "source_dataset": "fma"
22 },
23 {
24 "song_id": "fma_00003",
25 "audio_path": "songs/song_0003.wav",
26 "duration": 15.0,
27 "type": "reference",
28 "source_dataset": "fma"
29 },
30 {
31 "song_id": "fma_00004",
32 "audio_path": "songs/song_0004.wav",
33 "duration": 15.0,
34 "type": "reference",
35 "source_dataset": "fma"
36 },
37 {
38 "song_id": "fma_00005",
39 "audio_path": "songs/song_0005.wav",
40 "duration": 15.0,
41 "type": "reference",
42 "source_dataset": "fma"
43 },
44 {
45 "song_id": "fma_00006",
46 "audio_path": "songs/song_0006.wav",
47 "duration": 15.0,
48 "type": "reference",
49 "source_dataset": "fma"
50 },
51 {
52 "song_id": "fma_00007",
53 "audio_path": "songs/song_0007.wav",
54 "duration": 15.0,
55 "type": "reference",
56 "source_dataset": "fma"
57 },
58 {
59 "song_id": "fma_00008",
60 "audio_path": "songs/song_0008.wav",
61 "duration": 15.0,
62 "type": "reference",
63 "source_dataset": "fma"
64 },
65 {
66 "song_id": "fma_00009",
67 "audio_path": "songs/song_0009.wav",
68 "duration": 15.0,
69 "type": "reference",
70 "source_dataset": "fma"
71 },
72 {
73 "song_id": "fma_00010",
74 "audio_path": "songs/song_0010.wav",
75 "duration": 15.0,
76 "type": "reference",
77 "source_dataset": "fma"
78 },
79 {
80 "song_id": "fma_00011",
81 "audio_path": "songs/song_0011.wav",
82 "duration": 15.0,
83 "type": "reference",
84 "source_dataset": "fma"
85 },
86 {
87 "song_id": "fma_00012",
88 "audio_path": "songs/song_0012.wav",
89 "duration": 15.0,
90 "type": "reference",
91 "source_dataset": "fma"
92 },
93 {
94 "song_id": "fma_00013",
95 "audio_path": "songs/song_0013.wav",
96 "duration": 15.0,
97 "type": "reference",
98 "source_dataset": "fma"
99 },
100 {
101 "song_id": "fma_00014",
102 "audio_path": "songs/song_0014.wav",
103 "duration": 15.0,
104 "type": "reference",
105 "source_dataset": "fma"
106 },
107 {
108 "song_id": "fma_00015",
109 "audio_path": "songs/song_0015.wav",
110 "duration": 15.0,
111 "type": "reference",
112 "source_dataset": "fma"
113 },
114 {
115 "song_id": "fma_00016",
116 "audio_path": "songs/song_0016.wav",
117 "duration": 15.0,
118 "type": "reference",
119 "source_dataset": "fma"
120 },
121 {
122 "song_id": "fma_00017",
123 "audio_path": "songs/song_0017.wav",
124 "duration": 15.0,
125 "type": "reference",
126 "source_dataset": "fma"
127 },
128 {
129 "song_id": "fma_00018",
130 "audio_path": "songs/song_0018.wav",
131 "duration": 15.0,
132 "type": "reference",
133 "source_dataset": "fma"
134 },
135 {
136 "song_id": "fma_00019",
137 "audio_path": "songs/song_0019.wav",
138 "duration": 15.0,
139 "type": "reference",
140 "source_dataset": "fma"
141 },
142 {
143 "song_id": "fma_00020",
144 "audio_path": "songs/song_0020.wav",
145 "duration": 15.0,
146 "type": "reference",
147 "source_dataset": "fma"
148 },
149 {
150 "song_id": "fma_00021",
151 "audio_path": "songs/song_0021.wav",
152 "duration": 15.0,
153 "type": "reference",
154 "source_dataset": "fma"
155 },
156 {
157 "song_id": "fma_00022",
158 "audio_path": "songs/song_0022.wav",
159 "duration": 15.0,
160 "type": "reference",
161 "source_dataset": "fma"
162 },
163 {
164 "song_id": "fma_00023",
165 "audio_path": "songs/song_0023.wav",
166 "duration": 15.0,
167 "type": "reference",
168 "source_dataset": "fma"
169 }
170 ]
...\ No newline at end of file ...\ No newline at end of file
1 [
2 {
3 "song_id": "fma_00000",
4 "audio_path": "songs/song_0000.wav",
5 "duration": 5.0,
6 "type": "clean",
7 "offset": 6.394,
8 "segment_type": "external_query",
9 "source_dataset": "fma"
10 },
11 {
12 "song_id": "fma_00003",
13 "audio_path": "songs/song_0003.wav",
14 "duration": 5.0,
15 "type": "clean",
16 "offset": 8.922,
17 "segment_type": "external_query",
18 "source_dataset": "fma"
19 },
20 {
21 "song_id": "fma_00004",
22 "audio_path": "songs/song_0004.wav",
23 "duration": 5.0,
24 "type": "clean",
25 "offset": 4.219,
26 "segment_type": "external_query",
27 "source_dataset": "fma"
28 },
29 {
30 "song_id": "fma_00006",
31 "audio_path": "songs/song_0006.wav",
32 "duration": 5.0,
33 "type": "clean",
34 "offset": 0.265,
35 "segment_type": "external_query",
36 "source_dataset": "fma"
37 },
38 {
39 "song_id": "fma_00009",
40 "audio_path": "songs/song_0009.wav",
41 "duration": 5.0,
42 "type": "clean",
43 "offset": 8.094,
44 "segment_type": "external_query",
45 "source_dataset": "fma"
46 },
47 {
48 "song_id": "fma_00011",
49 "audio_path": "songs/song_0011.wav",
50 "duration": 5.0,
51 "type": "clean",
52 "offset": 3.403,
53 "segment_type": "external_query",
54 "source_dataset": "fma"
55 },
56 {
57 "song_id": "fma_00013",
58 "audio_path": "songs/song_0013.wav",
59 "duration": 5.0,
60 "type": "clean",
61 "offset": 0.927,
62 "segment_type": "external_query",
63 "source_dataset": "fma"
64 },
65 {
66 "song_id": "fma_00020",
67 "audio_path": "songs/song_0020.wav",
68 "duration": 5.0,
69 "type": "clean",
70 "offset": 7.046,
71 "segment_type": "external_query",
72 "source_dataset": "fma"
73 },
74 {
75 "song_id": "fma_00000",
76 "audio_path": "songs/song_0000.wav",
77 "duration": 15.0,
78 "type": "reference",
79 "source_dataset": "fma"
80 },
81 {
82 "song_id": "fma_00001",
83 "audio_path": "songs/song_0001.wav",
84 "duration": 15.0,
85 "type": "reference",
86 "source_dataset": "fma"
87 },
88 {
89 "song_id": "fma_00002",
90 "audio_path": "songs/song_0002.wav",
91 "duration": 15.0,
92 "type": "reference",
93 "source_dataset": "fma"
94 },
95 {
96 "song_id": "fma_00003",
97 "audio_path": "songs/song_0003.wav",
98 "duration": 15.0,
99 "type": "reference",
100 "source_dataset": "fma"
101 },
102 {
103 "song_id": "fma_00004",
104 "audio_path": "songs/song_0004.wav",
105 "duration": 15.0,
106 "type": "reference",
107 "source_dataset": "fma"
108 },
109 {
110 "song_id": "fma_00005",
111 "audio_path": "songs/song_0005.wav",
112 "duration": 15.0,
113 "type": "reference",
114 "source_dataset": "fma"
115 },
116 {
117 "song_id": "fma_00006",
118 "audio_path": "songs/song_0006.wav",
119 "duration": 15.0,
120 "type": "reference",
121 "source_dataset": "fma"
122 },
123 {
124 "song_id": "fma_00007",
125 "audio_path": "songs/song_0007.wav",
126 "duration": 15.0,
127 "type": "reference",
128 "source_dataset": "fma"
129 },
130 {
131 "song_id": "fma_00008",
132 "audio_path": "songs/song_0008.wav",
133 "duration": 15.0,
134 "type": "reference",
135 "source_dataset": "fma"
136 },
137 {
138 "song_id": "fma_00009",
139 "audio_path": "songs/song_0009.wav",
140 "duration": 15.0,
141 "type": "reference",
142 "source_dataset": "fma"
143 },
144 {
145 "song_id": "fma_00010",
146 "audio_path": "songs/song_0010.wav",
147 "duration": 15.0,
148 "type": "reference",
149 "source_dataset": "fma"
150 },
151 {
152 "song_id": "fma_00011",
153 "audio_path": "songs/song_0011.wav",
154 "duration": 15.0,
155 "type": "reference",
156 "source_dataset": "fma"
157 },
158 {
159 "song_id": "fma_00012",
160 "audio_path": "songs/song_0012.wav",
161 "duration": 15.0,
162 "type": "reference",
163 "source_dataset": "fma"
164 },
165 {
166 "song_id": "fma_00013",
167 "audio_path": "songs/song_0013.wav",
168 "duration": 15.0,
169 "type": "reference",
170 "source_dataset": "fma"
171 },
172 {
173 "song_id": "fma_00014",
174 "audio_path": "songs/song_0014.wav",
175 "duration": 15.0,
176 "type": "reference",
177 "source_dataset": "fma"
178 },
179 {
180 "song_id": "fma_00015",
181 "audio_path": "songs/song_0015.wav",
182 "duration": 15.0,
183 "type": "reference",
184 "source_dataset": "fma"
185 },
186 {
187 "song_id": "fma_00016",
188 "audio_path": "songs/song_0016.wav",
189 "duration": 15.0,
190 "type": "reference",
191 "source_dataset": "fma"
192 },
193 {
194 "song_id": "fma_00017",
195 "audio_path": "songs/song_0017.wav",
196 "duration": 15.0,
197 "type": "reference",
198 "source_dataset": "fma"
199 },
200 {
201 "song_id": "fma_00018",
202 "audio_path": "songs/song_0018.wav",
203 "duration": 15.0,
204 "type": "reference",
205 "source_dataset": "fma"
206 },
207 {
208 "song_id": "fma_00019",
209 "audio_path": "songs/song_0019.wav",
210 "duration": 15.0,
211 "type": "reference",
212 "source_dataset": "fma"
213 },
214 {
215 "song_id": "fma_00020",
216 "audio_path": "songs/song_0020.wav",
217 "duration": 15.0,
218 "type": "reference",
219 "source_dataset": "fma"
220 },
221 {
222 "song_id": "fma_00021",
223 "audio_path": "songs/song_0021.wav",
224 "duration": 15.0,
225 "type": "reference",
226 "source_dataset": "fma"
227 },
228 {
229 "song_id": "fma_00022",
230 "audio_path": "songs/song_0022.wav",
231 "duration": 15.0,
232 "type": "reference",
233 "source_dataset": "fma"
234 },
235 {
236 "song_id": "fma_00023",
237 "audio_path": "songs/song_0023.wav",
238 "duration": 15.0,
239 "type": "reference",
240 "source_dataset": "fma"
241 }
242 ]
...\ No newline at end of file ...\ No newline at end of file
1 [
2 {
3 "song_id": "fma_00001",
4 "audio_path": "songs/song_0001.wav",
5 "duration": 5.0,
6 "type": "clean",
7 "offset": 2.75,
8 "segment_type": "external_query",
9 "source_dataset": "fma"
10 },
11 {
12 "song_id": "fma_00002",
13 "audio_path": "songs/song_0002.wav",
14 "duration": 5.0,
15 "type": "clean",
16 "offset": 7.365,
17 "segment_type": "external_query",
18 "source_dataset": "fma"
19 },
20 {
21 "song_id": "fma_00005",
22 "audio_path": "songs/song_0005.wav",
23 "duration": 5.0,
24 "type": "clean",
25 "offset": 2.186,
26 "segment_type": "external_query",
27 "source_dataset": "fma"
28 },
29 {
30 "song_id": "fma_00007",
31 "audio_path": "songs/song_0007.wav",
32 "duration": 5.0,
33 "type": "clean",
34 "offset": 6.499,
35 "segment_type": "external_query",
36 "source_dataset": "fma"
37 },
38 {
39 "song_id": "fma_00008",
40 "audio_path": "songs/song_0008.wav",
41 "duration": 5.0,
42 "type": "clean",
43 "offset": 2.204,
44 "segment_type": "external_query",
45 "source_dataset": "fma"
46 },
47 {
48 "song_id": "fma_00010",
49 "audio_path": "songs/song_0010.wav",
50 "duration": 5.0,
51 "type": "clean",
52 "offset": 8.058,
53 "segment_type": "external_query",
54 "source_dataset": "fma"
55 },
56 {
57 "song_id": "fma_00012",
58 "audio_path": "songs/song_0012.wav",
59 "duration": 5.0,
60 "type": "clean",
61 "offset": 9.572,
62 "segment_type": "external_query",
63 "source_dataset": "fma"
64 },
65 {
66 "song_id": "fma_00014",
67 "audio_path": "songs/song_0014.wav",
68 "duration": 5.0,
69 "type": "clean",
70 "offset": 8.475,
71 "segment_type": "external_query",
72 "source_dataset": "fma"
73 },
74 {
75 "song_id": "fma_00015",
76 "audio_path": "songs/song_0015.wav",
77 "duration": 5.0,
78 "type": "clean",
79 "offset": 8.071,
80 "segment_type": "external_query",
81 "source_dataset": "fma"
82 },
83 {
84 "song_id": "fma_00016",
85 "audio_path": "songs/song_0016.wav",
86 "duration": 5.0,
87 "type": "clean",
88 "offset": 5.362,
89 "segment_type": "external_query",
90 "source_dataset": "fma"
91 },
92 {
93 "song_id": "fma_00017",
94 "audio_path": "songs/song_0017.wav",
95 "duration": 5.0,
96 "type": "clean",
97 "offset": 3.785,
98 "segment_type": "external_query",
99 "source_dataset": "fma"
100 },
101 {
102 "song_id": "fma_00018",
103 "audio_path": "songs/song_0018.wav",
104 "duration": 5.0,
105 "type": "clean",
106 "offset": 8.294,
107 "segment_type": "external_query",
108 "source_dataset": "fma"
109 },
110 {
111 "song_id": "fma_00019",
112 "audio_path": "songs/song_0019.wav",
113 "duration": 5.0,
114 "type": "clean",
115 "offset": 8.617,
116 "segment_type": "external_query",
117 "source_dataset": "fma"
118 },
119 {
120 "song_id": "fma_00021",
121 "audio_path": "songs/song_0021.wav",
122 "duration": 5.0,
123 "type": "clean",
124 "offset": 2.279,
125 "segment_type": "external_query",
126 "source_dataset": "fma"
127 },
128 {
129 "song_id": "fma_00022",
130 "audio_path": "songs/song_0022.wav",
131 "duration": 5.0,
132 "type": "clean",
133 "offset": 0.798,
134 "segment_type": "external_query",
135 "source_dataset": "fma"
136 },
137 {
138 "song_id": "fma_00023",
139 "audio_path": "songs/song_0023.wav",
140 "duration": 5.0,
141 "type": "clean",
142 "offset": 1.01,
143 "segment_type": "external_query",
144 "source_dataset": "fma"
145 },
146 {
147 "song_id": "fma_00000",
148 "audio_path": "songs/song_0000.wav",
149 "duration": 15.0,
150 "type": "reference",
151 "source_dataset": "fma"
152 },
153 {
154 "song_id": "fma_00001",
155 "audio_path": "songs/song_0001.wav",
156 "duration": 15.0,
157 "type": "reference",
158 "source_dataset": "fma"
159 },
160 {
161 "song_id": "fma_00002",
162 "audio_path": "songs/song_0002.wav",
163 "duration": 15.0,
164 "type": "reference",
165 "source_dataset": "fma"
166 },
167 {
168 "song_id": "fma_00003",
169 "audio_path": "songs/song_0003.wav",
170 "duration": 15.0,
171 "type": "reference",
172 "source_dataset": "fma"
173 },
174 {
175 "song_id": "fma_00004",
176 "audio_path": "songs/song_0004.wav",
177 "duration": 15.0,
178 "type": "reference",
179 "source_dataset": "fma"
180 },
181 {
182 "song_id": "fma_00005",
183 "audio_path": "songs/song_0005.wav",
184 "duration": 15.0,
185 "type": "reference",
186 "source_dataset": "fma"
187 },
188 {
189 "song_id": "fma_00006",
190 "audio_path": "songs/song_0006.wav",
191 "duration": 15.0,
192 "type": "reference",
193 "source_dataset": "fma"
194 },
195 {
196 "song_id": "fma_00007",
197 "audio_path": "songs/song_0007.wav",
198 "duration": 15.0,
199 "type": "reference",
200 "source_dataset": "fma"
201 },
202 {
203 "song_id": "fma_00008",
204 "audio_path": "songs/song_0008.wav",
205 "duration": 15.0,
206 "type": "reference",
207 "source_dataset": "fma"
208 },
209 {
210 "song_id": "fma_00009",
211 "audio_path": "songs/song_0009.wav",
212 "duration": 15.0,
213 "type": "reference",
214 "source_dataset": "fma"
215 },
216 {
217 "song_id": "fma_00010",
218 "audio_path": "songs/song_0010.wav",
219 "duration": 15.0,
220 "type": "reference",
221 "source_dataset": "fma"
222 },
223 {
224 "song_id": "fma_00011",
225 "audio_path": "songs/song_0011.wav",
226 "duration": 15.0,
227 "type": "reference",
228 "source_dataset": "fma"
229 },
230 {
231 "song_id": "fma_00012",
232 "audio_path": "songs/song_0012.wav",
233 "duration": 15.0,
234 "type": "reference",
235 "source_dataset": "fma"
236 },
237 {
238 "song_id": "fma_00013",
239 "audio_path": "songs/song_0013.wav",
240 "duration": 15.0,
241 "type": "reference",
242 "source_dataset": "fma"
243 },
244 {
245 "song_id": "fma_00014",
246 "audio_path": "songs/song_0014.wav",
247 "duration": 15.0,
248 "type": "reference",
249 "source_dataset": "fma"
250 },
251 {
252 "song_id": "fma_00015",
253 "audio_path": "songs/song_0015.wav",
254 "duration": 15.0,
255 "type": "reference",
256 "source_dataset": "fma"
257 },
258 {
259 "song_id": "fma_00016",
260 "audio_path": "songs/song_0016.wav",
261 "duration": 15.0,
262 "type": "reference",
263 "source_dataset": "fma"
264 },
265 {
266 "song_id": "fma_00017",
267 "audio_path": "songs/song_0017.wav",
268 "duration": 15.0,
269 "type": "reference",
270 "source_dataset": "fma"
271 },
272 {
273 "song_id": "fma_00018",
274 "audio_path": "songs/song_0018.wav",
275 "duration": 15.0,
276 "type": "reference",
277 "source_dataset": "fma"
278 },
279 {
280 "song_id": "fma_00019",
281 "audio_path": "songs/song_0019.wav",
282 "duration": 15.0,
283 "type": "reference",
284 "source_dataset": "fma"
285 },
286 {
287 "song_id": "fma_00020",
288 "audio_path": "songs/song_0020.wav",
289 "duration": 15.0,
290 "type": "reference",
291 "source_dataset": "fma"
292 },
293 {
294 "song_id": "fma_00021",
295 "audio_path": "songs/song_0021.wav",
296 "duration": 15.0,
297 "type": "reference",
298 "source_dataset": "fma"
299 },
300 {
301 "song_id": "fma_00022",
302 "audio_path": "songs/song_0022.wav",
303 "duration": 15.0,
304 "type": "reference",
305 "source_dataset": "fma"
306 },
307 {
308 "song_id": "fma_00023",
309 "audio_path": "songs/song_0023.wav",
310 "duration": 15.0,
311 "type": "reference",
312 "source_dataset": "fma"
313 }
314 ]
...\ No newline at end of file ...\ No newline at end of file
1 []
...\ No newline at end of file ...\ No newline at end of file
...@@ -25,6 +25,31 @@ ...@@ -25,6 +25,31 @@
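25 - Readers no longer have to face a long flat list of file names first 25 - Readers no longer have to face a long flat list of file names first
26 - Relative paths are now better suited to direct navigation 26 - Relative paths are now better suited to direct navigation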
27 27
28 ### Stage: Single-page open dataset workflow
29
30 Completed:
31 - Added [docs/open-dataset-workflow.md](./open-dataset-workflow.md)
32 - Condensed the open dataset ingestion flow into:
33 - `inspect-local / inspect-batch`
34 - `prepare-local`
35 - `validate-local`
36 - Linked the workflow under the "Data and Evaluation" group in [docs/README.md](./README.md)
37
38 Verification results:
39 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-local fma data/synthetic_v2/songs --eval-ratio 0.2 --query-duration 5.0` succeeded
40 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma data/synthetic_v2/songs --output-root data/external_ingested/synthetic_as_open --eval-ratio 0.2 --query-duration 5.0` succeeded
41 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py validate-local fma data/external_ingested/synthetic_as_open/fma/manifests` succeeded
42 - Current results:
43 - `num_audio_files=24`
44 - `catalog=24`
45 - `train_queries=16`
46 - `test_queries=8`
47 - `ok=true`
48
49 Conclusions:
50 - The open dataset ingestion path is now condensed into a single-page, executable workflow
51 - Onboarding real FMA / MTG-Jamendo local directories later will take less effort
52
28 ### Stage: confused targeted optimization v6 (sample-level weighting) 53 ### Stage: confused targeted optimization v6 (sample-level weighting)
29 54
30 Completed: 55 Completed:
......
...@@ -59,6 +59,7 @@ flowchart TD ...@@ -59,6 +59,7 @@ flowchart TD
59 59
60 ### B. Data and Evaluation 60 ### B. Data and Evaluation
61 - [Dataset Spec](./dataset-spec.md) 61 - [Dataset Spec](./dataset-spec.md)
62 - [Open Dataset Workflow](./open-dataset-workflow.md)
62 - [Dataset Sources and Ingestion](./dataset-sources-and-licensing.md) 63 - [Dataset Sources and Ingestion](./dataset-sources-and-licensing.md)
63 - [Industrial Benchmark Spec](./industrial-benchmark-spec.md) 64 - [Industrial Benchmark Spec](./industrial-benchmark-spec.md)
64 65
......
...@@ -18,6 +18,9 @@ ...@@ -18,6 +18,9 @@
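18 - CCMusic / ModelScope: use primarily as a supplementary evaluation or exploration source 18 - CCMusic / ModelScope: use primarily as a supplementary evaluation or exploration source
19 - Keep the license notes, but no longer treat the "commercial-use blocker" as the main blocker for personal experiments 19 - Keep the license notes, but no longer treat the "commercial-use blocker" as the main blocker for personal experiments
20 20
21 Recommended reading first:
22 - [Open Dataset Workflow](./open-dataset-workflow.md)
23
21 Suggested ingestion order: 24 Suggested ingestion order:
22 1. Download/prepare a local audio directory for FMA or MTG-Jamendo 25 1. Download/prepare a local audio directory for FMA or MTG-Jamendo
23 2. Run [acr-engine/src/data/external_adapters.py](../acr-engine/src/data/external_adapters.py) `inspect-local` / `inspect-batch` 26 2. Run [acr-engine/src/data/external_adapters.py](../acr-engine/src/data/external_adapters.py) `inspect-local` / `inspect-batch`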
......
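1 # Open Dataset Workflow
2
3 > Updated: 2026-06-02
4
5 ## One-page summary
6
7 If you want to actually bring an open music collection such as FMA / MTG-Jamendo into the project, the one chain to remember is:
8
9 1. **inspect-local / inspect-batch**
10 2. **prepare-local**
11 3. **validate-local**
12 4. Then move on to training and evaluation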
13
14 ---
15
16 ## 1. Workflow diagram
17
18 ```mermaid
19 flowchart LR
20 A[Local Open Audio Dir] --> B[inspect-local / inspect-batch]
21 B --> C[prepare-local]
22 C --> D[validate-local]
23 D --> E[train.json]
24 D --> F[test.json]
25 ```
26
27 ---
28
29 ## 2. Shortest command table
30
31 | Step | Command | Purpose |
32 |---|---|---|
33 | Pre-check | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `inspect-local ...` | Check whether the dataset is large enough |
34 | Batch comparison | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `inspect-batch ...` | Compare multiple candidate directories |
35 | Generate manifests | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `prepare-local ...` | Produce train/test/catalog |
36 | Pre-training validation | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `validate-local ...` | Confirm the structure is correct |
37
38 ---
39
40 ## 3. Recommended order
41
42 ### 3.1 Single directory
43
44 ```bash
45 /usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-local fma data/raw/fma_small_audio --eval-ratio 0.2 --query-duration 8.0
46 /usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma data/raw/fma_small_audio --output-root data/external_ingested --eval-ratio 0.2 --query-duration 8.0
47 /usr/local/miniconda3/bin/python src/data/external_adapters.py validate-local fma data/external_ingested/fma/manifests
48 ```
49
50 ### 3.2 Comparing multiple directories
51
52 ```bash
53 /usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-batch fma=data/raw/fma_small_audio mtg_jamendo=data/raw/mtg_jamendo_audio --eval-ratio 0.2 --query-duration 8.0
54 ```
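
The `inspect-batch` call above passes each candidate as a `name=path` pair. As an illustration of that argument shape only, here is a small sketch of how such pairs could be split. `parse_batch_specs` is a hypothetical name and this is not necessarily how `external_adapters.py` parses them.

```python
def parse_batch_specs(specs):
    """Split NAME=PATH strings into a {name: path} dict.

    Hypothetical illustration of the argument format; the real adapter
    script may parse and validate these differently.
    """
    pairs = {}
    for spec in specs:
        # partition() splits on the first "=", so paths may contain "=".
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected NAME=PATH, got {spec!r}")
        pairs[name] = path
    return pairs


if __name__ == "__main__":
    print(parse_batch_specs([
        "fma=data/raw/fma_small_audio",
        "mtg_jamendo=data/raw/mtg_jamendo_audio",
    ]))
```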
55
56 ---
57
58 ## 4. Output files
59
60 - [catalog.json](../acr-engine/data/external_ingested/demo_via_adapter/fma/manifests/catalog.json): reference manifest used to build the index
61 - [train.json](../acr-engine/data/external_ingested/demo_via_adapter/fma/manifests/train.json): training queries + references
62 - [test.json](../acr-engine/data/external_ingested/demo_via_adapter/fma/manifests/test.json): fixed evaluation queries + references
63 - [val.json](../acr-engine/data/external_ingested/demo_via_adapter/fma/manifests/val.json): optional validation set
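
To see what a generated manifest contains, you can count its entries by type. The sketch below assumes each manifest is a JSON array of objects carrying `song_id` and `type` fields, as in the sample manifests committed alongside this page. `summarize_manifest` and `load_manifest` are hypothetical helper names, not functions from the repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_manifest(path):
    """Load one manifest file (assumed to be a JSON array of entry objects)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_manifest(entries):
    """Count entries per "type" value and the number of distinct songs."""
    by_type = Counter(entry["type"] for entry in entries)
    song_ids = {entry["song_id"] for entry in entries}
    return {"by_type": dict(by_type), "num_songs": len(song_ids)}


if __name__ == "__main__":
    # Tiny inline sample in the same shape as the committed manifests.
    sample = [
        {"song_id": "fma_00000", "audio_path": "songs/song_0000.wav",
         "duration": 5.0, "type": "clean", "offset": 6.394,
         "segment_type": "external_query", "source_dataset": "fma"},
        {"song_id": "fma_00000", "audio_path": "songs/song_0000.wav",
         "duration": 15.0, "type": "reference", "source_dataset": "fma"},
        {"song_id": "fma_00001", "audio_path": "songs/song_0001.wav",
         "duration": 15.0, "type": "reference", "source_dataset": "fma"},
    ]
    print(summarize_manifest(sample))
```

For a real run, replace the inline sample with `load_manifest("<path to train.json>")`.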
64
65 ---
66
67 ## 5. Current verification evidence
68
69 The open dataset flow has been run end to end locally on `data/synthetic_v2/songs`:
70
71 - `inspect-local`
72 - `num_audio_files=24`
73 - `recommended_train_queries=19`
74 - `recommended_test_queries=5`
75 - `prepare-local`
76 - `catalog=24`
77 - `train_queries=16`
78 - `test_queries=8`
79 - `validate-local`
80 - `ok=true`
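
Beyond the `ok=true` flag, one extra sanity check you may want on your own manifests is that every query window lies inside its reference track. This is a sketch under stated assumptions: it treats every entry whose `type` is not `"reference"` as a query, reads `offset` and `duration` in seconds, and is not a description of what `validate-local` actually checks. `check_query_windows` is a hypothetical name.

```python
def check_query_windows(entries):
    """Return a list of problem strings; an empty list means all queries fit.

    Assumption: entries with type "reference" give the full track duration,
    and every other entry is a query cut at [offset, offset + duration].
    """
    ref_duration = {e["song_id"]: e["duration"]
                    for e in entries if e["type"] == "reference"}
    problems = []
    for e in entries:
        if e["type"] == "reference":
            continue
        song_id = e["song_id"]
        if song_id not in ref_duration:
            problems.append(f"{song_id}: query has no reference entry")
            continue
        offset = e.get("offset", 0.0)
        end = offset + e["duration"]
        if offset < 0 or end > ref_duration[song_id]:
            problems.append(
                f"{song_id}: window {offset}-{end}s exceeds "
                f"reference duration {ref_duration[song_id]}s")
    return problems


if __name__ == "__main__":
    sample = [
        {"song_id": "fma_00012", "duration": 15.0, "type": "reference"},
        {"song_id": "fma_00012", "duration": 5.0, "type": "clean",
         "offset": 9.572},
    ]
    print(check_query_windows(sample))
```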
81
82 ---
83
84 ## Sources
85 - See [dataset-spec.md](./dataset-spec.md)
86 - See [dataset-sources-and-licensing.md](./dataset-sources-and-licensing.md)