Commit eee15aca eee15aca7bf6230c2bcb57b19f424c8741c892a8 by cnb.bofCdSsphPA

Automate the full open-dataset smoke workflow behind one command

Constraint: Real FMA or MTG-Jamendo onboarding should require only an input directory change, not a long manual command chain
Rejected: Keep the smoke steps separate only | Slows repeated validation and increases operator error risk
Confidence: high
Scope-risk: moderate
Directive: Use smoke-local as the default first-pass validation path for every new local open-music corpus
Tested: /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/synthetic_v2/songs --output-root data/external_smoke --eval-ratio 0.2 --query-duration 5.0 --train-epochs 1 --batch-size 2; /usr/local/miniconda3/bin/python -m py_compile src/data/external_adapters.py src/data/manifest_tools.py train.py run_demo.py evaluate.py scripts/generate_artifacts.py
Not-tested: Real downloaded FMA or MTG-Jamendo directories on larger-scale smoke runs
1 parent 87959076
Showing 42 changed files with 1050 additions and 0 deletions
1 [
2 {
3 "song_id": "fma_00000",
4 "audio_path": "audio/fma_00000.wav",
5 "duration": 15.0,
6 "type": "reference",
7 "source_dataset": "fma"
8 },
9 {
10 "song_id": "fma_00001",
11 "audio_path": "audio/fma_00001.wav",
12 "duration": 15.0,
13 "type": "reference",
14 "source_dataset": "fma"
15 },
16 {
17 "song_id": "fma_00002",
18 "audio_path": "audio/fma_00002.wav",
19 "duration": 15.0,
20 "type": "reference",
21 "source_dataset": "fma"
22 },
23 {
24 "song_id": "fma_00003",
25 "audio_path": "audio/fma_00003.wav",
26 "duration": 15.0,
27 "type": "reference",
28 "source_dataset": "fma"
29 },
30 {
31 "song_id": "fma_00004",
32 "audio_path": "audio/fma_00004.wav",
33 "duration": 15.0,
34 "type": "reference",
35 "source_dataset": "fma"
36 },
37 {
38 "song_id": "fma_00005",
39 "audio_path": "audio/fma_00005.wav",
40 "duration": 15.0,
41 "type": "reference",
42 "source_dataset": "fma"
43 },
44 {
45 "song_id": "fma_00006",
46 "audio_path": "audio/fma_00006.wav",
47 "duration": 15.0,
48 "type": "reference",
49 "source_dataset": "fma"
50 },
51 {
52 "song_id": "fma_00007",
53 "audio_path": "audio/fma_00007.wav",
54 "duration": 15.0,
55 "type": "reference",
56 "source_dataset": "fma"
57 },
58 {
59 "song_id": "fma_00008",
60 "audio_path": "audio/fma_00008.wav",
61 "duration": 15.0,
62 "type": "reference",
63 "source_dataset": "fma"
64 },
65 {
66 "song_id": "fma_00009",
67 "audio_path": "audio/fma_00009.wav",
68 "duration": 15.0,
69 "type": "reference",
70 "source_dataset": "fma"
71 },
72 {
73 "song_id": "fma_00010",
74 "audio_path": "audio/fma_00010.wav",
75 "duration": 15.0,
76 "type": "reference",
77 "source_dataset": "fma"
78 },
79 {
80 "song_id": "fma_00011",
81 "audio_path": "audio/fma_00011.wav",
82 "duration": 15.0,
83 "type": "reference",
84 "source_dataset": "fma"
85 },
86 {
87 "song_id": "fma_00012",
88 "audio_path": "audio/fma_00012.wav",
89 "duration": 15.0,
90 "type": "reference",
91 "source_dataset": "fma"
92 },
93 {
94 "song_id": "fma_00013",
95 "audio_path": "audio/fma_00013.wav",
96 "duration": 15.0,
97 "type": "reference",
98 "source_dataset": "fma"
99 },
100 {
101 "song_id": "fma_00014",
102 "audio_path": "audio/fma_00014.wav",
103 "duration": 15.0,
104 "type": "reference",
105 "source_dataset": "fma"
106 },
107 {
108 "song_id": "fma_00015",
109 "audio_path": "audio/fma_00015.wav",
110 "duration": 15.0,
111 "type": "reference",
112 "source_dataset": "fma"
113 },
114 {
115 "song_id": "fma_00016",
116 "audio_path": "audio/fma_00016.wav",
117 "duration": 15.0,
118 "type": "reference",
119 "source_dataset": "fma"
120 },
121 {
122 "song_id": "fma_00017",
123 "audio_path": "audio/fma_00017.wav",
124 "duration": 15.0,
125 "type": "reference",
126 "source_dataset": "fma"
127 },
128 {
129 "song_id": "fma_00018",
130 "audio_path": "audio/fma_00018.wav",
131 "duration": 15.0,
132 "type": "reference",
133 "source_dataset": "fma"
134 },
135 {
136 "song_id": "fma_00019",
137 "audio_path": "audio/fma_00019.wav",
138 "duration": 15.0,
139 "type": "reference",
140 "source_dataset": "fma"
141 },
142 {
143 "song_id": "fma_00020",
144 "audio_path": "audio/fma_00020.wav",
145 "duration": 15.0,
146 "type": "reference",
147 "source_dataset": "fma"
148 },
149 {
150 "song_id": "fma_00021",
151 "audio_path": "audio/fma_00021.wav",
152 "duration": 15.0,
153 "type": "reference",
154 "source_dataset": "fma"
155 },
156 {
157 "song_id": "fma_00022",
158 "audio_path": "audio/fma_00022.wav",
159 "duration": 15.0,
160 "type": "reference",
161 "source_dataset": "fma"
162 },
163 {
164 "song_id": "fma_00023",
165 "audio_path": "audio/fma_00023.wav",
166 "duration": 15.0,
167 "type": "reference",
168 "source_dataset": "fma"
169 }
170 ]
\ No newline at end of file
1 [
2 {
3 "song_id": "fma_00000",
4 "audio_path": "audio/fma_00000.wav",
5 "duration": 5.0,
6 "type": "clean",
7 "offset": 6.394,
8 "segment_type": "external_query",
9 "source_dataset": "fma"
10 },
11 {
12 "song_id": "fma_00003",
13 "audio_path": "audio/fma_00003.wav",
14 "duration": 5.0,
15 "type": "clean",
16 "offset": 8.922,
17 "segment_type": "external_query",
18 "source_dataset": "fma"
19 },
20 {
21 "song_id": "fma_00004",
22 "audio_path": "audio/fma_00004.wav",
23 "duration": 5.0,
24 "type": "clean",
25 "offset": 4.219,
26 "segment_type": "external_query",
27 "source_dataset": "fma"
28 },
29 {
30 "song_id": "fma_00006",
31 "audio_path": "audio/fma_00006.wav",
32 "duration": 5.0,
33 "type": "clean",
34 "offset": 0.265,
35 "segment_type": "external_query",
36 "source_dataset": "fma"
37 },
38 {
39 "song_id": "fma_00009",
40 "audio_path": "audio/fma_00009.wav",
41 "duration": 5.0,
42 "type": "clean",
43 "offset": 8.094,
44 "segment_type": "external_query",
45 "source_dataset": "fma"
46 },
47 {
48 "song_id": "fma_00011",
49 "audio_path": "audio/fma_00011.wav",
50 "duration": 5.0,
51 "type": "clean",
52 "offset": 3.403,
53 "segment_type": "external_query",
54 "source_dataset": "fma"
55 },
56 {
57 "song_id": "fma_00013",
58 "audio_path": "audio/fma_00013.wav",
59 "duration": 5.0,
60 "type": "clean",
61 "offset": 0.927,
62 "segment_type": "external_query",
63 "source_dataset": "fma"
64 },
65 {
66 "song_id": "fma_00020",
67 "audio_path": "audio/fma_00020.wav",
68 "duration": 5.0,
69 "type": "clean",
70 "offset": 7.046,
71 "segment_type": "external_query",
72 "source_dataset": "fma"
73 },
74 {
75 "song_id": "fma_00000",
76 "audio_path": "audio/fma_00000.wav",
77 "duration": 15.0,
78 "type": "reference",
79 "source_dataset": "fma"
80 },
81 {
82 "song_id": "fma_00001",
83 "audio_path": "audio/fma_00001.wav",
84 "duration": 15.0,
85 "type": "reference",
86 "source_dataset": "fma"
87 },
88 {
89 "song_id": "fma_00002",
90 "audio_path": "audio/fma_00002.wav",
91 "duration": 15.0,
92 "type": "reference",
93 "source_dataset": "fma"
94 },
95 {
96 "song_id": "fma_00003",
97 "audio_path": "audio/fma_00003.wav",
98 "duration": 15.0,
99 "type": "reference",
100 "source_dataset": "fma"
101 },
102 {
103 "song_id": "fma_00004",
104 "audio_path": "audio/fma_00004.wav",
105 "duration": 15.0,
106 "type": "reference",
107 "source_dataset": "fma"
108 },
109 {
110 "song_id": "fma_00005",
111 "audio_path": "audio/fma_00005.wav",
112 "duration": 15.0,
113 "type": "reference",
114 "source_dataset": "fma"
115 },
116 {
117 "song_id": "fma_00006",
118 "audio_path": "audio/fma_00006.wav",
119 "duration": 15.0,
120 "type": "reference",
121 "source_dataset": "fma"
122 },
123 {
124 "song_id": "fma_00007",
125 "audio_path": "audio/fma_00007.wav",
126 "duration": 15.0,
127 "type": "reference",
128 "source_dataset": "fma"
129 },
130 {
131 "song_id": "fma_00008",
132 "audio_path": "audio/fma_00008.wav",
133 "duration": 15.0,
134 "type": "reference",
135 "source_dataset": "fma"
136 },
137 {
138 "song_id": "fma_00009",
139 "audio_path": "audio/fma_00009.wav",
140 "duration": 15.0,
141 "type": "reference",
142 "source_dataset": "fma"
143 },
144 {
145 "song_id": "fma_00010",
146 "audio_path": "audio/fma_00010.wav",
147 "duration": 15.0,
148 "type": "reference",
149 "source_dataset": "fma"
150 },
151 {
152 "song_id": "fma_00011",
153 "audio_path": "audio/fma_00011.wav",
154 "duration": 15.0,
155 "type": "reference",
156 "source_dataset": "fma"
157 },
158 {
159 "song_id": "fma_00012",
160 "audio_path": "audio/fma_00012.wav",
161 "duration": 15.0,
162 "type": "reference",
163 "source_dataset": "fma"
164 },
165 {
166 "song_id": "fma_00013",
167 "audio_path": "audio/fma_00013.wav",
168 "duration": 15.0,
169 "type": "reference",
170 "source_dataset": "fma"
171 },
172 {
173 "song_id": "fma_00014",
174 "audio_path": "audio/fma_00014.wav",
175 "duration": 15.0,
176 "type": "reference",
177 "source_dataset": "fma"
178 },
179 {
180 "song_id": "fma_00015",
181 "audio_path": "audio/fma_00015.wav",
182 "duration": 15.0,
183 "type": "reference",
184 "source_dataset": "fma"
185 },
186 {
187 "song_id": "fma_00016",
188 "audio_path": "audio/fma_00016.wav",
189 "duration": 15.0,
190 "type": "reference",
191 "source_dataset": "fma"
192 },
193 {
194 "song_id": "fma_00017",
195 "audio_path": "audio/fma_00017.wav",
196 "duration": 15.0,
197 "type": "reference",
198 "source_dataset": "fma"
199 },
200 {
201 "song_id": "fma_00018",
202 "audio_path": "audio/fma_00018.wav",
203 "duration": 15.0,
204 "type": "reference",
205 "source_dataset": "fma"
206 },
207 {
208 "song_id": "fma_00019",
209 "audio_path": "audio/fma_00019.wav",
210 "duration": 15.0,
211 "type": "reference",
212 "source_dataset": "fma"
213 },
214 {
215 "song_id": "fma_00020",
216 "audio_path": "audio/fma_00020.wav",
217 "duration": 15.0,
218 "type": "reference",
219 "source_dataset": "fma"
220 },
221 {
222 "song_id": "fma_00021",
223 "audio_path": "audio/fma_00021.wav",
224 "duration": 15.0,
225 "type": "reference",
226 "source_dataset": "fma"
227 },
228 {
229 "song_id": "fma_00022",
230 "audio_path": "audio/fma_00022.wav",
231 "duration": 15.0,
232 "type": "reference",
233 "source_dataset": "fma"
234 },
235 {
236 "song_id": "fma_00023",
237 "audio_path": "audio/fma_00023.wav",
238 "duration": 15.0,
239 "type": "reference",
240 "source_dataset": "fma"
241 }
242 ]
\ No newline at end of file
1 [
2 {
3 "song_id": "fma_00001",
4 "audio_path": "audio/fma_00001.wav",
5 "duration": 5.0,
6 "type": "clean",
7 "offset": 2.75,
8 "segment_type": "external_query",
9 "source_dataset": "fma"
10 },
11 {
12 "song_id": "fma_00002",
13 "audio_path": "audio/fma_00002.wav",
14 "duration": 5.0,
15 "type": "clean",
16 "offset": 7.365,
17 "segment_type": "external_query",
18 "source_dataset": "fma"
19 },
20 {
21 "song_id": "fma_00005",
22 "audio_path": "audio/fma_00005.wav",
23 "duration": 5.0,
24 "type": "clean",
25 "offset": 2.186,
26 "segment_type": "external_query",
27 "source_dataset": "fma"
28 },
29 {
30 "song_id": "fma_00007",
31 "audio_path": "audio/fma_00007.wav",
32 "duration": 5.0,
33 "type": "clean",
34 "offset": 6.499,
35 "segment_type": "external_query",
36 "source_dataset": "fma"
37 },
38 {
39 "song_id": "fma_00008",
40 "audio_path": "audio/fma_00008.wav",
41 "duration": 5.0,
42 "type": "clean",
43 "offset": 2.204,
44 "segment_type": "external_query",
45 "source_dataset": "fma"
46 },
47 {
48 "song_id": "fma_00010",
49 "audio_path": "audio/fma_00010.wav",
50 "duration": 5.0,
51 "type": "clean",
52 "offset": 8.058,
53 "segment_type": "external_query",
54 "source_dataset": "fma"
55 },
56 {
57 "song_id": "fma_00012",
58 "audio_path": "audio/fma_00012.wav",
59 "duration": 5.0,
60 "type": "clean",
61 "offset": 9.572,
62 "segment_type": "external_query",
63 "source_dataset": "fma"
64 },
65 {
66 "song_id": "fma_00014",
67 "audio_path": "audio/fma_00014.wav",
68 "duration": 5.0,
69 "type": "clean",
70 "offset": 8.475,
71 "segment_type": "external_query",
72 "source_dataset": "fma"
73 },
74 {
75 "song_id": "fma_00015",
76 "audio_path": "audio/fma_00015.wav",
77 "duration": 5.0,
78 "type": "clean",
79 "offset": 8.071,
80 "segment_type": "external_query",
81 "source_dataset": "fma"
82 },
83 {
84 "song_id": "fma_00016",
85 "audio_path": "audio/fma_00016.wav",
86 "duration": 5.0,
87 "type": "clean",
88 "offset": 5.362,
89 "segment_type": "external_query",
90 "source_dataset": "fma"
91 },
92 {
93 "song_id": "fma_00017",
94 "audio_path": "audio/fma_00017.wav",
95 "duration": 5.0,
96 "type": "clean",
97 "offset": 3.785,
98 "segment_type": "external_query",
99 "source_dataset": "fma"
100 },
101 {
102 "song_id": "fma_00018",
103 "audio_path": "audio/fma_00018.wav",
104 "duration": 5.0,
105 "type": "clean",
106 "offset": 8.294,
107 "segment_type": "external_query",
108 "source_dataset": "fma"
109 },
110 {
111 "song_id": "fma_00019",
112 "audio_path": "audio/fma_00019.wav",
113 "duration": 5.0,
114 "type": "clean",
115 "offset": 8.617,
116 "segment_type": "external_query",
117 "source_dataset": "fma"
118 },
119 {
120 "song_id": "fma_00021",
121 "audio_path": "audio/fma_00021.wav",
122 "duration": 5.0,
123 "type": "clean",
124 "offset": 2.279,
125 "segment_type": "external_query",
126 "source_dataset": "fma"
127 },
128 {
129 "song_id": "fma_00022",
130 "audio_path": "audio/fma_00022.wav",
131 "duration": 5.0,
132 "type": "clean",
133 "offset": 0.798,
134 "segment_type": "external_query",
135 "source_dataset": "fma"
136 },
137 {
138 "song_id": "fma_00023",
139 "audio_path": "audio/fma_00023.wav",
140 "duration": 5.0,
141 "type": "clean",
142 "offset": 1.01,
143 "segment_type": "external_query",
144 "source_dataset": "fma"
145 },
146 {
147 "song_id": "fma_00000",
148 "audio_path": "audio/fma_00000.wav",
149 "duration": 15.0,
150 "type": "reference",
151 "source_dataset": "fma"
152 },
153 {
154 "song_id": "fma_00001",
155 "audio_path": "audio/fma_00001.wav",
156 "duration": 15.0,
157 "type": "reference",
158 "source_dataset": "fma"
159 },
160 {
161 "song_id": "fma_00002",
162 "audio_path": "audio/fma_00002.wav",
163 "duration": 15.0,
164 "type": "reference",
165 "source_dataset": "fma"
166 },
167 {
168 "song_id": "fma_00003",
169 "audio_path": "audio/fma_00003.wav",
170 "duration": 15.0,
171 "type": "reference",
172 "source_dataset": "fma"
173 },
174 {
175 "song_id": "fma_00004",
176 "audio_path": "audio/fma_00004.wav",
177 "duration": 15.0,
178 "type": "reference",
179 "source_dataset": "fma"
180 },
181 {
182 "song_id": "fma_00005",
183 "audio_path": "audio/fma_00005.wav",
184 "duration": 15.0,
185 "type": "reference",
186 "source_dataset": "fma"
187 },
188 {
189 "song_id": "fma_00006",
190 "audio_path": "audio/fma_00006.wav",
191 "duration": 15.0,
192 "type": "reference",
193 "source_dataset": "fma"
194 },
195 {
196 "song_id": "fma_00007",
197 "audio_path": "audio/fma_00007.wav",
198 "duration": 15.0,
199 "type": "reference",
200 "source_dataset": "fma"
201 },
202 {
203 "song_id": "fma_00008",
204 "audio_path": "audio/fma_00008.wav",
205 "duration": 15.0,
206 "type": "reference",
207 "source_dataset": "fma"
208 },
209 {
210 "song_id": "fma_00009",
211 "audio_path": "audio/fma_00009.wav",
212 "duration": 15.0,
213 "type": "reference",
214 "source_dataset": "fma"
215 },
216 {
217 "song_id": "fma_00010",
218 "audio_path": "audio/fma_00010.wav",
219 "duration": 15.0,
220 "type": "reference",
221 "source_dataset": "fma"
222 },
223 {
224 "song_id": "fma_00011",
225 "audio_path": "audio/fma_00011.wav",
226 "duration": 15.0,
227 "type": "reference",
228 "source_dataset": "fma"
229 },
230 {
231 "song_id": "fma_00012",
232 "audio_path": "audio/fma_00012.wav",
233 "duration": 15.0,
234 "type": "reference",
235 "source_dataset": "fma"
236 },
237 {
238 "song_id": "fma_00013",
239 "audio_path": "audio/fma_00013.wav",
240 "duration": 15.0,
241 "type": "reference",
242 "source_dataset": "fma"
243 },
244 {
245 "song_id": "fma_00014",
246 "audio_path": "audio/fma_00014.wav",
247 "duration": 15.0,
248 "type": "reference",
249 "source_dataset": "fma"
250 },
251 {
252 "song_id": "fma_00015",
253 "audio_path": "audio/fma_00015.wav",
254 "duration": 15.0,
255 "type": "reference",
256 "source_dataset": "fma"
257 },
258 {
259 "song_id": "fma_00016",
260 "audio_path": "audio/fma_00016.wav",
261 "duration": 15.0,
262 "type": "reference",
263 "source_dataset": "fma"
264 },
265 {
266 "song_id": "fma_00017",
267 "audio_path": "audio/fma_00017.wav",
268 "duration": 15.0,
269 "type": "reference",
270 "source_dataset": "fma"
271 },
272 {
273 "song_id": "fma_00018",
274 "audio_path": "audio/fma_00018.wav",
275 "duration": 15.0,
276 "type": "reference",
277 "source_dataset": "fma"
278 },
279 {
280 "song_id": "fma_00019",
281 "audio_path": "audio/fma_00019.wav",
282 "duration": 15.0,
283 "type": "reference",
284 "source_dataset": "fma"
285 },
286 {
287 "song_id": "fma_00020",
288 "audio_path": "audio/fma_00020.wav",
289 "duration": 15.0,
290 "type": "reference",
291 "source_dataset": "fma"
292 },
293 {
294 "song_id": "fma_00021",
295 "audio_path": "audio/fma_00021.wav",
296 "duration": 15.0,
297 "type": "reference",
298 "source_dataset": "fma"
299 },
300 {
301 "song_id": "fma_00022",
302 "audio_path": "audio/fma_00022.wav",
303 "duration": 15.0,
304 "type": "reference",
305 "source_dataset": "fma"
306 },
307 {
308 "song_id": "fma_00023",
309 "audio_path": "audio/fma_00023.wav",
310 "duration": 15.0,
311 "type": "reference",
312 "source_dataset": "fma"
313 }
314 ]
\ No newline at end of file
1 []
\ No newline at end of file
1 {
2 "fma_00001": 0,
3 "fma_00002": 1,
4 "fma_00005": 2,
5 "fma_00007": 3,
6 "fma_00008": 4,
7 "fma_00010": 5,
8 "fma_00012": 6,
9 "fma_00014": 7,
10 "fma_00015": 8,
11 "fma_00016": 9,
12 "fma_00017": 10,
13 "fma_00018": 11,
14 "fma_00019": 12,
15 "fma_00021": 13,
16 "fma_00022": 14,
17 "fma_00023": 15
18 }
\ No newline at end of file
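The label map above lists exactly the 16 songs that have a training query, numbered in sorted order. The commit does not show the code that writes this file, so the following is only a sketch of one way such a map could be derived; `build_label_map` is a hypothetical helper, and the filter on `type != "reference"` is an assumption inferred from the manifests.

```python
from typing import Dict, Iterable, List


def build_label_map(train_rows: Iterable[dict]) -> Dict[str, int]:
    """Map each song that has a training query to a contiguous class index.

    Assumption: reference rows are catalog entries, not training targets,
    so they are skipped. This matches the data shown above but is not
    confirmed by the source code in this commit.
    """
    query_ids = {row["song_id"] for row in train_rows if row.get("type") != "reference"}
    ordered: List[str] = sorted(query_ids)
    return {song_id: index for index, song_id in enumerate(ordered)}


# Abridged rows in the same shape as the train manifest.
rows = [
    {"song_id": "fma_00005", "type": "clean", "segment_type": "external_query"},
    {"song_id": "fma_00001", "type": "clean", "segment_type": "external_query"},
    {"song_id": "fma_00002", "type": "clean", "segment_type": "external_query"},
    {"song_id": "fma_00003", "type": "reference"},  # catalog only, no label
]
label_map = build_label_map(rows)
```

Sorting before enumerating keeps the indices stable across runs as long as the set of training songs does not change.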
1 {
2 "generated_at": "2026-06-02T05:04:14Z",
3 "model_version": "fma-smoke",
4 "data_version": "fma_local",
5 "files": {
6 "benchmark_report": "data/external_smoke/fma_reports_smoke/benchmark-report.md",
7 "model_card": "data/external_smoke/fma_reports_smoke/model-card.md",
8 "release_checklist": "data/external_smoke/fma_reports_smoke/release-checklist.md"
9 }
10 }
\ No newline at end of file
1 # Benchmark Report
2
3 ## One-Page Summary
4 - Model version: fma-smoke
5 - Data version: fma_local
6 - Key result: top1=1.0 top5=1.0
7 - Passed release gate: TBD
8
9 ## 1. Evaluation Scope Diagram
10
11 ```mermaid
12 flowchart LR
13 A[fma-smoke] --> B[fma_local]
14 A --> C[Scenario Buckets]
15 A --> D[Latency / Ops]
16 ```
17
18 ## 2. Metrics Table
19
20 | Bucket | top1 | top5 | MRR | FAR | Notes |
21 |---|---:|---:|---:|---:|---|
22 | clean | 1.0 | 1.0 | | | |
23
24 ## 3. Analysis
25 - Strongest area: clean/augmented buckets if present
26 - Weakest area: see hard-case summary
27 - Comparison with previous version: TBD
28
29 ## 4. Detailed Appendix
30 - Raw JSON report: embedded source
31
32 ## Sources
33 - docs/industrial-benchmark-spec.md
1 {
2 "model": {
3 "embed_dim": 192,
4 "channels": 512,
5 "n_mels": 128,
6 "use_band_split": true
7 },
8 "data": {
9 "source_dataset": "fma",
10 "manifests_dir": "data/external_smoke/fma/manifests",
11 "query_duration": 5.0
12 },
13 "run": {
14 "train_epochs": 1,
15 "batch_size": 2
16 }
17 }
\ No newline at end of file
1 {
2 "split": "test",
3 "num_queries": 8,
4 "top1": 1.0,
5 "topk": 1.0,
6 "by_type": {
7 "clean": {
8 "n": 8,
9 "top1": 1.0,
10 "topk": 1.0
11 }
12 },
13 "hard_case_summary": {},
14 "sample_failures": []
15 }
\ No newline at end of file
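Because the smoke run emits `eval.json` in the shape above, a small consistency check can be run on it before trusting the headline numbers. The sketch below is illustrative only: `check_eval_report` and the `min_top1` threshold are hypothetical and are not part of this commit.

```python
import json
from typing import Dict, List


def check_eval_report(report: Dict, min_top1: float = 0.9) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    min_top1 is an assumed smoke threshold, not a value from the repository.
    """
    problems: List[str] = []
    by_type = report.get("by_type", {})
    counted = sum(bucket.get("n", 0) for bucket in by_type.values())
    if counted != report.get("num_queries"):
        problems.append(f"by_type counts sum to {counted}, expected {report.get('num_queries')}")
    if report.get("top1", 0.0) < min_top1:
        problems.append(f"top1 {report.get('top1')} is below {min_top1}")
    if report.get("sample_failures"):
        problems.append(f"{len(report['sample_failures'])} sample failures recorded")
    return problems


# The report shown above, copied verbatim.
report = json.loads("""
{
  "split": "test",
  "num_queries": 8,
  "top1": 1.0,
  "topk": 1.0,
  "by_type": {"clean": {"n": 8, "top1": 1.0, "topk": 1.0}},
  "hard_case_summary": {},
  "sample_failures": []
}
""")
problems = check_eval_report(report)
```

With only 8 clean queries against a 24-song synthetic catalog, a perfect score mainly confirms that the pipeline is wired correctly; it says little about retrieval quality.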
1 # Model Card
2
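3 ## One-Page Summary
4 - Model name: ACR Hybrid Encoder
5 - Version: fma-smoke
6 - Intended use: music ACR prototype / retrieval
7 - Out-of-scope use: full commercial production rollout without validation on whitelisted data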
8
9 ## 1. Model Architecture Diagram
10
11 ```mermaid
12 flowchart LR
13 A[Input Audio] --> B[128 Mel + BandSplit]
14 B --> C[Encoder]
15 C --> D[Embedding]
16 D --> E[Hybrid Retrieval]
17 ```
18
19 ## 2. Key Information Table
20
21 | Item | Value |
22 |---|---|
23 | embed_dim | 192 |
24 | channels | 512 |
25 | n_mels | 128 |
26 | use_band_split | True |
27 | benchmark report | data/external_smoke/fma_reports_smoke/benchmark-report.md |
28
29 ## 3. Notes
30 - Training method: retrieval-oriented pair training
31 - Model limitations: hard-case accuracy still evolving
32 - Risk notice: requires whitelist-reviewed datasets for commercial deployment
33
34 ## 4. Detailed Appendix
35 - config embedded from source JSON
36
37 ## Sources
38 - docs/dataset-spec.md
39 - docs/benchmark-report-template.md
1 # Release Checklist
2
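3 ## One-Page Summary
4 Before release, all of the following must hold: quality passed, compliance passed, service passed, and documentation complete.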
5
6 ## 1. Release Gate Diagram
7
8 ```mermaid
9 flowchart TD
10 A[fma-smoke] --> B[Benchmark Pass]
11 A --> C[License Review Pass]
12 A --> D[Service Smoke Pass]
13 A --> E[Docs Complete]
14 ```
15
16 ## 2. Checklist Table
17
18 | Item | Status |
19 |---|---|
20 | benchmark report generated | yes |
21 | model card generated | yes |
22 | license registry updated | pending |
23 | service smoke test passed | yes |
24 | dataset whitelist confirmed | pending |
25 | changelog updated | pending |
26
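27 ## 3. Notes
28 - This checklist currently serves engineering governance and pre-release checks; it does not mean the legal threshold for commercial use has been met.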
29
30 ## 4. Detailed Appendix
31 - Benchmark report path: data/external_smoke/fma_reports_smoke/benchmark-report.md
32 - Model card path: data/external_smoke/fma_reports_smoke/model-card.md
33
34 ## Sources
35 - docs/dataset-sources-and-licensing.md
36 - docs/industrial-benchmark-spec.md
...@@ -221,6 +221,100 @@ def inspect_batch(pairs: List[str], eval_ratio: float, query_duration: float) -> ...@@ -221,6 +221,100 @@ def inspect_batch(pairs: List[str], eval_ratio: float, query_duration: float) ->
221 return {"datasets": results, "count": len(results)} 221 return {"datasets": results, "count": len(results)}
222 222
223 223
224 def smoke_local_dataset(
225 dataset: str,
226 input_dir: Path,
227 output_root: Path,
228 eval_ratio: float,
229 query_duration: float,
230 seed: int,
231 train_epochs: int,
232 batch_size: int,
233 ) -> Dict:
234 adapter = ADAPTERS[dataset]
235 inspect_summary = adapter.inspect_local_audio(input_dir, query_duration=query_duration, eval_ratio=eval_ratio)
236 prepare_summary = adapter.prepare_local_audio(
237 input_dir,
238 output_root / dataset,
239 eval_ratio=eval_ratio,
240 query_duration=query_duration,
241 seed=seed,
242 )
243 manifests_dir = Path(prepare_summary["output_dir"])
244 validate_summary = adapter.validate_local_manifests(manifests_dir)
245
246 model_dir = output_root / f"{dataset}_models_smoke"
247 index_dir = output_root / f"{dataset}_index_smoke"
248 report_dir = output_root / f"{dataset}_reports_smoke"
249 config_path = report_dir / "config.json"
250
251 subprocess.run([
252 "/usr/local/miniconda3/bin/python",
253 "train.py",
254 "--data", str(manifests_dir),
255 "--output", str(model_dir),
256 "--device", "cpu",
257 "--epochs", str(train_epochs),
258 "--batch-size", str(batch_size),
259 ], check=True)
260
261 subprocess.run([
262 "/usr/local/miniconda3/bin/python",
263 "run_demo.py",
264 "build-index",
265 "--data", str(manifests_dir),
266 "--model", str(model_dir / "best_model.pt"),
267 "--output", str(index_dir),
268 "--device", "cpu",
269 ], check=True)
270
271 report_dir.mkdir(parents=True, exist_ok=True)
272 eval_json = report_dir / "eval.json"
273 subprocess.run([
274 "/usr/local/miniconda3/bin/python",
275 "evaluate.py",
276 "--data", str(manifests_dir),
277 "--model", str(model_dir / "best_model.pt"),
278 "--index-prefix", str(index_dir / "reference"),
279 "--split", "test",
280 "--device", "cpu",
281 "--fast-eval",
282 "--output-json", str(eval_json),
283 ], check=True)
284
285 config = {
286 "model": {"embed_dim": 192, "channels": 512, "n_mels": 128, "use_band_split": True},
287 "data": {"source_dataset": dataset, "manifests_dir": str(manifests_dir), "query_duration": query_duration},
288 "run": {
289 "train_epochs": train_epochs,
290 "batch_size": batch_size,
291 },
292 }
293 report_dir.mkdir(parents=True, exist_ok=True)
294 config_path.write_text(json.dumps(config, indent=2))
295
296 subprocess.run([
297 "/usr/local/miniconda3/bin/python",
298 "scripts/generate_artifacts.py",
299 "--eval-json", str(eval_json),
300 "--config-json", str(config_path),
301 "--output-dir", str(report_dir),
302 "--model-version", f"{dataset}-smoke",
303 "--data-version", f"{dataset}_local",
304 ], check=True)
305
306 return {
307 "dataset": dataset,
308 "inspect": inspect_summary,
309 "prepare": prepare_summary,
310 "validate": validate_summary,
311 "model_dir": str(model_dir),
312 "index_dir": str(index_dir),
313 "report_dir": str(report_dir),
314 "eval_json": str(eval_json),
315 }
316
317
224 def main(): 318 def main():
225 parser = argparse.ArgumentParser() 319 parser = argparse.ArgumentParser()
226 sub = parser.add_subparsers(dest="cmd", required=True) 320 sub = parser.add_subparsers(dest="cmd", required=True)
...@@ -258,6 +352,16 @@ def main(): ...@@ -258,6 +352,16 @@ def main():
258 p.add_argument("dataset", choices=sorted(ADAPTERS)) 352 p.add_argument("dataset", choices=sorted(ADAPTERS))
259 p.add_argument("manifests_dir") 353 p.add_argument("manifests_dir")
260 354
355 p = sub.add_parser("smoke-local")
356 p.add_argument("dataset", choices=sorted(ADAPTERS))
357 p.add_argument("input_dir")
358 p.add_argument("--output-root", default="data/external_smoke")
359 p.add_argument("--eval-ratio", type=float, default=0.2)
360 p.add_argument("--query-duration", type=float, default=8.0)
361 p.add_argument("--seed", type=int, default=42)
362 p.add_argument("--train-epochs", type=int, default=1)
363 p.add_argument("--batch-size", type=int, default=2)
364
261 args = parser.parse_args() 365 args = parser.parse_args()
262 if args.cmd == "registry": 366 if args.cmd == "registry":
263 path = write_registry(args.output) 367 path = write_registry(args.output)
...@@ -290,6 +394,18 @@ def main(): ...@@ -290,6 +394,18 @@ def main():
290 elif args.cmd == "validate-local": 394 elif args.cmd == "validate-local":
291 summary = ADAPTERS[args.dataset].validate_local_manifests(Path(args.manifests_dir)) 395 summary = ADAPTERS[args.dataset].validate_local_manifests(Path(args.manifests_dir))
292 print(json.dumps(summary, indent=2, ensure_ascii=False)) 396 print(json.dumps(summary, indent=2, ensure_ascii=False))
397 elif args.cmd == "smoke-local":
398 summary = smoke_local_dataset(
399 dataset=args.dataset,
400 input_dir=Path(args.input_dir),
401 output_root=Path(args.output_root),
402 eval_ratio=args.eval_ratio,
403 query_duration=args.query_duration,
404 seed=args.seed,
405 train_epochs=args.train_epochs,
406 batch_size=args.batch_size,
407 )
408 print(json.dumps(summary, indent=2, ensure_ascii=False))
293 409
294 410
295 if __name__ == "__main__": 411 if __name__ == "__main__":
......
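The directory layout that `smoke_local_dataset` derives from `output_root` and `dataset` can be summarized in a few lines. This is a sketch for orientation, not code from the commit: `smoke_output_layout` is a hypothetical helper, and `prepare_root` is simply the directory passed to `prepare_local_audio` (the actual manifests directory comes from `prepare_summary["output_dir"]`).

```python
from pathlib import Path
from typing import Dict


def smoke_output_layout(output_root: Path, dataset: str) -> Dict[str, Path]:
    """Mirror the path arithmetic used inside smoke_local_dataset."""
    report_dir = output_root / f"{dataset}_reports_smoke"
    return {
        "prepare_root": output_root / dataset,
        "model_dir": output_root / f"{dataset}_models_smoke",
        "index_dir": output_root / f"{dataset}_index_smoke",
        "report_dir": report_dir,
        "config_path": report_dir / "config.json",
        "eval_json": report_dir / "eval.json",
    }


layout = smoke_output_layout(Path("data/external_smoke"), "fma")
```

Keeping every artifact under a dataset-prefixed directory means two corpora can share one `--output-root` without overwriting each other.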
...@@ -115,6 +115,35 @@ ...@@ -115,6 +115,35 @@
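115 - The open-data pipeline is now more than just "runnable"; it also produces basic release/reporting artifacts 115 - The open-data pipeline is now more than just "runnable"; it also produces basic release/reporting artifacts
116 - Next, once the input is swapped for real local FMA / MTG-Jamendo directories, the same release flow can be reused directly 116 - Next, once the input is swapped for real local FMA / MTG-Jamendo directories, the same release flow can be reused directly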
117 117
118 ### Stage: One-command open-dataset smoke
119
120 Completed:
121 - Extended `src/data/external_adapters.py`
122 - Added `smoke-local`
123 - A single command now runs, in order:
124 - inspect-local
125 - prepare-local
126 - validate-local
127 - train
128 - build-index
129 - evaluate
130 - generate_artifacts
131
132 Verification results:
133 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/synthetic_v2/songs --output-root data/external_smoke --eval-ratio 0.2 --query-duration 5.0 --train-epochs 1 --batch-size 2` succeeded
134 - Current results:
135 - `num_audio_files=24`
136 - `catalog=24`
137 - `train_queries=16`
138 - `test_queries=8`
139 - `top1=1.0`
140 - `topk=1.0`
141 - Artifact directory: `data/external_smoke/fma_reports_smoke`
142
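143 Conclusion:
144 - Changing only `input_dir` is now enough to run the full smoke on a real local FMA / MTG-Jamendo directory
145 - This replaces the seven manual steps above with one command, which lowers the cost of onboarding and validating real open datasets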
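The counts reported above (`catalog=24`, `train_queries=16`, `test_queries=8`) are consistent with every catalog song having a query in exactly one split. A check along these lines could guard that property on a new corpus; it is a sketch under that assumption, and `check_split` is a hypothetical helper rather than part of `validate-local`.

```python
from typing import Dict, Iterable, List


def query_rows(manifest: Iterable[dict]) -> List[dict]:
    """Keep only query rows; reference rows describe the catalog."""
    return [row for row in manifest if row.get("type") != "reference"]


def check_split(catalog: Iterable[dict], train: Iterable[dict], test: Iterable[dict]) -> Dict[str, int]:
    """Raise ValueError on leakage or unknown songs, else return the counts."""
    catalog_ids = {row["song_id"] for row in catalog}
    train_q, test_q = query_rows(train), query_rows(test)
    train_ids = {row["song_id"] for row in train_q}
    test_ids = {row["song_id"] for row in test_q}
    overlap = train_ids & test_ids
    if overlap:
        raise ValueError(f"songs queried in both splits: {sorted(overlap)}")
    unknown = (train_ids | test_ids) - catalog_ids
    if unknown:
        raise ValueError(f"queries for songs missing from catalog: {sorted(unknown)}")
    return {"catalog": len(catalog_ids), "train_queries": len(train_q), "test_queries": len(test_q)}


# Tiny illustrative manifests in the same row shape as the generated files.
catalog = [{"song_id": f"fma_0000{i}", "type": "reference"} for i in range(3)]
train = [{"song_id": "fma_00001", "type": "clean"}, {"song_id": "fma_00002", "type": "clean"}] + catalog
test = [{"song_id": "fma_00000", "type": "clean"}] + catalog
summary = check_split(catalog, train, test)
```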
146
118 ### Stage: confused targeted optimization v6 (sample-level weighting) 147 ### Stage: confused targeted optimization v6 (sample-level weighting)
119 148
120 Completed: 149 Completed:
......
...@@ -11,6 +11,7 @@ ...@@ -11,6 +11,7 @@
11 3. **validate-local** 11 3. **validate-local**
12 4. Then proceed to training and evaluation 12 4. Then proceed to training and evaluation
13 5. Generate benchmark / model card / release artifacts 13 5. Generate benchmark / model card / release artifacts
14 6. Or run the one-command `smoke-local` directly
14 15
15 --- 16 ---
16 17
...@@ -38,6 +39,7 @@ flowchart LR ...@@ -38,6 +39,7 @@ flowchart LR
38 | Pre-training validation | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `validate-local ...` | Confirm the structure is correct | 39 | Pre-training validation | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `validate-local ...` | Confirm the structure is correct |
39 | Training smoke | [`train.py`](../acr-engine/train.py) `--data ... --dry-run` | Verify that the manifests can go straight into training | 40 | Training smoke | [`train.py`](../acr-engine/train.py) `--data ... --dry-run` | Verify that the manifests can go straight into training |
40 | Release artifacts | [`scripts/generate_artifacts.py`](../acr-engine/scripts/generate_artifacts.py) | Generate benchmark/model-card/release-checklist | 41 | Release artifacts | [`scripts/generate_artifacts.py`](../acr-engine/scripts/generate_artifacts.py) | Generate benchmark/model-card/release-checklist |
42 | One-command smoke | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `smoke-local ...` | Run the full pipeline automatically |
41 43
42 --- 44 ---
43 45
...@@ -61,6 +63,12 @@ flowchart LR ...@@ -61,6 +63,12 @@ flowchart LR
61 /usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-batch fma=data/raw/fma_small_audio mtg_jamendo=data/raw/mtg_jamendo_audio --eval-ratio 0.2 --query-duration 8.0 63 /usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-batch fma=data/raw/fma_small_audio mtg_jamendo=data/raw/mtg_jamendo_audio --eval-ratio 0.2 --query-duration 8.0
62 ``` 64 ```
63 65
66 ### 3.3 One-command smoke
67
68 ```bash
69 /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2
70 ```
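To make the "only `input_dir` changes" point concrete, the command above can be assembled programmatically. This is a minimal sketch: `smoke_local_argv` is a hypothetical helper, its defaults mirror the `smoke-local` argparse defaults in this commit, and using `sys.executable` instead of the hard-coded miniconda path is an assumption about the environment, not what the repository does.

```python
import sys
from typing import List


def smoke_local_argv(
    dataset: str,
    input_dir: str,
    output_root: str = "data/external_smoke",
    eval_ratio: float = 0.2,
    query_duration: float = 8.0,
    seed: int = 42,
    train_epochs: int = 1,
    batch_size: int = 2,
    python: str = sys.executable,  # assumption: current interpreter has the deps
) -> List[str]:
    """Build the argv list for one smoke-local run."""
    return [
        python, "src/data/external_adapters.py", "smoke-local", dataset, input_dir,
        "--output-root", output_root,
        "--eval-ratio", str(eval_ratio),
        "--query-duration", str(query_duration),
        "--seed", str(seed),
        "--train-epochs", str(train_epochs),
        "--batch-size", str(batch_size),
    ]


# Only the dataset key and input directory differ between corpora.
fma_cmd = smoke_local_argv("fma", "data/raw/fma_small_audio")
mtg_cmd = smoke_local_argv("mtg_jamendo", "data/raw/mtg_jamendo_audio")
```

Either list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which is where the relative script path resolves.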
71
64 --- 72 ---
65 73
66 ## 4. Outputs 74 ## 4. Outputs
...@@ -95,6 +103,8 @@ flowchart LR ...@@ -95,6 +103,8 @@ flowchart LR
95 - `benchmark-report.md` 103 - `benchmark-report.md`
96 - `model-card.md` 104 - `model-card.md`
97 - `release-checklist.md` 105 - `release-checklist.md`
106 - `smoke-local`
107 - Returns the inspect / prepare / validate summaries and the model, index, report, and eval.json paths in a single JSON object
98 108
99 --- 109 ---
100 110
......