Automate the full open-dataset smoke workflow behind one command
Constraint: Real FMA or MTG-Jamendo onboarding should require only an input directory change, not a long manual command chain
Rejected: Keep the smoke steps separate only | Slows repeated validation and increases operator error risk
Confidence: high
Scope-risk: moderate
Directive: Use smoke-local as the default first-pass validation path for every new local open-music corpus
Tested: /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/synthetic_v2/songs --output-root data/external_smoke --eval-ratio 0.2 --query-duration 5.0 --train-epochs 1 --batch-size 2; /usr/local/miniconda3/bin/python -m py_compile src/data/external_adapters.py src/data/manifest_tools.py train.py run_demo.py evaluate.py scripts/generate_artifacts.py
Not-tested: Real downloaded FMA or MTG-Jamendo directories on larger-scale smoke runs
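The `Tested:` command hard-codes one interpreter path. As a minimal sketch, assuming the `smoke-local` subcommand accepts exactly the flags shown in that line, the same invocation can be assembled with `sys.executable` so that it runs under whichever interpreter is active. The helper name `build_smoke_command` is hypothetical and is not part of the repository.

```python
import sys
from pathlib import Path
from typing import List


def build_smoke_command(
    dataset: str,
    input_dir: Path,
    output_root: Path = Path("data/external_smoke"),
    eval_ratio: float = 0.2,
    query_duration: float = 5.0,
    train_epochs: int = 1,
    batch_size: int = 2,
) -> List[str]:
    """Assemble the argv for one smoke-local run (hypothetical helper)."""
    return [
        sys.executable,  # active interpreter instead of a fixed miniconda path
        "src/data/external_adapters.py",
        "smoke-local",
        dataset,
        str(input_dir),
        "--output-root", str(output_root),
        "--eval-ratio", str(eval_ratio),
        "--query-duration", str(query_duration),
        "--train-epochs", str(train_epochs),
        "--batch-size", str(batch_size),
    ]


if __name__ == "__main__":
    # Only print the command here; hand the list to subprocess.run(..., check=True) to execute it.
    print(" ".join(build_smoke_command("fma", Path("data/synthetic_v2/songs"))))
```

Onboarding a different corpus then only changes the `dataset` and `input_dir` arguments, which matches the constraint stated in the commit message.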
Showing 42 changed files with 1050 additions and 0 deletions
| 1 | [ | ||
| 2 | { | ||
| 3 | "song_id": "fma_00000", | ||
| 4 | "audio_path": "audio/fma_00000.wav", | ||
| 5 | "duration": 15.0, | ||
| 6 | "type": "reference", | ||
| 7 | "source_dataset": "fma" | ||
| 8 | }, | ||
| 9 | { | ||
| 10 | "song_id": "fma_00001", | ||
| 11 | "audio_path": "audio/fma_00001.wav", | ||
| 12 | "duration": 15.0, | ||
| 13 | "type": "reference", | ||
| 14 | "source_dataset": "fma" | ||
| 15 | }, | ||
| 16 | { | ||
| 17 | "song_id": "fma_00002", | ||
| 18 | "audio_path": "audio/fma_00002.wav", | ||
| 19 | "duration": 15.0, | ||
| 20 | "type": "reference", | ||
| 21 | "source_dataset": "fma" | ||
| 22 | }, | ||
| 23 | { | ||
| 24 | "song_id": "fma_00003", | ||
| 25 | "audio_path": "audio/fma_00003.wav", | ||
| 26 | "duration": 15.0, | ||
| 27 | "type": "reference", | ||
| 28 | "source_dataset": "fma" | ||
| 29 | }, | ||
| 30 | { | ||
| 31 | "song_id": "fma_00004", | ||
| 32 | "audio_path": "audio/fma_00004.wav", | ||
| 33 | "duration": 15.0, | ||
| 34 | "type": "reference", | ||
| 35 | "source_dataset": "fma" | ||
| 36 | }, | ||
| 37 | { | ||
| 38 | "song_id": "fma_00005", | ||
| 39 | "audio_path": "audio/fma_00005.wav", | ||
| 40 | "duration": 15.0, | ||
| 41 | "type": "reference", | ||
| 42 | "source_dataset": "fma" | ||
| 43 | }, | ||
| 44 | { | ||
| 45 | "song_id": "fma_00006", | ||
| 46 | "audio_path": "audio/fma_00006.wav", | ||
| 47 | "duration": 15.0, | ||
| 48 | "type": "reference", | ||
| 49 | "source_dataset": "fma" | ||
| 50 | }, | ||
| 51 | { | ||
| 52 | "song_id": "fma_00007", | ||
| 53 | "audio_path": "audio/fma_00007.wav", | ||
| 54 | "duration": 15.0, | ||
| 55 | "type": "reference", | ||
| 56 | "source_dataset": "fma" | ||
| 57 | }, | ||
| 58 | { | ||
| 59 | "song_id": "fma_00008", | ||
| 60 | "audio_path": "audio/fma_00008.wav", | ||
| 61 | "duration": 15.0, | ||
| 62 | "type": "reference", | ||
| 63 | "source_dataset": "fma" | ||
| 64 | }, | ||
| 65 | { | ||
| 66 | "song_id": "fma_00009", | ||
| 67 | "audio_path": "audio/fma_00009.wav", | ||
| 68 | "duration": 15.0, | ||
| 69 | "type": "reference", | ||
| 70 | "source_dataset": "fma" | ||
| 71 | }, | ||
| 72 | { | ||
| 73 | "song_id": "fma_00010", | ||
| 74 | "audio_path": "audio/fma_00010.wav", | ||
| 75 | "duration": 15.0, | ||
| 76 | "type": "reference", | ||
| 77 | "source_dataset": "fma" | ||
| 78 | }, | ||
| 79 | { | ||
| 80 | "song_id": "fma_00011", | ||
| 81 | "audio_path": "audio/fma_00011.wav", | ||
| 82 | "duration": 15.0, | ||
| 83 | "type": "reference", | ||
| 84 | "source_dataset": "fma" | ||
| 85 | }, | ||
| 86 | { | ||
| 87 | "song_id": "fma_00012", | ||
| 88 | "audio_path": "audio/fma_00012.wav", | ||
| 89 | "duration": 15.0, | ||
| 90 | "type": "reference", | ||
| 91 | "source_dataset": "fma" | ||
| 92 | }, | ||
| 93 | { | ||
| 94 | "song_id": "fma_00013", | ||
| 95 | "audio_path": "audio/fma_00013.wav", | ||
| 96 | "duration": 15.0, | ||
| 97 | "type": "reference", | ||
| 98 | "source_dataset": "fma" | ||
| 99 | }, | ||
| 100 | { | ||
| 101 | "song_id": "fma_00014", | ||
| 102 | "audio_path": "audio/fma_00014.wav", | ||
| 103 | "duration": 15.0, | ||
| 104 | "type": "reference", | ||
| 105 | "source_dataset": "fma" | ||
| 106 | }, | ||
| 107 | { | ||
| 108 | "song_id": "fma_00015", | ||
| 109 | "audio_path": "audio/fma_00015.wav", | ||
| 110 | "duration": 15.0, | ||
| 111 | "type": "reference", | ||
| 112 | "source_dataset": "fma" | ||
| 113 | }, | ||
| 114 | { | ||
| 115 | "song_id": "fma_00016", | ||
| 116 | "audio_path": "audio/fma_00016.wav", | ||
| 117 | "duration": 15.0, | ||
| 118 | "type": "reference", | ||
| 119 | "source_dataset": "fma" | ||
| 120 | }, | ||
| 121 | { | ||
| 122 | "song_id": "fma_00017", | ||
| 123 | "audio_path": "audio/fma_00017.wav", | ||
| 124 | "duration": 15.0, | ||
| 125 | "type": "reference", | ||
| 126 | "source_dataset": "fma" | ||
| 127 | }, | ||
| 128 | { | ||
| 129 | "song_id": "fma_00018", | ||
| 130 | "audio_path": "audio/fma_00018.wav", | ||
| 131 | "duration": 15.0, | ||
| 132 | "type": "reference", | ||
| 133 | "source_dataset": "fma" | ||
| 134 | }, | ||
| 135 | { | ||
| 136 | "song_id": "fma_00019", | ||
| 137 | "audio_path": "audio/fma_00019.wav", | ||
| 138 | "duration": 15.0, | ||
| 139 | "type": "reference", | ||
| 140 | "source_dataset": "fma" | ||
| 141 | }, | ||
| 142 | { | ||
| 143 | "song_id": "fma_00020", | ||
| 144 | "audio_path": "audio/fma_00020.wav", | ||
| 145 | "duration": 15.0, | ||
| 146 | "type": "reference", | ||
| 147 | "source_dataset": "fma" | ||
| 148 | }, | ||
| 149 | { | ||
| 150 | "song_id": "fma_00021", | ||
| 151 | "audio_path": "audio/fma_00021.wav", | ||
| 152 | "duration": 15.0, | ||
| 153 | "type": "reference", | ||
| 154 | "source_dataset": "fma" | ||
| 155 | }, | ||
| 156 | { | ||
| 157 | "song_id": "fma_00022", | ||
| 158 | "audio_path": "audio/fma_00022.wav", | ||
| 159 | "duration": 15.0, | ||
| 160 | "type": "reference", | ||
| 161 | "source_dataset": "fma" | ||
| 162 | }, | ||
| 163 | { | ||
| 164 | "song_id": "fma_00023", | ||
| 165 | "audio_path": "audio/fma_00023.wav", | ||
| 166 | "duration": 15.0, | ||
| 167 | "type": "reference", | ||
| 168 | "source_dataset": "fma" | ||
| 169 | } | ||
| 170 | ] | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
| 1 | [ | ||
| 2 | { | ||
| 3 | "song_id": "fma_00000", | ||
| 4 | "audio_path": "audio/fma_00000.wav", | ||
| 5 | "duration": 5.0, | ||
| 6 | "type": "clean", | ||
| 7 | "offset": 6.394, | ||
| 8 | "segment_type": "external_query", | ||
| 9 | "source_dataset": "fma" | ||
| 10 | }, | ||
| 11 | { | ||
| 12 | "song_id": "fma_00003", | ||
| 13 | "audio_path": "audio/fma_00003.wav", | ||
| 14 | "duration": 5.0, | ||
| 15 | "type": "clean", | ||
| 16 | "offset": 8.922, | ||
| 17 | "segment_type": "external_query", | ||
| 18 | "source_dataset": "fma" | ||
| 19 | }, | ||
| 20 | { | ||
| 21 | "song_id": "fma_00004", | ||
| 22 | "audio_path": "audio/fma_00004.wav", | ||
| 23 | "duration": 5.0, | ||
| 24 | "type": "clean", | ||
| 25 | "offset": 4.219, | ||
| 26 | "segment_type": "external_query", | ||
| 27 | "source_dataset": "fma" | ||
| 28 | }, | ||
| 29 | { | ||
| 30 | "song_id": "fma_00006", | ||
| 31 | "audio_path": "audio/fma_00006.wav", | ||
| 32 | "duration": 5.0, | ||
| 33 | "type": "clean", | ||
| 34 | "offset": 0.265, | ||
| 35 | "segment_type": "external_query", | ||
| 36 | "source_dataset": "fma" | ||
| 37 | }, | ||
| 38 | { | ||
| 39 | "song_id": "fma_00009", | ||
| 40 | "audio_path": "audio/fma_00009.wav", | ||
| 41 | "duration": 5.0, | ||
| 42 | "type": "clean", | ||
| 43 | "offset": 8.094, | ||
| 44 | "segment_type": "external_query", | ||
| 45 | "source_dataset": "fma" | ||
| 46 | }, | ||
| 47 | { | ||
| 48 | "song_id": "fma_00011", | ||
| 49 | "audio_path": "audio/fma_00011.wav", | ||
| 50 | "duration": 5.0, | ||
| 51 | "type": "clean", | ||
| 52 | "offset": 3.403, | ||
| 53 | "segment_type": "external_query", | ||
| 54 | "source_dataset": "fma" | ||
| 55 | }, | ||
| 56 | { | ||
| 57 | "song_id": "fma_00013", | ||
| 58 | "audio_path": "audio/fma_00013.wav", | ||
| 59 | "duration": 5.0, | ||
| 60 | "type": "clean", | ||
| 61 | "offset": 0.927, | ||
| 62 | "segment_type": "external_query", | ||
| 63 | "source_dataset": "fma" | ||
| 64 | }, | ||
| 65 | { | ||
| 66 | "song_id": "fma_00020", | ||
| 67 | "audio_path": "audio/fma_00020.wav", | ||
| 68 | "duration": 5.0, | ||
| 69 | "type": "clean", | ||
| 70 | "offset": 7.046, | ||
| 71 | "segment_type": "external_query", | ||
| 72 | "source_dataset": "fma" | ||
| 73 | }, | ||
| 74 | { | ||
| 75 | "song_id": "fma_00000", | ||
| 76 | "audio_path": "audio/fma_00000.wav", | ||
| 77 | "duration": 15.0, | ||
| 78 | "type": "reference", | ||
| 79 | "source_dataset": "fma" | ||
| 80 | }, | ||
| 81 | { | ||
| 82 | "song_id": "fma_00001", | ||
| 83 | "audio_path": "audio/fma_00001.wav", | ||
| 84 | "duration": 15.0, | ||
| 85 | "type": "reference", | ||
| 86 | "source_dataset": "fma" | ||
| 87 | }, | ||
| 88 | { | ||
| 89 | "song_id": "fma_00002", | ||
| 90 | "audio_path": "audio/fma_00002.wav", | ||
| 91 | "duration": 15.0, | ||
| 92 | "type": "reference", | ||
| 93 | "source_dataset": "fma" | ||
| 94 | }, | ||
| 95 | { | ||
| 96 | "song_id": "fma_00003", | ||
| 97 | "audio_path": "audio/fma_00003.wav", | ||
| 98 | "duration": 15.0, | ||
| 99 | "type": "reference", | ||
| 100 | "source_dataset": "fma" | ||
| 101 | }, | ||
| 102 | { | ||
| 103 | "song_id": "fma_00004", | ||
| 104 | "audio_path": "audio/fma_00004.wav", | ||
| 105 | "duration": 15.0, | ||
| 106 | "type": "reference", | ||
| 107 | "source_dataset": "fma" | ||
| 108 | }, | ||
| 109 | { | ||
| 110 | "song_id": "fma_00005", | ||
| 111 | "audio_path": "audio/fma_00005.wav", | ||
| 112 | "duration": 15.0, | ||
| 113 | "type": "reference", | ||
| 114 | "source_dataset": "fma" | ||
| 115 | }, | ||
| 116 | { | ||
| 117 | "song_id": "fma_00006", | ||
| 118 | "audio_path": "audio/fma_00006.wav", | ||
| 119 | "duration": 15.0, | ||
| 120 | "type": "reference", | ||
| 121 | "source_dataset": "fma" | ||
| 122 | }, | ||
| 123 | { | ||
| 124 | "song_id": "fma_00007", | ||
| 125 | "audio_path": "audio/fma_00007.wav", | ||
| 126 | "duration": 15.0, | ||
| 127 | "type": "reference", | ||
| 128 | "source_dataset": "fma" | ||
| 129 | }, | ||
| 130 | { | ||
| 131 | "song_id": "fma_00008", | ||
| 132 | "audio_path": "audio/fma_00008.wav", | ||
| 133 | "duration": 15.0, | ||
| 134 | "type": "reference", | ||
| 135 | "source_dataset": "fma" | ||
| 136 | }, | ||
| 137 | { | ||
| 138 | "song_id": "fma_00009", | ||
| 139 | "audio_path": "audio/fma_00009.wav", | ||
| 140 | "duration": 15.0, | ||
| 141 | "type": "reference", | ||
| 142 | "source_dataset": "fma" | ||
| 143 | }, | ||
| 144 | { | ||
| 145 | "song_id": "fma_00010", | ||
| 146 | "audio_path": "audio/fma_00010.wav", | ||
| 147 | "duration": 15.0, | ||
| 148 | "type": "reference", | ||
| 149 | "source_dataset": "fma" | ||
| 150 | }, | ||
| 151 | { | ||
| 152 | "song_id": "fma_00011", | ||
| 153 | "audio_path": "audio/fma_00011.wav", | ||
| 154 | "duration": 15.0, | ||
| 155 | "type": "reference", | ||
| 156 | "source_dataset": "fma" | ||
| 157 | }, | ||
| 158 | { | ||
| 159 | "song_id": "fma_00012", | ||
| 160 | "audio_path": "audio/fma_00012.wav", | ||
| 161 | "duration": 15.0, | ||
| 162 | "type": "reference", | ||
| 163 | "source_dataset": "fma" | ||
| 164 | }, | ||
| 165 | { | ||
| 166 | "song_id": "fma_00013", | ||
| 167 | "audio_path": "audio/fma_00013.wav", | ||
| 168 | "duration": 15.0, | ||
| 169 | "type": "reference", | ||
| 170 | "source_dataset": "fma" | ||
| 171 | }, | ||
| 172 | { | ||
| 173 | "song_id": "fma_00014", | ||
| 174 | "audio_path": "audio/fma_00014.wav", | ||
| 175 | "duration": 15.0, | ||
| 176 | "type": "reference", | ||
| 177 | "source_dataset": "fma" | ||
| 178 | }, | ||
| 179 | { | ||
| 180 | "song_id": "fma_00015", | ||
| 181 | "audio_path": "audio/fma_00015.wav", | ||
| 182 | "duration": 15.0, | ||
| 183 | "type": "reference", | ||
| 184 | "source_dataset": "fma" | ||
| 185 | }, | ||
| 186 | { | ||
| 187 | "song_id": "fma_00016", | ||
| 188 | "audio_path": "audio/fma_00016.wav", | ||
| 189 | "duration": 15.0, | ||
| 190 | "type": "reference", | ||
| 191 | "source_dataset": "fma" | ||
| 192 | }, | ||
| 193 | { | ||
| 194 | "song_id": "fma_00017", | ||
| 195 | "audio_path": "audio/fma_00017.wav", | ||
| 196 | "duration": 15.0, | ||
| 197 | "type": "reference", | ||
| 198 | "source_dataset": "fma" | ||
| 199 | }, | ||
| 200 | { | ||
| 201 | "song_id": "fma_00018", | ||
| 202 | "audio_path": "audio/fma_00018.wav", | ||
| 203 | "duration": 15.0, | ||
| 204 | "type": "reference", | ||
| 205 | "source_dataset": "fma" | ||
| 206 | }, | ||
| 207 | { | ||
| 208 | "song_id": "fma_00019", | ||
| 209 | "audio_path": "audio/fma_00019.wav", | ||
| 210 | "duration": 15.0, | ||
| 211 | "type": "reference", | ||
| 212 | "source_dataset": "fma" | ||
| 213 | }, | ||
| 214 | { | ||
| 215 | "song_id": "fma_00020", | ||
| 216 | "audio_path": "audio/fma_00020.wav", | ||
| 217 | "duration": 15.0, | ||
| 218 | "type": "reference", | ||
| 219 | "source_dataset": "fma" | ||
| 220 | }, | ||
| 221 | { | ||
| 222 | "song_id": "fma_00021", | ||
| 223 | "audio_path": "audio/fma_00021.wav", | ||
| 224 | "duration": 15.0, | ||
| 225 | "type": "reference", | ||
| 226 | "source_dataset": "fma" | ||
| 227 | }, | ||
| 228 | { | ||
| 229 | "song_id": "fma_00022", | ||
| 230 | "audio_path": "audio/fma_00022.wav", | ||
| 231 | "duration": 15.0, | ||
| 232 | "type": "reference", | ||
| 233 | "source_dataset": "fma" | ||
| 234 | }, | ||
| 235 | { | ||
| 236 | "song_id": "fma_00023", | ||
| 237 | "audio_path": "audio/fma_00023.wav", | ||
| 238 | "duration": 15.0, | ||
| 239 | "type": "reference", | ||
| 240 | "source_dataset": "fma" | ||
| 241 | } | ||
| 242 | ] | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
| 1 | [ | ||
| 2 | { | ||
| 3 | "song_id": "fma_00001", | ||
| 4 | "audio_path": "audio/fma_00001.wav", | ||
| 5 | "duration": 5.0, | ||
| 6 | "type": "clean", | ||
| 7 | "offset": 2.75, | ||
| 8 | "segment_type": "external_query", | ||
| 9 | "source_dataset": "fma" | ||
| 10 | }, | ||
| 11 | { | ||
| 12 | "song_id": "fma_00002", | ||
| 13 | "audio_path": "audio/fma_00002.wav", | ||
| 14 | "duration": 5.0, | ||
| 15 | "type": "clean", | ||
| 16 | "offset": 7.365, | ||
| 17 | "segment_type": "external_query", | ||
| 18 | "source_dataset": "fma" | ||
| 19 | }, | ||
| 20 | { | ||
| 21 | "song_id": "fma_00005", | ||
| 22 | "audio_path": "audio/fma_00005.wav", | ||
| 23 | "duration": 5.0, | ||
| 24 | "type": "clean", | ||
| 25 | "offset": 2.186, | ||
| 26 | "segment_type": "external_query", | ||
| 27 | "source_dataset": "fma" | ||
| 28 | }, | ||
| 29 | { | ||
| 30 | "song_id": "fma_00007", | ||
| 31 | "audio_path": "audio/fma_00007.wav", | ||
| 32 | "duration": 5.0, | ||
| 33 | "type": "clean", | ||
| 34 | "offset": 6.499, | ||
| 35 | "segment_type": "external_query", | ||
| 36 | "source_dataset": "fma" | ||
| 37 | }, | ||
| 38 | { | ||
| 39 | "song_id": "fma_00008", | ||
| 40 | "audio_path": "audio/fma_00008.wav", | ||
| 41 | "duration": 5.0, | ||
| 42 | "type": "clean", | ||
| 43 | "offset": 2.204, | ||
| 44 | "segment_type": "external_query", | ||
| 45 | "source_dataset": "fma" | ||
| 46 | }, | ||
| 47 | { | ||
| 48 | "song_id": "fma_00010", | ||
| 49 | "audio_path": "audio/fma_00010.wav", | ||
| 50 | "duration": 5.0, | ||
| 51 | "type": "clean", | ||
| 52 | "offset": 8.058, | ||
| 53 | "segment_type": "external_query", | ||
| 54 | "source_dataset": "fma" | ||
| 55 | }, | ||
| 56 | { | ||
| 57 | "song_id": "fma_00012", | ||
| 58 | "audio_path": "audio/fma_00012.wav", | ||
| 59 | "duration": 5.0, | ||
| 60 | "type": "clean", | ||
| 61 | "offset": 9.572, | ||
| 62 | "segment_type": "external_query", | ||
| 63 | "source_dataset": "fma" | ||
| 64 | }, | ||
| 65 | { | ||
| 66 | "song_id": "fma_00014", | ||
| 67 | "audio_path": "audio/fma_00014.wav", | ||
| 68 | "duration": 5.0, | ||
| 69 | "type": "clean", | ||
| 70 | "offset": 8.475, | ||
| 71 | "segment_type": "external_query", | ||
| 72 | "source_dataset": "fma" | ||
| 73 | }, | ||
| 74 | { | ||
| 75 | "song_id": "fma_00015", | ||
| 76 | "audio_path": "audio/fma_00015.wav", | ||
| 77 | "duration": 5.0, | ||
| 78 | "type": "clean", | ||
| 79 | "offset": 8.071, | ||
| 80 | "segment_type": "external_query", | ||
| 81 | "source_dataset": "fma" | ||
| 82 | }, | ||
| 83 | { | ||
| 84 | "song_id": "fma_00016", | ||
| 85 | "audio_path": "audio/fma_00016.wav", | ||
| 86 | "duration": 5.0, | ||
| 87 | "type": "clean", | ||
| 88 | "offset": 5.362, | ||
| 89 | "segment_type": "external_query", | ||
| 90 | "source_dataset": "fma" | ||
| 91 | }, | ||
| 92 | { | ||
| 93 | "song_id": "fma_00017", | ||
| 94 | "audio_path": "audio/fma_00017.wav", | ||
| 95 | "duration": 5.0, | ||
| 96 | "type": "clean", | ||
| 97 | "offset": 3.785, | ||
| 98 | "segment_type": "external_query", | ||
| 99 | "source_dataset": "fma" | ||
| 100 | }, | ||
| 101 | { | ||
| 102 | "song_id": "fma_00018", | ||
| 103 | "audio_path": "audio/fma_00018.wav", | ||
| 104 | "duration": 5.0, | ||
| 105 | "type": "clean", | ||
| 106 | "offset": 8.294, | ||
| 107 | "segment_type": "external_query", | ||
| 108 | "source_dataset": "fma" | ||
| 109 | }, | ||
| 110 | { | ||
| 111 | "song_id": "fma_00019", | ||
| 112 | "audio_path": "audio/fma_00019.wav", | ||
| 113 | "duration": 5.0, | ||
| 114 | "type": "clean", | ||
| 115 | "offset": 8.617, | ||
| 116 | "segment_type": "external_query", | ||
| 117 | "source_dataset": "fma" | ||
| 118 | }, | ||
| 119 | { | ||
| 120 | "song_id": "fma_00021", | ||
| 121 | "audio_path": "audio/fma_00021.wav", | ||
| 122 | "duration": 5.0, | ||
| 123 | "type": "clean", | ||
| 124 | "offset": 2.279, | ||
| 125 | "segment_type": "external_query", | ||
| 126 | "source_dataset": "fma" | ||
| 127 | }, | ||
| 128 | { | ||
| 129 | "song_id": "fma_00022", | ||
| 130 | "audio_path": "audio/fma_00022.wav", | ||
| 131 | "duration": 5.0, | ||
| 132 | "type": "clean", | ||
| 133 | "offset": 0.798, | ||
| 134 | "segment_type": "external_query", | ||
| 135 | "source_dataset": "fma" | ||
| 136 | }, | ||
| 137 | { | ||
| 138 | "song_id": "fma_00023", | ||
| 139 | "audio_path": "audio/fma_00023.wav", | ||
| 140 | "duration": 5.0, | ||
| 141 | "type": "clean", | ||
| 142 | "offset": 1.01, | ||
| 143 | "segment_type": "external_query", | ||
| 144 | "source_dataset": "fma" | ||
| 145 | }, | ||
| 146 | { | ||
| 147 | "song_id": "fma_00000", | ||
| 148 | "audio_path": "audio/fma_00000.wav", | ||
| 149 | "duration": 15.0, | ||
| 150 | "type": "reference", | ||
| 151 | "source_dataset": "fma" | ||
| 152 | }, | ||
| 153 | { | ||
| 154 | "song_id": "fma_00001", | ||
| 155 | "audio_path": "audio/fma_00001.wav", | ||
| 156 | "duration": 15.0, | ||
| 157 | "type": "reference", | ||
| 158 | "source_dataset": "fma" | ||
| 159 | }, | ||
| 160 | { | ||
| 161 | "song_id": "fma_00002", | ||
| 162 | "audio_path": "audio/fma_00002.wav", | ||
| 163 | "duration": 15.0, | ||
| 164 | "type": "reference", | ||
| 165 | "source_dataset": "fma" | ||
| 166 | }, | ||
| 167 | { | ||
| 168 | "song_id": "fma_00003", | ||
| 169 | "audio_path": "audio/fma_00003.wav", | ||
| 170 | "duration": 15.0, | ||
| 171 | "type": "reference", | ||
| 172 | "source_dataset": "fma" | ||
| 173 | }, | ||
| 174 | { | ||
| 175 | "song_id": "fma_00004", | ||
| 176 | "audio_path": "audio/fma_00004.wav", | ||
| 177 | "duration": 15.0, | ||
| 178 | "type": "reference", | ||
| 179 | "source_dataset": "fma" | ||
| 180 | }, | ||
| 181 | { | ||
| 182 | "song_id": "fma_00005", | ||
| 183 | "audio_path": "audio/fma_00005.wav", | ||
| 184 | "duration": 15.0, | ||
| 185 | "type": "reference", | ||
| 186 | "source_dataset": "fma" | ||
| 187 | }, | ||
| 188 | { | ||
| 189 | "song_id": "fma_00006", | ||
| 190 | "audio_path": "audio/fma_00006.wav", | ||
| 191 | "duration": 15.0, | ||
| 192 | "type": "reference", | ||
| 193 | "source_dataset": "fma" | ||
| 194 | }, | ||
| 195 | { | ||
| 196 | "song_id": "fma_00007", | ||
| 197 | "audio_path": "audio/fma_00007.wav", | ||
| 198 | "duration": 15.0, | ||
| 199 | "type": "reference", | ||
| 200 | "source_dataset": "fma" | ||
| 201 | }, | ||
| 202 | { | ||
| 203 | "song_id": "fma_00008", | ||
| 204 | "audio_path": "audio/fma_00008.wav", | ||
| 205 | "duration": 15.0, | ||
| 206 | "type": "reference", | ||
| 207 | "source_dataset": "fma" | ||
| 208 | }, | ||
| 209 | { | ||
| 210 | "song_id": "fma_00009", | ||
| 211 | "audio_path": "audio/fma_00009.wav", | ||
| 212 | "duration": 15.0, | ||
| 213 | "type": "reference", | ||
| 214 | "source_dataset": "fma" | ||
| 215 | }, | ||
| 216 | { | ||
| 217 | "song_id": "fma_00010", | ||
| 218 | "audio_path": "audio/fma_00010.wav", | ||
| 219 | "duration": 15.0, | ||
| 220 | "type": "reference", | ||
| 221 | "source_dataset": "fma" | ||
| 222 | }, | ||
| 223 | { | ||
| 224 | "song_id": "fma_00011", | ||
| 225 | "audio_path": "audio/fma_00011.wav", | ||
| 226 | "duration": 15.0, | ||
| 227 | "type": "reference", | ||
| 228 | "source_dataset": "fma" | ||
| 229 | }, | ||
| 230 | { | ||
| 231 | "song_id": "fma_00012", | ||
| 232 | "audio_path": "audio/fma_00012.wav", | ||
| 233 | "duration": 15.0, | ||
| 234 | "type": "reference", | ||
| 235 | "source_dataset": "fma" | ||
| 236 | }, | ||
| 237 | { | ||
| 238 | "song_id": "fma_00013", | ||
| 239 | "audio_path": "audio/fma_00013.wav", | ||
| 240 | "duration": 15.0, | ||
| 241 | "type": "reference", | ||
| 242 | "source_dataset": "fma" | ||
| 243 | }, | ||
| 244 | { | ||
| 245 | "song_id": "fma_00014", | ||
| 246 | "audio_path": "audio/fma_00014.wav", | ||
| 247 | "duration": 15.0, | ||
| 248 | "type": "reference", | ||
| 249 | "source_dataset": "fma" | ||
| 250 | }, | ||
| 251 | { | ||
| 252 | "song_id": "fma_00015", | ||
| 253 | "audio_path": "audio/fma_00015.wav", | ||
| 254 | "duration": 15.0, | ||
| 255 | "type": "reference", | ||
| 256 | "source_dataset": "fma" | ||
| 257 | }, | ||
| 258 | { | ||
| 259 | "song_id": "fma_00016", | ||
| 260 | "audio_path": "audio/fma_00016.wav", | ||
| 261 | "duration": 15.0, | ||
| 262 | "type": "reference", | ||
| 263 | "source_dataset": "fma" | ||
| 264 | }, | ||
| 265 | { | ||
| 266 | "song_id": "fma_00017", | ||
| 267 | "audio_path": "audio/fma_00017.wav", | ||
| 268 | "duration": 15.0, | ||
| 269 | "type": "reference", | ||
| 270 | "source_dataset": "fma" | ||
| 271 | }, | ||
| 272 | { | ||
| 273 | "song_id": "fma_00018", | ||
| 274 | "audio_path": "audio/fma_00018.wav", | ||
| 275 | "duration": 15.0, | ||
| 276 | "type": "reference", | ||
| 277 | "source_dataset": "fma" | ||
| 278 | }, | ||
| 279 | { | ||
| 280 | "song_id": "fma_00019", | ||
| 281 | "audio_path": "audio/fma_00019.wav", | ||
| 282 | "duration": 15.0, | ||
| 283 | "type": "reference", | ||
| 284 | "source_dataset": "fma" | ||
| 285 | }, | ||
| 286 | { | ||
| 287 | "song_id": "fma_00020", | ||
| 288 | "audio_path": "audio/fma_00020.wav", | ||
| 289 | "duration": 15.0, | ||
| 290 | "type": "reference", | ||
| 291 | "source_dataset": "fma" | ||
| 292 | }, | ||
| 293 | { | ||
| 294 | "song_id": "fma_00021", | ||
| 295 | "audio_path": "audio/fma_00021.wav", | ||
| 296 | "duration": 15.0, | ||
| 297 | "type": "reference", | ||
| 298 | "source_dataset": "fma" | ||
| 299 | }, | ||
| 300 | { | ||
| 301 | "song_id": "fma_00022", | ||
| 302 | "audio_path": "audio/fma_00022.wav", | ||
| 303 | "duration": 15.0, | ||
| 304 | "type": "reference", | ||
| 305 | "source_dataset": "fma" | ||
| 306 | }, | ||
| 307 | { | ||
| 308 | "song_id": "fma_00023", | ||
| 309 | "audio_path": "audio/fma_00023.wav", | ||
| 310 | "duration": 15.0, | ||
| 311 | "type": "reference", | ||
| 312 | "source_dataset": "fma" | ||
| 313 | } | ||
| 314 | ] | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
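The manifests above pair short `external_query` clips with full-length `reference` entries. The following is a minimal sketch of a sanity check over one manifest, assuming each manifest is a JSON list of objects with the fields shown above and that a query window (`offset` to `offset + duration`) should stay inside its reference track. The function name `check_query_windows` is hypothetical; it is not the repository's `validate_local_manifests`, whose body is not shown in this diff.

```python
from typing import Dict, List


def check_query_windows(entries: List[Dict]) -> List[str]:
    """Return human-readable problems found in one manifest (hypothetical check)."""
    # Map each reference song to its full duration in seconds.
    ref_duration = {
        e["song_id"]: e["duration"] for e in entries if e["type"] == "reference"
    }
    problems = []
    for e in entries:
        if e.get("segment_type") != "external_query":
            continue
        song_id = e["song_id"]
        if song_id not in ref_duration:
            problems.append(f"{song_id}: query has no reference entry")
            continue
        end = e["offset"] + e["duration"]
        if e["offset"] < 0 or end > ref_duration[song_id]:
            problems.append(
                f"{song_id}: window {e['offset']:.3f}s to {end:.3f}s falls outside the reference"
            )
    return problems


# Two entries copied from the manifests above: 9.572 + 5.0 stays within 15.0 seconds.
sample = [
    {"song_id": "fma_00012", "audio_path": "audio/fma_00012.wav", "duration": 5.0,
     "type": "clean", "offset": 9.572, "segment_type": "external_query",
     "source_dataset": "fma"},
    {"song_id": "fma_00012", "audio_path": "audio/fma_00012.wav", "duration": 15.0,
     "type": "reference", "source_dataset": "fma"},
]
print(check_query_windows(sample))
```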
| 1 | [] | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
| 1 | { | ||
| 2 | "fma_00001": 0, | ||
| 3 | "fma_00002": 1, | ||
| 4 | "fma_00005": 2, | ||
| 5 | "fma_00007": 3, | ||
| 6 | "fma_00008": 4, | ||
| 7 | "fma_00010": 5, | ||
| 8 | "fma_00012": 6, | ||
| 9 | "fma_00014": 7, | ||
| 10 | "fma_00015": 8, | ||
| 11 | "fma_00016": 9, | ||
| 12 | "fma_00017": 10, | ||
| 13 | "fma_00018": 11, | ||
| 14 | "fma_00019": 12, | ||
| 15 | "fma_00021": 13, | ||
| 16 | "fma_00022": 14, | ||
| 17 | "fma_00023": 15 | ||
| 18 | } | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
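The label map above assigns consecutive integer indices, in sorted order, to the 16 song IDs that appear as query clips in the largest manifest above. A minimal sketch of how such a mapping could be derived, assuming it is built by enumerating the sorted unique IDs; `build_label_map` is a hypothetical name, and the repository's actual construction is not shown in this diff.

```python
from typing import Dict, Iterable


def build_label_map(song_ids: Iterable[str]) -> Dict[str, int]:
    """Map each unique song ID to a consecutive index in sorted order."""
    return {song_id: idx for idx, song_id in enumerate(sorted(set(song_ids)))}


# Duplicates and input order do not affect the result.
label_map = build_label_map(["fma_00005", "fma_00001", "fma_00002", "fma_00001"])
print(label_map)  # {'fma_00001': 0, 'fma_00002': 1, 'fma_00005': 2}
```

Sorting before enumerating keeps the indices stable across runs, which matters when a trained checkpoint is reused with a regenerated manifest.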
| 1 | { | ||
| 2 | "generated_at": "2026-06-02T05:04:14Z", | ||
| 3 | "model_version": "fma-smoke", | ||
| 4 | "data_version": "fma_local", | ||
| 5 | "files": { | ||
| 6 | "benchmark_report": "data/external_smoke/fma_reports_smoke/benchmark-report.md", | ||
| 7 | "model_card": "data/external_smoke/fma_reports_smoke/model-card.md", | ||
| 8 | "release_checklist": "data/external_smoke/fma_reports_smoke/release-checklist.md" | ||
| 9 | } | ||
| 10 | } | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
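The index file above records where each generated report lives. As a minimal sketch, assuming the index always carries a `files` object that maps artifact names to paths relative to the repository root, a follow-up check could confirm that every listed report exists. The function name `missing_artifacts` is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def missing_artifacts(index: Dict, root: Path = Path(".")) -> List[str]:
    """Return the names under "files" whose paths do not exist relative to root."""
    return [
        name
        for name, rel_path in index.get("files", {}).items()
        if not (root / rel_path).is_file()
    ]


# Index content copied from the file above.
index = json.loads("""
{
  "generated_at": "2026-06-02T05:04:14Z",
  "model_version": "fma-smoke",
  "data_version": "fma_local",
  "files": {
    "benchmark_report": "data/external_smoke/fma_reports_smoke/benchmark-report.md",
    "model_card": "data/external_smoke/fma_reports_smoke/model-card.md",
    "release_checklist": "data/external_smoke/fma_reports_smoke/release-checklist.md"
  }
}
""")
print(missing_artifacts(index))
```

An empty list means all three reports are present under the chosen root.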
| 1 | # Benchmark Report | ||
| 2 | |||
| 3 | ## One-Page Summary | ||
| 4 | - Model version: fma-smoke | ||
| 5 | - Data version: fma_local | ||
| 6 | - Key result: top1=1.0 top5=1.0 | ||
| 7 | - Passes release gate: TBD | ||
| 8 | |||
| 9 | ## 1. Evaluation Scope Diagram | ||
| 10 | |||
| 11 | ```mermaid | ||
| 12 | flowchart LR | ||
| 13 | A[fma-smoke] --> B[fma_local] | ||
| 14 | A --> C[Scenario Buckets] | ||
| 15 | A --> D[Latency / Ops] | ||
| 16 | ``` | ||
| 17 | |||
| 18 | ## 2. Metrics Table | ||
| 19 | |||
| 20 | | Bucket | top1 | top5 | MRR | FAR | Notes | | ||
| 21 | |---|---:|---:|---:|---:|---| | ||
| 22 | | clean | 1.0 | 1.0 | | | | | ||
| 23 | |||
| 24 | ## 3. Written Analysis | ||
| 25 | - Strongest area: clean/augmented buckets if present | ||
| 26 | - Weakest area: see hard-case summary | ||
| 27 | - Comparison with previous version: TBD | ||
| 28 | |||
| 29 | ## 4. Detailed Appendix | ||
| 30 | - Raw JSON report: embedded source | ||
| 31 | |||
| 32 | ## Sources | ||
| 33 | - docs/industrial-benchmark-spec.md |
| 1 | { | ||
| 2 | "model": { | ||
| 3 | "embed_dim": 192, | ||
| 4 | "channels": 512, | ||
| 5 | "n_mels": 128, | ||
| 6 | "use_band_split": true | ||
| 7 | }, | ||
| 8 | "data": { | ||
| 9 | "source_dataset": "fma", | ||
| 10 | "manifests_dir": "data/external_smoke/fma/manifests", | ||
| 11 | "query_duration": 5.0 | ||
| 12 | }, | ||
| 13 | "run": { | ||
| 14 | "train_epochs": 1, | ||
| 15 | "batch_size": 2 | ||
| 16 | } | ||
| 17 | } | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
| 1 | # Model Card | ||
| 2 | |||
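| 3 | ## One-Page Summary | ||
| 4 | - Model name: ACR Hybrid Encoder | ||
| 5 | - Version: fma-smoke | ||
| 6 | - Intended use: music ACR prototype / retrieval | ||
| 7 | - Out of scope: full production commercial rollout without validation on whitelisted data | ||
| 8 | |||
| 9 | ## 1. Model Architecture Diagram | ||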
| 10 | |||
| 11 | ```mermaid | ||
| 12 | flowchart LR | ||
| 13 | A[Input Audio] --> B[128 Mel + BandSplit] | ||
| 14 | B --> C[Encoder] | ||
| 15 | C --> D[Embedding] | ||
| 16 | D --> E[Hybrid Retrieval] | ||
| 17 | ``` | ||
| 18 | |||
| 19 | ## 2. Key Information Table | ||
| 20 | |||
| 21 | | Item | Value | | ||
| 22 | |---|---| | ||
| 23 | | embed_dim | 192 | | ||
| 24 | | channels | 512 | | ||
| 25 | | n_mels | 128 | | ||
| 26 | | use_band_split | True | | ||
| 27 | | benchmark report | data/external_smoke/fma_reports_smoke/benchmark-report.md | | ||
| 28 | |||
| 29 | ## 3. Notes | ||
| 30 | - Training approach: retrieval-oriented pair training | ||
| 31 | - Model limitations: hard-case accuracy still evolving | ||
| 32 | - Risk notice: requires whitelist-reviewed datasets for commercial deployment | ||
| 33 | |||
| 34 | ## 4. Detailed Appendix | ||
| 35 | - config embedded from source JSON | ||
| 36 | |||
| 37 | ## Sources | ||
| 38 | - docs/dataset-spec.md | ||
| 39 | - docs/benchmark-report-template.md |
| 1 | # Release Checklist | ||
| 2 | |||
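| 3 | ## One-Page Summary | ||
| 4 | A release must satisfy all of the following: quality passed, compliance passed, service passed, and documentation complete. | ||
| 5 | |||
| 6 | ## 1. Release Gate Diagram | ||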
| 7 | |||
| 8 | ```mermaid | ||
| 9 | flowchart TD | ||
| 10 | A[fma-smoke] --> B[Benchmark Pass] | ||
| 11 | A --> C[License Review Pass] | ||
| 12 | A --> D[Service Smoke Pass] | ||
| 13 | A --> E[Docs Complete] | ||
| 14 | ``` | ||
| 15 | |||
| 16 | ## 2. Checklist Table | ||
| 17 | |||
| 18 | | Item | Status | | ||
| 19 | |---|---| | ||
| 20 | | benchmark report generated | yes | | ||
| 21 | | model card generated | yes | | ||
| 22 | | license registry updated | pending | | ||
| 23 | | service smoke test passed | yes | | ||
| 24 | | dataset whitelist confirmed | pending | | ||
| 25 | | changelog updated | pending | | ||
| 26 | |||
| 27 | ## 3. Notes | ||
| 28 | - Currently used for engineering governance and pre-release checks; it does not mean the legal requirements for commercial use have been met. | ||
| 29 | |||
| 30 | ## 4. Detailed Appendix | ||
| 31 | - benchmark report path: data/external_smoke/fma_reports_smoke/benchmark-report.md | ||
| 32 | - model card path: data/external_smoke/fma_reports_smoke/model-card.md | ||
| 33 | |||
| 34 | ## Sources | ||
| 35 | - docs/dataset-sources-and-licensing.md | ||
| 36 | - docs/industrial-benchmark-spec.md |
| ... | @@ -221,6 +221,100 @@ def inspect_batch(pairs: List[str], eval_ratio: float, query_duration: float) -> | ... | @@ -221,6 +221,100 @@ def inspect_batch(pairs: List[str], eval_ratio: float, query_duration: float) -> |
| 221 | return {"datasets": results, "count": len(results)} | 221 | return {"datasets": results, "count": len(results)} |
| 222 | 222 | ||
| 223 | 223 | ||
| 224 | def smoke_local_dataset( | ||
| 225 | dataset: str, | ||
| 226 | input_dir: Path, | ||
| 227 | output_root: Path, | ||
| 228 | eval_ratio: float, | ||
| 229 | query_duration: float, | ||
| 230 | seed: int, | ||
| 231 | train_epochs: int, | ||
| 232 | batch_size: int, | ||
| 233 | ) -> Dict: | ||
| 234 | adapter = ADAPTERS[dataset] | ||
| 235 | inspect_summary = adapter.inspect_local_audio(input_dir, query_duration=query_duration, eval_ratio=eval_ratio) | ||
| 236 | prepare_summary = adapter.prepare_local_audio( | ||
| 237 | input_dir, | ||
| 238 | output_root / dataset, | ||
| 239 | eval_ratio=eval_ratio, | ||
| 240 | query_duration=query_duration, | ||
| 241 | seed=seed, | ||
| 242 | ) | ||
| 243 | manifests_dir = Path(prepare_summary["output_dir"]) | ||
| 244 | validate_summary = adapter.validate_local_manifests(manifests_dir) | ||
| 245 | |||
| 246 | model_dir = output_root / f"{dataset}_models_smoke" | ||
| 247 | index_dir = output_root / f"{dataset}_index_smoke" | ||
| 248 | report_dir = output_root / f"{dataset}_reports_smoke" | ||
| 249 | config_path = report_dir / "config.json" | ||
| 250 | |||
| 251 | subprocess.run([ | ||
| 252 | "/usr/local/miniconda3/bin/python", | ||
| 253 | "train.py", | ||
| 254 | "--data", str(manifests_dir), | ||
| 255 | "--output", str(model_dir), | ||
| 256 | "--device", "cpu", | ||
| 257 | "--epochs", str(train_epochs), | ||
| 258 | "--batch-size", str(batch_size), | ||
| 259 | ], check=True) | ||
| 260 | |||
| 261 | subprocess.run([ | ||
| 262 | "/usr/local/miniconda3/bin/python", | ||
| 263 | "run_demo.py", | ||
| 264 | "build-index", | ||
| 265 | "--data", str(manifests_dir), | ||
| 266 | "--model", str(model_dir / "best_model.pt"), | ||
| 267 | "--output", str(index_dir), | ||
| 268 | "--device", "cpu", | ||
| 269 | ], check=True) | ||
| 270 | |||
| 271 | report_dir.mkdir(parents=True, exist_ok=True) | ||
| 272 | eval_json = report_dir / "eval.json" | ||
| 273 | subprocess.run([ | ||
| 274 | "/usr/local/miniconda3/bin/python", | ||
| 275 | "evaluate.py", | ||
| 276 | "--data", str(manifests_dir), | ||
| 277 | "--model", str(model_dir / "best_model.pt"), | ||
| 278 | "--index-prefix", str(index_dir / "reference"), | ||
| 279 | "--split", "test", | ||
| 280 | "--device", "cpu", | ||
| 281 | "--fast-eval", | ||
| 282 | "--output-json", str(eval_json), | ||
| 283 | ], check=True) | ||
| 284 | |||
| 285 | config = { | ||
| 286 | "model": {"embed_dim": 192, "channels": 512, "n_mels": 128, "use_band_split": True}, | ||
| 287 | "data": {"source_dataset": dataset, "manifests_dir": str(manifests_dir), "query_duration": query_duration}, | ||
| 288 | "run": { | ||
| 289 | "train_epochs": train_epochs, | ||
| 290 | "batch_size": batch_size, | ||
| 291 | }, | ||
| 292 | } | ||
| 293 | report_dir.mkdir(parents=True, exist_ok=True) | ||
| 294 | config_path.write_text(json.dumps(config, indent=2)) | ||
| 295 | |||
| 296 | subprocess.run([ | ||
| 297 | "/usr/local/miniconda3/bin/python", | ||
| 298 | "scripts/generate_artifacts.py", | ||
| 299 | "--eval-json", str(eval_json), | ||
| 300 | "--config-json", str(config_path), | ||
| 301 | "--output-dir", str(report_dir), | ||
| 302 | "--model-version", f"{dataset}-smoke", | ||
| 303 | "--data-version", f"{dataset}_local", | ||
| 304 | ], check=True) | ||
| 305 | |||
| 306 | return { | ||
| 307 | "dataset": dataset, | ||
| 308 | "inspect": inspect_summary, | ||
| 309 | "prepare": prepare_summary, | ||
| 310 | "validate": validate_summary, | ||
| 311 | "model_dir": str(model_dir), | ||
| 312 | "index_dir": str(index_dir), | ||
| 313 | "report_dir": str(report_dir), | ||
| 314 | "eval_json": str(eval_json), | ||
| 315 | } | ||
| 316 | |||
| 317 | |||
| 224 | def main(): | 318 | def main(): |
| 225 | parser = argparse.ArgumentParser() | 319 | parser = argparse.ArgumentParser() |
| 226 | sub = parser.add_subparsers(dest="cmd", required=True) | 320 | sub = parser.add_subparsers(dest="cmd", required=True) |
| ... | @@ -258,6 +352,16 @@ def main(): | ... | @@ -258,6 +352,16 @@ def main(): |
| 258 | p.add_argument("dataset", choices=sorted(ADAPTERS)) | 352 | p.add_argument("dataset", choices=sorted(ADAPTERS)) |
| 259 | p.add_argument("manifests_dir") | 353 | p.add_argument("manifests_dir") |
| 260 | 354 | ||
| 355 | p = sub.add_parser("smoke-local") | ||
| 356 | p.add_argument("dataset", choices=sorted(ADAPTERS)) | ||
| 357 | p.add_argument("input_dir") | ||
| 358 | p.add_argument("--output-root", default="data/external_smoke") | ||
| 359 | p.add_argument("--eval-ratio", type=float, default=0.2) | ||
| 360 | p.add_argument("--query-duration", type=float, default=8.0) | ||
| 361 | p.add_argument("--seed", type=int, default=42) | ||
| 362 | p.add_argument("--train-epochs", type=int, default=1) | ||
| 363 | p.add_argument("--batch-size", type=int, default=2) | ||
| 364 | |||
| 261 | args = parser.parse_args() | 365 | args = parser.parse_args() |
| 262 | if args.cmd == "registry": | 366 | if args.cmd == "registry": |
| 263 | path = write_registry(args.output) | 367 | path = write_registry(args.output) |
| ... | @@ -290,6 +394,18 @@ def main(): | ... | @@ -290,6 +394,18 @@ def main(): |
| 290 | elif args.cmd == "validate-local": | 394 | elif args.cmd == "validate-local": |
| 291 | summary = ADAPTERS[args.dataset].validate_local_manifests(Path(args.manifests_dir)) | 395 | summary = ADAPTERS[args.dataset].validate_local_manifests(Path(args.manifests_dir)) |
| 292 | print(json.dumps(summary, indent=2, ensure_ascii=False)) | 396 | print(json.dumps(summary, indent=2, ensure_ascii=False)) |
| 397 | elif args.cmd == "smoke-local": | ||
| 398 | summary = smoke_local_dataset( | ||
| 399 | dataset=args.dataset, | ||
| 400 | input_dir=Path(args.input_dir), | ||
| 401 | output_root=Path(args.output_root), | ||
| 402 | eval_ratio=args.eval_ratio, | ||
| 403 | query_duration=args.query_duration, | ||
| 404 | seed=args.seed, | ||
| 405 | train_epochs=args.train_epochs, | ||
| 406 | batch_size=args.batch_size, | ||
| 407 | ) | ||
| 408 | print(json.dumps(summary, indent=2, ensure_ascii=False)) | ||
| 293 | 409 | ||
| 294 | 410 | ||
| 295 | if __name__ == "__main__": | 411 | if __name__ == "__main__": | ... | ... |
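Note on the `subprocess.run` calls in `smoke_local_dataset`: each one hardcodes `/usr/local/miniconda3/bin/python`. A minimal sketch of one alternative, assuming the child scripts are meant to run under the same interpreter as the parent process, is to build each command from `sys.executable`. The helper name `build_step_command` and the paths below are hypothetical placeholders, not part of this commit.

```python
import sys
from pathlib import Path
from typing import List, Sequence


def build_step_command(script: str, args: Sequence[object]) -> List[str]:
    """Build the argv for one pipeline step (hypothetical helper, not in the commit)."""
    # sys.executable is the interpreter running this process, so child steps
    # inherit the same environment instead of a fixed miniconda path.
    return [sys.executable, script, *(str(a) for a in args)]


# Example: the training step, with placeholder paths (assumed, for illustration only).
manifests_dir = Path("data/external_smoke/fma")
model_dir = Path("data/external_smoke/fma_models_smoke")
train_cmd = build_step_command(
    "train.py",
    ["--data", manifests_dir, "--output", model_dir,
     "--device", "cpu", "--epochs", 1, "--batch-size", 2],
)
print(train_cmd[1:])
```

The resulting list can be passed to `subprocess.run(train_cmd, check=True)` in the same way as the literal lists in the diff.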
| ... | @@ -115,6 +115,35 @@ | ... | @@ -115,6 +115,35 @@ |
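| 115 | - The open-data pipeline now does more than just run: it also produces basic release and reporting artifacts | 115 | - The open-data pipeline now does more than just run: it also produces basic release and reporting artifacts |
| 116 | - Next, once the input is swapped for a real local FMA / MTG-Jamendo directory, the same release flow can be reused directly | 116 | - Next, once the input is swapped for a real local FMA / MTG-Jamendo directory, the same release flow can be reused directly |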
| 117 | 117 | ||
| 118 | ### Stage: One-command open-dataset smoke | ||
| 119 | |||
| 120 | Completed: | ||
| 121 | - Extended `src/data/external_adapters.py` | ||
| 122 | - Added `smoke-local` | ||
| 123 | - A single command automatically runs: | ||
| 124 | - inspect-local | ||
| 125 | - prepare-local | ||
| 126 | - validate-local | ||
| 127 | - train | ||
| 128 | - build-index | ||
| 129 | - evaluate | ||
| 130 | - generate_artifacts | ||
| 131 | |||
| 132 | Verification results: | ||
| 133 | - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/synthetic_v2/songs --output-root data/external_smoke --eval-ratio 0.2 --query-duration 5.0 --train-epochs 1 --batch-size 2` succeeded | ||
| 134 | - Current results: | ||
| 135 | - `num_audio_files=24` | ||
| 136 | - `catalog=24` | ||
| 137 | - `train_queries=16` | ||
| 138 | - `test_queries=8` | ||
| 139 | - `top1=1.0` | ||
| 140 | - `topk=1.0` | ||
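| 141 | - Artifact directory: `data/external_smoke/fma_reports_smoke` | ||
| 142 | |||
| 143 | Conclusion: | ||
| 144 | - Only `input_dir` now needs to change to run the full smoke against a real local FMA / MTG-Jamendo directory | ||
| 145 | - This significantly lowers the cost of onboarding and validating real open datasets | ||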
| 146 | |||
| 118 | ### Stage: confused targeted optimization v6 (sample-level weighting) | 147 | ### Stage: confused targeted optimization v6 (sample-level weighting) |
| 119 | 148 | |||
| 120 | Completed: | 149 | Completed: | ... | ... |
| ... | @@ -11,6 +11,7 @@ | ... | @@ -11,6 +11,7 @@ |
| 11 | 3. **validate-local** | 11 | 3. **validate-local** |
| 12 | 4. Then proceed to training and evaluation | 12 | 4. Then proceed to training and evaluation |
| 13 | 5. Generate benchmark / model card / release artifacts | 13 | 5. Generate benchmark / model card / release artifacts |
| 14 | 6. Or use the one-command `smoke-local` directly | ||
| 14 | 15 | ||
| 15 | --- | 16 | --- |
| 16 | 17 | ||
| ... | @@ -38,6 +39,7 @@ flowchart LR | ... | @@ -38,6 +39,7 @@ flowchart LR |
| 38 | | Pre-training validation | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `validate-local ...` | Confirm the structure is correct | | 39 | | Pre-training validation | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `validate-local ...` | Confirm the structure is correct | |
| 39 | | Training smoke | [`train.py`](../acr-engine/train.py) `--data ... --dry-run` | Verify the manifests can go straight into training | | 40 | | Training smoke | [`train.py`](../acr-engine/train.py) `--data ... --dry-run` | Verify the manifests can go straight into training | |
| 40 | | Release artifacts | [`scripts/generate_artifacts.py`](../acr-engine/scripts/generate_artifacts.py) | Generate benchmark/model-card/release-checklist | | 41 | | Release artifacts | [`scripts/generate_artifacts.py`](../acr-engine/scripts/generate_artifacts.py) | Generate benchmark/model-card/release-checklist | |
| 42 | | One-command smoke | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `smoke-local ...` | Automatically run the full pipeline | | ||
| 41 | 43 | ||
| 42 | --- | 44 | --- |
| 43 | 45 | ||
| ... | @@ -61,6 +63,12 @@ flowchart LR | ... | @@ -61,6 +63,12 @@ flowchart LR |
| 61 | /usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-batch fma=data/raw/fma_small_audio mtg_jamendo=data/raw/mtg_jamendo_audio --eval-ratio 0.2 --query-duration 8.0 | 63 | /usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-batch fma=data/raw/fma_small_audio mtg_jamendo=data/raw/mtg_jamendo_audio --eval-ratio 0.2 --query-duration 8.0 |
| 62 | ``` | 64 | ``` |
| 63 | 65 | ||
| 66 | ### 3.3 One-command smoke | ||
| 67 | |||
| 68 | ```bash | ||
| 69 | /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2 | ||
| 70 | ``` | ||
| 71 | |||
| 64 | --- | 72 | --- |
| 65 | 73 | ||
| 66 | ## 4. Output artifacts | 74 | ## 4. Output artifacts |
| ... | @@ -95,6 +103,8 @@ flowchart LR | ... | @@ -95,6 +103,8 @@ flowchart LR |
| 95 | - `benchmark-report.md` | 103 | - `benchmark-report.md` |
| 96 | - `model-card.md` | 104 | - `model-card.md` |
| 97 | - `release-checklist.md` | 105 | - `release-checklist.md` |
| 106 | - `smoke-local`: | ||
| 107 | - Returns the inspect / prepare / validate summaries and the report paths in a single result | ||
| 98 | 108 | ||
| 99 | --- | 109 | --- |
| 100 | 110 | ... | ... |