Commit 713425f5 713425f5e84bb3663bc54013327568c9bbecc4b3 by cnb.bofCdSsphPA

Record the transition from waiting on real FMA bytes to running a real smoke train

Constraint: The user asked for continuous staged commits, and the real milestone is the pipeline crossing from download-gated to actual dataset execution.
Rejected: Waiting for the entire smoke pipeline to finish before checkpointing | The phase transition itself is significant and already verified.
Confidence: high
Scope-risk: narrow
Directive: Keep the smoke run going, then checkpoint again with concrete train/index/eval results once the real-data pipeline completes.
Tested: Verified the archive reached full expected size, confirmed local FMA readiness with 8000 audio files and 7994 eligible queries, and observed the real smoke pipeline enter epoch-1 training with 6381 classes.
Not-tested: The full smoke pipeline outcome (final training artifact, index, and evaluation metrics) is still in progress.
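
The counts in this commit can be cross-checked with simple arithmetic. The sketch below is illustrative only: `split_counts` and `batches_per_epoch` are hypothetical helpers, not functions from this repository, and the rounding rule (round the test share to the nearest integer, keep a final partial batch) is an assumption that happens to reproduce the logged values for `--eval-ratio 0.2` and `--batch-size 2`.

```python
import math


def split_counts(eligible: int, eval_ratio: float) -> tuple[int, int]:
    """Return (train, test) counts for a given eval ratio.

    Assumption: the test share is rounded to the nearest integer;
    the real adapter may use a different rule.
    """
    test = round(eligible * eval_ratio)
    return eligible - test, test


def batches_per_epoch(num_items: int, batch_size: int) -> int:
    """Number of batches when the final partial batch is kept (assumed)."""
    return math.ceil(num_items / batch_size)


# Values taken from the log entries in this commit.
train, test = split_counts(7994, 0.2)
print(train, test)                 # compare with recommended_train/test_queries
print(batches_per_epoch(6381, 2))  # compare with the logged batch total
```

Under these assumptions the helpers give 6395/1599 for the split and 3191 batches, matching the logged `recommended_train_queries`, `recommended_test_queries`, and per-epoch batch total.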
1 parent 948d325d
...@@ -2,6 +2,39 @@ ...@@ -2,6 +2,39 @@
2 2
3 ## 2026-06-02 3 ## 2026-06-02
4 4
5 ### Stage: Real FMA local data gate opened; smoke training started
6
7 Completed:
8 - Rechecked the archive download status and confirmed `fma_small.zip` has reached its full byte count
9 - Verified that the local FMA audio directory is usable for a real smoke run
10 - Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
11
12 Verification results:
13 - `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
14 - `archive_size=7679594875`
15 - `archive_progress_percent=100.0`
16 - `num_audio_files=3025` (at the inspect stage)
17 - Recheck of the local extraction directory:
18 - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
19 - `check-local-ready` / `inspect-local` returned:
20 - `ready_for_smoke=true`
21 - `num_audio_files=8000`
22 - `eligible_query_files=7994`
23 - `recommended_train_queries=6395`
24 - `recommended_test_queries=1599`
25 - Real smoke run launched:
26 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
27 - Live evidence from the training side:
28 - `Device: cpu`
29 - `Classes: 6381`
30 - `Train songs: 6381`
31 - `Epoch 1` has started
32 - Total batches in the current epoch: `3191`
33
34 Conclusion:
35 - The real FMA data download gate is now officially open
36 - The project has moved from "waiting for real data" to "real-data smoke run in progress"
37
5 ### Stage: Real FMA download passes 85% 38 ### Stage: Real FMA download passes 85%
6 39
7 Completed: 40 Completed:
...@@ -1287,6 +1320,39 @@ ...@@ -1287,6 +1320,39 @@
1287 1320
1288 ## 2026-06-02 1321 ## 2026-06-02
1289 1322
1323 ### Stage: Real FMA local data gate opened; smoke training started
1324
1325 Completed:
1326 - Rechecked the archive download status and confirmed `fma_small.zip` has reached its full byte count
1327 - Verified that the local FMA audio directory is usable for a real smoke run
1328 - Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
1329
1330 Verification results:
1331 - `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
1332 - `archive_size=7679594875`
1333 - `archive_progress_percent=100.0`
1334 - `num_audio_files=3025` (at the inspect stage)
1335 - Recheck of the local extraction directory:
1336 - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
1337 - `check-local-ready` / `inspect-local` returned:
1338 - `ready_for_smoke=true`
1339 - `num_audio_files=8000`
1340 - `eligible_query_files=7994`
1341 - `recommended_train_queries=6395`
1342 - `recommended_test_queries=1599`
1343 - Real smoke run launched:
1344 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
1345 - Live evidence from the training side:
1346 - `Device: cpu`
1347 - `Classes: 6381`
1348 - `Train songs: 6381`
1349 - `Epoch 1` has started
1350 - Total batches in the current epoch: `3191`
1351
1352 Conclusion:
1353 - The real FMA data download gate is now officially open
1354 - The project has moved from "waiting for real data" to "real-data smoke run in progress"
1355
1290 ### Stage: Real FMA download passes 85% 1356 ### Stage: Real FMA download passes 85%
1291 1357
1292 Completed: 1358 Completed:
...@@ -1782,6 +1848,39 @@ ...@@ -1782,6 +1848,39 @@
1782 1848
1783 ## 2026-06-02 1849 ## 2026-06-02
1784 1850
1851 ### Stage: Real FMA local data gate opened; smoke training started
1852
1853 Completed:
1854 - Rechecked the archive download status and confirmed `fma_small.zip` has reached its full byte count
1855 - Verified that the local FMA audio directory is usable for a real smoke run
1856 - Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
1857
1858 Verification results:
1859 - `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
1860 - `archive_size=7679594875`
1861 - `archive_progress_percent=100.0`
1862 - `num_audio_files=3025` (at the inspect stage)
1863 - Recheck of the local extraction directory:
1864 - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
1865 - `check-local-ready` / `inspect-local` returned:
1866 - `ready_for_smoke=true`
1867 - `num_audio_files=8000`
1868 - `eligible_query_files=7994`
1869 - `recommended_train_queries=6395`
1870 - `recommended_test_queries=1599`
1871 - Real smoke run launched:
1872 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
1873 - Live evidence from the training side:
1874 - `Device: cpu`
1875 - `Classes: 6381`
1876 - `Train songs: 6381`
1877 - `Epoch 1` has started
1878 - Total batches in the current epoch: `3191`
1879
1880 Conclusion:
1881 - The real FMA data download gate is now officially open
1882 - The project has moved from "waiting for real data" to "real-data smoke run in progress"
1883
1785 ### Stage: Real FMA download passes 85% 1884 ### Stage: Real FMA download passes 85%
1786 1885
1787 Completed: 1886 Completed:
...@@ -2287,6 +2386,39 @@ ...@@ -2287,6 +2386,39 @@
2287 2386
2288 ## 2026-06-02 2387 ## 2026-06-02
2289 2388
2389 ### Stage: Real FMA local data gate opened; smoke training started
2390
2391 Completed:
2392 - Rechecked the archive download status and confirmed `fma_small.zip` has reached its full byte count
2393 - Verified that the local FMA audio directory is usable for a real smoke run
2394 - Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
2395
2396 Verification results:
2397 - `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
2398 - `archive_size=7679594875`
2399 - `archive_progress_percent=100.0`
2400 - `num_audio_files=3025` (at the inspect stage)
2401 - Recheck of the local extraction directory:
2402 - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
2403 - `check-local-ready` / `inspect-local` returned:
2404 - `ready_for_smoke=true`
2405 - `num_audio_files=8000`
2406 - `eligible_query_files=7994`
2407 - `recommended_train_queries=6395`
2408 - `recommended_test_queries=1599`
2409 - Real smoke run launched:
2410 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
2411 - Live evidence from the training side:
2412 - `Device: cpu`
2413 - `Classes: 6381`
2414 - `Train songs: 6381`
2415 - `Epoch 1` has started
2416 - Total batches in the current epoch: `3191`
2417
2418 Conclusion:
2419 - The real FMA data download gate is now officially open
2420 - The project has moved from "waiting for real data" to "real-data smoke run in progress"
2421
2290 ### Stage: Real FMA download passes 85% 2422 ### Stage: Real FMA download passes 85%
2291 2423
2292 Completed: 2424 Completed:
...@@ -2782,6 +2914,39 @@ ...@@ -2782,6 +2914,39 @@
2782 2914
2783 ## 2026-06-02 2915 ## 2026-06-02
2784 2916
2917 ### Stage: Real FMA local data gate opened; smoke training started
2918
2919 Completed:
2920 - Rechecked the archive download status and confirmed `fma_small.zip` has reached its full byte count
2921 - Verified that the local FMA audio directory is usable for a real smoke run
2922 - Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
2923
2924 Verification results:
2925 - `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
2926 - `archive_size=7679594875`
2927 - `archive_progress_percent=100.0`
2928 - `num_audio_files=3025` (at the inspect stage)
2929 - Recheck of the local extraction directory:
2930 - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
2931 - `check-local-ready` / `inspect-local` returned:
2932 - `ready_for_smoke=true`
2933 - `num_audio_files=8000`
2934 - `eligible_query_files=7994`
2935 - `recommended_train_queries=6395`
2936 - `recommended_test_queries=1599`
2937 - Real smoke run launched:
2938 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
2939 - Live evidence from the training side:
2940 - `Device: cpu`
2941 - `Classes: 6381`
2942 - `Train songs: 6381`
2943 - `Epoch 1` has started
2944 - Total batches in the current epoch: `3191`
2945
2946 Conclusion:
2947 - The real FMA data download gate is now officially open
2948 - The project has moved from "waiting for real data" to "real-data smoke run in progress"
2949
2785 ### Stage: Real FMA download passes 85% 2950 ### Stage: Real FMA download passes 85%
2786 2951
2787 Completed: 2952 Completed:
...@@ -3275,6 +3440,39 @@ ...@@ -3275,6 +3440,39 @@
3275 3440
3276 ## 2026-06-02 3441 ## 2026-06-02
3277 3442
3443 ### Stage: Real FMA local data gate opened; smoke training started
3444
3445 Completed:
3446 - Rechecked the archive download status and confirmed `fma_small.zip` has reached its full byte count
3447 - Verified that the local FMA audio directory is usable for a real smoke run
3448 - Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
3449
3450 Verification results:
3451 - `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
3452 - `archive_size=7679594875`
3453 - `archive_progress_percent=100.0`
3454 - `num_audio_files=3025` (at the inspect stage)
3455 - Recheck of the local extraction directory:
3456 - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
3457 - `check-local-ready` / `inspect-local` returned:
3458 - `ready_for_smoke=true`
3459 - `num_audio_files=8000`
3460 - `eligible_query_files=7994`
3461 - `recommended_train_queries=6395`
3462 - `recommended_test_queries=1599`
3463 - Real smoke run launched:
3464 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
3465 - Live evidence from the training side:
3466 - `Device: cpu`
3467 - `Classes: 6381`
3468 - `Train songs: 6381`
3469 - `Epoch 1` has started
3470 - Total batches in the current epoch: `3191`
3471
3472 Conclusion:
3473 - The real FMA data download gate is now officially open
3474 - The project has moved from "waiting for real data" to "real-data smoke run in progress"
3475
3278 ### Stage: Real FMA download passes 85% 3476 ### Stage: Real FMA download passes 85%
3279 3477
3280 Completed: 3478 Completed:
...@@ -3766,6 +3964,39 @@ ...@@ -3766,6 +3964,39 @@
3766 3964
3767 ## 2026-06-02 3965 ## 2026-06-02
3768 3966
3967 ### Stage: Real FMA local data gate opened; smoke training started
3968
3969 Completed:
3970 - Rechecked the archive download status and confirmed `fma_small.zip` has reached its full byte count
3971 - Verified that the local FMA audio directory is usable for a real smoke run
3972 - Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
3973
3974 Verification results:
3975 - `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
3976 - `archive_size=7679594875`
3977 - `archive_progress_percent=100.0`
3978 - `num_audio_files=3025` (at the inspect stage)
3979 - Recheck of the local extraction directory:
3980 - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
3981 - `check-local-ready` / `inspect-local` returned:
3982 - `ready_for_smoke=true`
3983 - `num_audio_files=8000`
3984 - `eligible_query_files=7994`
3985 - `recommended_train_queries=6395`
3986 - `recommended_test_queries=1599`
3987 - Real smoke run launched:
3988 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
3989 - Live evidence from the training side:
3990 - `Device: cpu`
3991 - `Classes: 6381`
3992 - `Train songs: 6381`
3993 - `Epoch 1` has started
3994 - Total batches in the current epoch: `3191`
3995
3996 Conclusion:
3997 - The real FMA data download gate is now officially open
3998 - The project has moved from "waiting for real data" to "real-data smoke run in progress"
3999
3769 ### Stage: Real FMA download passes 85% 4000 ### Stage: Real FMA download passes 85%
3770 4001
3771 Completed: 4002 Completed:
...@@ -4262,6 +4493,39 @@ ...@@ -4262,6 +4493,39 @@
4262 4493
4263 ## 2026-06-02 4494 ## 2026-06-02
4264 4495
4496 ### Stage: Real FMA local data gate opened; smoke training started
4497
4498 Completed:
4499 - Rechecked the archive download status and confirmed `fma_small.zip` has reached its full byte count
4500 - Verified that the local FMA audio directory is usable for a real smoke run
4501 - Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
4502
4503 Verification results:
4504 - `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
4505 - `archive_size=7679594875`
4506 - `archive_progress_percent=100.0`
4507 - `num_audio_files=3025` (at the inspect stage)
4508 - Recheck of the local extraction directory:
4509 - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
4510 - `check-local-ready` / `inspect-local` returned:
4511 - `ready_for_smoke=true`
4512 - `num_audio_files=8000`
4513 - `eligible_query_files=7994`
4514 - `recommended_train_queries=6395`
4515 - `recommended_test_queries=1599`
4516 - Real smoke run launched:
4517 - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
4518 - Live evidence from the training side:
4519 - `Device: cpu`
4520 - `Classes: 6381`
4521 - `Train songs: 6381`
4522 - `Epoch 1` has started
4523 - Total batches in the current epoch: `3191`
4524
4525 Conclusion:
4526 - The real FMA data download gate is now officially open
4527 - The project has moved from "waiting for real data" to "real-data smoke run in progress"
4528
4265 ### Stage: Real FMA download passes 85% 4529 ### Stage: Real FMA download passes 85%
4266 4530
4267 Completed: 4531 Completed:
......