Commit 713425f5 713425f5e84bb3663bc54013327568c9bbecc4b3 by cnb.bofCdSsphPA

Record the transition from waiting on real FMA bytes to running a real smoke train

Constraint: The user asked for continuous staged commits, and the real milestone is the pipeline crossing from download-gated to actual dataset execution.
Rejected: Waiting for the entire smoke pipeline to finish before checkpointing | The phase transition itself is significant and already verified.
Confidence: high
Scope-risk: narrow
Directive: Keep the smoke run going, then checkpoint again with concrete train/index/eval results once the real-data pipeline completes.
Tested: Verified the archive reached full expected size, confirmed local FMA readiness with 8000 audio files and 7994 eligible queries, and observed the real smoke pipeline enter epoch-1 training with 6381 classes.
Not-tested: The full smoke pipeline outcome (final training artifact, index, and evaluation metrics) is still in progress.
1 parent 948d325d
......@@ -2,6 +2,39 @@
## 2026-06-02
### Stage: Real FMA local data gate opened; smoke training started
Completed:
- Rechecked the archive download status and confirmed that `fma_small.zip` has reached its full byte count
- Verified that the local FMA audio directory is usable for a real smoke run
- Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
Verification results:
- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
  - `archive_size=7679594875`
  - `archive_progress_percent=100.0`
  - `num_audio_files=3025` (at the inspect stage)
- Recheck of the local extraction directory:
  - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
- `check-local-ready` / `inspect-local` returned:
  - `ready_for_smoke=true`
  - `num_audio_files=8000`
  - `eligible_query_files=7994`
  - `recommended_train_queries=6395`
  - `recommended_test_queries=1599`
- Real smoke run started:
  - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
- Live evidence from the training side:
  - `Device: cpu`
  - `Classes: 6381`
  - `Train songs: 6381`
  - `Epoch 1` has started
  - Total batches in the current epoch: `3191`
Conclusion:
- The real FMA data download gate is now officially open
- The project has moved from "waiting for real data" to "real-data smoke run in progress"
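
The reported numbers are consistent with simple arithmetic on the CLI arguments. The sketch below is only a sanity check, not the adapter's actual code: the helper names `split_queries` and `batches_per_epoch` are hypothetical, and both the rounding rule and the "keep the last partial batch" behaviour are assumptions that happen to reproduce the logged values.

```python
import math


def split_queries(eligible: int, eval_ratio: float) -> tuple[int, int]:
    """Return (train, test) counts, assuming the test share is rounded to the nearest integer."""
    test = round(eligible * eval_ratio)
    return eligible - test, test


def batches_per_epoch(num_items: int, batch_size: int) -> int:
    """Number of batches per epoch, assuming the last partial batch is kept."""
    return math.ceil(num_items / batch_size)


# Values taken from the log entry above.
train, test = split_queries(eligible=7994, eval_ratio=0.2)
print(train, test)                   # 6395 1599
print(batches_per_epoch(6381, 2))    # 3191
```

Under these assumptions, `--eval-ratio 0.2` applied to 7994 eligible queries gives the recommended 6395/1599 split, and 6381 training songs at `--batch-size 2` give 3191 batches per epoch.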
### Stage: Real FMA download passes 85 percent
Completed:
......@@ -1287,6 +1320,39 @@
## 2026-06-02
### Stage: Real FMA local data gate opened; smoke training started
Completed:
- Rechecked the archive download status and confirmed that `fma_small.zip` has reached its full byte count
- Verified that the local FMA audio directory is usable for a real smoke run
- Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
Verification results:
- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
  - `archive_size=7679594875`
  - `archive_progress_percent=100.0`
  - `num_audio_files=3025` (at the inspect stage)
- Recheck of the local extraction directory:
  - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
- `check-local-ready` / `inspect-local` returned:
  - `ready_for_smoke=true`
  - `num_audio_files=8000`
  - `eligible_query_files=7994`
  - `recommended_train_queries=6395`
  - `recommended_test_queries=1599`
- Real smoke run started:
  - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
- Live evidence from the training side:
  - `Device: cpu`
  - `Classes: 6381`
  - `Train songs: 6381`
  - `Epoch 1` has started
  - Total batches in the current epoch: `3191`
Conclusion:
- The real FMA data download gate is now officially open
- The project has moved from "waiting for real data" to "real-data smoke run in progress"
### Stage: Real FMA download passes 85 percent
Completed:
......@@ -1782,6 +1848,39 @@
## 2026-06-02
### Stage: Real FMA local data gate opened; smoke training started
Completed:
- Rechecked the archive download status and confirmed that `fma_small.zip` has reached its full byte count
- Verified that the local FMA audio directory is usable for a real smoke run
- Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
Verification results:
- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
  - `archive_size=7679594875`
  - `archive_progress_percent=100.0`
  - `num_audio_files=3025` (at the inspect stage)
- Recheck of the local extraction directory:
  - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
- `check-local-ready` / `inspect-local` returned:
  - `ready_for_smoke=true`
  - `num_audio_files=8000`
  - `eligible_query_files=7994`
  - `recommended_train_queries=6395`
  - `recommended_test_queries=1599`
- Real smoke run started:
  - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
- Live evidence from the training side:
  - `Device: cpu`
  - `Classes: 6381`
  - `Train songs: 6381`
  - `Epoch 1` has started
  - Total batches in the current epoch: `3191`
Conclusion:
- The real FMA data download gate is now officially open
- The project has moved from "waiting for real data" to "real-data smoke run in progress"
### Stage: Real FMA download passes 85 percent
Completed:
......@@ -2287,6 +2386,39 @@
## 2026-06-02
### Stage: Real FMA local data gate opened; smoke training started
Completed:
- Rechecked the archive download status and confirmed that `fma_small.zip` has reached its full byte count
- Verified that the local FMA audio directory is usable for a real smoke run
- Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
Verification results:
- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
  - `archive_size=7679594875`
  - `archive_progress_percent=100.0`
  - `num_audio_files=3025` (at the inspect stage)
- Recheck of the local extraction directory:
  - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
- `check-local-ready` / `inspect-local` returned:
  - `ready_for_smoke=true`
  - `num_audio_files=8000`
  - `eligible_query_files=7994`
  - `recommended_train_queries=6395`
  - `recommended_test_queries=1599`
- Real smoke run started:
  - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
- Live evidence from the training side:
  - `Device: cpu`
  - `Classes: 6381`
  - `Train songs: 6381`
  - `Epoch 1` has started
  - Total batches in the current epoch: `3191`
Conclusion:
- The real FMA data download gate is now officially open
- The project has moved from "waiting for real data" to "real-data smoke run in progress"
### Stage: Real FMA download passes 85 percent
Completed:
......@@ -2782,6 +2914,39 @@
## 2026-06-02
### Stage: Real FMA local data gate opened; smoke training started
Completed:
- Rechecked the archive download status and confirmed that `fma_small.zip` has reached its full byte count
- Verified that the local FMA audio directory is usable for a real smoke run
- Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
Verification results:
- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
  - `archive_size=7679594875`
  - `archive_progress_percent=100.0`
  - `num_audio_files=3025` (at the inspect stage)
- Recheck of the local extraction directory:
  - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
- `check-local-ready` / `inspect-local` returned:
  - `ready_for_smoke=true`
  - `num_audio_files=8000`
  - `eligible_query_files=7994`
  - `recommended_train_queries=6395`
  - `recommended_test_queries=1599`
- Real smoke run started:
  - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
- Live evidence from the training side:
  - `Device: cpu`
  - `Classes: 6381`
  - `Train songs: 6381`
  - `Epoch 1` has started
  - Total batches in the current epoch: `3191`
Conclusion:
- The real FMA data download gate is now officially open
- The project has moved from "waiting for real data" to "real-data smoke run in progress"
### Stage: Real FMA download passes 85 percent
Completed:
......@@ -3275,6 +3440,39 @@
## 2026-06-02
### Stage: Real FMA local data gate opened; smoke training started
Completed:
- Rechecked the archive download status and confirmed that `fma_small.zip` has reached its full byte count
- Verified that the local FMA audio directory is usable for a real smoke run
- Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
Verification results:
- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
  - `archive_size=7679594875`
  - `archive_progress_percent=100.0`
  - `num_audio_files=3025` (at the inspect stage)
- Recheck of the local extraction directory:
  - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
- `check-local-ready` / `inspect-local` returned:
  - `ready_for_smoke=true`
  - `num_audio_files=8000`
  - `eligible_query_files=7994`
  - `recommended_train_queries=6395`
  - `recommended_test_queries=1599`
- Real smoke run started:
  - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
- Live evidence from the training side:
  - `Device: cpu`
  - `Classes: 6381`
  - `Train songs: 6381`
  - `Epoch 1` has started
  - Total batches in the current epoch: `3191`
Conclusion:
- The real FMA data download gate is now officially open
- The project has moved from "waiting for real data" to "real-data smoke run in progress"
### Stage: Real FMA download passes 85 percent
Completed:
......@@ -3766,6 +3964,39 @@
## 2026-06-02
### Stage: Real FMA local data gate opened; smoke training started
Completed:
- Rechecked the archive download status and confirmed that `fma_small.zip` has reached its full byte count
- Verified that the local FMA audio directory is usable for a real smoke run
- Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
Verification results:
- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
  - `archive_size=7679594875`
  - `archive_progress_percent=100.0`
  - `num_audio_files=3025` (at the inspect stage)
- Recheck of the local extraction directory:
  - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
- `check-local-ready` / `inspect-local` returned:
  - `ready_for_smoke=true`
  - `num_audio_files=8000`
  - `eligible_query_files=7994`
  - `recommended_train_queries=6395`
  - `recommended_test_queries=1599`
- Real smoke run started:
  - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
- Live evidence from the training side:
  - `Device: cpu`
  - `Classes: 6381`
  - `Train songs: 6381`
  - `Epoch 1` has started
  - Total batches in the current epoch: `3191`
Conclusion:
- The real FMA data download gate is now officially open
- The project has moved from "waiting for real data" to "real-data smoke run in progress"
### Stage: Real FMA download passes 85 percent
Completed:
......@@ -4262,6 +4493,39 @@
## 2026-06-02
### Stage: Real FMA local data gate opened; smoke training started
Completed:
- Rechecked the archive download status and confirmed that `fma_small.zip` has reached its full byte count
- Verified that the local FMA audio directory is usable for a real smoke run
- Launched the real FMA `smoke-local` run directly, entering the main train/index/eval pipeline
Verification results:
- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` returned:
  - `archive_size=7679594875`
  - `archive_progress_percent=100.0`
  - `num_audio_files=3025` (at the inspect stage)
- Recheck of the local extraction directory:
  - `find data/raw/fma_small_audio ... | wc -l` returned `5827`
- `check-local-ready` / `inspect-local` returned:
  - `ready_for_smoke=true`
  - `num_audio_files=8000`
  - `eligible_query_files=7994`
  - `recommended_train_queries=6395`
  - `recommended_test_queries=1599`
- Real smoke run started:
  - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2`
- Live evidence from the training side:
  - `Device: cpu`
  - `Classes: 6381`
  - `Train songs: 6381`
  - `Epoch 1` has started
  - Total batches in the current epoch: `3191`
Conclusion:
- The real FMA data download gate is now officially open
- The project has moved from "waiting for real data" to "real-data smoke run in progress"
### Stage: Real FMA download passes 85 percent
Completed:
......