Prevent empty local dataset folders from masquerading as smoke-ready

Constraint: Real-data validation now depends on user-requested local corpus drop zones that may exist before they contain any audio
Rejected: Let smoke-local fail deep inside training | Produces slower and less actionable feedback for continuous sessions
Confidence: high
Scope-risk: narrow
Directive: Keep readiness thresholds aligned with the minimum viable query split assumptions before expanding real-data automation
Tested:
  /usr/local/miniconda3/bin/python -m py_compile src/data/external_adapters.py scripts/status_snapshot.py
  /usr/local/miniconda3/bin/python src/data/external_adapters.py check-local-ready fma data/raw/fma_small_audio --eval-ratio 0.2 --query-duration 8.0
  /usr/local/miniconda3/bin/python src/data/external_adapters.py check-local-ready mtg_jamendo data/raw/mtg_jamendo_audio --eval-ratio 0.2 --query-duration 8.0
  /usr/local/miniconda3/bin/python scripts/status_snapshot.py --output .omx/latest_status_snapshot.json
Not-tested: Full smoke-local on real FMA or MTG-Jamendo remains blocked until audio is actually downloaded
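
For illustration only, here is a minimal sketch of the kind of readiness gate this commit describes. It is not the code in src/data/external_adapters.py. The function names, the extension list, and the threshold rule (enough tracks that both sides of an eval split are non-empty) are all assumptions, and it leaves out --query-duration because checking clip length would require decoding audio.

```python
import math
from pathlib import Path

# Assumed set of audio extensions; the real adapter may accept others.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}


def count_audio_files(root: Path) -> int:
    """Count audio files under root; a missing or empty folder yields 0."""
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )


def min_tracks_for_split(eval_ratio: float) -> int:
    """Hypothetical threshold: smallest track count for which an
    eval_ratio split leaves at least one track on each side."""
    if not 0.0 < eval_ratio < 1.0:
        raise ValueError("eval_ratio must be strictly between 0 and 1")
    return max(2, math.ceil(1.0 / eval_ratio))


def check_local_ready(root, eval_ratio: float = 0.2) -> dict:
    """Report whether a local dataset folder holds enough audio to try a smoke run."""
    found = count_audio_files(Path(root))
    needed = min_tracks_for_split(eval_ratio)
    return {"ready": found >= needed, "found": found, "needed": needed}


if __name__ == "__main__":
    # An existing but empty drop zone reports ready=False instead of
    # failing later inside training.
    print(check_local_ready("data/raw/fma_small_audio", eval_ratio=0.2))
```

Failing at this gate with a found/needed count is the faster, more actionable feedback that the Rejected trailer contrasts with a failure deep inside training.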
Showing 4 changed files with 197 additions and 21 deletions