1. 02 Jun, 2026 · 15 commits
    • Constraint: Personal-use experimentation needs a single entrypoint from local open-audio directories to train/eval manifests
      Rejected: Separate manual manifest generation per dataset | Too error-prone and slows iterative training/evaluation
      Confidence: high
      Scope-risk: narrow
      Directive: Point real FMA or MTG-Jamendo local download folders at prepare-local before expanding training runs
      Tested: /usr/local/miniconda3/bin/python -m py_compile src/data/external_adapters.py src/data/manifest_tools.py; /usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma tmp/open_music_demo --output-root data/external_ingested/demo_via_adapter --eval-ratio 0.5 --query-duration 5.0
      Not-tested: Full upstream corpus import and large-scale training
      cnb.bofCdSsphPA authored
    • Constraint: Personal-use workflow needs real train/eval manifests rather than bootstrap-only placeholders
      Rejected: Keep external datasets as catalog skeletons only | Does not satisfy training/evaluation reuse requirement
      Confidence: high
      Scope-risk: narrow
      Directive: Wire real FMA or MTG-Jamendo local download directories into this ingestion path before larger-scale training
      Tested: /usr/local/miniconda3/bin/python -m py_compile src/data/manifest_tools.py; /usr/local/miniconda3/bin/python src/data/manifest_tools.py audio-dir-to-splits tmp/open_music_demo data/external_ingested/demo_fma_like --source-dataset demo_fma_like --eval-ratio 0.5 --query-duration 5.0
      Not-tested: Full download/import of upstream FMA or MTG-Jamendo corpora
      cnb.bofCdSsphPA authored
    • Constraint: Need fresh, like-for-like evidence on stable v6 assets before changing defaults
      Rejected: More training-weight tuning | v7 and v8 regressed hard-case and overall accuracy
      Confidence: high
      Scope-risk: narrow
      Directive: Use open datasets as separate train/eval assets and tune fusion on held-out eval manifests before retraining
      Tested: /usr/local/miniconda3/bin/python -m py_compile evaluate.py; /usr/local/miniconda3/bin/python evaluate.py --data data/synthetic_v2 --model data/models_v6/best_model.pt --index-prefix data/index_v6/reference --split test --device cpu --fast-eval; /usr/local/miniconda3/bin/python evaluate.py --data data/synthetic_v2 --model data/models_v6/best_model.pt --index-prefix data/index_v6/reference --split test --device cpu --fast-eval --chroma-weight 0.2 --ecapa-weight 0.55 --melody-weight 0.25 --output-json reports/smoke-v6/synthetic_v2/eval-fusion-tuned.json
      Not-tested: Full melody-enabled sweep across multiple weight grids and real external datasets
      cnb.bofCdSsphPA authored
    • Constraint: Must preserve runnable pipeline and record stage evidence before continuing optimization
      Rejected: More naive oversampling | Regressed overall and hard-case accuracy in smoke-v4
      Confidence: medium
      Scope-risk: moderate
      Directive: Treat confused and humming_like as separate optimization lanes in future stages
      Tested: /usr/local/miniconda3/bin/python train.py --data data/synthetic_v2 --output data/models_v6 --device cpu --epochs 1 --batch-size 6 --dry-run; /usr/local/miniconda3/bin/python -m py_compile train.py src/models/losses.py src/data/dataset.py; /usr/local/miniconda3/bin/python train.py --data data/synthetic_v2 --output data/models_v6 --device cpu --epochs 2 --batch-size 6; /usr/local/miniconda3/bin/python run_demo.py build-index --data data/synthetic_v2 --model data/models_v6/best_model.pt --output data/index_v6 --device cpu; /usr/local/miniconda3/bin/python evaluate.py --data data/synthetic_v2 --model data/models_v6/best_model.pt --index-prefix data/index_v6/reference --split test --device cpu --fast-eval --output-json reports/smoke-v6/synthetic_v2/eval.json; /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/smoke-v6/synthetic_v2/eval.json --config-json reports/smoke-v6/synthetic_v2/config.json --output-dir reports/smoke-v6/synthetic_v2 --model-version smoke-v6 --data-version synthetic_v2
      Not-tested: Real external dataset training run and GPU-scale convergence
      cnb.bofCdSsphPA authored
    • Broaden external dataset bootstrap support and replace naive hard-case oversampling with a more targeted weighting signal that measurably helps humming-like queries while preserving the release/eval workflow.
      
      Constraint: Hard-case optimization must be evidence-driven and preserve a record of mixed outcomes across iterations
      Rejected: Reuse naive oversampling after regression | it already showed worse overall behavior with no hard-case gain
      Confidence: medium
      Scope-risk: moderate
      Directive: Next iteration should target confused-case negatives explicitly; do not assume humming gains transfer to confusion robustness
      Tested: bootstrap generation for MTG-Jamendo and ModelScope placeholders; 2-epoch CPU training for models_v5; index_v5 build; fast eval JSON generation for smoke-v5
      Not-tested: real audio ingestion for the new datasets; full melody-aware slow evaluation on models_v5
      cnb.bofCdSsphPA authored
    • Extend the data ingress path with bootstrap manifests for real datasets and capture an unsuccessful hard-case oversampling experiment so future iterations can avoid repeating the same weak strategy.
      
      Constraint: Continuous optimization requires preserving negative results, not just successful ones
      Rejected: Drop the oversampling attempt without record | would lose evidence and encourage redoing the same low-yield change
      Confidence: high
      Scope-risk: moderate
      Directive: Next hard-case work should focus on melody-aware supervision and harder negatives instead of naive sample repetition
      Tested: bootstrap manifest generation for FMA and CCMusic; 2-epoch CPU training for models_v4; index_v4 build; fast eval JSON generation for smoke-v4
      Not-tested: whitelisted real audio ingestion beyond placeholder manifests; full melody-aware slow evaluation on models_v4
      cnb.bofCdSsphPA authored
    • Make the benchmark pipeline produce reusable release artifacts from actual evaluation results so model iterations can be tracked, reviewed, and shipped with evidence.
      
      Constraint: Continuous training only helps if each stage emits durable reports and release metadata
      Rejected: Keep artifact generation as a disconnected smoke utility | would block repeatable release discipline
      Confidence: high
      Scope-risk: moderate
      Directive: Next iterations should improve hard-case metrics on real/whitelisted datasets and keep artifact generation on every training milestone
      Tested: synthetic_v2 data regeneration; 2-epoch CPU training; index build; fast evaluation JSON export; artifact generation to reports/smoke-v2/synthetic_v2
      Not-tested: full melody-aware slow evaluation as release default; real external dataset benchmark generation
      cnb.bofCdSsphPA authored
    • Turn the docs set into a layered documentation portal with navigation, source tracing, and reusable governance templates so the project can scale beyond ad hoc notes.
      
      Constraint: Industrialization requires documentation that supports decisions, traceability, and release discipline
      Rejected: Keep docs as isolated topical files without navigation or templates | would slow onboarding and weaken release governance
      Confidence: high
      Scope-risk: narrow
      Directive: Keep future docs in the executive-summary -> diagram -> table -> text -> appendix pattern with explicit Sources sections
      Tested: structural checks for core docs and templates; source-section checks; docs file-presence checks; service /config and /health smoke checks from earlier stage remain valid
      Not-tested: rendered markdown visuals in a browser; external publishing pipeline
      cnb.bofCdSsphPA authored
    • Prepare the prototype for industrial evolution by adding a service surface, external manifest conversion tools, and dataset adapter scaffolding with explicit licensing checkpoints.
      
      Constraint: Commercialization requires auditable data ingress and callable service boundaries, not just offline notebooks
      Rejected: Delay service and data-ingest work until after model perfection | would block end-to-end productization and ops readiness
      Confidence: medium
      Scope-risk: moderate
      Directive: Next stages should connect real whitelisted datasets, benchmark latency, and improve hard-case acceptance/rejection quality
      Tested: dataset adapter registry/describe/init commands; manifest csv-to-catalog; service health; service build_index; service recognize; train.py --dry-run
      Not-tested: live uvicorn deployment; external dataset downloads; ANN-backed production indexing
      cnb.bofCdSsphPA authored
    • add src · 31a72045
      cnb.bofCdSsphPA authored
    • Shift the prototype toward music-retrieval behavior by documenting dataset contracts, upgrading the frontend to 128-bin Mel plus band splitting, and adding retrieval evaluation plus harder confusion-oriented augmentation.
      
      Constraint: The previous pipeline mixed train splits with the searchable catalog and hid real retrieval quality
      Rejected: Keep classification-centric validation and whole-song averaged references | it masked structural accuracy failures
      Confidence: medium
      Scope-risk: moderate
      Directive: Next iterations should target humming/confused top1 with specialized melody-aware retrieval and stronger real-data calibration
      Tested: synthetic_v2 generation; 3-epoch CPU training; index build; evaluate.py top1=0.65 top5=0.95 on test split
      Not-tested: external open-dataset ingestion; foundation-model baselines; production latency
      cnb.bofCdSsphPA authored
    • period upload · 62688d3b
      cnb.bofCdSsphPA authored
    • Add missing project documentation and a minimal executable demo flow so the repository can be understood and validated end to end.
      
      Constraint: The existing repo had design fragments but no verified runnable path
      Rejected: Delay documentation until after full productization | would keep scope opaque and slow iteration
      Confidence: medium
      Scope-risk: moderate
      Directive: Keep future stages checkpointed with changelog entries and runnable verification commands
      Tested: synthetic dataset generation; train.py --dry-run; 1 epoch CPU training; index build; recognition JSON output
      Not-tested: production-scale retrieval; real copyrighted audio; API serving
      cnb.bofCdSsphPA authored
    • add codex · e25a16be
      cnb.bofCdSsphPA authored
    • cnb.bofCdSsphPA authored
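
The fusion-tuning commit above runs evaluate.py with --chroma-weight 0.2 --ecapa-weight 0.55 --melody-weight 0.25. The log does not show how evaluate.py combines the three branches, so the following is only a minimal sketch of one plausible reading: a normalized weighted sum of per-branch similarity scores. The names fuse_scores and rank_top_k, and the dictionary layout of the scores, are hypothetical and do not come from the repository.

```python
from typing import Dict, List

# Weights copied from the tuned evaluate.py invocation in the log.
WEIGHTS: Dict[str, float] = {"chroma": 0.2, "ecapa": 0.55, "melody": 0.25}


def fuse_scores(
    branch_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = WEIGHTS,
) -> Dict[str, float]:
    """Combine per-branch similarities into one score per track.

    branch_scores maps branch name -> {track_id: similarity}.
    Assumption: late fusion by weighted sum, with the weights
    normalized so that they sum to 1.
    """
    total = sum(weights.values())
    fused: Dict[str, float] = {}
    for branch, weight in weights.items():
        for track_id, score in branch_scores.get(branch, {}).items():
            fused[track_id] = fused.get(track_id, 0.0) + (weight / total) * score
    return fused


def rank_top_k(fused: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k track ids with the highest fused score."""
    return sorted(fused, key=lambda track_id: fused[track_id], reverse=True)[:k]


if __name__ == "__main__":
    # Toy similarities for two reference tracks (made-up values).
    scores = {
        "chroma": {"track_a": 1.0, "track_b": 0.0},
        "ecapa": {"track_a": 0.0, "track_b": 1.0},
        "melody": {"track_a": 1.0, "track_b": 0.0},
    }
    print(rank_top_k(fuse_scores(scores), k=2))
```

Under this reading, a track that wins only on the ECAPA branch still outranks one that wins on both chroma and melody, because 0.55 exceeds 0.2 + 0.25. That is one way to see why the Directive asks for tuning on held-out eval manifests before retraining.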