Make retrieval fusion tuning reproducible for fast evaluation

Constraint: Need fresh, like-for-like evidence on stable v6 assets before changing defaults
Rejected: More training-weight tuning | v7 and v8 regressed hard-case and overall accuracy
Confidence: high
Scope-risk: narrow
Directive: Use open datasets as separate train/eval assets and tune fusion on held-out eval manifests before retraining
Tested:
  /usr/local/miniconda3/bin/python -m py_compile evaluate.py
  /usr/local/miniconda3/bin/python evaluate.py --data data/synthetic_v2 --model data/models_v6/best_model.pt --index-prefix data/index_v6/reference --split test --device cpu --fast-eval
  /usr/local/miniconda3/bin/python evaluate.py --data data/synthetic_v2 --model data/models_v6/best_model.pt --index-prefix data/index_v6/reference --split test --device cpu --fast-eval --chroma-weight 0.2 --ecapa-weight 0.55 --melody-weight 0.25 --output-json reports/smoke-v6/synthetic_v2/eval-fusion-tuned.json
Not-tested: Full melody-enabled sweep across multiple weight grids and real external datasets
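
For readers unfamiliar with fusion weights, here is a minimal sketch of weighted late fusion over three score channels. It is an illustration only, not the code in evaluate.py: the function name `fuse_scores`, the dictionary layout, and the sample scores are all hypothetical, and the assumption that the three flags act as linear weights on per-candidate similarity scores is not confirmed by this commit. Only the weight values (0.2, 0.55, 0.25) are taken from the tested command above.

```python
def fuse_scores(scores, weights):
    """Return one fused score per candidate as a weighted sum of channels.

    scores:  dict mapping channel name -> list of per-candidate scores
    weights: dict mapping channel name -> non-negative weight
    Weights are renormalised to sum to 1 (an assumption of this sketch).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_candidates = len(next(iter(scores.values())))
    fused = [0.0] * n_candidates
    for name, weight in weights.items():
        channel = scores[name]
        if len(channel) != n_candidates:
            raise ValueError(f"channel {name!r} has a different length")
        for i, score in enumerate(channel):
            fused[i] += (weight / total) * score
    return fused


# Weight values from the tested command; the scores are made-up sample data.
weights = {"chroma": 0.2, "ecapa": 0.55, "melody": 0.25}
scores = {
    "chroma": [0.9, 0.1, 0.4],
    "ecapa": [0.2, 0.8, 0.5],
    "melody": [0.3, 0.6, 0.7],
}

fused = fuse_scores(scores, weights)
# Candidate indices ordered from best to worst fused score.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print(ranking)
```

Under these assumptions, tuning on a held-out eval manifest amounts to rerunning the ranking with different weight triples and comparing the resulting accuracy, with the model and index left unchanged.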

Showing 5 changed files with 137 additions and 1 deletion