Industrial Benchmark Spec
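Updated: 2026-06-02
Goal
Establish a continuous benchmark for industrial-grade, commercially deployable ACR that tracks scenario-specific and risk-oriented metrics in addition to overall top1/top5.
Benchmark Dimensions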
1. Retrieval Quality
- top1
- top5
- MRR
- recall@k
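
The retrieval metrics above can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: the function name `retrieval_metrics` and its input layout (one ranked candidate list and one correct track ID per query) are assumptions. Under that single-ground-truth assumption, recall@k is the same number as the top-k hit rate.

```python
from typing import Dict, Optional, Sequence


def retrieval_metrics(
    ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int = 5
) -> Dict[str, float]:
    """Compute top1, recall@k, and MRR over a set of queries.

    ranked[i] is the candidate list for query i, best match first.
    truth[i] is the single correct track ID for query i (assumed layout).
    """
    if len(ranked) != len(truth) or len(truth) == 0:
        raise ValueError("ranked and truth must be non-empty and equally long")

    top1_hits = 0
    topk_hits = 0
    reciprocal_rank_sum = 0.0
    for candidates, target in zip(ranked, truth):
        # 1-based rank of the correct ID, or None if it was not retrieved.
        rank: Optional[int] = next(
            (i + 1 for i, c in enumerate(candidates) if c == target), None
        )
        if rank is None:
            continue  # a miss contributes 0 to every metric
        top1_hits += rank == 1
        topk_hits += rank <= k
        reciprocal_rank_sum += 1.0 / rank

    n = len(truth)
    return {
        "top1": top1_hits / n,
        "recall_at_k": topk_hits / n,
        "mrr": reciprocal_rank_sum / n,
    }
```

Calling it with `k=5` yields the top5 figure; a query whose correct track never appears in its candidate list counts as a miss for all three metrics.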
2. Scenario Buckets
- clean
- noisy
- compressed
- time-stretched
- pitch-shifted
- humming_like
- confused
- partial-overlap
- far-field / device-recorded
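
Reporting per scenario bucket, rather than only in aggregate, might look like the sketch below. The function name `top1_by_bucket` and the `(bucket, correct)` pair layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def top1_by_bucket(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group per-query outcomes by scenario bucket and return top1 per bucket.

    Each item is (bucket_name, top1_was_correct); this layout is an assumption.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for bucket, correct in results:
        totals[bucket] += 1
        hits[bucket] += bool(correct)
    # Only buckets that received at least one query appear in the output.
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}
```

A bucket with no queries is absent from the result rather than reported as 0, so a missing bucket can be told apart from a failing one.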
3. Catalog Scale Buckets
- 1K songs
- 10K songs
- 100K songs
- 1M+ songs
4. Operational Metrics
- p50 / p95 latency
- indexing throughput
- incremental update time
- memory / disk footprint
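
The p50 / p95 latency figures can be derived from raw per-query timings. The sketch below uses the nearest-rank percentile definition, which is one common choice and an assumption here; the spec does not say which percentile method is intended.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(latencies_ms: Sequence[float]) -> dict:
    """Summarize query latencies (milliseconds assumed) as p50 and p95."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
    }
```

Nearest-rank always returns an observed sample, so with few samples p95 is simply the slowest query or close to it.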
5. Business Safety Metrics
- false accept rate
- rejection quality
- near-duplicate confusion rate
- license provenance coverage
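
As a rough illustration of the false accept rate, the sketch below assumes each query carries a match confidence score and a flag saying whether its true track is in the catalog. A false accept is taken to be an out-of-catalog query whose score reaches the acceptance threshold. That definition, the function name `accept_reject_rates`, and the companion false reject rate are assumptions, not something this spec defines.

```python
from typing import Dict, Optional, Sequence


def accept_reject_rates(
    scores: Sequence[float], in_catalog: Sequence[bool], threshold: float
) -> Dict[str, Optional[float]]:
    """Return false accept and false reject rates at a given threshold.

    scores[i]      top match confidence for query i (assumed to exist)
    in_catalog[i]  True if the query's true track is present in the catalog
    """
    if len(scores) != len(in_catalog):
        raise ValueError("scores and in_catalog must be the same length")

    false_accepts = false_rejects = negatives = positives = 0
    for score, present in zip(scores, in_catalog):
        accepted = score >= threshold
        if present:
            positives += 1
            false_rejects += not accepted  # in catalog but rejected
        else:
            negatives += 1
            false_accepts += accepted  # not in catalog but accepted

    return {
        # None signals that the rate is undefined for lack of such queries.
        "false_accept_rate": false_accepts / negatives if negatives else None,
        "false_reject_rate": false_rejects / positives if positives else None,
    }
```

Measuring the false accept rate this way requires out-of-catalog queries in the test set; without them the rate is returned as `None` rather than 0.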
Required Artifacts per Model Release
- dataset registry snapshot
- training config snapshot
- benchmark report JSON
- benchmark summary markdown
- model card
- license review manifest
Minimum Go/No-Go Gate
- clean top1 >= 0.95
- noisy top1 >= 0.85
- confused top1 >= 0.70
- humming_like top1 >= 0.60
- top5 >= 0.95 on all production-relevant buckets
- false accept rate below the agreed threshold
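
The gate can be checked mechanically against a benchmark report. The sketch below is illustrative only: the report layout (`buckets` mapping to per-bucket `top1` / `top5`, plus a top-level `false_accept_rate`) is a hypothetical schema, and `max_false_accept` is left as a parameter because the spec does not fix the agreed threshold.

```python
from typing import Dict, Iterable, List

# Thresholds taken from the gate list above.
TOP1_GATES: Dict[str, float] = {
    "clean": 0.95,
    "noisy": 0.85,
    "confused": 0.70,
    "humming_like": 0.60,
}
TOP5_GATE = 0.95


def check_gate(
    report: dict, production_buckets: Iterable[str], max_false_accept: float
) -> List[str]:
    """Return a list of gate failures; an empty list means "go".

    `report` is assumed to look like:
    {"buckets": {"clean": {"top1": ..., "top5": ...}, ...},
     "false_accept_rate": ...}
    """
    failures: List[str] = []
    buckets = report.get("buckets", {})

    for name, minimum in TOP1_GATES.items():
        value = buckets.get(name, {}).get("top1")
        # A missing metric fails the gate instead of passing silently.
        if value is None or value < minimum:
            failures.append(f"{name} top1 {value} < {minimum}")

    for name in production_buckets:
        value = buckets.get(name, {}).get("top5")
        if value is None or value < TOP5_GATE:
            failures.append(f"{name} top5 {value} < {TOP5_GATE}")

    far = report.get("false_accept_rate")
    if far is None or far >= max_false_accept:
        failures.append(f"false_accept_rate {far} not below {max_false_accept}")

    return failures
```

Returning the full list of failures, instead of stopping at the first, lets a release report show every gate that blocked the release.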