
Industrial Benchmark Spec

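Updated: 2026-06-02

Goal

Establish a continuous benchmark for industrial-grade, commercially deployable ACR. The benchmark tracks not only overall top1/top5 but also scenario-specific and risk-oriented metrics.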

Benchmark Dimensions

1. Retrieval Quality

  • top1
  • top5
  • MRR
  • recall@k
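
The retrieval metrics above can be sketched as follows. This is a minimal illustration, assuming each query has exactly one correct song ID and the system returns a ranked candidate list; the function name `retrieval_metrics` and its input layout are hypothetical, not part of this spec. Under the single-ground-truth assumption, recall@k coincides with top-k accuracy.

```python
from typing import Dict, Sequence


def retrieval_metrics(
    ranked: Sequence[Sequence[str]],  # ranked[i]: candidate song IDs for query i, best first
    truth: Sequence[str],             # truth[i]: the single correct song ID for query i
    k: int = 5,
) -> Dict[str, float]:
    """Compute top1, recall@k, and MRR (hypothetical helper, single ground truth per query)."""
    n = len(truth)
    if n == 0:
        raise ValueError("need at least one query")
    top1_hits = 0
    topk_hits = 0
    reciprocal_rank_sum = 0.0
    for candidates, gold in zip(ranked, truth):
        candidates = list(candidates)
        if candidates and candidates[0] == gold:
            top1_hits += 1
        if gold in candidates[:k]:
            topk_hits += 1
        if gold in candidates:
            # Rank is 1-based; a query whose gold ID is missing contributes 0.
            reciprocal_rank_sum += 1.0 / (candidates.index(gold) + 1)
    return {
        "top1": top1_hits / n,
        f"recall@{k}": topk_hits / n,
        "mrr": reciprocal_rank_sum / n,
    }


metrics = retrieval_metrics(
    ranked=[["a", "b"], ["x", "b", "c"], ["q", "r"]],
    truth=["a", "b", "z"],
)
```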

2. Scenario Buckets

  • clean
  • noisy
  • compressed
  • time-stretched
  • pitch-shifted
  • humming_like
  • confused
  • partial-overlap
  • far-field / device-recorded
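
Reporting per scenario bucket amounts to grouping query results by bucket label before averaging. A minimal sketch, assuming each evaluated query carries one bucket label and a boolean top1 outcome (the `top1_by_bucket` helper and the tuple layout are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def top1_by_bucket(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """results: (bucket_label, top1_correct) pairs, one per evaluated query."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for bucket, correct in results:
        totals[bucket] += 1
        hits[bucket] += int(correct)
    # Only buckets that received at least one query appear in the output.
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


per_bucket = top1_by_bucket(
    [("clean", True), ("clean", True), ("noisy", True), ("noisy", False)]
)
```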

3. Catalog Scale Buckets

  • 1K songs
  • 10K songs
  • 100K songs
  • 1M+ songs

4. Operational Metrics

  • p50 / p95 latency
  • indexing throughput
  • incremental update time
  • memory / disk footprint

5. Business Safety Metrics

  • false accept rate
  • rejection quality
  • near-duplicate confusion rate
  • license provenance coverage
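
The spec does not define these metrics formally. As one possible reading, the sketch below treats a false accept as an out-of-catalog query whose best match score still clears the acceptance threshold, and pairs it with a false reject rate for in-catalog queries as a rough proxy for rejection quality. The `Decision` type, the field names, and both definitions are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Decision:
    in_catalog: bool  # assumed label: does the query's true song exist in the catalog?
    score: float      # confidence of the best match returned by the system


def accept_reject_rates(decisions: Iterable[Decision], threshold: float) -> Tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) under the assumed definitions."""
    decisions = list(decisions)
    out_of_catalog = [d for d in decisions if not d.in_catalog]
    in_catalog = [d for d in decisions if d.in_catalog]
    # False accept: nothing should match, yet the score clears the threshold.
    far = (
        sum(d.score >= threshold for d in out_of_catalog) / len(out_of_catalog)
        if out_of_catalog else 0.0
    )
    # False reject: a match exists, yet the score falls below the threshold.
    frr = (
        sum(d.score < threshold for d in in_catalog) / len(in_catalog)
        if in_catalog else 0.0
    )
    return far, frr


far, frr = accept_reject_rates(
    [
        Decision(False, 0.9), Decision(False, 0.2),
        Decision(False, 0.4), Decision(False, 0.1),
        Decision(True, 0.8), Decision(True, 0.3),
    ],
    threshold=0.5,
)
```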

Required Artifacts per Model Release

  • dataset registry snapshot
  • training config snapshot
  • benchmark report JSON
  • benchmark summary markdown
  • model card
  • license review manifest

Minimum Go/No-Go Gate

  • clean top1 >= 0.95
  • noisy top1 >= 0.85
  • confused top1 >= 0.70
  • humming_like top1 >= 0.60
  • top5 >= 0.95 on all production-relevant buckets
  • false accept rate below the agreed threshold
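
The gate above could be checked automatically against the benchmark report. The following sketch assumes a report shaped as `{bucket: {"top1": ..., "top5": ...}}` and takes the set of production-relevant buckets and the false accept threshold as inputs, since the spec leaves both to be agreed elsewhere. The `check_gate` function and the report layout are hypothetical.

```python
from typing import Dict, Iterable, List

# Thresholds copied from the gate list in this spec.
TOP1_GATES = {"clean": 0.95, "noisy": 0.85, "confused": 0.70, "humming_like": 0.60}
TOP5_GATE = 0.95


def check_gate(
    report: Dict[str, Dict[str, float]],
    production_buckets: Iterable[str],  # assumption: supplied by the release owner
    false_accept_rate: float,
    far_threshold: float,               # assumption: the "agreed threshold"
) -> List[str]:
    """Return a list of gate failures; an empty list means Go."""
    failures: List[str] = []
    for bucket, minimum in TOP1_GATES.items():
        value = report.get(bucket, {}).get("top1")
        if value is None or value < minimum:
            failures.append(f"{bucket} top1 {value} < {minimum}")
    for bucket in production_buckets:
        value = report.get(bucket, {}).get("top5")
        if value is None or value < TOP5_GATE:
            failures.append(f"{bucket} top5 {value} < {TOP5_GATE}")
    # The gate requires the rate to be strictly below the threshold.
    if false_accept_rate >= far_threshold:
        failures.append(f"false accept rate {false_accept_rate} >= {far_threshold}")
    return failures


# Made-up numbers, used only to exercise the helper.
example_report = {
    "clean": {"top1": 0.97, "top5": 0.99},
    "noisy": {"top1": 0.80, "top5": 0.96},
    "confused": {"top1": 0.75, "top5": 0.95},
    "humming_like": {"top1": 0.65, "top5": 0.90},
}
gate_failures = check_gate(
    example_report,
    production_buckets=["clean", "noisy"],
    false_accept_rate=0.001,
    far_threshold=0.01,
)
```

Missing buckets are treated as failures rather than skipped, so a report that omits a gated bucket cannot pass by accident.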