02 Jun, 2026 (40 commits)
    • Constraint: The running cap48 seed=999 benchmark has not emitted its final report yet, so only in-flight evidence can be recorded safely
      Rejected: Claim a new three-seed conclusion now | The aggregate would be speculative without report.json and eval outputs
      Confidence: high
      Scope-risk: narrow
      Directive: When a long benchmark is still active, checkpoint stage evidence explicitly and wait for report.json before changing strategy guidance
      Tested: Verified process tree shows hybrid moved from build-index to evaluate.py; verified reference_progress.json reports 48 refs, 491 windows, 192-d embeddings, and complete status; verified report.json is still absent
      Not-tested: Final hybrid eval metrics, subsequent high_energy run, and final three-seed aggregate
      cnb.bofCdSsphPA authored
    • Constraint: The cap48 seed=999 benchmark is still running, so this checkpoint must avoid unverified algorithm conclusions
      Rejected: Wait for the CPU benchmark to finish | Would delay handoff and leave the next session without a clean restart package
      Confidence: high
      Scope-risk: narrow
      Directive: Keep future doc-only checkpoints surgically staged and do not add data/raw, external_smoke, /tmp outputs, or model artifacts
      Tested: Verified staged diff only includes AGENT memory, handoff, changelog, and changelist docs; confirmed /tmp cap48 seed=999 report is not ready yet
      Not-tested: The in-flight cap48 seed=999 benchmark result and any follow-up aggregate metrics
      cnb.bofCdSsphPA authored
    • Persist the current two-seed cap48 summary so the strategy recommendation is grounded in aggregated evidence rather than whichever single run happened most recently.
      
      Constraint: Only documentation changes are allowed because benchmark artifacts remain outside version control
      Rejected: Keep narrating cap48 one run at a time | The aggregate is now more informative than any individual cap48 run
      Confidence: high
      Scope-risk: narrow
      Directive: Prefer reporting aggregate seed statistics once two or more runs exist; avoid re-elevating single-seed claims above the aggregate
      Tested: Verified both cap48 report.json files; computed aggregate mean/min/max/stdev; verified docs now record high_energy mean_top1=0.9167 and hybrid mean_top1=0.8750
      Not-tested: Aggregates beyond two seeds or style-bucketed aggregates
      cnb.bofCdSsphPA authored
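The seed aggregation described in the commit above (mean/min/max/stdev of top1 per strategy) can be sketched in a few lines. This is a minimal sketch, assuming each seed's top1 scores have already been pulled out of its report.json into a plain dict; the `aggregate_top1` helper and the dict layout are hypothetical, not the project's code. The input values are the per-seed cap48 top1 scores recorded in the commits below (first cap48 run and seed123).

```python
import statistics


def aggregate_top1(per_seed: dict) -> dict:
    """Aggregate top1 per strategy across seeds.

    per_seed maps a seed label to {strategy: top1}. This layout is a
    hypothetical stand-in for whatever report.json actually contains.
    """
    by_strategy: dict = {}
    for scores in per_seed.values():
        for strategy, top1 in scores.items():
            by_strategy.setdefault(strategy, []).append(top1)

    summary = {}
    for strategy, values in by_strategy.items():
        summary[strategy] = {
            "n_seeds": len(values),
            "mean_top1": statistics.mean(values),
            "min_top1": min(values),
            "max_top1": max(values),
            # Sample standard deviation is undefined for a single seed.
            "stdev_top1": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


# Hypothetical seed labels; scores taken from the cap48 commits.
cap48 = {
    "first_run": {"hybrid": 0.7917, "high_energy": 0.9167},
    "seed123": {"hybrid": 0.9583, "high_energy": 0.9167},
}
summary = aggregate_top1(cap48)
for strategy, stats in summary.items():
    print(strategy, round(stats["mean_top1"], 4), round(stats["stdev_top1"], 4))
```

With only two seeds the stdev is a rough spread indicator rather than a reliable variance estimate, which is consistent with the Not-tested note about aggregates beyond two seeds.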
    • Persist the completed seed123 benchmark showing hybrid ahead again, and update the strategy guidance from single-run winner claims to a multi-seed interpretation.
      
      Constraint: Only documentation changes are allowed because benchmark outputs remain outside version control
      Rejected: Keep framing cap48 as a stable high_energy win | The second seed materially weakens that interpretation
      Confidence: high
      Scope-risk: narrow
      Directive: Base the hybrid vs high_energy default decision on aggregated multi-seed evidence, not any single cap48 run
      Tested: Verified /tmp/ab_smoke_seg_cap48_top2_seed123/report.json; verified high_energy eval.json; verified docs now record hybrid=24/0.9583/1.0 and high_energy=24/0.9167/1.0 for seed123
      Not-tested: Formal aggregation across multiple seeds beyond these two cap48 runs
      cnb.bofCdSsphPA authored
    • Persist the newly finished cap48 seed123 hybrid result so the second-seed validation run now has measured evidence instead of only a runtime checkpoint.
      
      Constraint: seed123 high_energy and the final report are still pending
      Rejected: Wait for the full seed123 report before updating docs | Would leave the multi-seed evidence stale across sessions
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the seed123 partial section with the final two-strategy ranking once high_energy eval and report.json land
      Tested: Verified /tmp/ab_smoke_seg_cap48_top2_seed123/hybrid/fma_reports_smoke/eval.json; verified docs record hybrid=24/0.9583/1.0 and high_energy still in build-index
      Not-tested: Final seed123 comparison because high_energy has not finished yet
      cnb.bofCdSsphPA authored
    • Update the handoff and changelog with the newer seed123 runtime milestone so later sessions know the hybrid lane has advanced from build-index into capped evaluation.
      
      Constraint: No measured seed123 score is available yet, only a later execution milestone
      Rejected: Leave the older build-index note in place | Would make the restart handoff stale and less actionable
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the seed123 runtime note with measured scores as soon as hybrid eval.json or report.json land
      Tested: Verified active seed123 hybrid evaluate.py process; verified docs now record seed123 current phase as evaluate.py --max-queries 24
      Not-tested: Seed123 strategy scores because hybrid eval.json has not landed yet
      cnb.bofCdSsphPA authored
    • Preserve the second-seed cap48 entry point and current build-index phase so later sessions can validate whether the cap48 reversal was stable or a seed artifact.
      
      Constraint: The second-seed run has not produced scores yet, so only execution-state evidence is available
      Rejected: Wait for the seed123 scores before recording anything | Risks losing the multi-seed validation checkpoint if the session ends first
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the seed123 running-state section with measured scores once hybrid eval.json or report.json land
      Tested: Verified active cap48 seed123 processes; verified handoff records work-root, seed, subset size, query cap, and current build-index phase
      Not-tested: cap48 seed123 strategy scores because the run is still in progress
      cnb.bofCdSsphPA authored
    • Persist the larger 48-track benchmark where high_energy overtook hybrid, and downgrade the previously overconfident default-strategy claim to a conditional recommendation pending broader validation.
      
      Constraint: Only documentation changes are allowed because benchmark outputs remain outside version control
      Rejected: Keep asserting hybrid as fully settled default after cap48 | The 48-track capped benchmark materially contradicts that stronger claim
      Confidence: high
      Scope-risk: narrow
      Directive: Resolve the hybrid vs high_energy default question with larger, multi-seed, style-aware benchmarks before making a final hard default claim
      Tested: Verified /tmp/ab_smoke_seg_cap48_top2/report.json; verified high_energy eval.json; verified docs now record high_energy=24/0.9167/1.0 and hybrid=24/0.7917/1.0
      Not-tested: Multi-seed or style-balanced follow-up benchmark beyond the single cap48 run
      cnb.bofCdSsphPA authored
    • Update the handoff and changelog with the newer cap48 runtime milestone so later sessions know the high_energy lane has advanced from build-index into capped evaluation.
      
      Constraint: No measured cap48 high_energy score is available yet, only a later execution milestone
      Rejected: Leave the older build-index note in place | Would make the restart handoff stale and less actionable
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the cap48 runtime note with final top-two scores as soon as high_energy eval.json or report.json lands
      Tested: Verified active cap48 high_energy evaluate.py process; verified docs now record high_energy current phase as evaluate.py --max-queries 24
      Not-tested: Final cap48 comparison because high_energy eval.json has not landed yet
      cnb.bofCdSsphPA authored
    • Persist the newly finished cap48 hybrid result so the next session can continue the 48-track validation run from measured evidence instead of only a runtime checkpoint.
      
      Constraint: cap48 high_energy and the final report are still pending
      Rejected: Wait for the full cap48 report before updating docs | Would leave the largest current real-data checkpoint stale across sessions
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the cap48 partial section with the final two-strategy ranking once high_energy eval and report.json land
      Tested: Verified /tmp/ab_smoke_seg_cap48_top2/hybrid/fma_reports_smoke/eval.json; verified docs record hybrid=24/0.7917/1.0 and high_energy still in build-index
      Not-tested: Final cap48 comparison because high_energy has not finished yet
      cnb.bofCdSsphPA authored
    • Update the handoff and changelog with the newer cap48 runtime milestone so later sessions know the run has advanced from build-index into capped evaluation.
      
      Constraint: No measured cap48 score is available yet, only a later execution milestone
      Rejected: Leave the older build-index note in place | Would make the restart handoff stale and less actionable
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the cap48 runtime note with hybrid scores as soon as eval.json lands
      Tested: Verified active cap48 evaluate.py process; verified docs now record cap48 current phase as evaluate.py --max-queries 24
      Not-tested: cap48 strategy scores because hybrid eval.json has not landed yet
      cnb.bofCdSsphPA authored
    • Preserve the new 48-track top-two benchmark entry point and current build-index phase so later sessions can continue the expanding validation ladder without rediscovering runtime state.
      
      Constraint: cap48 has not produced scores yet, so only execution-state evidence is available
      Rejected: Wait for cap48 scores before recording anything | Risks losing the larger-benchmark checkpoint if the session ends first
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the cap48 running-state section with measured scores once hybrid eval.json or report.json land
      Tested: Verified active cap48 processes; verified handoff records work-root, subset size, query cap, and current build-index phase
      Not-tested: cap48 strategy scores because the run is still in progress
      cnb.bofCdSsphPA authored
    • Persist the larger 32-track benchmark showing hybrid strongly outperforming high_energy, so the default strategy decision rests on multiple larger real-data checkpoints instead of a single subset.
      
      Constraint: Only documentation changes are allowed because benchmark artifacts stay outside version control
      Rejected: Keep the default recommendation tentative after cap32 | The 24-track and 32-track capped benchmarks now agree on hybrid superiority
      Confidence: high
      Scope-risk: narrow
      Directive: Use cap24 and cap32 together as the current strongest strategy evidence until a broader multi-style benchmark supersedes them
      Tested: Verified /tmp/ab_smoke_seg_cap32_top2/report.json; verified high_energy eval.json; verified docs now record hybrid=20/0.95/1.0 and high_energy=20/0.5/1.0
      Not-tested: Wider style-balanced benchmark beyond the FMA top-two subsets
      cnb.bofCdSsphPA authored
    • Persist the newly finished cap32 hybrid result so the next session can continue the top-two validation run from measured evidence instead of only a running-state checkpoint.
      
      Constraint: cap32 high_energy and the final report are still pending
      Rejected: Wait for the full cap32 report before updating docs | Would leave the larger-subset evidence stale across sessions
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the cap32 partial section with the final two-strategy ranking once high_energy eval and report.json land
      Tested: Verified /tmp/ab_smoke_seg_cap32_top2/hybrid/fma_reports_smoke/eval.json; verified docs record hybrid=20/0.95/1.0 and high_energy still training
      Not-tested: Final cap32 comparison because high_energy has not finished yet
      cnb.bofCdSsphPA authored
    • Preserve the new 32-track top-two benchmark entry point and current build-index phase so a later session can continue the stronger validation run without losing runtime context.
      
      Constraint: The cap32 benchmark is still running, so only execution-state evidence is available
      Rejected: Wait for cap32 results before recording anything | Risks losing the larger-benchmark checkpoint if the session ends first
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the cap32 running-state section with measured scores once hybrid eval.json and report.json land
      Tested: Verified active cap32 processes; verified handoff records work-root, subset size, query cap, and current build-index phase
      Not-tested: cap32 strategy scores because the run is still in progress
      cnb.bofCdSsphPA authored
    • Persist the larger real-FMA benchmark result showing hybrid clearly outperforming high_energy, so the project recommendation can converge on one default instead of an unresolved tie.
      
      Constraint: Only docs change because benchmark outputs remain outside version control
      Rejected: Keep treating hybrid and high_energy as co-equal defaults | The larger 24-track capped benchmark now separates them clearly
      Confidence: high
      Scope-risk: narrow
      Directive: Use cap24 top-two as the current strongest public evidence until a larger capped benchmark supersedes it
      Tested: Verified /tmp/ab_smoke_seg_cap24_top2/report.json; verified high_energy eval.json; verified docs now state hybrid=16/1.0/1.0 and high_energy=16/0.8125/1.0
      Not-tested: Broader strategy comparison beyond hybrid vs high_energy on the 24-track subset
      cnb.bofCdSsphPA authored
    • Record the new 24-track capped benchmark setup and the first completed hybrid result so the next session can continue the stronger tie-break experiment without rediscovering runtime state.
      
      Constraint: The cap24 benchmark is still in progress, so only partial evidence can be documented now
      Rejected: Wait for high_energy to finish before updating handoff | Risks losing the fresh larger-subset evidence if the session ends first
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the partial cap24 section with the final two-strategy ranking once report.json lands
      Tested: Verified /tmp/ab_smoke_seg_cap24_top2/hybrid/fma_reports_smoke/eval.json; verified active cap24 processes; verified docs include the exact work-root and resume command
      Not-tested: Final cap24 top-two comparison because high_energy is still training
      cnb.bofCdSsphPA authored
    • Persist the completed capped real-data benchmark results so future sessions can use the final strategy ordering and recommendation without replaying the run.
      
      Constraint: Only documentation should change because benchmark artifacts live outside version control
      Rejected: Leave the result only in /tmp report files | Would make the evidence fragile across sessions
      Confidence: high
      Scope-risk: narrow
      Directive: Use cap16 as the current default evidence point until a larger capped benchmark supersedes it
      Tested: Verified /tmp/ab_smoke_seg_cap16/report.json; verified repeated_section_aware eval.json; verified docs reflect final ranking hybrid/high_energy/beat_aware/repeated_section_aware
      Not-tested: Larger real-dataset benchmark beyond the 16-track capped subset
      cnb.bofCdSsphPA authored
    • Update the handoff and changelog with the newly finished capped FMA high_energy result so the next session starts from current evidence instead of stale partials.
      
      Constraint: The benchmark is still running overall and only some strategies have completed
      Rejected: Wait for repeated_section_aware to finish before updating handoff | Risks another stale restart gap
      Confidence: high
      Scope-risk: narrow
      Directive: Replace the partial cap16 table with the final ranking once repeated_section_aware and report.json land
      Tested: Verified /tmp/ab_smoke_seg_cap16/high_energy/fma_reports_smoke/eval.json; verified docs now record high_energy=12/1.0/1.0
      Not-tested: Final cap16 multi-strategy report because repeated_section_aware is still in progress
      cnb.bofCdSsphPA authored
    • Record the latest delivered benchmark evidence, active work-root, partial results, and exact resume commands so a new session can continue without rediscovering context.
      
      Constraint: User requested immediate delivery artifacts before the long benchmark fully finishes
      Rejected: Wait for the entire cap16 benchmark to finish before handing off | Would delay delivery and risk losing resumable context
      Confidence: high
      Scope-risk: narrow
      Directive: Update the handoff again once high_energy and repeated_section_aware finish on cap16
      Tested: Verified partial eval files for hybrid and beat_aware; verified active cap16 benchmark processes; verified session-handoff contains resume commands and partial scores
      Not-tested: Final multi-strategy cap16 ranking because high_energy and repeated_section_aware are still running
      cnb.bofCdSsphPA authored
    • Clarify that the pipeline already mixes random sampling with librosa-guided candidate selection, while keeping heavier structural segmentation as a later optimization path.
      
      Constraint: Must avoid staging local datasets and transient smoke artifacts
      Rejected: Full librosa.segment.* default rollout | Too CPU-heavy, and it would skew the sampled query distribution too strongly for the current smoke/training stage
      Confidence: high
      Scope-risk: narrow
      Directive: Keep future segmentation comparisons capped by equal query budgets when reporting quality deltas
      Tested: py_compile for evaluate/external_adapters/ab_smoke_segmentation; evaluate.py --max-queries 5; ab_smoke_segmentation end-to-end smoke with max_test_queries=5
      Not-tested: Multi-strategy medium-size capped A/B benchmark on larger real FMA subset
      cnb.bofCdSsphPA authored
    • Constraint: Strategy comparisons need real-audio evidence, but the benchmark must stay cheap enough to run repeatedly on CPU during active development
      Rejected: Judge winners only by top1/topk on a tiny subset | ties hide the practical value of strategies that generate far more usable queries
      Confidence: medium
      Scope-risk: narrow
      Directive: Keep num_queries as a tie-breaker for tiny-smoke comparisons; increase subset size before promoting benchmark winners to default training policy
      Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/ab_smoke_segmentation.py --dataset fma --input-dir acr-engine/data/raw/fma_small_audio --work-root /tmp/ab_smoke_seg --subset-size 8 --query-duration 8 --train-epochs 1 --batch-size 2 --device cpu --output-json /tmp/ab_smoke_seg/report.json; post-run ranking verification from /tmp/ab_smoke_seg/report.json
      Not-tested: Larger FMA subsets or difficult internal query mixes in the same benchmark script
      cnb.bofCdSsphPA authored
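The ranking rule in the Directive above (accuracy first, num_queries only as a tie-breaker) can be sketched as a sort key. This is a minimal sketch under stated assumptions: the `StrategyResult` fields and `rank_strategies` helper are hypothetical, and the example numbers are illustrative placeholders, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class StrategyResult:
    name: str
    top1: float
    topk: float
    num_queries: int


def rank_strategies(results: list) -> list:
    """Order strategies by top1, then topk, then num_queries (tie-breaker only)."""
    # sorted() is stable, so fully tied strategies keep their input order.
    return sorted(results, key=lambda r: (r.top1, r.topk, r.num_queries), reverse=True)


# Illustrative placeholder values only.
results = [
    StrategyResult("strategy_a", top1=1.0, topk=1.0, num_queries=4),
    StrategyResult("strategy_b", top1=1.0, topk=1.0, num_queries=9),
    StrategyResult("strategy_c", top1=0.5, topk=1.0, num_queries=12),
]
ranking = [r.name for r in rank_strategies(results)]
print(ranking)
```

Because num_queries sits last in the key, a strategy that generates many queries can never outrank one with better top1; it only separates strategies that tie on accuracy, which is the tiny-smoke case the commit describes.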
    • Constraint: Music retrieval should sample repeated hook-like regions without adding heavyweight structure models or breaking the existing lightweight candidate stack
      Rejected: Reserve repeated-section logic for a later dedicated chorus detector | delays a practical chorus-like signal that can already improve query realism today
      Confidence: medium
      Scope-risk: moderate
      Directive: Treat repeated_section_aware as a lightweight chorus proxy; future chorus ranking should refine rather than discard these candidates
      Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/data/dataset.py acr-engine/src/data/manifest_tools.py acr-engine/train.py acr-engine/src/data/external_adapters.py; synthetic_v2 dry-run with --segment-strategy repeated_section_aware; handcrafted 24s repeated-motif fixture with repeated_section_aware and hybrid offset checks
      Not-tested: Full end-to-end metric impact on FMA/internal datasets with repeated_section_aware enabled
      cnb.bofCdSsphPA authored
    • Constraint: Music queries often begin near stable pulse locations, but beat tracking can fail on sparse or synthetic signals and must degrade safely
      Rejected: Depend on beat tracking alone for all rhythmic sampling | too brittle when beat extraction is weak or absent
      Confidence: high
      Scope-risk: moderate
      Directive: Keep beat_aware as a lightweight candidate generator with onset fallback; future chorus/repeated-section logic should compose with beat-aware rather than bypass it
      Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/data/dataset.py acr-engine/src/data/manifest_tools.py acr-engine/train.py acr-engine/src/data/external_adapters.py; synthetic_v2 dry-run with --segment-strategy beat_aware; handcrafted 20s pulse-track fixture with beat_aware and hybrid offset checks
      Not-tested: Full retraining/evaluation impact on open/internal datasets using beat_aware end-to-end
      cnb.bofCdSsphPA authored
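The fallback order described in the commit above (beats first, onsets when beat extraction is weak, then a plain random crop) can be sketched as follows. This assumes beat and onset times in seconds have already been extracted, for example with `librosa.beat.beat_track(..., units='time')` and `librosa.onset.onset_detect(..., units='time')`; the `pick_query_offset` name, the `min_beats` threshold, and the final random fallback are illustrative assumptions, not the project's implementation.

```python
import random


def pick_query_offset(
    beat_times: list,
    onset_times: list,
    audio_duration: float,
    query_duration: float,
    rng: random.Random,
    min_beats: int = 4,
) -> tuple:
    """Return (offset_seconds, source) for one query window."""
    latest_start = audio_duration - query_duration
    if latest_start < 0:
        # Audio shorter than one query: start at the beginning.
        return 0.0, "short_audio"

    def valid(times: list) -> list:
        # Keep only anchors whose full query window fits inside the audio.
        return [t for t in times if 0.0 <= t <= latest_start]

    beats = valid(beat_times)
    if len(beats) >= min_beats:
        return rng.choice(beats), "beat"
    onsets = valid(onset_times)
    if onsets:
        # Beat tracking too sparse or absent: fall back to onset anchors.
        return rng.choice(onsets), "onset"
    # No usable rhythmic anchors at all: degrade to a plain random crop.
    return rng.uniform(0.0, latest_start), "random"


rng = random.Random(0)
print(pick_query_offset([0.5, 1.0, 1.5, 2.0, 2.5], [], 20.0, 8.0, rng))
print(pick_query_offset([], [2.5, 30.0], 20.0, 8.0, rng))
```

Requiring a minimum beat count before trusting the beat grid is one simple way to express "degrade safely" on sparse or synthetic signals.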
    • Constraint: Music ACR queries should be closer to choruses, strong rhythmic sections, and attack regions without giving up the existing random and silence-aware fallbacks
      Rejected: Add only heavier beat/chorus modeling first | higher complexity and more brittle than lightweight energy/onset heuristics for the current training pipeline
      Confidence: high
      Scope-risk: moderate
      Directive: Keep high_energy/onset_aware as heuristic candidate generators; future beat/chorus logic should layer on top of them rather than replace the fallback stack
      Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/data/dataset.py acr-engine/src/data/manifest_tools.py acr-engine/train.py acr-engine/src/data/external_adapters.py; synthetic_v2 dry-run with --segment-strategy high_energy and onset_aware; handcrafted 20s audio fixture with high_energy/onset_aware query offset checks
      Not-tested: Full retraining/evaluation impact on FMA or internal production datasets
      cnb.bofCdSsphPA authored
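A high_energy candidate generator in the spirit of the commit above can be sketched by scoring each possible query window by its RMS energy. This pure-Python version is for readability only; the `high_energy_offsets` name, the hop size, and the `top_n` parameter are assumptions, and a real pipeline would more likely use numpy or `librosa.feature.rms`. onset_aware selection is not covered here.

```python
import math


def high_energy_offsets(
    samples: list,
    sample_rate: int,
    query_duration: float,
    hop_seconds: float = 1.0,
    top_n: int = 3,
) -> list:
    """Rank candidate query start times by the RMS energy of the window they cover."""
    win = int(query_duration * sample_rate)
    hop = max(1, int(hop_seconds * sample_rate))
    if win <= 0 or len(samples) < win:
        return [0.0]  # audio shorter than one query: fall back to the start
    scored = []
    for start in range(0, len(samples) - win + 1, hop):
        window = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in window) / win)
        scored.append((rms, start / sample_rate))
    # Stable sort: equal-energy windows keep their time order.
    scored.sort(key=lambda item: item[0], reverse=True)
    # Note: top candidates may overlap; callers can de-duplicate if needed.
    return [offset for _, offset in scored[:top_n]]


# Toy signal at 10 Hz: silence except a loud burst from 12 s to 16 s.
samples = [0.0] * 200
for i in range(120, 160):
    samples[i] = 1.0
offsets = high_energy_offsets(samples, sample_rate=10, query_duration=4.0)
print(offsets)
```

Keeping this as a candidate generator rather than the only sampler matches the Directive: random and silence-aware paths remain available as fallbacks.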
    • Constraint: smoke-local must recover long CPU index builds automatically, but partial embeddings from an older model must never contaminate a newly trained index
      Rejected: Always reuse any existing partial checkpoint | can silently blend embeddings from different model generations into one index
      Confidence: high
      Scope-risk: moderate
      Directive: Keep model-signature checks on all future index resume paths; auto-resume should fall back to clean rebuild on any signature mismatch
      Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/engines/ecapa_embedder.py acr-engine/src/data/external_adapters.py acr-engine/run_demo.py; same-model partial checkpoint resume vs fresh rebuild equality; mismatched-model checkpoint rejection and clean rebuild equality
      Not-tested: Reattaching the currently running real FMA smoke process after an external interruption
      cnb.bofCdSsphPA authored
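The resume rule in the commit above (reuse a partial index checkpoint only when it was built by the same model, otherwise rebuild cleanly) can be sketched as below. The SHA-256 fingerprint over config plus weight bytes, the checkpoint dict layout, and both helper names are hypothetical; they illustrate the guard, not the project's actual checkpoint format.

```python
import hashlib
import json
from typing import Optional


def model_signature(config: dict, weights: bytes) -> str:
    """Fingerprint a model from its config and raw weight bytes (hypothetical scheme)."""
    h = hashlib.sha256()
    h.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    h.update(weights)
    return h.hexdigest()


def plan_index_build(
    checkpoint: Optional[dict], current_signature: str, all_ids: list
) -> tuple:
    """Return (embeddings_to_keep, ids_still_to_embed)."""
    if checkpoint is None or checkpoint.get("model_signature") != current_signature:
        # Missing checkpoint or a different model generation:
        # never blend old embeddings into the new index.
        return {}, list(all_ids)
    done = checkpoint.get("embeddings", {})
    remaining = [track_id for track_id in all_ids if track_id not in done]
    return dict(done), remaining


sig_v1 = model_signature({"embedding_dim": 192}, b"weights-v1")
sig_v2 = model_signature({"embedding_dim": 192}, b"weights-v2")
ckpt = {"model_signature": sig_v1, "embeddings": {"t1": [0.1], "t2": [0.2]}}
print(plan_index_build(ckpt, sig_v1, ["t1", "t2", "t3"]))
print(plan_index_build(ckpt, sig_v2, ["t1", "t2", "t3"]))
```

Hashing the weights rather than trusting a checkpoint path or timestamp is what makes a silently retrained model trigger a clean rebuild.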
    • Constraint: Real FMA smoke indexing can run for a long time on CPU and synthetic/root-layout datasets must still use the same build-index entrypoint
      Rejected: Treat build-index as all-or-nothing and require full reruns after interruption | wastes hours on CPU and obscures whether work was already completed
      Confidence: high
      Scope-risk: moderate
      Directive: Preserve checkpoint file compatibility; future smoke-local automation should prefer resume before rebuilding from scratch
      Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/engines/ecapa_embedder.py acr-engine/src/engines/chromaprint_matcher.py acr-engine/run_demo.py; synthetic_v2 partial-checkpoint resume vs fresh rebuild equality check (shape/ids/embeddings/progress)
      Not-tested: In-place resumption of the currently running real FMA process after an actual external kill/restart
      cnb.bofCdSsphPA authored
    • Constraint: Real music queries often include long silence heads/tails, but the pipeline still needs random-crop generalization and simple CLI controls
      Rejected: Replace all random crops with structure-aware segmentation | would overfit to curated boundaries and diverge from messy real-world query distributions
      Confidence: high
      Scope-risk: moderate
      Directive: Keep random as fallback; layer beat/onset/chorus-aware segmentation on top instead of removing silence-aware and sliding paths
      Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/data/dataset.py acr-engine/src/data/manifest_tools.py acr-engine/train.py acr-engine/src/data/external_adapters.py; external_adapters.py prepare-local fma /tmp/segtest_audio --query-strategy silence_aware; train.py --data data/synthetic_v2 --dry-run --segment-strategy hybrid
      Not-tested: Full FMA smoke retraining/eval with the new segmentation strategies
      cnb.bofCdSsphPA authored
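The silence-aware path described in the commit above, with random cropping kept as the fallback, might look roughly like this. It assumes per-frame RMS values are already computed (for example with `librosa.feature.rms`); the `silence_aware_offset` name and the `silence_threshold` default are illustrative assumptions.

```python
import random


def silence_aware_offset(
    frame_rms: list,
    frame_seconds: float,
    query_duration: float,
    rng: random.Random,
    silence_threshold: float = 0.01,
) -> float:
    """Pick a random crop start inside the non-silent span of a track.

    frame_rms holds one RMS value per analysis frame of frame_seconds each.
    """
    total = len(frame_rms) * frame_seconds
    loud = [i for i, v in enumerate(frame_rms) if v > silence_threshold]
    if loud:
        # Trim silent head and tail frames.
        head = loud[0] * frame_seconds
        tail = (loud[-1] + 1) * frame_seconds
    else:
        head, tail = 0.0, total
    if tail - head >= query_duration:
        return rng.uniform(head, tail - query_duration)
    # Too little non-silent audio: plain random crop over the whole track.
    return rng.uniform(0.0, max(0.0, total - query_duration))


rng = random.Random(0)
rms = [0.0] * 5 + [0.5] * 10 + [0.0] * 5  # 5 s silent head and tail
print(silence_aware_offset(rms, 1.0, 8.0, rng))
```

The crop is still random within the trimmed span, which preserves the random-crop generalization the Constraint asks for.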
    • Constraint: Internal assets must support both manually labeled clips and whole-track auto-window generation without breaking pgvector export
      Rejected: Treat missing query duration as full audio duration | prevents multi-window query expansion for long source audio
      Confidence: high
      Scope-risk: narrow
      Directive: Keep explicit CSV offset authoritative; only auto-expand when offset is absent and query_stride is set
      Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/internal_asset_type_mapper.py; local 30s/40s WAV fixture export with manifest + pgvector verification
      Not-tested: End-to-end retraining with newly expanded internal manifests
      cnb.bofCdSsphPA authored
    • Constraint: Internal short-video and demo assets need explicit duration/offset semantics before they can behave like real training or pgvector segment records
      Rejected: Leave query offsets empty by default | Produces weaker provenance and less useful downstream segment metadata
      Confidence: high
      Scope-risk: narrow
      Directive: Prefer source CSV timing when available, then fall back to inspected audio duration and conservative default offsets
      Tested: Sample CSV run confirmed one query used CSV duration/offset (5.0/12.5) and another fell back to inspected duration/default offset (6.5/0.0), with pgvector segments matching
      Not-tested: Complex multi-segment offset generation from long-form internal masters
      cnb.bofCdSsphPA authored
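The timing precedence in the commit above (source CSV timing first, then inspected audio duration and a conservative default offset) reduces to a small resolver. The `resolve_query_timing` name and its parameters are hypothetical; the two example calls mirror the sample CSV run in the Tested line, with the first row's inspected duration (30.0) being an assumed placeholder.

```python
from typing import Optional


def resolve_query_timing(
    csv_duration: Optional[float],
    csv_offset: Optional[float],
    inspected_duration: float,
    default_offset: float = 0.0,
) -> tuple:
    """Return (duration, offset), preferring source CSV timing over fallbacks."""
    # Compare against None so an explicit 0.0 from the CSV is still honored.
    duration = csv_duration if csv_duration is not None else inspected_duration
    offset = csv_offset if csv_offset is not None else default_offset
    return duration, offset


print(resolve_query_timing(5.0, 12.5, inspected_duration=30.0))
print(resolve_query_timing(None, None, inspected_duration=6.5))
```

The later commit above keeps the explicit CSV offset authoritative in the same way; only when it is absent and query_stride is set does the mapper expand a long source into multiple windows.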
    • Constraint: Internal CSV ingestion should reach a pgvector-ready payload without requiring a second custom export path
      Rejected: Limit the mapper to manifest outputs only | Forces another transformation layer before database loading
      Confidence: high
      Scope-risk: narrow
      Directive: Keep pgvector payloads aligned with the shared songs/references/segments contract while preserving internal asset metadata fields
      Tested: internal_asset_type_mapper.py with --emit-pgvector-json produced songs=2 references=2 segments=2 and included audio_role/asset_type_code/validation_status in sample rows
      Not-tested: Direct bulk load into PostgreSQL using a live pgvector database
      cnb.bofCdSsphPA authored
    • Constraint: Internal CSV exports should expose missing audio and usable durations before they are treated as train-ready manifests
      Rejected: Defer path and duration checks to later training failures | Would make ingestion debugging slow and noisy
      Confidence: high
      Scope-risk: narrow
      Directive: Keep internal asset validation lightweight at mapping time; surface existence and duration early, then layer richer QC rules incrementally
      Tested: internal_asset_type_mapper.py with --audio-root on a 6-row sample detected missing_audio=2 and emitted durations for existing reference/query assets
      Not-tested: Production-scale scans over the full internal asset repository
      cnb.bofCdSsphPA authored
    • Constraint: Internal asset exports should reach train/test-ready manifests without repeated manual reshaping
      Rejected: Stop at references/queries JSON only | Still leaves each import needing custom bundle assembly and split logic
      Confidence: high
      Scope-risk: narrow
      Directive: Keep internal manifest emission conservative and deterministic; preserve train/test query presence even on tiny exports
      Tested: internal_asset_type_mapper.py sample run with --emit-manifests produced catalog/train/test/val manifests and placed 1 query in each of train and test
      Not-tested: Duration/offset enrichment from live source metadata and audio-path existence checks on production exports
      cnb.bofCdSsphPA authored
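The "conservative and deterministic" split in the Directive above, which keeps at least one query in both train and test even on tiny exports, could be sketched like this. The hash-based ordering, the `split_queries` name, and the `test_fraction` default are assumptions; the sketch covers only train/test, not the catalog or val manifests.

```python
import hashlib


def split_queries(query_ids: list, test_fraction: float = 0.5) -> dict:
    """Deterministically split query ids into train/test, keeping both non-empty when possible."""
    # Order by a hash of the id so the split does not depend on input order.
    ordered = sorted(query_ids, key=lambda q: hashlib.sha256(q.encode("utf-8")).hexdigest())
    n = len(ordered)
    n_test = round(n * test_fraction)
    if n >= 2:
        # Tiny exports: force at least one query into each split.
        n_test = min(max(n_test, 1), n - 1)
    return {"train": sorted(ordered[n_test:]), "test": sorted(ordered[:n_test])}


print(split_queries(["q1", "q2"]))
```

Hash ordering gives the same assignment on every rerun without storing a random seed, which keeps tiny exports reproducible across sessions.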
    • Constraint: Internal type enums need a repeatable mapping path into manifest-ready buckets before bulk database exports begin
      Rejected: Leave type handling as documentation only | Would force repeated manual filtering and inconsistent ingestion decisions
      Confidence: high
      Scope-risk: narrow
      Directive: Keep internal asset mapping defaults conservative; conditional instrumental variants should stay opt-in until version-aware training is ready
      Tested: internal_asset_type_mapper.py on a 6-row sample CSV produced references=2 queries=2 metadata_only=1 excluded=1 with expected type routing
      Not-tested: Direct SQL export integration against the live source database
      cnb.bofCdSsphPA authored
    • Constraint: Internal media types need a clear training whitelist and versioning policy before they are mapped into manifests and pgvector
      Rejected: Treat all audio-like assets as the same training label source | Would blur original-vs-instrumental semantics and degrade retrieval quality
      Confidence: high
      Scope-risk: narrow
      Directive: Keep original recordings, instrumental variants, and short-video clips explicitly separated by audio_role and version semantics during ingestion
      Tested: Verified new documentation anchors and mapping tables in training-data-and-pgvector-guide.md
      Not-tested: Automated import from the upstream SQL type enum into manifests
      cnb.bofCdSsphPA authored
    • Constraint: Open-dataset ingestion needs a way to generate multiple overlapping queries per track, otherwise training/eval coverage stays too sparse
      Rejected: Keep only one random external query per track | Leaves long songs underrepresented and weakens reproducibility
      Confidence: high
      Scope-risk: moderate
      Directive: Preserve single-query behavior as the default, but keep overlap-query generation configurable through query_stride for future corpora
      Tested: manifest_tools audio-dir-to-splits --help shows --query-stride; prepare-local on data/synthetic_v2/songs with query_duration=8.0 and query_stride=4.0 produced 72 queries with query_index fields
      Not-tested: Full end-to-end smoke-local completion on the still-running real FMA corpus with overlap-query mode enabled
      cnb.bofCdSsphPA authored
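Overlap-query generation with `query_stride`, as described in the commit above, amounts to enumerating window starts that fit inside the track and tagging each with a `query_index`. This is a sketch under assumptions: the `overlapping_queries` name and record fields are hypothetical, and the single-query default here uses offset 0 for simplicity, whereas the project's default path may pick a random crop.

```python
from typing import Optional


def overlapping_queries(
    track_id: str,
    audio_duration: float,
    query_duration: float,
    query_stride: Optional[float] = None,
) -> list:
    """Enumerate query windows for one track (single window when no stride is set)."""
    if query_stride is None or query_stride <= 0:
        starts = [0.0]
    else:
        starts = []
        index = 0
        # Multiply instead of accumulating to avoid float drift;
        # keep only windows that fit entirely inside the track.
        while index * query_stride + query_duration <= audio_duration + 1e-9:
            starts.append(index * query_stride)
            index += 1
        if not starts:
            starts = [0.0]  # track shorter than one query window
    return [
        {"track_id": track_id, "query_index": i, "offset": s, "duration": query_duration}
        for i, s in enumerate(starts)
    ]


windows = overlapping_queries("track_001", audio_duration=20.0, query_duration=8.0, query_stride=4.0)
print([w["offset"] for w in windows])
```

With query_duration=8.0 and query_stride=4.0, each track yields floor((T - 8) / 4) + 1 windows for a track of T seconds, so a 20 s track gives four overlapping queries.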
    • Constraint: Real-data smoke reports must distinguish manifest query duration from training segment duration to avoid 5s-vs-8s confusion across runs
      Rejected: Keep a single ambiguous query_duration field | Makes cross-run analysis and handoff error-prone
      Confidence: high
      Scope-risk: narrow
      Directive: Preserve explicit duration semantics in future smoke/report artifacts and keep legacy aliases only for compatibility
      Tested: build_smoke_config_summary() emits manifest_query_duration=8.0 and train_segment_duration=5.0 using configs/default.yaml
      Not-tested: End-to-end regeneration of the still-running real FMA smoke report bundle with the new config schema
      cnb.bofCdSsphPA authored
    • Constraint: Future sessions need startup memory for user preferences, real-data status, and the current FMA bottleneck without rediscovery
      Rejected: Leave continuity only in transient chat context | Would force every new session to reconstruct state from scratch
      Confidence: high
      Scope-risk: narrow
      Directive: Keep AGENTS continuity memory concise, code-true, and refreshed when project direction or bottlenecks materially change
      Tested: AGENTS.md anchor search for continuity keys; verified host CUDA snapshot; verified build-index progress logs on small smoke artifacts
      Not-tested: Full completion of the long-running real FMA CPU build-index stage
      cnb.bofCdSsphPA authored
    • Constraint: Real FMA smoke is already running on CPU, but future smoke runs must be able to target GPU without manually splitting the pipeline
      Rejected: Pass through raw 'auto' everywhere | run_demo/evaluate embedder paths cannot consume torch.device('auto') safely
      Confidence: high
      Scope-risk: narrow
      Directive: Keep smoke orchestration device handling normalized at the adapter boundary unless all downstream CLIs gain native auto-device support
      Tested: smoke-local --help shows --device; resolve_device('auto') returns cpu on this host; smoke-local synthetic run prints Device: cpu; manual build-index and evaluate succeed on smoke artifacts with top1=1.0 topk=1.0
      Not-tested: End-to-end smoke-local completion on the long-running real FMA job and a live CUDA host path
      cnb.bofCdSsphPA authored
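The commit above normalizes `--device auto` at the adapter boundary because `torch.device('auto')` is not a valid device string. A sketch of that normalization follows; it is a hypothetical reimplementation, not the project's `resolve_device`, and the extra `cuda_available` parameter is added only so the logic can be checked without a GPU or without torch installed.

```python
from typing import Optional


def resolve_device(requested: str = "auto", cuda_available: Optional[bool] = None) -> str:
    """Turn a --device value into a concrete string torch.device() accepts."""
    if requested != "auto":
        return requested  # explicit values such as 'cpu' or 'cuda:1' pass through
    if cuda_available is None:
        try:
            import torch

            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    # Downstream CLIs (run_demo, evaluate) only ever see 'cpu' or 'cuda'.
    return "cuda" if cuda_available else "cpu"


print(resolve_device("auto", cuda_available=False))
```

Resolving once at the orchestration layer means no downstream CLI has to learn about the `auto` value.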
    • Constraint: Must document code-true behavior for training crops, retrieval windows, GPU support, and FMA reuse before more dataset automation lands
      Rejected: Leave docs at high-level abstractions only | Would hide 5s-vs-8s and CPU-vs-GPU operational realities
      Confidence: high
      Scope-risk: narrow
      Directive: Keep future dataset docs aligned with actual code paths and artifact timestamps, not intended architecture alone
      Tested: Source review of dataset.py, manifest_tools.py, external_adapters.py, utils/audio.py, ecapa_embedder.py, and train.py; live FMA smoke progress observed through epoch completion
      Not-tested: Markdown renderer-specific Mermaid rendering and every relative link target in external viewers
      cnb.bofCdSsphPA authored