- 02 Jun, 2026 40 commits
Constraint: Strategy guidance had to wait until the full seed=999 report landed and all three cap48 runs could be aggregated consistently Rejected: Keep treating cap48 as unresolved | The third seed now confirms high_energy repeats the same score while hybrid remains volatile Confidence: high Scope-risk: narrow Directive: Treat high_energy as the cap48 default only within the documented FMA smoke condition until larger cap64 and bucketed benchmarks either confirm or overturn it Tested: Verified seed=999 report.json, high_energy eval.json, hybrid eval.json, and computed three-seed aggregate showing high_energy mean_top1=0.9167 with zero variance versus hybrid mean_top1=0.8750 Not-tested: cap64-or-larger benchmarks, bucket/style-aware evaluations, and any future hybrid redesign
cnb.bofCdSsphPA authored -
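The three-seed aggregate cited in this commit can be reproduced from the per-run cap48 top1 values recorded elsewhere in this log. The sketch below is illustrative only: the dictionary layout and the `aggregate` helper are assumptions, not the project's own script. The seed=999 high_energy value is inferred from the statement that high_energy repeats the same score, and the use of population standard deviation is also an assumption, since the commit does not say which estimator was used.

```python
from statistics import mean, pstdev

# cap48 top1 per run, in order: first cap48 run, seed=123, seed=999.
# The seed=999 high_energy value is inferred from the "zero variance" statement.
top1_by_strategy = {
    "high_energy": [0.9167, 0.9167, 0.9167],
    "hybrid": [0.7917, 0.9583, 0.8750],
}


def aggregate(scores):
    """Summarise one strategy's top1 scores across seeds (hypothetical helper)."""
    return {
        "mean_top1": round(mean(scores), 4),
        "min": min(scores),
        "max": max(scores),
        "stdev": round(pstdev(scores), 4),  # population stdev is an assumption
    }


summary = {name: aggregate(scores) for name, scores in top1_by_strategy.items()}
for name, stats in summary.items():
    print(name, stats)
```

Reporting min and max next to the mean keeps the hybrid volatility visible, which a mean alone would hide.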
Constraint: The cap48 seed=999 run has only completed the hybrid leg, so the three-seed aggregate is still incomplete Rejected: Wait for high_energy to finish before checkpointing | Would risk losing the verified hybrid seed999 score from the active Ralph session Confidence: high Scope-risk: narrow Directive: Keep recording verified partial benchmark milestones, but do not revise default-strategy guidance until both strategies and the final report are available Tested: Verified hybrid eval.json reports num_queries=24, top1=0.875, topk=1.0; verified progress.json records the same result; verified high_energy is still running and report.json is still absent Not-tested: Final high_energy seed999 metrics, final report.json, and updated three-seed aggregate
cnb.bofCdSsphPA authored -
Constraint: The running cap48 seed=999 benchmark has not emitted its final report yet, so only in-flight evidence can be recorded safely Rejected: Claim a new three-seed conclusion now | The aggregate would be speculative without report.json and eval outputs Confidence: high Scope-risk: narrow Directive: When a long benchmark is still active, checkpoint stage evidence explicitly and wait for report.json before changing strategy guidance Tested: Verified process tree shows hybrid moved from build-index to evaluate.py; verified reference_progress.json reports 48 refs, 491 windows, 192-d embeddings, and complete status; verified report.json is still absent Not-tested: Final hybrid eval metrics, subsequent high_energy run, and final three-seed aggregate
cnb.bofCdSsphPA authored -
Constraint: The cap48 seed=999 benchmark is still running, so this checkpoint must avoid unverified algorithm conclusions Rejected: Wait for the CPU benchmark to finish | Would delay handoff and leave the next session without a clean restart package Confidence: high Scope-risk: narrow Directive: Keep future doc-only checkpoints surgically staged and do not add data/raw, external_smoke, /tmp outputs, or model artifacts Tested: Verified staged diff only includes AGENT memory, handoff, changelog, and changelist docs; confirmed /tmp cap48 seed=999 report is not ready yet Not-tested: The in-flight cap48 seed=999 benchmark result and any follow-up aggregate metrics
cnb.bofCdSsphPA authored -
Persist the current two-seed cap48 summary so the strategy recommendation is grounded in aggregated evidence rather than whichever single run happened most recently. Constraint: Only documentation changes are allowed because benchmark artifacts remain outside version control Rejected: Keep narrating cap48 one run at a time | The aggregate is now more informative than any individual cap48 run Confidence: high Scope-risk: narrow Directive: Prefer reporting aggregate seed statistics once two or more runs exist; avoid re-elevating single-seed claims above the aggregate Tested: Verified both cap48 report.json files; computed aggregate mean/min/max/stdev; verified docs now record high_energy mean_top1=0.9167 and hybrid mean_top1=0.8750 Not-tested: Aggregates beyond two seeds or style-bucketed aggregates
cnb.bofCdSsphPA authored -
Persist the completed seed123 benchmark showing hybrid ahead again, and update the strategy guidance from single-run winner claims to a multi-seed interpretation. Constraint: Only documentation changes are allowed because benchmark outputs remain outside version control Rejected: Keep framing cap48 as a stable high_energy win | The second seed materially weakens that interpretation Confidence: high Scope-risk: narrow Directive: Base the hybrid vs high_energy default decision on aggregated multi-seed evidence, not any single cap48 run Tested: Verified /tmp/ab_smoke_seg_cap48_top2_seed123/report.json; verified high_energy eval.json; verified docs now record hybrid=24/0.9583/1.0 and high_energy=24/0.9167/1.0 for seed123 Not-tested: Formal aggregation across multiple seeds beyond these two cap48 runs
cnb.bofCdSsphPA authored -
Persist the newly finished cap48 seed123 hybrid result so the second-seed validation run now has measured evidence instead of only a runtime checkpoint. Constraint: seed123 high_energy and the final report are still pending Rejected: Wait for the full seed123 report before updating docs | Would leave the multi-seed evidence stale across sessions Confidence: high Scope-risk: narrow Directive: Replace the seed123 partial section with the final two-strategy ranking once high_energy eval and report.json land Tested: Verified /tmp/ab_smoke_seg_cap48_top2_seed123/hybrid/fma_reports_smoke/eval.json; verified docs record hybrid=24/0.9583/1.0 and high_energy still in build-index Not-tested: Final seed123 comparison because high_energy has not finished yet
cnb.bofCdSsphPA authored -
Update the handoff and changelog with the newer seed123 runtime milestone so later sessions know the hybrid lane has advanced from build-index into capped evaluation. Constraint: No measured seed123 score is available yet, only a later execution milestone Rejected: Leave the older build-index note in place | Would make the restart handoff stale and less actionable Confidence: high Scope-risk: narrow Directive: Replace the seed123 runtime note with measured scores as soon as hybrid eval.json or report.json lands Tested: Verified active seed123 hybrid evaluate.py process; verified docs now record seed123 current phase as evaluate.py --max-queries 24 Not-tested: Seed123 strategy scores because hybrid eval.json has not landed yet
cnb.bofCdSsphPA authored -
Preserve the second-seed cap48 entry point and current build-index phase so later sessions can validate whether the cap48 reversal was stable or a seed artifact. Constraint: The second-seed run has not produced scores yet, so only execution-state evidence is available Rejected: Wait for the seed123 scores before recording anything | Risks losing the multi-seed validation checkpoint if the session ends first Confidence: high Scope-risk: narrow Directive: Replace the seed123 running-state section with measured scores once hybrid eval.json or report.json lands Tested: Verified active cap48 seed123 processes; verified handoff records work-root, seed, subset size, query cap, and current build-index phase Not-tested: cap48 seed123 strategy scores because the run is still in progress
cnb.bofCdSsphPA authored -
Persist the larger 48-track benchmark where high_energy overtook hybrid, and downgrade the previously overconfident default-strategy claim to a conditional recommendation pending broader validation. Constraint: Only documentation changes are allowed because benchmark outputs remain outside version control Rejected: Keep asserting hybrid as fully settled default after cap48 | The 48-track capped benchmark materially contradicts that stronger claim Confidence: high Scope-risk: narrow Directive: Resolve the hybrid vs high_energy default question with larger, multi-seed, style-aware benchmarks before making a final hard default claim Tested: Verified /tmp/ab_smoke_seg_cap48_top2/report.json; verified high_energy eval.json; verified docs now record high_energy=24/0.9167/1.0 and hybrid=24/0.7917/1.0 Not-tested: Multi-seed or style-balanced follow-up benchmark beyond the single cap48 run
cnb.bofCdSsphPA authored -
Update the handoff and changelog with the newer cap48 runtime milestone so later sessions know the high_energy lane has advanced from build-index into capped evaluation. Constraint: No measured cap48 high_energy score is available yet, only a later execution milestone Rejected: Leave the older build-index note in place | Would make the restart handoff stale and less actionable Confidence: high Scope-risk: narrow Directive: Replace the cap48 runtime note with final top-two scores as soon as high_energy eval.json or report.json lands Tested: Verified active cap48 high_energy evaluate.py process; verified docs now record high_energy current phase as evaluate.py --max-queries 24 Not-tested: Final cap48 comparison because high_energy eval.json has not landed yet
cnb.bofCdSsphPA authored -
Persist the newly finished cap48 hybrid result so the next session can continue the 48-track validation run from measured evidence instead of only a runtime checkpoint. Constraint: cap48 high_energy and the final report are still pending Rejected: Wait for the full cap48 report before updating docs | Would leave the largest current real-data checkpoint stale across sessions Confidence: high Scope-risk: narrow Directive: Replace the cap48 partial section with the final two-strategy ranking once high_energy eval and report.json land Tested: Verified /tmp/ab_smoke_seg_cap48_top2/hybrid/fma_reports_smoke/eval.json; verified docs record hybrid=24/0.7917/1.0 and high_energy still in build-index Not-tested: Final cap48 comparison because high_energy has not finished yet
cnb.bofCdSsphPA authored -
Update the handoff and changelog with the newer cap48 runtime milestone so later sessions know the run has advanced from build-index into capped evaluation. Constraint: No measured cap48 score is available yet, only a later execution milestone Rejected: Leave the older build-index note in place | Would make the restart handoff stale and less actionable Confidence: high Scope-risk: narrow Directive: Replace the cap48 runtime note with hybrid scores as soon as eval.json lands Tested: Verified active cap48 evaluate.py process; verified docs now record cap48 current phase as evaluate.py --max-queries 24 Not-tested: cap48 strategy scores because hybrid eval.json has not landed yet
cnb.bofCdSsphPA authored -
Preserve the new 48-track top-two benchmark entry point and current build-index phase so later sessions can continue the expanding validation ladder without rediscovering runtime state. Constraint: cap48 has not produced scores yet, so only execution-state evidence is available Rejected: Wait for cap48 scores before recording anything | Risks losing the larger-benchmark checkpoint if the session ends first Confidence: high Scope-risk: narrow Directive: Replace the cap48 running-state section with measured scores once hybrid eval.json or report.json lands Tested: Verified active cap48 processes; verified handoff records work-root, subset size, query cap, and current build-index phase Not-tested: cap48 strategy scores because the run is still in progress
cnb.bofCdSsphPA authored -
Persist the larger 32-track benchmark showing hybrid strongly outperforming high_energy, so the default strategy decision rests on multiple larger real-data checkpoints instead of a single subset. Constraint: Only documentation changes are allowed because benchmark artifacts stay outside version control Rejected: Keep the default recommendation tentative after cap32 | The 24-track and 32-track capped benchmarks now agree on hybrid superiority Confidence: high Scope-risk: narrow Directive: Use cap24 and cap32 together as the current strongest strategy evidence until a broader multi-style benchmark supersedes them Tested: Verified /tmp/ab_smoke_seg_cap32_top2/report.json; verified high_energy eval.json; verified docs now record hybrid=20/0.95/1.0 and high_energy=20/0.5/1.0 Not-tested: Wider style-balanced benchmark beyond the FMA top-two subsets
cnb.bofCdSsphPA authored -
Persist the newly finished cap32 hybrid result so the next session can continue the top-two validation run from measured evidence instead of only a running-state checkpoint. Constraint: cap32 high_energy and the final report are still pending Rejected: Wait for the full cap32 report before updating docs | Would leave the larger-subset evidence stale across sessions Confidence: high Scope-risk: narrow Directive: Replace the cap32 partial section with the final two-strategy ranking once high_energy eval and report.json land Tested: Verified /tmp/ab_smoke_seg_cap32_top2/hybrid/fma_reports_smoke/eval.json; verified docs record hybrid=20/0.95/1.0 and high_energy still training Not-tested: Final cap32 comparison because high_energy has not finished yet
cnb.bofCdSsphPA authored -
Preserve the new 32-track top-two benchmark entry point and current build-index phase so a later session can continue the stronger validation run without losing runtime context. Constraint: The cap32 benchmark is still running, so only execution-state evidence is available Rejected: Wait for cap32 results before recording anything | Risks losing the larger-benchmark checkpoint if the session ends first Confidence: high Scope-risk: narrow Directive: Replace the cap32 running-state section with measured scores once hybrid eval.json and report.json land Tested: Verified active cap32 processes; verified handoff records work-root, subset size, query cap, and current build-index phase Not-tested: cap32 strategy scores because the run is still in progress
cnb.bofCdSsphPA authored -
Persist the larger real-FMA benchmark result showing hybrid clearly outperforming high_energy, so the project recommendation can converge on one default instead of an unresolved tie. Constraint: Only docs change because benchmark outputs remain outside version control Rejected: Keep treating hybrid and high_energy as co-equal defaults | The larger 24-track capped benchmark now separates them clearly Confidence: high Scope-risk: narrow Directive: Use cap24 top-two as the current strongest public evidence until a larger capped benchmark supersedes it Tested: Verified /tmp/ab_smoke_seg_cap24_top2/report.json; verified high_energy eval.json; verified docs now state hybrid=16/1.0/1.0 and high_energy=16/0.8125/1.0 Not-tested: Broader strategy comparison beyond hybrid vs high_energy on the 24-track subset
cnb.bofCdSsphPA authored -
Record the new 24-track capped benchmark setup and the first completed hybrid result so the next session can continue the stronger tie-break experiment without rediscovering runtime state. Constraint: The cap24 benchmark is still in progress, so only partial evidence can be documented now Rejected: Wait for high_energy to finish before updating handoff | Risks losing the fresh larger-subset evidence if the session ends first Confidence: high Scope-risk: narrow Directive: Replace the partial cap24 section with the final two-strategy ranking once report.json lands Tested: Verified /tmp/ab_smoke_seg_cap24_top2/hybrid/fma_reports_smoke/eval.json; verified active cap24 processes; verified docs include the exact work-root and resume command Not-tested: Final cap24 top-two comparison because high_energy is still training
cnb.bofCdSsphPA authored -
Persist the completed capped real-data benchmark results so future sessions can use the final strategy ordering and recommendation without replaying the run. Constraint: Only documentation should change because benchmark artifacts live outside version control Rejected: Leave the result only in /tmp report files | Would make the evidence fragile across sessions Confidence: high Scope-risk: narrow Directive: Use cap16 as the current default evidence point until a larger capped benchmark supersedes it Tested: Verified /tmp/ab_smoke_seg_cap16/report.json; verified repeated_section_aware eval.json; verified docs reflect final ranking hybrid/high_energy/beat_aware/repeated_section_aware Not-tested: Larger real-dataset benchmark beyond the 16-track capped subset
cnb.bofCdSsphPA authored -
Update the handoff and changelog with the newly finished capped FMA high_energy result so the next session starts from current evidence instead of stale partials. Constraint: Benchmark is still running overall and only partial strategies are complete Rejected: Wait for repeated_section_aware to finish before updating handoff | Risks another stale restart gap Confidence: high Scope-risk: narrow Directive: Replace the partial cap16 table with the final ranking once repeated_section_aware and report.json land Tested: Verified /tmp/ab_smoke_seg_cap16/high_energy/fma_reports_smoke/eval.json; verified docs now record high_energy = 12 / 1.0 / 1.0 Not-tested: Final cap16 multi-strategy report because repeated_section_aware is still in progress
cnb.bofCdSsphPA authored -
Record the latest delivered benchmark evidence, active work-root, partial results, and exact resume commands so a new session can continue without rediscovering context. Constraint: User requested immediate delivery artifacts before the long benchmark fully finishes Rejected: Wait for the entire cap16 benchmark to finish before handing off | Would delay delivery and risk losing resumable context Confidence: high Scope-risk: narrow Directive: Update the handoff again once high_energy and repeated_section_aware finish on cap16 Tested: Verified partial eval files for hybrid and beat_aware; verified active cap16 benchmark processes; verified session-handoff contains resume commands and partial scores Not-tested: Final multi-strategy cap16 ranking because high_energy and repeated_section_aware are still running
cnb.bofCdSsphPA authored -
Clarify that the pipeline already mixes random sampling with librosa-guided candidate selection, while keeping heavier structural segmentation as a later optimization path. Constraint: Must avoid staging local datasets and transient smoke artifacts Rejected: Full librosa.segment.* default rollout | Too CPU-heavy and too distribution-shaping for current smoke/training stage Confidence: high Scope-risk: narrow Directive: Keep future segmentation comparisons capped by equal query budgets when reporting quality deltas Tested: py_compile for evaluate/external_adapters/ab_smoke_segmentation; evaluate.py --max-queries 5; ab_smoke_segmentation end-to-end smoke with max_test_queries=5 Not-tested: Multi-strategy medium-size capped A/B benchmark on larger real FMA subset
cnb.bofCdSsphPA authored -
Constraint: Strategy comparisons need real-audio evidence, but the benchmark must stay cheap enough to run repeatedly on CPU during active development Rejected: Judge winners only by top1/topk on a tiny subset | ties hide the practical value of strategies that generate far more usable queries Confidence: medium Scope-risk: narrow Directive: Keep num_queries as a tie-breaker for tiny-smoke comparisons; increase subset size before promoting benchmark winners to default training policy Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/ab_smoke_segmentation.py --dataset fma --input-dir acr-engine/data/raw/fma_small_audio --work-root /tmp/ab_smoke_seg --subset-size 8 --query-duration 8 --train-epochs 1 --batch-size 2 --device cpu --output-json /tmp/ab_smoke_seg/report.json; post-run ranking verification from /tmp/ab_smoke_seg/report.json Not-tested: Larger FMA subsets or difficult internal query mixes in the same benchmark script
cnb.bofCdSsphPA authored -
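The tie-breaker directive in this commit can be sketched as a sort key. This is a minimal illustration, not the ranking code in ab_smoke_segmentation.py: the `rank_strategies` helper, the record layout, the placeholder strategy names, and the input numbers are all hypothetical, and ordering top1 ahead of topk is an assumption.

```python
def rank_strategies(results):
    """Order strategies by top1, then topk, then num_queries (all descending).

    num_queries only matters when top1 and topk are tied, which is the
    tiny-smoke situation the commit describes.
    """
    return sorted(
        results,
        key=lambda r: (r["top1"], r["topk"], r["num_queries"]),
        reverse=True,
    )


# Hypothetical inputs, chosen only to show the tie-break; not measured results.
results = [
    {"strategy": "strategy_a", "num_queries": 5, "top1": 1.0, "topk": 1.0},
    {"strategy": "strategy_b", "num_queries": 12, "top1": 1.0, "topk": 1.0},
    {"strategy": "strategy_c", "num_queries": 20, "top1": 0.8, "topk": 1.0},
]

ranking = [r["strategy"] for r in rank_strategies(results)]
print(ranking)
```

Here strategy_a and strategy_b tie on accuracy, so the one that generated more usable queries ranks first, while strategy_c stays last despite having the most queries.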
Constraint: Music retrieval should sample repeated hook-like regions without adding heavyweight structure models or breaking the existing lightweight candidate stack Rejected: Reserve repeated-section logic for a later dedicated chorus detector | delays a practical chorus-like signal that can already improve query realism today Confidence: medium Scope-risk: moderate Directive: Treat repeated_section_aware as a lightweight chorus proxy; future chorus ranking should refine rather than discard these candidates Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/data/dataset.py acr-engine/src/data/manifest_tools.py acr-engine/train.py acr-engine/src/data/external_adapters.py; synthetic_v2 dry-run with --segment-strategy repeated_section_aware; handcrafted 24s repeated-motif fixture with repeated_section_aware and hybrid offset checks Not-tested: Full end-to-end metric impact on FMA/internal datasets with repeated_section_aware enabled
cnb.bofCdSsphPA authored -
Constraint: Music queries often begin near stable pulse locations, but beat tracking can fail on sparse or synthetic signals and must degrade safely Rejected: Depend on beat tracking alone for all rhythmic sampling | too brittle when beat extraction is weak or absent Confidence: high Scope-risk: moderate Directive: Keep beat_aware as a lightweight candidate generator with onset fallback; future chorus/repeated-section logic should compose with beat-aware rather than bypass it Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/data/dataset.py acr-engine/src/data/manifest_tools.py acr-engine/train.py acr-engine/src/data/external_adapters.py; synthetic_v2 dry-run with --segment-strategy beat_aware; handcrafted 20s pulse-track fixture with beat_aware and hybrid offset checks Not-tested: Full retraining/evaluation impact on open/internal datasets using beat_aware end-to-end
cnb.bofCdSsphPA authored -
Constraint: Music ACR queries should be closer to choruses, strong rhythmic sections, and attack regions without giving up the existing random and silence-aware fallbacks Rejected: Add only heavier beat/chorus modeling first | higher complexity and more brittle than lightweight energy/onset heuristics for the current training pipeline Confidence: high Scope-risk: moderate Directive: Keep high_energy/onset_aware as heuristic candidate generators; future beat/chorus logic should layer on top of them rather than replace the fallback stack Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/data/dataset.py acr-engine/src/data/manifest_tools.py acr-engine/train.py acr-engine/src/data/external_adapters.py; synthetic_v2 dry-run with --segment-strategy high_energy and onset_aware; handcrafted 20s audio fixture with high_energy/onset_aware query offset checks Not-tested: Full retraining/evaluation impact on FMA or internal production datasets
cnb.bofCdSsphPA authored -
Constraint: smoke-local must recover long CPU index builds automatically, but partial embeddings from an older model must never contaminate a newly trained index Rejected: Always reuse any existing partial checkpoint | can silently blend embeddings from different model generations into one index Confidence: high Scope-risk: moderate Directive: Keep model-signature checks on all future index resume paths; auto-resume should fall back to clean rebuild on any signature mismatch Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/engines/ecapa_embedder.py acr-engine/src/data/external_adapters.py acr-engine/run_demo.py; same-model partial checkpoint resume vs fresh rebuild equality; mismatched-model checkpoint rejection and clean rebuild equality Not-tested: Reattaching the currently running real FMA smoke process after an external interruption
cnb.bofCdSsphPA authored -
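The signature check described in this commit might look roughly like the sketch below. It is not the code in ecapa_embedder.py: the function names, the checkpoint dictionary layout, and the choice of SHA-256 over raw weight bytes are all assumptions made for illustration.

```python
import hashlib
from typing import Optional


def model_signature(weight_bytes: bytes) -> str:
    """Fingerprint a model so a checkpoint can be tied to one model generation."""
    return hashlib.sha256(weight_bytes).hexdigest()


def resumable_embeddings(checkpoint: Optional[dict], current_signature: str) -> dict:
    """Return embeddings that are safe to reuse, or {} to force a clean rebuild."""
    if checkpoint is None:
        return {}
    if checkpoint.get("model_signature") != current_signature:
        # Partial embeddings came from a different model: discard all of them.
        return {}
    return dict(checkpoint.get("embeddings", {}))


old_sig = model_signature(b"weights-v1")  # stand-in bytes, not real weights
new_sig = model_signature(b"weights-v2")
checkpoint = {"model_signature": old_sig, "embeddings": {"track_001": [0.1, 0.2]}}

same_model = resumable_embeddings(checkpoint, old_sig)  # resume is allowed
retrained = resumable_embeddings(checkpoint, new_sig)   # falls back to rebuild
print(len(same_model), len(retrained))
```

Returning an empty result on mismatch, instead of raising, matches the directive that auto-resume should fall back to a clean rebuild.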
Constraint: Real FMA smoke indexing can run for a long time on CPU and synthetic/root-layout datasets must still use the same build-index entrypoint Rejected: Treat build-index as all-or-nothing and require full reruns after interruption | wastes hours on CPU and obscures whether work was already completed Confidence: high Scope-risk: moderate Directive: Preserve checkpoint file compatibility; future smoke-local automation should prefer resume before rebuilding from scratch Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/engines/ecapa_embedder.py acr-engine/src/engines/chromaprint_matcher.py acr-engine/run_demo.py; synthetic_v2 partial-checkpoint resume vs fresh rebuild equality check (shape/ids/embeddings/progress) Not-tested: In-place resumption of the currently running real FMA process after an actual external kill/restart
cnb.bofCdSsphPA authored -
Constraint: Real music queries often include long silence heads/tails, but the pipeline still needs random-crop generalization and simple CLI controls Rejected: Replace all random crops with structure-aware segmentation | would overfit to curated boundaries and diverge from messy real-world query distributions Confidence: high Scope-risk: moderate Directive: Keep random as fallback; layer beat/onset/chorus-aware segmentation on top instead of removing silence-aware and sliding paths Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/data/dataset.py acr-engine/src/data/manifest_tools.py acr-engine/train.py acr-engine/src/data/external_adapters.py; external_adapters.py prepare-local fma /tmp/segtest_audio --query-strategy silence_aware; train.py --data data/synthetic_v2 --dry-run --segment-strategy hybrid Not-tested: Full FMA smoke retraining/eval with the new segmentation strategies
cnb.bofCdSsphPA authored -
Constraint: Internal assets must support both manually labeled clips and whole-track auto-window generation without breaking pgvector export Rejected: Treat missing query duration as full audio duration | prevents multi-window query expansion for long source audio Confidence: high Scope-risk: narrow Directive: Keep explicit CSV offset authoritative; only auto-expand when offset is absent and query_stride is set Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/internal_asset_type_mapper.py; local 30s/40s WAV fixture export with manifest + pgvector verification Not-tested: End-to-end retraining with newly expanded internal manifests
cnb.bofCdSsphPA authored -
Constraint: Internal short-video and demo assets need explicit duration/offset semantics before they can behave like real training or pgvector segment records Rejected: Leave query offsets empty by default | Produces weaker provenance and less useful downstream segment metadata Confidence: high Scope-risk: narrow Directive: Prefer source CSV timing when available, then fall back to inspected audio duration and conservative default offsets Tested: Sample CSV run confirmed one query used CSV duration/offset (5.0/12.5) and another fell back to inspected duration/default offset (6.5/0.0), with pgvector segments matching Not-tested: Complex multi-segment offset generation from long-form internal masters
cnb.bofCdSsphPA authored -
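The fallback order in this commit's directive can be sketched as follows. The `resolve_query_timing` helper and its argument names are hypothetical, not taken from internal_asset_type_mapper.py. The default offset of 0.0 and the two example rows mirror the values reported under Tested, while the 30.0 inspected duration in the first call is an invented placeholder.

```python
from typing import Optional

DEFAULT_OFFSET = 0.0  # conservative default, matching the 6.5/0.0 fallback case


def resolve_query_timing(
    csv_duration: Optional[float],
    csv_offset: Optional[float],
    inspected_duration: float,
) -> dict:
    """Prefer CSV timing; otherwise use inspected duration and the default offset."""
    duration = csv_duration if csv_duration is not None else inspected_duration
    offset = csv_offset if csv_offset is not None else DEFAULT_OFFSET
    return {"duration": duration, "offset": offset}


# Row with CSV timing: the CSV values win over the inspected duration.
from_csv = resolve_query_timing(5.0, 12.5, inspected_duration=30.0)
# Row without CSV timing: fall back to the inspected audio duration.
fallback = resolve_query_timing(None, None, inspected_duration=6.5)
print(from_csv, fallback)
```

Checking `is not None` instead of truthiness keeps an explicit CSV offset of 0.0 authoritative.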
Constraint: Internal CSV ingestion should reach a pgvector-ready payload without requiring a second custom export path Rejected: Limit the mapper to manifest outputs only | Forces another transformation layer before database loading Confidence: high Scope-risk: narrow Directive: Keep pgvector payloads aligned with the shared songs/references/segments contract while preserving internal asset metadata fields Tested: internal_asset_type_mapper.py with --emit-pgvector-json produced songs=2 references=2 segments=2 and included audio_role/asset_type_code/validation_status in sample rows Not-tested: Direct bulk load into PostgreSQL using a live pgvector database
cnb.bofCdSsphPA authored -
Constraint: Internal CSV exports should expose missing audio and usable durations before they are treated as train-ready manifests Rejected: Defer path and duration checks to later training failures | Would make ingestion debugging slow and noisy Confidence: high Scope-risk: narrow Directive: Keep internal asset validation lightweight at mapping time; surface existence and duration early, then layer richer QC rules incrementally Tested: internal_asset_type_mapper.py with --audio-root on a 6-row sample detected missing_audio=2 and emitted durations for existing reference/query assets Not-tested: Production-scale scans over the full internal asset repository
cnb.bofCdSsphPA authored -
Constraint: Internal asset exports should reach train/test-ready manifests without repeated manual reshaping Rejected: Stop at references/queries JSON only | Still leaves each import needing custom bundle assembly and split logic Confidence: high Scope-risk: narrow Directive: Keep internal manifest emission conservative and deterministic; preserve train/test query presence even on tiny exports Tested: internal_asset_type_mapper.py sample run with --emit-manifests produced catalog/train/test/val and balanced 1 query in both train and test Not-tested: Duration/offset enrichment from live source metadata and audio-path existence checks on production exports
cnb.bofCdSsphPA authored -
Constraint: Internal type enums need a repeatable mapping path into manifest-ready buckets before bulk database exports begin Rejected: Leave type handling as documentation only | Would force repeated manual filtering and inconsistent ingestion decisions Confidence: high Scope-risk: narrow Directive: Keep internal asset mapping defaults conservative; conditional instrumental variants should stay opt-in until version-aware training is ready Tested: internal_asset_type_mapper.py on a 6-row sample CSV produced references=2 queries=2 metadata_only=1 excluded=1 with expected type routing Not-tested: Direct SQL export integration against the live source database
cnb.bofCdSsphPA authored -
Constraint: Internal media types need a clear training whitelist and versioning policy before they are mapped into manifests and pgvector Rejected: Treat all audio-like assets as the same training label source | Would blur original-vs-instrumental semantics and degrade retrieval quality Confidence: high Scope-risk: narrow Directive: Keep original recordings, instrumental variants, and short-video clips explicitly separated by audio_role and version semantics during ingestion Tested: Verified new documentation anchors and mapping tables in training-data-and-pgvector-guide.md Not-tested: Automated import from the upstream SQL type enum into manifests
cnb.bofCdSsphPA authored -
Constraint: Open-dataset ingestion needs a way to generate multiple overlapping queries per track, otherwise training/eval coverage stays too sparse Rejected: Keep only one random external query per track | Leaves long songs underrepresented and weakens reproducibility Confidence: high Scope-risk: moderate Directive: Preserve single-query behavior as the default, but keep overlap-query generation configurable through query_stride for future corpora Tested: manifest_tools audio-dir-to-splits --help shows --query-stride; prepare-local on data/synthetic_v2/songs with query_duration=8.0 and query_stride=4.0 produced 72 queries with query_index fields Not-tested: Full end-to-end smoke-local completion on the still-running real FMA corpus with overlap-query mode enabled
cnb.bofCdSsphPA authored -
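Overlap-query generation with `query_stride` can be illustrated with a small window generator. This is a sketch under stated assumptions: `overlap_query_windows` is a hypothetical helper, not the manifest_tools implementation, and the rule that a window must fit entirely inside the track is assumed. The 30-second track length is an example value; only query_duration=8.0 and query_stride=4.0 come from the commit.

```python
from typing import Optional


def overlap_query_windows(
    track_duration: float,
    query_duration: float,
    query_stride: Optional[float],
) -> list:
    """List overlapping query windows; empty when overlap mode is off."""
    if query_stride is None:
        # Overlap mode is opt-in; the caller keeps its single-query default.
        return []
    if query_stride <= 0 or query_duration <= 0:
        raise ValueError("query_stride and query_duration must be positive")
    windows = []
    index = 0
    # Multiply by the index instead of accumulating to avoid float drift.
    while index * query_stride + query_duration <= track_duration:
        windows.append(
            {
                "query_index": index,
                "offset": index * query_stride,
                "duration": query_duration,
            }
        )
        index += 1
    return windows


windows = overlap_query_windows(30.0, query_duration=8.0, query_stride=4.0)
offsets = [w["offset"] for w in windows]
print(offsets)
```

With a stride of half the query duration, each window overlaps its neighbour by 50 percent, so a long track contributes several queries instead of one.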
Constraint: Real-data smoke reports must distinguish manifest query duration from training segment duration to avoid 5s-vs-8s confusion across runs Rejected: Keep a single ambiguous query_duration field | Makes cross-run analysis and handoff error-prone Confidence: high Scope-risk: narrow Directive: Preserve explicit duration semantics in future smoke/report artifacts and keep legacy aliases only for compatibility Tested: build_smoke_config_summary() emits manifest_query_duration=8.0 and train_segment_duration=5.0 using configs/default.yaml Not-tested: End-to-end regeneration of the still-running real FMA smoke report bundle with the new config schema
cnb.bofCdSsphPA authored -
Constraint: Future sessions need startup memory for user preferences, real-data status, and the current FMA bottleneck without re-discovery Rejected: Leave continuity only in transient chat context | Would force every new session to reconstruct state from scratch Confidence: high Scope-risk: narrow Directive: Keep AGENTS continuity memory concise, code-true, and refreshed when project direction or bottlenecks materially change Tested: AGENTS.md anchor search for continuity keys; verified host CUDA snapshot; verified build-index progress logs on small smoke artifacts Not-tested: Full completion of the long-running real FMA CPU build-index stage
cnb.bofCdSsphPA authored