- 02 Jun, 2026 40 commits
Constraint: Keep verification offline-only and avoid touching real databases or production assets Rejected: Stop at manifest generation without execution evidence | A dry-run smoke gives the next session stronger handoff confidence Confidence: high Scope-risk: narrow Directive: Stage local sample audio inside the smoke workspace so manifest paths remain self-contained and reproducible Tested: Ran business_export_offline_smoke.py end-to-end; verified normalize/build summaries and train.py --dry-run success; rechecked adapter doc links Not-tested: Did not run full training/evaluation on live business exports or connect to any database
cnb.bofCdSsphPA authored -
…y from normalized rows Constraint: Keep this checkpoint offline-only and avoid touching real business data, datasets, or model artifacts Rejected: Leave final manifest shaping as a manual next-session task | The handoff is stronger when catalog/train/test/val can already be produced automatically Confidence: high Scope-risk: narrow Directive: Treat these generated manifests as integration-stage scaffolds and validate final field policy again before production data ingestion Tested: Ran build_business_project_manifests.py on normalized sample data and verified catalog/train/test/val structure; rechecked 70 relative links Not-tested: Did not run the generated manifests through full training/evaluation against live business audio
cnb.bofCdSsphPA authored -
Constraint: Keep this checkpoint offline-only and avoid touching real business data, datasets, or model artifacts Rejected: Leave role splitting as a manual next-session step | The export chain is more usable when reference/query/excluded lists are produced automatically Confidence: high Scope-risk: narrow Directive: Treat the split outputs as staging lists and keep final project-manifest adaptation explicit in the downstream integration step Tested: Normalized the sample CSV, ran split_business_manifest_ready.py, verified 1 reference + 1 query + 1 excluded row, and rechecked 73 relative links Not-tested: Did not run against a live business export or feed the split outputs into the full training pipeline
cnb.bofCdSsphPA authored -
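The role split described in the commit above can be illustrated with a minimal sketch. This is not the code of split_business_manifest_ready.py: the `role` and `asset_id` field names and the `split_by_role` helper are hypothetical. Following the directive in a related commit, anything without a clear role defaults to excluded.

```python
# Hypothetical sketch: field names and helper are assumptions, not the repo's API.
VALID_ROLES = {"reference", "query"}


def split_by_role(rows):
    """Group normalized rows into reference/query/excluded staging lists."""
    buckets = {"reference": [], "query": [], "excluded": []}
    for row in rows:
        role = str(row.get("role", "")).strip().lower()
        # Rows without a recognized role fall back to the excluded list.
        target = role if role in VALID_ROLES else "excluded"
        buckets[target].append(row)
    return buckets


sample_rows = [
    {"asset_id": "a1", "role": "reference"},
    {"asset_id": "a2", "role": "query"},
    {"asset_id": "a3", "role": ""},  # ambiguous, so it is excluded
]
split = split_by_role(sample_rows)
print({name: len(items) for name, items in split.items()})
```

Keeping the three lists as separate staging outputs leaves the final project-manifest adaptation to the downstream integration step, as the directive asks.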
Constraint: Keep this checkpoint offline-only and avoid touching real databases, datasets, or model artifacts Rejected: Stop at static CSV/JSONL examples only | The next session needs an executable normalization path, not just samples Confidence: high Scope-risk: narrow Directive: Treat normalized JSONL as manifest-ready staging output and keep final manifest shaping explicit in the integration step Tested: Ran normalize_business_export.py on the sample CSV and JSONL inputs; verified 3 output rows each; rechecked 71 relative links Not-tested: Did not run against a live business export or connect to any database
cnb.bofCdSsphPA authored -
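As a rough illustration of what an offline CSV-to-JSONL normalization step might look like, the sketch below uses only the standard library. The column names and the `normalize_rows` and `to_jsonl` helpers are assumptions for illustration. They do not reflect the actual behavior of normalize_business_export.py.

```python
import csv
import io
import json


def normalize_rows(csv_text):
    """Yield rows with lower-cased, stripped keys and stripped string values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for raw in reader:
        yield {
            key.strip().lower(): (value or "").strip()
            for key, value in raw.items()
            if key is not None  # skip overflow cells that have no header
        }


def to_jsonl(rows):
    """Serialize rows as one JSON object per line."""
    return "\n".join(
        json.dumps(row, ensure_ascii=False, sort_keys=True) for row in rows
    )


# Hypothetical sample export; the real column names are not known here.
sample_csv = "Asset_ID, Title ,Type\n1, Song A ,master\n2,Song B,live\n3,Song C,demo\n"
jsonl_text = to_jsonl(normalize_rows(sample_csv))
print(jsonl_text)
```

The output is a staging format only; final manifest shaping stays a separate, explicit step.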
Constraint: Keep this checkpoint static and avoid any real database connectivity or dataset mutation Rejected: Leave export details implicit until a live exporter exists | The next session needs concrete SQL, CSV, and JSONL examples now Confidence: high Scope-risk: narrow Directive: Treat the SQL as a field-mapping example only and adapt table names to the real schema during integration Tested: Parsed the CSV and JSONL examples and rechecked 69 relative links across the export docs Not-tested: Did not connect to a production database or execute a live export
cnb.bofCdSsphPA authored -
Constraint: Keep the checkpoint lightweight and avoid touching real datasets or generated artifacts Rejected: Defer manifest guidance until a DB export tool exists | The next session needs repo-native field and role contracts now Confidence: high Scope-risk: narrow Directive: Default ambiguous assets to excluded until manual review confirms song identity and usable role Tested: Parsed manifest templates; verified print_business_type_mapping.py emits valid JSON; rechecked 94 relative links Not-tested: Did not connect to a real database or run a live export in this checkpoint
cnb.bofCdSsphPA authored -
Constraint: Keep this checkpoint documentation-first and avoid staging dataset, cache, or model artifacts Rejected: Leave the asset-type strategy implicit in chat only | The next session needs repo-native guidance and templates Confidence: high Scope-risk: narrow Directive: Treat type-based buckets as a starting scaffold and keep hard-negative curation manual until evidence supports automation Tested: Parsed both bucket JSON templates and rechecked 104 relative links across the new docs Not-tested: Did not run a fresh business-type benchmark in this checkpoint
cnb.bofCdSsphPA authored -
Constraint: Keep the checkpoint lightweight and avoid touching dataset or model artifacts Rejected: Wait to add buckets until automatic semantic labeling exists | Manual curated buckets are enough to unblock the next session now Confidence: high Scope-risk: narrow Directive: Use the template as a curated benchmark scaffold, not as evidence that filenames imply semantics Tested: Parsed the new JSON template; ran ab_smoke_bucketed.py --help; rechecked targeted relative links Not-tested: Did not launch a new semantic bucket benchmark run in this checkpoint
cnb.bofCdSsphPA authored -
Constraint: Avoid staging datasets, smoke artifacts, /tmp outputs, and caches Rejected: Delay handoff until larger semantic buckets exist | User asked for immediate delivery and resumability now Confidence: high Scope-risk: narrow Directive: Treat toy prefix buckets as a methodology baseline, not a product conclusion Tested: Verified /tmp/ab_smoke_bucketed_smoke/report.json and bucket_report.json outputs; reviewed targeted git diff Not-tested: No new training or benchmark execution in this documentation-only checkpoint
cnb.bofCdSsphPA authored -
Constraint: The cap48/cap64 reversal means strategy guidance can no longer rely on a single overall subset result Rejected: Keep bucket benchmarking as a doc-only next step | The repo now needs an executable baseline so later sessions can measure scale/style divergence directly Confidence: high Scope-risk: moderate Directive: Treat ab_smoke_bucketed.py as the canonical seed for style-aware evaluation, and expand bucket definitions before revisiting global default-strategy claims Tested: Verified acr-engine/scripts/ab_smoke_bucketed.py passes py_compile; verified first bucket prefix_000_a produced bucket_report.json with hybrid 4/1.0/1.0 and high_energy 3/1.0/1.0; verified second bucket execution is in progress Not-tested: Full multi-bucket report.json completion, richer bucket definitions, and bucket-level aggregate conclusions
cnb.bofCdSsphPA authored -
Constraint: Strategy guidance must now reflect that cap48 and cap64 produce different winners under verified runs Rejected: Keep high_energy as the generic default | The completed cap64 run shows hybrid winning clearly at a larger subset size, so the docs must acknowledge scale sensitivity Confidence: high Scope-risk: moderate Directive: Do not present a single global default strategy again until bucketed and style-aware benchmarks explain the cap48/cap64 divergence Tested: Verified cap64 report.json, progress.json, high_energy eval.json, and hybrid eval.json; confirmed cap64 winner=hybrid with top1 0.875 vs high_energy 0.625 Not-tested: Multi-seed cap64 aggregates, bucket/style-aware benchmarks, and any revised hybrid training design
cnb.bofCdSsphPA authored -
Constraint: The cap64 run is still incomplete, so only verified hybrid index-complete and evaluation-running evidence can be recorded safely now Rejected: Wait for hybrid eval.json before checkpointing | Would lose the verified handoff that hybrid indexing finished and evaluate.py is already running Confidence: high Scope-risk: narrow Directive: Keep cap64 high_energy and hybrid checkpoints symmetric so the final comparison can be written from docs alone if needed Tested: Verified hybrid reference_progress.json shows 64 refs, 657 windows, 192-d embeddings, and complete status; verified active process is evaluate.py on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests; verified hybrid eval.json and report.json are still absent Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion
cnb.bofCdSsphPA authored -
Constraint: The cap64 run is still active, so only verified training-complete evidence can be recorded now without overstating results Rejected: Wait for hybrid eval before checkpointing | Would lose the stronger handoff evidence that the full hybrid epoch already completed Confidence: high Scope-risk: narrow Directive: Keep distinguishing hybrid training-complete from hybrid index/eval completion until report.json lands Tested: Verified live session output shows hybrid Epoch 1 progressed from 0/32 to 32/32, and verified the active process remains run_demo.py build-index on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests while hybrid eval.json and report.json remain absent Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion
cnb.bofCdSsphPA authored -
Constraint: The cap64 run is still in progress, so this checkpoint can only record verified hybrid stage transitions, not final comparisons Rejected: Wait for hybrid eval before checkpointing | Would lose the verified evidence that hybrid training finished and indexing has already started Confidence: high Scope-risk: narrow Directive: Keep cap64 branch checkpoints symmetric so high_energy and hybrid can be compared later without re-reading process history Tested: Verified active process is run_demo.py build-index on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests; verified /tmp/ab_smoke_seg_cap64_top2/hybrid/fma_models_smoke/best_model.pt exists; verified hybrid eval.json and report.json are still absent Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion
cnb.bofCdSsphPA authored -
Constraint: The cap64 run is still incomplete, so only branch-transition evidence can be recorded safely at this point Rejected: Wait for the hybrid eval before checkpointing | Would lose the verified handoff that execution has moved beyond high_energy into hybrid training Confidence: high Scope-risk: narrow Directive: Keep cap64 branch progression explicit so the next session can resume from the current strategy leg without re-inspection Tested: Verified high_energy eval.json reports num_queries=32, top1=0.625, topk=1.0; verified active processes show external_adapters.py on /tmp/ab_smoke_seg_cap64_top2/hybrid and train.py on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests; verified hybrid eval.json and report.json are still absent Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion
cnb.bofCdSsphPA authored -
Constraint: The cap64 run has only produced the high_energy leg so far, so any larger conclusion must wait for hybrid and the final report Rejected: Wait for report.json before checkpointing | Would lose the verified cap64 high_energy score and the proof that execution has already switched into the hybrid branch Confidence: high Scope-risk: narrow Directive: Do not compare cap64 strategy winners until both legs and the final report land; treat the current 0.625 high_energy score as an intermediate checkpoint only Tested: Verified high_energy eval.json reports num_queries=32, top1=0.625, topk=1.0; verified progress.json records the same result; verified the active process has switched to the hybrid smoke-local branch and report.json is still absent Not-tested: Final cap64 hybrid metrics, final report.json, and any cap64-based strategy conclusion
cnb.bofCdSsphPA authored -
Constraint: The cap64 run is still active, so this checkpoint can only record stage completion evidence rather than final benchmark conclusions Rejected: Wait for eval.json or report.json before committing | Would lose the verified handoff that indexing finished and evaluate.py is now running Confidence: high Scope-risk: narrow Directive: Keep stage checkpoints explicit—training complete, index complete, evaluation running, report complete—until cap64 fully settles Tested: Verified reference_progress.json shows 64 refs, 657 windows, and complete status; verified active process is evaluate.py on /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests; verified high_energy eval.json and report.json are still absent Not-tested: Final cap64 high_energy metrics, hybrid branch execution, and post-cap64 strategy guidance
cnb.bofCdSsphPA authored -
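The stage milestones named in the directive above (training complete, index complete, evaluation running, report complete) could be inferred from which artifacts exist. The sketch below is an assumption-laden illustration. It uses a flat directory layout and a hypothetical `detect_stage` helper, whereas the real run nests these files under per-strategy subdirectories. It also checks file presence only, so it does not confirm the complete status recorded inside reference_progress.json.

```python
import tempfile
from pathlib import Path

# Ordered from latest to earliest milestone; the flat layout is an assumption.
MILESTONES = [
    ("report complete", "report.json"),
    ("evaluation complete", "eval.json"),
    ("index complete", "reference_progress.json"),
    ("training complete", "best_model.pt"),
]


def detect_stage(work_dir):
    """Return the latest milestone whose marker file exists in work_dir."""
    work_dir = Path(work_dir)
    for stage, marker in MILESTONES:
        if (work_dir / marker).exists():
            return stage
    return "no milestone yet"


# Demo in a throwaway directory: the model and index markers exist, eval does not.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "best_model.pt").touch()
    Path(tmp, "reference_progress.json").touch()
    demo_stage = detect_stage(tmp)

print(demo_stage)
```

A result of "index complete" with no eval.json present corresponds to the "evaluation running" state in the checkpoint, provided the evaluate.py process is confirmed active separately.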
Constraint: The cap64 run is still active, so only verified training-complete evidence can be recorded without overstating results Rejected: Keep only the older build-index note | The live session now proves the entire high_energy epoch finished, which is stronger handoff evidence Confidence: high Scope-risk: narrow Directive: Distinguish clearly between training-complete, indexing-complete, and report-complete milestones in future cap64 checkpoints Tested: Verified live session output shows high_energy Epoch 1 progressed from 0/32 to 32/32, and verified the active process remains run_demo.py build-index on /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests Not-tested: Final cap64 eval metrics, hybrid branch progress, and report.json generation
cnb.bofCdSsphPA authored -
Constraint: The cap64 benchmark is still running, so only verified stage-transition evidence can be documented safely Rejected: Wait for cap64 completion before checkpointing | Would leave the next session without proof that the run advanced from training into build-index Confidence: high Scope-risk: narrow Directive: Keep recording cap64 milestones as they happen, but avoid updating winner guidance until report.json lands Tested: Verified cap64 processes are active, confirmed the high_energy branch advanced from train.py to run_demo.py build-index on /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests, and confirmed report.json is still absent Not-tested: Final cap64 scores, hybrid branch progression, and any post-cap64 strategy conclusion
cnb.bofCdSsphPA authored -
Constraint: The new cap64 run is still in-flight, so only startup and stage-transition evidence can be documented safely Rejected: Wait for cap64 results before checkpointing | Would leave the next session without a verified handoff that the larger benchmark is already running Confidence: high Scope-risk: narrow Directive: Keep cap64 artifacts out of git and update strategy guidance only after report.json lands Tested: Verified the cap64 ab_smoke process is running, confirmed the high_energy smoke-local branch entered train.py on /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests, and recorded the active work root and parameters in docs Not-tested: Final cap64 metrics, hybrid branch execution, and any post-cap64 strategy conclusion
cnb.bofCdSsphPA authored -
Constraint: Strategy guidance had to wait until the full seed=999 report landed and all three cap48 runs could be aggregated consistently Rejected: Keep treating cap48 as unresolved | The third seed now confirms high_energy repeats the same score while hybrid remains volatile Confidence: high Scope-risk: narrow Directive: Treat high_energy as the cap48 default only within the documented FMA smoke condition until larger cap64 and bucketed benchmarks either confirm or overturn it Tested: Verified seed=999 report.json, high_energy eval.json, hybrid eval.json, and computed three-seed aggregate showing high_energy mean_top1=0.9167 with zero variance versus hybrid mean_top1=0.8750 Not-tested: cap64-or-larger benchmarks, bucket/style-aware evaluations, and any future hybrid redesign
cnb.bofCdSsphPA authored -
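The three-seed cap48 aggregate quoted above can be re-derived from the per-run top1 scores recorded in this log (the original cap48 run, seed123, and seed999). The sketch below is one way to do that arithmetic. The `summarize` helper is hypothetical, and population standard deviation is an assumption, since the log does not say which estimator was used.

```python
from statistics import mean, pstdev

# top1 per run, in order: original cap48 run, seed123, seed999.
top1_by_strategy = {
    "high_energy": [0.9167, 0.9167, 0.9167],
    "hybrid": [0.7917, 0.9583, 0.8750],
}


def summarize(scores):
    """Return mean/min/max/stdev for a list of per-seed top1 scores."""
    return {
        "mean": round(mean(scores), 4),
        "min": min(scores),
        "max": max(scores),
        "stdev": round(pstdev(scores), 4),  # population stdev (assumption)
    }


summary = {name: summarize(scores) for name, scores in top1_by_strategy.items()}
for name, stats in summary.items():
    print(name, stats)
```

The high_energy scores are identical across the three runs, so their spread is zero, while the hybrid scores range from 0.7917 to 0.9583 around a mean of 0.8750.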
Constraint: The cap48 seed=999 run has only completed the hybrid leg, so the three-seed aggregate is still incomplete Rejected: Wait for high_energy to finish before checkpointing | Would risk losing the verified hybrid seed999 score from the active Ralph session Confidence: high Scope-risk: narrow Directive: Keep recording verified partial benchmark milestones, but do not revise default-strategy guidance until both strategies and the final report are available Tested: Verified hybrid eval.json reports num_queries=24, top1=0.875, topk=1.0; verified progress.json records the same result; verified high_energy is still running and report.json is still absent Not-tested: Final high_energy seed999 metrics, final report.json, and updated three-seed aggregate
cnb.bofCdSsphPA authored -
Constraint: The running cap48 seed=999 benchmark has not emitted its final report yet, so only in-flight evidence can be recorded safely Rejected: Claim a new three-seed conclusion now | The aggregate would be speculative without report.json and eval outputs Confidence: high Scope-risk: narrow Directive: When a long benchmark is still active, checkpoint stage evidence explicitly and wait for report.json before changing strategy guidance Tested: Verified process tree shows hybrid moved from build-index to evaluate.py; verified reference_progress.json reports 48 refs, 491 windows, 192-d embeddings, and complete status; verified report.json is still absent Not-tested: Final hybrid eval metrics, subsequent high_energy run, and final three-seed aggregate
cnb.bofCdSsphPA authored -
Constraint: The cap48 seed=999 benchmark is still running, so this checkpoint must avoid unverified algorithm conclusions Rejected: Wait for the CPU benchmark to finish | Would delay handoff and leave the next session without a clean restart package Confidence: high Scope-risk: narrow Directive: Keep future doc-only checkpoints surgically staged and do not add data/raw, external_smoke, /tmp outputs, or model artifacts Tested: Verified staged diff only includes AGENT memory, handoff, changelog, and changelist docs; confirmed /tmp cap48 seed=999 report is not ready yet Not-tested: The in-flight cap48 seed=999 benchmark result and any follow-up aggregate metrics
cnb.bofCdSsphPA authored -
Persist the current two-seed cap48 summary so the strategy recommendation is grounded in aggregated evidence rather than whichever single run happened most recently. Constraint: Only documentation changes are allowed because benchmark artifacts remain outside version control Rejected: Keep narrating cap48 one run at a time | The aggregate is now more informative than any individual cap48 run Confidence: high Scope-risk: narrow Directive: Prefer reporting aggregate seed statistics once two or more runs exist; avoid re-elevating single-seed claims above the aggregate Tested: Verified both cap48 report.json files; computed aggregate mean/min/max/stdev; verified docs now record high_energy mean_top1=0.9167 and hybrid mean_top1=0.8750 Not-tested: Aggregates beyond two seeds or style-bucketed aggregates
cnb.bofCdSsphPA authored -
Persist the completed seed123 benchmark showing hybrid ahead again, and update the strategy guidance from single-run winner claims to a multi-seed interpretation. Constraint: Only documentation changes are allowed because benchmark outputs remain outside version control Rejected: Keep framing cap48 as a stable high_energy win | The second seed materially weakens that interpretation Confidence: high Scope-risk: narrow Directive: Base the hybrid vs high_energy default decision on aggregated multi-seed evidence, not any single cap48 run Tested: Verified /tmp/ab_smoke_seg_cap48_top2_seed123/report.json; verified high_energy eval.json; verified docs now record hybrid=24/0.9583/1.0 and high_energy=24/0.9167/1.0 for seed123 Not-tested: Formal aggregation across multiple seeds beyond these two cap48 runs
cnb.bofCdSsphPA authored -
Persist the newly finished cap48 seed123 hybrid result so the second-seed validation run now has measured evidence instead of only a runtime checkpoint. Constraint: seed123 high_energy and the final report are still pending Rejected: Wait for the full seed123 report before updating docs | Would leave the multi-seed evidence stale across sessions Confidence: high Scope-risk: narrow Directive: Replace the seed123 partial section with the final two-strategy ranking once high_energy eval and report.json land Tested: Verified /tmp/ab_smoke_seg_cap48_top2_seed123/hybrid/fma_reports_smoke/eval.json; verified docs record hybrid=24/0.9583/1.0 and high_energy still in build-index Not-tested: Final seed123 comparison because high_energy has not finished yet
cnb.bofCdSsphPA authored -
Update the handoff and changelog with the newer seed123 runtime milestone so later sessions know the hybrid lane has advanced from build-index into capped evaluation. Constraint: No measured seed123 score is available yet, only a later execution milestone Rejected: Leave the older build-index note in place | Would make the restart handoff stale and less actionable Confidence: high Scope-risk: narrow Directive: Replace the seed123 runtime note with measured scores as soon as hybrid eval.json or report.json lands Tested: Verified active seed123 hybrid evaluate.py process; verified docs now record seed123 current phase as evaluate.py --max-queries 24 Not-tested: Seed123 strategy scores because hybrid eval.json has not landed yet
cnb.bofCdSsphPA authored -
Preserve the second-seed cap48 entry point and current build-index phase so later sessions can validate whether the cap48 reversal was stable or a seed artifact. Constraint: The second-seed run has not produced scores yet, so only execution-state evidence is available Rejected: Wait for the seed123 scores before recording anything | Risks losing the multi-seed validation checkpoint if the session ends first Confidence: high Scope-risk: narrow Directive: Replace the seed123 running-state section with measured scores once hybrid eval.json or report.json lands Tested: Verified active cap48 seed123 processes; verified handoff records work-root, seed, subset size, query cap, and current build-index phase Not-tested: cap48 seed123 strategy scores because the run is still in progress
cnb.bofCdSsphPA authored -
Persist the larger 48-track benchmark where high_energy overtook hybrid, and downgrade the previously overconfident default-strategy claim to a conditional recommendation pending broader validation. Constraint: Only documentation changes are allowed because benchmark outputs remain outside version control Rejected: Keep asserting hybrid as fully settled default after cap48 | The 48-track capped benchmark materially contradicts that stronger claim Confidence: high Scope-risk: narrow Directive: Resolve the hybrid vs high_energy default question with larger, multi-seed, style-aware benchmarks before making a final hard default claim Tested: Verified /tmp/ab_smoke_seg_cap48_top2/report.json; verified high_energy eval.json; verified docs now record high_energy=24/0.9167/1.0 and hybrid=24/0.7917/1.0 Not-tested: Multi-seed or style-balanced follow-up benchmark beyond the single cap48 run
cnb.bofCdSsphPA authored -
Update the handoff and changelog with the newer cap48 runtime milestone so later sessions know the high_energy lane has advanced from build-index into capped evaluation. Constraint: No measured cap48 high_energy score is available yet, only a later execution milestone Rejected: Leave the older build-index note in place | Would make the restart handoff stale and less actionable Confidence: high Scope-risk: narrow Directive: Replace the cap48 runtime note with final top-two scores as soon as high_energy eval.json or report.json lands Tested: Verified active cap48 high_energy evaluate.py process; verified docs now record high_energy current phase as evaluate.py --max-queries 24 Not-tested: Final cap48 comparison because high_energy eval.json has not landed yet
cnb.bofCdSsphPA authored -
Persist the newly finished cap48 hybrid result so the next session can continue the 48-track validation run from measured evidence instead of only a runtime checkpoint. Constraint: cap48 high_energy and the final report are still pending Rejected: Wait for the full cap48 report before updating docs | Would leave the largest current real-data checkpoint stale across sessions Confidence: high Scope-risk: narrow Directive: Replace the cap48 partial section with the final two-strategy ranking once high_energy eval and report.json land Tested: Verified /tmp/ab_smoke_seg_cap48_top2/hybrid/fma_reports_smoke/eval.json; verified docs record hybrid=24/0.7917/1.0 and high_energy still in build-index Not-tested: Final cap48 comparison because high_energy has not finished yet
cnb.bofCdSsphPA authored -
Update the handoff and changelog with the newer cap48 runtime milestone so later sessions know the run has advanced from build-index into capped evaluation. Constraint: No measured cap48 score is available yet, only a later execution milestone Rejected: Leave the older build-index note in place | Would make the restart handoff stale and less actionable Confidence: high Scope-risk: narrow Directive: Replace the cap48 runtime note with hybrid scores as soon as eval.json lands Tested: Verified active cap48 evaluate.py process; verified docs now record cap48 current phase as evaluate.py --max-queries 24 Not-tested: cap48 strategy scores because hybrid eval.json has not landed yet
cnb.bofCdSsphPA authored -
Preserve the new 48-track top-two benchmark entry point and current build-index phase so later sessions can continue the expanding validation ladder without rediscovering runtime state. Constraint: cap48 has not produced scores yet, so only execution-state evidence is available Rejected: Wait for cap48 scores before recording anything | Risks losing the larger-benchmark checkpoint if the session ends first Confidence: high Scope-risk: narrow Directive: Replace the cap48 running-state section with measured scores once hybrid eval.json or report.json lands Tested: Verified active cap48 processes; verified handoff records work-root, subset size, query cap, and current build-index phase Not-tested: cap48 strategy scores because the run is still in progress
cnb.bofCdSsphPA authored -
Persist the larger 32-track benchmark showing hybrid strongly outperforming high_energy, so the default strategy decision rests on multiple larger real-data checkpoints instead of a single subset. Constraint: Only documentation changes are allowed because benchmark artifacts stay outside version control Rejected: Keep the default recommendation tentative after cap32 | The 24-track and 32-track capped benchmarks now agree on hybrid superiority Confidence: high Scope-risk: narrow Directive: Use cap24 and cap32 together as the current strongest strategy evidence until a broader multi-style benchmark supersedes them Tested: Verified /tmp/ab_smoke_seg_cap32_top2/report.json; verified high_energy eval.json; verified docs now record hybrid=20/0.95/1.0 and high_energy=20/0.5/1.0 Not-tested: Wider style-balanced benchmark beyond the FMA top-two subsets
cnb.bofCdSsphPA authored -
Persist the newly finished cap32 hybrid result so the next session can continue the top-two validation run from measured evidence instead of only a running-state checkpoint. Constraint: cap32 high_energy and the final report are still pending Rejected: Wait for the full cap32 report before updating docs | Would leave the larger-subset evidence stale across sessions Confidence: high Scope-risk: narrow Directive: Replace the cap32 partial section with the final two-strategy ranking once high_energy eval and report.json land Tested: Verified /tmp/ab_smoke_seg_cap32_top2/hybrid/fma_reports_smoke/eval.json; verified docs record hybrid=20/0.95/1.0 and high_energy still training Not-tested: Final cap32 comparison because high_energy has not finished yet
cnb.bofCdSsphPA authored -
Preserve the new 32-track top-two benchmark entry point and current build-index phase so a later session can continue the stronger validation run without losing runtime context. Constraint: The cap32 benchmark is still running, so only execution-state evidence is available Rejected: Wait for cap32 results before recording anything | Risks losing the larger-benchmark checkpoint if the session ends first Confidence: high Scope-risk: narrow Directive: Replace the cap32 running-state section with measured scores once hybrid eval.json and report.json land Tested: Verified active cap32 processes; verified handoff records work-root, subset size, query cap, and current build-index phase Not-tested: cap32 strategy scores because the run is still in progress
cnb.bofCdSsphPA authored -
Persist the larger real-FMA benchmark result showing hybrid clearly outperforming high_energy, so the project recommendation can converge on one default instead of an unresolved tie. Constraint: Only docs change because benchmark outputs remain outside version control Rejected: Keep treating hybrid and high_energy as co-equal defaults | The larger 24-track capped benchmark now separates them clearly Confidence: high Scope-risk: narrow Directive: Use cap24 top-two as the current strongest public evidence until a larger capped benchmark supersedes it Tested: Verified /tmp/ab_smoke_seg_cap24_top2/report.json; verified high_energy eval.json; verified docs now state hybrid=16/1.0/1.0 and high_energy=16/0.8125/1.0 Not-tested: Broader strategy comparison beyond hybrid vs high_energy on the 24-track subset
cnb.bofCdSsphPA authored -
Record the new 24-track capped benchmark setup and the first completed hybrid result so the next session can continue the stronger tie-break experiment without rediscovering runtime state. Constraint: The cap24 benchmark is still in progress, so only partial evidence can be documented now Rejected: Wait for high_energy to finish before updating handoff | Risks losing the fresh larger-subset evidence if the session ends first Confidence: high Scope-risk: narrow Directive: Replace the partial cap24 section with the final two-strategy ranking once report.json lands Tested: Verified /tmp/ab_smoke_seg_cap24_top2/hybrid/fma_reports_smoke/eval.json; verified active cap24 processes; verified docs include the exact work-root and resume command Not-tested: Final cap24 top-two comparison because high_energy is still training
cnb.bofCdSsphPA authored -
Persist the completed capped real-data benchmark results so future sessions can use the final strategy ordering and recommendation without replaying the run. Constraint: Only documentation should change because benchmark artifacts live outside version control Rejected: Leave the result only in /tmp report files | Would make the evidence fragile across sessions Confidence: high Scope-risk: narrow Directive: Use cap16 as the current default evidence point until a larger capped benchmark supersedes it Tested: Verified /tmp/ab_smoke_seg_cap16/report.json; verified repeated_section_aware eval.json; verified docs reflect final ranking hybrid/high_energy/beat_aware/repeated_section_aware Not-tested: Larger real-dataset benchmark beyond the 16-track capped subset
cnb.bofCdSsphPA authored