Commits · 808e028df4248d7ed926a94f2c90c9cb244d8bd5 · wanghai-tech / hikoon-ACR

02 Jun, 2026 40 commits

Preserve continued build-index evidence for restart continuity · 808e028d

Capture the next downstream checkpoint so restart docs reflect that the real FMA smoke remains in build-index, with no evaluate stage yet and no emitted index artifacts so far.

Constraint: Final downstream evidence is still unavailable because build-index has not produced artifacts or switched to evaluate
Rejected: Wait silently for evaluate to start | would leave the handoff missing the latest verified downstream state
Confidence: high
Scope-risk: narrow
Directive: Next verify the first index artifact file or the transition into evaluate before changing the delivery summary again
Tested: process scan showing build-index and no evaluate; presence of /tmp/fma_real_smoke_stopcheck/fma_index_smoke directory; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests
Not-tested: Completed build-index output, evaluate, final metrics/report generation

authored 2026-06-02 20:40:36 +0800
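
The commit bodies in this log share a fixed trailer layout (Constraint, Rejected, Confidence, Scope-risk, Directive, Tested, Not-tested), with Rejected written as `option | reason`. A minimal sketch of how a restart tool might read that layout follows. The helper name `parse_trailers` and the shape of the returned dictionary are illustrative assumptions, not something the repository is known to contain.

```python
from typing import Dict

# Trailer keys as they appear in the commit bodies of this log.
TRAILER_KEYS = (
    "Constraint", "Rejected", "Confidence", "Scope-risk",
    "Directive", "Tested", "Not-tested",
)


def parse_trailers(message: str) -> Dict[str, object]:
    """Split a commit body into 'Key: value' trailers (hypothetical helper)."""
    trailers: Dict[str, object] = {}
    for line in message.splitlines():
        key, sep, value = line.partition(": ")
        if not sep or key not in TRAILER_KEYS:
            continue  # summary prose or blank line, not a trailer
        value = value.strip()
        if key == "Rejected":
            # 'Rejected' carries "option | reason".
            option, _, reason = value.partition(" | ")
            trailers[key] = {"option": option.strip(), "reason": reason.strip()}
        else:
            trailers[key] = value
    return trailers


example = """Constraint: Final evaluation evidence is still unavailable
Rejected: Wait silently for evaluate to start | would lose a useful checkpoint
Confidence: high
Scope-risk: narrow"""

parsed = parse_trailers(example)
print(parsed["Confidence"])
print(parsed["Rejected"]["option"])
```

Keeping Rejected as a two-part record preserves the reasoning behind each discarded alternative, which is the part a restarted session is most likely to need.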

Record ongoing build-index state for real FMA smoke · 3f9f1ac1

Update the handoff package with the next downstream checkpoint so a restarted session knows training is done, build-index is active, and evaluate has not started yet.

Constraint: Final evaluation evidence is still unavailable because build-index has not completed
Rejected: Wait silently for evaluate to start | would lose a useful downstream checkpoint for restart continuity
Confidence: high
Scope-risk: narrow
Directive: Next capture either the first index artifact file or the transition into evaluate
Tested: process scan showing build-index and no evaluate; presence of /tmp/fma_real_smoke_stopcheck/fma_index_smoke directory; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests
Not-tested: Completed build-index output, evaluate, final metrics/report generation

authored 2026-06-02 20:37:40 +0800

Capture real FMA smoke transition into index building · 74374625

Record the first decisive runtime milestone so restart docs show that the real FMA smoke has finished training, produced a model, and moved into build-index.

Constraint: Final evaluation metrics are not available yet because the smoke is still running downstream of training
Rejected: Keep describing the run as training-only | would now be materially inaccurate
Confidence: high
Scope-risk: narrow
Directive: Next verify the transition from build-index into evaluate and then capture the final report artifacts
Tested: process scan showing build-index; absence of train.py PID 311629; presence of best_model.pt and song_to_idx.json; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests
Not-tested: Completed build-index, evaluate, and final metrics/report generation

authored 2026-06-02 20:34:44 +0800
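
The stage checks recorded above (process scan, presence of best_model.pt, presence of index artifacts) can be combined into one classification step. The sketch below is a rough illustration under stated assumptions: the function `classify_stage`, the stage labels, the sample process line, and the artifact name `index.bin` are all hypothetical, and the process listing is passed in as plain strings rather than collected live.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def classify_stage(process_lines: Iterable[str], model_dir: Path, index_dir: Path) -> str:
    """Infer the pipeline stage from a process listing and on-disk artifacts."""
    lines = list(process_lines)
    if any("evaluate" in line for line in lines):
        return "evaluate"
    if any("build-index" in line for line in lines):
        # Separate "running, nothing written yet" from "first artifact emitted".
        has_files = index_dir.is_dir() and any(p.is_file() for p in index_dir.rglob("*"))
        return "build-index (artifacts present)" if has_files else "build-index (no artifacts yet)"
    if any("train.py" in line for line in lines):
        return "training"
    if (model_dir / "best_model.pt").is_file():
        return "idle after training"
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    models, index = root / "fma_models_smoke", root / "fma_index_smoke"
    models.mkdir()
    index.mkdir()
    (models / "best_model.pt").write_bytes(b"")
    scan = ["4242 python run_demo.py build-index"]  # hypothetical process line
    stage_before = classify_stage(scan, models, index)
    (index / "index.bin").write_bytes(b"")  # hypothetical artifact name
    stage_after = classify_stage(scan, models, index)

print(stage_before, "->", stage_after)
```

In practice the process lines could come from a `ps` listing, and the two directories would be the smoke workspace paths named in the Tested trailers.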

Capture longest-window FMA smoke progress evidence · 9ded4a56

Persist a wider observation checkpoint so restart docs show continued forward motion across a 180-second window while the real FMA smoke remains inside Epoch 1.

Constraint: Verification is still limited to runtime evidence and manifest revalidation because Epoch 1 has not completed
Rejected: Stop at the 120-second checkpoint | would miss stronger evidence from the longer observation window
Confidence: high
Scope-risk: narrow
Directive: Keep monitoring until the first saved model file or transition into build-index/evaluate appears
Tested: ps on PID 311629 after 180s wait; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics

authored 2026-06-02 20:29:46 +0800

Capture longer-window FMA smoke progress evidence · 6d60b587

Persist a wider observation checkpoint so restart docs demonstrate continued forward motion across a longer interval while the real FMA smoke remains inside Epoch 1.

Constraint: Verification is still limited to runtime evidence and manifest revalidation because Epoch 1 has not completed
Rejected: Stop at the 30-second checkpoint | would miss stronger evidence from a longer observation window
Confidence: high
Scope-risk: narrow
Directive: Keep monitoring until the first saved model file or transition into build-index/evaluate appears
Tested: ps on PID 311629 after 120s wait; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics

authored 2026-06-02 20:25:51 +0800

Record wider-window FMA smoke progress evidence · 56bfd71a

Capture a more meaningful follow-up checkpoint after an added wait window so the restart docs show continued forward motion rather than trivial second-to-second sampling.

Constraint: Epoch 1 still has not completed, so verification is limited to runtime evidence and manifest revalidation
Rejected: Skip the wider-window checkpoint | would miss the chance to prove progress across a longer observation gap
Confidence: high
Scope-risk: narrow
Directive: Keep watching for the first saved model file or transition into build-index/evaluate before changing the project status summary
Tested: ps on PID 311629 after 30s wait; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics

authored 2026-06-02 20:22:23 +0800

Refresh handoff with latest live FMA smoke checkpoint · 0513c36a

Persist a newer runtime checkpoint so restart docs continue to prove that the real FMA smoke is still progressing inside Epoch 1 without yet saving a model or entering downstream stages.

Constraint: Verification is still limited to live runtime evidence because Epoch 1 has not completed
Rejected: Reuse the prior 22:10 checkpoint | would leave handoff docs behind the latest verified state
Confidence: high
Scope-risk: narrow
Directive: Keep monitoring until the first saved model file or stage transition appears
Tested: ps on PID 311629; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics

authored 2026-06-02 20:20:47 +0800

Update handoff with newer real FMA smoke runtime evidence · 11a17e9b

Record a later live checkpoint so restart docs keep proving that the real FMA smoke is still advancing inside Epoch 1 without yet producing a saved model or entering downstream stages.

Constraint: Verification is still limited to live runtime evidence because Epoch 1 has not completed
Rejected: Reuse the prior 20:08 checkpoint | would leave handoff docs behind the latest verified state
Confidence: high
Scope-risk: narrow
Directive: Keep monitoring until the first saved model file or stage transition appears
Tested: ps on PID 311629; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics

authored 2026-06-02 20:20:00 +0800

Refresh handoff with later real FMA smoke progress · cc9c0690

Capture a newer live checkpoint so restart docs continue to prove the real FMA smoke is progressing inside Epoch 1 without yet reaching model save or downstream evaluation stages.

Constraint: Verification remains limited to live runtime state because the first epoch has not completed
Rejected: Stop at the prior 19:12 checkpoint | would leave the handoff behind the latest verified state
Confidence: high
Scope-risk: narrow
Directive: Keep monitoring until the first saved model file or stage transition appears
Tested: ps on PID 311629; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics

authored 2026-06-02 20:18:42 +0800

Extend live FMA smoke handoff with later epoch evidence · 2a6e8e15

Preserve a newer restart checkpoint so the next session inherits up-to-date proof that the real FMA smoke continues progressing inside Epoch 1 without yet saving a model or entering downstream stages.

Constraint: Verification is still limited to live runtime evidence because Epoch 1 has not completed
Rejected: Keep the prior 18:22 checkpoint only | would leave the handoff one monitoring cycle behind reality
Confidence: high
Scope-risk: narrow
Directive: Continue monitoring until the first saved model file or stage transition appears before changing status conclusions
Tested: ps on PID 311629; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics

authored 2026-06-02 20:17:14 +0800

Capture another live FMA smoke progress checkpoint · ba49a6ae

Keep the restart artifacts synchronized with the newest observed elapsed time so the next session can see that the real FMA smoke is still advancing without yet reaching model save or evaluation stages.

Constraint: Training remains inside Epoch 1, so verification is limited to live runtime evidence
Rejected: Stop at the prior 17:07 checkpoint | would leave handoff docs behind the latest verified state
Confidence: high
Scope-risk: narrow
Directive: Continue monitoring until the first saved model file or stage transition appears
Tested: ps on PID 311629; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch artifacts, build-index, evaluate, final metrics

authored 2026-06-02 20:16:20 +0800
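
The checkpoints cited in these commits (17:07, 18:22, and so on) read like `ps` time values, which use the `[[DD-]hh:]mm:ss` layout. Assuming that is what they are, a small sketch of how two samples could be compared to show forward motion is given below; `parse_ps_time` and `advanced_by` are hypothetical helper names.

```python
def parse_ps_time(text: str) -> int:
    """Convert a ps-style '[[DD-]hh:]mm:ss' value into seconds."""
    text = text.strip()
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def advanced_by(previous: str, current: str) -> int:
    """Seconds of progress between two samples (<= 0 suggests a stall or restart)."""
    return parse_ps_time(current) - parse_ps_time(previous)


print(advanced_by("17:07", "18:22"))  # 75
```

A positive difference only shows that the process kept running between samples; it says nothing about training quality, which matches the Not-tested trailers above.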

Advance handoff evidence with continued epoch progress · fc9e3bce

Keep the restart package aligned with the newest observed runtime so the next session inherits proof that the real FMA smoke continues moving forward inside Epoch 1.

Constraint: The only new evidence available was live process progress because training has not finished the epoch
Rejected: Reuse the 15:12 checkpoint | would leave handoff evidence stale by another monitoring cycle
Confidence: high
Scope-risk: narrow
Directive: Keep watching for first model output or stage transition before changing any roadmap conclusions
Tested: ps on PID 311629; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: Final checkpoint write, build-index, evaluate, and report generation

authored 2026-06-02 20:15:03 +0800

Refresh live smoke evidence with newer epoch progress · dc269a8f

Advance the handoff timestamp so a restarted session inherits the latest proof that the real FMA smoke is still progressing inside Epoch 1 rather than stalling before model output.

Constraint: Only live process evidence was available because the first epoch still has not finished
Rejected: Skip another checkpoint update | would leave restart docs one verification step behind reality
Confidence: high
Scope-risk: narrow
Directive: Wait for the first saved model file or stage transition before making any accuracy claims
Tested: ps on PID 311629; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: End-of-epoch checkpoint creation and downstream evaluate output

authored 2026-06-02 20:13:35 +0800

Record fresh FMA smoke verification before epoch completion · 47390fe9

Update the handoff package with newer runtime evidence so the next session can distinguish a still-progressing epoch from a hung pipeline while waiting for the first saved model file.

Constraint: Verification had to rely on live process state because Epoch 1 has not completed yet
Rejected: Leave the prior checkpoint as-is | would force the next session to re-check whether progress continued
Confidence: high
Scope-risk: narrow
Directive: Continue checking for the first transition into saved model output, build-index, or evaluate before drawing quality conclusions
Tested: ps on PID 311629; process scan for smoke-local/build-index/evaluate; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: Final FMA smoke report and accuracy metrics

authored 2026-06-02 20:12:18 +0800

Preserve live FMA smoke state for fast session restart · 60e0f9e3

Capture the current real-FMA CPU smoke checkpoint, restart path, and delivery handoff so the next session can resume without re-diagnosing an expected long-running training stage.

Constraint: Real FMA smoke is still running on CPU with no GPU available
Rejected: Wait for final smoke completion before documenting | would delay a usable handoff artifact
Confidence: high
Scope-risk: narrow
Directive: Keep staging explicit; do not include datasets, smoke outputs, checkpoints, or caches
Tested: git diff review; live process check; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests
Not-tested: Final FMA smoke metrics after Epoch 1 completion

authored 2026-06-02 20:10:59 +0800

Capture real FMA smoke execution evidence for restart handoff · fd574b22

Constraint: This checkpoint records running-smoke evidence only and must not stage data, model artifacts, or tmp outputs
Rejected: Wait for the full real FMA smoke to finish before updating handoff docs | The running-state evidence is already valuable for the next session and should not be lost
Confidence: high
Scope-risk: narrow
Directive: Keep future restart notes aligned with the live smoke status and continue using explicit file staging
Tested: Re-verified real FMA smoke is running on CPU, manifests validate, and the documented no-GPU condition explains the long training phase
Not-tested: Did not wait for Epoch 1 completion, model checkpoint emission, or downstream build-index/evaluate completion

authored 2026-06-02 20:07:21 +0800

Reduce restart noise further by ignoring common untracked Python cache artifacts · b90754c6

Constraint: Limit this checkpoint to ignore rules and handoff notes; do not change tracked artifact history
Rejected: Expand ignore coverage to all noisy data trees immediately | This pass only suppresses well-understood untracked cache noise
Confidence: high
Scope-risk: narrow
Directive: Keep ignore changes incremental and distinguish between untracked cache noise and already-tracked historical artifacts
Tested: Confirmed untracked __pycache__ and .pyc noise disappeared from git status after the ignore update
Not-tested: Did not rewrite tracking state for already-versioned cache or data artifacts

authored 2026-06-02 19:12:39 +0800

Reduce restart noise by ignoring known local smoke artifacts · 0184cb37

Constraint: Limit this checkpoint to ignore rules and handoff notes; do not alter dataset contents
Rejected: Ignore broad data or cache trees immediately | This pass only suppresses confirmed local-generated noise with low risk
Confidence: high
Scope-risk: narrow
Directive: Keep adding ignore rules incrementally and only for artifacts proven to be local/generated noise
Tested: Confirmed the targeted .omx wait files and real-smoke CSV no longer appear in git status after the ignore update
Not-tested: Did not broaden ignore coverage to larger data/cache trees in this checkpoint

authored 2026-06-02 19:11:52 +0800

Unify the first runnable command across all primary restart entrypoints · db60ba0f

Constraint: Limit this checkpoint to handoff and changelog documentation consistency
Rejected: Leave small formatting mismatches across entrypoints | Restart guidance should be byte-level easy to copy and compare
Confidence: high
Scope-risk: narrow
Directive: Keep the first verification command identical across AGENT, README, session handoff, and delivery handoff
Tested: Verified all four primary entrypoints contain the same runnable command block
Not-tested: No code or training path executed in this consistency-only checkpoint

authored 2026-06-02 19:10:21 +0800

Keep every primary handoff entry aligned on the first runnable verification command · 8659ce9e

Constraint: Restrict this checkpoint to handoff documentation consistency only
Rejected: Leave delivery handoff behind the newer restart guidance | All primary restart entrypoints should expose the same first verification command
Confidence: high
Scope-risk: narrow
Directive: Keep the first runnable command identical across AGENT, README, session handoff, and delivery handoff
Tested: Rechecked relative links in the updated delivery handoff and changelog docs
Not-tested: No code or training path executed in this handoff-consistency checkpoint

authored 2026-06-02 19:08:46 +0800

Synchronize the shortest runnable restart command into agent memory · cfdd1765

Constraint: Limit this checkpoint to memory and handoff documentation only
Rejected: Keep the restart command only in README | New sessions should see the first verification command directly in AGENT memory too
Confidence: high
Scope-risk: narrow
Directive: Keep AGENT memory focused on restart-critical commands and avoid duplicating full workflow specs there
Tested: Rechecked 174 relative links across AGENT, changelog, and handoff docs
Not-tested: No code or training path executed in this memory-only checkpoint

authored 2026-06-02 19:08:08 +0800
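
Several Tested trailers in this log report rechecking relative links across the handoff docs. The log does not show how that check is run, so the following is only a sketch under the assumption that the docs are Markdown files using `[text](target)` links; `broken_relative_links` is a hypothetical helper, and the sample files are created in a temporary directory.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(doc_paths: List[Path]) -> Tuple[int, List[Tuple[Path, str]]]:
    """Return (number of relative links checked, list of (doc, target) that do not resolve)."""
    checked = 0
    broken: List[Tuple[Path, str]] = []
    for doc in doc_paths:
        for target in LINK_RE.findall(doc.read_text(encoding="utf-8")):
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue  # absolute URL, mailto:, or in-page anchor
            checked += 1
            path_part = target.split("#", 1)[0]
            if not (doc.parent / path_part).exists():
                broken.append((doc, target))
    return checked, broken


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "a.md").write_text("# A\n", encoding="utf-8")
    readme = root / "README.md"
    readme.write_text(
        "[ok](docs/a.md) [bad](docs/missing.md) "
        "[web](https://example.com) [anchor](#top)\n",
        encoding="utf-8",
    )
    checked, broken = broken_relative_links([readme])
    broken_targets = [target for _, target in broken]

print(checked, broken_targets)
```

Reporting the count of links checked alongside the failures mirrors how the trailers quote a link total for each checkpoint.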

Add the shortest runnable restart command to the docs overview and reconfirm the offline smoke · 74313c01

Constraint: Restrict this checkpoint to navigation and handoff documentation backed by fresh local verification
Rejected: Keep restart guidance read-only without a first command to run | New sessions benefit from an immediate executable sanity check
Confidence: high
Scope-risk: narrow
Directive: Keep README focused on compressed restart guidance and use the offline smoke only as an environment and chain sanity check
Tested: Re-ran business_export_offline_smoke.py successfully and rechecked 215 relative links across the updated docs
Not-tested: Did not connect to a live business export or run full training/evaluation beyond dry-run

authored 2026-06-02 19:07:18 +0800

Make the docs overview self-consistent and add the shortest restart reading path · d3082ce2

Constraint: Restrict this checkpoint to navigation documentation only
Rejected: Leave the overview mismatch and rely on users to infer reading order | Restart sessions should get a direct, explicit path
Confidence: high
Scope-risk: narrow
Directive: Keep README focused on compressed navigation and restart order, not on duplicating full specs
Tested: Rechecked 215 relative links across the updated overview, changelog, and handoff docs
Not-tested: No code or training path executed in this navigation-only checkpoint

authored 2026-06-02 19:06:14 +0800

Expose the business-data intake chain directly from the docs overview · ec59c9b1

Constraint: Keep this checkpoint limited to navigation docs and preserve the condensed doc structure
Rejected: Keep the new business-export material discoverable only through deep links | New sessions should find the intake chain from the overview immediately
Confidence: high
Scope-risk: narrow
Directive: Maintain README as the compressed navigation surface and avoid expanding it into another full spec
Tested: Rechecked 211 relative links across the updated overview, changelog, and handoff docs
Not-tested: No code or training path executed in this navigation-only checkpoint

authored 2026-06-02 19:05:18 +0800

Record the proven offline smoke so the handoff reflects executable evidence · 55974514

Constraint: Limit this checkpoint to documentation updates backed by already-collected local evidence
Rejected: Leave the smoke result only in transient chat output | The next session needs the proof captured in repo-native handoff files
Confidence: high
Scope-risk: narrow
Directive: Keep treating the offline smoke as an integration proof, not as a substitute for real business-data validation
Tested: Rechecked 183 relative links and documented the successful offline smoke summary already verified locally
Not-tested: No new code path executed in this documentation-only checkpoint

authored 2026-06-02 19:04:01 +0800

Prove the offline business-export chain with a runnable smoke over local audio · 7eff944b

Constraint: Keep verification offline-only and avoid touching real databases or production assets
Rejected: Stop at manifest generation without execution evidence | A dry-run smoke gives the next session stronger handoff confidence
Confidence: high
Scope-risk: narrow
Directive: Stage local sample audio inside the smoke workspace so manifest paths remain self-contained and reproducible
Tested: Ran business_export_offline_smoke.py end-to-end; verified normalize/build summaries and train.py --dry-run success; rechecked adapter doc links
Not-tested: Did not run full training/evaluation on live business exports or connect to any database

authored 2026-06-02 19:02:36 +0800

Finish the offline business-export chain by generating project manifests directly from normalized rows · 3bdc0139

Constraint: Keep this checkpoint offline-only and avoid touching real business data, datasets, or model artifacts
Rejected: Leave final manifest shaping as a manual next-session task | The handoff is stronger when catalog/train/test/val can already be produced automatically
Confidence: high
Scope-risk: narrow
Directive: Treat these generated manifests as integration-stage scaffolds and validate final field policy again before production data ingestion
Tested: Ran build_business_project_manifests.py on normalized sample data and verified catalog/train/test/val structure; rechecked 70 relative links
Not-tested: Did not run the generated manifests through full training/evaluation against live business audio

authored 2026-06-02 18:59:32 +0800

Complete the business-export chain by splitting manifest-ready rows into role-specific lists · b9feaccc

Constraint: Keep this checkpoint offline-only and avoid touching real business data, datasets, or model artifacts
Rejected: Leave role splitting as a manual next-session step | The export chain is more usable when reference/query/excluded lists are produced automatically
Confidence: high
Scope-risk: narrow
Directive: Treat the split outputs as staging lists and keep final project-manifest adaptation explicit in the downstream integration step
Tested: Normalized the sample CSV, ran split_business_manifest_ready.py, verified 1 reference + 1 query + 1 excluded row, and rechecked 73 relative links
Not-tested: Did not run against a live business export or feed the split outputs into the full training pipeline

authored 2026-06-02 18:58:03 +0800
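
The role split described above produces reference, query, and excluded lists, and elsewhere this log directs that ambiguous assets default to excluded. A minimal sketch of that rule follows. It is not the code of split_business_manifest_ready.py: the `ROLE_BY_TYPE` mapping, the field names `asset_type` and `song_id`, and the sample rows are all hypothetical.

```python
from typing import Dict, List

# Hypothetical asset-type-to-role mapping; the real contract lives in the repo docs.
ROLE_BY_TYPE = {"master": "reference", "clip": "query"}


def split_by_role(rows: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Bucket rows by role, sending anything ambiguous to 'excluded'."""
    buckets: Dict[str, List[Dict[str, str]]] = {"reference": [], "query": [], "excluded": []}
    for row in rows:
        role = ROLE_BY_TYPE.get(row.get("asset_type", ""), "excluded")
        if not row.get("song_id"):
            role = "excluded"  # song identity unknown: needs manual review
        buckets[role].append(row)
    return buckets


sample_rows = [
    {"asset_id": "a1", "song_id": "s1", "asset_type": "master"},
    {"asset_id": "a2", "song_id": "s1", "asset_type": "clip"},
    {"asset_id": "a3", "song_id": "", "asset_type": "clip"},
]
counts = {role: len(items) for role, items in split_by_role(sample_rows).items()}
print(counts)
```

Defaulting to excluded keeps an unreviewed asset from silently entering either the reference index or the query set.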

Turn business export guidance into a runnable normalization step for the next session · b5981c79

Constraint: Keep this checkpoint offline-only and avoid touching real databases, datasets, or model artifacts
Rejected: Stop at static CSV/JSONL examples only | The next session needs an executable normalization path, not just samples
Confidence: high
Scope-risk: narrow
Directive: Treat normalized JSONL as manifest-ready staging output and keep final manifest shaping explicit in the integration step
Tested: Ran normalize_business_export.py on the sample CSV and JSONL inputs; verified 3 output rows each; rechecked 71 relative links
Not-tested: Did not run against a live business export or connect to any database

authored 2026-06-02 18:57:07 +0800
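
The normalization step above accepts both CSV and JSONL exports and emits the same number of rows for each. As a rough illustration of that idea, and not the actual normalize_business_export.py, the sketch below maps both input formats onto one row shape. The field list in `FIELDS` and the sample data are hypothetical.

```python
import csv
import io
import json
from typing import Dict, Iterable, List

# Hypothetical output fields; the real field contract is defined in the repo docs.
FIELDS = ("asset_id", "song_id", "asset_type", "audio_path")


def normalize_rows(rows: Iterable[Dict[str, object]]) -> List[Dict[str, str]]:
    """Lower-case and trim keys, trim values, and project onto FIELDS."""
    normalized = []
    for row in rows:
        clean = {str(k).strip().lower(): str(v).strip() for k, v in row.items() if v is not None}
        normalized.append({field: clean.get(field, "") for field in FIELDS})
    return normalized


def read_csv_text(text: str) -> List[Dict[str, object]]:
    return list(csv.DictReader(io.StringIO(text)))


def read_jsonl_text(text: str) -> List[Dict[str, object]]:
    return [json.loads(line) for line in text.splitlines() if line.strip()]


csv_text = (
    "Asset_ID,song_id,asset_type,audio_path\n"
    "a1,s1,master,audio/a1.wav\n"
    "a2,s1,clip, audio/a2.wav\n"
    "a3,,clip,audio/a3.wav\n"
)
jsonl_text = "\n".join(
    json.dumps(r) for r in [
        {"asset_id": "a1", "song_id": "s1", "asset_type": "master", "audio_path": "audio/a1.wav"},
        {"asset_id": "a2", "song_id": "s1", "asset_type": "clip", "audio_path": "audio/a2.wav"},
        {"asset_id": "a3", "song_id": "", "asset_type": "clip", "audio_path": "audio/a3.wav"},
    ]
)

from_csv = normalize_rows(read_csv_text(csv_text))
from_jsonl = normalize_rows(read_jsonl_text(jsonl_text))
print(len(from_csv), len(from_jsonl), from_csv == from_jsonl)
```

Comparing the two outputs for equality is a cheap way to confirm that the CSV and JSONL paths agree before the rows move on to manifest shaping.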

Provide export cookbook samples so business tables can flow into manifests without guesswork · b7d4b1b6

Constraint: Keep this checkpoint static and avoid any real database connectivity or dataset mutation
Rejected: Leave export details implicit until a live exporter exists | The next session needs concrete SQL, CSV, and JSONL examples now
Confidence: high
Scope-risk: narrow
Directive: Treat the SQL as a field-mapping example only and adapt table names to the real schema during integration
Tested: Parsed the CSV and JSONL examples and rechecked 69 relative links across the export docs
Not-tested: Did not connect to a production database or execute a live export

authored 2026-06-02 18:56:00 +0800

Make business asset tables exportable into manifest and role mapping templates · 51d789e1

Constraint: Keep the checkpoint lightweight and avoid touching real datasets or generated artifacts
Rejected: Defer manifest guidance until a DB export tool exists | The next session needs repo-native field and role contracts now
Confidence: high
Scope-risk: narrow
Directive: Default ambiguous assets to excluded until manual review confirms song identity and usable role
Tested: Parsed manifest templates; verified print_business_type_mapping.py emits valid JSON; rechecked 94 relative links
Not-tested: Did not connect to a real database or run a live export in this checkpoint

authored 2026-06-02 18:54:54 +0800

Map business asset types into runnable training and bucket guidance for the next session · 8739bf35

Constraint: Keep this checkpoint documentation-first and avoid staging dataset, cache, or model artifacts
Rejected: Leave the asset-type strategy implicit in chat only | The next session needs repo-native guidance and templates
Confidence: high
Scope-risk: narrow
Directive: Treat type-based buckets as a starting scaffold and keep hard-negative curation manual until evidence supports automation
Tested: Parsed both bucket JSON templates and rechecked 104 relative links across the new docs
Not-tested: Did not run a fresh business-type benchmark in this checkpoint

authored 2026-06-02 18:53:40 +0800

Provide a runnable semantic-bucket template so the next benchmark step can start immediately · 75fa5e93

Constraint: Keep the checkpoint lightweight and avoid touching dataset or model artifacts
Rejected: Wait to add buckets until automatic semantic labeling exists | Manual curated buckets are enough to unblock the next session now
Confidence: high
Scope-risk: narrow
Directive: Use the template as a curated benchmark scaffold, not as evidence that filenames imply semantics
Tested: Parsed the new JSON template; ran ab_smoke_bucketed.py --help; rechecked targeted relative links
Not-tested: Did not launch a new semantic bucket benchmark run in this checkpoint

authored 2026-06-02 18:51:59 +0800

Capture the finished bucket benchmark and handoff state for the next session · 1bdca61b

Constraint: Avoid staging datasets, smoke artifacts, /tmp outputs, and caches
Rejected: Delay handoff until larger semantic buckets exist | User asked for immediate delivery and resumability now
Confidence: high
Scope-risk: narrow
Directive: Treat toy prefix buckets as a methodology baseline, not a product conclusion
Tested: Verified /tmp/ab_smoke_bucketed_smoke/report.json and bucket_report.json outputs; reviewed targeted git diff
Not-tested: No new training or benchmark execution in this documentation-only checkpoint

authored 2026-06-02 18:50:23 +0800

Promote bucket benchmarking from a plan to a runnable baseline · c1a22cbb

Constraint: The cap48/cap64 reversal means strategy guidance can no longer rely on a single overall subset result
Rejected: Keep bucket benchmarking as a doc-only next step | The repo now needs an executable baseline so later sessions can measure scale/style divergence directly
Confidence: high
Scope-risk: moderate
Directive: Treat ab_smoke_bucketed.py as the canonical seed for style-aware evaluation, and expand bucket definitions before revisiting global default-strategy claims
Tested: Verified acr-engine/scripts/ab_smoke_bucketed.py passes py_compile; verified first bucket prefix_000_a produced bucket_report.json with hybrid 4/1.0/1.0 and high_energy 3/1.0/1.0; verified second bucket execution is in progress
Not-tested: Full multi-bucket report.json completion, richer bucket definitions, and bucket-level aggregate conclusions

authored 2026-06-02 18:48:23 +0800

Record the cap64 reversal once the larger benchmark finished · e49dc0b9

Constraint: Strategy guidance must now reflect that cap48 and cap64 produce different winners under verified runs
Rejected: Keep high_energy as the generic default | The completed cap64 run shows hybrid winning clearly at a larger subset size, so the docs must acknowledge scale sensitivity
Confidence: high
Scope-risk: moderate
Directive: Do not present a single global default strategy again until bucketed and style-aware benchmarks explain the cap48/cap64 divergence
Tested: Verified cap64 report.json, progress.json, high_energy eval.json, and hybrid eval.json; confirmed cap64 winner=hybrid with top1 0.875 vs high_energy 0.625
Not-tested: Multi-seed cap64 aggregates, bucket/style-aware benchmarks, and any revised hybrid training design

authored 2026-06-02 18:44:58 +0800
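
The cap64 comparison above comes down to comparing top1 across the per-strategy eval results. A minimal sketch of that comparison follows, using only the two top1 values quoted in this log. The function `pick_winner` and the in-memory dictionary layout are assumptions; the real report.json structure is not shown here.

```python
from typing import Dict, Tuple


def pick_winner(results: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """Return (strategy with the highest top1, margin over the runner-up)."""
    ranked = sorted(results.items(), key=lambda item: item[1]["top1"], reverse=True)
    best_name, best = ranked[0]
    margin = best["top1"] - ranked[1][1]["top1"] if len(ranked) > 1 else 0.0
    return best_name, margin


# top1 values as recorded in the Tested trailer for the cap64 run.
cap64 = {
    "high_energy": {"top1": 0.625},
    "hybrid": {"top1": 0.875},
}
winner, margin = pick_winner(cap64)
print(winner, margin)  # hybrid 0.25
```

Because cap48 and cap64 produced different winners, a result like this is best read per subset size rather than as a global default.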

Preserve proof that cap64 hybrid advanced into evaluation before results landed · 8f2e6016

Constraint: The cap64 run is still incomplete, so only verified hybrid index-complete and evaluation-running evidence can be recorded safely now
Rejected: Wait for hybrid eval.json before checkpointing | Would lose the verified handoff that hybrid indexing finished and evaluate.py is already running
Confidence: high
Scope-risk: narrow
Directive: Keep cap64 high_energy and hybrid checkpoints symmetric so the final comparison can be written from docs alone if needed
Tested: Verified hybrid reference_progress.json shows 64 refs, 657 windows, 192-d embeddings, and complete status; verified active process is evaluate.py on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests; verified hybrid eval.json and report.json are still absent
Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion

authored 2026-06-02 18:43:15 +0800

Preserve proof that cap64 hybrid training fully finished before scoring lands · fee2a39c

Constraint: The cap64 run is still active, so only verified training-complete evidence can be recorded now without overstating results
Rejected: Wait for hybrid eval before checkpointing | Would lose the stronger handoff evidence that the full hybrid epoch already completed
Confidence: high
Scope-risk: narrow
Directive: Keep distinguishing hybrid training-complete from hybrid index/eval completion until report.json lands
Tested: Verified live session output shows hybrid Epoch 1 progressed from 0/32 to 32/32, and verified the active process remains run_demo.py build-index on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests while hybrid eval.json and report.json remain absent
Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion

authored 2026-06-02 18:41:16 +0800

Preserve proof that cap64 hybrid advanced into indexing · 65cc45c2

Constraint: The cap64 run is still in progress, so this checkpoint can only record verified hybrid stage transitions, not final comparisons
Rejected: Wait for hybrid eval before checkpointing | Would lose the verified evidence that hybrid training finished and indexing has already started
Confidence: high
Scope-risk: narrow
Directive: Keep cap64 branch checkpoints symmetric so high_energy and hybrid can be compared later without re-reading process history
Tested: Verified active process is run_demo.py build-index on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests; verified /tmp/ab_smoke_seg_cap64_top2/hybrid/fma_models_smoke/best_model.pt exists; verified hybrid eval.json and report.json are still absent
Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion

authored 2026-06-02 18:39:26 +0800

Preserve proof that cap64 has entered the hybrid training branch · df7bd04b

Constraint: The cap64 run is still incomplete, so only branch-transition evidence can be recorded safely at this point
Rejected: Wait for the hybrid eval before checkpointing | Would lose the verified handoff that execution has moved beyond high_energy into hybrid training
Confidence: high
Scope-risk: narrow
Directive: Keep cap64 branch progression explicit so the next session can resume from the current strategy leg without re-inspection
Tested: Verified high_energy eval.json reports num_queries=32, top1=0.625, topk=1.0; verified active processes show external_adapters.py on /tmp/ab_smoke_seg_cap64_top2/hybrid and train.py on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests; verified hybrid eval.json and report.json are still absent
Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion

authored 2026-06-02 18:37:43 +0800