Commits · 5be68c1d9db63369538b4ae82dc0aecb72ff6a6c · wanghai-tech / hikoon-ACR

04 Jun, 2026 11 commits

Create live Phase-1 extraction jobs in PostgreSQL · 5be68c1d ...

Constraint: Continue Phase-1 industrialization without waiting on missing audio mounts, and keep every Ralph step documented and pushed
Rejected: Leave extraction scheduling as an implicit next step after registry bootstrap | It forces future sessions to reconstruct pending jobs by hand
Confidence: high
Scope-risk: narrow
Directive: Use feature_extraction_job as the canonical handoff between registry bootstrap and actual encoder extraction runs
Tested: /usr/local/miniconda3/bin/python scripts/bootstrap_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_extraction_jobs_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/bootstrap_phase1_extraction_jobs_live.py; git diff --check -- acr-engine/scripts/bootstrap_phase1_extraction_jobs_live.py acr-engine/data/pgvector_eval/music20/phase1_extraction_jobs_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Downstream worker that consumes these pending jobs to run real MERT/MuQ extraction

authored 2026-06-04 12:49:45 +0800

Prove the Phase-1 registry bootstrap is idempotent · f0c82687 ...

Constraint: Ralph follow-up work must keep producing audit-ready evidence and a pushed trail for the next session
Rejected: Assume the new bootstrap script is safe to rerun without proof | Duplicate feature-set inserts would erode trust in the PostgreSQL bootstrap path
Confidence: high
Scope-risk: narrow
Directive: Re-run registry bootstrap in-place before future extraction jobs and treat count drift as a regression signal
Tested: /usr/local/miniconda3/bin/python scripts/bootstrap_phase1_model_registry_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_registry_bootstrap_report.json (run twice); /usr/local/miniconda3/bin/python -m py_compile scripts/bootstrap_phase1_model_registry_live.py; git diff --check -- acr-engine/scripts/bootstrap_phase1_model_registry_live.py acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_idempotency_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Actual downstream MERT/MuQ extraction after bootstrap, missing business sample mount recovery

authored 2026-06-04 12:47:24 +0800
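
The idempotency check described in the commit above (run the bootstrap twice, treat count drift as a regression) can be sketched roughly as follows. This is a minimal illustration, not the repository's script: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the column names and the `v1` version labels are assumptions made for the example.

```python
import sqlite3

# Hypothetical registry rows; the real feature-set definitions are not shown in the log.
FEATURE_SETS = [("mert", "v1"), ("muq", "v1")]


def bootstrap(conn: sqlite3.Connection) -> int:
    """Insert registry rows if absent and return the resulting row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_set_registry ("
        "model TEXT NOT NULL, version TEXT NOT NULL, UNIQUE (model, version))"
    )
    # INSERT OR IGNORE is SQLite's counterpart to PostgreSQL's ON CONFLICT DO NOTHING.
    conn.executemany(
        "INSERT OR IGNORE INTO feature_set_registry (model, version) VALUES (?, ?)",
        FEATURE_SETS,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM feature_set_registry").fetchone()[0]


def count_drift(first: int, second: int) -> int:
    """A non-zero result means the second run changed the table."""
    return second - first


conn = sqlite3.connect(":memory:")
first_run = bootstrap(conn)
second_run = bootstrap(conn)
drift = count_drift(first_run, second_run)
print(f"first={first_run} second={second_run} drift={drift}")
```

A drift of zero after the second run is the evidence the commit asks for; any other value would point at a missing uniqueness constraint or a non-idempotent insert.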

Bootstrap the Phase-1 model registry on live PostgreSQL · fef8f438 ...

Constraint: Continue the Ralph loop without waiting on missing business sample mounts, while still leaving a push-ready implementation and documentation trail
Rejected: Keep Phase-1 registry setup as static SQL snippets only | It slows live validation and leaves no machine-checkable bootstrap path
Confidence: high
Scope-risk: narrow
Directive: Treat model_registry/feature_set_registry/reference_set_registry as the mandatory entrypoint before any future MERT/MuQ extraction jobs
Tested: /usr/local/miniconda3/bin/python scripts/bootstrap_phase1_model_registry_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_registry_bootstrap_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/bootstrap_phase1_model_registry_live.py; git diff --check -- acr-engine/scripts/bootstrap_phase1_model_registry_live.py acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Actual MERT/MuQ embedding extraction, hard-case type_8/type_16 live queries, multi-recording/cover-lane retrieval

authored 2026-06-04 12:44:49 +0800

Record the current blocker for hard-case live samples · ea51b9c1 ...

Constraint: Each Ralph follow-up change must leave a documented, pushed trail for the next session
Rejected: Keep the missing /workspace/downloads discovery only in transient shell output | It would be rediscovered and waste the next session
Confidence: high
Scope-risk: narrow
Directive: Treat hard-case live evaluation as environment-dependent until business sample mounts are restored
Tested: git diff --check -- docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md; ls -ld /workspace/downloads => no such file or directory
Not-tested: Restoring or remounting the missing business sample directory

authored 2026-06-04 12:40:52 +0800

Harden lineage validation evidence for the PostgreSQL ACR path · e54e2ff2 ...

Constraint: Each follow-up Ralph edit must update docs and preserve a push-ready, auditable validation trail
Rejected: Stop at a single audio_window negative test | It left recording/audio_embedding trigger coverage and report readability weaker than needed
Confidence: high
Scope-risk: narrow
Directive: Keep live retrieval reports self-explanatory enough for reviewers who only inspect JSON artifacts
Tested: /usr/local/miniconda3/bin/python scripts/live_pgvector_music20_eval.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --reset-schema --output data/pgvector_eval/music20/live_pgvector_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/live_pgvector_music20_eval.py; git diff --check -- acr-engine/scripts/live_pgvector_music20_eval.py acr-engine/data/pgvector_eval/music20/live_pgvector_report.json docs/postgres_db_schema_samples.md docs/CHANGELOG.md docs/session-handoff.md
Not-tested: type_8/type_16 live JSONL coverage, MERT/MuQ live embeddings, multi-recording/cover-lane decision flow

authored 2026-06-04 12:39:05 +0800

Validate the PostgreSQL ACR storage path with live evidence · 96c9ce7d ...

Constraint: The new data model had to be proven against the user-provided PostgreSQL instance and stay aligned with Phase-1 encoder-only decisions
Rejected: Document-only schema guidance without a live database run | It would leave retrieval correctness and table intent unproven
Confidence: high
Scope-risk: narrow
Directive: Keep future retrieval experiments writing through model/feature/reference registries instead of adding fixed per-model columns
Tested: /usr/local/miniconda3/bin/python scripts/live_pgvector_music20_eval.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --reset-schema --output data/pgvector_eval/music20/live_pgvector_report.json; /usr/local/miniconda3/bin/python scripts/evaluate_songid_pgvector_path.py --reference-embeddings-jsonl data/pgvector_eval/music20/reference_embeddings.jsonl --query-embeddings-jsonl data/pgvector_eval/music20/query_embeddings.jsonl --output data/pgvector_eval/music20/songid_eval_report_fresh.json; /usr/local/miniconda3/bin/python -m py_compile scripts/live_pgvector_music20_eval.py scripts/evaluate_songid_pgvector_path.py; git diff --check -- docs/README.md docs/CHANGELOG.md docs/postgres_db_schema_samples.md acr-engine/scripts/live_pgvector_music20_eval.py acr-engine/data/pgvector_eval/music20/live_pgvector_report.json acr-engine/data/pgvector_eval/music20/songid_eval_report_fresh.json
Not-tested: MERT/MuQ live embeddings, type_8/type_16 live JSONL coverage, multi-recording/cover-lane decision flow

authored 2026-06-04 12:20:15 +0800

Update 1 · b220751b

cnb.bofCdSsphPA authored 2026-06-04 11:43:33 +0800

Preserve a fast handoff entrypoint for the ACR roadmap · d8fd2d15 ...

Constraint: The startup handoff must reflect the new Phase-1 encoder-only and PostgreSQL v2 decisions without carrying stale timeline noise
Rejected: Keep appending runtime logs to session-handoff.md | It obscures the current start point for the next session
Confidence: high
Scope-risk: narrow
Directive: Keep session-handoff.md focused on where to resume next, and move detailed chronology into changelog/history docs
Tested: git diff --check -- docs/session-handoff.md docs/CHANGELOG.md
Not-tested: No link checker or markdown linter was run

authored 2026-06-04 11:17:46 +0800

Make the Phase-1 ACR plan executable for each delivery role · 4b23f546 ...

Constraint: The architecture and schema docs were already in place, but teams still lacked a concrete implementation checklist and registry bootstrap contract for encoder-only rollout
Rejected: leaving execution guidance implicit in architecture prose | would slow Phase-1 delivery and cause inconsistent model/feature initialization
Confidence: high
Scope-risk: narrow
Directive: treat Phase-1 implementation sequencing and model/feature/reference-set bootstrap as first-class docs that evolve with the schema
Tested: git diff --check on changed docs; Python document sanity check; README/CHANGELOG link coverage verified with rg
Not-tested: no runtime behavior changed; no database apply executed

authored 2026-06-04 11:13:31 +0800

Keep the new ACR architecture guide clean for follow-up edits · e514a6c7 ...

Constraint: The documentation refactor was already pushed and only needed a formatting-only hygiene follow-up
Rejected: leaving known markdown whitespace debt in the freshly introduced guide | would add avoidable noise to future reviews
Confidence: high
Scope-risk: narrow
Directive: keep the new role-oriented architecture docs diff-clean so future schema/model edits stay reviewable
Tested: git diff --check on docs/acr-architecture.md
Not-tested: content semantics unchanged; no runtime effects

authored 2026-06-04 11:07:22 +0800

Clarify the ACR evolution path and freeze a production-grade data model · a549d1de ...

Constraint: Phase-1 must support encoder-only open-source backbones without destabilizing future schema evolution
Rejected: extending the old flat song_id + fixed-vector schema | would couple model swaps to schema rewrites and weaken copyright lineage
Confidence: high
Scope-risk: moderate
Directive: treat canonical_song/work/recording/recording_asset/audio_window plus model/feature registries as the stable contract; evolve models and indexes around them
Tested: git diff --check on changed files; Python content/structure sanity check; architect review APPROVED; README link coverage and DDL object presence verified
Not-tested: live PostgreSQL apply not run because psql is unavailable in this environment

authored 2026-06-04 11:06:42 +0800

03 Jun, 2026 15 commits

Add the song_id pgvector evaluation scaffolding · 2898ef26 ...

Constraint: we need a song-level evaluation path that matches the future pgvector production shape before moving off the local FAISS proving lane
Rejected: jumping straight to a live pgvector-only implementation | we still need a reproducible repo-local evaluation harness and artifact trail first
Confidence: high
Scope-risk: moderate
Directive: keep future pgvector work song_id-first and measure each query type separately before aggregating product claims
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /usr/local/miniconda3/bin/python acr-engine/scripts/export_workspace_music20_embeddings_jsonl.py --downloads-dir /workspace/downloads --song-limit 20 --out-dir acr-engine/data/pgvector_eval/music20; /usr/local/miniconda3/bin/python acr-engine/scripts/evaluate_songid_pgvector_path.py --reference-embeddings-jsonl acr-engine/data/pgvector_eval/music20/reference_embeddings.jsonl --query-embeddings-jsonl acr-engine/data/pgvector_eval/music20/query_embeddings.jsonl --output acr-engine/data/pgvector_eval/music20/songid_eval_report.json
Not-tested: live PostgreSQL/pgvector online retrieval path

authored 2026-06-03 18:13:59 +0800

Extend the business-corpus voice correctness baseline to type8 and type16 · a0ceb991 ...

Constraint: we need a complete hard-query picture before claiming the workspace_music20 voice lane is usable or deciding where pgvector work should start
Rejected: extrapolating from type_7 alone | the type_8 and type_16 lanes can fail differently and need their own measured baselines
Confidence: high
Scope-risk: narrow
Directive: keep all future business-corpus voice evaluations split by query type so we can see exactly which hard lanes fail and why
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; generated voice_workspace20_type8_eval.json (top1=0.0, top3=0.0) and voice_workspace20_type16_eval.json (top1=0.0, top3=0.0)
Not-tested: improved business-corpus voice correctness after moving to embedding/pgvector retrieval

authored 2026-06-03 18:11:11 +0800

Record the first business-corpus voice correctness check · 5a01ab7f ...

Constraint: the repo needs to distinguish runtime success from business-level song_id correctness before any production claim
Rejected: treating the workspace_music20 smoke as good enough | the current type_7 batch result is top1=0.0 and top3=0.05, which is far below a usable threshold
Confidence: high
Scope-risk: narrow
Directive: keep all future business-corpus voice evaluations written to local_eval artifacts and mirrored into changelog/checklist/handoff before push
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; generated acr-engine/data/local_eval/voice_workspace20_type7_eval.json with num_queries=20, top1=0.0, top3=0.05
Not-tested: improved business-corpus correctness after further retrieval tuning

authored 2026-06-03 18:09:35 +0800
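
The per-type top-k numbers reported in these evaluation commits can be reproduced in form with a small helper. The sketch below assumes top1/top3 are hit rates (fraction of queries whose true song_id appears in the first k results); the data is synthetic and the tuple layout is an assumption, not the repository's report format. Under that reading, top3=0.05 over num_queries=20 corresponds to one query with the correct song in its top three.

```python
from collections import defaultdict


def topk_by_type(results, k):
    """results: iterable of (query_type, true_song_id, ranked_song_ids)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for query_type, true_id, ranked in results:
        totals[query_type] += 1
        if true_id in ranked[:k]:
            hits[query_type] += 1
    # One accuracy per query type, never a single pooled number.
    return {t: hits[t] / totals[t] for t in totals}


# Synthetic data: 20 type_7 queries, exactly one with the right song at rank 3.
results = [("type_7", "s0", ["x", "y", "s0"])]
results += [("type_7", f"s{i}", ["x", "y", "z"]) for i in range(1, 20)]

top1 = topk_by_type(results, 1)
top3 = topk_by_type(results, 3)
print(top1, top3)
```

Keeping the result keyed by query type matches the directive to measure type_7, type_8, and type_16 separately before aggregating.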

Route voice recognition through the workspace music20 corpus · 356053b7 ...

Constraint: external voice uploads now need a business-sample-backed path before any pgvector production cutover, while still staying lightweight enough for CPU smoke tests
Rejected: waiting for full pgvector service integration before proving a business-corpus path | would leave the external voice interface unvalidated against real sample references
Confidence: medium
Scope-risk: moderate
Directive: treat workspace_music20 as a proving lane only; validate business top1 correctness before promoting its defaults or claiming production readiness
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /usr/local/miniconda3/bin/python acr-engine/scripts/service_voice_smoke.py -> status ok, corpus=workspace_music20, chunk_count=1, top_song_id=109, has_context=true
Not-tested: pgvector-backed /recognize/voice production retrieval path

authored 2026-06-03 18:07:28 +0800

Reduce voice service latency and record the first successful payload smoke · 86c3f935 ...

Constraint: the voice service must return a payload under the current CPU environment before we can iterate on business-corpus correctness
Rejected: keeping the previous multi-chunk defaults | they caused smoke-timeout regressions and blocked basic endpoint validation
Confidence: high
Scope-risk: moderate
Directive: treat the current result as transport/runtime proof only until the service is switched from synthetic defaults to the /workspace business reference corpus
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /usr/local/miniconda3/bin/python acr-engine/scripts/service_voice_smoke.py -> status ok, chunk_count=1, top_song_id=song_0022, has_context=false
Not-tested: business-corpus song_id correctness for /recognize/voice under /workspace reference data

authored 2026-06-03 18:04:18 +0800

998e4712

Constraint: every documented progress step in this lane must update changelog, checklist, and handoff together before pushing
Rejected: leaving the handoff refresh isolated | it would break the repo's own continuity ritual and make the next session diff harder to trust
Confidence: high
Scope-risk: narrow
Directive: when the voice service state changes, mirror it across changelog, checklist, and handoff in the same push
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v
Not-tested: successful end-to-end /recognize/voice payload within timeout

authored 2026-06-03 18:00:02 +0800

Refresh session handoff for the current voice service runtime state · 2cc5685b ...

Constraint: the handoff must reflect the real runtime state: health endpoints work, CPU torch is installed, but end-to-end voice smoke still times out
Rejected: keeping the older dependency-missing note | it no longer matches the current environment and would mislead the next session
Confidence: high
Scope-risk: narrow
Directive: keep handoff notes focused on the shortest next debugging path for /recognize/voice timeout reduction
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /health endpoint returns ok under uvicorn with CPU torch installed
Not-tested: successful end-to-end /recognize/voice payload within timeout

authored 2026-06-03 17:59:22 +0800

Record the current voice service readiness and remaining blocker · f44a34a3 ...

Constraint: the docs must reflect the real runtime state after installing CPU torch: health is up, but end-to-end voice recognition still times out
Rejected: declaring the voice API complete | the current smoke still does not return a final recognition payload within the timeout window
Confidence: medium
Scope-risk: narrow
Directive: keep status docs synchronized with actual smoke results, especially partial readiness states
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /health endpoint returns ok under uvicorn; direct /recognize/voice smoke currently times out after CPU torch install
Not-tested: successful end-to-end /recognize/voice result payload within timeout

authored 2026-06-03 17:57:35 +0800

Update the release checklist for the voice-query service path · b787858c ...

Constraint: the checklist should reflect the real current state: health endpoint is up, but full voice inference remains blocked by missing torch
Rejected: marking service smoke fully passed | /recognize/voice still cannot execute end-to-end inference in this environment
Confidence: high
Scope-risk: narrow
Directive: keep the release checklist brutally explicit about partial vs full service readiness
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /health endpoint reachable under uvicorn
Not-tested: successful /recognize/voice inference until torch is installed

authored 2026-06-03 17:49:47 +0800

Refresh the handoff docs for the voice-query ACR path · 97a9ffc8 ...

Constraint: the handoff should reflect the current FAISS-first local workflow and the partially wired voice service without claiming end-to-end inference is ready
Rejected: waiting for full torch-backed service completion before documenting progress | would hide the current repo state and block clean session handoff
Confidence: high
Scope-risk: narrow
Directive: keep future handoff updates focused on what is runnable now, what is blocked, and the next shortest path to unblock it
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /usr/local/miniconda3/bin/python -m uvicorn src.service.app:app --host 127.0.0.1 --port 8000 with successful /health response
Not-tested: successful /recognize/voice inference until torch is installed

authored 2026-06-03 17:44:59 +0800

Let the ACR API start before heavy model dependencies load · fa5b6147 ...

Constraint: service health and config endpoints should stay reachable even when training-time dependencies like torch are not installed
Rejected: importing retrieval engines at module load | it makes the whole API crash before reporting dependency gaps clearly
Confidence: high
Scope-risk: narrow
Directive: keep runtime dependency checks inside request-time engine loading so infra can health-check the service independently of model installation state
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /usr/local/miniconda3/bin/python -m uvicorn src.service.app:app --host 127.0.0.1 --port 8000 with successful /health response; POST /recognize/voice currently returns a clear 500 dependency error when torch is missing
Not-tested: successful end-to-end /recognize/voice inference, because torch is not installed in this environment

authored 2026-06-03 17:39:37 +0800
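
The request-time dependency loading described in the commit above might look roughly like this. It is a framework-free sketch: plain functions stand in for the `/health` and `/recognize/voice` handlers, and the function names and the missing-module name are hypothetical, not taken from `src.service.app`.

```python
import importlib


def health() -> dict:
    """Never touches heavy dependencies, so it works without them."""
    return {"status": "ok"}


def load_engine_module(module_name: str):
    """Import a heavy dependency only when a request actually needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Surface a clear dependency error instead of crashing at module load.
        raise RuntimeError(
            f"missing dependency '{module_name}'; install it to enable recognition"
        ) from exc


def recognize_voice(module_name: str = "torch") -> dict:
    engine = load_engine_module(module_name)
    return {"status": "ok", "engine": engine.__name__}


print(health())
try:
    # Hypothetical module name that is guaranteed to be absent.
    recognize_voice("acr_missing_dependency_demo")
except RuntimeError as err:
    error_message = str(err)
    print(error_message)
```

Because the import happens inside the request path, the health check keeps answering while the recognition path reports exactly which dependency is missing.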

Add voice chunking and match-context foundations for ACR service · bd66c06b ...

Constraint: keep humming/recording query support lightweight and compatible with the existing FAISS-first local workflow while production retrieval remains pgvector-oriented
Rejected: delaying service-path scaffolding until full production retrieval is ready | would block validation of voice-to-chunk and context export behavior
Confidence: high
Scope-risk: moderate
Directive: keep semantics song_id-first and treat resource paths only as supporting evidence/context artifacts
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v
Not-tested: live FastAPI smoke until uvicorn is available in the current interpreter environment

authored 2026-06-03 17:36:22 +0800

Add a FAISS-first local ACR workflow for music20 samples · 69843933 ...

Constraint: local validation should stay lightweight and use /workspace sample files while production retrieval remains pgvector-backed
Rejected: making ChromaDB the default local backend | chromadb is not installed in the current environment and FAISS is already available
Confidence: high
Scope-risk: narrow
Directive: keep local dev workflows explicitly separated from production pgvector flows in docs and scripts
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /usr/local/miniconda3/bin/python acr-engine/scripts/local_music20_acr.py --downloads-dir /workspace/downloads --song-limit 20 --backend faiss --output acr-engine/data/local_eval/music20_summary.json
Not-tested: chromadb backend execution without installation; live pgvector database execution path

authored 2026-06-03 16:49:02 +0800

update 1 · 4806664b

cnb.bofCdSsphPA authored 2026-06-03 15:20:43 +0800

Freeze the production encoder before scaling the music index · 9de8092d ...

Document the production decision to stabilize the embedding space before onboarding a 300k-song catalog, and record the migration rules for future encoder upgrades.

Constraint: 300k-song production rollout makes embedding churn expensive and risky
Rejected: keep iterating encoder before defining a production embedding version | would force repeated full-vector rebuilds and unstable rollout criteria
Confidence: high
Scope-risk: narrow
Directive: Treat encoder changes as versioned index migrations, not in-place model swaps
Tested: reviewed rendered markdown content, docs index link, changelog entry, and git diff for the three touched docs
Not-tested: git push / remote sync outcome depends on repository remote state

authored 2026-06-03 13:40:30 +0800
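
One way to read "versioned index migrations, not in-place model swaps" is that each encoder version gets its own index, built side by side, with a separate switch for which version serves queries. The sketch below illustrates that reading only; the class, its methods, and the version labels are hypothetical and do not come from the repository docs.

```python
class VersionedIndex:
    """Keeps one embedding index per encoder version (illustrative only)."""

    def __init__(self):
        self._indexes = {}
        self.active_version = None

    def build(self, version, embeddings):
        # Refuse to overwrite: a new encoder means a new version, never an in-place swap.
        if version in self._indexes:
            raise ValueError(f"index for {version} already exists")
        self._indexes[version] = dict(embeddings)

    def activate(self, version):
        if version not in self._indexes:
            raise KeyError(f"no index built for {version}")
        self.active_version = version

    def lookup(self, song_id):
        return self._indexes[self.active_version][song_id]


index = VersionedIndex()
index.build("enc-v1", {"song_a": [0.1, 0.2]})
index.activate("enc-v1")

# Migration: build the new version alongside the old one, then flip the pointer.
index.build("enc-v2", {"song_a": [0.3, 0.4]})
before_cutover = index.lookup("song_a")
index.activate("enc-v2")
after_cutover = index.lookup("song_a")
print(before_cutover, after_cutover)
```

The old version stays queryable until cutover, which is what makes rollback cheap for a large catalog.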

02 Jun, 2026 14 commits

Record the hum_guard verification result · 73d28fae ...

Capture the latest sweep evidence so the next session can resume cleanly.

Constraint: docs only; keep large data and checkpoints out of git
Rejected: leaving hum_guard unrecorded | would lose the newest verification evidence
Confidence: high
Scope-risk: narrow
Directive: continue the next search from hum_focus
Tested: reviewed the eval.json evidence and diff
Not-tested: no code or model changes in this commit

authored 2026-06-03 00:15:07 +0800

Freeze the handoff package before the next run · 07a5c3e7 ...

Keep the current optimization state resumable and concise.

Constraint: docs only; avoid raw data, checkpoints, and __pycache__
Rejected: continuing implementation now | user requested a fast delivery package first
Confidence: high
Scope-risk: narrow
Directive: resume from the handoff docs on the next session
Tested: reviewed diff and confirmed only four docs changed
Not-tested: no code or training pipeline changes in this commit

authored 2026-06-03 00:13:10 +0800

pin hum_focus as the current dual-axis search anchor · 57f05e80 ...

Constraint: Keep the handoff restart-safe and avoid staging temporary sweep artifacts
Rejected: Switch back to v6 or continue blind search | Fresh evidence shows hum_focus is the current best candidate and the right anchor for finer tuning
Confidence: high
Scope-risk: narrow
Directive: Use hum_focus as the baseline for the next micro-search, preserving humming_like gains while keeping confused at 0.25
Tested: Verified hum_focus versus hum_balanced with fresh eval results and updated docs accordingly
Not-tested: Whether a further micro-tuned variant beats hum_focus

authored 2026-06-03 00:04:49 +0800

checkpoint the first end-to-end dual-axis smoke result · 9c3f182a ...

Constraint: The handoff must record the fresh dual-axis metric outcome without staging temporary smoke artifacts
Rejected: Keep tuning weights before checkpointing | The first end-to-end dual-axis result is already a meaningful evidence point and restart-safe boundary
Confidence: high
Scope-risk: narrow
Directive: Continue with finer-grained dual-axis weight search, targeting humming_like recovery while preserving confused gains
Tested: Verified dual-axis smoke completed train, build-index, and evaluate with top1 0.5 / topk 0.9 and updated handoff/changelog docs
Not-tested: Improved dual-axis weight combinations beyond this first balanced trial

authored 2026-06-02 23:57:16 +0800

parameterize dual-axis hard-case weighting for low-risk experiments · 62798505 ...

Constraint: Keep the training pipeline behavior stable while exposing humming_like and confused controls through config only
Rejected: Add a brand-new sampler framework first | The smallest useful step is config-driven control on the existing dataset weighting path
Confidence: high
Scope-risk: narrow
Directive: Run weight-search experiments through training.sample_type_weights and training.pair_type_weights before attempting broader training-stack refactors
Tested: py_compile passed, train.py dry-run on synthetic_v2 passed, and custom SongPairDataset weighting instantiation produced expected hard_weight output
Not-tested: End-to-end retraining and metric improvements from new dual-axis weight combinations

authored 2026-06-02 23:52:13 +0800
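
Config-only control over hard-case weighting could be sketched as below. Only the config keys `training.sample_type_weights` and `training.pair_type_weights` and the `humming_like` / `confused` type names come from the commit message; the multiplicative combination, the default of 1.0, the `hard_negative` pair type, and the numeric values are assumptions made for illustration, not the behavior of `SongPairDataset`.

```python
# Hypothetical config fragment mirroring the two keys named in the commit.
config = {
    "training": {
        "sample_type_weights": {"humming_like": 2.0, "confused": 1.5},
        "pair_type_weights": {"hard_negative": 1.5},
    }
}


def hard_weight(sample_type: str, pair_type: str, cfg: dict) -> float:
    """Combine per-sample-type and per-pair-type weights (assumed multiplicative)."""
    training = cfg.get("training", {})
    sample_w = training.get("sample_type_weights", {}).get(sample_type, 1.0)
    pair_w = training.get("pair_type_weights", {}).get(pair_type, 1.0)
    return sample_w * pair_w


weighted = hard_weight("humming_like", "hard_negative", config)
unweighted = hard_weight("clean", "positive", config)
print(weighted, unweighted)
```

Types absent from the config fall back to 1.0, so an empty config leaves the existing training behavior unchanged, which is the stability property the constraint asks for.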

explain the v5-v6 hard-case split with source-backed evidence · 7812b589 ...

Constraint: The handoff must convert baseline metrics into an actionable causal explanation without staging report artifacts
Rejected: Start a new weighting experiment immediately | Source-backed explanation of the existing split is cheaper and reduces blind iteration risk
Confidence: high
Scope-risk: narrow
Directive: Treat dual-axis hard-case weighting as the next design lane, using v6 as the base and v5 as the humming_like reference
Tested: Verified source-backed v5/v6 definitions from changelog and smoke-v6 config artifacts, then updated handoff/changelog docs
Not-tested: A new merged weighting strategy or its downstream metric impact

authored 2026-06-02 23:49:10 +0800

select the next hard-case optimization baseline from fresh sweeps · 93dfa158 ...

Constraint: Handoff must encode the new baseline decision without staging temporary sweep artifacts
Rejected: Jump straight into retraining without baseline comparison | Fresh sweep evidence now makes a targeted v6-vs-v5 optimization path cheaper and safer
Confidence: high
Scope-risk: narrow
Directive: Use v6 as the overall baseline and treat v5 as the humming_like comparison target before changing training or segmentation logic
Tested: Ran a synthetic_v2 hard-case sweep across v3-v6, verified summary metrics, and updated handoff/changelog docs with the baseline decision
Not-tested: Whether a merged v6-plus-v5 strategy improves real open-data derived hard cases

authored 2026-06-02 23:46:12 +0800

pin down the hard-case gap after the first real-path closure · d4961b14 ...

Constraint: Handoff must distinguish clean real-path evidence from hard-case evidence without staging temporary evaluation artifacts
Rejected: Keep scaling clean-only FMA smoke first | Fresh evidence shows the next highest-yield work is hard-case top1 improvement
Confidence: high
Scope-risk: narrow
Directive: Treat humming_like and confused as the primary optimization targets before investing more cycles in larger clean-only smoke runs
Tested: Audited manifest type coverage, verified synthetic_v2 hard-case evaluate results, and updated handoff/changelog docs with the gap analysis
Not-tested: Post-optimization hard-case improvements on real open-data derived hard cases

authored 2026-06-02 23:44:12 +0800

capture the first real-path index-to-evaluate closure · 81704ace ...

Constraint: Delivery state must reflect fresh evaluate evidence without staging temporary eval assets
Rejected: Wait for larger-scale or hard-case metrics | The first explicit evaluate closure is already a meaningful milestone and restart-safe handoff point
Confidence: high
Scope-risk: narrow
Directive: Reuse /tmp/fma_realpath_small_rerun_index2 and /tmp/fma_realpath_small_rerun_eval as the next validation baseline before scaling up
Tested: Verified eval_top50.json at num_queries 35 with top1 0.8571 and topk 1.0, confirmed query-count explanation, and updated handoff/changelog docs
Not-tested: Larger query caps, hard-case buckets, and full-scale FMA evaluate runs

authored 2026-06-02 23:41:33 +0800

record the first completed real-path reference index milestone · 9371e944 ...

Constraint: Delivery docs must reflect fresh post-fix completion evidence and exclude data/index artifacts
Rejected: Delay until evaluate evidence exists | Completed reference index is already a distinct stage milestone the user asked us to checkpoint
Confidence: high
Scope-risk: narrow
Directive: Use /tmp/fma_realpath_small_rerun_index2 as the primary handoff artifact and validate evaluate or identify next before expanding sample size
Tested: Verified reference_progress.json complete at 200/200, reference_embs.npy and reference_ids.npy present, embedding_shape [2068, 192], and handoff/changelog docs updated
Not-tested: Automatic evaluate chaining and retrieval quality on the completed 200-ref index

authored 2026-06-02 23:36:44 +0800

capture the first real-path post-fix reference checkpoint · 41c4d7cc ...

Constraint: Handoff must reflect fresh observable evidence before restart and avoid staging data artifacts
Rejected: Wait for full reference completion | User asked for immediate delivery package and current checkpoint is already a meaningful stage transition
Confidence: high
Scope-risk: narrow
Directive: Treat session 19709 and /tmp/fma_realpath_small_rerun_index2 as the primary continuation path until final reference artifacts or a new traceback appear
Tested: Verified chromaprint 200/200 complete, reference_progress.json 25/200 checkpoint, partial reference numpy artifacts, and updated handoff/changelog files
Not-tested: Full reference completion and downstream evaluate stage on the active rerun

authored 2026-06-02 23:32:09 +0800

Prevent a single bad MP3 from collapsing the whole build-index pipeline · 707449b8 ...

Constraint: Real-path investigation exposed decode failures from mpg123/librosa on some MP3s during long index runs
Rejected: Abort the entire job on first decode error | it turns one bad asset into total index failure
Confidence: high
Scope-risk: narrow
Directive: Keep per-file skip logging and skipped_refs accounting while continuing the real-path root-cause run
Tested: Verified /tmp/chroma_skip_repro with 1 good MP3 + 1 bad MP3 completes RC=0, logs skip decode failure, writes reference outputs, and records skipped_refs=1
Not-tested: Full real-path FMA rerun after tolerance change is still pending

authored 2026-06-02 23:23:10 +0800
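
The per-file tolerance described in the commit above (skip the bad asset, log it, keep a skipped_refs count) follows a common pattern, sketched here. The function names and the fake decoder are hypothetical stand-ins; the real pipeline decodes MP3s through mpg123/librosa, which this example deliberately does not call.

```python
def build_reference_index(paths, decode):
    """Decode every file, skipping failures instead of aborting the whole run."""
    embeddings = {}
    skipped_refs = 0
    for path in paths:
        try:
            embeddings[path] = decode(path)
        except Exception as exc:
            # One bad asset costs one reference, not the entire index.
            skipped_refs += 1
            print(f"skip decode failure: {path}: {exc}", flush=True)
    return embeddings, skipped_refs


def fake_decode(path):
    """Stand-in decoder: fails on one file to imitate a corrupt MP3."""
    if path.endswith("bad.mp3"):
        raise ValueError("corrupt frame header")
    return [0.0, 1.0]


refs, skipped_refs = build_reference_index(["good.mp3", "bad.mp3"], fake_decode)
print(sorted(refs), skipped_refs)
```

Returning the skip count alongside the outputs lets a caller decide how many skipped references are acceptable before treating the run as failed.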

Make build-index failures observable so root-cause analysis can proceed from real logs · 6ece1fa7 ...

Constraint: The live build-index investigation was blocked by stdout/stderr buffering that left log files at 0 bytes during long runs
Rejected: Keep diagnosing from progress files alone | they do not preserve traceback or stage-transition context
Confidence: high
Scope-risk: narrow
Directive: Preserve flush-on-progress behavior while chasing the remaining real-path build-index root cause
Tested: Verified tiny repro /tmp/chroma_repro_tiny12 writes live logs and traceback with RC=1 after flush=True change
Not-tested: No final fix for the real-path build-index exit yet

authored 2026-06-02 23:20:04 +0800
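
The flush-on-progress behavior mentioned in the commit above amounts to passing `flush=True` to `print`, so that redirected log files receive each line as it is written rather than when a buffer fills. A minimal sketch, with a hypothetical `log_progress` helper and an in-memory stream used only so the output can be inspected:

```python
import io
import sys


def log_progress(done: int, total: int, stream=sys.stdout) -> None:
    """Write one progress line and push it to the underlying file immediately."""
    print(f"progress {done}/{total}", file=stream, flush=True)


# An in-memory stream stands in for a redirected log file.
buffer = io.StringIO()
for done in range(1, 4):
    log_progress(done, 3, stream=buffer)

captured = buffer.getvalue()
print(captured, end="")
```

Running the interpreter with `python -u` or setting `PYTHONUNBUFFERED=1` has a similar effect process-wide, without touching individual `print` calls.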

Capture the unexpected build-index exit so the next session starts from failure analysis · 7bb69662 ...

Constraint: Both observable and legacy build-index jobs exited without producing reference_* or evaluate artifacts
Rejected: Keep treating the run as slow linear progress | no longer matches the fresh ps/pgrep evidence
Confidence: high
Scope-risk: narrow
Directive: Start the next cycle with build-index exit-path diagnosis before launching more long runs
Tested: Verified ps/pgrep show no active build/evaluate process; verified observable directory still only has chromaprint progress/cache files; reviewed updated handoff docs
Not-tested: No root-cause reproduction or fix yet

authored 2026-06-02 23:12:15 +0800