Commits · 87e8ac062b7bcff09c7b6ea0581c390209e0cbf2 · wanghai-tech / hikoon-ACR

02 Jun, 2026 · 40 commits

Document that the real FMA transfer has passed the two-thirds mark under stable guard · 87e8ac06 ...

Constraint: Long-running progress evidence should keep proving both transfer health and guard durability until the gate opens.
Rejected: Waiting silently for completion | The user asked for continuous optimization and verifiable staged updates.
Confidence: high
Scope-risk: narrow
Directive: Keep recording material progress jumps and guard liveness rather than emitting redundant no-change updates.
Tested: Verified the detached guard was still alive after more than three minutes, confirmed log growth through seven polling cycles, re-ran archive inspect, and confirmed readiness remains blocked only by incomplete bytes.
Not-tested: Extraction and real-data smoke still await full archive completion.

authored 2026-06-02 14:10:33 +0800

Keep recording that the detached FMA guard remains stable over longer intervals · a41e509e ...

Constraint: The real-data lane depends on confidence that the unattended guard will survive for the rest of the download, not just a short sample window.
Rejected: Declaring the guard fully solved after the prior check | More elapsed time and more cycles give stronger operational proof.
Confidence: high
Scope-risk: narrow
Directive: Continue pairing guard-runtime evidence with readiness checks until the archive completes and phase transition occurs.
Tested: Verified the detached guard was still alive after more than two minutes, observed log growth through five polling cycles, re-ran archive inspect, and confirmed readiness is still blocked only by incomplete bytes.
Not-tested: The completed handoff through extraction and smoke is still pending archive completion.

authored 2026-06-02 14:09:32 +0800

Preserve proof that the detached FMA guard is now staying alive longer · 933a9fb9 ...

Constraint: We need evidence that the new guard launcher solved the earlier drop behavior before trusting it for the rest of the transfer.
Rejected: Assuming success from one or two polls alone | A longer runtime and multiple cycles provide stronger evidence.
Confidence: high
Scope-risk: narrow
Directive: Keep verifying guard liveness alongside archive progress until readiness opens and the pipeline can switch phases.
Tested: Checked the detached guard's pid/runtime, confirmed three logged polling cycles, re-ran archive inspect, and confirmed the readiness gate is still blocked only by incomplete bytes.
Not-tested: Extraction and real-data smoke remain pending until the archive reaches full size.

authored 2026-06-02 14:08:37 +0800

Harden the FMA waiter by launching it as a real detached guard · d206f2c9 ...

Constraint: A multi-hour download needs a background guard that survives shell teardown, not just a logically correct polling loop.
Rejected: More ad-hoc nohup restarts | They obscured whether the issue was loop logic or process detachment.
Confidence: high
Scope-risk: narrow
Directive: Use the guard launcher for future unattended waits and keep pid/log artifacts so later sessions can verify liveness quickly.
Tested: Ran a foreground three-cycle control experiment, launched the new setsid-based guard, then verified the detached process survived with PPID 1 and emitted at least two polling cycles in the log.
Not-tested: Full handoff through completed download, extraction, and smoke still awaits archive completion.

authored 2026-06-02 14:07:53 +0800
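
The detachment idea in this commit can be sketched in Python. This is a minimal illustration, not the repository's guard launcher: `launch_detached`, `is_alive`, and the pid/log paths are hypothetical names. It assumes a POSIX host, where `start_new_session=True` makes the child call `setsid()` before exec.

```python
import os
import subprocess


def launch_detached(cmd, pid_path, log_path):
    """Start cmd in its own session and record its pid (hypothetical helper)."""
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no controlling-terminal input
            stdout=log,                 # keep a log artifact for later sessions
            stderr=subprocess.STDOUT,
            start_new_session=True,     # child calls setsid() before exec
        )
    with open(pid_path, "w") as fh:
        fh.write(str(proc.pid))
    return proc.pid


def is_alive(pid):
    """Return True if a process with this pid exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A later session can read the pid file and call `is_alive` to verify liveness quickly. The child is typically reparented to PID 1 only after the launching process exits; on hosts with a subreaper the new parent may differ.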

Record that the durable waiter still needs another stability pass · 847ac44d ...

Constraint: The real-data lane still needs a reliable unattended handoff process, and fresh evidence now shows the first durability fix was incomplete.
Rejected: Treating the restarted waiter as fully solved | The second drop proves more diagnosis is required.
Confidence: medium
Scope-risk: narrow
Directive: Investigate why the waiter exits after the first logged poll instead of assuming the infinite-loop change alone solved stability.
Tested: Re-checked archive progress, confirmed the waiter process was absent, inspected the single-entry log file, and restarted the waiter successfully.
Not-tested: Root-cause isolation for the second waiter drop remains pending.

authored 2026-06-02 14:06:15 +0800

Make the FMA waiter durable enough for a real multi-hour transfer · 31194789 ...

Constraint: The real dataset download lasts far longer than the waiter's original three-cycle lifetime, so the handoff process must survive unattended.
Rejected: Repeatedly restarting a short-lived waiter by hand | That is fragile and defeats the point of automation.
Confidence: high
Scope-risk: narrow
Directive: Keep the waiter long-lived by default and preserve progress logs so future sessions can see active polling immediately.
Tested: Diagnosed the original max-cycles behavior, ran a short two-cycle verification showing archive growth, then relaunched the long-lived waiter and confirmed live process plus log output.
Not-tested: The completed handoff path from full archive to extraction has not fired yet because the download is still in progress.

authored 2026-06-02 14:05:18 +0800

Recover the FMA post-download waiter after detecting it had dropped · be2b3326 ...

Constraint: The real-data lane should not rely on a dead background handoff process while a long download is still in flight.
Rejected: Assuming the prior waiter was still alive | A direct process check showed it was gone.
Confidence: high
Scope-risk: narrow
Directive: Re-check waiter liveness during subsequent progress audits and restart it whenever it drops before archive completion.
Tested: Re-ran archive inspect, verified the waiter was absent, confirmed the empty log file, restarted the waiter, and validated the new live process.
Not-tested: The restarted waiter has not yet handed off to extraction because the archive remains incomplete.

authored 2026-06-02 14:04:08 +0800

Keep the real FMA lane moving by arming an automatic post-download waiter · ec7a8bd7 ...

Constraint: The dataset gate is long-running, so progress should continue without manual babysitting once the archive finishes.
Rejected: Pure polling without a handoff process | That would still require manual intervention at completion time.
Confidence: high
Scope-risk: narrow
Directive: Leave the waiter in place until it hands off to post-download preparation, then capture the resulting extraction evidence.
Tested: Re-ran archive inspect, confirmed no prior waiter, started wait_for_fma_and_prepare in the background, and verified the live process plus log file.
Not-tested: The waiter has not yet reached extraction because the archive is still incomplete.

authored 2026-06-02 14:03:10 +0800

Capture fresh proof that the FMA transfer and watchdog remain healthy · 24512752 ...

Constraint: The active Ralph loop needs current operational evidence while the dataset gate is still waiting on download completion.
Rejected: Relying on byte growth alone | We also need process-level proof that the transfer path is still alive.
Confidence: high
Scope-risk: narrow
Directive: Keep validating both archive growth and transfer liveness until readiness opens, then switch to extraction immediately.
Tested: Re-ran inspect, watchdog, and process checks; all confirmed higher byte counts, a live curl process, and no restart needed.
Not-tested: Real-data extraction and smoke remain blocked by the incomplete archive.

authored 2026-06-02 14:02:23 +0800

Prove the real FMA post-download gate is still not open · 2fe32034 ...

Constraint: We need script-backed evidence for whether the pipeline can advance beyond download waiting.
Rejected: Assuming the next phase is ready from percentage alone | Readiness must be validated by the post-download gate script.
Confidence: high
Scope-risk: narrow
Directive: Use the readiness script, not only byte counts, before switching to extraction and smoke.
Tested: Re-ran archive inspect and the post-download readiness check, which confirmed progress growth but a still-blocked archive_not_complete gate.
Not-tested: Extraction and smoke remain deferred until the readiness script reports completion.

authored 2026-06-02 14:01:48 +0800
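
A script-backed gate of this kind might look like the sketch below. Only the `archive_not_complete` reason comes from the commit message; the function name, the other dictionary keys, and the byte-size comparison are assumptions, and the real readiness script may check more than size.

```python
import os


def postdownload_gate(archive_path, expected_bytes):
    """Report whether the archive is large enough to start extraction."""
    # A missing file counts as zero bytes so the gate stays closed.
    actual = os.path.getsize(archive_path) if os.path.exists(archive_path) else 0
    ready = actual >= expected_bytes
    return {
        "ready": ready,
        "reason": None if ready else "archive_not_complete",
        "actual_bytes": actual,
        "expected_bytes": expected_bytes,
    }
```

Returning a reason code rather than a bare boolean lets later sessions confirm that the gate is blocked only by incomplete bytes.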

Preserve fresh evidence that the real FMA transfer is still advancing · 9d4b0cd7 ...

Constraint: Ralph requires new verification evidence while the real-data gate remains unresolved.
Rejected: Repeating the prior status without a fresh measurement | It would not prove continued forward progress.
Confidence: high
Scope-risk: narrow
Directive: Keep recording byte-level progress until the archive completes, then switch immediately to extraction and smoke validation.
Tested: Re-ran inspect and watchdog checks, confirming higher byte counts and a live curl process without restart.
Not-tested: Extraction and real-data smoke remain blocked on archive completion.

authored 2026-06-02 14:01:17 +0800

Record the live FMA download gate before real-data validation · 55dea0c9 ...

Constraint: Real-data smoke cannot be claimed before the user-provided archive is fully downloaded and locally inspectable.
Rejected: Pretending readiness from partial bytes | That would create false verification evidence for the dataset lane.
Confidence: high
Scope-risk: narrow
Directive: Do not run real FMA extraction or smoke until inspect reports the full expected archive size.
Tested: Re-ran the archive inspect command and confirmed the active background curl process plus current local file size.
Not-tested: Extraction, local preparation, and real FMA smoke remain pending until the archive completes.

authored 2026-06-02 14:00:42 +0800

Clarify how audio becomes trainable and queryable data · a4c891da ...

Constraint: The guidance had to align with the repo's existing manifest and pgvector templates while staying usable for later industrial ingestion.
Rejected: A purely conceptual note | It would not be actionable for future sessions or data engineering work.
Confidence: high
Scope-risk: narrow
Directive: Keep future dataset onboarding and pgvector ingestion changes anchored on manifest-first contracts and stable song identifiers.
Tested: Relative markdown links for the updated docs were validated locally and repository anchor files were confirmed present.
Not-tested: No model retraining or database ingestion was run in this documentation-only stage.

authored 2026-06-02 14:00:01 +0800

Let future sessions wait on archive completion without manual polling · d1e1a2b7 ...

Constraint: The real FMA archive still needs time, but once it finishes the workflow should transition into extraction and readiness with minimal operator attention
Rejected: Keep completion detection as an entirely manual loop | Wastes attention and slows handoff at the exact moment the archive becomes useful
Confidence: high
Scope-risk: narrow
Directive: Use wait_for_fma_and_prepare.py as the passive bridge from long-running download to active dataset onboarding whenever unattended waiting is acceptable
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/wait_for_fma_and_prepare.py; /usr/local/miniconda3/bin/python acr-engine/scripts/wait_for_fma_and_prepare.py --interval 2 --max-cycles 2
Not-tested: The completed-path handoff into extraction remains pending full archive completion

authored 2026-06-02 13:55:47 +0800
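
The passive bridge described here can be approximated with a bounded polling loop. This is a sketch under assumptions, not the contents of wait_for_fma_and_prepare.py: the function name, the `on_complete` callback, and the size-based completion test are hypothetical, while `interval` and `max_cycles` mirror the `--interval` and `--max-cycles` flags from the Tested line.

```python
import os
import time


def wait_for_archive(path, expected_bytes, interval=2.0, max_cycles=None,
                     on_complete=None, sleep=time.sleep):
    """Poll until path reaches expected_bytes, then call on_complete once.

    max_cycles=None polls forever; a number bounds the loop for quick checks.
    Returns True if the archive completed, False if the cycle budget ran out.
    """
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        cycle += 1
        size = os.path.getsize(path) if os.path.exists(path) else 0
        # One log line per cycle so later sessions can see active polling.
        print(f"cycle={cycle} bytes={size}/{expected_bytes}", flush=True)
        if size >= expected_bytes:
            if on_complete is not None:
                on_complete(path)
            return True
        sleep(interval)
    return False
```

With `max_cycles=2` the loop behaves like the short verification run; leaving it at `None` gives the long-lived behavior a multi-hour transfer needs.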

Reduce the handoff gap between archive completion and real-data readiness · 46b9d8d4 ...

Constraint: Once the large FMA archive finishes, future sessions should not need to manually stitch extraction and readiness checks together
Rejected: Leave post-download steps as manual shell sequences | Increases delay and error risk at the most valuable transition point
Confidence: high
Scope-risk: narrow
Directive: Keep fma_postdownload_ready.py as the canonical first command after archive completion before attempting real-data smoke runs
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/fma_postdownload_ready.py; /usr/local/miniconda3/bin/python acr-engine/scripts/fma_postdownload_ready.py
Not-tested: Successful extract and readiness on the full archive remain pending completion of the download

authored 2026-06-02 13:54:06 +0800

Bridge pgvector exports toward actual PostgreSQL bulk ingestion · 44bbfcb5 ...

Constraint: Schema and manifest-export templates are useful, but practical adoption still needs an explicit handoff into database load order and SQL shapes
Rejected: Stop at export JSON only | Leaves later sessions to redesign the bulk-ingest bridge from scratch
Confidence: high
Scope-risk: narrow
Directive: Keep bulk-load templates declarative until a real database target is available, then add a live loader without changing manifest semantics
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/pgvector_bulk_load_template.py; /usr/local/miniconda3/bin/python acr-engine/scripts/pgvector_bulk_load_template.py --input acr-engine/reports/pgvector_manifest_export_test.json --output acr-engine/reports/pgvector_bulk_load_plan_test.json
Not-tested: Live PostgreSQL execution remains pending a database environment

authored 2026-06-02 13:51:37 +0800

Turn pgvector planning into repo-native ingestion templates · 528cc473 ...

Constraint: The user needs concrete downstream data handling guidance now, and future vector retrieval work should not start from abstract docs alone
Rejected: Leave pgvector support at prose-only guidance | Delays integration by forcing later sessions to reinvent schema and export bridges
Confidence: high
Scope-risk: narrow
Directive: Keep schema/export templates aligned with actual manifest semantics before adding live database loaders
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/export_manifest_to_pgvector_json.py; /usr/local/miniconda3/bin/python acr-engine/scripts/export_manifest_to_pgvector_json.py --data acr-engine/data/synthetic_v2 --split test --source-dataset synthetic_v2 --output acr-engine/reports/pgvector_manifest_export_test.json
Not-tested: Live PostgreSQL/pgvector ingestion remains pending a real database target

authored 2026-06-02 13:50:05 +0800

Make data onboarding and long FMA transfer supervision easier to sustain · d6d67893 ...

Constraint: The user needs detailed data-format guidance now, while the real FMA archive transfer still requires durable hands-off supervision across long sessions
Rejected: Treat documentation and download-watch work as separate later tasks | Would leave either user guidance or transfer resilience lagging behind active development
Confidence: high
Scope-risk: narrow
Directive: Keep the new training-data/pgvector guide aligned with actual manifest fields and use watch_fma_download.py as the first-line long-transfer watchdog
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/watch_fma_download.py; /usr/local/miniconda3/bin/python acr-engine/scripts/watch_fma_download.py --cycles 2 --interval 2; /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py inspect
Not-tested: Full archive completion, extraction, and real-data smoke remain pending

authored 2026-06-02 13:47:54 +0800

Recover stalled real-dataset transfer with a durable background resume path · 83a3f89f ...

Constraint: Long FMA archive downloads cannot rely on fragile foreground execution if Ralph-style work must continue across sessions
Rejected: Keep manually reissuing foreground download commands after stalls | Increases interruption risk and weakens resumability evidence
Confidence: high
Scope-risk: narrow
Directive: Prefer prepare_fma_archive.py bg-download for future large archive recovery so PID and log evidence remain standardized
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py bg-download; /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py inspect; tail -n 40 /tmp/fma_modelscope_download.log
Not-tested: Full archive completion, extraction, and real-data smoke remain pending

authored 2026-06-02 13:43:23 +0800

Make long-running FMA archive progress legible at a glance · 730d9b90 ...

Constraint: Multi-session continuation gets brittle when large real-data downloads require manual byte math to estimate progress
Rejected: Leave inspect output as raw archive size only | Forces every future session to recalculate completion state by hand
Confidence: high
Scope-risk: narrow
Directive: Keep progress fields stable so handoff tooling and humans can rely on them during long archive transfers
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/prepare_fma_archive.py; /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py inspect
Not-tested: Completion of the full archive and downstream extraction remain pending

authored 2026-06-02 13:41:25 +0800
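
The byte math this commit removes from the operator's hands is small enough to sketch. The field names below are illustrative assumptions, not the actual keys emitted by `prepare_fma_archive.py inspect`.

```python
def progress_fields(actual_bytes, expected_bytes):
    """Derive stable progress fields from the local and expected sizes."""
    if expected_bytes <= 0:
        raise ValueError("expected_bytes must be positive")
    done = min(actual_bytes, expected_bytes)  # clamp so percent never exceeds 100
    return {
        "actual_bytes": actual_bytes,
        "expected_bytes": expected_bytes,
        "remaining_bytes": expected_bytes - done,
        "percent": round(100.0 * done / expected_bytes, 2),
        "complete": actual_bytes >= expected_bytes,
    }
```

For example, `progress_fields(250, 1000)` reports 25.0 percent with 750 bytes remaining. Keeping the keys fixed is what lets handoff tooling depend on them across sessions.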

Align real FMA ingestion with the user-provided ModelScope source · d1d7a512 ...

Constraint: The user supplied a verified archive URL that is a better current source of truth than the previously tested mirror path
Rejected: Keep the older archive URL as the default control surface | Would ignore fresher user evidence and split operational guidance across sources
Confidence: high
Scope-risk: narrow
Directive: Treat the ModelScope FMA archive URL as the primary default until a newer verified source supersedes it
Tested: curl -I -L --max-time 60 https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip; curl -L --range 0-1023 --max-time 60 -o /tmp/fma_modelscope_probe.bin https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip; /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py inspect
Not-tested: Full archive completion, extraction, and downstream real-data smoke remain pending

authored 2026-06-02 13:40:10 +0800
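
The `curl --range 0-1023` probe in the Tested line has a rough standard-library equivalent, sketched below. `build_range_probe` and `probe` are hypothetical helpers; the sketch assumes the server honors `Range` requests and caps the read at 1024 bytes in case it does not.

```python
import urllib.request


def build_range_probe(url, first=0, last=1023):
    """Build a GET request asking only for bytes first..last (inclusive)."""
    return urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})


def probe(url, timeout=60):
    """Fetch at most the first KiB and report what the server returned."""
    with urllib.request.urlopen(build_range_probe(url), timeout=timeout) as resp:
        body = resp.read(1024)  # cap the read even if the server ignores Range
        return {
            "status": resp.status,  # 206 when the range was honored
            "content_range": resp.headers.get("Content-Range"),
            "bytes_read": len(body),
        }
```

A 206 status with a `Content-Range` header suggests the source supports resumable transfers, which is what a background resume path relies on.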

Add an HTTP-level regression path for the local ACR service · 2ee3e829 ...

Constraint: A service intended for industrialization needs a real process-level smoke test, not only direct function imports
Rejected: Rely on unit-style handler calls alone | Misses uvicorn startup and actual HTTP surface regressions
Confidence: high
Scope-risk: narrow
Directive: Keep service_smoke.py lightweight and dependency-free so it remains the fastest operational gate before broader API expansion
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/service_smoke.py; /usr/local/miniconda3/bin/python acr-engine/scripts/service_smoke.py
Not-tested: /recognize and /index/build over HTTP remain pending dedicated API smoke inputs

authored 2026-06-02 13:38:33 +0800

Expose service readiness and cache state before scaling the API surface · aa6e1583 ...

Constraint: Industrializing the service path requires visibility into model/index availability and repeated-load behavior before adding heavier production features
Rejected: Keep stateless per-request loading until later | Hides readiness problems and wastes time on repeated engine initialization
Confidence: high
Scope-risk: narrow
Directive: Preserve /ready and /cache as low-cost operational probes even if the serving stack evolves behind them
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/service/app.py; /usr/local/miniconda3/bin/python /tmp/test_service_readiness.py; /usr/local/miniconda3/bin/python /tmp/test_service_cache.py
Not-tested: Live FastAPI HTTP serving and concurrent request behavior remain pending

authored 2026-06-02 13:36:50 +0800

Make long-running real FMA ingestion resumable across sessions · 2b389caa ...

Constraint: The verified FMA archive is multi-gigabyte and downloads slowly, so the workflow must remain inspectable and resumable before extraction can happen
Rejected: Depend on ad hoc curl and unzip commands only | Makes long-running handoff and recovery brittle during Ralph-style continuous execution
Confidence: high
Scope-risk: narrow
Directive: Keep official FMA archive acquisition centered on prepare_fma_archive.py so future sessions share one resumable control surface
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/prepare_fma_archive.py; /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py inspect; unzip -v | head -n 2
Not-tested: Archive extraction and real-data smoke remain pending completion of the full fma_small.zip download

authored 2026-06-02 13:34:32 +0800

Lock in a stable official route for real FMA archive acquisition · 7c54eb28 ...

Constraint: Real-data progress was blocked until we could prove an upstream archive path that still works today
Rejected: Continue iterating on historical per-track URLs | Those paths already proved unstable via 403 and 404 evidence
Confidence: high
Scope-risk: narrow
Directive: Prefer the verified fma_small.zip archive route over legacy page or single-track scraping paths unless upstream changes again
Tested: curl -I -L --max-time 60 https://os.unil.cloud.switch.ch/fma/fma_small.zip; curl -L --range 0-1023 --max-time 60 -o /tmp/fma_small_probe.bin https://os.unil.cloud.switch.ch/fma/fma_small.zip
Not-tested: Full 7.68 GB archive download, extraction, and smoke execution remain pending

authored 2026-06-02 13:32:33 +0800

Separate local tooling issues from upstream FMA URL breakage · 7ea9b1d0 ...

Constraint: Real-data progress requires proving whether failures come from our environment or from changed upstream access paths
Rejected: Keep treating the fetch blocker as a missing-tool problem | Would misdirect future debugging after yt-dlp module support was verified
Confidence: high
Scope-risk: narrow
Directive: Do not retry historical FMA page URLs again unless a fresh source confirms their return; pivot to official archives or stable mirrors instead
Tested: which yt-dlp || true; /usr/local/miniconda3/bin/python -m yt_dlp --version; /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/fetch_fma_subset.py; /usr/local/miniconda3/bin/python acr-engine/scripts/fetch_fma_subset.py --report acr-engine/reports/fma_fetch_subset_report.json
Not-tested: Successful real FMA download still pending a valid upstream archive or mirror URL

authored 2026-06-02 13:31:36 +0800

Preserve the first verified real-FMA download path and blocker evidence · b32e002b ...

Constraint: Continuous dataset landing work needs concrete failed-path evidence so future sessions do not restart from outdated assumptions
Rejected: Omit the failed download automation because it did not complete | Loses reproducible evidence about the current 403 and missing-tool barriers
Confidence: high
Scope-risk: narrow
Directive: Replace this bounded fetch path only after verifying a stable official archive or mirror-based download route
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/fetch_fma_subset.py; /usr/local/miniconda3/bin/python acr-engine/scripts/fetch_fma_subset.py --report acr-engine/reports/fma_fetch_subset_report.json
Not-tested: Successful real FMA audio download remains blocked by current upstream/tooling availability

authored 2026-06-02 13:25:17 +0800

Keep raw open-music assets out of normal git history · 2d501547 ...

Constraint: The user wants real datasets added locally and potentially pushed, which would make ordinary git history fragile without LFS guardrails
Rejected: Download first and retrofit tracking later | Risks oversized commits and inconsistent reproducibility rules
Confidence: high
Scope-risk: narrow
Directive: Route all future raw corpus archives and audio under acr-engine/data/raw through LFS unless a smaller manifest-only alternative is explicitly chosen
Tested: git lfs version; git check-attr filter -- acr-engine/data/raw/fma_small_audio/example.wav; git check-attr filter -- acr-engine/data/raw/archive.zip
Not-tested: Actual large-file add/push against remote LFS storage remains pending until real dataset files are downloaded

authored 2026-06-02 13:23:04 +0800
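
The `git check-attr filter` checks in the Tested line can be automated with a small parser. This is a sketch: the helper names are hypothetical, and it assumes the usual `path: attribute: value` output shape, where an LFS-tracked path reports `filter: lfs`.

```python
import subprocess


def parse_check_attr(line):
    """Split one 'path: attr: value' line from `git check-attr` output."""
    # rsplit keeps any ': ' inside the path itself intact.
    path, attr, value = line.rstrip("\n").rsplit(": ", 2)
    return path, attr, value


def is_lfs_tracked(line):
    """True when the line reports that the path uses the lfs filter."""
    _, attr, value = parse_check_attr(line)
    return attr == "filter" and value == "lfs"


def untracked_paths(paths):
    """Run git check-attr and return the paths NOT routed through LFS."""
    out = subprocess.run(
        ["git", "check-attr", "filter", "--", *paths],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_check_attr(l)[0] for l in out.splitlines() if not is_lfs_tracked(l)]
```

Running `untracked_paths` on candidate files before `git add` gives an early warning that a raw archive would land in ordinary history.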

Prevent empty local dataset folders from masquerading as smoke-ready · f2360135 ...

Constraint: Real-data validation now depends on user-requested local corpus drop zones that may exist before they contain any audio
Rejected: Let smoke-local fail deep inside training | Produces slower and less actionable feedback for continuous sessions
Confidence: high
Scope-risk: narrow
Directive: Keep readiness thresholds aligned with the minimum viable query split assumptions before expanding real-data automation
Tested: /usr/local/miniconda3/bin/python -m py_compile src/data/external_adapters.py scripts/status_snapshot.py; /usr/local/miniconda3/bin/python src/data/external_adapters.py check-local-ready fma data/raw/fma_small_audio --eval-ratio 0.2 --query-duration 8.0; /usr/local/miniconda3/bin/python src/data/external_adapters.py check-local-ready mtg_jamendo data/raw/mtg_jamendo_audio --eval-ratio 0.2 --query-duration 8.0; /usr/local/miniconda3/bin/python scripts/status_snapshot.py --output .omx/latest_status_snapshot.json
Not-tested: Full smoke-local on real FMA or MTG-Jamendo remains blocked until audio is actually downloaded

authored 2026-06-02 13:21:58 +0800
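
An early readiness check of this kind can be sketched as follows. The function, the suffix set, the reason strings, and the minimum-track rule are all assumptions made for illustration; the real `check-local-ready` subcommand also takes `--query-duration`, which this sketch ignores.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".ogg"}  # assumed audio types


def check_local_ready(audio_dir, eval_ratio=0.2, min_eval_tracks=1):
    """Fail fast when a drop zone exists but cannot fill an eval split."""
    root = Path(audio_dir)
    tracks = []
    if root.is_dir():
        tracks = [p for p in root.rglob("*")
                  if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES]
    eval_tracks = int(len(tracks) * eval_ratio)  # tracks left for the query split
    ready = eval_tracks >= min_eval_tracks
    if ready:
        reason = None
    elif not tracks:
        reason = "no_audio_found"
    else:
        reason = "too_few_tracks_for_eval_split"
    return {"ready": ready, "reason": reason,
            "track_count": len(tracks), "eval_tracks": eval_tracks}
```

Checking the directory up front gives an immediate, named failure instead of an error deep inside training.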

Clarify the project's true readiness before new development · 18ba8663 ...

Constraint: Ongoing Ralph-style handoff requires new sessions to distinguish finished capability from smoke-only scaffolding quickly
Rejected: Leave capability status implicit in scattered docs | Increases onboarding ambiguity and status misreads
Confidence: high
Scope-risk: narrow
Directive: Update this map whenever a smoke path becomes real-data validated or a regression invalidates a claimed capability
Tested: Verified docs/current-capability-map.md exists and is linked from docs/README.md and docs/session-handoff.md
Not-tested: Semantic accuracy against future real-dataset runs remains pending

authored 2026-06-02 13:17:24 +0800

Persist the repo status snapshot for faster resumability · ce726bf1 ...

Constraint: Future sessions benefit from a saved machine-readable snapshot, not just on-demand script output
Rejected: Keep snapshot stdout-only | Makes handoff less durable and harder to automate across sessions
Confidence: high
Scope-risk: narrow
Directive: Refresh .omx/latest_status_snapshot.json whenever the default docs, smoke paths, or next-step commands materially change
Tested: /usr/local/miniconda3/bin/python scripts/status_snapshot.py --output .omx/latest_status_snapshot.json; JSON parse check for latest_commit and next_commands
Not-tested: External automation consuming the saved snapshot over multiple sessions

authored 2026-06-02 13:15:10 +0800

Add a repo status snapshot for faster session handoff · 15c1aa6b ...

Constraint: Future sessions need a quick machine-readable summary of the verified repo state and next commands
Rejected: Depend on manual reconstruction from docs and git history alone | Slower and more error-prone during handoff
Confidence: high
Scope-risk: narrow
Directive: Keep the snapshot script aligned with the real default docs, drop zones, smoke outputs, and next-step commands
Tested: /usr/local/miniconda3/bin/python scripts/status_snapshot.py
Not-tested: Consumption of the snapshot by external automation beyond manual review

authored 2026-06-02 13:12:57 +0800

Add a first-run checklist for future sessions · 93b2a506 ...

Constraint: New sessions need a minimal startup checklist so they can verify repo health and resume development quickly
Rejected: Keep startup knowledge implicit in long docs only | Increases ramp-up time and the chance of missing key checks
Confidence: high
Scope-risk: narrow
Directive: Update this checklist whenever the default startup workflow or open-dataset commands materially change
Tested: existence checks for acr-engine/FIRST_RUN_CHECKLIST.md, docs/README.md, docs/session-handoff.md, plus docs link-presence checks
Not-tested: Human walkthrough of the full checklist from a fresh shell

authored 2026-06-02 13:10:51 +0800

Add a detailed handoff doc for future development sessions · 5679b5d6 ...

Constraint: New sessions need a fast, durable understanding of the project state, open-dataset workflow, verified evidence, and next steps
Rejected: Rely on scattered docs and git history alone | Too slow for session handoff and easy to miss critical workflow context
Confidence: high
Scope-risk: narrow
Directive: Keep this handoff doc updated whenever a major workflow milestone or verified capability changes
Tested: existence checks for docs/session-handoff.md and docs/README.md, plus docs index link presence
Not-tested: Manual human review across multiple markdown renderers

authored 2026-06-02 13:09:32 +0800

Add explicit drop zones for real open-music corpora · d2218523 ...

Constraint: Replacing the synthetic stand-in with real FMA or MTG-Jamendo data should not require users to infer directory structure
Rejected: Leave only generic workflow text | Still forces users to guess where local audio should live before smoke runs
Confidence: high
Scope-risk: narrow
Directive: Keep future real-corpus onboarding anchored to data/raw drop zones and smoke-local commands
Tested: filesystem existence checks for acr-engine/data/raw/fma_small_audio, acr-engine/data/raw/mtg_jamendo_audio, acr-engine/data/raw/README.md, docs/README.md, docs/open-dataset-workflow.md, acr-engine/data/external_ingested/README.md
Not-tested: Real downloaded audio placed into the new drop zones

authored 2026-06-02 13:07:36 +0800

Automate the full open-dataset smoke workflow behind one command · eee15aca ...

Constraint: Real FMA or MTG-Jamendo onboarding should require only an input directory change, not a long manual command chain
Rejected: Keep the smoke steps as separate commands only | Slows repeated validation and increases operator error risk
Confidence: high
Scope-risk: moderate
Directive: Use smoke-local as the default first-pass validation path for every new local open-music corpus
Tested: /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/synthetic_v2/songs --output-root data/external_smoke --eval-ratio 0.2 --query-duration 5.0 --train-epochs 1 --batch-size 2; /usr/local/miniconda3/bin/python -m py_compile src/data/external_adapters.py src/data/manifest_tools.py train.py run_demo.py evaluate.py scripts/generate_artifacts.py
Not-tested: Real downloaded FMA or MTG-Jamendo directories on larger-scale smoke runs

authored 2026-06-02 13:05:01 +0800

Generate release artifacts for the open-dataset smoke path · 87959076 ...

Constraint: Open-dataset workflow needed the same reporting/release outputs as the synthetic baseline to be operationally useful
Rejected: Treat open-data smoke as a one-off test only | Leaves no reusable benchmark or release documentation trail
Confidence: high
Scope-risk: narrow
Directive: Every future real-dataset smoke run should emit eval JSON plus artifact bundle in the same directory
Tested: /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/open-smoke-fixed/fma/eval.json --config-json reports/open-smoke-fixed/fma/config.json --output-dir reports/open-smoke-fixed/fma --model-version open-smoke-fixed --data-version synthetic_as_open_fixed_fma
Not-tested: Artifact generation on a larger real downloaded corpus with multiple hard-case buckets

authored 2026-06-02 13:01:47 +0800

Close the open-dataset smoke loop through evaluation · dc9ef1b8 ...

Constraint: Open-dataset support was not complete until imported corpora could train, build indexes, and produce eval outputs without manual path surgery
Rejected: Stop at train.py dry-run | Does not prove the retrieval/evaluation half of the workflow actually works
Confidence: high
Scope-risk: moderate
Directive: Keep future external dataset layouts self-contained and manifest-root-aware across training, indexing, and evaluation paths
Tested: /usr/local/miniconda3/bin/python train.py --data data/external_ingested/synthetic_as_open_fixed/fma/manifests --output data/models_open_smoke_fixed --device cpu --epochs 1 --batch-size 2; /usr/local/miniconda3/bin/python run_demo.py build-index --data data/external_ingested/synthetic_as_open_fixed/fma/manifests --model data/models_open_smoke_fixed/best_model.pt --output data/index_open_smoke_fixed --device cpu; /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/synthetic_as_open_fixed/fma/manifests --model data/models_open_smoke_fixed/best_model.pt --index-prefix data/index_open_smoke_fixed/reference --split test --device cpu --fast-eval --output-json reports/open-smoke-fixed/fma/eval.json; /usr/local/miniconda3/bin/python -m py_compile evaluate.py run_demo.py src/engines/ecapa_embedder.py src/engines/chromaprint_matcher.py src/data/dataset.py src/data/manifest_tools.py src/data/external_adapters.py train.py
Not-tested: Real downloaded FMA or MTG-Jamendo corpora at larger scale

authored 2026-06-02 12:59:41 +0800

Make open-dataset manifests trainable end to end · b766c74e ...

Constraint: Open dataset onboarding was incomplete until generated manifests could enter train.py without manual path fixes
Rejected: Keep manifests as ingestion-only artifacts | Fails the actual training handoff and leaves the workflow broken
Confidence: high
Scope-risk: moderate
Directive: Preserve the self-contained output layout (audio plus manifests) for all future external dataset imports
Tested: /usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma data/synthetic_v2/songs --output-root data/external_ingested/synthetic_as_open_fixed --eval-ratio 0.2 --query-duration 5.0; /usr/local/miniconda3/bin/python src/data/external_adapters.py validate-local fma data/external_ingested/synthetic_as_open_fixed/fma/manifests; /usr/local/miniconda3/bin/python train.py --data data/external_ingested/synthetic_as_open_fixed/fma/manifests --output data/models_open_smoke_fixed --device cpu --epochs 1 --batch-size 2 --dry-run; /usr/local/miniconda3/bin/python -m py_compile src/data/dataset.py train.py src/data/manifest_tools.py src/data/external_adapters.py
Not-tested: Full multi-epoch training and index/eval loop on a real downloaded FMA or MTG-Jamendo corpus

authored 2026-06-02 12:53:53 +0800

Add a single-page open dataset workflow for training prep · fa231444 ...

Constraint: Open-dataset onboarding needed one short executable path instead of scattered instructions across many docs
Rejected: Leave ingestion knowledge split across multiple pages only | Raises setup friction before real FMA or MTG-Jamendo training
Confidence: high
Scope-risk: narrow
Directive: Use the single-page workflow as the default operator path before adding more open-dataset sources
Tested: /usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-local fma data/synthetic_v2/songs --eval-ratio 0.2 --query-duration 5.0; /usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma data/synthetic_v2/songs --output-root data/external_ingested/synthetic_as_open --eval-ratio 0.2 --query-duration 5.0; /usr/local/miniconda3/bin/python src/data/external_adapters.py validate-local fma data/external_ingested/synthetic_as_open/fma/manifests
Not-tested: Real FMA or MTG-Jamendo local download directories

authored 2026-06-02 12:50:46 +0800