Add testing workflow guide

沈秋雨
Commit a1c6a382 ... a1c6a38267d1db3e61cc3e9e5c107a927d3c9ca0 authored 2026-04-21 15:50:44 +0800 by 沈秋雨
Showing 2 changed files with 316 additions and 0 deletions
README.md
TESTING_GUIDE.md
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@
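 A standalone WeKnora Ragas evaluation project. It calls only WeKnora's public API and does not depend on WeKnora's built-in `/evaluation` endpoint.
+For the full server-side testing workflow, see [TESTING_GUIDE.md](TESTING_GUIDE.md).
 ## Installation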
 ```bash
--- a/TESTING_GUIDE.md 0 → 100644
+++ b/TESTING_GUIDE.md 0 → 100644
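+# WeKnora Ragas Evaluation Testing Guide
+This guide walks through verifying, from scratch on a server, that the standalone WeKnora Ragas evaluation project runs end to end.
+## 1. Prerequisites
+Confirm that the server meets the following requirements:
+- Python 3.10 or later. Do not use Python 3.6.
+- The WeKnora API is reachable, for example at `http://localhost:9090/api/v1`.
+- vLLM exposes an OpenAI-compatible Chat Completions endpoint, for example `http://localhost:8000/v1`.
+- Infinity exposes an OpenAI-compatible Embeddings endpoint, for example `http://localhost:7997/v1`.
+- Optional: the Infinity reranker endpoint is reachable, for example at `http://localhost:7998/v1`.
+Check the Python version: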
+```bash
+python3 --version
+python3.10 --version
+python3.11 --version
+```
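The same version check can be done from inside the interpreter, which is handy once a virtual environment is active. This is a minimal sketch: the helper name `meets_minimum` is illustrative and not part of the project; only the 3.10 floor comes from the guide.

```python
import sys

# Minimum version stated in the guide; the helper itself is hypothetical.
MIN_VERSION = (3, 10)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

Comparing tuples avoids string comparisons such as `"3.9" > "3.10"`, which give the wrong answer.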
+Python 3.11 is recommended:
+```bash
+cd /data/weknora_ragas
+python3.11 -m venv .venv
+source .venv/bin/activate
+python --version
+pip install -U pip setuptools wheel
+pip install -e ".[pdf]"
+```
+If you only run XLSX or text-based PDF files, you can install just the base dependencies first:
+```bash
+pip install -e .
+```
+## 2. Configure `.env`
+Copy the example file:
+```bash
+cp .env.example .env
+```
+Edit `.env`:
+```bash
+WEKNORA_BASE_URL=http://localhost:9090/api/v1
+WEKNORA_API_KEY=your-weknora-api-key
+WEKNORA_KB_ID=
+WEKNORA_KB_NAME=ragas-eval-pilot
+RAGAS_LLM_API_KEY=EMPTY
+RAGAS_LLM_BASE_URL=http://localhost:8000/v1
+RAGAS_GENERATOR_MODEL=your-vllm-model-id
+RAGAS_JUDGE_MODEL=your-vllm-model-id
+RAGAS_EMBEDDING_API_KEY=EMPTY
+RAGAS_EMBEDDING_BASE_URL=http://localhost:7997/v1
+RAGAS_EMBEDDING_MODEL=your-embedding-model-id
+RAGAS_RERANKER_API_KEY=EMPTY
+RAGAS_RERANKER_BASE_URL=http://localhost:7998/v1
+RAGAS_RERANKER_MODEL=your-reranker-model-id
+TESTSET_SIZE=10
+REQUEST_INTERVAL_SECONDS=0.2
+```
+If the services do not require authentication, still set `RAGAS_*_API_KEY` to `EMPTY` so the OpenAI client does not fail on an empty key.
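Before running any script, it can help to confirm that no required key was left blank. The sketch below parses plain `KEY=VALUE` lines and reports empty ones. The helper names are hypothetical, the choice of which keys count as required is an assumption, and quoted values or `export` prefixes are not handled.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Keys copied from the example above. Treating exactly these as mandatory
# is an assumption, not something the project documents.
REQUIRED_KEYS = [
    "WEKNORA_BASE_URL",
    "WEKNORA_API_KEY",
    "RAGAS_LLM_API_KEY",
    "RAGAS_LLM_BASE_URL",
    "RAGAS_JUDGE_MODEL",
    "RAGAS_EMBEDDING_API_KEY",
    "RAGAS_EMBEDDING_BASE_URL",
    "RAGAS_EMBEDDING_MODEL",
]


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

`WEKNORA_KB_ID` is deliberately left out of the list because the guide allows it to be empty at first.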
+Confirm the model IDs:
+```bash
+curl http://localhost:8000/v1/models
+curl http://localhost:7997/v1/models
+```
+Copy the returned `id` values exactly into `RAGAS_JUDGE_MODEL` and `RAGAS_EMBEDDING_MODEL`.
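To avoid copying IDs by hand from raw JSON, the response can be reduced to a list of IDs. This sketch assumes the OpenAI-style list shape, a top-level `data` array whose entries carry an `id`. The sample payload and model name are placeholders, not output from a real server.

```python
import json


def model_ids(payload):
    """Return the `id` of every entry in an OpenAI-style model list."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


# Illustrative response body; "example-model" is a placeholder name.
sample = json.loads(
    '{"object": "list", "data": [{"id": "example-model", "object": "model"}]}'
)
print(model_ids(sample))
```

In practice the payload would come from the `curl` output above, for example piped into a short script that calls `json.load(sys.stdin)`.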
+## 3. Service connectivity checks
+First check the WeKnora knowledge base:
+```bash
+python scripts/00_create_kb.py
+```
+If `WEKNORA_KB_ID` is empty in `.env`, the script calls:
+```text
+POST /api/v1/knowledge-bases
+{"name": "..."}
+```
+On success, the script writes the new ID back to `.env`.
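The write-back step can be pictured as a small text rewrite. How `00_create_kb.py` actually does it is not shown in this commit, so the function below is only a hypothetical sketch: it replaces an existing `KEY=` line or appends one when the key is absent.

```python
def set_env_value(text, key, value):
    """Return `text` with `key` set to `value`.

    Replaces the first existing `key=...` line, or appends a new line
    when the key is not present. Other lines are left untouched.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == key:
            lines[index] = f"{key}={value}"
            break
    else:
        # No existing line matched, so add the key at the end.
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

For example, `set_env_value(env_text, "WEKNORA_KB_ID", new_id)` keeps every other setting in place and only fills in the knowledge base ID.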
+Then check the model services:
+```bash
+python scripts/00_check_models.py
+```
+The expected output includes:
+```text
+[OK] Generator LLM
+[OK] Judge LLM
+[OK] Embedding
+All configured model services are reachable.
+```
+If a reranker is configured, it is checked as well:
+```text
+[OK] Reranker
+```
+If you are not using the reranker for now, leave `RAGAS_RERANKER_BASE_URL` or `RAGAS_RERANKER_MODEL` empty and the script skips the check.
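The skip rule described above amounts to "check the reranker only when both settings are filled in". A minimal sketch of that condition follows. The function name is hypothetical and the real logic in `00_check_models.py` may differ.

```python
def reranker_enabled(env):
    """Return True only when both reranker settings are non-empty."""
    base_url = env.get("RAGAS_RERANKER_BASE_URL", "").strip()
    model = env.get("RAGAS_RERANKER_MODEL", "").strip()
    return bool(base_url and model)
```

Passing `os.environ` or a parsed `.env` dictionary both work, since only `.get` is used.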
+## 4. Prepare pilot data
+Do not start with large-scale data on the first run. Recommended:
+- 2 PDF files.
+- 1 XLSX file.
+- `TESTSET_SIZE=10`.
+Place the files:
+```bash
+mkdir -p data/raw_docs/pdf data/raw_docs/xlsx
+cp /path/to/*.pdf data/raw_docs/pdf/
+cp /path/to/*.xlsx data/raw_docs/xlsx/
+```
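After copying, a quick count confirms that the pilot set has the intended size. The sketch below counts files by extension under the two directories used above. The function name is illustrative.

```python
from pathlib import Path


def count_pilot_files(root="data/raw_docs"):
    """Count the PDF and XLSX files directly under the pilot directories."""
    root = Path(root)
    return {
        "pdf": len(list((root / "pdf").glob("*.pdf"))),
        "xlsx": len(list((root / "xlsx").glob("*.xlsx"))),
    }


if __name__ == "__main__":
    print(count_pilot_files())
```

For the recommended pilot, the expected result is two PDF files and one XLSX file.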
+## 5. Run the full pilot
+Run the scripts in order:
+```bash
+python scripts/01_upload_docs.py
+python scripts/02_wait_ingestion.py
+python scripts/03_export_chunks.py
+python scripts/04_parse_docs.py
+python scripts/05_generate_testset.py
+python scripts/06_review_testset.py
+python scripts/07_run_weknora_qa.py
+python scripts/08_build_ragas_input.py
+python scripts/09_run_ragas_eval.py
+python scripts/10_report.py
+```
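+Notes:
+- `01_upload_docs.py` uploads the files in `data/raw_docs/pdf` and `data/raw_docs/xlsx`.
+- `02_wait_ingestion.py` waits for WeKnora to finish parsing.
+- `03_export_chunks.py` exports the WeKnora chunks.
+- `04_parse_docs.py` parses the raw documents on the evaluation side to produce the source material for the Ragas test set.
+- `05_generate_testset.py` generates candidate QA pairs.
+- `06_review_testset.py` currently marks every candidate QA pair as approved; it can later be replaced with manual review.
+- `07_run_weknora_qa.py` calls WeKnora QA one question at a time and parses the SSE stream.
+- `08_build_ragas_input.py` merges the QA pairs with the WeKnora outputs.
+- `09_run_ragas_eval.py` runs Ragas scoring.
+- `10_report.py` generates the Markdown report.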
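Because each step depends on the previous one, the ten commands can be wrapped in a small runner that stops at the first failure. This is a sketch, not part of the project: `run_pipeline` and its `runner` parameter are hypothetical, and only the script names come from the list above.

```python
import subprocess
import sys

# Script names taken from the guide, in execution order.
PIPELINE = [
    "01_upload_docs.py",
    "02_wait_ingestion.py",
    "03_export_chunks.py",
    "04_parse_docs.py",
    "05_generate_testset.py",
    "06_review_testset.py",
    "07_run_weknora_qa.py",
    "08_build_ragas_input.py",
    "09_run_ragas_eval.py",
    "10_report.py",
]


def run_pipeline(scripts=PIPELINE, script_dir="scripts", runner=subprocess.run):
    """Run each script in order and stop at the first non-zero exit code.

    Returns the name of the failing script, or None when all succeed.
    `runner` is injectable so the loop can be exercised without real scripts.
    """
    for name in scripts:
        result = runner([sys.executable, f"{script_dir}/{name}"])
        if result.returncode != 0:
            return name
    return None
```

Using `sys.executable` keeps the child processes inside the active virtual environment rather than whichever `python` happens to be first on `PATH`.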
+## 6. Artifact acceptance
+Check that these files were generated:
+```bash
+ls -lh data/exported/knowledge.jsonl
+ls -lh data/exported/chunks.jsonl
+ls -lh data/parsed_docs/documents.jsonl
+ls -lh data/parsed_docs/parse_summary.json
+ls -lh data/testsets/testset.reviewed.jsonl
+ls -lh data/runs/weknora_answers.jsonl
+ls -lh data/runs/ragas_input.jsonl
+ls -lh data/reports/ragas_scores.csv
+ls -lh data/reports/summary.md
+```
+Quickly inspect the key fields:
+```bash
+python - <<'PY'
+import json
+from pathlib import Path
+for path in [
+    "data/exported/chunks.jsonl",
+    "data/parsed_docs/documents.jsonl",
+    "data/runs/weknora_answers.jsonl",
+    "data/runs/ragas_input.jsonl",
+]:
+    rows = [json.loads(line) for line in Path(path).read_text(encoding="utf-8").splitlines() if line.strip()]
+    print(path, len(rows))
+    if rows:
+        print(rows[0].keys())
+PY
+```
+Minimum acceptance criteria:
+- `data/exported/chunks.jsonl` is non-empty.
+- `data/parsed_docs/documents.jsonl` is non-empty.
+- Most `response` values in `data/runs/weknora_answers.jsonl` are non-empty.
+- A reasonable share of rows in `data/runs/ragas_input.jsonl` have non-empty `retrieved_contexts`.
+- `data/reports/ragas_scores.csv` has at least one metric column.
+- `data/reports/summary.md` is readable.
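The "most are non-empty" and "reasonable share" criteria can be turned into numbers. The sketch below computes the fraction of JSONL rows with a non-empty field. The field names `response` and `retrieved_contexts` come from the criteria above; the helper names are hypothetical, and what counts as an acceptable ratio is left to the reader.

```python
import json


def load_jsonl(text):
    """Parse JSON Lines text, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def non_empty_ratio(rows, field):
    """Fraction of rows whose `field` is present and non-empty.

    Empty strings, empty lists, and missing or null values all count as empty.
    """
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(field)) / len(rows)
```

A typical use is `non_empty_ratio(load_jsonl(Path("data/runs/ragas_input.jsonl").read_text(encoding="utf-8")), "retrieved_contexts")`.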
+## 7. Troubleshooting
+### Python version too old
+Symptom:
+```text
+Could not find a version that satisfies the requirement setuptools>=68
+```
+Cause: the current virtual environment uses Python 3.6. The project requires Python 3.10+.
+Fix:
+```bash
+rm -rf .venv
+python3.11 -m venv .venv
+source .venv/bin/activate
+pip install -U pip setuptools wheel
+pip install -e ".[pdf]"
+```
+### Wrong model endpoint
+Both vLLM and Infinity must be configured with the OpenAI-compatible `/v1` address, for example:
+```bash
+RAGAS_LLM_BASE_URL=http://localhost:8000/v1
+RAGAS_EMBEDDING_BASE_URL=http://localhost:7997/v1
+```
+Do not use Ollama's native `/api` path or the service root path.
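A simple shape check catches the two mistakes named above before any request is sent. This sketch only inspects the URL string. It does not contact the server, and the function name is illustrative.

```python
from urllib.parse import urlparse


def looks_like_openai_base_url(url):
    """True when the URL is http(s), has a host, and its path ends with /v1."""
    parsed = urlparse(url)
    path = parsed.path.rstrip("/")
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.netloc)
        and path.endswith("/v1")
    )
```

A URL that passes this check can still point at the wrong service, so `scripts/00_check_models.py` remains the real connectivity test.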
+### Embedding reports invalid input type
+The project already sets the following in `ragas_runner.py`:
+```text
+tiktoken_enabled=False
+check_embedding_ctx_length=False
+```
+If the error persists, first use `scripts/00_check_models.py` to confirm that the Infinity endpoint is compatible with the OpenAI embeddings API.
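+### Ragas metrics time out or return NaN
+A local or small judge model may not reliably produce the structured output that Ragas requires. Start by narrowing the metric set, for example keeping only: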
+```yaml
+metrics:
+  - response_relevancy
+```
+Once the pipeline works end to end, re-enable the others one at a time:
+```yaml
+  - faithfulness
+  - context_precision
+  - context_recall
+  - factual_correctness
+```
+You can also adjust the following settings:
+```yaml
+timeout_seconds: 600
+max_workers: 1
+max_tokens: 4096
+```
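To see which metric is producing NaN, the scores file can be summarized per column. The sketch below assumes that `ragas_scores.csv` has one column per metric named after the metric (for example `faithfulness`). That layout is an assumption, and the helper name is hypothetical.

```python
import csv
import io
import math


def nan_ratio_by_column(csv_text, columns):
    """For each named column, return the share of rows that are empty,
    non-numeric, or NaN."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    ratios = {}
    for column in columns:
        bad = 0
        for row in rows:
            raw = (row.get(column) or "").strip()
            try:
                value = float(raw)
            except ValueError:
                bad += 1  # empty or non-numeric cell
                continue
            if math.isnan(value):
                bad += 1
        ratios[column] = bad / len(rows) if rows else 0.0
    return ratios
```

A metric whose ratio stays high after raising the timeout is a candidate for leaving disabled with the current judge model.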
+### WeKnora QA returns no retrieved_contexts
+Check:
+```bash
+python scripts/03_export_chunks.py
+python scripts/07_run_weknora_qa.py
+```
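+Pay particular attention to:
+- Whether the knowledge base has finished parsing.
+- Whether the exported chunks are non-empty.
+- Whether the WeKnora QA SSE stream returns a `references` event.
+- Whether `data/runs/failed_requests.jsonl` records `empty_retrieval`.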
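When checking the SSE stream by hand, a small parser makes it easier to see whether a `references` event arrived. The sketch below follows the generic `event:` / `data:` line format of Server-Sent Events. How WeKnora labels the event is not shown in this commit, so the code accepts either an SSE event name of `references` or a JSON payload whose `response_type` field equals `references`; that field name is an assumption.

```python
import json


def parse_sse(lines):
    """Group `event:` / `data:` lines into (event, data) tuples.

    A blank line ends the current event. Lines starting with ':' are comments.
    """
    events, event, data = [], "message", []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            if data:
                events.append((event, "\n".join(data)))
            event, data = "message", []
        elif line.startswith(":"):
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].removeprefix(" "))
    if data:
        events.append((event, "\n".join(data)))
    return events


def has_references(events, type_field="response_type"):
    """True when any event is named, or typed via `type_field`, 'references'."""
    for event, data in events:
        if event == "references":
            return True
        try:
            payload = json.loads(data)
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict) and payload.get(type_field) == "references":
            return True
    return False
```

Feeding it the lines of a captured response body shows quickly whether the retrieval step returned anything before the answer tokens started.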
+## 8. Scale up the sample size
+After the first 10-sample run passes, scale up:
+```bash
+TESTSET_SIZE=50
+```
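+Then scale up gradually to 100-300 samples. Before each increase, confirm that:
+- Ragas judge latency is acceptable.
+- The failure ratio in `failed_requests.jsonl` is low.
+- The retrieval-failure samples in `summary.md` can be explained.
+- The QA set has gone through manual review or spot-check review.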
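The "failure ratio is low" gate can be made explicit before bumping `TESTSET_SIZE`. In the sketch below, the sizes 10, 50, 100, and 300 follow the guide, while the 10% threshold and both function names are assumptions chosen for illustration.

```python
def failure_ratio(total_requests, failed_requests):
    """Share of requests that failed."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return failed_requests / total_requests


def next_testset_size(current, ratio, max_ratio=0.1, steps=(10, 50, 100, 300)):
    """Return the next size in `steps` when the failure ratio is acceptable.

    Stays at `current` when the ratio exceeds `max_ratio` (an assumed
    threshold) or when `current` is already the largest step.
    """
    if ratio > max_ratio:
        return current
    larger = [size for size in steps if size > current]
    return larger[0] if larger else current
```

The other three checks (latency, explainable retrieval failures, reviewed QA set) still need a human look; this only automates the numeric one.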