Commit 3b2e71ec authored by lidongxu

New version: completed step one — team data cleaning test

Parent c34ee4a3
__pycache__/
*.py[cod]
*$py.class
.Python
*.so
.venv/
venv/
.env
# Default output directory for team conversion
code/cache/
# Data Cleaning System - environment variable configuration
# ENV selects the environment: unset or ENV=development for development, ENV=production for production
ENV=development
# Server configuration
HOST=0.0.0.0
PORT=8000
DEBUG=False
# ---------- Development database (used when ENV=development) ----------
DB_HOST=192.168.100.39
DB_PORT=25301
DB_USER=root
DB_PASSWORD="Zt%68Dsuv&M"
DB_NAME=market_bi
# ---------- Production database (used when ENV=production) ----------
PROD_DB_HOST=rm-2ze28qp55mrm34g8bbo.mysql.rds.aliyuncs.com
PROD_DB_PORT=3306
PROD_DB_USER=sfabus
PROD_DB_PASSWORD=Wxl@325Pa91
PROD_DB_NAME=market_bi
# Logging configuration
LOG_LEVEL=INFO
LOG_FILE=./logs/app.log
# Excel download configuration
EXCEL_DOWNLOAD_TIMEOUT=30
MAX_EXCEL_SIZE=52428800 # 50MB
# Task timeout configuration
TASK_TIMEOUT_SECONDS=3600 # 1 hour
# ========== Python ==========
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Virtual environments
venv/
.venv/
env/
ENV/
# ========== Tests & coverage ==========
.pytest_cache/
.coverage
htmlcov/
.tox/
.nox/
coverage.xml
*.cover
.hypothesis/
# ========== IDEs / editors ==========
.idea/
.vscode/
*.swp
*.swo
*~
.project
.pydevproject
.settings/
# ========== OS files ==========
.DS_Store
.DS_Store?
Thumbs.db
ehthumbs.db
Desktop.ini
# ========== Logs & temp files ==========
*.log
*.tmp
*.temp
.cache/
# ========== Misc ==========
*.sql.backup
*.bak
# Data Cleaning System - Project Documentation
## Project Overview
This project is a data cleaning system built on the FastAPI framework. It extracts data from Excel files, runs cleaning and validation, and saves the final results to a MySQL database.
### Core Features
1. **Excel parsing**: download and parse Excel files from a URL
2. **Data cleaning**: validate, clean, and deduplicate the parsed data
3. **Progress feedback**: report cleaning progress to the front end in near real time via HTTP polling
4. **Persistence**: save the cleaned data to a MySQL database
---
## Project Structure
```
clean_data/
├── index.py                # main entry point
├── requirements.txt        # project dependencies
├── .env.example            # example environment configuration
├── README.md               # project documentation
├── core/                   # core business modules
│   ├── __init__.py
│   ├── excel_handler.py    # Excel file handling
│   ├── data_cleaner.py     # data cleaning logic
│   ├── db_handler.py       # database access
│   └── progress_manager.py # progress tracking
└── utils/                  # utility modules
    ├── __init__.py
    ├── exceptions.py       # custom exceptions
    └── validators.py       # data validation
```
---
## Quick Start
### 1. Environment Setup
```bash
# Clone the project (if needed)
cd clean_data
# Create a virtual environment (recommended)
python -m venv venv
# Activate it
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
```
### 2. Configure Environment Variables
```bash
# Copy the example configuration
cp .env.example .env
# Edit .env and fill in the actual settings
# In particular:
# - DB_HOST, DB_PORT, DB_USER, DB_PASSWORD must point at a real database
# - DB_NAME is the database to use
```
### 3. Start the Service
```bash
# Option 1: run with Python directly
python index.py
# Option 2: run with Uvicorn (recommended)
uvicorn index:app --host 0.0.0.0 --port 8000 --reload
# The service starts at http://0.0.0.0:8000
# API docs: http://localhost:8000/docs (Swagger UI)
```
---
## API Reference
### 1. Start a Cleaning Task
**Request**
```
POST /api/v1/clean
```
**Request Body**
```json
{
  "excel_url": "https://example.com/data.xlsx",
  "department": "sales",
  "description": "Q1 sales data cleaning"
}
```
**Response**
```json
{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "message": "Task created; processing...",
  "data_preview": null
}
```
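As a reference, here is a minimal client-side sketch of building this request body in Python. The endpoint URL and field names are taken from the examples above; `build_clean_request` is a hypothetical helper, and the commented-out `requests.post` call assumes the service is running locally.

```python
import json

def build_clean_request(excel_url: str, department: str, description: str = "") -> dict:
    """Build the JSON body expected by POST /api/v1/clean (field names from the docs above)."""
    return {
        "excel_url": excel_url,
        "department": department,
        "description": description,
    }

payload = build_clean_request(
    "https://example.com/data.xlsx", "sales", "Q1 sales data cleaning"
)
body = json.dumps(payload, ensure_ascii=False)
# To actually submit the task (requests is part of the tech stack):
#   resp = requests.post("http://localhost:8000/api/v1/clean", json=payload)
#   task_id = resp.json()["task_id"]
```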
### 2. Get Cleaning Progress
**Request**
```
GET /api/v1/progress/{task_id}
```
**Response**
```json
{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "progress": 65,
  "message": "Cleaned 650/1000 rows",
  "timestamp": "2026-03-06T10:30:45.123456"
}
```
**Status Values**
- `queued`: task created, waiting in queue
- `processing`: data is being processed
- `completed`: cleaning finished
- `failed`: an error occurred during cleaning
### 3. Get Cleaning Result
**Request**
```
GET /api/v1/result/{task_id}
```
**Response**
```json
{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "ready_to_save",
  "message": "Cleaning complete; ready to save",
  "data_preview": [
    {"产品": "产品A", "金额": 1000},
    {"产品": "产品B", "金额": 2000}
  ],
  "total_rows": 1000,
  "department": "sales"
}
```
### 4. Save Cleaned Data
**Request**
```
POST /api/v1/save
```
**Request Body**
```json
{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "table_name": "sales_data"
}
```
**Response**
```json
{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "saved",
  "message": "Data saved to the database successfully",
  "affected_rows": 1000
}
```
### 5. Health Check
**Request**
```
GET /api/v1/health
```
**Response**
```json
{
  "status": "healthy",
  "timestamp": "2026-03-06T10:30:45.123456",
  "service": "数据清洗系统"
}
```
---
## Progress Reporting
### HTTP Polling (no WebSocket needed)
The system reports progress via **HTTP polling**, which has these advantages:
1. **No persistent connections**: clients pull on demand, reducing server load
2. **Broad compatibility**: works with any HTTP client
3. **Easy to deploy**: no WebSocket infrastructure required
4. **Easy to scale**: deploys cleanly to any cloud environment
### Suggested Front-End Implementation
```javascript
// Example: React/Vue front-end logic
const pollProgress = async (taskId) => {
  const interval = setInterval(async () => {
    try {
      const response = await fetch(`/api/v1/progress/${taskId}`);
      const data = await response.json();
      // Update the progress bar
      updateProgressBar(data.progress);
      updateMessage(data.message);
      // Stop polling once the task finishes
      if (data.status === 'completed' || data.status === 'failed') {
        clearInterval(interval);
      }
    } catch (error) {
      console.error('Failed to fetch progress:', error);
    }
  }, 1000); // poll once per second
};
```
---
## Data Cleaning Logic
### Cleaning Steps
1. **Download**: fetch the Excel file from the given URL
2. **Parse**: read the Excel content with openpyxl
3. **Validate**: check data types and required fields
4. **Clean**
   - strip leading/trailing whitespace
   - handle empty values
   - remove duplicates
5. **Cache**: keep the cleaned data in memory
6. **Save**: persist to the database once the front end confirms
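Steps 3 and 4 can be sketched as a small, dependency-free row cleaner. This is a simplified stand-in for the logic in `core/data_cleaner.py`, not the shipped implementation; the required-field names here are illustrative defaults.

```python
def clean_rows(rows, required_fields=("产品", "金额")):
    """Strip whitespace, normalize empties to None, drop invalid rows, deduplicate."""
    cleaned, seen = [], set()
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if isinstance(value, str):
                value = value.strip() or None  # empty strings become None
            new_row[key] = value
        # Drop rows missing a required field
        if any(new_row.get(f) is None for f in required_fields):
            continue
        # Deduplicate on the full row content
        fingerprint = tuple(sorted((k, str(v)) for k, v in new_row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(new_row)
    return cleaned

rows = [
    {"产品": " 产品A ", "金额": 1000},
    {"产品": "产品A", "金额": 1000},  # duplicate after stripping
    {"产品": "", "金额": 500},        # missing required field
]
result = clean_rows(rows)
```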
### Custom Cleaning Rules
Edit the `_validate_required_fields` method in `core/data_cleaner.py` to customize required-field rules per department:
```python
required_fields_map = {
    'sales': ['产品', '金额', '销售日期'],
    'inventory': ['SKU', '数量', '仓库'],
    'finance': ['交易日期', '金额', '类别']
}
```
---
## Database Configuration
### MySQL 5.6+ Connection Settings
Edit the `.env` file:
```ini
DB_HOST=localhost
DB_PORT=3306
DB_USER=root
DB_PASSWORD=your_password
DB_NAME=clean_data
```
### Create the Target Table (example)
```sql
CREATE TABLE sales_data (
  id INT AUTO_INCREMENT PRIMARY KEY,
  产品 VARCHAR(100),
  金额 DECIMAL(10, 2),
  销售日期 DATE,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
---
## Exception Handling
The system defines several custom exceptions to make errors easy to trace:
- **DataCleaningException**: raised during data cleaning
- **DatabaseException**: raised on database errors
- **ExcelParsingException**: raised on Excel parsing errors
- **ValidationException**: raised on validation failures
All exceptions are written to the log for troubleshooting.
---
## Logging
The system logs all operations with Python's standard logging module; the log level is configured in `.env`:
```
LOG_LEVEL=INFO
LOG_FILE=./logs/app.log
```
Logged events include:
- task creation and completion
- data processing progress
- errors and exceptions
- database operations
---
## Performance Tips
1. **Batch inserts**: database writes use batched inserts (1000 rows per batch by default)
2. **Async processing**: FastAPI background tasks keep responses non-blocking
3. **Progress caching**: progress and cleaning results are cached in an in-memory dict
4. **Connection pooling**: consider a database connection pool (possible extension)
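The batching in tip 1 amounts to splitting the rows into fixed-size chunks before each database write. A simplified sketch (`chunked` is a hypothetical helper; `BATCH_SIZE` mirrors the 1000-row default mentioned above):

```python
BATCH_SIZE = 1000

def chunked(rows, size=BATCH_SIZE):
    """Yield successive fixed-size batches from a list of rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# Example: 2500 rows split into batches of 1000, 1000, 500;
# each batch would then be written with cursor.executemany(insert_sql, batch)
batches = list(chunked(list(range(2500))))
sizes = [len(b) for b in batches]
```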
---
## FAQ
### Q: Why not use WebSocket?
A: HTTP polling offers these advantages:
- the server keeps no connection state
- easier horizontal scaling
- no WebSocket libraries or infrastructure needed
- plain HTTP, so compatibility is broad
### Q: Where is the cleaned data stored?
A: Cleaned data is stored:
- **short term**: in server memory (keyed by task_id)
- **long term**: in MySQL once the user confirms the save
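The short-term store can be pictured as a plain dict keyed by task_id. This is a sketch of the idea only; the real `progress_manager` module may add locking, timeouts, and eviction, and the function names here are assumptions.

```python
import uuid
from datetime import datetime

TASKS: dict[str, dict] = {}  # task_id -> task state

def create_task() -> str:
    """Register a new task and return its id."""
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {
        "status": "queued",
        "progress": 0,
        "message": "",
        "result": None,
        "created_at": datetime.now().isoformat(),
    }
    return task_id

def update_task(task_id: str, **fields) -> None:
    """Merge new fields into an existing task's state."""
    TASKS[task_id].update(fields)

tid = create_task()
update_task(tid, status="processing", progress=65, message="已清洗 650/1000 行数据")
```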
### Q: How are large files handled?
A: Set a maximum file size in `.env`:
```
MAX_EXCEL_SIZE=52428800 # 50MB
```
---
## Optional Extensions
1. **Data backup**: periodically back up saved data
2. **Audit log**: record every data modification
3. **Access control**: add authentication and authorization
4. **Cache upgrade**: replace the in-memory cache with Redis
5. **Task queue**: offload large batches to Celery
---
## Deployment
### Production
1. Run the app with Gunicorn + Uvicorn
2. Put a reverse proxy (nginx) in front
3. Enable HTTPS
4. Persist logs
5. Set up monitoring and alerting
### Docker Deployment
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "index:app", "--host", "0.0.0.0", "--port", "8000"]
```
---
## Tech Stack
- **Web framework**: FastAPI 0.104.1
- **ASGI server**: Uvicorn 0.24.0
- **Excel handling**: openpyxl 3.1.0
- **Database driver**: mysql-connector-python 8.2.0
- **Data validation**: Pydantic 2.5.0
- **HTTP client**: requests 2.31.0
---
## License
MIT
---
## Support
For questions or suggestions, contact the development team.
# HTTP Routes and Submodules
"""Normalize exceptions and validation failures into a { code, data, msg } response body."""
from typing import Any

from fastapi import Request
from fastapi.encoders import jsonable_encoder
from fastapi.exceptions import HTTPException, RequestValidationError
from fastapi.responses import JSONResponse


def _msg_from_detail(detail: str | dict | list | None) -> str:
    if detail is None:
        return "请求失败"
    if isinstance(detail, str):
        return detail
    if isinstance(detail, dict):
        return str(detail.get("error") or detail.get("msg") or detail.get("message") or "请求失败")
    return "参数校验失败"


def _data_from_detail(detail: str | dict | list | None) -> Any:
    if isinstance(detail, (dict, list)):
        return jsonable_encoder(detail)
    return None


async def http_exception_handler(request: Request, exc: HTTPException) -> JSONResponse:
    body = {
        "code": exc.status_code,
        "data": _data_from_detail(exc.detail),
        "msg": _msg_from_detail(exc.detail),
    }
    return JSONResponse(status_code=exc.status_code, content=body)


async def validation_exception_handler(request: Request, exc: RequestValidationError) -> JSONResponse:
    errors = jsonable_encoder(exc.errors())
    return JSONResponse(
        status_code=422,
        content={"code": 422, "data": errors, "msg": "参数校验失败"},
    )
"""Unified API response: code=0 means success, non-zero is a logic/business error code; data is the payload; msg is the description."""
from typing import Any

from pydantic import BaseModel, Field


class ApiEnvelope(BaseModel):
    code: int = Field(..., description="0 成功,非 0 失败")
    data: Any = None
    msg: str = ""

    model_config = {"json_schema_extra": {"example": {"code": 0, "data": {}, "msg": "成功"}}}


def ok(data: Any = None, msg: str = "") -> ApiEnvelope:
    return ApiEnvelope(code=0, data=data, msg=msg)
"""Cleaning HTTP routes: validate input, call the team conversion, and map business errors to HTTP status codes."""
from fastapi import APIRouter, HTTPException

from api.response import ApiEnvelope, ok
from api.schemas import CleanRequestBody
from api.team_conversion_loader import default_team_target_path, run_team_conversion

DEPARTMENT_RISK_AUDIT_CLEAN = "风控稽查数据清洗"

api_router = APIRouter(prefix="/api")


def _audit_date_str_from_body(body: CleanRequestBody) -> str | None:
    if body.year is None or body.month is None or body.day is None:
        return None
    return f"{body.year:04d}{body.month:02d}{body.day:02d}"


def _raise_http_for_failed_result(result: dict) -> None:
    """When the team conversion returns ok=False, pick a status code from the error text."""
    err = result.get("error") or ""
    if "source_url 须为" in err:
        raise HTTPException(status_code=400, detail=result)
    if "从 URL 读取源表失败" in err or err.startswith("读取源表失败"):
        raise HTTPException(status_code=502, detail=result)
    if result.get("message") and "error" not in result:
        return
    raise HTTPException(status_code=500, detail=result)


@api_router.post("/v1/clean", response_model=ApiEnvelope)
def post_clean(body: CleanRequestBody) -> ApiEnvelope:
    dept = (body.department or "").strip()
    if dept != DEPARTMENT_RISK_AUDIT_CLEAN:
        raise HTTPException(
            status_code=400,
            detail={
                "ok": False,
                "error": f"不支持的 department: {dept!r},当前仅支持「{DEPARTMENT_RISK_AUDIT_CLEAN}」",
            },
        )
    team_url = (body.team_url or "").strip()
    team_target = (body.team_target_path or "").strip() or default_team_target_path()
    if not team_url:
        raise HTTPException(
            status_code=400,
            detail={"ok": False, "error": "team_url 不能为空"},
        )
    audit_date_str = _audit_date_str_from_body(body)
    result = run_team_conversion(team_url, team_target, audit_date_str)
    if result.get("ok"):
        return ok(data=result, msg="成功")
    _raise_http_for_failed_result(result)
    return ok(data=result, msg=str(result.get("message") or ""))
"""Request body for the cleaning endpoint."""
from pydantic import BaseModel, Field


class CleanRequestBody(BaseModel):
    department: str = Field(..., description="业务类型,风控稽查数据清洗 走团队转换")
    year: int | None = None
    month: int | None = None
    day: int | None = None
    team_url: str | None = None
    team_target_path: str | None = None  # default: cache/team_<timestamp>.xlsx under the project
    puling_url: str | None = None
    chengyu_url: str | None = None
"""Dynamically load the team conversion script (legacy path / Chinese filename), exposing only the callable entry point and path helpers."""
import importlib.util
from datetime import datetime
from pathlib import Path
from typing import Any, Callable

_CODE_BASE = Path(__file__).resolve().parent.parent
_TEAM_SCRIPT = _CODE_BASE / "py_" / "audit" / "point_sale" / "data_conversion.py"


def _load_run_team_conversion() -> Callable[..., dict[str, Any]]:
    spec = importlib.util.spec_from_file_location("team_data_convert", _TEAM_SCRIPT)
    if spec is None or spec.loader is None:
        raise RuntimeError(f"无法加载团队转换模块: {_TEAM_SCRIPT}")
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    fn = getattr(mod, "run_team_conversion", None)
    if fn is None:
        raise RuntimeError("data_conversion 中缺少 run_team_conversion")
    return fn


run_team_conversion: Callable[..., dict[str, Any]] = _load_run_team_conversion()


def default_team_target_path() -> str:
    """When no path is supplied: cache/team_{timestamp}.xlsx"""
    d = _CODE_BASE / "cache"
    d.mkdir(parents=True, exist_ok=True)
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
    return str(d / f"team_{ts}.xlsx")
/*
Navicat MySQL Data Transfer
Source Server : t100_production
Source Server Version : 50744
Source Host : rm-2ze28qp55mrm34g8bbo.mysql.rds.aliyuncs.com:3306
Source Database : market_bi
Target Server Type : MYSQL
Target Server Version : 50744
File Encoding : 65001
Date: 2026-03-12 11:37:31
*/
SET FOREIGN_KEY_CHECKS=0;
-- ----------------------------
-- Table structure for bi_price_xx
-- ----------------------------
DROP TABLE IF EXISTS `bi_price_xx`;
CREATE TABLE `bi_price_xx` (
  `id` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '主键',
  `bi_product` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '产品系统',
  `prd_name` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '口味',
  `pro_weight` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '产品克重',
  `channel_type` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '渠道',
  `creator` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '提交人',
  `modifier` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '修改人',
  `creator_nickname` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '提交人昵称',
  `modifier_nickname` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '修改人昵称',
  `create_time` datetime DEFAULT NULL COMMENT '提交时间',
  `modify_time` datetime DEFAULT NULL COMMENT '修改时间',
  `qbi_system_upload_id` bigint(30) DEFAULT NULL COMMENT '上传批次主键',
  `low_price` decimal(30,2) DEFAULT NULL COMMENT '低价',
  `normal_price` decimal(30,2) DEFAULT NULL COMMENT '零售价'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='线下价盘表';
-- ----------------------------
-- Records of bi_price_xx
-- ----------------------------
INSERT INTO `bi_price_xx` VALUES ('7ac70b27-59d8-413c-81e4-d11d3e753ca2', '虎皮凤爪', '全品味', '105g', 'KA', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('3ebea3f4-4e40-4088-aacb-d5d7b3194d82', '虎皮凤爪', '全品味', '210g', 'KA', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '26.50', '38.39');
INSERT INTO `bi_price_xx` VALUES ('344df49c-597d-4826-b659-2fd1638edc45', '虎皮凤爪', '全品味', '散称', 'KA', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '45.80', '54.78');
INSERT INTO `bi_price_xx` VALUES ('21f14713-eb58-4090-8df0-aa8a5b98f3e8', '去骨凤爪', '全品味', '72g', 'KA', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('7d222a14-8191-4f88-9fec-2fa0e21c27cc', '去骨凤爪', '全品味', '138g', 'KA', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '26.50', '38.39');
INSERT INTO `bi_price_xx` VALUES ('3d2ced09-1e88-46ef-87a3-e2368e5aaff1', '脆笋去骨', '全品味', '散称', 'KA', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '45.80', '50.38');
INSERT INTO `bi_price_xx` VALUES ('bcc8e320-0db0-4535-9fd0-3e79585881c8', '老卤凤爪', '全品味', '95g', 'KA', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('f795e572-7b8a-41f9-b2a4-678803019224', '老卤鸭掌', '全品味', '95g', 'KA', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('de4bcae9-6ace-43b1-b88f-7c70ca674b64', '鸡肉豆堡', '全品味', '120g', 'KA', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '9.90', '14.19');
INSERT INTO `bi_price_xx` VALUES ('4949ad7f-8f90-4bf9-8848-e862b210e3e0', '虎皮小鸡腿', '全品味', '80g', 'KA', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '12.80', '18.48');
INSERT INTO `bi_price_xx` VALUES ('4286df2e-dcf4-4dd5-a25e-c26d8a3c8829', '虎皮凤爪', '全品味', '105g', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('e8b863b9-ce10-4a33-b142-6019c62aee93', '虎皮凤爪', '全品味', '210g', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '26.50', '38.39');
INSERT INTO `bi_price_xx` VALUES ('07ddcf94-619e-468a-9ad7-67b1835c5898', '虎皮凤爪', '全品味', '68g', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '9.80', '14.19');
INSERT INTO `bi_price_xx` VALUES ('0a193d95-e8ec-4e12-abe0-5dc465b1e1bd', '虎皮凤爪', '全品味', '25g', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '3.90', '5.39');
INSERT INTO `bi_price_xx` VALUES ('347aa860-eadf-4ae0-aa31-7a64f15322b3', '虎皮凤爪', '全品味', '散称', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '45.80', '54.78');
INSERT INTO `bi_price_xx` VALUES ('24e265c7-d526-4b30-a981-2eae56ac2227', '去骨凤爪', '全品味', '72g', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('71a70334-e06a-43a0-913e-e81d1eafcf94', '老卤凤爪', '全品味', '95g', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('b9686a81-9771-4c4b-8c4f-faf0b5fdf71f', '老卤鸭掌', '全品味', '95g', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('350cc768-1407-40ec-b9c4-4a336cd89118', '鸡肉豆堡', '全品味', '120g', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '9.90', '14.19');
INSERT INTO `bi_price_xx` VALUES ('5c67fa66-799b-43bc-8263-b84291ec7c44', '鸡肉豆堡', '全品味', '散称', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '34.90', '38.39');
INSERT INTO `bi_price_xx` VALUES ('09084915-3a6a-4339-a8b1-f26ca803aec6', '虎皮小鸡腿', '全品味', '80g', 'BC', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '12.80', '18.48');
INSERT INTO `bi_price_xx` VALUES ('54b65a1e-41b2-4d86-9895-64eb03f33ed0', '虎皮凤爪', '全品味', '105g', 'CVS', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('44b67fcf-42ac-4b93-83e0-da427b013737', '虎皮凤爪', '全品味', '210g', 'CVS', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '29.50', '38.39');
INSERT INTO `bi_price_xx` VALUES ('d5dfc92a-a075-4277-938b-269f103aa76f', '虎皮凤爪', '全品味', '68g', 'CVS', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '10.90', '14.19');
INSERT INTO `bi_price_xx` VALUES ('ce9500ff-6671-4514-8a40-6a84c79ab1a3', '虎皮凤爪', '全品味', '25g', 'CVS', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '3.90', '5.39');
INSERT INTO `bi_price_xx` VALUES ('0a34f1bc-7e5e-4cf3-a059-da9a29d0bad8', '去骨凤爪', '全品味', '72g', 'CVS', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('0bf423ef-5275-43ce-80b4-dac3662b21c2', '去骨凤爪', '全品味', '138g', 'CVS', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '29.50', '38.39');
INSERT INTO `bi_price_xx` VALUES ('0ebf0fe1-25cd-4b1c-bb16-698d714a90f6', '老卤凤爪', '全品味', '95g', 'CVS', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('3456b8c9-772f-4c8b-a2a1-a2562273839f', '老卤鸭掌', '全品味', '95g', 'CVS', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '14.90', '21.89');
INSERT INTO `bi_price_xx` VALUES ('55957f41-d285-4dac-b661-b1c6f7faf4c1', '鸡肉豆堡', '全品味', '120g', 'CVS', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '9.90', '14.19');
INSERT INTO `bi_price_xx` VALUES ('44274c3b-697b-401b-9670-b663a235cbf9', '虎皮小鸡腿', '全品味', '80g', 'CVS', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '12.80', '18.48');
INSERT INTO `bi_price_xx` VALUES ('fb961ee5-3d1c-406b-81f0-68f73a24e82a', '虎皮凤爪', '全品味', '68g', '零食', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '8.80', '14.19');
INSERT INTO `bi_price_xx` VALUES ('e400d8e8-19ce-41dd-b02e-391933d76d43', '虎皮凤爪', '全品味', '散称', '零食', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '45.80', '54.78');
INSERT INTO `bi_price_xx` VALUES ('edacdcb9-6450-4965-870c-979f5aa0fdf3', '脆笋去骨', '全品味', '散称', '零食', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '45.80', '50.38');
INSERT INTO `bi_price_xx` VALUES ('d75c4276-56e6-4296-b389-8ea2ff386d48', '鸡肉豆堡', '全品味', '散称', '零食', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '34.90', '38.39');
INSERT INTO `bi_price_xx` VALUES ('b748e150-82ad-447d-89f7-7a5b4efc5b25', '虎皮小鸡腿', '全品味', '散称', '零食', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '35.80', '39.38');
INSERT INTO `bi_price_xx` VALUES ('a8854d47-e1a3-4673-afb5-9b7d0fa8d183', '虎皮凤爪', '全品味', '105g', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '10.40', '9.90');
INSERT INTO `bi_price_xx` VALUES ('b93e7fef-8ecf-4434-9441-da61b839069b', '虎皮凤爪', '全品味', '210g', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '19.11', '18.20');
INSERT INTO `bi_price_xx` VALUES ('cc8f1359-2afc-464c-8f0f-62e985e99fb7', '虎皮凤爪', '全品味', '68g', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '6.83', '6.50');
INSERT INTO `bi_price_xx` VALUES ('5711f296-0451-48c6-bcf7-efdb68d5867b', '虎皮凤爪', '全品味', '25g', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '2.63', '2.50');
INSERT INTO `bi_price_xx` VALUES ('aab9599c-2a33-437d-8ef8-890ed1bec685', '虎皮凤爪', '全品味', '散称', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '38.85', '37.00');
INSERT INTO `bi_price_xx` VALUES ('008ff458-bc4d-4fa8-9612-1b50fcd86992', '去骨凤爪', '全品味', '72g', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '10.40', '9.90');
INSERT INTO `bi_price_xx` VALUES ('d49be613-998a-4fd9-b629-ba01fc51876b', '去骨凤爪', '全品味', '138g', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '17.33', '16.50');
INSERT INTO `bi_price_xx` VALUES ('8a2eab0f-f4f7-4192-9608-f2b6a7e17482', '老卤凤爪', '全品味', '95g', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '10.40', '9.90');
INSERT INTO `bi_price_xx` VALUES ('069d4f5e-deb0-43fb-af81-d812f1f23465', '老卤鸭掌', '全品味', '95g', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '10.40', '9.90');
INSERT INTO `bi_price_xx` VALUES ('e2a057f4-1890-4096-bb7b-80214b86d94d', '鸡肉豆堡', '全品味', '120g', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '6.83', '6.50');
INSERT INTO `bi_price_xx` VALUES ('b07e83c2-7be9-49e2-b42e-12267010ed0e', '虎皮小鸡腿', '全品味', '80g', '批发', '86f47c35e2d4477d838e1280a949028b', '86f47c35e2d4477d838e1280a949028b', '王璐璐', '王璐璐', '2026-03-12 11:31:37', '2026-03-12 11:31:37', '377410', '8.40', '8.00');
"""
Configuration module.
Reads and manages application settings.
The ENV environment variable (development|production) selects the environment automatically.
"""
import os
from typing import Optional

from dotenv import load_dotenv

# Load the .env file by absolute path so the working directory does not matter
_env_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), '.env')
load_dotenv(dotenv_path=_env_path)

# Environment flag: development | production; defaults to development when unset
_ENV = os.getenv("ENV", "development").strip().lower()
IS_PRODUCTION = _ENV == "production"
IS_DEV = not IS_PRODUCTION


def _db_var(key: str, dev_default: str, prod_default: str = "") -> str:
    """Read a database variable per environment: production prefers PROD_DB_*, otherwise DB_*."""
    if IS_PRODUCTION:
        return os.getenv(f"PROD_DB_{key}", os.getenv(f"DB_{key}", prod_default)) or prod_default
    return os.getenv(f"DB_{key}", dev_default)


class Config:
    """Application configuration."""
    # Environment
    ENV: str = _ENV
    IS_PRODUCTION: bool = IS_PRODUCTION
    IS_DEV: bool = IS_DEV
    # Server (DEBUG defaults off in production)
    HOST: str = os.getenv("HOST", "0.0.0.0")
    PORT: int = int(os.getenv("PORT", "8000"))
    DEBUG: bool = os.getenv("DEBUG", "false" if IS_PRODUCTION else "true").lower() == "true"
    # Database: DB_* in development, PROD_DB_* in production (overridable via system env)
    DB_HOST: str = _db_var("HOST", "localhost")
    DB_PORT: int = int(_db_var("PORT", "3306"))
    DB_USER: str = _db_var("USER", "root")
    DB_PASSWORD: str = _db_var("PASSWORD", "")
    DB_NAME: str = _db_var("NAME", "clean_data")
    # Logging
    LOG_LEVEL: str = os.getenv("LOG_LEVEL", "INFO")
    LOG_FILE: Optional[str] = os.getenv("LOG_FILE")
    # Excel download
    EXCEL_DOWNLOAD_TIMEOUT: int = int(os.getenv("EXCEL_DOWNLOAD_TIMEOUT", "30"))
    MAX_EXCEL_SIZE: int = int(os.getenv("MAX_EXCEL_SIZE", "52428800"))  # 50MB
    # Task timeout
    TASK_TIMEOUT_SECONDS: int = int(os.getenv("TASK_TIMEOUT_SECONDS", "3600"))  # 1 hour

    @classmethod
    def get_db_config(cls) -> dict:
        """Return the database configuration as a dict."""
        return {
            'host': cls.DB_HOST,
            'port': cls.DB_PORT,
            'user': cls.DB_USER,
            'password': cls.DB_PASSWORD,
            'database': cls.DB_NAME,
        }


# Global configuration instance
config = Config()
"""Core business modules."""
"""
Data cleaning module.
Implements the cleaning and validation logic.
"""
import logging
import asyncio
from typing import List, Dict, Any, Callable, Optional

import pandas as pd

logger = logging.getLogger(__name__)

# Registry of cleaning strategies per department
# key: department name, value: (transform function, product-group config, audit source name)
_DEPARTMENT_CLEANERS = {}


def _load_department_cleaners():
    """Lazily load the department-specific cleaning modules."""
    global _DEPARTMENT_CLEANERS
    if _DEPARTMENT_CLEANERS:  # already loaded; nothing to do
        return
    try:
        # Tools used by the department cleaners
        from core_py.数据转换_团队 import (
            transform as _team_transform,
            PRODUCT_GROUPS_JC,
        )  # PRODUCT_GROUPS_JC: config data for 风控稽查数据清洗

        _DEPARTMENT_CLEANERS["风控稽查数据清洗"] = (_team_transform, PRODUCT_GROUPS_JC, "稽查团队")
        logger.info("已加载部门清洗模块: 风控稽查数据清洗")
    except ImportError as e:
        logger.warning(f"加载团队清洗模块失败: {e}")
class DataCleaner:
    """Data cleaning class."""

    def __init__(self):
        self.rules = {}

    async def clean(
        self,
        raw_data: List[Dict[str, Any]],
        department: str,
        progress_callback: Optional[Callable[[float, str, Optional[int]], None]] = None,
        audit_date: Optional[str] = None,
    ) -> List[Dict[str, Any]]:
        """
        Clean the data.

        Args:
            raw_data: raw rows (each a dict keyed by column name)
            department: business department name, e.g. "团队"
            progress_callback: callback taking (progress: 0-1, message: str, cleaned_count)
            audit_date: audit date string 'yyyy-mm-dd'; when None each cleaning module
                defaults to the 1st of last month

        Returns:
            List[Dict]: cleaned rows
        """
        try:
            logger.info(f"开始清洗数据,部门: {department},数据行数: {len(raw_data)}")
            # ── Department-specific cleaning route ───────────────────────
            _load_department_cleaners()
            if department in _DEPARTMENT_CLEANERS:
                return await self._clean_by_department(
                    raw_data, department, progress_callback, audit_date=audit_date
                )
            # ─────────────────────────────────────────────────────────────
            total_rows = len(raw_data)
            cleaned_data = []
            for idx, row in enumerate(raw_data):
                try:
                    cleaned_row = await self._validate_and_convert(row, department)
                    if cleaned_row and not self._is_duplicate(cleaned_row, cleaned_data):
                        cleaned_data.append(cleaned_row)
                    if progress_callback and idx % max(1, total_rows // 10) == 0:
                        progress = idx / total_rows if total_rows > 0 else 0
                        progress_callback(progress, f"已清洗 {idx}/{total_rows} 行数据", len(cleaned_data))
                except Exception as e:
                    logger.warning(f"第 {idx + 1} 行数据清洗失败: {str(e)}")
                    continue
            if progress_callback:
                progress_callback(1.0, f"清洗完成,共 {len(cleaned_data)} 行有效数据", len(cleaned_data))
            logger.info(
                f"数据清洗完成,原始行数: {total_rows},清洗后行数: {len(cleaned_data)}"
            )
            return cleaned_data
        except Exception as e:
            logger.error(f"clean 方法执行失败: {str(e)}")
            raise
    async def _clean_by_department(
        self,
        raw_data: List[Dict[str, Any]],
        department: str,
        progress_callback: Optional[Callable[[float, str, Optional[int]], None]] = None,
        audit_date: Optional[str] = None,
    ) -> List[Dict[str, Any]]:
        """
        Clean by calling the department's dedicated transform function.

        raw_data comes from excel_handler (List[Dict] keyed by column name).
        The transform functions access columns by position via iloc, so as long as
        the DataFrame keeps the original Excel column order, iloc indices line up.
        """
        transform_fn, pg, yname = _DEPARTMENT_CLEANERS[department]
        if progress_callback:
            progress_callback(0.1, "正在转换数据格式", None)
        # List[Dict] → DataFrame (preserves column order, so iloc matches Excel positions)
        df = pd.DataFrame(raw_data)
        if progress_callback:
            progress_callback(0.3, f"正在执行 {department} 数据清洗", None)
        # transform is synchronous; run it in a thread to keep the event loop free
        records = await asyncio.to_thread(transform_fn, df, yname, pg, audit_date)
        if progress_callback:
            progress_callback(1.0, f"清洗完成,共 {len(records)} 行有效数据", len(records))
        logger.info(f"[{department}] 专项清洗完成,共 {len(records)} 条记录")
        return records
    async def _validate_and_convert(
        self, row: Dict[str, Any], department: str
    ) -> Optional[Dict[str, Any]]:
        """
        Validate and convert a single row.

        Args:
            row: the data row
            department: business department name

        Returns:
            The converted row, or None if the row is invalid.
        """
        try:
            cleaned_row = {}
            for key, value in row.items():
                if value is None or (isinstance(value, str) and not value.strip()):
                    # Normalize empty values
                    cleaned_row[key] = None
                    continue
                # Clean string values
                if isinstance(value, str):
                    cleaned_row[key] = value.strip()
                else:
                    cleaned_row[key] = value
            # Check required fields (rules vary by department)
            if not self._validate_required_fields(cleaned_row, department):
                return None
            return cleaned_row
        except Exception as e:
            logger.warning(f"_validate_and_convert 失败: {str(e)}")
            return None

    def _validate_required_fields(self, row: Dict[str, Any], department: str) -> bool:
        """
        Validate required fields.

        Args:
            row: the data row
            department: business department

        Returns:
            bool: whether the row passes validation
        """
        # Example: per-department required-field rules
        required_fields_map = {
            "sales": ["产品", "金额"],
            "inventory": ["SKU", "数量"],
            "finance": ["交易日期", "金额"],
        }
        required_fields = required_fields_map.get(department, [])
        # Every required field must be present and non-empty
        for field in required_fields:
            if field not in row or row[field] is None:
                return False
        return True
def _is_duplicate(
self, row: Dict[str, Any], existing_data: List[Dict[str, Any]]
) -> bool:
"""
检查行是否为重复数据
Args:
row: 当前行
existing_data: 已有数据列表
Returns:
bool: 是否为重复
"""
# 简单的重复检查(可扩展为更复杂的逻辑)
for existing_row in existing_data:
if row == existing_row:
return True
return False
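_validate_required_fields 的必填字段校验可以抽成独立的小函数单独测试。下面是一个示意,规则表沿用源码中的示例配置:

```python
from typing import Any, Dict, List

# 必填字段规则(与上文 _validate_required_fields 的示例规则一致)
REQUIRED_FIELDS_MAP: Dict[str, List[str]] = {
    "sales": ["产品", "金额"],
    "inventory": ["SKU", "数量"],
    "finance": ["交易日期", "金额"],
}

def validate_required_fields(row: Dict[str, Any], department: str) -> bool:
    """检查 row 中该部门的必填字段是否存在且非空;未配置规则的部门默认通过。"""
    for field in REQUIRED_FIELDS_MAP.get(department, []):
        if row.get(field) is None:
            return False
    return True
```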
"""
数据库处理模块
负责与 MySQL 数据库的交互
"""
import logging
import mysql.connector
from typing import List, Dict, Any
from contextlib import contextmanager
from config import config
logger = logging.getLogger(__name__)
class DatabaseHandler:
"""数据库处理类"""
def __init__(self):
"""初始化数据库配置"""
self.db_config = {
'host': config.DB_HOST,
'user': config.DB_USER,
'password': config.DB_PASSWORD,
'database': config.DB_NAME,
'port': config.DB_PORT,
'autocommit': False,
'connection_timeout': 10
}
@contextmanager
def _get_connection(self):
"""
获取数据库连接的上下文管理器
Yields:
mysql.connector.MySQLConnection: 数据库连接
Raises:
Exception: 连接失败时抛出异常
"""
connection = None
try:
connection = mysql.connector.connect(**self.db_config)
logger.info("数据库连接成功")
yield connection
except mysql.connector.Error as e:
logger.error(f"数据库连接失败: {str(e)}")
raise
finally:
if connection and connection.is_connected():
connection.close()
logger.info("数据库连接已关闭")
async def insert_data(
self,
table_name: str,
data: List[Dict[str, Any]]
    ) -> tuple[int, int]:
"""
将数据 upsert 到指定的表(首次写入为 INSERT,命中唯一键时覆盖更新)。
MySQL ON DUPLICATE KEY UPDATE 行为说明:
- 新行插入:rowcount += 1
- 已有行被更新:rowcount += 2
- 数据与现有行完全一致(无变化):rowcount += 0
Args:
table_name: 目标表名
data: 数据列表
Returns:
tuple[int, int]: (submitted_rows, raw_affected)
- submitted_rows: 提交处理的总行数(去重后传入的行数,即预估真实入库行数)
- raw_affected: MySQL 累计 rowcount 原始值(insert=+1, update=+2, 无变化=+0)
Raises:
Exception: 插入失败时抛出异常
"""
        if not data:
            logger.warning("插入的数据为空")
            return 0, 0
try:
with self._get_connection() as connection:
cursor = connection.cursor()
# 获取字段名
columns = list(data[0].keys())
column_names = ', '.join([f'`{col}`' for col in columns])
placeholders = ', '.join(['%s'] * len(columns))
# ON DUPLICATE KEY UPDATE:命中唯一键时覆盖所有字段值
update_clause = ', '.join([f'`{col}` = VALUES(`{col}`)' for col in columns])
upsert_sql = f"""
INSERT INTO `{table_name}` ({column_names})
VALUES ({placeholders})
ON DUPLICATE KEY UPDATE {update_clause}
"""
logger.info(f"准备 upsert {len(data)} 行数据到表 {table_name}")
                # 批量 upsert。executemany 只返回累计 rowcount(insert=+1,update=+2,无变化=+0),
                # 无法逐条区分新增与更新;若要精确区分需改为逐条 execute。
                # 权衡性能与精度,这里保留 executemany 批量写入,累计并返回原始 raw_affected,
                # 由调用方按上述换算关系解读。
                raw_affected = 0
for batch_start in range(0, len(data), 1000):
batch_end = min(batch_start + 1000, len(data))
batch_data = data[batch_start:batch_end]
values_list = [
tuple(row.get(col) for col in columns)
for row in batch_data
]
cursor.executemany(upsert_sql, values_list)
raw_affected += cursor.rowcount
logger.info(f"已处理 {batch_end} / {len(data)} 行数据")
connection.commit()
                # executemany 无法区分 insert 与 update,因此把传入行数作为「提交处理行数」返回,
                # raw_affected 仅作辅助参考(insert 贡献 1,update 贡献 2,无变化贡献 0)。
                submitted_rows = len(data)
cursor.close()
logger.info(
f"upsert 完成:提交 {submitted_rows} 行,"
f"raw_affected={raw_affected}(insert+1 / update+2 / 无变化+0)"
)
# 返回 (submitted_rows, raw_affected) 元组,由调用方决定展示哪个
return submitted_rows, raw_affected
except mysql.connector.Error as e:
logger.error(f"MySQL 错误: {str(e)}")
raise
except Exception as e:
logger.error(f"insert_data 失败: {str(e)}")
raise
async def test_connection(self) -> bool:
"""
测试数据库连接
Returns:
bool: 连接是否成功
"""
try:
with self._get_connection() as connection:
cursor = connection.cursor()
cursor.execute("SELECT 1")
cursor.fetchone()
cursor.close()
return True
except Exception as e:
logger.error(f"数据库连接测试失败: {str(e)}")
return False
async def create_table_if_not_exists(
self,
table_name: str,
schema: Dict[str, str]
) -> bool:
"""
如果表不存在则创建表
Args:
table_name: 表名
schema: 表架构定义 {列名: 列定义}
Returns:
bool: 是否创建成功或表已存在
"""
try:
with self._get_connection() as connection:
cursor = connection.cursor()
# 检查表是否存在
                cursor.execute(
                    """
                    SELECT TABLE_NAME FROM information_schema.TABLES
                    WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s
                    """,
                    (self.db_config['database'], table_name)
                )
if cursor.fetchone():
logger.info(f"表 {table_name} 已存在")
cursor.close()
return True
# 创建表
columns_sql = ', '.join([f'`{col}` {definition}' for col, definition in schema.items()])
create_sql = f"""
CREATE TABLE `{table_name}` (
id INT AUTO_INCREMENT PRIMARY KEY,
{columns_sql},
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""
cursor.execute(create_sql)
connection.commit()
cursor.close()
logger.info(f"成功创建表 {table_name}")
return True
except Exception as e:
logger.error(f"create_table_if_not_exists 失败: {str(e)}")
raise
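insert_data 中拼接 upsert SQL 的方式可以抽成纯函数单独验证。以下为示意,build_upsert_sql 是演示用的假设函数名:

```python
from typing import List

def build_upsert_sql(table_name: str, columns: List[str]) -> str:
    """按上文 insert_data 的方式拼接 ON DUPLICATE KEY UPDATE 语句(标识符用反引号包裹)。"""
    column_names = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    update_clause = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in columns)
    return (
        f"INSERT INTO `{table_name}` ({column_names}) "
        f"VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {update_clause}"
    )

sql = build_upsert_sql("risk_audit_visit", ["store_code", "price"])
```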
"""
Excel 文件处理模块
负责从 URL 下载和解析 Excel 文件
"""
import aiohttp
import logging
from openpyxl import load_workbook
from io import BytesIO
from typing import List, Dict, Any
logger = logging.getLogger(__name__)
class ExcelHandler:
"""Excel 文件处理类"""
def __init__(self):
self.timeout = aiohttp.ClientTimeout(total=30)
async def fetch_bytes(self, url: str) -> bytes:
"""
从 URL 下载文件,返回原始字节内容(供调用方自行用 pandas 解析)
Args:
url: 文件的网络链接
Returns:
bytes: 文件的原始二进制内容
"""
try:
logger.info(f"开始从 {url} 下载文件")
async with aiohttp.ClientSession(timeout=self.timeout) as session:
async with session.get(url) as response:
if response.status != 200:
raise Exception(f"下载失败,HTTP 状态码: {response.status}")
content = await response.read()
logger.info(f"下载完成,文件大小: {len(content)} 字节")
return content
except Exception as e:
logger.error(f"fetch_bytes 失败: {str(e)}")
raise
async def fetch_and_parse(self, excel_url: str) -> List[Dict[str, Any]]:
"""
从 URL 下载并解析 Excel 文件
Args:
excel_url: Excel 文件的网络链接
Returns:
List[Dict]: 解析后的数据,每行为一个字典
Raises:
Exception: 下载或解析失败时抛出异常
"""
try:
# 1. 下载文件
logger.info(f"开始从 {excel_url} 下载 Excel 文件")
async with aiohttp.ClientSession(timeout=self.timeout) as session:
async with session.get(excel_url) as response:
if response.status != 200:
raise Exception(f"下载失败,HTTP 状态码: {response.status}")
excel_content = await response.read()
logger.info(f"下载完成,文件大小: {len(excel_content)} 字节")
# 2. 解析 Excel
return self._parse_excel_content(excel_content)
except Exception as e:
logger.error(f"fetch_and_parse 失败: {str(e)}")
raise
def _parse_excel_content(self, excel_content: bytes) -> List[Dict[str, Any]]:
"""
解析 Excel 内容
Args:
excel_content: Excel 文件的二进制内容
Returns:
List[Dict]: 解析后的数据
"""
try:
# 使用 BytesIO 从内存中读取
excel_file = BytesIO(excel_content)
workbook = load_workbook(excel_file)
# 获取第一个工作表
worksheet = workbook.active
if not worksheet:
raise Exception("Excel 文件不包含有效的工作表")
# 获取标题行
headers = []
for cell in worksheet[1]:
headers.append(cell.value)
if not headers or all(h is None for h in headers):
raise Exception("Excel 文件不包含有效的标题行")
# 解析数据行
data = []
for row in worksheet.iter_rows(min_row=2, values_only=False):
row_data = {}
for idx, cell in enumerate(row):
if idx < len(headers):
row_data[headers[idx]] = cell.value
# 跳过空行
if any(v is not None for v in row_data.values()):
data.append(row_data)
logger.info(f"成功解析 Excel,共 {len(data)} 行数据")
return data
except Exception as e:
logger.error(f"_parse_excel_content 失败: {str(e)}")
raise
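_parse_excel_content 的「标题行 + 跳过空行」解析思路,可以用内存中构造的小 Excel 验证一遍(以下仅为示意):

```python
from io import BytesIO
from openpyxl import Workbook, load_workbook

# 在内存中构造一个小 Excel,再按上文 _parse_excel_content 的思路解析
wb = Workbook()
ws = wb.active
ws.append(["产品", "金额"])   # 标题行
ws.append(["凤爪", 9.9])      # 数据行
ws.append([None, None])        # 空行,应被跳过
buf = BytesIO()
wb.save(buf)

worksheet = load_workbook(BytesIO(buf.getvalue())).active
headers = [c.value for c in worksheet[1]]
data = []
for row in worksheet.iter_rows(min_row=2):
    row_data = {headers[i]: cell.value for i, cell in enumerate(row) if i < len(headers)}
    if any(v is not None for v in row_data.values()):
        data.append(row_data)
```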
"""
进度管理模块
负责任务进度的记录和查询
"""
import logging
from typing import Dict, Any, Optional
from datetime import datetime, timedelta
import threading
logger = logging.getLogger(__name__)
class ProgressManager:
"""进度管理类"""
def __init__(self, timeout_seconds: int = 3600):
"""
初始化进度管理器
Args:
timeout_seconds: 任务进度的过期时间(秒),默认 1 小时
"""
self.progress_data: Dict[str, Dict[str, Any]] = {}
self.timeout_seconds = timeout_seconds
self.lock = threading.Lock()
def update_progress(
self,
task_id: str,
status: str,
progress: int,
message: str,
processed_count: Optional[int] = None
) -> None:
"""
更新任务进度
Args:
task_id: 任务唯一标识
status: 状态 (queued, processing, completed, failed)
progress: 进度百分比 (0-100)
message: 进度信息
processed_count: 已处理的数据条数,None 表示暂未统计
"""
with self.lock:
self.progress_data[task_id] = {
'task_id': task_id,
'status': status,
'progress': max(0, min(100, progress)),
'message': message,
'processed_count': processed_count,
'timestamp': datetime.now().isoformat(),
'created_at': datetime.now()
}
logger.debug(f"[{task_id}] 进度更新: {status} {progress}% - {message}")
def get_progress(self, task_id: str) -> Optional[Dict[str, Any]]:
"""
获取任务进度
Args:
task_id: 任务唯一标识
Returns:
Optional[Dict]: 进度信息,若任务不存在或已过期返回 None
"""
with self.lock:
if task_id not in self.progress_data:
return None
data = self.progress_data[task_id]
# 检查是否过期
if datetime.now() - data['created_at'] > timedelta(seconds=self.timeout_seconds):
logger.warning(f"任务 {task_id} 已过期,删除记录")
del self.progress_data[task_id]
return None
# 返回字典副本,移除 created_at(内部字段)
result = {k: v for k, v in data.items() if k != 'created_at'}
return result
def get_all_progress(self) -> Dict[str, Dict[str, Any]]:
"""
获取所有任务的进度信息
Returns:
Dict: 所有任务的进度信息
"""
with self.lock:
# 清理过期任务
expired_tasks = []
for task_id, data in self.progress_data.items():
if datetime.now() - data['created_at'] > timedelta(seconds=self.timeout_seconds):
expired_tasks.append(task_id)
for task_id in expired_tasks:
del self.progress_data[task_id]
logger.info(f"清理过期任务: {task_id}")
# 返回所有有效任务的进度
return {
task_id: {k: v for k, v in data.items() if k != 'created_at'}
for task_id, data in self.progress_data.items()
}
def clear_progress(self, task_id: str) -> None:
"""
清除任务进度记录
Args:
task_id: 任务唯一标识
"""
with self.lock:
if task_id in self.progress_data:
del self.progress_data[task_id]
logger.info(f"清除任务 {task_id} 的进度记录")
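ProgressManager 的「加锁写入 + 读取时惰性过期」模式可以用一个极简版本演示(MiniProgressStore 为演示用的假设类名,仅保留核心字段):

```python
import threading
from datetime import datetime, timedelta
from typing import Any, Dict, Optional

class MiniProgressStore:
    """上文 ProgressManager 的极简示意:线程安全写入 + 读取时惰性过期。"""
    def __init__(self, timeout_seconds: int = 3600):
        self._data: Dict[str, Dict[str, Any]] = {}
        self._timeout = timeout_seconds
        self._lock = threading.Lock()

    def update(self, task_id: str, progress: int, message: str) -> None:
        with self._lock:
            self._data[task_id] = {
                "progress": max(0, min(100, progress)),  # 与源码一致:进度夹在 0-100
                "message": message,
                "created_at": datetime.now(),
            }

    def get(self, task_id: str) -> Optional[Dict[str, Any]]:
        with self._lock:
            item = self._data.get(task_id)
            if item is None:
                return None
            if datetime.now() - item["created_at"] > timedelta(seconds=self._timeout):
                del self._data[task_id]  # 过期即删
                return None
            return {k: v for k, v in item.items() if k != "created_at"}

store = MiniProgressStore(timeout_seconds=60)
store.update("t1", 150, "clamp 到 100")
```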
import sys
import os
import pandas as pd
import mysql.connector
# 兼容直接运行(python core_py/1低价计算.py)和作为模块被 index.py 导入两种场景
if __name__ == "__main__":
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from config import config
def load_price_map_from_db() -> dict:
"""
从 market_bi.bi_price_xx 读取线下价盘数据,
返回匹配字典: { "产品系列|产品克重|渠道(大写)" -> low_price(float) }
"""
conn = mysql.connector.connect(
host=config.DB_HOST,
port=config.DB_PORT,
user=config.DB_USER,
password=config.DB_PASSWORD,
database="market_bi",
charset="utf8mb4",
)
try:
sql = "SELECT bi_product, pro_weight, channel_type, low_price FROM bi_price_xx"
df_p = pd.read_sql(sql, conn)
finally:
conn.close()
def _clean(s):
return "" if pd.isna(s) else str(s).strip().upper()
df_p["match_key"] = (
df_p["bi_product"].apply(_clean) + "|"
+ df_p["pro_weight"].apply(_clean) + "|"
+ df_p["channel_type"].apply(_clean)
)
df_p["low_price"] = pd.to_numeric(df_p["low_price"], errors="coerce")
return df_p.set_index("match_key")["low_price"].to_dict()
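load_price_map_from_db 中「三字段拼接大写 match_key → low_price」的构建规则,可以脱离数据库用一个小 DataFrame 验证(build_price_map 为演示用的假设函数名,数据为虚构样例):

```python
import pandas as pd

def build_price_map(df_p: pd.DataFrame) -> dict:
    """按上文 load_price_map_from_db 的规则,从价盘 DataFrame 构建匹配字典。"""
    def _clean(s):
        return "" if pd.isna(s) else str(s).strip().upper()
    key = (
        df_p["bi_product"].apply(_clean) + "|"
        + df_p["pro_weight"].apply(_clean) + "|"
        + df_p["channel_type"].apply(_clean)
    )
    low = pd.to_numeric(df_p["low_price"], errors="coerce")  # 非数值转 NaN
    return dict(zip(key, low))

df_demo = pd.DataFrame({
    "bi_product": ["虎皮凤爪", " 虎皮凤爪 "],
    "pro_weight": ["210g", "105g"],
    "channel_type": ["ka", "BC"],
    "low_price": ["9.9", "bad"],
})
price_map = build_price_map(df_demo)
```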
def transform(df_y: pd.DataFrame) -> pd.DataFrame:
"""
供 API 调用的低价计算入口。
接收大宽表 DataFrame(STANDARD_COLUMNS 列名),从数据库 market_bi.bi_price_xx
读取价盘基准,计算并回填以下三列后返回:
- 是否低价:低价 / 正常 / None(无法匹配或缺价格)
- 破价价差:低价时的价差(decimal),正常/无法匹配时为 None
- 低价整改状态:低价时置为 '未整改',其余不改动
Args:
df_y: 大宽表 DataFrame,必须包含列:
产品系列、产品克重、渠道类型(稽查源提供)、产品价格
Returns:
pd.DataFrame: 更新了低价相关字段的 DataFrame(不修改原对象)
"""
df = df_y.copy()
price_map = load_price_map_from_db()
def _clean(s):
return "" if pd.isna(s) else str(s).strip().upper()
# 构建匹配键和数值价格(辅助列,最终会删除)
df["_series_c"] = df["产品系列"].apply(_clean)
df["_weight_c"] = df["产品克重"].apply(_clean)
df["_channel_c"] = df["渠道类型(稽查源提供)"].apply(_clean)
df["_match_key"] = df["_series_c"] + "|" + df["_weight_c"] + "|" + df["_channel_c"]
df["_price_num"] = pd.to_numeric(df["产品价格"], errors="coerce")
df["_p_low_price"] = df["_match_key"].map(price_map)
# 重置低价相关列
df["是否低价"] = None
df["破价价差"] = None
# 条件向量化计算,避免逐行循环
has_both = df["_price_num"].notna() & df["_p_low_price"].notna()
cond_low = has_both & (df["_price_num"] < df["_p_low_price"])
cond_normal = has_both & ~cond_low
df.loc[cond_low, "是否低价"] = "低价"
df.loc[cond_low, "破价价差"] = (
df.loc[cond_low, "_p_low_price"] - df.loc[cond_low, "_price_num"]
).round(2)
df["低价整改状态"] = df["低价整改状态"].astype(object)
df.loc[cond_low, "低价整改状态"] = "未整改"
df.loc[cond_normal, "是否低价"] = "正常"
df.loc[cond_normal, "破价价差"] = None
# 清除辅助列
df.drop(
columns=["_series_c", "_weight_c", "_channel_c", "_match_key", "_price_num", "_p_low_price"],
inplace=True,
)
return df
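transform 中的向量化低价判断,核心只有几行条件掩码。下面用一个三行的小样例演示同样的口径(列名为演示而简化):

```python
import pandas as pd

# 小样例:复现上文 transform 的向量化低价判断(仅核心列)
df = pd.DataFrame({"产品价格": ["8.5", "12", "abc"], "_低价线": [10.0, 10.0, 10.0]})
price = pd.to_numeric(df["产品价格"], errors="coerce")   # 非数值转 NaN
low = df["_低价线"]
has_both = price.notna() & low.notna()
cond_low = has_both & (price < low)
cond_normal = has_both & ~cond_low

df["是否低价"] = None
df.loc[cond_low, "是否低价"] = "低价"
df.loc[cond_normal, "是否低价"] = "正常"
df["破价价差"] = None
df.loc[cond_low, "破价价差"] = (low - price)[cond_low].round(2)
```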
if __name__ == "__main__":
# ── 独立测试模式:读本地 Excel 大宽表 → 计算低价 → 输出结果文件 ──
from datetime import datetime
from dateutil.relativedelta import relativedelta
current_date = (datetime.now().replace(day=1) - relativedelta(months=1)).strftime("%Y-%m-01")
y_file = f"/王小卤/风控/代码-新/大日期{current_date}_2.xlsx"
    output_file = "/王小卤/风控/代码-新/低价大日期_2.xlsx"
print("正在读取稽查结果大宽表...")
df_y = pd.read_excel(y_file, sheet_name="合并后", dtype=str)
df_y.columns = df_y.columns.str.strip()
print("正在从数据库读取价盘并计算低价...")
df_result = transform(df_y)
df_result.to_excel(output_file, index=False)
print(f"✅ 处理完成!结果已保存至:{output_file}")
import pandas as pd
import copy
import os
from datetime import datetime
from dateutil.relativedelta import relativedelta
# === 本地独立运行配置(仅 __main__ 模式使用)===
source_file = "/王小卤/风控/代码-新//2026.2-团队数据源.xlsx"
def _get_default_audit_date() -> str:
"""返回上月1号作为默认稽查日期,格式 yyyy-mm-01"""
return (datetime.now().replace(day=1) - relativedelta(months=1)).strftime("%Y-%m-01")
# 列映射(目标表列名)
COLUMN_MAPPING = {
"稽查日期": "稽查日期",
"稽查来源": "稽查来源",
"勤策门店编码": "勤策门店编码",
"勤策门店名称": "勤策门店名称",
"经销商名称": "经销商名称",
"城市": "城市",
"渠道类型": "渠道类型(稽查源提供)",
"产品系列": "产品系列",
"产品口味": "产品口味",
"产品克重": "产品克重",
"产品价格": "产品价格",
"产品生产月份": "产品生产月份",
}
# ===== 新增:多产品组配置 =====
# 每组:价格列 + 7个口味列 + 产品信息
# 团队表
PRODUCT_GROUPS_JC = [
# 第1组:虎皮凤爪 210g
{
"price_col": 50,
"flavor_cols": [51, 52, 53, 54, 55, 56, 57],
"series": "虎皮凤爪",
"weight": "210g",
"flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"]
},
# 第2组:虎皮凤爪 105g
{
"price_col": 58,
"flavor_cols": [59, 60, 61, 62, 63, 64, 65],
"series": "虎皮凤爪",
"weight": "105g",
"flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"]
},
# 第3组:虎皮凤爪 68g
{
"price_col": 66,
"flavor_cols": [67, 68, 69, 70, 71],
"series": "虎皮凤爪",
"weight": "68g",
"flavors": ["卤香", "香辣", "椒麻", "麻辣", "黑鸭"]
},
# 第4组:鸡肉豆堡 120g
{
"price_col": 72,
"flavor_cols": [73, 74],
"series": "鸡肉豆堡",
"weight": "120g",
"flavors": ["卤香", "香辣"]
},
# 第5组:牛肉豆堡 120g
{
"price_col": 75,
"flavor_cols": [76, 77],
"series": "牛肉豆堡",
"weight": "120g",
"flavors": ["卤香", "香辣"]
},
# 第6组:去骨凤爪 72g
{
"price_col": 78,
"flavor_cols": [79, 80],
"series": "去骨凤爪",
"weight": "72g",
"flavors": ["柠檬", "香辣"]
},
# 第7组:去骨凤爪 138g
{
"price_col": 81,
"flavor_cols": [82, 83],
"series": "去骨凤爪",
"weight": "138g",
"flavors": ["柠檬", "香辣"]
},
# 第8组:虎皮小鸡腿 80g
{
"price_col": 84,
"flavor_cols": [85, 86],
"series": "虎皮小鸡腿",
"weight": "80g",
"flavors": ["卤香", "香辣"]
},
# 第9组:老卤凤爪 95g(与老卤鸭掌共用 price_col=87)
{
"price_col": 87,
"flavor_cols": [88],
"series": "老卤凤爪",
"weight": "95g",
"flavors": ["卤香"]
},
# 第10组:老卤鸭掌 95g(与老卤凤爪共用 price_col=87)
{
"price_col": 87,
"flavor_cols": [89],
"series": "老卤鸭掌",
"weight": "95g",
"flavors": ["卤香"]
},
# 第11组:虎皮凤爪 25g
{
"price_col": 90,
"flavor_cols": [91, 92],
"series": "虎皮凤爪",
"weight": "25g",
"flavors": ["卤香", "香辣"]
},
# 第12组:虎皮凤爪 散称
{
"price_col": 93,
"flavor_cols": [94, 95, 96],
"series": "虎皮凤爪",
"weight": "散称",
"flavors": ["卤香", "香辣", "黑鸭"]
}
]
# 标准输出列定义(与目标表结构保持一致)
STANDARD_COLUMNS = [
"稽查日期", "稽查来源", "大区", "战区", "经销商编码", "经销商名称",
"勤策门店编码", "勤策门店名称", "客户经理工号", "客户经理",
"勤策渠道大类", "稽核渠道(对N列清洗)", "城市", "渠道类型(稽查源提供)",
"产品系列", "产品口味", "产品克重", "产品价格", "是否低价", "破价价差", "低价整改状态",
"低价整改说明", "产品生产月份", "临期月份数", "临期状态", "新鲜度",
"大日期整改状态", "大日期整改说明"
]
def _build_records(df_source, yname, pg, existing_columns, audit_date: str = None):
"""
核心记录构建逻辑,供 transform() 和 main() 复用。
Args:
df_source: pandas DataFrame,列通过 iloc 按位置访问
yname: 稽查来源名称,如 '稽查团队'
pg: 产品组配置列表
existing_columns: 目标表的列名列表
audit_date: 稽查日期字符串,格式 'yyyy-mm-dd';为 None 时取上月1号
Returns:
list: 构建好的记录列表(每条为 dict)
"""
if audit_date is None:
audit_date = _get_default_audit_date()
records = []
for idx, row in df_source.iterrows():
base_data = {
"勤策门店编码": str(row.iloc[8]).strip() if pd.notna(row.iloc[8]) else "",
"城市": str(row.iloc[4]).strip() if pd.notna(row.iloc[4]) else "",
"勤策门店名称": str(row.iloc[9]).strip() if pd.notna(row.iloc[9]) else "",
"经销商名称": str(row.iloc[7]).strip() if pd.notna(row.iloc[7]) else "",
"渠道类型": str(row.iloc[10]).strip() if pd.notna(row.iloc[10]) else "",
}
base_row = {}
if COLUMN_MAPPING["稽查日期"] in existing_columns:
base_row[COLUMN_MAPPING["稽查日期"]] = audit_date
if COLUMN_MAPPING["稽查来源"] in existing_columns:
base_row[COLUMN_MAPPING["稽查来源"]] = yname
if COLUMN_MAPPING["勤策门店编码"] in existing_columns:
base_row[COLUMN_MAPPING["勤策门店编码"]] = base_data["勤策门店编码"]
if COLUMN_MAPPING["勤策门店名称"] in existing_columns:
base_row[COLUMN_MAPPING["勤策门店名称"]] = base_data["勤策门店名称"]
if COLUMN_MAPPING["经销商名称"] in existing_columns:
base_row[COLUMN_MAPPING["经销商名称"]] = base_data["经销商名称"]
if COLUMN_MAPPING["城市"] in existing_columns:
base_row[COLUMN_MAPPING["城市"]] = base_data["城市"]
if COLUMN_MAPPING["渠道类型"] in existing_columns:
base_row[COLUMN_MAPPING["渠道类型"]] = base_data["渠道类型"]
for group in pg:
price_col = group["price_col"]
flavor_cols = group["flavor_cols"]
flavors = group["flavors"]
series = group["series"]
weight = group["weight"]
src_price = str(row.iloc[price_col]).strip() if pd.notna(row.iloc[price_col]) else ""
if not src_price or src_price == '无价签':
src_price = ''
row_with_price = copy.deepcopy(base_row)
if COLUMN_MAPPING["产品价格"] in existing_columns:
row_with_price[COLUMN_MAPPING["产品价格"]] = src_price
for i, col_idx in enumerate(flavor_cols):
flavor_name = flavors[i]
src_month = str(row.iloc[col_idx]).strip() if pd.notna(row.iloc[col_idx]) else ""
if src_month:
new_rec = copy.deepcopy(row_with_price)
src_month = normalize_month(src_month)
_set_product_fields(new_rec, series, flavor_name, weight, src_month, existing_columns)
rDate(new_rec)
records.append(new_rec)
elif src_price:
new_rec = copy.deepcopy(row_with_price)
_set_product_fields(new_rec, series, flavor_name, weight, None, existing_columns)
rDate(new_rec)
records.append(new_rec)
return records
def transform(df_source, yname, pg, audit_date: str = None):
"""
供 API 调用的数据转换入口:接收 DataFrame,返回清洗后的记录列表,不读写任何文件。
Args:
df_source: pandas DataFrame,列通过 iloc 按位置访问(与原始 Excel 列顺序对应)
yname: 稽查来源名称,如 '稽查团队'
pg: 产品组配置列表
audit_date: 稽查日期字符串,格式 'yyyy-mm-dd';为 None 时自动取上月1号
Returns:
list[dict]: 按 STANDARD_COLUMNS 结构整理好的记录列表
"""
return _build_records(df_source, yname, pg, STANDARD_COLUMNS, audit_date=audit_date)
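_build_records 中「一行门店数据按产品组展开成多条记录」的逻辑,可以用单组配置的最小样例演示(列索引与数据均为演示用的假设值,不对应真实 Excel 列):

```python
import pandas as pd

# 单组「价格列 + 口味列」展开的极简示意:
# 有生产月份的口味各生成一条记录;无生产月份但有价格的口味也生成一条(月份为 None)
group = {"price_col": 1, "flavor_cols": [2, 3],
         "series": "虎皮凤爪", "weight": "210g", "flavors": ["卤香", "香辣"]}
row = pd.Series(["门店A", "9.9", "2025-12-01", None])

records = []
price = str(row.iloc[group["price_col"]]).strip() if pd.notna(row.iloc[group["price_col"]]) else ""
for i, col_idx in enumerate(group["flavor_cols"]):
    month = str(row.iloc[col_idx]).strip() if pd.notna(row.iloc[col_idx]) else ""
    if month or price:
        records.append({
            "产品系列": group["series"],
            "产品口味": group["flavors"][i],
            "产品克重": group["weight"],
            "产品价格": price,
            "产品生产月份": month or None,
        })
```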
# === 主逻辑(独立运行/本地文件模式) ===
def main(df_source, yname, pg, audit_date: str = None):
if audit_date is None:
audit_date = _get_default_audit_date()
target_file = f"/王小卤/风控/代码-新/大日期{audit_date}_2.xlsx"
try:
# 获取目标表结构
try:
df_target = pd.read_excel(target_file, sheet_name="合并后", dtype=str)
existing_columns = df_target.columns.tolist()
except (FileNotFoundError, ValueError):
df_target = pd.DataFrame(columns=STANDARD_COLUMNS)
existing_columns = STANDARD_COLUMNS
records = _build_records(df_source, yname, pg, existing_columns, audit_date=audit_date)
if not records:
print("⚠️ 无有效数据需要追加。")
return
df_new = pd.DataFrame(records, columns=existing_columns)
df_combined = pd.concat([df_target, df_new], ignore_index=True)
if os.path.exists(target_file):
with pd.ExcelWriter(target_file, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
df_combined.to_excel(writer, sheet_name="合并后", index=False)
else:
with pd.ExcelWriter(target_file, engine='openpyxl', mode='w') as writer:
df_combined.to_excel(writer, sheet_name="合并后", index=False)
print(f"✅ 成功追加 {len(records)} 条记录到目标表!")
except Exception as e:
print(f"❌ 错误: {e}")
import traceback
traceback.print_exc()
def _set_product_fields(record, series, flavor, weight, prod_month_str, existing_columns):
"""设置产品字段"""
if COLUMN_MAPPING["产品系列"] in existing_columns:
record[COLUMN_MAPPING["产品系列"]] = series
if COLUMN_MAPPING["产品口味"] in existing_columns:
record[COLUMN_MAPPING["产品口味"]] = flavor
if COLUMN_MAPPING["产品克重"] in existing_columns:
record[COLUMN_MAPPING["产品克重"]] = weight
if prod_month_str and COLUMN_MAPPING["产品生产月份"] in existing_columns:
try:
dt = datetime.strptime(prod_month_str, "%Y-%m-%d")
record[COLUMN_MAPPING["产品生产月份"]] = dt.strftime("%Y-%m-%d")
except (ValueError, TypeError):
record[COLUMN_MAPPING["产品生产月份"]] = None
def rDate(row_dict):
    """根据产品生产月份与稽查日期,计算临期状态、新鲜度与临期月份数"""
prod_date_str = row_dict.get("产品生产月份", None)
inspect_date_str = row_dict.get("稽查日期", "").strip()
if not prod_date_str or not inspect_date_str:
row_dict["临期状态"] = ""
row_dict["新鲜度"] = ""
row_dict["临期月份数"] = ""
return
try:
prod_date = datetime.strptime(prod_date_str, "%Y-%m-%d")
inspect_date = datetime.strptime(inspect_date_str, "%Y-%m-%d")
except ValueError:
row_dict["临期状态"] = ""
row_dict["新鲜度"] = ""
row_dict["临期月份数"] = ""
return
product_series = row_dict.get("产品系列", "")
zg_status = "未整改"
if product_series == "去骨凤爪":
expiry_date = prod_date + relativedelta(months=6)
gap_months = _calculate_gap_months(expiry_date, inspect_date)
if gap_months >= 2:
            status, freshness, zg_status = "非大日期", "高", ""
elif 1 <= gap_months < 2:
status, freshness = "大日期", "低"
elif 0 <= gap_months < 1:
status, freshness = "临期", "低"
else:
status, freshness = "过期", "低"
else:
expiry_date = prod_date + relativedelta(months=9)
gap_months = _calculate_gap_months(expiry_date, inspect_date)
if gap_months >= 3:
            status, freshness, zg_status = "非大日期", "高", ""
elif 1 <= gap_months < 3:
status, freshness = "大日期", "低"
elif 0 <= gap_months < 1:
status, freshness = "临期", "低"
else:
status, freshness = "过期", "低"
row_dict["临期状态"] = status
row_dict["新鲜度"] = freshness
row_dict["临期月份数"] = round(gap_months, 2)
row_dict["大日期整改状态"] = zg_status
def _calculate_gap_months(expiry_date, inspect_date):
diff_years = expiry_date.year - inspect_date.year
diff_months = expiry_date.month - inspect_date.month
diff_days = expiry_date.day - inspect_date.day
return diff_years * 12 + diff_months + diff_days / 30.0
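_calculate_gap_months 的口径(年月差 + 天差/30)可以用独立片段快速验证:

```python
from datetime import datetime

def calculate_gap_months(expiry_date: datetime, inspect_date: datetime) -> float:
    """与上文 _calculate_gap_months 相同的口径:年差*12 + 月差 + 天差/30。"""
    return (
        (expiry_date.year - inspect_date.year) * 12
        + (expiry_date.month - inspect_date.month)
        + (expiry_date.day - inspect_date.day) / 30.0
    )

gap = calculate_gap_months(datetime(2026, 5, 1), datetime(2026, 2, 1))
```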
import re
# 这里还需要修改
def normalize_month(src_month):
    """
    将生产月份字符串标准化为 'yyyy-mm-01' 格式。
    支持的输入格式:
    - 'yyyy-mm'(如 '2025-12' 或 '2025-1')→ '2025-12-01'
    - 'yyyymm'(如 '202512')→ '2025-12-01'
    其他格式或无效值原样返回
    """
    if not isinstance(src_month, str):
        return src_month  # 非字符串直接返回
    src_month = src_month.strip()
    if not src_month:
        return src_month
    # 情况1: yyyy-mm 格式,月份补零为两位(如 2025-1 → 2025-01)
    if re.fullmatch(r'\d{4}-\d{1,2}', src_month):
        year, month = src_month.split('-')
        return f"{year}-{month.zfill(2)}-01"
    # 情况2: yyyymm 格式(6位数字,如 202512),直接取后两位作为月份
    if re.fullmatch(r'\d{6}', src_month):
        return f"{src_month[:4]}-{src_month[4:]}-01"
    # 其他格式:不处理,原样返回
    return src_month
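normalize_month 的预期行为可以用下面的独立片段验证(为便于单独运行,这里内联了与上文相同的实现):

```python
import re

def normalize_month(src_month):
    """与上文相同的标准化规则:'yyyy-mm' / 'yyyymm' → 'yyyy-mm-01',其余原样返回。"""
    if not isinstance(src_month, str):
        return src_month
    src_month = src_month.strip()
    if re.fullmatch(r"\d{4}-\d{1,2}", src_month):
        year, month = src_month.split("-")
        return f"{year}-{month.zfill(2)}-01"
    if re.fullmatch(r"\d{6}", src_month):
        return f"{src_month[:4]}-{src_month[4:]}-01"
    return src_month
```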
if __name__ == "__main__":
# TODO: 配置 sheet 页名称
print("正在读取【团队】源文件(跳过第 1 行标题,第 2 行作为数据第 1 行)...")
    # 1. skiprows=1 : 跳过物理第 1 行(标题)
    # 2. header=None : 不把物理第 2 行当表头,而是当作数据;
    #    这样物理第 2 行成为 df 的第 0 行,列名自动变为 0, 1, 2...,
    #    与 _build_records 中 row.iloc[4]、row.iloc[8] 等按位置访问的逻辑一致。
df_source_p = pd.read_excel(source_file, skiprows=1, header=None, dtype=str)
# 验证读取结果(可选,用于调试)
print(f"✅ 成功读取 {len(df_source_p)} 行数据。")
if len(df_source_p) > 0:
print("前 2 行数据预览(确认第 2 行是否在列):")
print(df_source_p.head(2))
print(f"列索引范围:0 到 {len(df_source_p.columns) - 1}")
main(df_source_p, '稽查团队', PRODUCT_GROUPS_JC)
from fastapi import FastAPI
from fastapi.exceptions import HTTPException, RequestValidationError

from api.exception_handlers import http_exception_handler, validation_exception_handler
from api.routes_clean import api_router

app = FastAPI(title="Clean Data API")
app.add_exception_handler(HTTPException, http_exception_handler)
app.add_exception_handler(RequestValidationError, validation_exception_handler)
app.include_router(api_router)
"""
数据清洗系统 - FastAPI 应用主程序
Description: 提供 Excel 数据解析、清洗和存储的 API 服务
"""
from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel
import logging
import uuid
import asyncio
import math
import random
import pandas as pd
from io import BytesIO
from datetime import datetime
from typing import Optional, Dict, Any
# 导入业务模块
from core.excel_handler import ExcelHandler
from core.data_cleaner import DataCleaner
from core.db_handler import DatabaseHandler
from core.progress_manager import ProgressManager
from utils.exceptions import DataCleaningException, DatabaseException
from utils.validators import validate_excel_url
from utils.response import BizCode, ok_resp, fail_resp
# 风控稽查大宽表:中文列名 → 数据库英文字段名
FENGKONG_COLUMN_MAP = {
"稽查日期": "audit_date",
"稽查来源": "source",
"大区": "region_name",
"战区": "district_name",
"经销商编码": "dealer_code",
"经销商名称": "dealer_name",
"勤策门店编码": "store_code",
"勤策门店名称": "store_name",
"客户经理工号": "f_emp_no",
"客户经理": "f_emp_name",
"勤策渠道大类": "qin_ce_type_large",
"稽核渠道(对N列清洗)": "jh_channel_type",
"城市": "city",
    "渠道类型(稽查源提供)": "channel_type",
"产品系列": "series",
"产品口味": "taste",
"产品克重": "weight",
"产品价格": "price",
"是否低价": "low_price",
"破价价差": "low_price_diff",
"低价整改状态": "low_price_status",
"低价整改说明": "low_price_rectify",
"产品生产月份": "production_month",
"临期月份数": "near_month_num",
"临期状态": "near_month_status",
"新鲜度": "fresh_status",
"大日期整改状态": "large_date_status",
"大日期整改说明": "large_date_rectify",
}
# risk_audit_visit 各字段类型分组(用于入库前类型强制转换)
_FK_DECIMAL_COLS = {"price", "low_price_diff"}
_FK_INT_COLS = {"near_month_num"}
_FK_DATE_COLS = {"audit_date", "production_month"}
# varchar 字段最大长度限制(超长截断,防止 Data too long 报错)
_FK_VARCHAR_MAX = {
"source": 20, "region_name": 20, "district_name": 20,
"dealer_code": 10, "dealer_name": 100,
"store_code": 20, "store_name": 100,
"f_emp_no": 20, "f_emp_name": 100,
"qin_ce_type_large": 20, "jh_channel_type": 20,
"city": 30, "channel_type": 30,
"series": 20, "taste": 20, "weight": 20,
"low_price": 20, "low_price_status": 20, "low_price_rectify": 100,
"near_month_status": 20, "fresh_status": 20,
"large_date_status": 20, "large_date_rectify": 100,
}
def _coerce_fengkong_row(row: dict) -> dict:
"""
对已完成列名映射(英文 key)的行做类型强制转换,使其与 risk_audit_visit 字段类型完全匹配:
- decimal: 转 float,失败 → None
- int: 转 int,失败 → None
- date: 保留 'YYYY-MM-DD' 前10位,格式非法 → None
- varchar: 转字符串并按最大长度截断,空值 → None
"""
result = {}
for col, val in row.items():
# 统一空值处理
if val is None or (isinstance(val, str) and val.strip() == ''):
result[col] = None
continue
if col in _FK_DECIMAL_COLS:
try:
result[col] = float(val)
except (ValueError, TypeError):
result[col] = None
elif col in _FK_INT_COLS:
try:
result[col] = int(float(val))
except (ValueError, TypeError):
result[col] = None
elif col in _FK_DATE_COLS:
s = str(val)[:10]
try:
datetime.strptime(s, "%Y-%m-%d")
result[col] = s
except ValueError:
result[col] = None
else:
# varchar:转字符串,按最大长度截断
s = str(val).strip()
max_len = _FK_VARCHAR_MAX.get(col)
result[col] = s[:max_len] if max_len else s
return result
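_coerce_fengkong_row 的分类型转换规则可以用一个精简版演示(字段分组与长度限制仅取源码中的子集,coerce_row 为演示用的假设函数名):

```python
from datetime import datetime

DECIMAL_COLS = {"price", "low_price_diff"}
DATE_COLS = {"audit_date", "production_month"}
VARCHAR_MAX = {"series": 20}

def coerce_row(row: dict) -> dict:
    """上文 _coerce_fengkong_row 的精简示意:decimal/date/varchar 三类转换。"""
    out = {}
    for col, val in row.items():
        if val is None or (isinstance(val, str) and val.strip() == ""):
            out[col] = None                      # 统一空值处理
        elif col in DECIMAL_COLS:
            try:
                out[col] = float(val)
            except (ValueError, TypeError):
                out[col] = None
        elif col in DATE_COLS:
            s = str(val)[:10]                    # 保留 'YYYY-MM-DD' 前 10 位
            try:
                datetime.strptime(s, "%Y-%m-%d")
                out[col] = s
            except ValueError:
                out[col] = None
        else:
            s = str(val).strip()                 # varchar:转字符串并按最大长度截断
            max_len = VARCHAR_MAX.get(col)
            out[col] = s[:max_len] if max_len else s
    return out

row = coerce_row({"price": "9.90", "audit_date": "2026-02-01 00:00:00",
                  "series": "  虎皮凤爪  ", "taste": ""})
```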
def _sanitize_nan(records: list) -> list:
"""将列表中每行 dict 里的 float NaN / Inf 以及空字符串替换为 None,确保数据库写入兼容。"""
sanitized = []
for row in records:
sanitized.append({
k: (None if (isinstance(v, float) and (math.isnan(v) or math.isinf(v)))
or (isinstance(v, str) and v.strip() == '')
else v)
for k, v in row.items()
})
return sanitized
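_sanitize_nan 的替换规则可以独立验证如下(为便于单独运行,这里内联了同样的逻辑):

```python
import math

def sanitize_nan(records: list) -> list:
    """与上文 _sanitize_nan 相同:float NaN/Inf 和空字符串统一替换为 None。"""
    return [
        {
            k: (None
                if (isinstance(v, float) and (math.isnan(v) or math.isinf(v)))
                or (isinstance(v, str) and v.strip() == "")
                else v)
            for k, v in row.items()
        }
        for row in records
    ]

rows = sanitize_nan([{"a": float("nan"), "b": " ", "c": 1.5}])
```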
# 配置日志
logging.basicConfig(
level=logging.INFO, # 只记录 INFO 以上的日志
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' # 时间 - 模块名 - 级别 - 内容
)
logger = logging.getLogger(__name__) # __name__ 运行时获取模块名
# 创建 FastAPI 应用
app = FastAPI(
    title="数据清洗系统",
    description="用于数据解析、清洗和持久化的 API 服务",
    version="1.0.0"
)
# ==================== 请求数据模型 ====================
class CleaningRequest(BaseModel):
"""数据清洗请求模型"""
excel_url: Optional[str] = None # 普通清洗模式必填;风控稽查模式可不传
department: str
description: Optional[str] = None
audit_date: Optional[str] = None # 稽查日期,格式 'yyyy-mm-dd',不传则取上月1号
# ── 风控稽查数据清洗 专用字段 ──────────────────────────────────
year: Optional[int] = None # 数据所属年
month: Optional[int] = None # 数据所属月
day: Optional[int] = None # 数据所属日
team_url: Optional[str] = None # 团队数据表链接
puling_url: Optional[str] = None # 浦零数据表链接
chengyu_url: Optional[str] = None # 诚予数据表链接
class SavingRequest(BaseModel):
"""数据保存请求模型"""
task_id: str
table_name: Optional[str] = None # 风控稽查任务已预设表名,可不传
# ==================== 业务逻辑 ====================
class DataCleaningService:
"""数据清洗服务主类"""
# 性能基准参数(可根据实际情况调整)
DOWNLOAD_TIME_BASE = 2 # 下载和解析基础时间(秒)
DOWNLOAD_TIME_PER_ROW = 0.0001 # 每行数据的下载时间(秒)
CLEANING_TIME_PER_ROW = 0.001 # 每行数据的清洗时间(秒)
VALIDATION_TIME_BASE = 1 # 验证基础时间(秒)
CACHING_TIME_PER_ROW = 0.0001 # 每行数据的缓存时间(秒)
CACHE_TTL_SECONDS = 1800 # cache 保留时长:30 分钟
def __init__(self):
self.progress_manager = ProgressManager()
self.excel_handler = ExcelHandler()
self.data_cleaner = DataCleaner()
self.db_handler = DatabaseHandler()
# 存储已清洗的数据(内存中,可扩展为 Redis)
self.cleaned_data_cache: Dict[str, Any] = {}
# 正在执行保存操作的 task_id 集合,用于防止并发重复写入
self._saving_tasks: set = set()
def _evict_expired_cache(self):
"""清除超过 TTL 的 cache 条目,在写入和读取时调用"""
now = datetime.now()
expired = [
tid for tid, v in self.cleaned_data_cache.items()
if (now - v['created_at']).total_seconds() > self.CACHE_TTL_SECONDS
]
for tid in expired:
del self.cleaned_data_cache[tid]
logger.info(f"[cache] 已清除过期任务 {tid}")
def estimate_completion_time(self, row_count: int) -> int:
"""
根据数据行数预估完成时间
Args:
row_count: Excel 文件的数据行数
Returns:
int: 预估完成时间(秒)
"""
# 计算各阶段时间
download_time = self.DOWNLOAD_TIME_BASE + (row_count * self.DOWNLOAD_TIME_PER_ROW)
validation_time = self.VALIDATION_TIME_BASE
cleaning_time = row_count * self.CLEANING_TIME_PER_ROW
caching_time = row_count * self.CACHING_TIME_PER_ROW
# 总时间(向上取整)
total_time = int(download_time + validation_time + cleaning_time + caching_time)
# 最少 5 秒,最多 3600 秒(1小时)
return max(5, min(total_time, 3600))
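estimate_completion_time 的预估公式(基准参数沿用源码取值)可以独立演示:

```python
# 上文 estimate_completion_time 的公式示意(参数取源码中的基准值)
DOWNLOAD_TIME_BASE = 2
DOWNLOAD_TIME_PER_ROW = 0.0001
CLEANING_TIME_PER_ROW = 0.001
VALIDATION_TIME_BASE = 1
CACHING_TIME_PER_ROW = 0.0001

def estimate_completion_time(row_count: int) -> int:
    total = int(
        DOWNLOAD_TIME_BASE + row_count * DOWNLOAD_TIME_PER_ROW
        + VALIDATION_TIME_BASE
        + row_count * CLEANING_TIME_PER_ROW
        + row_count * CACHING_TIME_PER_ROW
    )
    return max(5, min(total, 3600))  # 下限 5 秒,上限 1 小时

est = estimate_completion_time(100_000)
```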
async def clean_data_from_url(
self,
task_id: str,
excel_url: str,
department: str,
raw_data: list = None,
audit_date: str = None
) -> Dict[str, Any]:
"""
从 URL 下载并清洗 Excel 数据
Args:
task_id: 任务唯一标识
excel_url: Excel 文件的网络链接
department: 业务部门名称
raw_data: 可选,已下载的原始数据(由路由层传入以避免重复下载)
audit_date: 稽查日期字符串,格式 'yyyy-mm-dd'
Returns:
包含清洗结果的字典
"""
try:
# 1. 记录任务开始
self.progress_manager.update_progress(
task_id,
status="processing",
progress=10,
message="开始下载 Excel 文件"
)
logger.info(f"[{task_id}] 开始处理数据清洗任务")
# 2. 下载并解析 Excel(若路由层已下载则直接复用,避免重复请求)
if raw_data is None:
self.progress_manager.update_progress(
task_id,
status="processing",
progress=20,
message="正在解析 Excel 文件"
)
raw_data = await self.excel_handler.fetch_and_parse(excel_url)
logger.info(f"[{task_id}] 成功解析 Excel,数据行数: {len(raw_data)}")
# 3. 数据验证
self.progress_manager.update_progress(
task_id,
status="processing",
progress=30,
message="正在验证数据"
)
if not raw_data:
raise DataCleaningException("解析的 Excel 数据为空")
# 4. 执行数据清洗
self.progress_manager.update_progress(
task_id,
status="processing",
progress=50,
message="正在清洗数据"
)
cleaned_data = await self.data_cleaner.clean(
raw_data,
department,
progress_callback=lambda p, m, count=None: self.progress_manager.update_progress(
task_id,
status="processing",
progress=int(50 + p * 0.4), # 进度从50%到90%
message=m,
processed_count=count
),
audit_date=audit_date
)
logger.info(f"[{task_id}] 数据清洗完成,清洗后数据行数: {len(cleaned_data)}")
# 5. 缓存清洗后的数据(写入前先清除过期条目)
self.progress_manager.update_progress(
task_id,
status="processing",
progress=90,
message="正在缓存清洗后的数据"
)
self._evict_expired_cache()
safe_data = _sanitize_nan(cleaned_data)
self.cleaned_data_cache[task_id] = {
'data': safe_data,
'department': department,
'created_at': datetime.now(),
'row_count': len(safe_data)
}
# 6. 任务完成
self.progress_manager.update_progress(
task_id,
status="completed",
progress=100,
message="数据清洗完成,等待前端确认",
processed_count=len(cleaned_data)
)
return {
'task_id': task_id,
'status': 'completed',
'message': '数据清洗成功',
'data_preview': cleaned_data[:5], # 返回前5行用于预览
'total_rows': len(cleaned_data)
}
except DataCleaningException as e:
logger.error(f"[{task_id}] 数据清洗业务异常: {str(e)}")
self.progress_manager.update_progress(
task_id,
status="failed",
progress=0,
message=f"清洗失败: {str(e)}"
)
raise
except Exception as e:
logger.error(f"[{task_id}] 数据清洗系统异常: {str(e)}", exc_info=True)
self.progress_manager.update_progress(
task_id,
status="failed",
progress=0,
message=f"系统异常: {str(e)}"
)
raise DataCleaningException(f"未知错误: {str(e)}")
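作为补充说明(非项目源码):上面 progress_callback 里的 `int(50 + p * 0.4)` 把清洗器内部的子进度线性映射到总体 50%–90% 区间。这里假设 data_cleaner 以 0–100 的尺度回报 p(与注释「进度从50%到90%」一致),可用一个最小示意验证该映射:

```python
def map_cleaning_progress(p: float) -> int:
    """把 data_cleaner 内部进度 p(假设为 0~100)映射为总体进度(50~90)。"""
    p = min(max(p, 0.0), 100.0)   # 防御性裁剪,越界值收敛到合法区间
    return int(50 + p * 0.4)      # 与上文回调中的表达式一致
```

p=0 对应总体 50%,p=100 对应总体 90%,保证清洗阶段的进度条不会倒退也不会越过缓存阶段的 90%。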
async def save_cleaned_data(
self,
task_id: str,
table_name: str
) -> Dict[str, Any]:
"""
将清洗后的数据保存到数据库
Args:
task_id: 任务唯一标识
table_name: 目标表名
Returns:
包含保存结果的字典
"""
# ── 并发防重:同一 task_id 只允许一个 save 请求在执行 ──────────
# asyncio 是单线程协程模型,此处 check-and-add 之间不会发生协程切换,
# 因此无需额外加锁,天然原子。
if task_id in self._saving_tasks:
raise DatabaseException(f"任务 {task_id} 正在保存中,请勿重复提交")
self._saving_tasks.add(task_id)
try:
logger.info(f"[{task_id}] 开始保存数据到数据库")
# 验证数据是否存在(先清除过期条目)
self._evict_expired_cache()
if task_id not in self.cleaned_data_cache:
raise DatabaseException(f"任务 {task_id} 的清洗数据不存在或已过期(超过30分钟)")
cached = self.cleaned_data_cache[task_id]
cleaned_data = cached['data']
# 优先使用缓存中预设的表名(风控稽查任务已写死 risk_audit_visit)
target_table = cached.get('table_name') or table_name
if not target_table:
raise DatabaseException("未指定目标表名,请在请求中传入 table_name")
# 将中文列名映射为数据库英文字段名,并强制转换各字段类型(仅对 risk_audit_visit 生效)
if target_table == "risk_audit_visit":
cleaned_data = [
_coerce_fengkong_row(
{FENGKONG_COLUMN_MAP[k]: v for k, v in row.items() if k in FENGKONG_COLUMN_MAP}
)
for row in cleaned_data
]
# 保存到数据库
submitted_rows, raw_affected = await self.db_handler.insert_data(
target_table,
cleaned_data
)
logger.info(
f"[{task_id}] 成功保存到 {target_table},"
f"提交行数={submitted_rows},raw_affected={raw_affected}"
)
# 清理缓存
del self.cleaned_data_cache[task_id]
return {
'task_id': task_id,
'status': 'saved',
'message': '数据已成功保存到数据库',
'affected_rows': submitted_rows, # 真实提交(去重后)行数,与预览页 total_rows 一致
}
except DatabaseException as e:
logger.error(f"[{task_id}] 数据库异常: {str(e)}")
raise
except Exception as e:
logger.error(f"[{task_id}] 保存数据时出错: {str(e)}", exc_info=True)
raise DatabaseException(f"保存失败: {str(e)}")
finally:
# 无论成功或失败,都释放保存锁,避免任务永远卡在「保存中」状态
self._saving_tasks.discard(task_id)
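上面「检查是否在集合中 → 加入集合 → finally 中 discard 释放」的防重模式可以抽成一个独立的最小示意(非业务代码,`_running`、`try_acquire`、`release` 均为演示用名字)。关键点是:在 asyncio 单线程事件循环里,检查与添加之间没有 `await`,不会发生协程切换,因此无需加锁:

```python
_running: set[str] = set()   # 正在执行的 task_id 集合(演示用)

def try_acquire(task_id: str) -> bool:
    """同一 task_id 只允许一个执行者;返回是否获得执行权。"""
    if task_id in _running:   # 检查
        return False
    _running.add(task_id)     # 添加(与检查之间无 await,天然原子)
    return True

def release(task_id: str) -> None:
    """释放执行权;discard 幂等,重复释放不抛错。"""
    _running.discard(task_id)
```

与 save_cleaned_data 一样,释放必须放在 finally 中,否则一次异常就会让该 task_id 永远处于「保存中」。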
async def clean_fengkong_data(
self,
task_id: str,
team_url: Optional[str],
puling_url: Optional[str],
chengyu_url: Optional[str],
audit_date: Optional[str],
) -> Dict[str, Any]:
"""
风控稽查数据清洗:分别下载团队、浦零、诚予数据源,各自清洗后合并为一张大宽表,
结果存入内存缓存,不写本地文件。
Args:
task_id: 任务唯一标识
team_url: 团队数据表下载链接(可为 None)
puling_url: 浦零数据表下载链接(可为 None)
chengyu_url: 诚予数据表下载链接(可为 None)
audit_date: 稽查日期,格式 'yyyy-mm-dd';为 None 时各模块自动取上月1号
"""
from core_py.数据转换_团队 import (
transform as team_transform,
PRODUCT_GROUPS_JC,
STANDARD_COLUMNS,
)
from core_py.数据转换_诚予_浦零 import (
transform as pl_cy_transform,
PRODUCT_GROUPS,
PRODUCT_GROUPS_CY,
)
try:
self.progress_manager.update_progress(
task_id, status="processing", progress=5, message="开始风控稽查数据清洗"
)
logger.info(f"[{task_id}] 开始风控稽查数据清洗,audit_date={audit_date}")
all_records = []
progress_step = 0
source_count = sum(1 for u in [team_url, puling_url, chengyu_url] if u)
progress_per_source = int(80 / source_count) if source_count else 80
# ── 1. 团队数据 ──────────────────────────────────────────
if team_url:
progress_step += progress_per_source
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(10, progress_step - progress_per_source + 10),
message="正在下载团队数据表..."
)
raw_bytes = await self.excel_handler.fetch_bytes(team_url)
df_team = await asyncio.to_thread(
pd.read_excel, BytesIO(raw_bytes), skiprows=1, header=None, dtype=str
)
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(10, progress_step - progress_per_source // 2),
message="正在清洗团队数据..."
)
records_team = await asyncio.to_thread(
team_transform, df_team, "稽查团队", PRODUCT_GROUPS_JC, audit_date
)
all_records.extend(records_team)
logger.info(f"[{task_id}] 团队数据清洗完成,{len(records_team)} 条记录")
# ── 2. 浦零数据 ──────────────────────────────────────────
if puling_url:
progress_step += progress_per_source
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(15, progress_step - progress_per_source + 10),
message="正在下载浦零数据表..."
)
raw_bytes = await self.excel_handler.fetch_bytes(puling_url)
df_pl = await asyncio.to_thread(
pd.read_excel, BytesIO(raw_bytes), header=2, dtype=str
)
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(15, progress_step - progress_per_source // 2),
message="正在清洗浦零数据..."
)
records_pl = await asyncio.to_thread(
pl_cy_transform, df_pl, "浦零", PRODUCT_GROUPS, audit_date
)
all_records.extend(records_pl)
logger.info(f"[{task_id}] 浦零数据清洗完成,{len(records_pl)} 条记录")
# ── 3. 诚予数据 ──────────────────────────────────────────
if chengyu_url:
progress_step += progress_per_source
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(20, progress_step - progress_per_source + 10),
message="正在下载诚予数据表..."
)
raw_bytes = await self.excel_handler.fetch_bytes(chengyu_url)
df_cy = await asyncio.to_thread(
pd.read_excel, BytesIO(raw_bytes), header=2, dtype=str
)
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(20, progress_step - progress_per_source // 2),
message="正在清洗诚予数据..."
)
records_cy = await asyncio.to_thread(
pl_cy_transform, df_cy, "诚予", PRODUCT_GROUPS_CY, audit_date
)
all_records.extend(records_cy)
logger.info(f"[{task_id}] 诚予数据清洗完成,{len(records_cy)} 条记录")
# ── 4. 合并为大宽表(内存,不写文件) ──────────────────
self.progress_manager.update_progress(
task_id, status="processing", progress=85, message="正在合并数据宽表..."
)
df_merged = pd.DataFrame(all_records, columns=STANDARD_COLUMNS)
logger.info(f"[{task_id}] 大宽表合并完成,共 {len(df_merged)} 条记录")
# ── 5. 低价计算(从数据库读取价盘,回填低价字段) ────────
self.progress_manager.update_progress(
task_id, status="processing", progress=93, message="正在执行低价计算..."
)
import importlib.util
import pathlib
_lp_spec = importlib.util.spec_from_file_location(
"low_price_calc",
pathlib.Path(__file__).parent / "core_py" / "1低价计算.py",
)
_lp_mod = importlib.util.module_from_spec(_lp_spec)
_lp_spec.loader.exec_module(_lp_mod)
df_final = await asyncio.to_thread(_lp_mod.transform, df_merged)
final_records = _sanitize_nan(
df_final.where(pd.notna(df_final), None).to_dict(orient="records")
)
logger.info(f"[{task_id}] 低价计算完成,共 {len(final_records)} 条记录")
# ── 6. 写入内存缓存 ──────────────────────────────────────
self._evict_expired_cache()
self.cleaned_data_cache[task_id] = {
"data": final_records,
"department": "风控稽查数据清洗",
"created_at": datetime.now(),
"row_count": len(final_records),
"table_name": "risk_audit_visit",
}
self.progress_manager.update_progress(
task_id, status="completed", progress=100,
message=f"风控稽查数据清洗完成,共 {len(final_records)} 条记录,等待前端确认",
processed_count=len(final_records)
)
return {
"task_id": task_id,
"status": "completed",
"message": "风控稽查数据清洗成功",
"data_preview": final_records[:5],
"total_rows": len(final_records),
}
except Exception as e:
logger.error(f"[{task_id}] 风控稽查数据清洗失败: {str(e)}", exc_info=True)
self.progress_manager.update_progress(
task_id, status="failed", progress=0,
message=f"清洗失败: {str(e)}"
)
raise
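上面按「提供了几个数据源」均分 0–80% 进度区间的计算,可以单独示意如下(非业务代码,函数名为演示用;三个 URL 参数任意为真即计一个源):

```python
def progress_per_source(team_url, puling_url, chengyu_url) -> int:
    """与 clean_fengkong_data 中一致:下载/清洗阶段共占 80%,按数据源个数均分。"""
    source_count = sum(1 for u in (team_url, puling_url, chengyu_url) if u)
    return int(80 / source_count) if source_count else 80  # 0 个源时退化为 80,避免除零
```

剩余 20% 留给合并宽表(85%)、低价计算(93%)与写缓存/完成(100%)。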
# ==================== 初始化服务 ====================
service = DataCleaningService()
# ==================== API 路由 ====================
@app.post("/api/v1/clean")
async def start_cleaning(request: CleaningRequest, background_tasks: BackgroundTasks):
"""
启动数据清洗任务
Returns: { code, msg, data: { task_id, status, estimated_completion_time, total_rows } }
"""
try:
task_id = str(uuid.uuid4())
logger.info(f"创建新任务: {task_id}, 部门: {request.department}")
# ── 风控稽查数据清洗 专用分支 ──────────────────────────────
if request.department == "风控稽查数据清洗":
if not any([request.team_url, request.puling_url, request.chengyu_url]):
return fail_resp(BizCode.BAD_REQUEST, "风控稽查数据清洗至少需要提供一个数据源地址(team_url / puling_url / chengyu_url)")
# 从 year/month/day 构造稽查日期,未传则由清洗模块自动取上月1号
audit_date = None
if request.year and request.month and request.day:
audit_date = f"{request.year}-{request.month:02d}-{request.day:02d}"
estimated_rows = 1000
estimated_time = service.estimate_completion_time(estimated_rows)
# 提前写入 queued 进度,避免前端轮询时返回 404
service.progress_manager.update_progress(
task_id, status="queued", progress=0, message="任务已创建,等待处理"
)
background_tasks.add_task(
service.clean_fengkong_data,
task_id,
request.team_url,
request.puling_url,
request.chengyu_url,
audit_date,
)
# ── 普通清洗分支 ───────────────────────────────────────────
else:
if not validate_excel_url(request.excel_url):
return fail_resp(BizCode.BAD_REQUEST, "Excel URL 格式无效")
estimated_rows = 0
estimated_time = 5
prefetched_raw_data = None
try:
prefetched_raw_data = await service.excel_handler.fetch_and_parse(request.excel_url)
estimated_rows = len(prefetched_raw_data)
estimated_time = service.estimate_completion_time(estimated_rows)
logger.info(f"[{task_id}] 预估数据行数: {estimated_rows}, 预估完成时间: {estimated_time}秒")
except Exception as e:
logger.warning(f"[{task_id}] 预读 Excel 失败,后台任务将重新下载: {str(e)}")
estimated_rows = 1000
estimated_time = service.estimate_completion_time(estimated_rows)
background_tasks.add_task(
service.clean_data_from_url,
task_id,
request.excel_url,
request.department,
prefetched_raw_data,
request.audit_date,
)
return ok_resp(
data={
"task_id": task_id,
"status": "queued",
"estimated_completion_time": estimated_time,
"total_rows": estimated_rows,
},
msg="任务已创建,正在处理中..."
)
except Exception as e:
logger.error(f"启动清洗任务失败: {str(e)}")
return fail_resp(BizCode.SERVER_ERROR, f"启动任务失败: {str(e)}", http_status=500)
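路由中由 year/month/day 拼接 audit_date 的补零逻辑可用最小示意验证(`build_audit_date` 为演示用名字;与上文一致,三个字段任一缺失即返回 None,交由清洗模块自动取上月 1 号):

```python
from typing import Optional

def build_audit_date(year: Optional[int], month: Optional[int], day: Optional[int]) -> Optional[str]:
    """三者齐备才拼接 'yyyy-mm-dd',月/日用 :02d 补零;否则返回 None。"""
    if year and month and day:
        return f"{year}-{month:02d}-{day:02d}"
    return None
```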
@app.get("/api/v1/progress/{task_id}")
async def get_progress(task_id: str):
"""
获取数据清洗进度(HTTP 轮询,建议前端每 500ms-1s 调用一次)
Returns: { code, msg, data: { task_id, status, progress, message, timestamp } }
"""
try:
progress_data = service.progress_manager.get_progress(task_id)
if not progress_data:
return fail_resp(BizCode.NOT_FOUND, "任务不存在", http_status=404)
return ok_resp(data=progress_data)
except Exception as e:
logger.error(f"获取进度失败: {str(e)}")
return fail_resp(BizCode.SERVER_ERROR, "获取进度失败", http_status=500)
@app.get("/api/v1/result/{task_id}")
async def get_cleaning_result(task_id: str):
"""
获取清洗结果及数据预览(任务完成后调用)
Returns: { code, msg, data: { task_id, status, data_preview, total_rows, department } }
"""
try:
progress_data = service.progress_manager.get_progress(task_id)
if not progress_data:
return fail_resp(BizCode.NOT_FOUND, "任务不存在", http_status=404)
if progress_data['status'] == 'processing':
return fail_resp(BizCode.TASK_PROCESSING, "任务仍在处理中", http_status=202)
if progress_data['status'] == 'failed':
return fail_resp(BizCode.TASK_FAILED, progress_data['message'])
service._evict_expired_cache()
if task_id not in service.cleaned_data_cache:
return fail_resp(BizCode.NOT_FOUND, "清洗数据不存在或已过期(超过30分钟)", http_status=404)
cached = service.cleaned_data_cache[task_id]
raw_data = cached['data']
# 对 risk_audit_visit 先做列名映射 + 类型转换,再基于唯一键去重,
# 得到真正会写入数据库的行数(用于 total_rows);预览数据保留中文列名
target_table = cached.get('table_name', '')
if target_table == "risk_audit_visit":
mapped = [
_coerce_fengkong_row(
{FENGKONG_COLUMN_MAP[k]: v for k, v in row.items() if k in FENGKONG_COLUMN_MAP}
)
for row in raw_data
]
# 按唯一键去重(保留最后一条,与 ON DUPLICATE KEY UPDATE 行为一致)
_BIZ_KEYS = ("audit_date", "source", "store_name", "channel_type", "series", "taste", "weight")
dedup: dict = {}
for i, row in enumerate(mapped):
key = tuple(row.get(k) for k in _BIZ_KEYS)
dedup[key] = i # 只记录原始行索引,用于去重后从 raw_data 取中文行
total_rows = len(dedup)
# 用去重后的索引对应回 raw_data(中文列名),保证预览列始终为中文
dedup_raw = [raw_data[i] for i in dedup.values()]
else:
dedup_raw = raw_data
total_rows = len(raw_data)
# 随机抽取最多 20 行用于前端预览(中文列名)
sample_rows = random.sample(dedup_raw, min(20, len(dedup_raw)))
return ok_resp(
data={
"task_id": task_id,
"status": "ready_to_save",
"data_preview": sample_rows,
"total_rows": total_rows, # 去重后的预估入库行数
"raw_rows": cached['row_count'], # 清洗前宽表原始行数,供参考
"department": cached['department']
},
msg="数据清洗完成,可进行保存"
)
except Exception as e:
logger.error(f"获取清洗结果失败: {str(e)}")
return fail_resp(BizCode.SERVER_ERROR, "获取结果失败", http_status=500)
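上面「按业务唯一键去重、保留最后一条」的做法与 MySQL `ON DUPLICATE KEY UPDATE` 的最终结果一致,下面是一个脱离 FastAPI 上下文的最小示意(键名取自上文的 `_BIZ_KEYS`;`dedup_keep_last` 为演示用名字):

```python
_BIZ_KEYS = ("audit_date", "source", "store_name", "channel_type", "series", "taste", "weight")

def dedup_keep_last(rows: list) -> list:
    """按唯一键去重;dict 赋值天然「后写覆盖先写」,故保留每个键的最后一条。"""
    picked = {}
    for i, row in enumerate(rows):
        picked[tuple(row.get(k) for k in _BIZ_KEYS)] = i  # 只记索引,保持原行对象不变
    return [rows[i] for i in picked.values()]
```

这也是接口把 `total_rows`(去重后)与 `raw_rows`(去重前)分开返回的原因:两者之差即会被数据库「更新而非插入」的行数。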
@app.post("/api/v1/save")
async def save_cleaned_data(request: SavingRequest):
"""
保存清洗后的数据到 MySQL 数据库(前端确认数据无误后调用)
Returns: { code, msg, data: { task_id, status, affected_rows } }
"""
try:
if not request.task_id:
return fail_resp(BizCode.BAD_REQUEST, "参数不完整:task_id 为必填")
result = await service.save_cleaned_data(request.task_id, request.table_name)
return ok_resp(data=result, msg="数据已成功保存到数据库")
except DatabaseException as e:
logger.error(f"保存数据失败: {str(e)}")
return fail_resp(BizCode.DB_ERROR, str(e), http_status=500)
except Exception as e:
logger.error(f"保存数据时发生错误: {str(e)}")
return fail_resp(BizCode.SERVER_ERROR, f"保存失败: {str(e)}", http_status=500)
@app.get("/api/v1/url-link")
async def get_url_link():
"""
从数据库 fortune-hub.transfer_url 表读取跳转链接
Returns: { code, msg, data: { url_link: str } }
"""
try:
with service.db_handler._get_connection() as conn:
cursor = conn.cursor(dictionary=True)
cursor.execute("SELECT `url_link` FROM `fortune-hub`.`transfer_url` LIMIT 1")
row = cursor.fetchone()
cursor.close()
if not row or not row.get("url_link"):
return fail_resp(BizCode.NOT_FOUND, "未查询到跳转链接数据", http_status=404)
return ok_resp(data={"url_link": row["url_link"]})
except Exception as e:
logger.error(f"获取跳转链接失败: {str(e)}")
return fail_resp(BizCode.DB_ERROR, f"获取跳转链接失败: {str(e)}", http_status=500)
@app.get("/api/v1/health")
async def health_check():
"""健康检查接口"""
return ok_resp(
data={"service": "数据清洗系统", "timestamp": str(datetime.now())},
msg="healthy"
)
@app.get("/")
async def root():
"""根路由 - API 欢迎信息"""
return ok_resp(
data={"version": "1.0.0", "docs": "/docs", "redoc": "/redoc"},
msg="欢迎使用数据清洗系统"
)
# ==================== 异常处理 ====================
@app.exception_handler(DataCleaningException)
async def data_cleaning_exception_handler(request, exc):
"""处理数据清洗异常"""
logger.error(f"DataCleaningException: {str(exc)}")
return fail_resp(BizCode.TASK_FAILED, str(exc), http_status=400)
@app.exception_handler(DatabaseException)
async def database_exception_handler(request, exc):
"""处理数据库异常"""
logger.error(f"DatabaseException: {str(exc)}")
return fail_resp(BizCode.DB_ERROR, str(exc), http_status=500)
# ==================== 应用启动和关闭事件 ====================
@app.on_event("startup")
async def startup_event():
"""应用启动时的初始化"""
logger.info("数据清洗系统启动")
try:
# 初始化数据库连接等
pass
except Exception as e:
logger.error(f"启动时出错: {str(e)}")
@app.on_event("shutdown")
async def shutdown_event():
"""应用关闭时的清理"""
logger.info("数据清洗系统关闭")
try:
# 关闭数据库连接等
pass
except Exception as e:
logger.error(f"关闭时出错: {str(e)}")
# ==================== 主程序入口 ====================
if __name__ == "__main__":
import uvicorn
    # 运行 Uvicorn 服务器;启用 reload 时必须以「导入字符串」方式传入应用,
    # 直接传 app 对象会导致热重载不生效(uvicorn 会给出警告)
    uvicorn.run(
        "main:app",
        host="0.0.0.0",
        port=8000,
        log_level="info",
        reload=True  # 开发环境下启用热重载
    )
# 数据转换_团队.py
# 团队稽查宽表仅从 URL 拉取;转窄表后写入本地目标 xlsx「合并后」sheet,并计算临期/大日期/新鲜度等。
import copy
import os
import sys
import urllib.error
from datetime import datetime
from pathlib import Path

import pandas as pd
from dateutil.relativedelta import relativedelta

# 动态加载本文件时保证能 import code 下的 utils
_CODE_ROOT = Path(__file__).resolve().parent.parent
if str(_CODE_ROOT) not in sys.path:
    sys.path.insert(0, str(_CODE_ROOT))

from utils.dates import (  # noqa: E402
    approx_gap_months_calendar,
    first_yyyy_mm_dd_in_dataframe,
    first_yyyy_mm_dd_in_iloc,
    normalize_year_month_to_day01,
    to_yyyy_mm_dd,
)
from utils.excel_http import read_excel_from_url_skip1_with_header_row  # noqa: E402
def _resolve_audit_date(
audit_date_str: str | None,
df_target: pd.DataFrame,
df_source: pd.DataFrame | None = None,
*,
source_audit_col: int = 0,
) -> tuple[str | None, str | None]:
"""稽查日期:团队宽表指定列 → 显式参数 → 目标表列/第三列。返回 (YYYY-MM-DD, 错误信息)。"""
n = None
if df_source is not None and df_source.shape[1] > source_audit_col:
n = first_yyyy_mm_dd_in_iloc(df_source, source_audit_col)
if n is None and audit_date_str is not None and str(audit_date_str).strip():
n = to_yyyy_mm_dd(audit_date_str)
if n is None:
return None, f"稽查日期参数无法解析: {audit_date_str!r}"
if n is None:
n = first_yyyy_mm_dd_in_dataframe(
df_target,
("稽查日期列", "稽查日期"),
third_column_fallback=True,
)
if n:
return n, None
return None, (
"未获取到稽查日期:请在团队宽表「稽核/稽查日期」列(首行表头须含该字样)逐行填写;"
"或在目标表「合并后」填写;或传入 audit_date_str。"
)
# 列映射(目标表列名)
COLUMN_MAPPING = {
    "稽查日期": "稽查日期",  # 逻辑名 → 目标表列名
    "稽查来源": "稽查来源",
    "勤策门店编码": "勤策门店编码",
    "勤策门店名称": "勤策门店名称",
    # ……(中间映射项在原提交的 diff 中被折叠省略)……
    "产品生产月份": "产品生产月份",
}

# ===== 多产品组配置:每组对应源表一截「价格 + 多口味月份列」=====
PRODUCT_GROUPS_JC = [
    # 第1组:虎皮凤爪 210g
    {
        "price_col": 50,  # 源表列索引:该组价格
        "flavor_cols": [51, 52, 53, 54, 55, 56, 57],  # 各口味生产月份列索引
        "series": "虎皮凤爪",  # 写入目标表的产品系列
        "weight": "210g",  # 克重文案
        "flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"]  # 与 flavor_cols 一一对应
    },
    # 第2组:虎皮凤爪 105g
    {
        "price_col": 58,
        "flavor_cols": [59, 60, 61, 62, 63, 64, 65],
        "series": "虎皮凤爪",
        "weight": "105g",
        "flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"]
    },
    # 第3组:虎皮凤爪 68g
    {
        "price_col": 66,
        "flavor_cols": [67, 68, 69, 70, 71],
        "series": "虎皮凤爪",
        "weight": "68g",
        "flavors": ["卤香", "香辣", "椒麻", "麻辣", "黑鸭"]
    },
    # 第4组:鸡肉豆堡 120g
    {
        "price_col": 72,
        "flavor_cols": [73, 74],
        "series": "鸡肉豆堡",
        "weight": "120g",
        "flavors": ["卤香", "香辣"]
    },
    # 第5组:牛肉豆堡 120g
    {
        "price_col": 75,
        "flavor_cols": [76, 77],
        "series": "牛肉豆堡",
        "weight": "120g",
        "flavors": ["卤香", "香辣"]
    },
    # 第6组:去骨凤爪 72g
    {
        "price_col": 78,
        "flavor_cols": [79, 80],
        "series": "去骨凤爪",
        "weight": "72g",
        "flavors": ["柠檬", "香辣"]
    },
    # 第7组:去骨凤爪 138g
    {
        "price_col": 81,
        "flavor_cols": [82, 83],
        "series": "去骨凤爪",
        "weight": "138g",
        "flavors": ["柠檬", "香辣"]
    },
    # 第8组:虎皮小鸡腿 80g
    {
        "price_col": 84,
        "flavor_cols": [85, 86],
        "series": "虎皮小鸡腿",
        "weight": "80g",
        "flavors": ["卤香", "香辣"]
    },
    # 第9组:老卤凤爪 95g(与老卤鸭掌共用 price_col=87)
    {
        "price_col": 87,
        "flavor_cols": [88],
        "series": "老卤凤爪",
        "weight": "95g",
        "flavors": ["卤香"]
    },
    # 第10组:老卤鸭掌 95g(与老卤凤爪共用 price_col=87)
    {
        "price_col": 87,
        "flavor_cols": [89],
        "series": "老卤鸭掌",
        "weight": "95g",
        "flavors": ["卤香"]
    },
    # 第11组:虎皮凤爪 25g
    {
        "price_col": 90,
        "flavor_cols": [91, 92],
        "series": "虎皮凤爪",
        "weight": "25g",
        "flavors": ["卤香", "香辣"]
    },
    # 第12组:虎皮凤爪 散称
    {
        "price_col": 93,
        "flavor_cols": [94, 95, 96],
        "series": "虎皮凤爪",
        "weight": "散称",
        "flavors": ["卤香", "香辣", "黑鸭"]
    }
]
# 标准输出列定义(与目标表结构保持一致)
STANDARD_COLUMNS = [
"稽查日期", "稽查来源", "大区", "战区", "经销商编码", "经销商名称",
"勤策门店编码", "勤策门店名称", "客户经理工号", "客户经理",
"勤策渠道大类", "稽核渠道(对N列清洗)", "城市", "渠道类型(稽查源提供)",
"产品系列", "产品口味", "产品克重", "产品价格", "是否低价", "破价价差", "低价整改状态",
"低价整改说明", "产品生产月份", "临期月份数", "临期状态", "新鲜度",
"大日期整改状态", "大日期整改说明"
]
# 首行表头未识别「稽核/稽查日期」时回退的列索引(0 基)
TEAM_WIDE_AUDIT_DATE_COL_FALLBACK = 0


def _find_wide_table_audit_col(header_row: pd.Series) -> int | None:
    """首行表头:列名含「稽核日期」或「稽查日期」的列下标(与 skiprows=1 后数据 iloc 对齐)。"""
    keys = ("稽核日期", "稽查日期")
    for i in range(len(header_row)):
        v = header_row.iloc[i]
        if v is None or (isinstance(v, float) and pd.isna(v)):
            continue
        s = str(v).strip().replace(" ", "")
        if s and any(k in s for k in keys):
            return i
    return None


# === 主逻辑:宽表 → 窄表并写回目标 ===
def main(
    df_source,
    yname,
    pg,
    target_file_path,
    audit_date_str=None,
    *,
    source_audit_col: int = TEAM_WIDE_AUDIT_DATE_COL_FALLBACK,
):
    tf = Path(target_file_path)  # 目标 xlsx 路径对象
    try:
        try:  # 尝试读已有目标工作簿
            df_target = pd.read_excel(tf, sheet_name="合并后", dtype=str)  # 合并后 sheet,全文当字符串
            existing_columns = df_target.columns.tolist()  # 目标列顺序,新行须对齐
        except (FileNotFoundError, ValueError):  # 无文件或无该 sheet
            standard_columns = [  # 新建目标时的标准表头
                "稽查日期", "稽查来源", "大区", "战区", "经销商编码", "经销商名称",
                "勤策门店编码", "勤策门店名称", "客户经理工号", "客户经理",
                "勤策渠道大类", "稽核渠道(对N列清洗)", "城市", "渠道类型(稽查源提供)",
                "产品系列", "产品口味", "产品克重", "产品价格", "是否低价", "破价价差", "低价整改状态",
                "低价整改说明", "产品生产月份", "临期月份数", "临期状态", "新鲜度",
                "大日期整改状态", "大日期整改说明"
            ]
            df_target = pd.DataFrame(columns=standard_columns)  # 空表占位
            existing_columns = standard_columns  # 列名列表同上

        ad, ad_err = _resolve_audit_date(
            audit_date_str,
            df_target,
            df_source,
            source_audit_col=source_audit_col,
        )
        if ad_err:  # 无法得到稽查日期
            print(f"❌ {ad_err}")  # 控制台提示
            return {"ok": False, "error": ad_err}  # 提前返回

        records = []  # 收集本批生成的窄表行(字典)
        src_has_audit_col = df_source.shape[1] > source_audit_col
        for idx, row in df_source.iterrows():  # 遍历源表每一门店行
            base_data = {  # 从源固定列位抽取门店维度(列号与团队源表约定一致)
                "勤策门店编码": str(row.iloc[8]).strip() if pd.notna(row.iloc[8]) else "",  # 第 9 列
                "城市": str(row.iloc[4]).strip() if pd.notna(row.iloc[4]) else "",
                "勤策门店名称": str(row.iloc[9]).strip() if pd.notna(row.iloc[9]) else "",
                "经销商名称": str(row.iloc[7]).strip() if pd.notna(row.iloc[7]) else "",
                "渠道类型": str(row.iloc[10]).strip() if pd.notna(row.iloc[10]) else "",
            }

            base_row = {}  # 目标表一行骨架(仅含目标里有的列)
            if COLUMN_MAPPING["稽查日期"] in existing_columns:  # 目标有稽查日期列
                row_ad = to_yyyy_mm_dd(row.iloc[source_audit_col]) if src_has_audit_col else None
                # 该门店宽表行展开的多条窄表行共用本行稽核日期
                base_row[COLUMN_MAPPING["稽查日期"]] = row_ad or ad
            if COLUMN_MAPPING["稽查来源"] in existing_columns:
                base_row[COLUMN_MAPPING["稽查来源"]] = yname  # 如「稽查团队」
            if COLUMN_MAPPING["勤策门店编码"] in existing_columns:
                base_row[COLUMN_MAPPING["勤策门店编码"]] = base_data["勤策门店编码"]
            if COLUMN_MAPPING["勤策门店名称"] in existing_columns:
                base_row[COLUMN_MAPPING["勤策门店名称"]] = base_data["勤策门店名称"]
            if COLUMN_MAPPING["经销商名称"] in existing_columns:
                base_row[COLUMN_MAPPING["经销商名称"]] = base_data["经销商名称"]
            if COLUMN_MAPPING["城市"] in existing_columns:
                base_row[COLUMN_MAPPING["城市"]] = base_data["城市"]
            if COLUMN_MAPPING["渠道类型"] in existing_columns:
                base_row[COLUMN_MAPPING["渠道类型"]] = base_data["渠道类型"]

            for group in pg:  # 当前门店下每个产品组展开多行
                price_col = group["price_col"]  # 组内价格列索引
                flavor_cols = group["flavor_cols"]  # 组内各口味月份列
                flavors = group["flavors"]  # 口味名称列表
                series = group["series"]  # 系列名
                weight = group["weight"]  # 克重

                src_price = str(row.iloc[price_col]).strip() if pd.notna(row.iloc[price_col]) else ""  # 读价格单元格
                if not src_price or src_price == '无价签':  # 空或占位文案视为无价格
                    src_price = ''  # 统一为空串

                row_with_price = copy.deepcopy(base_row)  # 每组单独拷贝,避免串味
                if COLUMN_MAPPING["产品价格"] in existing_columns:
                    row_with_price[COLUMN_MAPPING["产品价格"]] = src_price  # 写入该组价格

                for i, col_idx in enumerate(flavor_cols):  # 该组每个口味一列
                    flavor_name = flavors[i]  # 与列索引对齐的口味名
                    src_month = str(row.iloc[col_idx]).strip() if pd.notna(row.iloc[col_idx]) else ""  # 生产月份原文

                    if src_month:  # 有月份则必有窄表行(业务规则)
                        new_rec = copy.deepcopy(row_with_price)  # 再拷贝一行给本口味
                        src_month = normalize_year_month_to_day01(src_month)  # 规范为 yyyy-mm-dd 串
                        _set_product_fields(new_rec, series, flavor_name, weight, src_month, existing_columns)  # 填产品列
                        rDate(new_rec)  # 算临期/新鲜度等
                        records.append(new_rec)  # 入结果列表

                    elif src_price:  # 无月份但有价格也要一行,月份空
                        new_rec = copy.deepcopy(row_with_price)
                        _set_product_fields(new_rec, series, flavor_name, weight, None, existing_columns)  # 不设生产月份
                        rDate(new_rec)  # 仍跑一遍(内部会因缺日期清空临期字段)
                        records.append(new_rec)

        if not records:  # 没有任何可写行
            msg = "无有效数据需要追加。"
            print(f"⚠️ {msg}")
            return {"ok": False, "message": msg}

        df_new = pd.DataFrame(records, columns=existing_columns)  # 新行 DataFrame,列顺序与目标一致
        df_combined = pd.concat([df_target, df_new], ignore_index=True)  # 旧数据 + 新数据

        if os.path.exists(tf):  # 目标文件已在磁盘上
            with pd.ExcelWriter(tf, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:  # 追加模式替换 sheet
                df_combined.to_excel(writer, sheet_name="合并后", index=False)  # 不写行索引
        else:  # 首次创建文件
            with pd.ExcelWriter(tf, engine='openpyxl', mode='w') as writer:  # 新建工作簿
                df_combined.to_excel(writer, sheet_name="合并后", index=False)

        print(f"✅ 成功追加 {len(records)} 条记录到目标表!")
        return {
            "ok": True,
            "records_added": len(records),  # 新增行数
            "target_file": str(tf),  # 绝对/相对路径字符串
        }

    except Exception as e:  # 未预料异常
        print(f"❌ 错误: {e}")
        import traceback  # 延迟导入,仅出错时打印栈
        traceback.print_exc()
        return {"ok": False, "error": str(e)}
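下文 rDate 的核心是「生产日期 + 保质期月数 → 到期日 → 距稽查日的月差」。`approx_gap_months_calendar` 的真实实现在 utils.dates 中、未随本提交展示;下面用纯标准库给出一个假设性近似(整月数 + 余天按 30 天折算),仅用于说明阈值判定,并假设生产月份已由 normalize_year_month_to_day01 规范为每月 1 号:

```python
from datetime import date

def add_months(d: date, n: int) -> date:
    """按自然月加 n 个月(假设 d 为每月 1 号,无需处理月末溢出)。"""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, d.day)

def gap_months_approx(expiry: date, inspect: date) -> float:
    """近似月差:整月差 + 余天/30;正值表示距到期还有富余。"""
    months = (expiry.year - inspect.year) * 12 + (expiry.month - inspect.month)
    return months + (expiry.day - inspect.day) / 30

prod = date(2025, 6, 1)                 # 规范化后的生产月份
expiry_9m = add_months(prod, 9)         # 默认 9 个月保质期的到期日
```

按 rDate 的 9 个月分支:月差 ≥ 3 为「非大日期」,1–3 为「大日期」,0–1 为「临期」,负值为「过期」。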
def _set_product_fields(record, series, flavor, weight, prod_month_str, existing_columns):
    if COLUMN_MAPPING["产品系列"] in existing_columns:
        record[COLUMN_MAPPING["产品系列"]] = series
    if COLUMN_MAPPING["产品口味"] in existing_columns:
        record[COLUMN_MAPPING["产品口味"]] = flavor
    if COLUMN_MAPPING["产品克重"] in existing_columns:
        record[COLUMN_MAPPING["产品克重"]] = weight
    if prod_month_str and COLUMN_MAPPING["产品生产月份"] in existing_columns:  # 有月份字符串才解析
        try:
            dt = datetime.strptime(prod_month_str, "%Y-%m-%d")  # 期望 normalize_year_month_to_day01 已输出此格式
            record[COLUMN_MAPPING["产品生产月份"]] = dt.date()  # rDate 用 date 与保质期相加
        except (ValueError, TypeError):  # 解析失败
            record[COLUMN_MAPPING["产品生产月份"]] = None  # 置空避免脏数据
def rDate(row_dict):
    """计算临期状态(保持你原有的业务逻辑)"""
    prod_date = row_dict.get("产品生产月份", None)  # date 或 None
    inspect_date_str = row_dict.get("稽查日期", "").strip()  # 字符串 YYYY-MM-DD

    if not prod_date or not inspect_date_str:  # 缺任一无法算临期
        row_dict["临期状态"] = ""
        row_dict["新鲜度"] = ""
        row_dict["临期月份数"] = ""
        return

    try:
        inspect_date = datetime.strptime(inspect_date_str, "%Y-%m-%d")  # 稽查日转 datetime
    except ValueError:  # 稽查日期格式不对
        row_dict["临期状态"] = ""
        row_dict["新鲜度"] = ""
        row_dict["临期月份数"] = ""
        return

    product_series = row_dict.get("产品系列", "")  # 系列决定保质期月数
    zg_status = "未整改"  # 大日期整改状态默认值
    if product_series == "去骨凤爪":  # 6 个月保质期规则
        expiry_date = prod_date + relativedelta(months=6)  # 到期日
        gap_months = approx_gap_months_calendar(expiry_date, inspect_date)
        if gap_months >= 2:
            status, freshness, zg_status = "非大日期", "高", ""  # 整改状态清空表示无需整改
        elif 1 <= gap_months < 2:
            status, freshness = "大日期", "低"
        elif 0 <= gap_months < 1:
            status, freshness = "临期", "低"
        else:
            status, freshness = "过期", "低"
    else:  # 默认 9 个月保质期
        expiry_date = prod_date + relativedelta(months=9)
        gap_months = approx_gap_months_calendar(expiry_date, inspect_date)
        if gap_months >= 3:
            status, freshness, zg_status = "非大日期", "高", ""
        elif 1 <= gap_months < 3:
            status, freshness = "大日期", "低"
        elif 0 <= gap_months < 1:
            status, freshness = "临期", "低"
        else:
            status, freshness = "过期", "低"
row_dict["临期状态"] = status row_dict["临期状态"] = status # 写回行字典
row_dict["新鲜度"] = freshness row_dict["新鲜度"] = freshness
row_dict["临期月份数"] = round(gap_months, 2) row_dict["临期月份数"] = round(gap_months, 2) # 保留两位小数
row_dict["大日期整改状态"] = zg_status row_dict["大日期整改状态"] = zg_status
def read_team_source_from_url(
    url: str,
    *,
    timeout: float = 300,
    user_agent: str = "clean-data-api/1.0",
    dtype=str,
) -> tuple[pd.DataFrame, int]:
    """Team wide table: skip the first (title) row; return (data, index of the audit-date column)."""
    data_df, header_row = read_excel_from_url_skip1_with_header_row(
        url,
        timeout=timeout,
        user_agent=user_agent,
        dtype=dtype,
    )
    col = _find_wide_table_audit_col(header_row)
    if col is None:
        col = TEAM_WIDE_AUDIT_DATE_COL_FALLBACK
        print(
            f"⚠️ 宽表首行未识别「稽核日期/稽查日期」列,稽查日期回退用列索引 {col}"
        )
    else:
        print(f"✅ 宽表稽核/稽查日期列索引 {col},表头: {header_row.iloc[col]!r}")
    return data_df, col

def _print_source_preview(df_source_p: pd.DataFrame) -> None:
    print(f"✅ 成功读取 {len(df_source_p)} 行数据。")  # row count
    if len(df_source_p) > 0:  # only preview when there is data
        print("前 2 行数据预览(确认第 2 行是否在列):")
        print(df_source_p.head(2))  # first two rows
        print(f"列索引范围:0 到 {len(df_source_p.columns) - 1}")  # column-count hint

def _run_team_after_load(
    df_source_p: pd.DataFrame,
    target_path: str | Path,
    audit_date_str: str | None,
    yname: str = "稽查团队",
    product_groups: list | None = None,
    *,
    source_audit_col: int = TEAM_WIDE_AUDIT_DATE_COL_FALLBACK,
) -> dict:
    pg = product_groups if product_groups is not None else PRODUCT_GROUPS_JC  # product-group configuration
    result = main(  # run the core transformation
        df_source_p,
        yname,
        pg,
        target_file_path=target_path,
        audit_date_str=audit_date_str,
        source_audit_col=source_audit_col,
    )
    if result is None:  # defensive check
        return {"ok": False, "error": "main 未返回结果"}
    return {"source_rows": len(df_source_p), **result}  # attach the source row count

def run_team_conversion(
    source_url: str,
    target_path: str | Path,
    audit_date_str: str | None = None,
    *,
    yname: str = "稽查团队",
    product_groups: list | None = None,
    timeout: float = 300,
    user_agent: str = "clean-data-api/1.0",
    dtype=str,
) -> dict:
    """Download the team wide-table xlsx from source_url, transform it, and write to target_path."""
    s = (source_url or "").strip()
    low = s.lower()
    if not s or not (low.startswith("http://") or low.startswith("https://")):
        return {"ok": False, "error": "source_url 须为非空的 http(s) 地址"}
    print("正在从 URL 读取【团队】源文件(跳过第 1 行标题,第 2 行作为数据第 1 行)...")
    try:
        df_source_p, source_audit_col = read_team_source_from_url(
            s,
            timeout=timeout,
            user_agent=user_agent,
            dtype=dtype,
        )
    except urllib.error.HTTPError as e:
        return {"ok": False, "error": f"从 URL 读取源表失败: HTTP {e.code}"}
    except urllib.error.URLError as e:
        return {"ok": False, "error": f"从 URL 读取源表失败: {e.reason!s}"}
    except Exception as e:
        return {"ok": False, "error": f"读取源表失败: {e}"}
    _print_source_preview(df_source_p)
    return _run_team_after_load(
        df_source_p,
        target_path,
        audit_date_str,
        yname=yname,
        product_groups=product_groups,
        source_audit_col=source_audit_col,
    )
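The shelf-life thresholds in `rDate` above can be traced with a small stdlib-only sketch. `add_months` here is a hypothetical stand-in for `dateutil.relativedelta` (kept inline so the snippet has no third-party imports), and the default 9-month branch is assumed:

```python
# Minimal sketch of the near-expiry rule in rDate (default 9-month shelf life).
# add_months stands in for dateutil.relativedelta purely for illustration.
from datetime import date

def add_months(d: date, months: int) -> date:
    # naive month addition; day-of-month is always 1 in this pipeline, so no clamping needed
    total = d.month - 1 + months
    return d.replace(year=d.year + total // 12, month=total % 12 + 1)

def gap_months(expiry: date, inspect: date) -> float:
    # same approximation as approx_gap_months_calendar: years*12 + months + days/30
    return ((expiry.year - inspect.year) * 12
            + (expiry.month - inspect.month)
            + (expiry.day - inspect.day) / 30.0)

prod = date(2025, 1, 1)                      # normalized production month
expiry = add_months(prod, 9)                 # 2025-10-01
gap = gap_months(expiry, date(2025, 9, 10))  # 0*12 + 1 + (1-10)/30 = 0.7
status = ("非大日期" if gap >= 3 else
          "大日期" if gap >= 1 else
          "临期" if gap >= 0 else "过期")
print(round(gap, 2), status)
```

With the gap between 0 and 1 month, the row classifies as 临期 (near expiry), matching the `0 <= gap_months < 1` branch above.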
fastapi>=0.115.0
uvicorn[standard]>=0.32.0

# 数据转换_团队.py
pandas>=2.0.0
openpyxl>=3.1.0
python-dateutil>=2.8.0
/*
Navicat MySQL Data Transfer
Source Server : t100_dev
Source Server Version : 50744
Source Host : 192.168.100.39:25301
Source Database : market_bi
Target Server Type : MYSQL
Target Server Version : 50744
File Encoding : 65001
Date: 2026-03-09 18:13:42
*/
SET FOREIGN_KEY_CHECKS=0;
-- ----------------------------
-- Table structure for risk_audit_visit
-- ----------------------------
DROP TABLE IF EXISTS `risk_audit_visit`;
CREATE TABLE `risk_audit_visit` (
`rav_id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键',
`audit_date` date DEFAULT NULL COMMENT '稽查日期',
`source` varchar(20) DEFAULT NULL COMMENT '稽查来源',
`region_name` varchar(20) DEFAULT NULL COMMENT '大区',
`district_name` varchar(20) DEFAULT NULL COMMENT '战区',
`dealer_code` varchar(10) DEFAULT NULL COMMENT '经销商编码',
`dealer_name` varchar(100) DEFAULT NULL COMMENT '经销商名称',
`store_code` varchar(20) DEFAULT NULL COMMENT '门店编码',
`store_name` varchar(100) DEFAULT NULL COMMENT '勤策门店',
`f_emp_no` varchar(20) DEFAULT NULL COMMENT '客户经理工号',
`f_emp_name` varchar(100) DEFAULT NULL COMMENT '客户经理名称',
`qin_ce_type_large` varchar(20) DEFAULT NULL COMMENT '勤策渠道大类',
`jh_channel_type` varchar(20) DEFAULT NULL COMMENT '稽查渠道类型',
`city` varchar(30) DEFAULT NULL COMMENT '城市',
`channel_type` varchar(30) DEFAULT NULL COMMENT '渠道类型(稽查源提供)',
`series` varchar(20) DEFAULT NULL COMMENT '产品系列',
`taste` varchar(20) DEFAULT NULL COMMENT '产品口味',
`weight` varchar(20) DEFAULT NULL COMMENT '产品克重',
`price` decimal(10,2) DEFAULT NULL COMMENT '产品价格',
`low_price` varchar(20) DEFAULT NULL COMMENT '是否低价:低价,正常',
`low_price_diff` decimal(10,2) DEFAULT NULL COMMENT '价差',
`low_price_status` varchar(20) DEFAULT NULL COMMENT '低价整改状态',
`low_price_rectify` varchar(100) DEFAULT NULL COMMENT '低价整改说明',
`production_month` date DEFAULT NULL COMMENT '产品生产月份',
`near_month_num` int(11) DEFAULT NULL COMMENT '临期月份数',
`near_month_status` varchar(20) DEFAULT NULL COMMENT '临期状态',
`fresh_status` varchar(20) DEFAULT NULL COMMENT '新鲜度',
`large_date_status` varchar(20) DEFAULT NULL COMMENT '大日期整改状态',
`large_date_rectify` varchar(100) DEFAULT NULL COMMENT '大日期整改说明',
PRIMARY KEY (`rav_id`),
-- Business unique key: the same audit_date + source + store_name + channel_type (as provided by the audit source) + series + taste + weight = one unique record
-- ON DUPLICATE KEY UPDATE relies on this unique key to decide between INSERT and overwrite UPDATE
UNIQUE KEY `uk_biz` (`audit_date`,`source`,`store_name`(100),`channel_type`,`series`,`taste`,`weight`),
KEY `audit` (`audit_date`),
KEY `dealer` (`dealer_code`,`dealer_name`),
KEY `product_index` (`series`,`taste`,`weight`),
KEY `regiondistrict` (`region_name`,`district_name`),
KEY `type_small` (`jh_channel_type`),
KEY `weight_index` (`weight`)
) ENGINE=InnoDB AUTO_INCREMENT=493621 DEFAULT CHARSET=utf8mb4 COMMENT='稽查走访价格大日期表';
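The `uk_biz` comment above notes that `ON DUPLICATE KEY UPDATE` keys off this unique index. A hedged Python sketch of how such an upsert statement might be assembled (column list abbreviated for illustration; this is not the project's actual writer code, and executing it would require a live MySQL connection):

```python
# Hypothetical upsert against risk_audit_visit relying on uk_biz.
# On a unique-key conflict, only the listed non-key columns are refreshed.
cols = ["audit_date", "source", "store_name", "channel_type",
        "series", "taste", "weight", "price"]
update_cols = ["price"]  # non-key columns to overwrite on conflict

placeholders = ", ".join(["%s"] * len(cols))
updates = ", ".join(f"{c} = VALUES({c})" for c in update_cols)
sql = (
    f"INSERT INTO risk_audit_visit ({', '.join(cols)}) "
    f"VALUES ({placeholders}) "
    f"ON DUPLICATE KEY UPDATE {updates}"
)
print(sql)
```

The `VALUES(col)` form is valid on the MySQL 5.7 server version recorded in the dump header; newer MySQL releases prefer the row-alias syntax instead.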
"""
API 测试脚本
用于快速测试 API 的各个端点
"""
import asyncio
import httpx
import json
from datetime import datetime
BASE_URL = "http://localhost:8000"
class APITester:
"""API 测试类"""
def __init__(self, base_url: str = BASE_URL):
self.base_url = base_url
self.task_id: str = None
async def test_health_check(self):
"""测试健康检查接口"""
print("\n" + "="*50)
print("测试:健康检查接口")
print("="*50)
try:
async with httpx.AsyncClient() as client:
response = await client.get(f"{self.base_url}/api/v1/health")
print(f"状态码: {response.status_code}")
print(f"响应: {json.dumps(response.json(), indent=2, ensure_ascii=False)}")
except Exception as e:
print(f"错误: {str(e)}")
async def test_start_cleaning(self):
"""测试启动清洗任务接口"""
print("\n" + "="*50)
print("测试:启动数据清洗任务")
print("="*50)
payload = {
"excel_url": "https://example.com/test_data.xlsx",
"department": "sales",
"description": "测试数据清洗"
}
try:
async with httpx.AsyncClient() as client:
response = await client.post(
f"{self.base_url}/api/v1/clean",
json=payload
)
print(f"状态码: {response.status_code}")
data = response.json()
print(f"响应: {json.dumps(data, indent=2, ensure_ascii=False)}")
if response.status_code == 200:
self.task_id = data.get('task_id')
print(f"\n✓ 任务创建成功,Task ID: {self.task_id}")
except Exception as e:
print(f"错误: {str(e)}")
async def test_get_progress(self):
"""测试获取进度接口"""
if not self.task_id:
print("跳过:需要先创建任务")
return
print("\n" + "="*50)
print("测试:获取数据清洗进度")
print("="*50)
try:
async with httpx.AsyncClient() as client:
response = await client.get(
f"{self.base_url}/api/v1/progress/{self.task_id}"
)
print(f"状态码: {response.status_code}")
print(f"响应: {json.dumps(response.json(), indent=2, ensure_ascii=False, default=str)}")
except Exception as e:
print(f"错误: {str(e)}")
async def test_get_result(self):
"""测试获取清洗结果接口"""
if not self.task_id:
print("跳过:需要先创建任务")
return
print("\n" + "="*50)
print("测试:获取清洗结果")
print("="*50)
try:
async with httpx.AsyncClient() as client:
response = await client.get(
f"{self.base_url}/api/v1/result/{self.task_id}"
)
print(f"状态码: {response.status_code}")
data = response.json()
print(f"响应: {json.dumps(data, indent=2, ensure_ascii=False, default=str)}")
except Exception as e:
print(f"错误: {str(e)}")
async def test_save_data(self):
"""测试保存数据接口"""
if not self.task_id:
print("跳过:需要先创建任务")
return
print("\n" + "="*50)
print("测试:保存清洗后的数据")
print("="*50)
payload = {
"task_id": self.task_id,
"table_name": "sales_data"
}
try:
async with httpx.AsyncClient() as client:
response = await client.post(
f"{self.base_url}/api/v1/save",
json=payload
)
print(f"状态码: {response.status_code}")
print(f"响应: {json.dumps(response.json(), indent=2, ensure_ascii=False)}")
except Exception as e:
print(f"错误: {str(e)}")
async def run_all_tests(self):
"""运行所有测试"""
print("\n")
print("╔" + "="*48 + "╗")
print("║" + " "*10 + "数据清洗系统 API 测试" + " "*16 + "║")
print("║" + f" "*10 + f"时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}" + " "*15 + "║")
print("╚" + "="*48 + "╝")
await self.test_health_check()
await asyncio.sleep(1)
await self.test_start_cleaning()
await asyncio.sleep(2)
await self.test_get_progress()
await asyncio.sleep(1)
await self.test_get_result()
await asyncio.sleep(1)
print("\n" + "="*50)
print("所有测试完成!")
print("="*50 + "\n")
async def main():
"""主函数"""
tester = APITester()
await tester.run_all_tests()
if __name__ == "__main__":
print("\n提示:确保 FastAPI 服务已在 http://localhost:8000 运行中\n")
asyncio.run(main())
"""Utils 工具模块""" # 跨业务复用的小工具(日期、网络 Excel 等)
from utils.response import BizCode, ApiResponse, ok_resp, fail_resp
__all__ = ["BizCode", "ApiResponse", "ok_resp", "fail_resp"]
"""日期解析与 DataFrame 中取首个有效日期(与具体业务表头通过参数解耦)。"""
from __future__ import annotations
import re
from collections.abc import Sequence
from datetime import datetime
import pandas as pd
def _parse_yyyymmdd(s: str) -> str | None:
"""8 位 YYYYMMDD → YYYY-MM-DD;非法日历则 None。"""
if not re.fullmatch(r"\d{8}", s):
return None
try:
return datetime.strptime(s, "%Y%m%d").strftime("%Y-%m-%d")
except ValueError:
return None
def to_yyyy_mm_dd(val) -> str | None:
"""任意单元格值 → YYYY-MM-DD;无法解析则 None。"""
if val is None or (isinstance(val, float) and pd.isna(val)):
return None
if isinstance(val, str):
y = _parse_yyyymmdd(val.strip())
if y:
return y
if isinstance(val, int) and val >= 0:
s = str(val)
if len(s) == 8:
y = _parse_yyyymmdd(s)
if y:
return y
if isinstance(val, float) and val.is_integer() and val >= 0:
s = str(int(val))
if len(s) == 8:
y = _parse_yyyymmdd(s)
if y:
return y
ts = pd.to_datetime(val, errors="coerce")
if pd.isna(ts):
return None
return ts.strftime("%Y-%m-%d")
def first_yyyy_mm_dd_in_iloc(df: pd.DataFrame, col_idx: int) -> str | None:
"""自上而下取第 col_idx 列首个可解析日期(宽表无表头时常用)。"""
if df is None or df.shape[1] <= col_idx or col_idx < 0:
return None
for val in df.iloc[:, col_idx]:
n = to_yyyy_mm_dd(val)
if n:
return n
return None
def first_yyyy_mm_dd_in_dataframe(
df: pd.DataFrame,
column_names: Sequence[str],
*,
third_column_fallback: bool = True,
) -> str | None:
"""按列名顺序找第一列,自上而下取首个可解析为日期的值;若无匹配列且允许则用第 3 列(下标 2)。"""
ser = None
if df is not None and df.shape[1] > 0:
for name in column_names:
if name in df.columns:
ser = df[name]
break
if ser is None and third_column_fallback and df.shape[1] > 2:
ser = df.iloc[:, 2]
if ser is None:
return None
for val in ser:
n = to_yyyy_mm_dd(val)
if n:
return n
return None
def normalize_year_month_to_day01(src_month):
"""
生产月份类字符串 → YYYY-MM-01(供后续 strptime %Y-%m-%d)。
支持 yyyy-mm、yyyymm;其它类型/格式原样返回。
"""
if not isinstance(src_month, str):
return src_month
src_month = src_month.strip()
if not src_month:
return src_month
if re.fullmatch(r"\d{4}-\d{1,2}", src_month):
year, month = src_month.split("-")
return f"{year}-{month.zfill(2)}-01"
if re.fullmatch(r"\d{6}", src_month):
year = src_month[:4]
month = src_month[4:].zfill(2)
return f"{year}-{month}-01"
return src_month
def approx_gap_months_calendar(expiry_date, inspect_date) -> float:
"""到期日相对检查日的剩余月数近似值(与原业务公式一致:年*12+月+日/30)。"""
diff_years = expiry_date.year - inspect_date.year
diff_months = expiry_date.month - inspect_date.month
diff_days = expiry_date.day - inspect_date.day
return diff_years * 12 + diff_months + diff_days / 30.0
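The month normalization above can be exercised standalone. This is a minimal inline copy of `normalize_year_month_to_day01` for illustration only; the real implementation lives in the module above:

```python
# Inline copy of normalize_year_month_to_day01 so the demo runs standalone.
import re
from datetime import datetime

def normalize_year_month_to_day01(src_month):
    if not isinstance(src_month, str):
        return src_month
    src_month = src_month.strip()
    if not src_month:
        return src_month
    if re.fullmatch(r"\d{4}-\d{1,2}", src_month):       # yyyy-mm, month may be 1 digit
        year, month = src_month.split("-")
        return f"{year}-{month.zfill(2)}-01"
    if re.fullmatch(r"\d{6}", src_month):               # yyyymm
        return f"{src_month[:4]}-{src_month[4:].zfill(2)}-01"
    return src_month                                    # anything else passes through

print(normalize_year_month_to_day01("2025-1"))   # 2025-01-01
print(normalize_year_month_to_day01("202512"))   # 2025-12-01
print(normalize_year_month_to_day01("bad"))      # unchanged
# the result feeds straight into the strptime("%Y-%m-%d") call in _set_product_fields
datetime.strptime(normalize_year_month_to_day01("202512"), "%Y-%m-%d")
```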
"""从 URL 下载 Excel 到内存并用 pandas 解析(不写本地临时文件)。"""
from __future__ import annotations
import io
import urllib.request
import pandas as pd
def read_excel_from_url(
url: str,
*,
timeout: float = 300,
user_agent: str = "clean-data-api/1.0",
skiprows: int = 0,
header=None,
dtype=str,
) -> pd.DataFrame:
req = urllib.request.Request(url.strip(), headers={"User-Agent": user_agent})
with urllib.request.urlopen(req, timeout=timeout) as resp:
raw = resp.read()
return pd.read_excel(io.BytesIO(raw), skiprows=skiprows, header=header, dtype=dtype)
def read_excel_from_url_skip1_with_header_row(
url: str,
*,
timeout: float = 300,
user_agent: str = "clean-data-api/1.0",
dtype=str,
) -> tuple[pd.DataFrame, pd.Series]:
"""跳过第 1 行后的数据 + 被跳过的第 1 行(表头,0 基列与数据列对齐)。"""
req = urllib.request.Request(url.strip(), headers={"User-Agent": user_agent})
with urllib.request.urlopen(req, timeout=timeout) as resp:
raw = resp.read()
buf = io.BytesIO(raw)
header_df = pd.read_excel(buf, header=None, dtype=dtype, nrows=1)
buf.seek(0)
data_df = pd.read_excel(buf, skiprows=1, header=None, dtype=dtype)
return data_df, header_df.iloc[0]
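The two-pass read above can be demonstrated without a network round trip by fabricating a workbook in memory, assuming openpyxl and pandas (both in the requirements) are installed:

```python
# In-memory demo of the two-pass read (header row + data with skiprows=1);
# openpyxl fabricates the workbook instead of downloading it from a URL.
import io
import pandas as pd
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["勤策门店", "稽核日期"])   # row 1: the title/header row that gets skipped
ws.append(["门店A", "20250101"])      # row 2: first data row
buf = io.BytesIO()
wb.save(buf)
buf.seek(0)

header_df = pd.read_excel(buf, header=None, dtype=str, nrows=1)   # pass 1: header only
buf.seek(0)                                                       # rewind before pass 2
data_df = pd.read_excel(buf, skiprows=1, header=None, dtype=str)  # pass 2: data only

print(header_df.iloc[0].tolist())  # ['勤策门店', '稽核日期']
print(data_df.shape)               # (1, 2)
```

The `buf.seek(0)` between the two reads is what lets the same in-memory bytes be parsed twice.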
"""
异常定义模块
"""
class DataCleaningException(Exception):
"""数据清洗异常"""
pass
class DatabaseException(Exception):
"""数据库异常"""
pass
class ExcelParsingException(Exception):
"""Excel 解析异常"""
pass
class ValidationException(Exception):
"""验证异常"""
pass
"""
统一响应格式封装模块
所有接口统一返回: { code: 业务状态码, msg: 消息, data: 数据 }
"""
from enum import IntEnum
from typing import Any
from fastapi.responses import JSONResponse
from pydantic import BaseModel
class BizCode(IntEnum):
"""业务逻辑状态码"""
SUCCESS = 200 # 通用成功
TASK_QUEUED = 201 # 任务已入队(异步场景)
TASK_PROCESSING = 202 # 任务处理中
BAD_REQUEST = 400 # 请求参数错误
NOT_FOUND = 404 # 资源不存在
TASK_FAILED = 422 # 任务执行失败(业务层)
SERVER_ERROR = 500 # 服务器内部错误
DB_ERROR = 501 # 数据库错误
EXCEL_ERROR = 502 # Excel 解析错误
class ApiResponse(BaseModel):
"""统一 API 响应体"""
code: int
msg: str
data: Any = None
def ok_resp(data: Any = None, msg: str = "success") -> JSONResponse:
"""返回成功的 JSONResponse(HTTP 200)"""
return JSONResponse(
status_code=200,
content=ApiResponse(code=BizCode.SUCCESS, msg=msg, data=data).model_dump()
)
def fail_resp(
biz_code: BizCode,
msg: str,
http_status: int = 400,
data: Any = None
) -> JSONResponse:
"""返回失败的 JSONResponse"""
return JSONResponse(
status_code=http_status,
content=ApiResponse(code=biz_code, msg=msg, data=data).model_dump()
)
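The `{ code, msg, data }` envelope can be shown without FastAPI. A stdlib-only sketch of what `ok_resp` / `fail_resp` serialize (the real helpers wrap this dict in a `JSONResponse`):

```python
# Stdlib-only sketch of the unified response envelope; FastAPI is not imported,
# so `envelope` is a hypothetical stand-in for the real helpers above.
from enum import IntEnum

class BizCode(IntEnum):
    SUCCESS = 200
    NOT_FOUND = 404

def envelope(code: BizCode, msg: str, data=None) -> dict:
    # same three-field shape every endpoint returns
    return {"code": int(code), "msg": msg, "data": data}

ok = envelope(BizCode.SUCCESS, "success", {"task_id": "t-1"})
err = envelope(BizCode.NOT_FOUND, "task not found")
print(ok)   # {'code': 200, 'msg': 'success', 'data': {'task_id': 't-1'}}
print(err)  # {'code': 404, 'msg': 'task not found', 'data': None}
```

Note the business `code` is carried in the body and is independent of the HTTP status code the helpers also set.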
"""
数据验证模块
"""
import re
import logging
logger = logging.getLogger(__name__)
def validate_excel_url(url: str) -> bool:
"""
验证 Excel URL 的有效性
Args:
url: URL 字符串
Returns:
bool: 是否为有效的 Excel URL
"""
if not url or not isinstance(url, str):
return False
# 检查 URL 格式
url_pattern = r'^https?://.*\.(xlsx|xls|csv)$'
if not re.match(url_pattern, url, re.IGNORECASE):
logger.warning(f"URL 格式无效: {url}")
return False
return True
def sanitize_filename(filename: str) -> str:
"""
清理文件名,移除不安全的字符
Args:
filename: 原始文件名
Returns:
str: 清理后的文件名
"""
# 移除不安全字符
sanitized = re.sub(r'[<>:"/\\|?*]', '', filename)
return sanitized[:255] # 限制长度
def validate_table_name(table_name: str) -> bool:
"""
验证数据库表名的有效性
Args:
table_name: 表名
Returns:
bool: 是否为有效的表名
"""
if not table_name or not isinstance(table_name, str):
return False
# MySQL 表名规则:以字母、数字或下划线开头,不包含特殊字符
table_name_pattern = r'^[a-zA-Z_][a-zA-Z0-9_]{0,63}$'
if not re.match(table_name_pattern, table_name):
logger.warning(f"表名格式无效: {table_name}")
return False
return True
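A quick standalone check of the two validation regexes above. Note that the URL pattern requires the path to end with the file extension, so a signed URL with a query string is rejected:

```python
# Exercise the validation regexes, inlined so the snippet runs standalone.
import re

url_pattern = r'^https?://.*\.(xlsx|xls|csv)$'
table_name_pattern = r'^[a-zA-Z_][a-zA-Z0-9_]{0,63}$'

assert re.match(url_pattern, "https://example.com/data.xlsx", re.IGNORECASE)
# query string defeats the suffix check — worth knowing for pre-signed download links
assert not re.match(url_pattern, "https://example.com/data.xlsx?sig=abc", re.IGNORECASE)

assert re.match(table_name_pattern, "risk_audit_visit")
assert not re.match(table_name_pattern, "1table")       # must not start with a digit
assert not re.match(table_name_pattern, "drop table;")  # spaces/semicolons rejected
```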