Commit fa6c4c14, authored by lidongxu — "新建项目" (New project)
# ========== Environment & secrets ==========
.env
.env.local
.env.*.local
*.env
# ========== Python ==========
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Virtual environments
venv/
.venv/
env/
ENV/
# ========== Tests & coverage ==========
.pytest_cache/
.coverage
htmlcov/
.tox/
.nox/
coverage.xml
*.cover
.hypothesis/
# ========== IDE / editors ==========
.idea/
.vscode/
*.swp
*.swo
*~
.project
.pydevproject
.settings/
# ========== OS files ==========
.DS_Store
.DS_Store?
Thumbs.db
ehthumbs.db
Desktop.ini
# ========== Logs & temp files ==========
*.log
*.tmp
*.temp
.cache/
# ========== Misc ==========
*.sql.backup
*.bak
# Data Cleaning System - Project Documentation

## Project Overview

This project is a data cleaning system built with the FastAPI framework. It extracts data from Excel files, cleans it, and saves the final result to a MySQL database.

### Core Features

1. **Excel parsing**: download and parse Excel files from a URL
2. **Data cleaning**: validate, clean, and deduplicate the parsed data
3. **Progress feedback**: report cleaning progress to the frontend in near real time via HTTP polling
4. **Persistence**: save the cleaned data to a MySQL database

---

## Project Structure
```
clean_data/
├── index.py                # application entry point
├── requirements.txt        # dependency list
├── .env.example            # example environment configuration
├── README.md               # this document
├── core/                   # core business modules
│   ├── __init__.py
│   ├── excel_handler.py    # Excel file handling
│   ├── data_cleaner.py     # data cleaning logic
│   ├── db_handler.py       # database access
│   └── progress_manager.py # progress tracking
└── utils/                  # utility modules
    ├── __init__.py
    ├── exceptions.py       # custom exceptions
    └── validators.py       # data validation
```
---
## Quick Start

### 1. Environment Setup

```bash
# Clone the project (if needed), then enter the directory
cd clean_data

# Create a virtual environment (recommended)
python -m venv venv

# Activate it
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### 2. Configure Environment Variables

```bash
# Copy the example configuration
cp .env.example .env

# Edit .env and fill in your actual settings. In particular:
# - DB_HOST, DB_PORT, DB_USER, DB_PASSWORD must point at your database
# - DB_NAME is the database to use
```

### 3. Start the Service

```bash
# Option 1: run directly with Python
python index.py

# Option 2: run with Uvicorn (recommended)
uvicorn index:app --host 0.0.0.0 --port 8000 --reload

# The service listens on http://0.0.0.0:8000
# API docs: http://localhost:8000/docs (Swagger UI)
```
---
## API Reference

### 1. Start a Cleaning Task

**Request**
```
POST /api/v1/clean
```
**Request body**
```json
{
    "excel_url": "https://example.com/data.xlsx",
    "department": "sales",
    "description": "Q1销售数据清洗"
}
```
**Response**
```json
{
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "queued",
    "message": "任务已创建,正在处理中...",
    "data_preview": null
}
```

### 2. Get Cleaning Progress

**Request**
```
GET /api/v1/progress/{task_id}
```
**Response**
```json
{
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "processing",
    "progress": 65,
    "message": "已清洗 650/1000 行数据",
    "timestamp": "2026-03-06T10:30:45.123456"
}
```
**Status values**
- `queued`: task created, waiting in the queue
- `processing`: data is being processed
- `completed`: cleaning finished
- `failed`: an error occurred during cleaning

### 3. Get the Cleaning Result

**Request**
```
GET /api/v1/result/{task_id}
```
**Response**
```json
{
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "ready_to_save",
    "message": "数据清洗完成,可进行保存",
    "data_preview": [
        {"产品": "产品A", "金额": 1000},
        {"产品": "产品B", "金额": 2000}
    ],
    "total_rows": 1000,
    "department": "sales"
}
```

### 4. Save the Cleaned Data

**Request**
```
POST /api/v1/save
```
**Request body**
```json
{
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "table_name": "sales_data"
}
```
**Response**
```json
{
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "saved",
    "message": "数据已成功保存到数据库",
    "affected_rows": 1000
}
```

### 5. Health Check

**Request**
```
GET /api/v1/health
```
**Response**
```json
{
    "status": "healthy",
    "timestamp": "2026-03-06T10:30:45.123456",
    "service": "数据清洗系统"
}
```
---
## Progress Feedback Mechanism

### HTTP Polling (no WebSocket required)

The system reports progress via **HTTP polling**, which has several advantages:

1. **No persistent connections**: the client initiates each request, reducing server load
2. **Broad compatibility**: works with any HTTP client
3. **Simple deployment**: no WebSocket infrastructure required
4. **Easy to scale**: straightforward to deploy in any cloud environment

### Suggested Frontend Implementation

```javascript
// Example: React/Vue polling logic
const pollProgress = async (taskId) => {
  const interval = setInterval(async () => {
    try {
      const response = await fetch(`/api/v1/progress/${taskId}`);
      const data = await response.json();
      // Update the progress bar
      updateProgressBar(data.progress);
      updateMessage(data.message);
      // Stop polling once the task finishes
      if (data.status === 'completed' || data.status === 'failed') {
        clearInterval(interval);
      }
    } catch (error) {
      console.error('Failed to fetch progress:', error);
    }
  }, 1000); // poll once per second
};
```
---
## Data Cleaning Logic

### Cleaning Steps

1. **Download**: fetch the Excel file from the given URL
2. **Parse**: read the Excel content with openpyxl
3. **Validate**: check data types and required fields
4. **Clean**
   - strip leading/trailing whitespace
   - handle empty values
   - remove duplicates
5. **Cache**: keep the cleaned data in memory
6. **Save**: persist to the database after frontend confirmation
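Steps 3 and 4 above can be sketched as a small, self-contained function. The names here are illustrative only; the real logic lives in `core/data_cleaner.py`:

```python
def clean_rows(rows):
    """Sketch of steps 3-4: trim strings, normalize empties, drop blank rows, dedupe."""
    cleaned, seen = [], set()
    for row in rows:
        # Strip whitespace; empty strings become None
        trimmed = {k: (v.strip() or None) if isinstance(v, str) else v
                   for k, v in row.items()}
        # Skip rows that are entirely empty
        if not any(v is not None for v in trimmed.values()):
            continue
        # Deduplicate on the full row content
        key = tuple(sorted(trimmed.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(trimmed)
    return cleaned

rows = [
    {"产品": " 产品A ", "金额": 1000},  # whitespace to trim
    {"产品": "产品A", "金额": 1000},    # duplicate after trimming
    {"产品": "", "金额": None},         # blank row, dropped
]
result = clean_rows(rows)
```

Deduplication here is on the full row content; the real implementation may key on a subset of columns instead.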
### Custom Cleaning Rules

Edit the `_validate_required_fields` method in `core/data_cleaner.py` to customize per-department cleaning rules:

```python
required_fields_map = {
    'sales': ['产品', '金额', '销售日期'],
    'inventory': ['SKU', '数量', '仓库'],
    'finance': ['交易日期', '金额', '类别']
}
```
---
## Database Configuration

### MySQL 5.6+ Connection Settings

Edit the `.env` file:

```ini
DB_HOST=localhost
DB_PORT=3306
DB_USER=root
DB_PASSWORD=your_password
DB_NAME=clean_data
```

### Create the Target Table (example)

```sql
CREATE TABLE sales_data (
    id INT AUTO_INCREMENT PRIMARY KEY,
    产品 VARCHAR(100),
    金额 DECIMAL(10, 2),
    销售日期 DATE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
---
## Exception Handling

The system defines several custom exceptions to aid error tracing:

- **DataCleaningException**: errors during data cleaning
- **DatabaseException**: database operation errors
- **ExcelParsingException**: Excel parsing errors
- **ValidationException**: data validation errors

All exceptions are logged, making problems easier to diagnose.
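A minimal version of `utils/exceptions.py` consistent with that list might look like the sketch below; the exact class bodies, any inheritance hierarchy, and the `parse_or_fail` helper are assumptions for illustration:

```python
class DataCleaningException(Exception):
    """Error raised during the cleaning pipeline."""

class DatabaseException(Exception):
    """Error raised by database operations."""

class ExcelParsingException(Exception):
    """Error raised while downloading or parsing an Excel file."""

class ValidationException(Exception):
    """Error raised when a row fails validation."""

# Typical use: wrap low-level failures so the API layer can log them
# and map them to an HTTP error response.
def parse_or_fail(payload: bytes) -> bytes:
    if not payload:
        raise ExcelParsingException("downloaded file is empty")
    return payload
```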
---
## Logging

The system uses Python's standard logging module for all operations. The log level and file are configurable in `.env`:

```
LOG_LEVEL=INFO
LOG_FILE=./logs/app.log
```

Logged events include:

- task creation and completion
- data processing progress
- exceptions and errors
- database operations
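A sketch of how these two settings could be applied at startup; the project may wire this differently, and `setup_logging` is a hypothetical helper:

```python
import logging
import os

def setup_logging():
    """Configure root logging from LOG_LEVEL / LOG_FILE, as described above."""
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    log_file = os.getenv("LOG_FILE")  # unset -> console only
    handlers = [logging.StreamHandler()]
    if log_file:
        os.makedirs(os.path.dirname(log_file) or ".", exist_ok=True)
        handlers.append(logging.FileHandler(log_file, encoding="utf-8"))
    logging.basicConfig(
        level=getattr(logging, level_name, logging.INFO),
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        handlers=handlers,
        force=True,  # replace any earlier configuration
    )

setup_logging()
logging.getLogger("clean_data").info("logging configured")
```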
---
## Performance Tips

1. **Batch inserts**: database writes use batched inserts (1000 rows per batch by default)
2. **Async processing**: FastAPI background tasks keep responses non-blocking
3. **Progress caching**: an in-memory dict caches progress data and cleaning results
4. **Connection pooling**: consider a database connection pool (possible extension)
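Tip 1 can be illustrated with a runnable sketch. sqlite3 stands in for MySQL here so the example runs anywhere; mysql-connector-python uses `%s` placeholders instead of `?`, but the `executemany` batching pattern is the same:

```python
import sqlite3

def insert_in_batches(conn, table, rows, batch_size=1000):
    """Insert dict rows with executemany in fixed-size batches."""
    if not rows:
        return 0
    cols = list(rows[0].keys())
    sql = (f"INSERT INTO {table} ({', '.join(cols)}) "
           f"VALUES ({', '.join('?' * len(cols))})")
    cur = conn.cursor()
    total = 0
    for start in range(0, len(rows), batch_size):
        batch = [tuple(r.get(c) for c in cols) for r in rows[start:start + batch_size]]
        cur.executemany(sql, batch)
        total += cur.rowcount  # accumulate per batch; rowcount only covers the last call
    conn.commit()
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER)")
n = insert_in_batches(conn, "t", [{"a": "x", "b": i} for i in range(2500)], batch_size=1000)
```

Accumulating `rowcount` per batch matters: after a loop of `executemany` calls, the cursor's `rowcount` reflects only the most recent call.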
---
## FAQ

### Q: Why not WebSocket?

A: HTTP polling has these advantages:
- the server keeps no connection state
- easier horizontal scaling
- no WebSocket libraries or infrastructure needed
- standard HTTP, so compatibility is broad

### Q: Where is the cleaned data stored?

A: By default:
- **Short term**: in server memory (keyed by task_id)
- **Long term**: in MySQL, after the user confirms the save

### Q: How are large files handled?

A: Set a maximum file size in `.env`:

```
MAX_EXCEL_SIZE=52428800  # 50MB
```
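Enforcing that limit before parsing could look like the sketch below; `MAX_EXCEL_SIZE` is read the same way the project's config does, and `check_size` is a hypothetical helper:

```python
import os

MAX_EXCEL_SIZE = int(os.getenv("MAX_EXCEL_SIZE", "52428800"))  # 50 MB default

def check_size(content: bytes, limit: int = MAX_EXCEL_SIZE) -> bytes:
    """Reject a downloaded file that exceeds the configured size limit."""
    if len(content) > limit:
        raise ValueError(f"file is {len(content)} bytes, limit is {limit}")
    return content

small = check_size(b"ok", limit=10)  # passes
```

Checking the response's Content-Length header before downloading would avoid buffering oversized files at all, at the cost of trusting the server.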
---
## Optional Extensions

1. **Data backup**: periodically back up saved data
2. **Audit log**: record every data modification
3. **Access control**: add user authentication and authorization
4. **Cache upgrade**: replace the in-memory cache with Redis
5. **Task queue**: use Celery for large batch jobs

---

## Deployment Notes

### Production

1. Run the app with Gunicorn + Uvicorn
2. Put it behind a reverse proxy (nginx)
3. Enable HTTPS
4. Persist logs
5. Set up monitoring and alerting

### Docker

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "index:app", "--host", "0.0.0.0", "--port", "8000"]
```
---
## Tech Stack

- **Web framework**: FastAPI 0.104.1
- **ASGI server**: Uvicorn 0.24.0
- **Excel processing**: openpyxl 3.1.2
- **Database driver**: mysql-connector-python 8.2.0
- **Validation**: Pydantic 2.5.0
- **HTTP client**: requests 2.31.0

---

## License

MIT

---

## Support

For questions or suggestions, please contact the development team.
"""
配置管理模块
负责读取和管理应用配置
"""
import os
from typing import Optional
from dotenv import load_dotenv
# 加载 .env 文件
load_dotenv()
class Config:
"""应用配置类"""
# 服务器配置
HOST: str = os.getenv("HOST", "0.0.0.0")
PORT: int = int(os.getenv("PORT", "8000"))
DEBUG: bool = os.getenv("DEBUG", "False").lower() == "true"
# 数据库配置
DB_HOST: str = os.getenv("DB_HOST", "localhost")
DB_PORT: int = int(os.getenv("DB_PORT", "3306"))
DB_USER: str = os.getenv("DB_USER", "root")
DB_PASSWORD: str = os.getenv("DB_PASSWORD", "")
DB_NAME: str = os.getenv("DB_NAME", "clean_data")
# 日志配置
LOG_LEVEL: str = os.getenv("LOG_LEVEL", "INFO")
LOG_FILE: Optional[str] = os.getenv("LOG_FILE")
# Excel 下载配置
EXCEL_DOWNLOAD_TIMEOUT: int = int(os.getenv("EXCEL_DOWNLOAD_TIMEOUT", "30"))
MAX_EXCEL_SIZE: int = int(os.getenv("MAX_EXCEL_SIZE", "52428800")) # 50MB
# 任务超时配置
TASK_TIMEOUT_SECONDS: int = int(os.getenv("TASK_TIMEOUT_SECONDS", "3600")) # 1小时
@classmethod
def get_db_config(cls) -> dict:
"""获取数据库配置字典"""
return {
'host': cls.DB_HOST,
'port': cls.DB_PORT,
'user': cls.DB_USER,
'password': cls.DB_PASSWORD,
'database': cls.DB_NAME,
}
# 创建全局配置实例
config = Config()
"""Core 业务模块"""
"""
数据清洗模块
负责数据的清洗和验证逻辑
"""
import logging
import asyncio
import pandas as pd
from typing import List, Dict, Any, Callable, Optional
logger = logging.getLogger(__name__)
# 各 department 对应的清洗策略注册表
# key: department 名称, value: (transform函数, 产品组配置, 稽查来源名称)
_DEPARTMENT_CLEANERS = {}
def _load_department_cleaners():
"""非专用清洗逻辑"""
global _DEPARTMENT_CLEANERS
if _DEPARTMENT_CLEANERS: # 如果部门清洗模块已加载,则直接返回
return
try:
# 加载部门清洗使用的工具
from core_py.数据转换_团队 import (
transform as _team_transform,
PRODUCT_GROUPS_JC,
) # PRODUCT_GROUPS_JC 风控稽查数据清洗配置数据
_DEPARTMENT_CLEANERS["风控稽查数据清洗"] = (_team_transform, PRODUCT_GROUPS_JC, "稽查团队")
logger.info("已加载部门清洗模块: 风控稽查数据清洗")
except ImportError as e:
logger.warning(f"加载团队清洗模块失败: {e}")
class DataCleaner:
"""数据清洗类"""
def __init__(self):
self.rules = {}
async def clean(
self,
raw_data: List[Dict[str, Any]],
department: str,
progress_callback: Optional[Callable[[float, str, Optional[int]], None]] = None,
audit_date: Optional[str] = None,
) -> List[Dict[str, Any]]:
"""
清洗数据
Args:
raw_data: 原始数据列表(每行为 dict,key 为列名)
department: 业务部门名称,如 "团队"
progress_callback: 进度回调函数,接收 (progress: 0-1, message: str)
audit_date: 稽查日期字符串,格式 'yyyy-mm-dd';为 None 时由各清洗模块自动取上月1号
Returns:
List[Dict]: 清洗后的数据
"""
try:
logger.info(f"开始清洗数据,部门: {department},数据行数: {len(raw_data)}")
# ── 专项清洗路由 ──────────────────────────────────────────────
_load_department_cleaners()
if department in _DEPARTMENT_CLEANERS:
return await self._clean_by_department(
raw_data, department, progress_callback, audit_date=audit_date
)
# ─────────────────────────────────────────────────────────────
total_rows = len(raw_data)
cleaned_data = []
for idx, row in enumerate(raw_data):
try:
cleaned_row = await self._validate_and_convert(row, department)
if cleaned_row and not self._is_duplicate(
cleaned_row, cleaned_data
):
cleaned_data.append(cleaned_row)
if progress_callback and idx % max(1, total_rows // 10) == 0:
progress = idx / total_rows if total_rows > 0 else 0
progress_callback(progress, f"已清洗 {idx}/{total_rows} 行数据", len(cleaned_data))
except Exception as e:
logger.warning(f"第 {idx + 1} 行数据清洗失败: {str(e)}")
continue
if progress_callback:
progress_callback(1.0, f"清洗完成,共 {len(cleaned_data)} 行有效数据", len(cleaned_data))
logger.info(
f"数据清洗完成,原始行数: {total_rows},清洗后行数: {len(cleaned_data)}"
)
return cleaned_data
except Exception as e:
logger.error(f"clean 方法执行失败: {str(e)}")
raise
async def _clean_by_department(
self,
raw_data: List[Dict[str, Any]],
department: str,
progress_callback: Optional[Callable[[float, str, Optional[int]], None]] = None,
audit_date: Optional[str] = None,
) -> List[Dict[str, Any]]:
"""
调用对应部门的专项 transform 函数进行清洗。
raw_data 来自 excel_handler(List[Dict],key 为列名),
transform 函数通过 iloc 按位置访问列,因此转换为 DataFrame 时
只要列顺序与原始 Excel 一致,iloc 索引就能正确对应。
"""
transform_fn, pg, yname = _DEPARTMENT_CLEANERS[department]
if progress_callback:
progress_callback(0.1, "正在转换数据格式", None)
# List[Dict] → DataFrame(保留原始列顺序,iloc 索引与 Excel 列位置对应)
df = pd.DataFrame(raw_data)
if progress_callback:
progress_callback(0.3, f"正在执行 {department} 数据清洗", None)
# transform 是同步函数,用 asyncio.to_thread 避免阻塞事件循环
records = await asyncio.to_thread(transform_fn, df, yname, pg, audit_date)
if progress_callback:
progress_callback(1.0, f"清洗完成,共 {len(records)} 行有效数据", len(records))
logger.info(f"[{department}] 专项清洗完成,共 {len(records)} 条记录")
return records
async def _validate_and_convert(
self, row: Dict[str, Any], department: str
) -> Optional[Dict[str, Any]]:
"""
验证和转换单行数据
Args:
row: 数据行
department: 业务部门名称
Returns:
转换后的数据行,若无效则返回 None
"""
try:
cleaned_row = {}
for key, value in row.items():
if value is None or (isinstance(value, str) and not value.strip()):
# 空值处理
cleaned_row[key] = None
continue
# 字符串数据清洗
if isinstance(value, str):
cleaned_row[key] = value.strip()
else:
cleaned_row[key] = value
# 验证必填字段(根据部门调整规则)
if not self._validate_required_fields(cleaned_row, department):
return None
return cleaned_row
except Exception as e:
logger.warning(f"_validate_and_convert 失败: {str(e)}")
return None
def _validate_required_fields(self, row: Dict[str, Any], department: str) -> bool:
"""
验证必填字段
Args:
row: 数据行
department: 业务部门
Returns:
bool: 是否通过验证
"""
# 示例:可根据部门定义不同的必填字段规则
required_fields_map = {
"sales": ["产品", "金额"],
"inventory": ["SKU", "数量"],
"finance": ["交易日期", "金额"],
}
required_fields = required_fields_map.get(department, [])
# 检查必填字段是否存在且非空
for field in required_fields:
if field not in row or row[field] is None:
return False
return True
def _is_duplicate(
self, row: Dict[str, Any], existing_data: List[Dict[str, Any]]
) -> bool:
"""
检查行是否为重复数据
Args:
row: 当前行
existing_data: 已有数据列表
Returns:
bool: 是否为重复
"""
# 简单的重复检查(可扩展为更复杂的逻辑)
for existing_row in existing_data:
if row == existing_row:
return True
return False
"""
数据库处理模块
负责与 MySQL 数据库的交互
"""
import logging
import mysql.connector
from typing import List, Dict, Any
import os
from contextlib import contextmanager
logger = logging.getLogger(__name__)
class DatabaseHandler:
"""数据库处理类"""
def __init__(self):
"""初始化数据库配置"""
self.db_config = {
'host': os.getenv('DB_HOST', 'localhost'),
'user': os.getenv('DB_USER', 'root'),
'password': os.getenv('DB_PASSWORD', ''),
'database': os.getenv('DB_NAME', 'clean_data'),
'port': int(os.getenv('DB_PORT', 3306)),
'autocommit': False,
'connection_timeout': 10
}
@contextmanager
def _get_connection(self):
"""
获取数据库连接的上下文管理器
Yields:
mysql.connector.MySQLConnection: 数据库连接
Raises:
Exception: 连接失败时抛出异常
"""
connection = None
try:
connection = mysql.connector.connect(**self.db_config)
logger.info("数据库连接成功")
yield connection
except mysql.connector.Error as e:
logger.error(f"数据库连接失败: {str(e)}")
raise
finally:
if connection and connection.is_connected():
connection.close()
logger.info("数据库连接已关闭")
async def insert_data(
self,
table_name: str,
data: List[Dict[str, Any]]
) -> int:
"""
将数据插入到指定的表
Args:
table_name: 目标表名
data: 数据列表
Returns:
int: 受影响的行数
Raises:
Exception: 插入失败时抛出异常
"""
if not data:
logger.warning("插入的数据为空")
return 0
try:
with self._get_connection() as connection:
cursor = connection.cursor()
# 获取字段名
columns = list(data[0].keys())
column_names = ', '.join([f'`{col}`' for col in columns])
placeholders = ', '.join(['%s'] * len(columns))
insert_sql = f"""
INSERT INTO `{table_name}` ({column_names})
VALUES ({placeholders})
"""
logger.info(f"准备插入 {len(data)} 行数据到表 {table_name}")
# 批量插入数据
for batch_start in range(0, len(data), 1000):
batch_end = min(batch_start + 1000, len(data))
batch_data = data[batch_start:batch_end]
# 准备批次数据
values_list = []
for row in batch_data:
values = tuple(row.get(col) for col in columns)
values_list.append(values)
# 执行批量插入
cursor.executemany(insert_sql, values_list)
logger.info(f"已插入 {batch_end} / {len(data)} 行数据")
connection.commit()
affected_rows = cursor.rowcount
cursor.close()
logger.info(f"成功插入 {affected_rows} 行数据到 {table_name}")
return affected_rows
except mysql.connector.Error as e:
logger.error(f"MySQL 错误: {str(e)}")
raise
except Exception as e:
logger.error(f"insert_data 失败: {str(e)}")
raise
async def test_connection(self) -> bool:
"""
测试数据库连接
Returns:
bool: 连接是否成功
"""
try:
with self._get_connection() as connection:
cursor = connection.cursor()
cursor.execute("SELECT 1")
cursor.fetchone()
cursor.close()
return True
except Exception as e:
logger.error(f"数据库连接测试失败: {str(e)}")
return False
async def create_table_if_not_exists(
self,
table_name: str,
schema: Dict[str, str]
) -> bool:
"""
如果表不存在则创建表
Args:
table_name: 表名
schema: 表架构定义 {列名: 列定义}
Returns:
bool: 是否创建成功或表已存在
"""
try:
with self._get_connection() as connection:
cursor = connection.cursor()
# 检查表是否存在
cursor.execute(f"""
SELECT TABLE_NAME FROM information_schema.TABLES
WHERE TABLE_SCHEMA = '{self.db_config['database']}'
AND TABLE_NAME = '{table_name}'
""")
if cursor.fetchone():
logger.info(f"表 {table_name} 已存在")
cursor.close()
return True
# 创建表
columns_sql = ', '.join([f'`{col}` {definition}' for col, definition in schema.items()])
create_sql = f"""
CREATE TABLE `{table_name}` (
id INT AUTO_INCREMENT PRIMARY KEY,
{columns_sql},
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""
cursor.execute(create_sql)
connection.commit()
cursor.close()
logger.info(f"成功创建表 {table_name}")
return True
except Exception as e:
logger.error(f"create_table_if_not_exists 失败: {str(e)}")
raise
"""
Excel 文件处理模块
负责从 URL 下载和解析 Excel 文件
"""
import aiohttp
import logging
from openpyxl import load_workbook
from io import BytesIO
from typing import List, Dict, Any
import os
import tempfile
logger = logging.getLogger(__name__)
class ExcelHandler:
"""Excel 文件处理类"""
def __init__(self):
self.timeout = aiohttp.ClientTimeout(total=30)
async def fetch_bytes(self, url: str) -> bytes:
"""
从 URL 下载文件,返回原始字节内容(供调用方自行用 pandas 解析)
Args:
url: 文件的网络链接
Returns:
bytes: 文件的原始二进制内容
"""
try:
logger.info(f"开始从 {url} 下载文件")
async with aiohttp.ClientSession(timeout=self.timeout) as session:
async with session.get(url) as response:
if response.status != 200:
raise Exception(f"下载失败,HTTP 状态码: {response.status}")
content = await response.read()
logger.info(f"下载完成,文件大小: {len(content)} 字节")
return content
except Exception as e:
logger.error(f"fetch_bytes 失败: {str(e)}")
raise
async def fetch_and_parse(self, excel_url: str) -> List[Dict[str, Any]]:
"""
从 URL 下载并解析 Excel 文件
Args:
excel_url: Excel 文件的网络链接
Returns:
List[Dict]: 解析后的数据,每行为一个字典
Raises:
Exception: 下载或解析失败时抛出异常
"""
try:
# 1. 下载文件
logger.info(f"开始从 {excel_url} 下载 Excel 文件")
async with aiohttp.ClientSession(timeout=self.timeout) as session:
async with session.get(excel_url) as response:
if response.status != 200:
raise Exception(f"下载失败,HTTP 状态码: {response.status}")
excel_content = await response.read()
logger.info(f"下载完成,文件大小: {len(excel_content)} 字节")
# 2. 解析 Excel
return self._parse_excel_content(excel_content)
except Exception as e:
logger.error(f"fetch_and_parse 失败: {str(e)}")
raise
def _parse_excel_content(self, excel_content: bytes) -> List[Dict[str, Any]]:
"""
解析 Excel 内容
Args:
excel_content: Excel 文件的二进制内容
Returns:
List[Dict]: 解析后的数据
"""
try:
# 使用 BytesIO 从内存中读取
excel_file = BytesIO(excel_content)
workbook = load_workbook(excel_file)
# 获取第一个工作表
worksheet = workbook.active
if not worksheet:
raise Exception("Excel 文件不包含有效的工作表")
# 获取标题行
headers = []
for cell in worksheet[1]:
headers.append(cell.value)
if not headers or all(h is None for h in headers):
raise Exception("Excel 文件不包含有效的标题行")
# 解析数据行
data = []
for row in worksheet.iter_rows(min_row=2, values_only=False):
row_data = {}
for idx, cell in enumerate(row):
if idx < len(headers):
row_data[headers[idx]] = cell.value
# 跳过空行
if any(v is not None for v in row_data.values()):
data.append(row_data)
logger.info(f"成功解析 Excel,共 {len(data)} 行数据")
return data
except Exception as e:
logger.error(f"_parse_excel_content 失败: {str(e)}")
raise
"""
进度管理模块
负责任务进度的记录和查询
"""
import logging
from typing import Dict, Any, Optional
from datetime import datetime, timedelta
import threading
logger = logging.getLogger(__name__)
class ProgressManager:
"""进度管理类"""
def __init__(self, timeout_seconds: int = 3600):
"""
初始化进度管理器
Args:
timeout_seconds: 任务进度的过期时间(秒),默认 1 小时
"""
self.progress_data: Dict[str, Dict[str, Any]] = {}
self.timeout_seconds = timeout_seconds
self.lock = threading.Lock()
def update_progress(
self,
task_id: str,
status: str,
progress: int,
message: str,
processed_count: Optional[int] = None
) -> None:
"""
更新任务进度
Args:
task_id: 任务唯一标识
status: 状态 (queued, processing, completed, failed)
progress: 进度百分比 (0-100)
message: 进度信息
processed_count: 已处理的数据条数,None 表示暂未统计
"""
with self.lock:
self.progress_data[task_id] = {
'task_id': task_id,
'status': status,
'progress': max(0, min(100, progress)),
'message': message,
'processed_count': processed_count,
'timestamp': datetime.now().isoformat(),
'created_at': datetime.now()
}
logger.debug(f"[{task_id}] 进度更新: {status} {progress}% - {message}")
def get_progress(self, task_id: str) -> Optional[Dict[str, Any]]:
"""
获取任务进度
Args:
task_id: 任务唯一标识
Returns:
Optional[Dict]: 进度信息,若任务不存在或已过期返回 None
"""
with self.lock:
if task_id not in self.progress_data:
return None
data = self.progress_data[task_id]
# 检查是否过期
if datetime.now() - data['created_at'] > timedelta(seconds=self.timeout_seconds):
logger.warning(f"任务 {task_id} 已过期,删除记录")
del self.progress_data[task_id]
return None
# 返回字典副本,移除 created_at(内部字段)
result = {k: v for k, v in data.items() if k != 'created_at'}
return result
def get_all_progress(self) -> Dict[str, Dict[str, Any]]:
"""
获取所有任务的进度信息
Returns:
Dict: 所有任务的进度信息
"""
with self.lock:
# 清理过期任务
expired_tasks = []
for task_id, data in self.progress_data.items():
if datetime.now() - data['created_at'] > timedelta(seconds=self.timeout_seconds):
expired_tasks.append(task_id)
for task_id in expired_tasks:
del self.progress_data[task_id]
logger.info(f"清理过期任务: {task_id}")
# 返回所有有效任务的进度
return {
task_id: {k: v for k, v in data.items() if k != 'created_at'}
for task_id, data in self.progress_data.items()
}
def clear_progress(self, task_id: str) -> None:
"""
清除任务进度记录
Args:
task_id: 任务唯一标识
"""
with self.lock:
if task_id in self.progress_data:
del self.progress_data[task_id]
logger.info(f"清除任务 {task_id} 的进度记录")
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta

# File paths
# TODO: make the audit month configurable (defaults to the 1st of last month)
current_date = (datetime.now().replace(day=1) - relativedelta(months=1)).strftime("%Y-%m-01")
y_file = f"/王小卤/风控/代码-新/大日期{current_date}_2.xlsx"
p_file = f"/王小卤/风控/代码-新//线下价盘表2601版.xlsx"
# Output path (writing to a new file is safer than overwriting the source)
output_file = f"/王小卤/风控/代码-新//低价大日期_2.xlsx"

# Read table Y (audit results); load as strings first to avoid format issues, convert later
df_y = pd.read_excel(y_file, sheet_name='合并后', dtype=str)
# Read table P (price list)
df_p = pd.read_excel(p_file, dtype=str)

# Clean column names (strip surrounding whitespace)
df_y.columns = df_y.columns.str.strip()
df_p.columns = df_p.columns.str.strip()

# Normalize key fields (strip whitespace, uppercase) so they match reliably
def clean_str(s):
    if pd.isna(s):
        return ""
    return str(s).strip().upper()

# Clean the key columns of table Y
df_y['产品系列_clean'] = df_y.iloc[:, 14].apply(clean_str)   # column O: product series
df_y['产品克重_clean'] = df_y.iloc[:, 16].apply(clean_str)   # column Q: product weight
df_y['渠道类型_clean'] = df_y.iloc[:, 13].apply(clean_str)   # column N: channel type (from audit source)

# Clean the key columns of table P
df_p['产品系统_clean'] = df_p.iloc[:, 0].apply(clean_str)    # column A: product system
df_p['产品克重_p_clean'] = df_p.iloc[:, 2].apply(clean_str)  # column C: product weight
df_p['渠道_p_clean'] = df_p.iloc[:, 3].apply(clean_str)      # column D: channel

# Convert price columns to numeric (non-numeric values become NaN)
df_y['产品价格_num'] = pd.to_numeric(df_y.iloc[:, 17], errors='coerce')  # column R: product price
df_p['低价_num'] = pd.to_numeric(df_p.iloc[:, 4], errors='coerce')       # column E: floor price

# Build table P's unique key (product system + weight + channel)
df_p['match_key'] = df_p['产品系统_clean'] + '|' + df_p['产品克重_p_clean'] + '|' + df_p['渠道_p_clean']
# Build table Y's matching key (product series + weight + channel type)
df_y['match_key'] = df_y['产品系列_clean'] + '|' + df_y['产品克重_clean'] + '|' + df_y['渠道类型_clean']

# Turn table P into a lookup dict: key -> floor price
price_map = df_p.set_index('match_key')['低价_num'].to_dict()

# Initialize target columns in table Y (S: below-floor flag, T: price gap)
df_y['是否低价'] = '正常'  # default
df_y['破价价差'] = None

# Match and judge each row of table Y
for idx, row in df_y.iterrows():
    key = row['match_key']
    y_price = row['产品价格_num']
    p_low_price = price_map.get(key, None)
    if pd.notna(y_price) and pd.notna(p_low_price):
        if y_price < p_low_price:
            df_y.at[idx, '是否低价'] = '低价'
            df_y.at[idx, '破价价差'] = round(p_low_price - y_price, 2)
            df_y.at[idx, '低价整改状态'] = '未整改'
        else:
            df_y.at[idx, '是否低价'] = '正常'
            df_y.at[idx, '破价价差'] = None
    else:
        # No match or missing price: mark as unknown
        df_y.at[idx, '是否低价'] = None
        df_y.at[idx, '破价价差'] = None

# Keep only the original columns (drop the helper columns)
original_columns = df_y.columns.tolist()
output_columns = [col for col in original_columns
                  if not col.endswith('_clean') and col not in ['产品价格_num', 'match_key']]
df_y[output_columns].to_excel(output_file, index=False)
print(f"处理完成!结果已保存至:{output_file}")
import pandas as pd
import copy
import os
from datetime import datetime
from dateutil.relativedelta import relativedelta
# === 本地独立运行配置(仅 __main__ 模式使用)===
source_file = "/王小卤/风控/代码-新//2026.2-团队数据源.xlsx"
def _get_default_audit_date() -> str:
"""返回上月1号作为默认稽查日期,格式 yyyy-mm-01"""
return (datetime.now().replace(day=1) - relativedelta(months=1)).strftime("%Y-%m-01")
# 列映射(目标表列名)
COLUMN_MAPPING = {
"稽查日期": "稽查日期",
"稽查来源": "稽查来源",
"勤策门店编码": "勤策门店编码",
"勤策门店名称": "勤策门店名称",
"经销商名称": "经销商名称",
"城市": "城市",
"渠道类型": "渠道类型(稽查源提供)",
"产品系列": "产品系列",
"产品口味": "产品口味",
"产品克重": "产品克重",
"产品价格": "产品价格",
"产品生产月份": "产品生产月份",
}
# ===== 新增:多产品组配置 =====
# 每组:价格列 + 7个口味列 + 产品信息
# 团队表
PRODUCT_GROUPS_JC = [
# 第1组:虎皮凤爪 210g
{
"price_col": 50,
"flavor_cols": [51, 52, 53, 54, 55, 56, 57],
"series": "虎皮凤爪",
"weight": "210g",
"flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"]
},
# 第2组:虎皮凤爪 105g
{
"price_col": 58,
"flavor_cols": [59, 60, 61, 62, 63, 64, 65],
"series": "虎皮凤爪",
"weight": "105g",
"flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"]
},
# 第3组:虎皮凤爪 68g
{
"price_col": 66,
"flavor_cols": [67, 68, 69, 70, 71],
"series": "虎皮凤爪",
"weight": "68g",
"flavors": ["卤香", "香辣", "椒麻", "麻辣", "黑鸭"]
},
# 第4组:鸡肉豆堡 120g
{
"price_col": 72,
"flavor_cols": [73, 74],
"series": "鸡肉豆堡",
"weight": "120g",
"flavors": ["卤香", "香辣"]
},
# 第5组:牛肉豆堡 120g
{
"price_col": 75,
"flavor_cols": [76, 77],
"series": "牛肉豆堡",
"weight": "120g",
"flavors": ["卤香", "香辣"]
},
# 第6组:去骨凤爪 72g
{
"price_col": 78,
"flavor_cols": [79, 80],
"series": "去骨凤爪",
"weight": "72g",
"flavors": ["柠檬", "香辣"]
},
# 第7组:去骨凤爪 138g
{
"price_col": 81,
"flavor_cols": [82, 83],
"series": "去骨凤爪",
"weight": "138g",
"flavors": ["柠檬", "香辣"]
},
# 第8组:虎皮小鸡腿 80g
{
"price_col": 84,
"flavor_cols": [85, 86],
"series": "虎皮小鸡腿",
"weight": "80g",
"flavors": ["卤香", "香辣"]
},
# 第9组:老卤凤爪 95g(与老卤鸭掌共用 price_col=87)
{
"price_col": 87,
"flavor_cols": [88],
"series": "老卤凤爪",
"weight": "95g",
"flavors": ["卤香"]
},
# 第10组:老卤鸭掌 95g(与老卤凤爪共用 price_col=87)
{
"price_col": 87,
"flavor_cols": [89],
"series": "老卤鸭掌",
"weight": "95g",
"flavors": ["卤香"]
},
# 第11组:虎皮凤爪 25g
{
"price_col": 90,
"flavor_cols": [91, 92],
"series": "虎皮凤爪",
"weight": "25g",
"flavors": ["卤香", "香辣"]
},
# 第12组:虎皮凤爪 散称
{
"price_col": 93,
"flavor_cols": [94, 95, 96],
"series": "虎皮凤爪",
"weight": "散称",
"flavors": ["卤香", "香辣", "黑鸭"]
}
]
# 标准输出列定义(与目标表结构保持一致)
STANDARD_COLUMNS = [
"稽查日期", "稽查来源", "大区", "战区", "经销商编码", "经销商名称",
"勤策门店编码", "勤策门店名称", "客户经理工号", "客户经理",
"勤策渠道大类", "稽核渠道(对N列清洗)", "城市", "渠道类型(稽查源提供)",
"产品系列", "产品口味", "产品克重", "产品价格", "是否低价", "破价价差", "低价整改状态",
"低价整改说明", "产品生产月份", "临期月份数", "临期状态", "新鲜度",
"大日期整改状态", "大日期整改说明"
]
def _build_records(df_source, yname, pg, existing_columns, audit_date: str = None):
"""
核心记录构建逻辑,供 transform() 和 main() 复用。
Args:
df_source: pandas DataFrame,列通过 iloc 按位置访问
yname: 稽查来源名称,如 '稽查团队'
pg: 产品组配置列表
existing_columns: 目标表的列名列表
audit_date: 稽查日期字符串,格式 'yyyy-mm-dd';为 None 时取上月1号
Returns:
list: 构建好的记录列表(每条为 dict)
"""
if audit_date is None:
audit_date = _get_default_audit_date()
records = []
for idx, row in df_source.iterrows():
base_data = {
"勤策门店编码": str(row.iloc[8]).strip() if pd.notna(row.iloc[8]) else "",
"城市": str(row.iloc[4]).strip() if pd.notna(row.iloc[4]) else "",
"勤策门店名称": str(row.iloc[9]).strip() if pd.notna(row.iloc[9]) else "",
"经销商名称": str(row.iloc[7]).strip() if pd.notna(row.iloc[7]) else "",
"渠道类型": str(row.iloc[10]).strip() if pd.notna(row.iloc[10]) else "",
}
base_row = {}
if COLUMN_MAPPING["稽查日期"] in existing_columns:
base_row[COLUMN_MAPPING["稽查日期"]] = audit_date
if COLUMN_MAPPING["稽查来源"] in existing_columns:
base_row[COLUMN_MAPPING["稽查来源"]] = yname
if COLUMN_MAPPING["勤策门店编码"] in existing_columns:
base_row[COLUMN_MAPPING["勤策门店编码"]] = base_data["勤策门店编码"]
if COLUMN_MAPPING["勤策门店名称"] in existing_columns:
base_row[COLUMN_MAPPING["勤策门店名称"]] = base_data["勤策门店名称"]
if COLUMN_MAPPING["经销商名称"] in existing_columns:
base_row[COLUMN_MAPPING["经销商名称"]] = base_data["经销商名称"]
if COLUMN_MAPPING["城市"] in existing_columns:
base_row[COLUMN_MAPPING["城市"]] = base_data["城市"]
if COLUMN_MAPPING["渠道类型"] in existing_columns:
base_row[COLUMN_MAPPING["渠道类型"]] = base_data["渠道类型"]
for group in pg:
price_col = group["price_col"]
flavor_cols = group["flavor_cols"]
flavors = group["flavors"]
series = group["series"]
weight = group["weight"]
src_price = str(row.iloc[price_col]).strip() if pd.notna(row.iloc[price_col]) else ""
if not src_price or src_price == '无价签':
src_price = ''
row_with_price = copy.deepcopy(base_row)
if COLUMN_MAPPING["产品价格"] in existing_columns:
row_with_price[COLUMN_MAPPING["产品价格"]] = src_price
for i, col_idx in enumerate(flavor_cols):
flavor_name = flavors[i]
src_month = str(row.iloc[col_idx]).strip() if pd.notna(row.iloc[col_idx]) else ""
if src_month:
new_rec = copy.deepcopy(row_with_price)
src_month = normalize_month(src_month)
_set_product_fields(new_rec, series, flavor_name, weight, src_month, existing_columns)
rDate(new_rec)
records.append(new_rec)
elif src_price:
new_rec = copy.deepcopy(row_with_price)
_set_product_fields(new_rec, series, flavor_name, weight, None, existing_columns)
rDate(new_rec)
records.append(new_rec)
return records
def transform(df_source, yname, pg, audit_date: str = None):
"""
供 API 调用的数据转换入口:接收 DataFrame,返回清洗后的记录列表,不读写任何文件。
Args:
df_source: pandas DataFrame,列通过 iloc 按位置访问(与原始 Excel 列顺序对应)
yname: 稽查来源名称,如 '稽查团队'
pg: 产品组配置列表
audit_date: 稽查日期字符串,格式 'yyyy-mm-dd';为 None 时自动取上月1号
Returns:
list[dict]: 按 STANDARD_COLUMNS 结构整理好的记录列表
"""
return _build_records(df_source, yname, pg, STANDARD_COLUMNS, audit_date=audit_date)
# === 主逻辑(独立运行/本地文件模式) ===
def main(df_source, yname, pg, audit_date: str = None):
if audit_date is None:
audit_date = _get_default_audit_date()
target_file = f"/王小卤/风控/代码-新/大日期{audit_date}_2.xlsx"
try:
# 获取目标表结构
try:
df_target = pd.read_excel(target_file, sheet_name="合并后", dtype=str)
existing_columns = df_target.columns.tolist()
except (FileNotFoundError, ValueError):
df_target = pd.DataFrame(columns=STANDARD_COLUMNS)
existing_columns = STANDARD_COLUMNS
records = _build_records(df_source, yname, pg, existing_columns, audit_date=audit_date)
if not records:
print("⚠️ 无有效数据需要追加。")
return
df_new = pd.DataFrame(records, columns=existing_columns)
df_combined = pd.concat([df_target, df_new], ignore_index=True)
if os.path.exists(target_file):
with pd.ExcelWriter(target_file, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
df_combined.to_excel(writer, sheet_name="合并后", index=False)
else:
with pd.ExcelWriter(target_file, engine='openpyxl', mode='w') as writer:
df_combined.to_excel(writer, sheet_name="合并后", index=False)
print(f"✅ 成功追加 {len(records)} 条记录到目标表!")
except Exception as e:
print(f"❌ 错误: {e}")
import traceback
traceback.print_exc()
def _set_product_fields(record, series, flavor, weight, prod_month_str, existing_columns):
"""设置产品字段"""
if COLUMN_MAPPING["产品系列"] in existing_columns:
record[COLUMN_MAPPING["产品系列"]] = series
if COLUMN_MAPPING["产品口味"] in existing_columns:
record[COLUMN_MAPPING["产品口味"]] = flavor
if COLUMN_MAPPING["产品克重"] in existing_columns:
record[COLUMN_MAPPING["产品克重"]] = weight
if prod_month_str and COLUMN_MAPPING["产品生产月份"] in existing_columns:
try:
dt = datetime.strptime(prod_month_str, "%Y-%m-%d")
record[COLUMN_MAPPING["产品生产月份"]] = dt.strftime("%Y-%m-%d")
except (ValueError, TypeError):
record[COLUMN_MAPPING["产品生产月份"]] = None
def rDate(row_dict):
"""计算临期状态(保持你原有的业务逻辑)"""
prod_date_str = row_dict.get("产品生产月份", None)
inspect_date_str = row_dict.get("稽查日期", "").strip()
if not prod_date_str or not inspect_date_str:
row_dict["临期状态"] = ""
row_dict["新鲜度"] = ""
row_dict["临期月份数"] = ""
return
try:
prod_date = datetime.strptime(prod_date_str, "%Y-%m-%d")
inspect_date = datetime.strptime(inspect_date_str, "%Y-%m-%d")
except ValueError:
row_dict["临期状态"] = ""
row_dict["新鲜度"] = ""
row_dict["临期月份数"] = ""
return
product_series = row_dict.get("产品系列", "")
zg_status = "未整改"
if product_series == "去骨凤爪":
expiry_date = prod_date + relativedelta(months=6)
gap_months = _calculate_gap_months(expiry_date, inspect_date)
if gap_months >= 2:
status, freshness,zg_status = "非大日期", "高",""
elif 1 <= gap_months < 2:
status, freshness = "大日期", "低"
elif 0 <= gap_months < 1:
status, freshness = "临期", "低"
else:
status, freshness = "过期", "低"
else:
expiry_date = prod_date + relativedelta(months=9)
gap_months = _calculate_gap_months(expiry_date, inspect_date)
if gap_months >= 3:
status, freshness,zg_status = "非大日期", "高",""
elif 1 <= gap_months < 3:
status, freshness = "大日期", "低"
elif 0 <= gap_months < 1:
status, freshness = "临期", "低"
else:
status, freshness = "过期", "低"
row_dict["临期状态"] = status
row_dict["新鲜度"] = freshness
row_dict["临期月份数"] = round(gap_months, 2)
row_dict["大日期整改状态"] = zg_status
def _calculate_gap_months(expiry_date, inspect_date):
diff_years = expiry_date.year - inspect_date.year
diff_months = expiry_date.month - inspect_date.month
diff_days = expiry_date.day - inspect_date.day
return diff_years * 12 + diff_months + diff_days / 30.0
import re
# TODO: 这里还需要修改
def normalize_month(src_month):
"""
将生产月份字符串标准化为 'yyyy-mm-01' 格式。
支持的输入格式:
- 'yyyy-mm' 或 'yyyy-m'(如 '2025-12'、'2025-1')→ '2025-12-01'、'2025-01-01'
- 'yyyymm'(如 '202512')→ '2025-12-01'
其他格式或无效值原样返回
"""
if not isinstance(src_month, str):
return src_month # 非字符串直接返回
src_month = src_month.strip()
if not src_month:
return src_month
# 情况1: 已是 yyyy-mm 格式(例如 2025-12)
if re.fullmatch(r'\d{4}-\d{1,2}', src_month):
# 可选:统一补零为两位月(如 2025-1 → 2025-01)
year, month = src_month.split('-')
month = month.zfill(2) # 确保月份两位
return f"{year}-{month}-01"
# 情况2: 是 yyyymm 格式(6位数字,如 202512)
if re.fullmatch(r'\d{6}', src_month):
year = src_month[:4]
month = src_month[4:].zfill(2)  # 取后两位并补零,确保月份两位
return f"{year}-{month}-01"
# 其他格式:不处理(或可根据需求返回空)
return src_month
if __name__ == "__main__":
# TODO: 配置 sheet 页名称
print("正在读取【团队】源文件(跳过第 1 行标题,第 2 行作为数据第 1 行)...")
# 说明:
# 1. skiprows=1 : 跳过物理第 1 行(标题)
# 2. header=None : 关键!不把物理第 2 行当表头,而是当数据;
#    物理第 2 行因此成为 df 的第 0 行,列名自动变为 0, 1, 2...
#    这与代码中 row.iloc[4]、row.iloc[8] 等按位置访问的逻辑一致。
df_source_p = pd.read_excel(source_file, skiprows=1, header=None, dtype=str)
# 验证读取结果(可选,用于调试)
print(f"✅ 成功读取 {len(df_source_p)} 行数据。")
if len(df_source_p) > 0:
print("前 2 行数据预览(确认第 2 行是否在列):")
print(df_source_p.head(2))
print(f"列索引范围:0 到 {len(df_source_p.columns) - 1}")
main(df_source_p, '稽查团队', PRODUCT_GROUPS_JC)
import pandas as pd
import copy
import os
from datetime import datetime
from dateutil.relativedelta import relativedelta
# TODO: === 配置区 ===
# TODO: 配置稽查月份(默认1号)0:1,1:12,2:11,3:10
current_date = (datetime.now().replace(day=1) - relativedelta(months=1)).strftime("%Y-%m-01")
source_file = "/王小卤/风控/代码-新/2026.2-浦零数据源.xlsx"
#source_file_cy = "/Users/a02200059/Desktop/王小卤/风控中心/低价+大日期/2512门店稽查结果/诚予国际.xlsx"
target_file = f"/王小卤/风控/代码-新/大日期{current_date}_2.xlsx"
# 列映射(目标表列名)
COLUMN_MAPPING = {
"稽查日期": "稽查日期",
"稽查来源": "稽查来源",
"勤策门店编码": "勤策门店编码",
"勤策门店名称": "勤策门店名称",
"经销商名称": "经销商名称",
"城市": "城市",
"渠道类型": "渠道类型(稽查源提供)",
"产品系列": "产品系列",
"产品口味": "产品口味",
"产品克重": "产品克重",
"产品价格": "产品价格",
"产品生产月份": "产品生产月份",
}
# ===== 新增:多产品组配置 =====
# 每组:价格列 + 若干口味列 + 产品信息(各组口味列数量不同)
# 诚予国际
PRODUCT_GROUPS_CY = [
# 第1组:虎皮凤爪 210g
{
"price_col": 7,
"flavor_cols": [8, 9, 10, 11, 12, 13, 14],
"series": "虎皮凤爪",
"weight": "210g",
"flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"]
},
# 第2组:虎皮凤爪 105g
{
"price_col": 15,
"flavor_cols": [16, 17, 18, 19, 20, 21, 22],
"series": "虎皮凤爪",
"weight": "105g",
"flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"]
},
# 第3组:虎皮凤爪 68g
{
"price_col": 23,
"flavor_cols": [24, 25, 26, 27, 28],
"series": "虎皮凤爪",
"weight": "68g",
"flavors": ["卤香", "香辣", "椒麻", "麻辣", "黑鸭"]
},
# 第4组:鸡肉豆堡 120g
{
"price_col": 29,
"flavor_cols": [30, 31],
"series": "鸡肉豆堡",
"weight": "120g",
"flavors": ["卤香", "香辣"]
},
# 第5组:牛肉豆堡 120g
{
"price_col": 32,
"flavor_cols": [33, 34],
"series": "牛肉豆堡",
"weight": "120g",
"flavors": ["卤香", "香辣"]
},
# 第6组:去骨凤爪 72g
{
"price_col": 35,
"flavor_cols": [36, 37],
"series": "去骨凤爪",
"weight": "72g",
"flavors": ["柠檬", "香辣"]
},
# 第7组:去骨凤爪 138g
{
"price_col": 38,
"flavor_cols": [39, 40],
"series": "去骨凤爪",
"weight": "138g",
"flavors": ["柠檬", "香辣"]
},
# 第8组:虎皮小鸡腿 80g
{
"price_col": 41,
"flavor_cols": [42, 43],
"series": "虎皮小鸡腿",
"weight": "80g",
"flavors": ["卤香", "香辣"]
},
# 第9组:老卤凤爪 95g(与老卤鸭掌共用 price_col=44)
{
"price_col": 44,
"flavor_cols": [45],
"series": "老卤凤爪",
"weight": "95g",
"flavors": ["卤香"]
},
# 第10组:老卤鸭掌 95g(与老卤凤爪共用 price_col=44)
{
"price_col": 44,
"flavor_cols": [46],
"series": "老卤鸭掌",
"weight": "95g",
"flavors": ["卤香"]
},
# 第11组:虎皮凤爪 25g
{
"price_col": 47,
"flavor_cols": [48, 49],
"series": "虎皮凤爪",
"weight": "25g",
"flavors": ["卤香", "香辣"]
},
# 第12组:虎皮凤爪 散称
{
"price_col": 50,
"flavor_cols": [51, 52, 53],
"series": "虎皮凤爪",
"weight": "散称",
"flavors": ["卤香", "香辣", "黑鸭"]
}
]
# 标准输出列定义(与目标表结构保持一致)
STANDARD_COLUMNS = [
"稽查日期", "稽查来源", "大区", "战区", "经销商编码", "经销商名称",
"勤策门店编码", "勤策门店名称", "客户经理工号", "客户经理",
"勤策渠道大类", "稽核渠道(对N列清洗)", "城市", "渠道类型(稽查源提供)",
"产品系列", "产品口味", "产品克重", "产品价格", "是否低价", "破价价差", "低价整改状态",
"低价整改说明", "产品生产月份", "临期月份数", "临期状态", "新鲜度",
"大日期整改状态", "大日期整改说明"
]
PRODUCT_GROUPS = [
# 第1组:虎皮凤爪 210g
{
"price_col": 6,
"flavor_cols": [7, 8, 9, 10, 11, 12, 13],
"series": "虎皮凤爪",
"weight": "210g",
"flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"]
},
# 第2组:虎皮凤爪 105g
{
"price_col": 14,
"flavor_cols": [15, 16, 17, 18, 19, 20, 21],
"series": "虎皮凤爪",
"weight": "105g",
"flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"]
},
# 第3组:虎皮凤爪 68g
{
"price_col": 22,
"flavor_cols": [23, 24, 25, 26, 27],
"series": "虎皮凤爪",
"weight": "68g",
"flavors": ["卤香", "香辣", "椒麻", "麻辣", "黑鸭"]
},
# 第4组:鸡肉豆堡 120g
{
"price_col": 28,
"flavor_cols": [29, 30],
"series": "鸡肉豆堡",
"weight": "120g",
"flavors": ["卤香", "香辣"]
},
# 第5组:牛肉豆堡 120g
{
"price_col": 31,
"flavor_cols": [32, 33],
"series": "牛肉豆堡",
"weight": "120g",
"flavors": ["卤香", "香辣"]
},
# 第6组:去骨凤爪 72g
{
"price_col": 34,
"flavor_cols": [35, 36],
"series": "去骨凤爪",
"weight": "72g",
"flavors": ["柠檬", "香辣"]
},
# 第7组:去骨凤爪 138g
{
"price_col": 37,
"flavor_cols": [38, 39],
"series": "去骨凤爪",
"weight": "138g",
"flavors": ["柠檬", "香辣"]
},
# 第8组:虎皮小鸡腿 80g
{
"price_col": 40,
"flavor_cols": [41, 42],
"series": "虎皮小鸡腿",
"weight": "80g",
"flavors": ["卤香", "香辣"]
},
# 第9组:老卤凤爪 95g
{
"price_col": 43,
"flavor_cols": [44],
"series": "老卤凤爪",
"weight": "95g",
"flavors": ["卤香"]
},
# 第10组:老卤鸭掌 95g
{
"price_col": 45,
"flavor_cols": [46],
"series": "老卤鸭掌",
"weight": "95g",
"flavors": ["卤香"]
},
# 第11组:虎皮凤爪 25g
{
"price_col": 47,
"flavor_cols": [48, 49],
"series": "虎皮凤爪",
"weight": "25g",
"flavors": ["卤香", "香辣"]
},
# 第12组:虎皮凤爪 散称
{
"price_col": 50,
"flavor_cols": [51, 52, 53],
"series": "虎皮凤爪",
"weight": "散称",
"flavors": ["卤香", "香辣", "黑鸭"]
}
]
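PRODUCT_GROUPS 这类配置的含义可以用一个独立小例子说明:price_col 指向该组的价格列,flavor_cols 与 flavors 按位置一一对应;有生产月份必出记录,无月份但有价格也出记录(对应下方 main 中的两种情况)。以下为示意,行数据为虚构:

```python
# 示意:单个产品组配置如何把一行宽表数据展开为多条记录(数据为虚构)
row = [""] * 54
row[6] = "9.9"        # 该组价格列
row[7] = "202512"     # "卤香"口味的生产月份
row[9] = "2025-11"    # "椒麻"口味的生产月份

group = {
    "price_col": 6,
    "flavor_cols": [7, 8, 9, 10, 11, 12, 13],
    "series": "虎皮凤爪",
    "weight": "210g",
    "flavors": ["卤香", "香辣", "椒麻", "火锅", "微辣", "麻辣", "黑鸭"],
}

records = []
price = row[group["price_col"]].strip()
for i, col_idx in enumerate(group["flavor_cols"]):
    month = row[col_idx].strip()
    if month:            # 情况1: 有生产月份 → 必须生成记录
        records.append((group["series"], group["weight"], group["flavors"][i], month))
    elif price:          # 情况2: 无月份但有价格 → 生成记录(月份留空)
        records.append((group["series"], group["weight"], group["flavors"][i], ""))

print(len(records))  # 7:有价格时,该组 7 个口味各生成一条记录
```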
# === 主逻辑 ===
def main(df_source, yname, pg):
try:
# 获取目标表结构
try:
df_target = pd.read_excel(target_file, sheet_name="合并后", dtype=str)
existing_columns = df_target.columns.tolist()
except (FileNotFoundError, ValueError):
df_target = pd.DataFrame(columns=STANDARD_COLUMNS)
existing_columns = STANDARD_COLUMNS
records = []
# 处理每一行
for idx, row in df_source.iterrows():
# 提取基础字段(B~F)
base_data = {
"勤策门店编码": str(row.iloc[1]).strip() if pd.notna(row.iloc[1]) else "",
"城市": str(row.iloc[2]).strip() if pd.notna(row.iloc[2]) else "",
"勤策门店名称": str(row.iloc[3]).strip() if pd.notna(row.iloc[3]) else "",
"经销商名称": str(row.iloc[4]).strip() if pd.notna(row.iloc[4]) else "",
"渠道类型": str(row.iloc[5]).strip() if pd.notna(row.iloc[5]) else "",
}
# 构建基础行(不含产品信息)
base_row = {}
if COLUMN_MAPPING["稽查日期"] in existing_columns:
base_row[COLUMN_MAPPING["稽查日期"]] = current_date
if COLUMN_MAPPING["稽查来源"] in existing_columns:
base_row[COLUMN_MAPPING["稽查来源"]] = yname
if COLUMN_MAPPING["勤策门店编码"] in existing_columns:
base_row[COLUMN_MAPPING["勤策门店编码"]] = base_data["勤策门店编码"]
if COLUMN_MAPPING["勤策门店名称"] in existing_columns:
base_row[COLUMN_MAPPING["勤策门店名称"]] = base_data["勤策门店名称"]
if COLUMN_MAPPING["经销商名称"] in existing_columns:
base_row[COLUMN_MAPPING["经销商名称"]] = base_data["经销商名称"]
if COLUMN_MAPPING["城市"] in existing_columns:
base_row[COLUMN_MAPPING["城市"]] = base_data["城市"]
if COLUMN_MAPPING["渠道类型"] in existing_columns:
base_row[COLUMN_MAPPING["渠道类型"]] = base_data["渠道类型"]
# 处理每一组产品
for group in pg:
price_col = group["price_col"]
flavor_cols = group["flavor_cols"]
flavors = group["flavors"]
series = group["series"]
weight = group["weight"]
if not flavor_cols:
print("⚠️ 未找到任何口味列!")
continue  # 该组无口味列,跳过
# 获取该组价格
src_price = str(row.iloc[price_col]).strip() if pd.notna(row.iloc[price_col]) else ""
if not src_price or src_price == '无价签':
src_price = ''
# 设置价格到基础行副本(仅用于本组)
row_with_price = copy.deepcopy(base_row)
if COLUMN_MAPPING["产品价格"] in existing_columns:
row_with_price[COLUMN_MAPPING["产品价格"]] = src_price
# 处理该组的各个口味(数量随组而异)
for i, col_idx in enumerate(flavor_cols):
flavor_name = flavors[i]
src_month = str(row.iloc[col_idx]).strip() if pd.notna(row.iloc[col_idx]) else ""
# 情况1: 有生产月份 → 必须生成记录
if src_month:
new_rec = copy.deepcopy(row_with_price)
# 修改src_month格式
src_month = normalize_month(src_month)
_set_product_fields(new_rec, series, flavor_name, weight, src_month, existing_columns)
rDate(new_rec)
records.append(new_rec)
# 情况2: 无生产月份但有价格 → 生成记录(生产月份留空)
elif src_price:
new_rec = copy.deepcopy(row_with_price)
_set_product_fields(new_rec, series, flavor_name, weight, None, existing_columns)
rDate(new_rec)
records.append(new_rec)
if not records:
print("⚠️ 无有效数据需要追加。")
return
df_new = pd.DataFrame(records, columns=existing_columns)
df_combined = pd.concat([df_target, df_new], ignore_index=True)
# 判断目标文件是否存在
if os.path.exists(target_file):
# 文件存在:以追加模式打开,替换 "合并后" sheet
with pd.ExcelWriter(target_file, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
df_combined.to_excel(writer, sheet_name="合并后", index=False)
else:
# 文件不存在:创建新文件,只写入 "合并后" sheet
with pd.ExcelWriter(target_file, engine='openpyxl', mode='w') as writer:
df_combined.to_excel(writer, sheet_name="合并后", index=False)
print(f"✅ 成功追加 {len(records)} 条记录到目标表!")
except Exception as e:
print(f"❌ 错误: {e}")
import traceback
traceback.print_exc()
def _set_product_fields(record, series, flavor, weight, prod_month_str, existing_columns):
"""设置产品字段"""
if COLUMN_MAPPING["产品系列"] in existing_columns:
record[COLUMN_MAPPING["产品系列"]] = series
if COLUMN_MAPPING["产品口味"] in existing_columns:
record[COLUMN_MAPPING["产品口味"]] = flavor
if COLUMN_MAPPING["产品克重"] in existing_columns:
record[COLUMN_MAPPING["产品克重"]] = weight
if prod_month_str and COLUMN_MAPPING["产品生产月份"] in existing_columns:
try:
# prod_month_str 形如 "yyyy-mm-dd"(normalize_month 的输出)
dt = datetime.strptime(prod_month_str, "%Y-%m-%d")
record[COLUMN_MAPPING["产品生产月份"]] = dt.date()  # 转为 date,便于后续日期运算
except (ValueError, TypeError):
# 解析失败时置空
record[COLUMN_MAPPING["产品生产月份"]] = None
def rDate(row_dict):
"""计算临期状态、新鲜度与临期月份数(保持原有业务逻辑)"""
prod_date = row_dict.get("产品生产月份", None)  # 此处已是 date 对象(由 _set_product_fields 写入)
inspect_date_str = row_dict.get("稽查日期", "").strip()
if not prod_date or not inspect_date_str:
row_dict["临期状态"] = ""
row_dict["新鲜度"] = ""
row_dict["临期月份数"] = ""
return
try:
inspect_date = datetime.strptime(inspect_date_str, "%Y-%m-%d")
except ValueError:
row_dict["临期状态"] = ""
row_dict["新鲜度"] = ""
row_dict["临期月份数"] = ""
return
product_series = row_dict.get("产品系列", "")
zg_status = "未整改"
if product_series == "去骨凤爪":
expiry_date = prod_date + relativedelta(months=6)
gap_months = _calculate_gap_months(expiry_date, inspect_date)
if gap_months >= 2:
status, freshness, zg_status = "非大日期", "高", ""
elif 1 <= gap_months < 2:
status, freshness = "大日期", "低"
elif 0 <= gap_months < 1:
status, freshness = "临期", "低"
else:
status, freshness = "过期", "低"
else:
expiry_date = prod_date + relativedelta(months=9)
gap_months = _calculate_gap_months(expiry_date, inspect_date)
if gap_months >= 3:
status, freshness, zg_status = "非大日期", "高", ""
elif 1 <= gap_months < 3:
status, freshness = "大日期", "低"
elif 0 <= gap_months < 1:
status, freshness = "临期", "低"
else:
status, freshness = "过期", "低"
row_dict["临期状态"] = status
row_dict["新鲜度"] = freshness
row_dict["临期月份数"] = round(gap_months, 2)
row_dict["大日期整改状态"] = zg_status
def _calculate_gap_months(expiry_date, inspect_date):
diff_years = expiry_date.year - inspect_date.year
diff_months = expiry_date.month - inspect_date.month
diff_days = expiry_date.day - inspect_date.day
return diff_years * 12 + diff_months + diff_days / 30.0
import re
# TODO: 这里还需要修改
def normalize_month(src_month):
"""
将生产月份字符串标准化为 'yyyy-mm-01' 格式。
支持的输入格式:
- 'yyyy-mm' 或 'yyyy-m'(如 '2025-12'、'2025-1')→ '2025-12-01'、'2025-01-01'
- 'yyyymm'(如 '202512')→ '2025-12-01'
其他格式或无效值原样返回
"""
"""
if not isinstance(src_month, str):
return src_month # 非字符串直接返回
src_month = src_month.strip()
if not src_month:
return src_month
# 情况1: 已是 yyyy-mm 格式(例如 2025-12)
if re.fullmatch(r'\d{4}-\d{1,2}', src_month):
# 可选:统一补零为两位月(如 2025-1 → 2025-01)
year, month = src_month.split('-')
month = month.zfill(2) # 确保月份两位
return f"{year}-{month}-01"
# 情况2: 是 yyyymm 格式(6位数字,如 202512)
if re.fullmatch(r'\d{6}', src_month):
year = src_month[:4]
month = src_month[4:].zfill(2)  # 取后两位并补零,确保月份两位
return f"{year}-{month}-01"
# 其他格式:不处理(或可根据需求返回空)
return src_month
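normalize_month 的行为可以用下面这个独立示例直观验证(这是与上文最终行为一致的精简版,仅作演示):

```python
import re

def normalize_month(src_month):
    # 与上文实现行为一致的精简版:标准化为 'yyyy-mm-01',无法识别则原样返回
    if not isinstance(src_month, str) or not src_month.strip():
        return src_month
    src_month = src_month.strip()
    if re.fullmatch(r'\d{4}-\d{1,2}', src_month):
        year, month = src_month.split('-')
        return f"{year}-{month.zfill(2)}-01"
    if re.fullmatch(r'\d{6}', src_month):
        return f"{src_month[:4]}-{src_month[4:]}-01"
    return src_month

print(normalize_month("2025-12"))   # 2025-12-01
print(normalize_month("2025-1"))    # 2025-01-01
print(normalize_month("202512"))    # 2025-12-01
print(normalize_month("2025/12"))   # 2025/12(未识别,原样返回)
```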
def transform(df_source, yname, pg, audit_date: str = None):
"""
供 API 调用的数据转换入口:接收 DataFrame,返回清洗后的记录列表,不读写任何文件。
Args:
df_source: pandas DataFrame,列通过 iloc 按位置访问(header=2 读入后索引从 0 开始)
yname: 稽查来源名称,如 '浦零' 或 '诚予'
pg: 产品组配置列表(PRODUCT_GROUPS 或 PRODUCT_GROUPS_CY)
audit_date: 稽查日期字符串,格式 'yyyy-mm-dd';为 None 时自动取上月1号
Returns:
list[dict]: 按 STANDARD_COLUMNS 结构整理好的记录列表(产品生产月份为字符串)
"""
from datetime import date as date_type
if audit_date is None:
audit_date = (datetime.now().replace(day=1) - relativedelta(months=1)).strftime("%Y-%m-01")
records = []
for idx, row in df_source.iterrows():
base_data = {
"勤策门店编码": str(row.iloc[1]).strip() if pd.notna(row.iloc[1]) else "",
"城市": str(row.iloc[2]).strip() if pd.notna(row.iloc[2]) else "",
"勤策门店名称": str(row.iloc[3]).strip() if pd.notna(row.iloc[3]) else "",
"经销商名称": str(row.iloc[4]).strip() if pd.notna(row.iloc[4]) else "",
"渠道类型": str(row.iloc[5]).strip() if pd.notna(row.iloc[5]) else "",
}
base_row = {}
if COLUMN_MAPPING["稽查日期"] in STANDARD_COLUMNS:
base_row[COLUMN_MAPPING["稽查日期"]] = audit_date
if COLUMN_MAPPING["稽查来源"] in STANDARD_COLUMNS:
base_row[COLUMN_MAPPING["稽查来源"]] = yname
if COLUMN_MAPPING["勤策门店编码"] in STANDARD_COLUMNS:
base_row[COLUMN_MAPPING["勤策门店编码"]] = base_data["勤策门店编码"]
if COLUMN_MAPPING["勤策门店名称"] in STANDARD_COLUMNS:
base_row[COLUMN_MAPPING["勤策门店名称"]] = base_data["勤策门店名称"]
if COLUMN_MAPPING["经销商名称"] in STANDARD_COLUMNS:
base_row[COLUMN_MAPPING["经销商名称"]] = base_data["经销商名称"]
if COLUMN_MAPPING["城市"] in STANDARD_COLUMNS:
base_row[COLUMN_MAPPING["城市"]] = base_data["城市"]
if COLUMN_MAPPING["渠道类型"] in STANDARD_COLUMNS:
base_row[COLUMN_MAPPING["渠道类型"]] = base_data["渠道类型"]
for group in pg:
price_col = group["price_col"]
flavor_cols = group["flavor_cols"]
flavors = group["flavors"]
series = group["series"]
weight = group["weight"]
src_price = str(row.iloc[price_col]).strip() if pd.notna(row.iloc[price_col]) else ""
if not src_price or src_price == '无价签':
src_price = ''
row_with_price = copy.deepcopy(base_row)
if COLUMN_MAPPING["产品价格"] in STANDARD_COLUMNS:
row_with_price[COLUMN_MAPPING["产品价格"]] = src_price
for i, col_idx in enumerate(flavor_cols):
flavor_name = flavors[i]
src_month = str(row.iloc[col_idx]).strip() if pd.notna(row.iloc[col_idx]) else ""
if src_month:
new_rec = copy.deepcopy(row_with_price)
src_month = normalize_month(src_month)
_set_product_fields(new_rec, series, flavor_name, weight, src_month, STANDARD_COLUMNS)
rDate(new_rec)
records.append(new_rec)
elif src_price:
new_rec = copy.deepcopy(row_with_price)
_set_product_fields(new_rec, series, flavor_name, weight, None, STANDARD_COLUMNS)
rDate(new_rec)
records.append(new_rec)
# 将 date 对象统一转为 ISO 字符串,保证 JSON 可序列化
for rec in records:
for k, v in rec.items():
if isinstance(v, date_type):
rec[k] = v.isoformat()
return records
if __name__ == "__main__":
# TODO: 配置sheet页名称
print("正在读取【浦零】源文件(跳过前三行)...")
df_source_p = pd.read_excel(source_file, header=2, dtype=str)
main(df_source_p, '浦零', PRODUCT_GROUPS)
#print("正在读取【诚予】源文件(跳过前三行)...")
#df_source_c = pd.read_excel(source_file_cy, sheet_name="Sheet1", header=2, dtype=str)
#main(df_source_c,'诚予',PRODUCT_GROUPS_CY)
"""
数据清洗系统 - FastAPI 应用主程序
Description: 提供 Excel 数据解析、清洗和存储的 API 服务
"""
from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel
import logging
import uuid
import asyncio
import math
import pandas as pd
from io import BytesIO
from datetime import datetime
from typing import Optional, Dict, Any
# 导入业务模块
from core.excel_handler import ExcelHandler
from core.data_cleaner import DataCleaner
from core.db_handler import DatabaseHandler
from core.progress_manager import ProgressManager
from utils.exceptions import DataCleaningException, DatabaseException
from utils.validators import validate_excel_url
from utils.response import BizCode, ok_resp, fail_resp
def _sanitize_nan(records: list) -> list:
"""将列表中每行 dict 里的 float NaN / Inf 替换为 None,确保 JSON 可序列化。"""
sanitized = []
for row in records:
sanitized.append({
k: (None if isinstance(v, float) and (math.isnan(v) or math.isinf(v)) else v)
for k, v in row.items()
})
return sanitized
# 配置日志
logging.basicConfig(
level=logging.INFO, # 只记录 INFO 以上的日志
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' # 时间 - 模块名 - 级别 - 内容
)
logger = logging.getLogger(__name__) # __name__ 运行时获取模块名
# 创建 FastAPI 应用
app = FastAPI(
title="数据清洗系统",
description="用于数据解析、清洗和持久化的 API 服务",
version="1.0.0"
)
# ==================== 请求数据模型 ====================
class CleaningRequest(BaseModel):
"""数据清洗请求模型"""
excel_url: Optional[str] = None # 普通清洗模式必填;风控稽查模式可不传
department: str
description: Optional[str] = None
audit_date: Optional[str] = None # 稽查日期,格式 'yyyy-mm-dd',不传则取上月1号
# ── 风控稽查数据清洗 专用字段 ──────────────────────────────────
year: Optional[int] = None # 数据所属年
month: Optional[int] = None # 数据所属月
day: Optional[int] = None # 数据所属日
team_url: Optional[str] = None # 团队数据表链接
puling_url: Optional[str] = None # 浦零数据表链接
chengyu_url: Optional[str] = None # 诚予数据表链接
class SavingRequest(BaseModel):
"""数据保存请求模型"""
task_id: str
table_name: str
# ==================== 业务逻辑 ====================
class DataCleaningService:
"""数据清洗服务主类"""
# 性能基准参数(可根据实际情况调整)
DOWNLOAD_TIME_BASE = 2 # 下载和解析基础时间(秒)
DOWNLOAD_TIME_PER_ROW = 0.0001 # 每行数据的下载时间(秒)
CLEANING_TIME_PER_ROW = 0.001 # 每行数据的清洗时间(秒)
VALIDATION_TIME_BASE = 1 # 验证基础时间(秒)
CACHING_TIME_PER_ROW = 0.0001 # 每行数据的缓存时间(秒)
CACHE_TTL_SECONDS = 1800 # cache 保留时长:30 分钟
def __init__(self):
self.progress_manager = ProgressManager()
self.excel_handler = ExcelHandler()
self.data_cleaner = DataCleaner()
self.db_handler = DatabaseHandler()
# 存储已清洗的数据(内存中,可扩展为 Redis)
self.cleaned_data_cache: Dict[str, Any] = {}
def _evict_expired_cache(self):
"""清除超过 TTL 的 cache 条目,在写入和读取时调用"""
now = datetime.now()
expired = [
tid for tid, v in self.cleaned_data_cache.items()
if (now - v['created_at']).total_seconds() > self.CACHE_TTL_SECONDS
]
for tid in expired:
del self.cleaned_data_cache[tid]
logger.info(f"[cache] 已清除过期任务 {tid}")
def estimate_completion_time(self, row_count: int) -> int:
"""
根据数据行数预估完成时间
Args:
row_count: Excel 文件的数据行数
Returns:
int: 预估完成时间(秒)
"""
# 计算各阶段时间
download_time = self.DOWNLOAD_TIME_BASE + (row_count * self.DOWNLOAD_TIME_PER_ROW)
validation_time = self.VALIDATION_TIME_BASE
cleaning_time = row_count * self.CLEANING_TIME_PER_ROW
caching_time = row_count * self.CACHING_TIME_PER_ROW
# 总时间(向上取整)
total_time = int(download_time + validation_time + cleaning_time + caching_time)
# 最少 5 秒,最多 3600 秒(1小时)
return max(5, min(total_time, 3600))
async def clean_data_from_url(
self,
task_id: str,
excel_url: str,
department: str,
raw_data: list = None,
audit_date: str = None
) -> Dict[str, Any]:
"""
从 URL 下载并清洗 Excel 数据
Args:
task_id: 任务唯一标识
excel_url: Excel 文件的网络链接
department: 业务部门名称
raw_data: 可选,已下载的原始数据(由路由层传入以避免重复下载)
audit_date: 稽查日期字符串,格式 'yyyy-mm-dd'
Returns:
包含清洗结果的字典
"""
try:
# 1. 记录任务开始
self.progress_manager.update_progress(
task_id,
status="processing",
progress=10,
message="开始下载 Excel 文件"
)
logger.info(f"[{task_id}] 开始处理数据清洗任务")
# 2. 下载并解析 Excel(若路由层已下载则直接复用,避免重复请求)
if raw_data is None:
self.progress_manager.update_progress(
task_id,
status="processing",
progress=20,
message="正在解析 Excel 文件"
)
raw_data = await self.excel_handler.fetch_and_parse(excel_url)
logger.info(f"[{task_id}] 成功解析 Excel,数据行数: {len(raw_data)}")
# 3. 数据验证
self.progress_manager.update_progress(
task_id,
status="processing",
progress=30,
message="正在验证数据"
)
if not raw_data:
raise DataCleaningException("解析的 Excel 数据为空")
# 4. 执行数据清洗
self.progress_manager.update_progress(
task_id,
status="processing",
progress=50,
message="正在清洗数据"
)
cleaned_data = await self.data_cleaner.clean(
raw_data,
department,
progress_callback=lambda p, m, count=None: self.progress_manager.update_progress(
task_id,
status="processing",
progress=int(50 + p * 0.4), # 进度从50%到90%
message=m,
processed_count=count
),
audit_date=audit_date
)
logger.info(f"[{task_id}] 数据清洗完成,清洗后数据行数: {len(cleaned_data)}")
# 5. 缓存清洗后的数据(写入前先清除过期条目)
self.progress_manager.update_progress(
task_id,
status="processing",
progress=90,
message="正在缓存清洗后的数据"
)
self._evict_expired_cache()
safe_data = _sanitize_nan(cleaned_data)
self.cleaned_data_cache[task_id] = {
'data': safe_data,
'department': department,
'created_at': datetime.now(),
'row_count': len(safe_data)
}
# 6. 任务完成
self.progress_manager.update_progress(
task_id,
status="completed",
progress=100,
message="数据清洗完成,等待前端确认",
processed_count=len(cleaned_data)
)
return {
'task_id': task_id,
'status': 'completed',
'message': '数据清洗成功',
'data_preview': cleaned_data[:5], # 返回前5行用于预览
'total_rows': len(cleaned_data)
}
except DataCleaningException as e:
logger.error(f"[{task_id}] 数据清洗业务异常: {str(e)}")
self.progress_manager.update_progress(
task_id,
status="failed",
progress=0,
message=f"清洗失败: {str(e)}"
)
raise
except Exception as e:
logger.error(f"[{task_id}] 数据清洗系统异常: {str(e)}", exc_info=True)
self.progress_manager.update_progress(
task_id,
status="failed",
progress=0,
message=f"系统异常: {str(e)}"
)
raise DataCleaningException(f"未知错误: {str(e)}")
async def save_cleaned_data(
self,
task_id: str,
table_name: str
) -> Dict[str, Any]:
"""
将清洗后的数据保存到数据库
Args:
task_id: 任务唯一标识
table_name: 目标表名
Returns:
包含保存结果的字典
"""
try:
logger.info(f"[{task_id}] 开始保存数据到数据库")
# 验证数据是否存在(先清除过期条目)
self._evict_expired_cache()
if task_id not in self.cleaned_data_cache:
raise DatabaseException(f"任务 {task_id} 的清洗数据不存在或已过期(超过30分钟)")
cleaned_data = self.cleaned_data_cache[task_id]['data']
# 保存到数据库
affected_rows = await self.db_handler.insert_data(
table_name,
cleaned_data
)
logger.info(f"[{task_id}] 成功保存 {affected_rows} 行数据到 {table_name}")
# 清理缓存
del self.cleaned_data_cache[task_id]
return {
'task_id': task_id,
'status': 'saved',
'message': '数据已成功保存到数据库',
'affected_rows': affected_rows
}
except DatabaseException as e:
logger.error(f"[{task_id}] 数据库异常: {str(e)}")
raise
except Exception as e:
logger.error(f"[{task_id}] 保存数据时出错: {str(e)}", exc_info=True)
raise DatabaseException(f"保存失败: {str(e)}")
async def clean_fengkong_data(
self,
task_id: str,
team_url: Optional[str],
puling_url: Optional[str],
chengyu_url: Optional[str],
audit_date: Optional[str],
) -> Dict[str, Any]:
"""
风控稽查数据清洗:分别下载团队、浦零、诚予数据源,各自清洗后合并为一张大宽表,
结果存入内存缓存,不写本地文件。
Args:
task_id: 任务唯一标识
team_url: 团队数据表下载链接(可为 None)
puling_url: 浦零数据表下载链接(可为 None)
chengyu_url: 诚予数据表下载链接(可为 None)
audit_date: 稽查日期,格式 'yyyy-mm-dd';为 None 时各模块自动取上月1号
"""
from core_py.数据转换_团队 import (
transform as team_transform,
PRODUCT_GROUPS_JC,
STANDARD_COLUMNS,
)
from core_py.数据转换_诚予_浦零 import (
transform as pl_cy_transform,
PRODUCT_GROUPS,
PRODUCT_GROUPS_CY,
)
try:
self.progress_manager.update_progress(
task_id, status="processing", progress=5, message="开始风控稽查数据清洗"
)
logger.info(f"[{task_id}] 开始风控稽查数据清洗,audit_date={audit_date}")
all_records = []
progress_step = 0
source_count = sum(1 for u in [team_url, puling_url, chengyu_url] if u)
progress_per_source = int(80 / source_count) if source_count else 80
# ── 1. 团队数据 ──────────────────────────────────────────
if team_url:
progress_step += progress_per_source
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(10, progress_step - progress_per_source + 10),
message="正在下载团队数据表..."
)
raw_bytes = await self.excel_handler.fetch_bytes(team_url)
df_team = await asyncio.to_thread(
pd.read_excel, BytesIO(raw_bytes), skiprows=1, header=None, dtype=str
)
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(10, progress_step - progress_per_source // 2),
message="正在清洗团队数据..."
)
records_team = await asyncio.to_thread(
team_transform, df_team, "稽查团队", PRODUCT_GROUPS_JC, audit_date
)
all_records.extend(records_team)
logger.info(f"[{task_id}] 团队数据清洗完成,{len(records_team)} 条记录")
# ── 2. 浦零数据 ──────────────────────────────────────────
if puling_url:
progress_step += progress_per_source
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(15, progress_step - progress_per_source + 10),
message="正在下载浦零数据表..."
)
raw_bytes = await self.excel_handler.fetch_bytes(puling_url)
df_pl = await asyncio.to_thread(
pd.read_excel, BytesIO(raw_bytes), header=2, dtype=str
)
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(15, progress_step - progress_per_source // 2),
message="正在清洗浦零数据..."
)
records_pl = await asyncio.to_thread(
pl_cy_transform, df_pl, "浦零", PRODUCT_GROUPS, audit_date
)
all_records.extend(records_pl)
logger.info(f"[{task_id}] 浦零数据清洗完成,{len(records_pl)} 条记录")
# ── 3. 诚予数据 ──────────────────────────────────────────
if chengyu_url:
progress_step += progress_per_source
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(20, progress_step - progress_per_source + 10),
message="正在下载诚予数据表..."
)
raw_bytes = await self.excel_handler.fetch_bytes(chengyu_url)
df_cy = await asyncio.to_thread(
pd.read_excel, BytesIO(raw_bytes), header=2, dtype=str
)
self.progress_manager.update_progress(
task_id, status="processing",
progress=max(20, progress_step - progress_per_source // 2),
message="正在清洗诚予数据..."
)
records_cy = await asyncio.to_thread(
pl_cy_transform, df_cy, "诚予", PRODUCT_GROUPS_CY, audit_date
)
all_records.extend(records_cy)
logger.info(f"[{task_id}] 诚予数据清洗完成,{len(records_cy)} 条记录")
# ── 4. 合并为大宽表(内存,不写文件) ──────────────────
self.progress_manager.update_progress(
task_id, status="processing", progress=90, message="正在合并数据宽表..."
)
df_merged = pd.DataFrame(all_records, columns=STANDARD_COLUMNS)
merged_records = _sanitize_nan(
df_merged.where(pd.notna(df_merged), None).to_dict(orient="records")
)
logger.info(f"[{task_id}] 大宽表合并完成,共 {len(merged_records)} 条记录")
# ── 5. 写入内存缓存 ──────────────────────────────────────
self._evict_expired_cache()
self.cleaned_data_cache[task_id] = {
"data": merged_records,
"department": "风控稽查数据清洗",
"created_at": datetime.now(),
"row_count": len(merged_records),
}
self.progress_manager.update_progress(
task_id, status="completed", progress=100,
message=f"风控稽查数据清洗完成,共 {len(merged_records)} 条记录,等待前端确认",
processed_count=len(merged_records)
)
return {
"task_id": task_id,
"status": "completed",
"message": "风控稽查数据清洗成功",
"data_preview": merged_records[:5],
"total_rows": len(merged_records),
}
except Exception as e:
logger.error(f"[{task_id}] 风控稽查数据清洗失败: {str(e)}", exc_info=True)
self.progress_manager.update_progress(
task_id, status="failed", progress=0,
message=f"清洗失败: {str(e)}"
)
raise
# ==================== 初始化服务 ====================
service = DataCleaningService()
# ==================== API 路由 ====================
@app.post("/api/v1/clean")
async def start_cleaning(request: CleaningRequest, background_tasks: BackgroundTasks):
"""
启动数据清洗任务
Returns: { code, msg, data: { task_id, status, estimated_completion_time, total_rows } }
"""
try:
task_id = str(uuid.uuid4())
logger.info(f"创建新任务: {task_id}, 部门: {request.department}")
# ── 风控稽查数据清洗 专用分支 ──────────────────────────────
if request.department == "风控稽查数据清洗":
if not any([request.team_url, request.puling_url, request.chengyu_url]):
return fail_resp(BizCode.BAD_REQUEST, "风控稽查数据清洗至少需要提供一个数据源地址(team_url / puling_url / chengyu_url)")
# 从 year/month/day 构造稽查日期,未传则由清洗模块自动取上月1号
audit_date = None
if request.year and request.month and request.day:
audit_date = f"{request.year}-{request.month:02d}-{request.day:02d}"
estimated_rows = 1000
estimated_time = service.estimate_completion_time(estimated_rows)
# 提前写入 queued 进度,避免前端轮询时返回 404
service.progress_manager.update_progress(
task_id, status="queued", progress=0, message="任务已创建,等待处理"
)
background_tasks.add_task(
service.clean_fengkong_data,
task_id,
request.team_url,
request.puling_url,
request.chengyu_url,
audit_date,
)
# ── 普通清洗分支 ───────────────────────────────────────────
else:
if not validate_excel_url(request.excel_url):
return fail_resp(BizCode.BAD_REQUEST, "Excel URL 格式无效")
estimated_rows = 0
estimated_time = 5
prefetched_raw_data = None
try:
prefetched_raw_data = await service.excel_handler.fetch_and_parse(request.excel_url)
estimated_rows = len(prefetched_raw_data)
estimated_time = service.estimate_completion_time(estimated_rows)
logger.info(f"[{task_id}] 预估数据行数: {estimated_rows}, 预估完成时间: {estimated_time}秒")
except Exception as e:
logger.warning(f"[{task_id}] 预读 Excel 失败,后台任务将重新下载: {str(e)}")
estimated_rows = 1000
estimated_time = service.estimate_completion_time(estimated_rows)
background_tasks.add_task(
service.clean_data_from_url,
task_id,
request.excel_url,
request.department,
prefetched_raw_data,
request.audit_date,
)
return ok_resp(
data={
"task_id": task_id,
"status": "queued",
"estimated_completion_time": estimated_time,
"total_rows": estimated_rows,
},
msg="任务已创建,正在处理中..."
)
except Exception as e:
logger.error(f"启动清洗任务失败: {str(e)}")
return fail_resp(BizCode.SERVER_ERROR, f"启动任务失败: {str(e)}", http_status=500)
@app.get("/api/v1/progress/{task_id}")
async def get_progress(task_id: str):
"""
获取数据清洗进度(HTTP 轮询,建议前端每 500ms-1s 调用一次)
Returns: { code, msg, data: { task_id, status, progress, message, timestamp } }
"""
try:
progress_data = service.progress_manager.get_progress(task_id)
if not progress_data:
return fail_resp(BizCode.NOT_FOUND, "任务不存在", http_status=404)
return ok_resp(data=progress_data)
except Exception as e:
logger.error(f"获取进度失败: {str(e)}")
return fail_resp(BizCode.SERVER_ERROR, "获取进度失败", http_status=500)
@app.get("/api/v1/result/{task_id}")
async def get_cleaning_result(task_id: str):
"""
获取清洗结果及数据预览(任务完成后调用)
Returns: { code, msg, data: { task_id, status, data_preview, total_rows, department } }
"""
try:
progress_data = service.progress_manager.get_progress(task_id)
if not progress_data:
return fail_resp(BizCode.NOT_FOUND, "任务不存在", http_status=404)
if progress_data['status'] == 'processing':
return fail_resp(BizCode.TASK_PROCESSING, "任务仍在处理中", http_status=202)
if progress_data['status'] == 'failed':
return fail_resp(BizCode.TASK_FAILED, progress_data['message'])
service._evict_expired_cache()
if task_id not in service.cleaned_data_cache:
return fail_resp(BizCode.NOT_FOUND, "清洗数据不存在或已过期(超过30分钟)", http_status=404)
cached = service.cleaned_data_cache[task_id]
return ok_resp(
data={
"task_id": task_id,
"status": "ready_to_save",
"data_preview": cached['data'][:10],
"total_rows": cached['row_count'],
"department": cached['department']
},
msg="数据清洗完成,可进行保存"
)
except Exception as e:
logger.error(f"获取清洗结果失败: {str(e)}")
return fail_resp(BizCode.SERVER_ERROR, "获取结果失败", http_status=500)
@app.post("/api/v1/save")
async def save_cleaned_data(request: SavingRequest):
"""
保存清洗后的数据到 MySQL 数据库(前端确认数据无误后调用)
Returns: { code, msg, data: { task_id, status, affected_rows } }
"""
try:
if not request.task_id or not request.table_name:
return fail_resp(BizCode.BAD_REQUEST, "参数不完整:task_id 和 table_name 均为必填")
result = await service.save_cleaned_data(request.task_id, request.table_name)
return ok_resp(data=result, msg="数据已成功保存到数据库")
except DatabaseException as e:
logger.error(f"保存数据失败: {str(e)}")
return fail_resp(BizCode.DB_ERROR, str(e), http_status=500)
except Exception as e:
logger.error(f"保存数据时发生错误: {str(e)}")
return fail_resp(BizCode.SERVER_ERROR, f"保存失败: {str(e)}", http_status=500)
@app.get("/api/v1/health")
async def health_check():
"""健康检查接口"""
return ok_resp(
data={"service": "数据清洗系统", "timestamp": str(datetime.now())},
msg="healthy"
)
@app.get("/")
async def root():
"""根路由 - API 欢迎信息"""
return ok_resp(
data={"version": "1.0.0", "docs": "/docs", "redoc": "/redoc"},
msg="欢迎使用数据清洗系统"
)
# ==================== Exception handlers ====================
@app.exception_handler(DataCleaningException)
async def data_cleaning_exception_handler(request, exc):
    """Handle data-cleaning exceptions."""
    logger.error(f"DataCleaningException: {str(exc)}")
    return fail_resp(BizCode.TASK_FAILED, str(exc), http_status=400)

@app.exception_handler(DatabaseException)
async def database_exception_handler(request, exc):
    """Handle database exceptions."""
    logger.error(f"DatabaseException: {str(exc)}")
    return fail_resp(BizCode.DB_ERROR, str(exc), http_status=500)
# ==================== Startup / shutdown events ====================
@app.on_event("startup")
async def startup_event():
    """Initialisation on application startup."""
    logger.info("Data cleaning system starting")
    try:
        # Initialise database connections etc.
        pass
    except Exception as e:
        logger.error(f"Error during startup: {str(e)}")

@app.on_event("shutdown")
async def shutdown_event():
    """Cleanup on application shutdown."""
    logger.info("Data cleaning system shutting down")
    try:
        # Close database connections etc.
        pass
    except Exception as e:
        logger.error(f"Error during shutdown: {str(e)}")
# ==================== Entry point ====================
if __name__ == "__main__":
    import uvicorn
    # Run the Uvicorn server. Note: `reload` only takes effect when the app is
    # passed as an import string, not as an object.
    uvicorn.run(
        "index:app",
        host="0.0.0.0",
        port=8000,
        log_level="info",
        reload=True  # hot reload for development
    )
fastapi==0.104.1
uvicorn==0.24.0
python-multipart==0.0.6
openpyxl==3.1.5
requests==2.31.0
aiohttp==3.9.1
httpx==0.25.2  # used by the API test script
mysql-connector-python==8.2.0
pydantic==2.4.2
python-dotenv==1.0.0
pandas>=2.0.0
python-dateutil>=2.8.2
/*
Navicat MySQL Data Transfer
Source Server : t100_dev
Source Server Version : 50744
Source Host : 192.168.100.39:25301
Source Database : market_bi
Target Server Type : MYSQL
Target Server Version : 50744
File Encoding : 65001
Date: 2026-03-09 18:13:42
*/
SET FOREIGN_KEY_CHECKS=0;
-- ----------------------------
-- Table structure for risk_audit_visit
-- ----------------------------
DROP TABLE IF EXISTS `risk_audit_visit`;
CREATE TABLE `risk_audit_visit` (
  `rav_id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `audit_date` date DEFAULT NULL COMMENT 'audit date',
  `source` varchar(20) DEFAULT NULL COMMENT 'audit source',
  `region_name` varchar(20) DEFAULT NULL COMMENT 'region',
  `district_name` varchar(20) DEFAULT NULL COMMENT 'sales district',
  `dealer_code` varchar(10) DEFAULT NULL COMMENT 'dealer code',
  `dealer_name` varchar(100) DEFAULT NULL COMMENT 'dealer name',
  `store_code` varchar(20) DEFAULT NULL COMMENT 'store code',
  `store_name` varchar(100) DEFAULT NULL COMMENT 'store name in Qince (勤策)',
  `f_emp_no` varchar(20) DEFAULT NULL COMMENT 'account manager employee no.',
  `f_emp_name` varchar(100) DEFAULT NULL COMMENT 'account manager name',
  `qin_ce_type_large` varchar(20) DEFAULT NULL COMMENT 'Qince channel major category',
  `jh_channel_type` varchar(20) DEFAULT NULL COMMENT 'audit channel type',
  `city` varchar(30) DEFAULT NULL COMMENT 'city',
  `channel_type` varchar(30) DEFAULT NULL COMMENT 'channel type (as supplied by the audit source)',
  `series` varchar(20) DEFAULT NULL COMMENT 'product series',
  `taste` varchar(20) DEFAULT NULL COMMENT 'product flavor',
  `weight` varchar(20) DEFAULT NULL COMMENT 'product weight (grams)',
  `price` decimal(10,2) DEFAULT NULL COMMENT 'product price',
  `low_price` varchar(20) DEFAULT NULL COMMENT 'low-price flag: 低价 (low) / 正常 (normal)',
  `low_price_diff` decimal(10,2) DEFAULT NULL COMMENT 'price gap',
  `low_price_status` varchar(20) DEFAULT NULL COMMENT 'low-price rectification status',
  `low_price_rectify` varchar(100) DEFAULT NULL COMMENT 'low-price rectification note',
  `production_month` date DEFAULT NULL COMMENT 'production month',
  `near_month_num` int(11) DEFAULT NULL COMMENT 'months until near-expiry',
  `near_month_status` varchar(20) DEFAULT NULL COMMENT 'near-expiry status',
  `fresh_status` varchar(20) DEFAULT NULL COMMENT 'freshness',
  `large_date_status` varchar(20) DEFAULT NULL COMMENT 'aged-date rectification status',
  `large_date_rectify` varchar(100) DEFAULT NULL COMMENT 'aged-date rectification note',
  PRIMARY KEY (`rav_id`),
  KEY `audit` (`audit_date`),
  KEY `dealer` (`dealer_code`,`dealer_name`),
  KEY `product_index` (`series`,`taste`,`weight`),
  KEY `regiondistrict` (`region_name`,`district_name`),
  KEY `type_small` (`jh_channel_type`),
  KEY `weight_index` (`weight`)
) ENGINE=InnoDB AUTO_INCREMENT=493621 DEFAULT CHARSET=utf8mb4 COMMENT='audit visit price and aged-date table';
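For reference, rows matching this schema can be batch-inserted with mysql-connector-python's `cursor.executemany()`, which expects a `%s`-parameterized statement. A minimal sketch of building such a statement (the column subset here is chosen arbitrarily for illustration, not taken from the actual save logic):

```python
# Hypothetical subset of risk_audit_visit columns, for illustration only.
COLUMNS = ["audit_date", "source", "dealer_code", "dealer_name", "price"]

def build_insert_sql(table: str, columns: list[str]) -> str:
    """Build a %s-parameterized INSERT statement suitable for
    cursor.executemany() with mysql-connector-python."""
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders})"

sql = build_insert_sql("risk_audit_visit", COLUMNS)
print(sql)
```

Keeping values out of the SQL string and passing them as a sequence of tuples to `executemany()` avoids SQL injection and lets the driver batch the rows.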
"""
API test script
Quickly exercises each API endpoint.
"""
import asyncio
import httpx
import json
from datetime import datetime

BASE_URL = "http://localhost:8000"
class APITester:
    """API test harness."""
    def __init__(self, base_url: str = BASE_URL):
        self.base_url = base_url
        self.task_id = None  # set once a cleaning task has been created

    async def test_health_check(self):
        """Test the health-check endpoint."""
        print("\n" + "=" * 50)
        print("Test: health check")
        print("=" * 50)
        try:
            async with httpx.AsyncClient() as client:
                response = await client.get(f"{self.base_url}/api/v1/health")
                print(f"Status code: {response.status_code}")
                print(f"Response: {json.dumps(response.json(), indent=2, ensure_ascii=False)}")
        except Exception as e:
            print(f"Error: {str(e)}")
    async def test_start_cleaning(self):
        """Test the endpoint that starts a cleaning task."""
        print("\n" + "=" * 50)
        print("Test: start a data-cleaning task")
        print("=" * 50)
        payload = {
            "excel_url": "https://example.com/test_data.xlsx",
            "department": "sales",
            "description": "test data cleaning"
        }
        try:
            async with httpx.AsyncClient() as client:
                response = await client.post(
                    f"{self.base_url}/api/v1/clean",
                    json=payload
                )
                print(f"Status code: {response.status_code}")
                data = response.json()
                print(f"Response: {json.dumps(data, indent=2, ensure_ascii=False)}")
                if response.status_code == 200:
                    # Every endpoint wraps its payload as { code, msg, data },
                    # so task_id lives under the nested "data" key.
                    self.task_id = (data.get('data') or {}).get('task_id')
                    print(f"\n✓ Task created, Task ID: {self.task_id}")
        except Exception as e:
            print(f"Error: {str(e)}")
    async def test_get_progress(self):
        """Test the progress endpoint."""
        if not self.task_id:
            print("Skipped: a task must be created first")
            return
        print("\n" + "=" * 50)
        print("Test: fetch cleaning progress")
        print("=" * 50)
        try:
            async with httpx.AsyncClient() as client:
                response = await client.get(
                    f"{self.base_url}/api/v1/progress/{self.task_id}"
                )
                print(f"Status code: {response.status_code}")
                print(f"Response: {json.dumps(response.json(), indent=2, ensure_ascii=False, default=str)}")
        except Exception as e:
            print(f"Error: {str(e)}")
    async def test_get_result(self):
        """Test the cleaning-result endpoint."""
        if not self.task_id:
            print("Skipped: a task must be created first")
            return
        print("\n" + "=" * 50)
        print("Test: fetch the cleaning result")
        print("=" * 50)
        try:
            async with httpx.AsyncClient() as client:
                response = await client.get(
                    f"{self.base_url}/api/v1/result/{self.task_id}"
                )
                print(f"Status code: {response.status_code}")
                data = response.json()
                print(f"Response: {json.dumps(data, indent=2, ensure_ascii=False, default=str)}")
        except Exception as e:
            print(f"Error: {str(e)}")
    async def test_save_data(self):
        """Test the save endpoint."""
        if not self.task_id:
            print("Skipped: a task must be created first")
            return
        print("\n" + "=" * 50)
        print("Test: save the cleaned data")
        print("=" * 50)
        payload = {
            "task_id": self.task_id,
            "table_name": "sales_data"
        }
        try:
            async with httpx.AsyncClient() as client:
                response = await client.post(
                    f"{self.base_url}/api/v1/save",
                    json=payload
                )
                print(f"Status code: {response.status_code}")
                print(f"Response: {json.dumps(response.json(), indent=2, ensure_ascii=False)}")
        except Exception as e:
            print(f"Error: {str(e)}")
    async def run_all_tests(self):
        """Run the read-only test sequence. test_save_data is not called
        automatically; invoke it manually after reviewing the result."""
        print("\n" + "=" * 50)
        print("Data cleaning system - API tests")
        print(f"Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        print("=" * 50)
        await self.test_health_check()
        await asyncio.sleep(1)
        await self.test_start_cleaning()
        await asyncio.sleep(2)
        await self.test_get_progress()
        await asyncio.sleep(1)
        await self.test_get_result()
        await asyncio.sleep(1)
        print("\n" + "=" * 50)
        print("All tests finished!")
        print("=" * 50 + "\n")
async def main():
    """Entry point for the test run."""
    tester = APITester()
    await tester.run_all_tests()

if __name__ == "__main__":
    print("\nNote: make sure the FastAPI service is running at http://localhost:8000\n")
    asyncio.run(main())
"""Utils package."""
from utils.response import BizCode, ApiResponse, ok_resp, fail_resp
__all__ = ["BizCode", "ApiResponse", "ok_resp", "fail_resp"]
"""
Custom exception definitions.
"""
class DataCleaningException(Exception):
    """Raised when data cleaning fails."""
    pass

class DatabaseException(Exception):
    """Raised on database errors."""
    pass

class ExcelParsingException(Exception):
    """Raised when Excel parsing fails."""
    pass

class ValidationException(Exception):
    """Raised on validation errors."""
    pass
"""
Unified response envelope.
Every endpoint returns: { code: business status code, msg: message, data: payload }
"""
from enum import IntEnum
from typing import Any
from fastapi.responses import JSONResponse
from pydantic import BaseModel

class BizCode(IntEnum):
    """Business status codes."""
    SUCCESS = 200          # generic success
    TASK_QUEUED = 201      # task queued (async flow)
    TASK_PROCESSING = 202  # task still processing
    BAD_REQUEST = 400      # invalid request parameters
    NOT_FOUND = 404        # resource not found
    TASK_FAILED = 422      # task failed (business layer)
    SERVER_ERROR = 500     # internal server error
    DB_ERROR = 501         # database error
    EXCEL_ERROR = 502      # Excel parsing error

class ApiResponse(BaseModel):
    """Unified API response body."""
    code: int
    msg: str
    data: Any = None
def ok_resp(data: Any = None, msg: str = "success") -> JSONResponse:
    """Build a success JSONResponse (HTTP 200)."""
    return JSONResponse(
        status_code=200,
        content=ApiResponse(code=BizCode.SUCCESS, msg=msg, data=data).model_dump()
    )

def fail_resp(
    biz_code: BizCode,
    msg: str,
    http_status: int = 400,
    data: Any = None
) -> JSONResponse:
    """Build a failure JSONResponse."""
    return JSONResponse(
        status_code=http_status,
        content=ApiResponse(code=biz_code, msg=msg, data=data).model_dump()
    )
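On the consuming side, a caller can dispatch on `code` before touching `data`. A small illustrative helper (the sample payloads below are made up, not real API output):

```python
def unwrap(resp: dict):
    """Return the payload of a { code, msg, data } envelope,
    or raise with the server-provided message on failure."""
    if resp.get("code") == 200:  # BizCode.SUCCESS
        return resp.get("data")
    raise RuntimeError(f"API error {resp.get('code')}: {resp.get('msg')}")

payload = unwrap({"code": 200, "msg": "success", "data": {"task_id": "abc"}})
print(payload["task_id"])  # abc
```

Because failures can arrive with HTTP 200-range or error-range statuses depending on the endpoint, checking the business `code` field is the reliable signal.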
"""
Data validation helpers.
"""
import re
import logging
logger = logging.getLogger(__name__)

def validate_excel_url(url: str) -> bool:
    """
    Validate an Excel download URL.
    Args:
        url: URL string
    Returns:
        bool: True if the URL looks like a valid Excel/CSV link
    """
    if not url or not isinstance(url, str):
        return False
    # Check the URL format (http/https, ending in a spreadsheet extension)
    url_pattern = r'^https?://.*\.(xlsx|xls|csv)$'
    if not re.match(url_pattern, url, re.IGNORECASE):
        logger.warning(f"Invalid URL format: {url}")
        return False
    return True
def sanitize_filename(filename: str) -> str:
    """
    Sanitize a file name by stripping unsafe characters.
    Args:
        filename: original file name
    Returns:
        str: sanitized file name
    """
    # Remove characters that are unsafe on common filesystems
    sanitized = re.sub(r'[<>:"/\\|?*]', '', filename)
    return sanitized[:255]  # cap the length
def validate_table_name(table_name: str) -> bool:
    """
    Validate a database table name.
    Args:
        table_name: table name
    Returns:
        bool: True if the name is a valid table name
    """
    if not table_name or not isinstance(table_name, str):
        return False
    # Rule enforced here: starts with a letter or underscore, then letters,
    # digits or underscores, at most 64 characters total (MySQL's limit)
    table_name_pattern = r'^[a-zA-Z_][a-zA-Z0-9_]{0,63}$'
    if not re.match(table_name_pattern, table_name):
        logger.warning(f"Invalid table name: {table_name}")
        return False
    return True
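The expected behaviour of these helpers can be checked quickly with the patterns copied from above (a standalone sketch, not part of the module):

```python
import re

# Patterns as defined in utils/validators.py
url_pattern = r'^https?://.*\.(xlsx|xls|csv)$'
table_pattern = r'^[a-zA-Z_][a-zA-Z0-9_]{0,63}$'

# URL validation: scheme and extension are both checked, case-insensitively
assert re.match(url_pattern, "https://example.com/data.XLSX", re.IGNORECASE)
assert not re.match(url_pattern, "ftp://example.com/data.xlsx")

# Table names: may not start with a digit, no hyphens
assert re.match(table_pattern, "sales_data")
assert not re.match(table_pattern, "1_bad_name")
assert not re.match(table_pattern, "bad-name")

# Filename sanitising strips filesystem-unsafe characters
assert re.sub(r'[<>:"/\\|?*]', '', 'a<b>:c?.txt') == 'abc.txt'
print("validator patterns behave as expected")
```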