feat: improve proxy retry mechanism, add data-validation alerting, add README documentation
396
README.md
Normal file
@@ -0,0 +1,396 @@
# Baijiahao Data Collection and Analysis System

## Project Overview

This project is an automated data collection, analysis, and monitoring system for the Baijiahao platform. It supports multi-account management, scheduled data synchronization, data validation, and SMS alerting.

## Core Features

### 1. Data Collection
- **Cookie management**: automatically capture account cookies via mitmproxy, with batch sync to the database
- **Article crawling**: crawl Baijiahao article data, including title, content, publish time, etc.
- **Statistics retrieval**: fetch publishing statistics (impressions, reads, click-through rate) and revenue data

### 2. Data Analysis
- **Multi-dimensional statistics**: generate daily/weekly/monthly statistical reports
- **Period-over-period calculation**: automatically compute week-over-week and month-over-month growth rates
- **Data export**: export to CSV for further analysis

### 3. Data Synchronization
- **Daemon**: a systemd service that syncs data automatically on a schedule
- **Batch import**: bulk import of historical data
- **Incremental updates**: incremental data updates for a specified date

### 4. Data Validation and Monitoring
- **Consistency checks**: verify consistency across the three data sources (JSON / CSV / database)
- **SMS alerts**: integrates Alibaba Cloud SMS; an alert is sent automatically on data anomalies (error code 2222)
- **Validation reports**: generate detailed validation reports, saved to a dedicated directory

### 5. Proxy Management
- **Tianqi proxy integration**: HTTP proxy support to avoid IP rate limits
- **Smart retry mechanism**:
  - at most 3 attempts with the same proxy
  - switch proxies immediately on timeout/connection errors
  - switch proxies at most 3 times (4 different proxies tried in total)
- **Error handling**: automatically recognizes errno=10000015 (abnormal request) and switches proxies immediately
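
The double-limit retry policy above can be sketched as follows. This is a minimal illustration; `do_request` and `get_proxy` are hypothetical stand-ins for the project's request and Tianqi-proxy helpers, and the `errno == 0` success check is an assumption:

```python
MAX_ATTEMPTS_PER_PROXY = 3   # same proxy tried at most 3 times
MAX_PROXY_SWITCHES = 3       # up to 3 switches -> 4 proxies in total

def fetch_with_retry(do_request, get_proxy):
    """do_request(proxy) returns a response dict or raises on network errors."""
    proxy = get_proxy()
    switches = 0
    while True:
        for _ in range(MAX_ATTEMPTS_PER_PROXY):
            try:
                data = do_request(proxy)
            except (TimeoutError, ConnectionError):
                break  # timeout/connection error: switch proxy immediately
            if data.get("errno") == 10000015:
                break  # abnormal request: switch proxy immediately
            if data.get("errno") == 0:
                return data
            # other errors: retry with the same proxy
        if switches >= MAX_PROXY_SWITCHES:
            raise RuntimeError("all proxies exhausted")
        proxy = get_proxy()
        switches += 1
```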

## Tech Stack

- **Python 3.8+**
- **Database**: MySQL 8.0+ (pymysql)
- **HTTP requests**: requests, urllib3
- **Traffic capture**: mitmproxy 10.0+
- **Scheduling**: schedule
- **SMS service**: Alibaba Cloud SMS SDK (alibabacloud_dysmsapi20170525)

## Project Structure

```
xhh_baijiahao/
├── db/                               # Database SQL scripts
│   ├── ai_articles.sql               # Articles table
│   ├── ai_authors.sql                # Authors table
│   ├── ai_statistics_days.sql        # Daily statistics table
│   ├── ai_statistics_weekly.sql      # Weekly statistics table
│   └── ai_statistics_monthly.sql     # Monthly statistics table
│
├── ai_sms/                           # Alibaba Cloud SMS service
│   └── ai_sms/                       # SMS SDK sample code
│
├── Core modules
├── bjh_analytics.py                  # Baijiahao analytics API (primary)
├── bjh_analytics_date.py             # Fetch data for a specified date
├── bjh_articles_crawler.py           # Article crawler
├── export_to_csv.py                  # Export data to CSV
├── import_csv_to_database.py         # Import CSV into the database
│
├── Cookie management
├── mitmproxy_capture.py              # mitmproxy cookie capture
├── 一键捕获Cookie.py                  # Quick cookie-capture tool
├── sync_cookies_to_db.py             # Batch cookie sync
├── add_single_cookie_to_db.py        # Single-account cookie import
├── add_account_from_cookie.py        # Add an account from a cookie
│
├── Daemon and scheduled tasks
├── data_sync_daemon.py               # Data sync daemon (primary)
├── bjh_data_daemon.py                # Backup daemon
├── bjh_daemon.service                # systemd service config
├── deploy_daemon.sh                  # Daemon deployment script
├── install_service.sh                # Service installation script
├── diagnose_service.sh               # Service diagnostics script
│
├── Data validation and alerting
├── data_validation.py                # Validation core
├── data_validation_with_sms.py       # Validation + SMS alerting
├── test_validation_sms.sh            # Linux test script
├── test_validation_sms.bat           # Windows test script
│
├── Batch tasks
├── batch_import_history.py           # Bulk import of historical data
├── fetch_date_statistics.py          # Fetch statistics for a specified date
├── update_day_revenue.py             # Daily revenue update
│
├── Configuration
├── database_config.py                # Database config
├── log_config.py                     # Logging config
├── sms_config.json                   # SMS service config
├── requirements.txt                  # Python dependencies
│
└── Shortcut scripts
    ├── 一键捕获Cookie.bat             # Windows one-click cookie capture
    ├── 启动数据同步守护进程.bat        # Windows: start the sync daemon
    └── 抓取百家号文章.bat             # Windows: crawl Baijiahao articles
```

## Quick Start

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

Core dependencies:
- `requests>=2.31.0`
- `pymysql>=1.1.0`
- `mitmproxy>=10.0.0`
- `schedule>=1.2.0`
- `python-dateutil>=2.8.0`

### 2. Configure the database

Edit `database_config.py` with your MySQL connection info:

```python
DB_CONFIG = {
    'host': 'your_host',
    'port': 3306,
    'user': 'your_user',
    'password': 'your_password',
    'database': 'ai_article',
    'charset': 'utf8mb4'
}
```

### 3. Initialize the database

Run the SQL scripts under `db/` to create the tables:

```bash
mysql -u root -p ai_article < db/ai_authors.sql
mysql -u root -p ai_article < db/ai_articles.sql
mysql -u root -p ai_article < db/ai_statistics_days.sql
mysql -u root -p ai_article < db/ai_statistics_weekly.sql
mysql -u root -p ai_article < db/ai_statistics_monthly.sql
```

### 4. Capture cookies

#### Windows:
```bash
一键捕获Cookie.bat
```

#### Linux:
```bash
python3 mitmproxy_capture.py
```

### 5. Sync cookies to the database

```bash
python3 sync_cookies_to_db.py
```

### 6. Start the data sync daemon

#### Linux (systemd recommended):
```bash
# Deploy the service
sudo bash deploy_daemon.sh

# Start the service
sudo systemctl start bjh_daemon

# Check status
sudo systemctl status bjh_daemon

# Follow logs
journalctl -u bjh_daemon -f
```

#### Windows:
```bash
启动数据同步守护进程.bat
```

#### Manual run:
```bash
python3 data_sync_daemon.py
```

## Using the Main Features

### Bulk-import historical data

```bash
python3 batch_import_history.py
```

Supports interactive selection of:
- accounts (single / multiple / all)
- date range
- whether to use a proxy
- data source (database / file)

### Fetch statistics for a specified date

```bash
python3 fetch_date_statistics.py 2025-12-26
```

### Export data to CSV

```bash
python3 export_to_csv.py
```

### Data validation and SMS alerts

```bash
# Run validation
python3 data_validation_with_sms.py

# Test the SMS feature
python3 data_validation_with_sms.py --test-sms
```

Validation reports are saved under `validation_reports/`.

### Add a single account's cookie

```bash
python3 add_single_cookie_to_db.py
```

Supports interactive input of:
- Username / nickname
- App ID / content domain
- Cookie (multiple formats accepted)

## Database Schema

### ai_authors - authors
- `id`: primary key
- `author_name`: author name (username or nick)
- `app_id`: Baijiahao app_id
- `toutiao_cookie`: cookie string
- `channel`: channel (1 = Baijiahao)
- `status`: status (active/inactive)

### ai_statistics_days - daily statistics
- `author_id`: author ID
- `stat_date`: statistics date
- `day_revenue`: revenue for the day
- `daily_published_count`: articles published that day
- `cumulative_published_count`: cumulative published count
- Unique key: `uk_author_stat_date(author_id, channel, stat_date)`
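
The unique key makes repeated daily syncs idempotent: an upsert keyed on it can absorb incremental updates. A sketch (the column list is assumed from the fields above, not taken from the project's SQL):

```sql
INSERT INTO ai_statistics_days
    (author_id, channel, stat_date, day_revenue, daily_published_count)
VALUES
    (%s, 1, %s, %s, %s)
ON DUPLICATE KEY UPDATE
    day_revenue = VALUES(day_revenue),
    daily_published_count = VALUES(daily_published_count);
```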

### ai_statistics_weekly - weekly statistics
- `author_id`: author ID
- `stat_weekly`: date of the week's Monday (calendar week)
- `weekly_revenue`: revenue for the week (aggregated from daily data)
- `revenue_wow_growth_rate`: week-over-week revenue growth rate

### ai_statistics_monthly - monthly statistics
- `author_id`: author ID
- `stat_monthly`: date of the 1st of the month
- `monthly_revenue`: revenue for the month (aggregated from daily data)
- `revenue_mom_growth_rate`: month-over-month revenue growth rate
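
Both growth-rate columns follow the usual period-over-period formula, `(current - previous) / previous * 100`. A minimal sketch; returning `None` when there is no baseline is an assumption about how the project handles a zero previous period:

```python
def growth_rate(current, previous):
    """Period-over-period growth rate in percent; None when undefined."""
    if previous == 0:
        return None  # no baseline, so the growth rate is undefined
    return round((current - previous) / previous * 100, 2)
```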

## Proxy Configuration

The project supports Tianqi proxies; the API endpoint is configured in code:

```python
PROXY_API = "http://api.tianqiip.com/getip?secret=xxx&num=1&type=txt&port=1&mr=1&sign=xxx"
```

Proxy characteristics:
- IP-whitelist authentication; no username/password required
- Response format: plain text `IP:port`
- Smart retry: switch proxies immediately on timeout/connection errors
- Double limit: at most 3 attempts per proxy, at most 3 proxy switches
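
Since the API returns a plain-text `IP:port` line, turning it into a requests-style proxies dict takes only a few lines. A sketch; the validation and error message are assumptions, not the project's actual parsing code:

```python
def parse_proxy(text):
    """Turn the proxy API's plain-text 'IP:port' reply into a proxies dict."""
    addr = text.strip().splitlines()[0]
    host, port = addr.rsplit(":", 1)
    if not port.isdigit():
        raise ValueError("unexpected proxy API reply: %r" % text)
    proxy_url = "http://%s:%s" % (host, port)
    return {"http": proxy_url, "https": proxy_url}
```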

## SMS Alert Configuration

Edit `sms_config.json`:

```json
{
    "access_key_id": "your_access_key_id",
    "access_key_secret": "your_access_key_secret",
    "sign_name": "your_sign_name",
    "template_code": "SMS_486210104",
    "phone_numbers": "13621242430",
    "endpoint": "dysmsapi.aliyuncs.com"
}
```

## Daemon Configuration

### systemd unit (bjh_daemon.service)

```ini
[Unit]
Description=Baijiahao data sync daemon (with data validation and SMS alerting)
After=network.target mysql.service

[Service]
Type=simple
User=root
WorkingDirectory=/root/xhh_baijiahao
ExecStart=/usr/bin/python3 data_sync_daemon.py
Restart=always

Environment="LOAD_FROM_DB=true"
Environment="USE_PROXY=true"
Environment="ENABLE_VALIDATION=true"
Environment="NON_INTERACTIVE=true"

[Install]
WantedBy=multi-user.target
```

### Environment Variables

- `LOAD_FROM_DB`: load cookies from the database (true/false)
- `USE_PROXY`: use a proxy (true/false)
- `DAYS`: number of days to fetch (default 7)
- `MAX_RETRIES`: maximum retry count (default 3)
- `RUN_NOW`: run immediately on start (true/false)
- `ENABLE_VALIDATION`: enable validation (true/false)
- `NON_INTERACTIVE`: non-interactive mode (true/false)
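
The daemon receives these values as strings, so the boolean flags and integer defaults need explicit parsing. A minimal sketch of how such defaults might be resolved; the helper names are illustrative, not the project's actual API:

```python
import os

def env_bool(name, default=False):
    """Parse a true/false environment variable."""
    return os.environ.get(name, str(default)).strip().lower() == "true"

def env_int(name, default):
    """Parse an integer environment variable, falling back to the default."""
    try:
        return int(os.environ.get(name, default))
    except ValueError:
        return default

USE_PROXY = env_bool("USE_PROXY")   # defaults to False when unset
DAYS = env_int("DAYS", 7)           # defaults to 7 when unset or invalid
```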

## Log Management

Log file locations:
- daemon: `logs/data_sync_daemon.log`
- database operations: `logs/database.log`
- cookie sync: `logs/cookie_sync.log`
- validation reports: `validation_reports/validation_report_YYYYMMDD_HHMMSS.txt`

Follow logs in real time:
```bash
tail -f logs/data_sync_daemon.log
```

## FAQ

### 1. Expired cookies
- Symptom: the API returns `errno=10000015` (abnormal request)
- Fix: re-capture the cookie and sync it to the database

### 2. Proxy timeouts
- Symptom: requests time out after 15 seconds
- Fix: the system switches to a new proxy automatically, trying up to 4 different proxies

### 3. Validation failures
- Symptom: an SMS arrives with error code 2222
- Fix: check the detailed report under `validation_reports/`

### 4. Daemon stopped
- Diagnose: `sudo bash diagnose_service.sh`
- Restart: `sudo systemctl restart bjh_daemon`

## Development Notes

### Adding a new account
1. Capture the cookie with `一键捕获Cookie.py`
2. Run `sync_cookies_to_db.py` to sync it to the database
3. Or add it manually with `add_single_cookie_to_db.py`

### Changing statistics dimensions
- Daily: modify the `ai_statistics_days` table
- Weekly: modify the `ai_statistics_weekly` table
- Monthly: modify the `ai_statistics_monthly` table

### Custom proxies

Modify the proxy-fetch logic in `bjh_analytics.py` and `bjh_analytics_date.py`:

```python
def fetch_proxy(self, force_new: bool = False):
    # custom proxy-fetch logic
    pass
```

## Contributing

Issues and pull requests are welcome!

## License

This project is for learning and research purposes only.

## Contact

Please report problems via Issues.
440
add_single_cookie_to_db.py
Normal file
@@ -0,0 +1,440 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Import a single account's cookie into the MySQL database.
Supports manual entry of cookie info or pasting from the clipboard.
"""

import json
import sys
import os
from datetime import datetime
from typing import Dict, Optional

# Import the shared database manager and logging config
from database_config import DatabaseManager, DB_CONFIG
from log_config import setup_cookie_sync_logger

# Initialize the logger
logger = setup_cookie_sync_logger()

# Force UTF-8 output on Windows
if sys.platform == 'win32':
    import io
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
    sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')


class SingleCookieToDB:
    """Sync a single cookie to the database"""

    def __init__(self, db_config: Optional[Dict] = None):
        """
        Initialize the database connection.

        Args:
            db_config: database config dict; defaults to database_config.DB_CONFIG
        """
        self.script_dir = os.path.dirname(os.path.abspath(__file__))

        # Use the shared database manager
        self.db_manager = DatabaseManager(db_config)
        self.db_config = self.db_manager.config

    def connect_db(self) -> bool:
        """Connect to the database"""
        return self.db_manager.test_connection()

    def close_db(self):
        """Close the database connection"""
        print("[OK] Database operations finished")

    def cookie_dict_to_string(self, cookies: Dict) -> str:
        """
        Convert a cookie dict to string form.

        Args:
            cookies: cookie dict

        Returns:
            Cookie string in the form "key1=value1; key2=value2"
        """
        return '; '.join([f"{k}={v}" for k, v in cookies.items()])

    def cookie_string_to_dict(self, cookie_string: str) -> Dict:
        """
        Convert a cookie string to dict form.

        Args:
            cookie_string: cookie string in the form "key1=value1; key2=value2"

        Returns:
            Cookie dict
        """
        cookies = {}
        for item in cookie_string.split(';'):
            item = item.strip()
            if '=' in item:
                key, value = item.split('=', 1)
                cookies[key.strip()] = value.strip()
        return cookies

    def find_author_by_name(self, author_name: str, channel: int = 1) -> Optional[Dict]:
        """
        Look up an author record by name and channel.

        Args:
            author_name: author name
            channel: channel (1 = Baijiahao, default 1)

        Returns:
            The author record dict, or None if not found
        """
        try:
            sql = "SELECT * FROM ai_authors WHERE author_name = %s AND channel = %s LIMIT 1"
            result = self.db_manager.execute_query(sql, (author_name, channel), fetch_one=True)
            return result
        except Exception as e:
            print(f"[X] Author lookup failed: {e}")
            return None

    def update_author_cookie(self, author_id: int, cookie_string: str,
                             app_id: Optional[str] = None) -> bool:
        """
        Update an author's cookie.

        Args:
            author_id: author ID
            cookie_string: cookie string
            app_id: Baijiahao app_id (optional)

        Returns:
            Whether the update succeeded
        """
        try:
            # Build the UPDATE statement
            update_fields = ["toutiao_cookie = %s", "updated_at = NOW()"]
            params = [cookie_string]

            # Also update app_id if provided
            if app_id:
                update_fields.append("app_id = %s")
                params.append(app_id)

            params.append(author_id)

            sql = f"UPDATE ai_authors SET {', '.join(update_fields)} WHERE id = %s"
            self.db_manager.execute_update(sql, tuple(params))

            logger.info(f"Updated cookie for author ID={author_id}")
            return True
        except Exception as e:
            logger.error(f"Cookie update failed: {e}", exc_info=True)
            print(f"[X] Cookie update failed: {e}")
            return False

    def insert_new_author(self, author_name: str, cookie_string: str,
                          app_id: Optional[str] = None, nick: Optional[str] = None,
                          domain: Optional[str] = None) -> bool:
        """
        Insert a new author record.

        Args:
            author_name: author name (stored in the author_name column)
            cookie_string: cookie string
            app_id: Baijiahao app_id
            nick: nickname
            domain: content domain

        Returns:
            Whether the insert succeeded
        """
        try:
            # Build the INSERT statement
            sql = """
                INSERT INTO ai_authors
                (author_name, app_id, app_token, department_id, department_name,
                 department, toutiao_cookie, channel, status, created_at, updated_at)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, NOW(), NOW())
            """

            # Parameters
            params = (
                author_name,
                app_id or '',
                '',                # app_token: empty for now
                0,                 # department_id: default 0
                domain or '其它',  # department_name: use the content domain
                '',                # department: empty for now
                cookie_string,
                1,                 # channel: 1 = baidu
                'active'           # status
            )

            self.db_manager.execute_update(sql, params)
            logger.info(f"Created new author: {author_name}")
            return True
        except Exception as e:
            logger.error(f"Failed to insert new author: {e}", exc_info=True)
            print(f"[X] Failed to insert new author: {e}")
            return False

    def add_cookie(self, account_info: Dict, auto_create: bool = True) -> bool:
        """
        Add a single account's cookie to the database.

        Args:
            account_info: account info dict with cookies, username, nick, etc.
            auto_create: create the author automatically when missing (default True)

        Returns:
            Whether the add succeeded
        """
        # Extract the cookie info
        cookies = account_info.get('cookies', {})
        if not cookies:
            print("[X] Cookie info is empty")
            return False

        # Convert the cookie to a string (if it is a dict)
        if isinstance(cookies, dict):
            cookie_string = self.cookie_dict_to_string(cookies)
        else:
            cookie_string = str(cookies)

        # Extract the remaining fields (username and nick are matched against author_name)
        username = account_info.get('username', '').strip()
        nick = account_info.get('nick', '').strip()
        app_id = account_info.get('app_id', '').strip()
        domain = account_info.get('domain', '').strip()

        # At least one of username/nick must be present
        if not username and not nick:
            print("[X] At least one of username and nick is required")
            return False

        print(f"\nAccount info:")
        print(f"  Username: {username}")
        print(f"  Nickname: {nick}")
        print(f"  App ID: {app_id}")
        print(f"  Domain: {domain}")

        # Look up the author (two-step matching: username first, then nick)
        channel = 1  # Baijiahao is always channel=1
        author = None
        matched_field = None

        # 1. Try matching on username first
        if username:
            author = self.find_author_by_name(username, channel)
            if author:
                matched_field = 'username'
                print(f"\n[√] Matched author by username: {author['author_name']} (ID: {author['id']}, Channel: {author['channel']})")

        # 2. Fall back to matching on nick
        if not author and nick:
            author = self.find_author_by_name(nick, channel)
            if author:
                matched_field = 'nick'
                print(f"\n[√] Matched author by nick: {author['author_name']} (ID: {author['id']}, Channel: {author['channel']})")

        # 3. No match at all
        if not author:
            print(f"\n[!] No matching author found (tried username and nick)")

        # Update or create
        if author:
            # Update the existing record
            print(f"\nUpdating the author's cookie...")
            success = self.update_author_cookie(
                author['id'],
                cookie_string,
                app_id if app_id else None
            )

            if success:
                print(f"[OK] Cookie updated (matched on: {matched_field})")
                return True
            else:
                print(f"[X] Cookie update failed")
                return False
        else:
            # Author does not exist; consider creating one
            if auto_create:
                # Prefer username; fall back to nick
                author_name_to_create = username if username else nick
                print(f"\nCreating new author (author_name: {author_name_to_create})...")
                success = self.insert_new_author(
                    author_name_to_create,
                    cookie_string,
                    app_id,
                    nick,
                    domain
                )

                if success:
                    print(f"[OK] New author created (author_name: {author_name_to_create})")
                    return True
                else:
                    print(f"[X] Failed to create the author")
                    return False
            else:
                print(f"[X] Author does not exist and auto-create is disabled")
                return False

    def run_interactive(self):
        """Interactive mode"""
        print("\n" + "="*70)
        print("Add a single account's cookie to the database")
        print("="*70)

        # Connect to the database
        if not self.connect_db():
            logger.error("Database connection failed; exiting")
            return

        try:
            # Ask whether to auto-create missing authors
            print("\nCreate the author automatically when it does not exist?")
            auto_create_input = input("(y/n, default y): ").strip().lower()
            auto_create = auto_create_input != 'n'

            # Account info
            print("\n" + "="*70)
            print("Enter the account info:")
            print("="*70)

            username = input("\n1. Username (matched against author_name in the DB): ").strip()
            nick = input("2. Nickname (fallback match field): ").strip()
            app_id = input("3. App ID (optional): ").strip()
            domain = input("4. Domain (optional): ").strip()

            # Cookie
            print("\n" + "="*70)
            print("Enter the cookie:")
            print("Hint: any of the following formats is accepted:")
            print("  1. Cookie string: key1=value1; key2=value2")
            print("  2. JSON: {\"key1\": \"value1\", \"key2\": \"value2\"}")
            print("  3. Multi-line input; type END on its own line to finish")
            print("="*70)

            cookie_lines = []
            while True:
                line = input().strip()
                if line.upper() == 'END':
                    break
                if line:
                    cookie_lines.append(line)

            cookie_input = ' '.join(cookie_lines)

            # Parse the cookie
            cookies = {}
            if cookie_input.startswith('{'):
                # JSON format
                try:
                    cookies = json.loads(cookie_input)
                except json.JSONDecodeError:
                    print("[X] Failed to parse the cookie as JSON")
                    return
            else:
                # String format
                cookies = self.cookie_string_to_dict(cookie_input)

            if not cookies:
                print("[X] Cookie is empty; aborting")
                return

            # Build the account info
            account_info = {
                'username': username,
                'nick': nick,
                'app_id': app_id,
                'domain': domain,
                'cookies': cookies
            }

            # Confirm
            print("\n" + "="*70)
            print("Confirm the account info:")
            print("="*70)
            print(f"  Username: {username}")
            print(f"  Nickname: {nick}")
            print(f"  App ID: {app_id}")
            print(f"  Domain: {domain}")
            print(f"  Cookie entries: {len(cookies)}")
            print(f"  Auto-create: {'yes' if auto_create else 'no'}")
            print("="*70)

            confirm = input("\nAdd to the database? (y/n): ").strip().lower()
            if confirm != 'y':
                print("\nCancelled")
                return

            # Add the cookie
            success = self.add_cookie(account_info, auto_create)

            if success:
                print("\n" + "="*70)
                print("Added successfully!")
                print("="*70)
            else:
                print("\n" + "="*70)
                print("Add failed; see the error messages above")
                print("="*70)

        finally:
            # Close the database connection
            self.close_db()


def main():
    """Entry point"""
    print("\n" + "="*70)
    print("Single-account cookie sync tool")
    print("="*70)

    # Default or custom configuration
    print("\nChoose the database configuration:")
    print("  1. Use the default config (8.149.233.36/ai_statistics_read)")
    print("  2. Custom config")

    choice = input("\nChoose (1/2, default 1): ").strip() or '1'

    if choice == '2':
        # Custom database config
        print("\nEnter the database connection info:\n")

        host = input("Host: ").strip()
        port = input("Port (default: 3306): ").strip() or '3306'
        user = input("User: ").strip()
        password = input("Password: ").strip()
        database = input("Database: ").strip()

        db_config = {
            'host': host,
            'port': int(port),
            'user': user,
            'password': password,
            'database': database,
            'charset': 'utf8mb4'
        }
    else:
        # Use the default config
        db_config = None
        print("\nUsing the default database config...")

    # Create the syncer and run
    syncer = SingleCookieToDB(db_config)

    print("\nConfiguration:")
    print(f"  Database: {syncer.db_config['host']}:{syncer.db_config.get('port', 3306)}/{syncer.db_config['database']}")
    print(f"  User: {syncer.db_config['user']}")
    print("="*70)

    # Run interactive mode
    syncer.run_interactive()


if __name__ == '__main__':
    main()
6
ai_sms/ai_sms/.env.example
Normal file
@@ -0,0 +1,6 @@
# Alibaba Cloud Access Credentials
# Get your AccessKey ID and AccessKey Secret from the Alibaba Cloud console:
# https://ram.console.aliyun.com/manage/ak

ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id_here
ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret_here
5
ai_sms/ai_sms/.gitignore
vendored
Normal file
@@ -0,0 +1,5 @@
runtime/
.idea/
.vscode/
__pycache__/
.pytest_cache/
57
ai_sms/ai_sms/README.md
Normal file
@@ -0,0 +1,57 @@
# Complete sample project: sending SMS verification codes

This project is a complete working example for SendSmsVerifyCode.

**For production code, a more secure AK-free credential setup is recommended; see [Manage access credentials](https://help.aliyun.com/zh/sdk/developer-reference/v2-manage-python-access-credentials).**

## Requirements

- Download and unpack the code for the language you need;

- *Requires Python >= 3.7*

## Steps

After configuring credentials, run the following **from the directory where the code was unpacked**:

- **Create and activate a virtual environment:**
```sh
python -m venv venv && source venv/bin/activate
```

- **Install dependencies:**
```sh
pip install -r requirements.txt
```

- **Run the sample**
```sh
python ./alibabacloud_sample/sample.py
```

## API used

- SendSmsVerifyCode: sends an SMS verification code. See the [documentation](https://next.api.aliyun.com/document/Dypnsapi/2017-05-25/SendSmsVerifyCode)

## Sample API response

*The values below are for reference only; the actual response structure may differ slightly.*


- JSON format
```js
{
  "AccessDeniedDetail": "无",
  "Message": "成功 ",
  "RequestId": "CC3BB6D2-2FDF-4321-9DCE-B38165CE4C47",
  "Model": {
    "VerifyCode": "4232",
    "RequestId": "a3671ccf-0102-4c8e-8797-a3678e091d09",
    "OutId": "1231231313",
    "BizId": "112231421412414124123^4"
  },
  "Code": "OK",
  "Success": true
}
```
94
ai_sms/ai_sms/RUN_INSTRUCTIONS.md
Normal file
@@ -0,0 +1,94 @@
# Alibaba Cloud SMS verification-code API: run instructions

## Setup status

✅ Dependency installed: `alibabacloud_dypnsapi20170525==2.0.0`
✅ Code configured: phone number `13621242430`, 4-digit verification code
✅ Conda environment: `douyin`

## Steps

### 1. Configure Alibaba Cloud credentials

Get your Alibaba Cloud AccessKey ID and AccessKey Secret first:
- Visit: https://ram.console.aliyun.com/manage/ak
- Create or look up your AccessKey

### 2. Set environment variables (PowerShell)

Before running the program, set:

```powershell
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your_access_key_id"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_access_key_secret"
```

### 3. Run the program

```powershell
python ./alibabacloud_sample/sample.py
```

## Full run example (PowerShell)

```powershell
# Activate the conda environment
conda activate douyin

# Set credentials
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your_access_key_id"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_access_key_secret"

# Run the program
python ./alibabacloud_sample/sample.py
```

## Code notes

### Current parameters

`sample.py` is configured with:

```python
send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest(
    phone_number='13621242430',  # phone number that receives the SMS
    code_length=4,               # verification-code length (4 digits)
    code_type=1                  # code type: 1 = numeric
)
```

### Sample API response

On success:
```json
{
  "Code": "OK",
  "Success": true,
  "Message": "成功",
  "Model": {
    "VerifyCode": "1234",
    "BizId": "...",
    "RequestId": "..."
  }
}
```

## Notes

1. **Security**: never hard-code the AccessKey or commit it to version control
2. **Permissions**: make sure your AccessKey is allowed to send SMS
3. **Cost**: sending SMS is billed; watch your Alibaba Cloud account balance
4. **Signature and template**: some regions require an SMS signature and template to be configured

## Troubleshooting

If you hit an error:
- check that the AccessKey is correct
- check that the account balance is sufficient
- check that the phone-number format is correct
- follow the diagnostic URL in the error message (Recommend)

## API documentation

- SendSmsVerifyCode API: https://next.api.aliyun.com/document/Dypnsapi/2017-05-25/SendSmsVerifyCode
- Credential management: https://help.aliyun.com/zh/sdk/developer-reference/v2-manage-python-access-credentials
79
ai_sms/ai_sms/START_HERE.md
Normal file
@@ -0,0 +1,79 @@
# Getting started - 3 steps

## ⚠️ Why you got the error

**The environment variables are not set!** The error message says:
```
Environment variable accessKeyId cannot be empty
```

## ✅ Fixes (pick one)

### Option 1: easiest - use the test script

```powershell
# Step 1: test the credential setup
python test_credentials.py

# Step 2: if it reports missing credentials, run in PowerShell:
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your AccessKey ID"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your AccessKey Secret"

# Step 3: test again
python test_credentials.py

# Step 4: once the test passes, run
python ./alibabacloud_sample/sample.py
```

### Option 2: one-shot run (interactive)

```powershell
python run_with_credentials.py
# The script prompts for the AccessKey and then runs automatically
```

### Option 3: set variables manually, then run

```powershell
# Set the environment variables
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="LTAI5t..."
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_secret_here"

# Run directly
python ./alibabacloud_sample/sample.py
```

## 🔑 Getting an AccessKey

https://ram.console.aliyun.com/manage/ak

## 🔧 Issues already fixed

1. ✅ Proxy timeout (ECS metadata proxy disabled)
2. ✅ Error handling (AttributeError fixed)
3. ✅ Phone number configured: 13621242430
4. ✅ Verification-code length: 4 digits

## 📝 Current configuration

- Phone number: `13621242430`
- Verification code: 4 digits (generated by the API; cannot be fixed to 1314)
- Environment: conda douyin
- Dependencies: installed

## ⚡ Quick test flow

```powershell
# 1. Test credentials
python test_credentials.py

# 2. If that fails, set the environment variables
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your_key"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_secret"

# 3. Run the program
python ./alibabacloud_sample/sample.py
```

**Try `python test_credentials.py` now!**
1
ai_sms/ai_sms/alibabacloud_sample/__init__.py
Normal file
@@ -0,0 +1 @@
__version__ = "1.0.0"
73
ai_sms/ai_sms/alibabacloud_sample/abc.py
Normal file
@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys
import json

from typing import List

from alibabacloud_dypnsapi20170525.client import Client as Dypnsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dypnsapi20170525 import models as dypnsapi_20170525_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client() -> Dypnsapi20170525Client:
        """
        Initialize the account client with credentials.
        @return: Client
        @throws Exception
        """
        # For production code, a more secure AK-free credential setup is recommended; see https://help.aliyun.com/document_detail/378659.html.
        credential = CredentialClient()
        config = open_api_models.Config(
            credential=credential
        )
        # For endpoints, see https://api.aliyun.com/product/Dypnsapi
        config.endpoint = f'dypnsapi.aliyuncs.com'
        return Dypnsapi20170525Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest()
        runtime = util_models.RuntimeOptions()
        try:
            resp = client.send_sms_verify_code_with_options(send_sms_verify_code_request, runtime)
            print(json.dumps(resp, default=str, indent=2))
        except Exception as error:
            # Printing only for demonstration; handle exceptions carefully and never silently ignore them in production.
            # Error message
            print(error.message)
            # Diagnostic URL
            print(error.data.get("Recommend"))

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest()
        runtime = util_models.RuntimeOptions()
        try:
            resp = await client.send_sms_verify_code_with_options_async(send_sms_verify_code_request, runtime)
            print(json.dumps(resp, default=str, indent=2))
        except Exception as error:
            # Printing only for demonstration; handle exceptions carefully and never silently ignore them in production.
            # Error message
            print(error.message)
            # Diagnostic URL
            print(error.data.get("Recommend"))


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
87
ai_sms/ai_sms/alibabacloud_sample/sample - 副本 (2).py
Normal file
@@ -0,0 +1,87 @@
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys
import json

from typing import List

from alibabacloud_dypnsapi20170525.client import Client as Dypnsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_credentials.models import Config as CredentialConfig
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dypnsapi20170525 import models as dypnsapi_20170525_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client() -> Dypnsapi20170525Client:
        """
        Initialize the account Client with credentials.
        @return: Client
        @throws Exception
        """
        credential_config = CredentialConfig(
            type='access_key',
            access_key_id='LTAI5tSMvnCJdqkZtCVWgh8R',
            access_key_secret='nyFzXyIi47peVLK4wR2qqbPezmU79W'
        )
        credential = CredentialClient(credential_config)
        config = open_api_models.Config(
            credential=credential
        )
        # For endpoints, see https://api.aliyun.com/product/Dypnsapi
        config.endpoint = f'dypnsapi.aliyuncs.com'
        return Dypnsapi20170525Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest(
            phone_number='13621242430',
            sign_name='阿里云短信',
            template_code='SMS_474580174',
            template_param='{}',
            code_length=4,
            code_type=1
        )
        runtime = util_models.RuntimeOptions()
        try:
            resp = client.send_sms_verify_code_with_options(send_sms_verify_code_request, runtime)
            print(json.dumps(resp, default=str, indent=2))
        except Exception as error:
            # For demonstration only; never silently swallow exceptions in production code.
            # Error message
            print(f"Error: {error}")
            # Diagnostic URL
            if hasattr(error, 'data') and error.data:
                print(f"Diagnostic URL: {error.data.get('Recommend')}")
            raise

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest()
        runtime = util_models.RuntimeOptions()
        try:
            resp = await client.send_sms_verify_code_with_options_async(send_sms_verify_code_request, runtime)
            print(json.dumps(resp, default=str, indent=2))
        except Exception as error:
            # For demonstration only; never silently swallow exceptions in production code.
            # Error message
            print(f"Error: {error}")
            # Diagnostic URL
            if hasattr(error, 'data') and error.data:
                print(f"Diagnostic URL: {error.data.get('Recommend')}")


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
87
ai_sms/ai_sms/alibabacloud_sample/sample - 副本.py
Normal file
@@ -0,0 +1,87 @@
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys
import json

from typing import List

from alibabacloud_dypnsapi20170525.client import Client as Dypnsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_credentials.models import Config as CredentialConfig
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dypnsapi20170525 import models as dypnsapi_20170525_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client() -> Dypnsapi20170525Client:
        """
        Initialize the account Client with credentials.
        @return: Client
        @throws Exception
        """
        credential_config = CredentialConfig(
            type='access_key',
            access_key_id='LTAI5tPnLdDkvSxrVJfRZMCn',
            access_key_secret='AII2A8hgfxXWM1xYqeuNwnS61AErDz'
        )
        credential = CredentialClient(credential_config)
        config = open_api_models.Config(
            credential=credential
        )
        # For endpoints, see https://api.aliyun.com/product/Dypnsapi
        config.endpoint = f'dypnsapi.aliyuncs.com'
        return Dypnsapi20170525Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest(
            phone_number='13621242430',
            sign_name='阿里云短信',
            template_code='SMS_474580174',
            template_param='{}',
            code_length=4,
            code_type=1
        )
        runtime = util_models.RuntimeOptions()
        try:
            resp = client.send_sms_verify_code_with_options(send_sms_verify_code_request, runtime)
            print(json.dumps(resp, default=str, indent=2))
        except Exception as error:
            # For demonstration only; never silently swallow exceptions in production code.
            # Error message
            print(f"Error: {error}")
            # Diagnostic URL
            if hasattr(error, 'data') and error.data:
                print(f"Diagnostic URL: {error.data.get('Recommend')}")
            raise

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest()
        runtime = util_models.RuntimeOptions()
        try:
            resp = await client.send_sms_verify_code_with_options_async(send_sms_verify_code_request, runtime)
            print(json.dumps(resp, default=str, indent=2))
        except Exception as error:
            # For demonstration only; never silently swallow exceptions in production code.
            # Error message
            print(f"Error: {error}")
            # Diagnostic URL
            if hasattr(error, 'data') and error.data:
                print(f"Diagnostic URL: {error.data.get('Recommend')}")


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
90
ai_sms/ai_sms/alibabacloud_sample/sample.py
Normal file
@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys
import json

from typing import List

from alibabacloud_dysmsapi20170525.client import Client as Dysmsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_credentials.models import Config as CredentialConfig
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dysmsapi20170525 import models as dysmsapi_20170525_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client() -> Dysmsapi20170525Client:
        """
        Initialize the account Client with credentials.
        @return: Client
        @throws Exception
        """
        credential_config = CredentialConfig(
            type='access_key',
            access_key_id='LTAI5tSMvnCJdqkZtCVWgh8R',
            access_key_secret='nyFzXyIi47peVLK4wR2qqbPezmU79W'
        )
        credential = CredentialClient(credential_config)
        config = open_api_models.Config(
            credential=credential
        )
        # For endpoints, see https://api.aliyun.com/product/Dysmsapi
        config.endpoint = f'dysmsapi.aliyuncs.com'
        return Dysmsapi20170525Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_request = dysmsapi_20170525_models.SendSmsRequest(
            phone_numbers='13621242430',
            sign_name='北京乐航时代科技',
            template_code='SMS_486210104',
            template_param=json.dumps({"code": "1314"})
        )
        runtime = util_models.RuntimeOptions()
        try:
            resp = client.send_sms_with_options(send_sms_request, runtime)
            print(json.dumps(resp.to_map(), default=str, indent=2))
        except Exception as error:
            # For demonstration only; never silently swallow exceptions in production code.
            # Error message
            print(f"Error: {error}")
            # Diagnostic URL
            if hasattr(error, 'data') and error.data:
                print(f"Diagnostic URL: {error.data.get('Recommend')}")
            raise

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_request = dysmsapi_20170525_models.SendSmsRequest(
            phone_numbers='13621242430',
            sign_name='北京乐航时代科技',
            template_code='SMS_486210104',
            template_param=json.dumps({"code": "1314"})
        )
        runtime = util_models.RuntimeOptions()
        try:
            resp = await client.send_sms_with_options_async(send_sms_request, runtime)
            print(json.dumps(resp.to_map(), default=str, indent=2))
        except Exception as error:
            # For demonstration only; never silently swallow exceptions in production code.
            # Error message
            print(f"Error: {error}")
            # Diagnostic URL
            if hasattr(error, 'data') and error.data:
                print(f"Diagnostic URL: {error.data.get('Recommend')}")


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
1
ai_sms/ai_sms/requirements.txt
Normal file
@@ -0,0 +1 @@
alibabacloud_dypnsapi20170525==2.0.0
63
ai_sms/ai_sms/run_with_credentials.py
Normal file
@@ -0,0 +1,63 @@
# -*- coding: utf-8 -*-
"""
Quick-run script: set credentials and run the SMS sample.
Usage: python run_with_credentials.py
"""
import os
import sys
import subprocess


def main():
    print("=" * 60)
    print("Alibaba Cloud SMS verification code API - quick run")
    print("=" * 60)
    print()

    # Check environment variables
    access_key_id = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID')
    access_key_secret = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET')

    if not access_key_id or not access_key_secret:
        print("⚠️ Environment variables not found; please enter your Alibaba Cloud credentials:")
        print("   (available at https://ram.console.aliyun.com/manage/ak)")
        print()

        access_key_id = input("AccessKey ID: ").strip()
        access_key_secret = input("AccessKey Secret: ").strip()

        if not access_key_id or not access_key_secret:
            print()
            print("❌ Error: AccessKey must not be empty!")
            sys.exit(1)

        # Set environment variables
        os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'] = access_key_id
        os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'] = access_key_secret
        print()
        print("✅ Credentials set")
    else:
        print(f"✅ Environment variables detected:")
        print(f"   ALIBABA_CLOUD_ACCESS_KEY_ID: {access_key_id[:8]}...")
        print(f"   ALIBABA_CLOUD_ACCESS_KEY_SECRET: {'*' * 20}")

    print()
    print("=" * 60)
    print("Running the SMS sample...")
    print("=" * 60)
    print()

    # Run the sample
    try:
        # Run with the current environment
        result = subprocess.run(
            [sys.executable, './alibabacloud_sample/sample.py'],
            env=os.environ.copy(),
            capture_output=False
        )
        sys.exit(result.returncode)
    except Exception as e:
        print(f"❌ Run failed: {e}")
        sys.exit(1)


if __name__ == '__main__':
    main()
45
ai_sms/ai_sms/set_credentials.ps1
Normal file
@@ -0,0 +1,45 @@
# PowerShell script: set Alibaba Cloud credential environment variables
# Usage: .\set_credentials.ps1

Write-Host "=" -NoNewline -ForegroundColor Cyan
Write-Host ("=" * 59) -ForegroundColor Cyan
Write-Host "Alibaba Cloud SMS API credential setup" -ForegroundColor Yellow
Write-Host "=" -NoNewline -ForegroundColor Cyan
Write-Host ("=" * 59) -ForegroundColor Cyan
Write-Host ""

# Prompt for credentials
Write-Host "Please enter your Alibaba Cloud access credentials:" -ForegroundColor Green
Write-Host "(available at https://ram.console.aliyun.com/manage/ak)" -ForegroundColor Gray
Write-Host ""

$accessKeyId = Read-Host "AccessKey ID"
$accessKeySecret = Read-Host "AccessKey Secret" -AsSecureString
$accessKeySecretPlain = [Runtime.InteropServices.Marshal]::PtrToStringAuto(
    [Runtime.InteropServices.Marshal]::SecureStringToBSTR($accessKeySecret)
)

if ([string]::IsNullOrWhiteSpace($accessKeyId) -or [string]::IsNullOrWhiteSpace($accessKeySecretPlain)) {
    Write-Host ""
    Write-Host "Error: AccessKey ID and AccessKey Secret must not be empty!" -ForegroundColor Red
    exit 1
}

# Set environment variables
$env:ALIBABA_CLOUD_ACCESS_KEY_ID = $accessKeyId
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET = $accessKeySecretPlain

Write-Host ""
Write-Host "=" -NoNewline -ForegroundColor Cyan
Write-Host ("=" * 59) -ForegroundColor Cyan
Write-Host "Credentials configured successfully!" -ForegroundColor Green
Write-Host "=" -NoNewline -ForegroundColor Cyan
Write-Host ("=" * 59) -ForegroundColor Cyan
Write-Host ""
Write-Host "Environment variables set (valid for the current PowerShell session):" -ForegroundColor Yellow
Write-Host "ALIBABA_CLOUD_ACCESS_KEY_ID: $($accessKeyId.Substring(0, [Math]::Min(8, $accessKeyId.Length)))..." -ForegroundColor Gray
Write-Host "ALIBABA_CLOUD_ACCESS_KEY_SECRET: ********************" -ForegroundColor Gray
Write-Host ""
Write-Host "You can now run the program:" -ForegroundColor Green
Write-Host "python .\alibabacloud_sample\sample.py" -ForegroundColor Cyan
Write-Host ""
76
ai_sms/ai_sms/setup.py
Normal file
@@ -0,0 +1,76 @@
# -*- coding: utf-8 -*-
"""
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
"""

import os
from setuptools import setup, find_packages

"""
setup module for alibabacloud_sample.

Created on 31/12/2025

@author:
"""

PACKAGE = "alibabacloud_sample"
NAME = "alibabacloud_sample" or "alibabacloud-package"
DESCRIPTION = "Alibaba Cloud SDK Code Sample Library for Python"
AUTHOR = ""
AUTHOR_EMAIL = ""
URL = "https://github.com/aliyun/alibabacloud-sdk"
VERSION = __import__(PACKAGE).__version__
REQUIRES = [
    "alibabacloud_dypnsapi20170525>=2.0.0, <3.0.0",
]

LONG_DESCRIPTION = ''
if os.path.exists('./README.md'):
    with open("README.md", encoding='utf-8') as fp:
        LONG_DESCRIPTION = fp.read()

setup(
    name=NAME,
    version=VERSION,
    description=DESCRIPTION,
    long_description=LONG_DESCRIPTION,
    long_description_content_type='text/markdown',
    author=AUTHOR,
    author_email=AUTHOR_EMAIL,
    license="Apache License 2.0",
    url=URL,
    keywords=["alibabacloud", "sample"],
    packages=find_packages(exclude=["tests*"]),
    include_package_data=True,
    platforms="any",
    install_requires=REQUIRES,
    python_requires=">=3.6",
    classifiers=(
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: Apache Software License",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.6",
        'Programming Language :: Python :: 3.7',
        'Programming Language :: Python :: 3.8',
        'Programming Language :: Python :: 3.9',
        "Topic :: Software Development"
    )
)
41
ai_sms/ai_sms/setup_credentials.py
Normal file
@@ -0,0 +1,41 @@
# -*- coding: utf-8 -*-
import os
import sys


def setup_credentials():
    """
    Set the environment variables for Alibaba Cloud access credentials.
    """
    print("=" * 60)
    print("Alibaba Cloud SMS API credential setup")
    print("=" * 60)
    print("\nPlease enter your Alibaba Cloud access credentials:")
    print("(available at https://ram.console.aliyun.com/manage/ak)\n")

    access_key_id = input("AccessKey ID: ").strip()
    access_key_secret = input("AccessKey Secret: ").strip()

    if not access_key_id or not access_key_secret:
        print("\nError: AccessKey ID and AccessKey Secret must not be empty!")
        sys.exit(1)

    # Set environment variables
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'] = access_key_id
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'] = access_key_secret

    print("\n" + "=" * 60)
    print("Credentials configured successfully!")
    print("=" * 60)
    print("\nEnvironment variables set:")
    print(f"ALIBABA_CLOUD_ACCESS_KEY_ID: {access_key_id[:8]}...")
    print(f"ALIBABA_CLOUD_ACCESS_KEY_SECRET: {'*' * 20}")

    print("\nYou can also set the environment variables yourself (PowerShell):")
    print(f'$env:ALIBABA_CLOUD_ACCESS_KEY_ID="{access_key_id}"')
    print(f'$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="{access_key_secret}"')

    print("\nOr run in the current Python session:")
    print("python ./alibabacloud_sample/sample.py")


if __name__ == '__main__':
    setup_credentials()
83
ai_sms/ai_sms/test_credentials.py
Normal file
@@ -0,0 +1,83 @@
# -*- coding: utf-8 -*-
"""
Check that the credentials are configured correctly.
Usage: python test_credentials.py
"""
import os
import sys

os.environ['NO_PROXY'] = '100.100.100.200'
os.environ['no_proxy'] = '100.100.100.200'

from alibabacloud_credentials.client import Client as CredentialClient


def test_credentials():
    print("=" * 60)
    print("Testing the Alibaba Cloud credential configuration")
    print("=" * 60)
    print()

    # Check environment variables
    access_key_id = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID')
    access_key_secret = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET')

    print("1. Checking environment variables:")
    if access_key_id:
        print(f"   ✅ ALIBABA_CLOUD_ACCESS_KEY_ID: {access_key_id[:8]}...")
    else:
        print(f"   ❌ ALIBABA_CLOUD_ACCESS_KEY_ID: not set")

    if access_key_secret:
        print(f"   ✅ ALIBABA_CLOUD_ACCESS_KEY_SECRET: {'*' * 20}")
    else:
        print(f"   ❌ ALIBABA_CLOUD_ACCESS_KEY_SECRET: not set")

    print()
    print("2. Testing credential loading:")

    try:
        credential_client = CredentialClient()
        credential = credential_client.get_credential()

        loaded_ak_id = credential.get_access_key_id()
        loaded_ak_secret = credential.get_access_key_secret()

        if loaded_ak_id and loaded_ak_secret:
            print(f"   ✅ Credentials loaded successfully!")
            print(f"   AccessKey ID: {loaded_ak_id[:8]}...")
            print(f"   AccessKey Secret: {'*' * 20}")
            print()
            print("=" * 60)
            print("✅ Credentials are configured correctly; you can run the program now!")
            print("=" * 60)
            print()
            print("Run:")
            print("  python ./alibabacloud_sample/sample.py")
            print()
            print("Or use the quick-run script:")
            print("  python run_with_credentials.py")
            return True
        else:
            print("   ❌ Credential loading failed: AccessKey is empty")
            return False

    except Exception as e:
        print(f"   ❌ Credential loading failed: {e}")
        print()
        print("=" * 60)
        print("Fix:")
        print("=" * 60)
        print()
        print("Set the environment variables in PowerShell:")
        print()
        print('  $env:ALIBABA_CLOUD_ACCESS_KEY_ID="your AccessKey ID"')
        print('  $env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your AccessKey Secret"')
        print()
        print("Then re-run this test script")
        print()
        print("Get an AccessKey: https://ram.console.aliyun.com/manage/ak")
        return False


if __name__ == '__main__':
    success = test_credentials()
    sys.exit(0 if success else 1)
100
ai_sms/ai_sms/快速运行指南.md
Normal file
@@ -0,0 +1,100 @@
# Quick Start Guide

## Root cause

The error you hit means the program cannot load Alibaba Cloud access credentials. The default credential chain tried the following sources and all of them failed:
1. ❌ Environment variables `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET`
2. ❌ CLI config file `C:\Users\34362\.aliyun\config.json`
3. ❌ Credentials file `C:\Users\34362\.alibabacloud\credentials.ini`
4. ❌ ECS instance RAM role (connecting through proxy 127.0.0.1:10809 failed)

## Solutions

### Option 1: Environment variables (recommended, simplest)

Run in PowerShell:

```powershell
# Set environment variables
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your AccessKey ID"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your AccessKey Secret"

# Run the program
python .\alibabacloud_sample\sample.py
```

### Option 2: PowerShell script (interactive)

```powershell
# Run the credential setup script
.\set_credentials.ps1

# The script prompts for the credentials and sets the environment variables,
# then run the program directly
python .\alibabacloud_sample\sample.py
```

### Option 3: Credentials file (permanent)

Create the file `C:\Users\34362\.alibabacloud\credentials.ini`:

```ini
[default]
type = access_key
access_key_id = your AccessKey ID
access_key_secret = your AccessKey Secret
```

Then run directly:
```powershell
python .\alibabacloud_sample\sample.py
```
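The credentials.ini above is a plain INI file. As a quick local sanity check (a minimal stdlib-only sketch; the `[default]` section and key names follow the example above, and `parse_credentials_ini` is a hypothetical helper, not part of the SDK), you can confirm what the credential chain would read:

```python
# Sanity-check the credentials.ini format shown above (stdlib only).
# The [default] section and key names mirror the example; the helper
# itself is illustrative, not an SDK function.
import configparser

def parse_credentials_ini(text):
    """Parse credentials.ini content and return the [default] profile as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["default"]
    return {
        "type": section.get("type"),
        "access_key_id": section.get("access_key_id"),
        "access_key_secret": section.get("access_key_secret"),
    }

sample = """[default]
type = access_key
access_key_id = LTAI5tEXAMPLE
access_key_secret = secretEXAMPLE
"""
profile = parse_credentials_ini(sample)
print(profile)
```

If the printed dict contains `None` values, the key names in your file do not match what the SDK expects.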

## Getting an AccessKey

1. Open the Alibaba Cloud RAM console: https://ram.console.aliyun.com/manage/ak
2. Create or look up your AccessKey ID and AccessKey Secret
3. **Important**: keep the AccessKey Secret safe and never leak it

## Current configuration

- ✅ Phone number: `13621242430`
- ✅ Verification code length: 4 digits
- ✅ Dependencies installed
- ✅ Error handling fixed

## Notes

1. **Proxy issues**: if you use a proxy (e.g. 127.0.0.1:10809), you may need to disable it temporarily or set the `NO_PROXY` environment variable
2. **Verification code**: the API generates the code automatically; it cannot be forced to "1314"
3. **Cost**: sending SMS messages incurs charges

## Full example

```powershell
# 1. Activate the conda environment (if needed)
conda activate douyin

# 2. Set credentials
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="LTAI5t..."
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_secret"

# 3. Run the program
python .\alibabacloud_sample\sample.py
```

## Expected output

On success you should see output like:
```json
{
  "Code": "OK",
  "Success": true,
  "Message": "成功",
  "Model": {
    "VerifyCode": "1234",
    "BizId": "...",
    "RequestId": "..."
  }
}
```
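When scripting around the sample, the success check boils down to inspecting the `Code` field of the response body. A minimal stdlib sketch (the payload below is the documented example shape, not live API data, and `is_sms_success` is a hypothetical helper):

```python
# Decide success/failure from a SendSmsVerifyCode-style response body.
# The example payload mirrors the documented success shape above.
import json

def is_sms_success(response_text):
    """Return True when the response body reports Code == "OK"."""
    body = json.loads(response_text)
    return body.get("Code") == "OK" and body.get("Success", True)

example = '{"Code": "OK", "Success": true, "Message": "...", "Model": {"VerifyCode": "1234"}}'
print(is_sms_success(example))
```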
91
ai_sms/ai_sms/成功运行说明.md
Normal file
@@ -0,0 +1,91 @@
# ✅ API connected successfully!

## 🎉 Current status

1. **API connection works**: the Alibaba Cloud SMS service API was called successfully
2. **Code switched**: moved from `dypnsapi` to the correct `dysmsapi`
3. **Credentials correct**: the AccessKey configuration works
4. **Phone number configured**: 13621242430
5. **Verification code configured**: 1314

## ⚠️ Last step: configure a valid SMS template

Current error: `isv.SMS_TEMPLATE_ILLEGAL` - no matching template under this account

**Cause**: template code `SMS_474580174` does not exist or has not passed review

### Fix

1. **Log in to the Alibaba Cloud SMS console**
   - Template management: https://dysms.console.aliyun.com/domestic/text/template

2. **Find one of your approved templates**
   - Locate the template code (format: `SMS_xxxxxxxxx`)
   - Make sure its status is "approved"

3. **Update the template code in the code**

   Edit `sample.py` lines 50 and 74:
   ```python
   template_code='SMS_<your actual template code>',
   ```

4. **Confirm the template parameters**

   If your template text is: `您的验证码是${code},有效期5分钟` (your code is ${code}, valid for 5 minutes)

   then the current configuration is already correct:
   ```python
   template_param=json.dumps({"code": "1314"})
   ```

   If the template variable is not named `code`, adjust accordingly.
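If the variable name differs, the parameter JSON changes with it. A stdlib-only sketch (the variable name `captcha` and the helper `build_template_param` are hypothetical; use the placeholder your approved template actually declares):

```python
# Build template_param strings for different template variable names.
# "captcha" is a hypothetical variable name for illustration only.
import json

def build_template_param(**variables):
    """Serialize template variables into the JSON string SendSms expects."""
    return json.dumps(variables, ensure_ascii=False)

# Template text: 您的验证码是${code} → {"code": "1314"}
print(build_template_param(code="1314"))
# Template text: 您的验证码是${captcha} → {"captcha": "1314"}
print(build_template_param(captcha="1314"))
```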
## 📝 Current code configuration

```python
send_sms_request = dysmsapi_20170525_models.SendSmsRequest(
    phone_numbers='13621242430',                 # ✅ phone number
    sign_name='阿里云短信',                       # ✅ signature (change if incorrect)
    template_code='SMS_474580174',               # ⚠️ replace with your actual template code
    template_param=json.dumps({"code": "1314"})  # ✅ verification code parameter
)
```

## 🚀 Run command

After configuring the template code:

```powershell
python .\alibabacloud_sample\sample.py
```

## ✅ Successful response example

With a correct configuration you will see:

```json
{
  "statusCode": 200,
  "body": {
    "Code": "OK",
    "Message": "OK",
    "BizId": "...",
    "RequestId": "..."
  }
}
```

## 📖 Reference links

- **Template management**: https://dysms.console.aliyun.com/domestic/text/template
- **Signature management**: https://dysms.console.aliyun.com/domestic/text/sign
- **API docs**: https://help.aliyun.com/zh/sms/developer-reference/api-dysmsapi-2017-05-25-sendsms

## 💡 Tips

1. If the signature name is also wrong, update `sign_name` on lines 49 and 73 as well
2. A template can only be used once its status is "approved"
3. The signature and template must belong to your account

**Once the correct template code is in place, the program will send SMS messages successfully!**
69
ai_sms/ai_sms/配置说明.md
Normal file
@@ -0,0 +1,69 @@
# ✅ Code runs - SMS signature and template still need configuring

## 🎉 Progress so far

1. ✅ **Credentials hardcoded**: the AccessKey is configured in the code
2. ✅ **API connection works**: the Alibaba Cloud API was called successfully
3. ✅ **Phone number configured**: 13621242430
4. ✅ **Verification code configured**: 4 digits

## ⚠️ Remaining configuration

### 1. SMS signature (SignName)

The code currently uses: `'阿里云'`

**You need to:**
- Create an SMS signature in the Alibaba Cloud SMS console and get it approved
- Console: https://dysms.console.aliyun.com/domestic/text/sign
- Replace `'阿里云'` at `@D:\ai_sms\alibabacloud_sample\sample.py:49` with your actual signature

### 2. SMS template (TemplateCode)

The code currently uses: `'SMS_123456789'` (placeholder)

**You need to:**
- Create an SMS template in the Alibaba Cloud SMS console and get it approved
- Console: https://dysms.console.aliyun.com/domestic/text/template
- Template type: verification code
- Example template text: `您的验证码是${code},有效期5分钟` (your code is ${code}, valid for 5 minutes)
- Replace `'SMS_123456789'` at `@D:\ai_sms\alibabacloud_sample\sample.py:50` with your actual template code

## 📝 Current code configuration

```python
send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest(
    phone_number='13621242430',     # ✅ configured
    sign_name='阿里云',              # ⚠️ replace with your signature
    template_code='SMS_123456789',  # ⚠️ replace with your template code
    code_length=4,                  # ✅ configured
    code_type=1                     # ✅ configured (numeric code)
)
```

## 🚀 Run after configuring

```powershell
python .\alibabacloud_sample\sample.py
```

## 📖 Reference links

- **Signature management**: https://dysms.console.aliyun.com/domestic/text/sign
- **Template management**: https://dysms.console.aliyun.com/domestic/text/template
- **API docs**: https://next.api.aliyun.com/document/Dypnsapi/2017-05-25/SendSmsVerifyCode

## 💡 Notes

1. Signature and template review usually takes minutes to hours
2. A signature or template can only be used after it passes review
3. The verification code is generated by the API and cannot be forced to "1314"
4. Sending SMS messages incurs charges; make sure the account has a balance

## 🔍 Latest error

From the last run:
- `MissingTemplateCode`: a template code must be configured
- A diagnostic URL is included in the error message

Once the signature and template are configured, the program can send SMS verification codes!
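A cheap local guard can catch the placeholder values described above before any API call is made. A stdlib sketch (the `SMS_` plus nine digits pattern is an assumption based on the `SMS_xxxxxxxxx` format mentioned in these notes, and the helper is hypothetical):

```python
# Reject obvious placeholder template codes before calling the API.
# The SMS_ + 9 digits pattern is an assumption from the console format
# mentioned in these notes; adjust if your account uses a different format.
import re

def looks_like_real_template_code(template_code):
    """True for codes matching SMS_<9 digits>, excluding the sample placeholder."""
    if template_code == 'SMS_123456789':  # placeholder used in the sample code
        return False
    return re.fullmatch(r'SMS_\d{9}', template_code) is not None

print(looks_like_real_template_code('SMS_123456789'))
print(looks_like_real_template_code('SMS_486210104'))
```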
@@ -1,85 +1,112 @@
author_id,author_name,channel,stat_date,daily_published_count,cumulative_published_count,day_revenue,monthly_revenue,weekly_revenue,revenue_mom_growth_rate,revenue_wow_growth_rate
101,乳腺专家林华,1,2025-12-23,0,35,17.31,401.48,14.65,0.051353,-0.101366
102,韩主任聊妇科,1,2025-12-23,15,167,295.54,6273.97,250.73,-0.274106,-0.142889
103,男科医生杨宇卓,1,2025-12-23,49,413,142.81,5318.03,277.38,-0.202357,0.086164
104,男科医生刘德风,1,2025-12-23,15,171,171.91,4018.47,137.84,-0.396516,0.031539
105,抗衰孟大夫,1,2025-12-23,0,0,64.02,1717.74,57.83,-0.222260,-0.052155
113,内科主任何少忠医生,1,2025-12-23,494,2820,118.62,3266.14,122.86,0.145142,-0.088487
120,中医贾希瑞主任,1,2025-12-23,51,328,45.82,1406.60,54.46,-0.061046,0.001559
121,协和皮肤科付兰芹主任,1,2025-12-23,0,0,9.70,169.07,7.14,-0.176473,-0.035851
122,高丽娜中医,1,2025-12-23,1,98,6.83,162.98,4.55,0.098914,-0.382008
123,微创腋臭专家邹普功,1,2025-12-23,0,0,1.13,53.29,4.07,-0.084207,4.381958
138,阜外医院神内李大夫,1,2025-12-23,15,112,0.00,0.00,4.18,0.000000,0.000000
139,药师赵志军,1,2025-12-23,1,74,17.31,254.35,9.75,-0.223406,0.096265
140,射雕女英雄,1,2025-12-23,46,318,25.54,913.34,18.20,0.028536,0.004816
141,耳鼻喉科杨书勋医生,1,2025-12-23,100,499,39.87,1488.71,38.66,-0.339516,-0.110187
142,影像科毛医生,1,2025-12-23,1,15,40.48,987.04,16.70,-0.183184,-0.618984
145,注射刘新亚,1,2025-12-23,0,0,2.99,76.23,1.44,-0.310759,0.180328
146,Dr蓝剑雄,1,2025-12-23,0,0,1.08,5.32,0.34,-0.148800,3.238095
147,眼科医生陈慧,1,2025-12-23,0,0,2.94,214.63,4.93,0.132552,-0.505205
148,肿瘤科郭秋均医生,1,2025-12-23,100,762,52.48,697.58,22.99,-0.134291,0.390392
149,生殖科医生师楠,1,2025-12-23,1,273,1.16,235.54,18.71,-0.052953,0.012516
150,药师李宁,1,2025-12-23,1,62,8.04,158.16,5.56,0.344098,0.787563
151,皮肤科赵鹏,1,2025-12-23,97,1064,9.95,268.69,18.81,0.347628,-0.108460
152,皮肤科医生郑占才,1,2025-12-23,50,234,3.12,189.40,4.25,0.149690,-0.114675
153,曹凤娇中医,1,2025-12-23,1,24,2.01,37.29,2.55,-0.825102,3.586572
154,郝国君中医,1,2025-12-23,1,71,1.66,64.57,1.43,-0.860362,2.556184
155,成金枝中医,1,2025-12-23,1,168,16.30,153.45,1.68,0.128641,0.081483
156,许娜中医,1,2025-12-23,50,316,4.08,24.56,1.36,0.627568,0.949066
157,刘冬琴中医,1,2025-12-23,1,61,0.29,8.74,0.33,-0.494505,0.065217
158,刘叔勤中医,1,2025-12-23,1,128,0.44,5.88,0.41,-0.984114,0.000000
159,专治静脉曲张的刘洪医生,1,2025-12-23,0,0,1.06,9.85,0.52,0.591276,-0.389006
172,李亮亮中医,1,2025-12-23,1,27,2.23,23.72,0.00,2.201080,-0.222930
173,赵剑锋医生,1,2025-12-23,47,166,4.87,144.57,8.25,2.890474,0.925251
174,李雪民医生,1,2025-12-23,1,59,0.17,32.13,0.30,0.952005,-0.909366
175,静脉曲张的杀手医生,1,2025-12-23,0,0,0.05,26.68,0.76,-0.327960,-0.630263
176,武娜中医,1,2025-12-23,0,22,4.85,211.45,19.40,-0.583120,0.399662
177,好孕闺蜜王珂,1,2025-12-23,0,0,31.60,1039.32,9.39,0.036563,-0.344309
179,风湿免疫专家李小峰,1,2025-12-23,97,506,24.82,515.09,31.33,0.699462,-0.220420
180,尹海琴医生,1,2025-12-23,1,294,3.72,55.02,1.65,-0.507959,0.140513
181,针灸科高小勇医生,1,2025-12-23,0,392,10.25,395.35,13.34,0.067331,0.124069
182,师强华中医,1,2025-12-23,0,0,0.00,0.00,0.00,0.000000,0.000000
183,杜晋芳中医,1,2025-12-23,101,511,28.55,207.79,3.54,0.162787,0.366563
185,郭俊恒中医,1,2025-12-23,48,324,5.44,73.16,0.79,-0.008403,0.898561
186,董强中医,1,2025-12-23,1,24,0.00,3.02,0.00,2.355556,-0.454545
187,李亚娟中医,1,2025-12-23,1,24,0.02,3.87,0.16,-0.692369,-0.685446
188,苗辉医生,1,2025-12-23,0,0,20.53,1122.08,23.76,0.640372,0.020800
189,耳鼻喉医生夏昆峰,1,2025-12-23,0,0,1.34,15.79,0.54,0.146696,0.225490
190,中医苏晨,1,2025-12-23,1,103,15.92,47.70,9.08,0.269292,6.691176
191,智璇医生,1,2025-12-23,1,72,10.18,111.51,2.48,0.684187,-0.321377
246,石鹤医生,1,2025-12-23,1,114,2.03,228.80,0.33,4.412822,-0.854409
247,梁丽君中医,1,2025-12-23,1,115,7.74,29.35,1.23,1.795238,1.088028
248,崔丽荣中医,1,2025-12-23,1,177,1.44,17.44,0.76,0.006347,0.105991
249,张承红中医,1,2025-12-23,45,232,0.51,12.96,0.26,1.234483,1.695312
253,中医郑伟,1,2025-12-23,1,24,0.00,0.12,0.00,-0.294118,0.000000
254,感染科郭金存医生,1,2025-12-23,0,0,4.39,100.37,2.82,-0.411940,0.325933
255,贾素芬中医,1,2025-12-23,101,129,0.02,1.62,0.16,-0.369650,0.000000
256,张立净,1,2025-12-23,0,15,0.00,2.08,0.00,0.552239,-0.477064
257,皮肤科李英医生,1,2025-12-23,1,201,3.19,38.23,0.80,2.530009,-0.472832
364,针灸科冀占岭大夫,1,2025-12-23,0,18,0.21,10.88,0.14,0.000000,-0.806122
365,超声科专家曹怀宇,1,2025-12-23,49,161,0.01,1.34,0.00,0.000000,20.250000
366,脊柱微创易端医生,1,2025-12-23,46,201,0.00,4.44,3.99,17.500000,7.866667
368,苗晋玲医生,1,2025-12-23,100,311,0.09,1.09,0.00,0.000000,-0.142857
369,跟着车主任学中医,1,2025-12-23,15,52,0.87,26.67,0.91,3.246815,-0.202326
370,郭主任讲中医,1,2025-12-23,15,52,9.51,17.78,1.22,0.000000,2.422886
371,洪一针讲中医,1,2025-12-23,0,8,0.02,0.66,0.07,0.000000,1.000000
372,李医生聊健康,1,2025-12-23,1,28,0.27,3.21,0.37,0.000000,0.767241
373,刘刚医生说,1,2025-12-23,0,4,0.85,18.66,0.89,0.000000,-0.257059
374,小丽讲中医,1,2025-12-23,1,9,0.13,83.77,0.17,18.850711,-0.981677
375,西北中医张宝庆,1,2025-12-23,1,6,0.16,8.47,0.15,0.000000,0.758958
376,胡锋医生,1,2025-12-23,1,6,2.04,40.82,1.40,0.440367,-0.134244
377,神经内科巴医生,1,2025-12-23,0,58,0.07,7.40,0.10,0.000000,-0.668122
378,曾国禄讲中医,1,2025-12-23,1,6,1.88,38.62,1.85,3.532864,0.288610
379,泌尿男科陈医生,1,2025-12-23,1,49,1.04,31.10,1.66,0.170493,-0.125193
380,肇庆中医何大夫,1,2025-12-23,16,24,5.01,99.07,3.89,15.187908,-0.287655
381,李伟枫医生,1,2025-12-23,15,57,0.31,5.52,0.35,0.000000,0.988166
382,刘医生讲中医,1,2025-12-23,15,54,2.48,104.49,2.76,21.470968,-0.204341
383,卢医生讲健康,1,2025-12-23,1,11,0.30,16.57,0.51,0.690816,-0.479936
384,阮志华讲健康,1,2025-12-23,1,13,9.92,11.96,0.15,0.000000,42.296296
385,沈理医生,1,2025-12-23,46,56,1.21,57.24,1.06,0.435306,-0.051677
386,中医治肾病周厘,1,2025-12-23,1,13,0.05,0.75,0.04,0.000000,0.419355
387,中医妇产科安向荣,1,2025-12-23,15,57,0.38,19.16,0.29,0.527911,-0.379377
388,院博医生,1,2025-12-23,1,13,0.06,0.32,0.04,0.000000,0.000000
389,中医盛刚,1,2025-12-23,16,23,10.56,31.62,2.22,0.000000,2.561047
390,中医王雷,1,2025-12-23,1,6,0.12,0.92,0.11,0.000000,0.000000
391,泌尿科邱医生,1,2025-12-23,1,10,2.50,73.22,2.66,0.000000,-0.150463
101,乳腺专家林华,1,2025-12-30,0,38,7.36,492.95,18.76,492.950000,-0.820770
102,韩主任聊妇科,1,2025-12-30,13,269,85.66,7761.35,161.38,7761.350000,-0.913805
103,男科医生杨宇卓,1,2025-12-30,50,748,36.01,6226.87,76.33,6226.870000,-0.939068
104,男科医生刘德风,1,2025-12-30,15,276,70.48,4862.09,155.43,4862.090000,-0.844249
105,抗衰孟大夫,1,2025-12-30,0,0,34.22,2139.18,86.21,2139.180000,-0.811390
113,内科主任何少忠医生,1,2025-12-30,500,6298,130.84,4128.73,218.06,4128.730000,-0.753885
120,中医贾希瑞主任,1,2025-12-30,50,683,29.40,1758.73,74.48,1758.730000,-0.802926
121,协和皮肤科付兰芹主任,1,2025-12-30,0,0,8.60,215.84,15.85,215.840000,-0.668132
122,高丽娜中医,1,2025-12-30,1,100,0.72,640.30,6.82,640.300000,-0.985847
123,微创腋臭专家邹普功,1,2025-12-30,0,0,1.09,97.20,1.24,97.200000,-0.974097
138,阜外医院神内李大夫,1,2025-12-30,15,187,4.17,123.97,9.45,123.970000,-0.647125
139,药师赵志军,1,2025-12-30,1,80,15.57,341.11,23.76,341.110000,-0.736176
140,射雕女英雄,1,2025-12-30,50,655,27.92,1129.05,60.21,1129.050000,-0.697802
141,耳鼻喉科杨书勋医生,1,2025-12-30,99,1197,47.80,1883.24,111.25,1883.240000,-0.692518
145,注射刘新亚,1,2025-12-30,0,0,2.82,98.66,5.40,98.660000,-0.748369
146,Dr蓝剑雄,1,2025-12-30,0,0,0.90,11.61,0.92,11.610000,-0.864507
|
||||
147,眼科医生陈慧,1,2025-12-30,0,0,1.98,250.14,6.06,250.140000,-0.837621
|
||||
148,肿瘤科郭秋均医生,1,2025-12-30,100,1464,42.73,929.38,78.33,929.380000,-0.657858
|
||||
150,药师李宁,1,2025-12-30,1,68,13.42,203.69,16.00,203.690000,-0.629029
|
||||
151,皮肤科赵鹏,1,2025-12-30,100,1752,4.77,352.77,13.92,352.770000,-0.859280
|
||||
152,皮肤科医生郑占才,1,2025-12-30,50,488,4.99,217.91,9.22,217.910000,-0.654164
|
||||
153,曹凤娇中医,1,2025-12-30,1,27,0.00,41.15,0.08,41.150000,-0.990408
|
||||
154,郝国君中医,1,2025-12-30,1,74,1.00,74.57,1.43,74.570000,-0.877358
|
||||
155,成金枝中医,1,2025-12-30,15,260,6.16,209.53,11.33,209.530000,-0.819385
|
||||
156,许娜中医,1,2025-12-30,50,578,0.70,30.58,1.05,30.580000,-0.899135
|
||||
157,刘冬琴中医,1,2025-12-30,1,67,0.24,14.73,0.31,14.730000,-0.950794
|
||||
158,刘叔勤中医,1,2025-12-30,15,197,1.66,9.03,2.45,9.030000,0.580645
|
||||
159,专治静脉曲张的刘洪医生,1,2025-12-30,0,0,0.00,10.97,0.00,10.970000,-1.000000
|
||||
172,李亮亮中医,1,2025-12-30,1,33,0.68,31.19,1.60,31.190000,-0.802469
|
||||
173,赵剑锋医生,1,2025-12-30,50,418,3.42,182.77,9.15,182.770000,-0.783021
|
||||
174,李雪民医生,1,2025-12-30,1,65,4.72,44.73,5.04,44.730000,-0.372354
|
||||
175,静脉曲张的杀手医生,1,2025-12-30,0,0,0.18,28.26,0.25,28.260000,-0.883178
|
||||
176,武娜中医,1,2025-12-30,1,28,6.88,304.49,11.92,304.490000,-0.886875
|
||||
177,好孕闺蜜王珂,1,2025-12-30,0,0,63.18,1259.79,74.98,1259.790000,-0.597919
|
||||
179,风湿免疫专家李小峰,1,2025-12-30,100,1205,2.48,671.22,24.51,671.220000,-0.869468
|
||||
180,尹海琴医生,1,2025-12-30,1,299,59.84,139.57,68.60,139.570000,2.217636
|
||||
181,针灸科高小勇医生,1,2025-12-30,15,460,12.87,497.85,22.19,497.850000,-0.786429
|
||||
182,师强华中医,1,2025-12-30,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
183,杜晋芳中医,1,2025-12-30,99,1024,3.99,283.58,10.84,283.580000,-0.888293
|
||||
185,郭俊恒中医,1,2025-12-30,100,637,1.39,93.10,4.37,93.100000,-0.799541
|
||||
186,董强中医,1,2025-12-30,1,29,0.00,3.24,0.08,3.240000,-0.428571
|
||||
187,李亚娟中医,1,2025-12-30,1,29,0.09,5.46,0.16,5.460000,-0.900621
|
||||
188,苗辉医生,1,2025-12-30,0,0,24.61,1436.55,87.87,1436.550000,-0.675625
|
||||
189,耳鼻喉医生夏昆峰,1,2025-12-30,0,0,0.14,19.03,2.62,19.030000,0.048000
|
||||
190,中医苏晨,1,2025-12-30,1,108,0.14,172.15,6.13,172.150000,-0.957229
|
||||
191,智璇医生,1,2025-12-30,1,77,0.44,145.15,7.85,145.150000,-0.795839
|
||||
246,石鹤医生,1,2025-12-30,1,117,0.02,415.67,1.09,415.670000,-0.994206
|
||||
247,梁丽君中医,1,2025-12-30,1,118,0.08,36.78,0.84,36.780000,-0.946015
|
||||
248,崔丽荣中医,1,2025-12-30,1,182,2.06,24.80,2.58,24.800000,-0.609091
|
||||
249,张承红中医,1,2025-12-30,50,482,0.64,15.76,0.73,15.760000,-0.730627
|
||||
253,中医郑伟,1,2025-12-30,1,29,0.00,0.12,0.00,0.120000,0.000000
|
||||
254,感染科郭金存医生,1,2025-12-30,0,0,3.33,134.21,5.86,134.210000,-0.833475
|
||||
255,贾素芬中医,1,2025-12-30,100,640,0.00,2.42,0.00,2.420000,-1.000000
|
||||
256,张立净,1,2025-12-30,1,17,0.10,7.29,0.10,7.290000,-0.980431
|
||||
257,皮肤科李英医生,1,2025-12-30,1,206,0.00,64.14,12.94,64.140000,-0.237028
|
||||
364,针灸科冀占岭大夫,1,2025-12-30,15,86,0.00,11.75,0.00,11.750000,-1.000000
|
||||
366,脊柱微创易端医生,1,2025-12-30,50,456,8.70,66.16,8.70,66.160000,-0.004577
|
||||
368,苗晋玲医生,1,2025-12-30,100,820,0.00,0.00,0.00,0.000000,-1.000000
|
||||
369,跟着车主任学中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
369,跟着车主任学中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
369,跟着车主任学中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
369,跟着车主任学中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
370,郭主任讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
370,郭主任讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
370,郭主任讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
370,郭主任讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
371,洪一针讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
371,洪一针讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
371,洪一针讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
371,洪一针讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
372,李医生聊健康,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
372,李医生聊健康,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
372,李医生聊健康,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
372,李医生聊健康,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
373,刘刚医生说,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
373,刘刚医生说,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
373,刘刚医生说,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
373,刘刚医生说,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
374,小丽讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
374,小丽讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
374,小丽讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
374,小丽讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
375,西北中医张宝庆,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
375,西北中医张宝庆,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
375,西北中医张宝庆,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
375,西北中医张宝庆,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
376,胡锋医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
376,胡锋医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
376,胡锋医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
376,胡锋医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
377,神经内科巴医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
377,神经内科巴医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
377,神经内科巴医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
377,神经内科巴医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
378,曾国禄讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
378,曾国禄讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
378,曾国禄讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
|
||||
378,曾国禄讲中医,1,2025-12-31,0,0,0.00,48.56,0.00,0.000000,0.000000
|
||||
379,泌尿男科陈医生,1,2025-12-30,1,54,0.00,35.54,0.79,35.540000,-0.875591
|
||||
380,肇庆中医何大夫,1,2025-12-30,15,110,0.84,122.97,5.42,122.970000,-0.802045
|
||||
381,李伟枫医生,1,2025-12-30,15,142,0.02,9.38,2.42,9.380000,0.152381
|
||||
382,刘医生讲中医,1,2025-12-30,15,141,0.03,115.38,1.90,115.380000,-0.866479
|
||||
383,卢医生讲健康,1,2025-12-30,1,16,0.00,17.87,0.23,17.870000,-0.877660
|
||||
384,阮志华讲健康,1,2025-12-30,1,16,0.00,14.14,0.54,14.140000,-0.953886
|
||||
385,沈理医生,1,2025-12-30,50,310,0.13,65.73,1.87,65.730000,-0.789651
|
||||
386,中医治肾病周厘,1,2025-12-30,1,18,0.00,1.50,0.06,1.500000,-0.923077
|
||||
387,中医妇产科安向荣,1,2025-12-30,15,142,0.06,25.08,3.39,25.080000,0.059375
|
||||
388,院博医生,1,2025-12-30,1,16,0.00,0.52,0.03,0.520000,-0.888889
|
||||
389,中医盛刚,1,2025-12-30,14,113,0.04,37.14,1.01,37.140000,-0.941585
|
||||
390,中医王雷,1,2025-12-30,1,10,0.00,1.70,0.11,1.700000,-0.877778
|
||||
391,泌尿科邱医生,1,2025-12-30,1,15,0.00,87.82,2.93,87.820000,-0.825906
|
||||
|
||||
|
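The rows above are from the statistics CSV added by this commit. Note the repeated 2025-12-31 entries (accounts 369-378 each appear four times), which is exactly the kind of inconsistency the commit's new validation/alerting step is meant to catch. A minimal pre-import duplicate check can be sketched as follows, assuming the first and fourth columns are account ID and date (the CSV's header row is not shown in this chunk, so the column layout is an assumption):

```python
import csv
from collections import Counter
from io import StringIO

# Tiny excerpt of the export above: two identical rows for account 369.
SAMPLE = """369,跟着车主任学中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
369,跟着车主任学中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
391,泌尿科邱医生,1,2025-12-30,1,15,0.00,87.82,2.93,87.820000,-0.825906
"""

def find_duplicate_keys(text):
    """Return (account_id, date) keys that occur more than once."""
    counts = Counter()
    for row in csv.reader(StringIO(text)):
        if not row:
            continue
        # Assumed layout: column 0 = account ID, column 3 = date.
        counts[(row[0], row[3])] += 1
    return sorted(key for key, n in counts.items() if n > 1)

if __name__ == '__main__':
    print(find_duplicate_keys(SAMPLE))  # [('369', '2025-12-31')]
```

Running a check like this against the full export before `import_csv_to_database.py` would surface the duplicated 2025-12-31 keys early instead of at the database-validation stage.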
548
batch_import_history.py
Normal file
@@ -0,0 +1,548 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
批量历史数据导入脚本

功能:
1. 按日期范围循环抓取百家号数据
2. 每次抓取后自动导出CSV
3. 自动导入数据库
4. 记录执行日志和错误信息
5. 自动重试机制(针对网络、代理等临时性错误)

使用方法:
    # 基本用法
    python batch_import_history.py --start 2025-12-01 --end 2025-12-25

    # 跳过失败的日期继续执行
    python batch_import_history.py --start 2025-12-01 --end 2025-12-25 --skip-failed

    # 自定义重试次数(默认3次)
    python batch_import_history.py --start 2025-12-01 --end 2025-12-25 --max-retries 5

    # 组合使用
    python batch_import_history.py --start 2025-12-01 --end 2025-12-25 --skip-failed --max-retries 5
"""

import sys
import os
import subprocess
import argparse
from datetime import datetime, timedelta
from typing import List, Tuple, Optional
import json
import time

# 设置UTF-8编码
if sys.platform == 'win32':
    import io
    if not isinstance(sys.stdout, io.TextIOWrapper) or sys.stdout.encoding != 'utf-8':
        sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
    if not isinstance(sys.stderr, io.TextIOWrapper) or sys.stderr.encoding != 'utf-8':
        sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')


class BatchImporter:
    """批量历史数据导入器"""

    def __init__(self, start_date: str, end_date: str, skip_failed: bool = False, max_retries: int = 3):
        """初始化

        Args:
            start_date: 开始日期 (YYYY-MM-DD)
            end_date: 结束日期 (YYYY-MM-DD)
            skip_failed: 是否跳过失败的日期继续执行
            max_retries: 每个步骤的最大重试次数(默认:3)
        """
        self.script_dir = os.path.dirname(os.path.abspath(__file__))
        self.start_date = datetime.strptime(start_date, '%Y-%m-%d')
        self.end_date = datetime.strptime(end_date, '%Y-%m-%d')
        self.skip_failed = skip_failed
        self.max_retries = max_retries

        # 脚本路径
        self.analytics_script = os.path.join(self.script_dir, 'bjh_analytics_date.py')
        self.export_script = os.path.join(self.script_dir, 'export_to_csv.py')
        self.import_script = os.path.join(self.script_dir, 'import_csv_to_database.py')

        # 日志文件
        self.log_dir = os.path.join(self.script_dir, 'logs')
        if not os.path.exists(self.log_dir):
            os.makedirs(self.log_dir)

        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        self.log_file = os.path.join(self.log_dir, f'batch_import_{timestamp}.log')

        # 执行结果记录
        self.results = []

        # 验证脚本文件存在
        self._validate_scripts()

    def _validate_scripts(self):
        """验证所需脚本文件是否存在"""
        scripts = {
            'bjh_analytics_date.py': self.analytics_script,
            'export_to_csv.py': self.export_script,
            'import_csv_to_database.py': self.import_script
        }

        missing_scripts = []
        for name, path in scripts.items():
            if not os.path.exists(path):
                missing_scripts.append(name)

        if missing_scripts:
            print("[X] 缺少必要的脚本文件:")
            for script in missing_scripts:
                print(f"  - {script}")
            raise FileNotFoundError("脚本文件缺失")

    def log(self, message: str, level: str = 'INFO'):
        """记录日志

        Args:
            message: 日志消息
            level: 日志级别 (INFO, WARNING, ERROR)
        """
        timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        log_line = f"[{timestamp}] [{level}] {message}"

        # 输出到控制台
        print(log_line)

        # 写入日志文件
        try:
            with open(self.log_file, 'a', encoding='utf-8') as f:
                f.write(log_line + '\n')
        except Exception as e:
            print(f"[!] 写入日志文件失败: {e}")

    def get_date_list(self) -> List[str]:
        """生成日期列表

        Returns:
            日期字符串列表 (YYYY-MM-DD)
        """
        dates = []
        current = self.start_date

        while current <= self.end_date:
            dates.append(current.strftime('%Y-%m-%d'))
            current += timedelta(days=1)

        return dates

    def run_command_with_retry(self, cmd: List[str], step_name: str, max_retries: Optional[int] = None) -> Tuple[bool, str]:
        """执行命令(带重试机制)

        Args:
            cmd: 命令列表
            step_name: 步骤名称
            max_retries: 最大重试次数,默认使用实例配置

        Returns:
            (是否成功, 错误信息)
        """
        if max_retries is None:
            max_retries = self.max_retries

        retry_count = 0
        last_error = ""

        while retry_count <= max_retries:
            if retry_count > 0:
                # 重试前等待,递增延迟:5秒、10秒、15秒
                wait_time = retry_count * 5
                self.log(f"{step_name} 第{retry_count}次重试,等待 {wait_time} 秒...", level='WARNING')
                time.sleep(wait_time)

            # 执行命令
            success, error = self.run_command(cmd, step_name)

            if success:
                if retry_count > 0:
                    self.log(f"{step_name} 重试成功!(第{retry_count}次重试)", level='INFO')
                return True, ""

            # 失败,记录错误
            last_error = error
            retry_count += 1

            # 判断是否需要重试
            if retry_count <= max_retries:
                # 可重试的错误类型
                retryable_errors = [
                    '超时',
                    'timeout',
                    '连接',
                    'connection',
                    '代理',
                    'proxy',
                    '网络',
                    'network',
                    'RemoteDisconnected',
                    'ConnectionError',
                    'ProxyError'
                ]

                # 检查错误信息是否包含可重试的关键词
                # 注意:两边都转为小写,否则 'RemoteDisconnected' 等大写关键词永远匹配不到
                is_retryable = any(keyword.lower() in str(error).lower() for keyword in retryable_errors)

                if is_retryable:
                    self.log(f"{step_name} 出现可重试错误: {error}", level='WARNING')
                else:
                    # 不可重试的错误,直接失败
                    self.log(f"{step_name} 出现不可重试错误,停止重试: {error}", level='ERROR')
                    return False, error

        # 所有重试失败
        self.log(f"{step_name} 失败,已达最大重试次数 ({max_retries})", level='ERROR')
        return False, last_error

    def run_command(self, cmd: List[str], step_name: str) -> Tuple[bool, str]:
        """执行命令

        Args:
            cmd: 命令列表
            step_name: 步骤名称

        Returns:
            (是否成功, 错误信息)
        """
        process = None
        try:
            self.log(f"执行命令: {' '.join(cmd)}")

            # 使用subprocess运行命令,实时输出
            process = subprocess.Popen(
                cmd,
                cwd=self.script_dir,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # 合并stderr到stdout
                text=True,
                encoding='utf-8',
                bufsize=1,  # 行缓冲
                universal_newlines=True
            )

            # 实时读取输出
            output_lines = []
            if process.stdout:
                try:
                    for line in process.stdout:
                        line = line.rstrip()
                        if line:  # 只输出非空行
                            print(f"  {line}")  # 实时输出到控制台
                            output_lines.append(line)
                            # 每10行记录一次日志,减少日志文件大小
                            if len(output_lines) % 10 == 0:
                                self.log(f"{step_name} 运行中... (已输出{len(output_lines)}行)")
                except Exception as e:
                    self.log(f"读取输出异常: {e}", level='WARNING')

            # 等待进程结束
            return_code = process.wait(timeout=600)  # 10分钟超时

            # 记录完整输出
            full_output = '\n'.join(output_lines)
            if full_output:
                self.log(f"{step_name} 输出:\n{full_output}")

            # 检查返回码
            if return_code == 0:
                self.log(f"[✓] {step_name} 执行成功", level='INFO')
                return True, ""
            else:
                error_msg = f"返回码: {return_code}"
                self.log(f"[X] {step_name} 执行失败: {error_msg}", level='ERROR')
                return False, error_msg

        except subprocess.TimeoutExpired:
            if process:
                process.kill()
            error_msg = "命令执行超时(>10分钟)"
            self.log(f"[X] {step_name} 失败: {error_msg}", level='ERROR')
            return False, error_msg

        except Exception as e:
            error_msg = str(e)
            self.log(f"[X] {step_name} 异常: {error_msg}", level='ERROR')
            import traceback
            self.log(f"异常堆栈:\n{traceback.format_exc()}", level='ERROR')
            return False, error_msg

    def process_date(self, date_str: str) -> bool:
        """处理单个日期的数据

        Args:
            date_str: 日期字符串 (YYYY-MM-DD)

        Returns:
            是否成功
        """
        self.log("="*70)
        self.log(f"开始处理日期: {date_str}")
        self.log("="*70)

        result = {
            'date': date_str,
            'start_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'steps': {},
            'success': False,
            'error': None
        }

        # 步骤1: 数据抓取(带重试)
        self.log(f"\n[步骤 1/3] 抓取 {date_str} 的数据...")
        cmd_analytics = [
            sys.executable,
            self.analytics_script,
            date_str,
            '--proxy',
            '--database',
            '--no-confirm'  # 跳过确认提示
        ]

        success, error = self.run_command_with_retry(cmd_analytics, f"数据抓取 ({date_str})")
        result['steps']['analytics'] = {'success': success, 'error': error}

        if not success:
            result['error'] = f"数据抓取失败: {error}"
            result['end_time'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            self.results.append(result)
            return False

        # 等待2秒,确保文件写入完成
        time.sleep(2)

        # 步骤2: 导出CSV(带重试)
        self.log("\n[步骤 2/3] 导出CSV文件...")
        cmd_export = [
            sys.executable,
            self.export_script,
            '--mode', 'csv',
            '--no-confirm'  # 跳过确认提示
        ]

        success, error = self.run_command_with_retry(cmd_export, f"CSV导出 ({date_str})")

        result['steps']['export'] = {'success': success, 'error': error}

        if not success:
            result['error'] = f"CSV导出失败: {error}"
            result['end_time'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            self.results.append(result)
            return False

        # 等待2秒
        time.sleep(2)

        # 步骤3: 导入数据库(带重试)
        self.log("\n[步骤 3/3] 导入数据库...")
        cmd_import = [
            sys.executable,
            self.import_script
        ]

        success, error = self.run_command_with_retry(cmd_import, f"数据库导入 ({date_str})")
        result['steps']['import'] = {'success': success, 'error': error}

        if not success:
            result['error'] = f"数据库导入失败: {error}"
            result['end_time'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            self.results.append(result)
            return False

        # 全部成功
        result['success'] = True
        result['end_time'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        self.results.append(result)

        self.log(f"\n[✓] {date_str} 处理完成!")
        self.log("="*70 + "\n")

        return True

    def run(self):
        """执行批量导入"""
        dates = self.get_date_list()

        print("\n" + "="*70)
        print("批量历史数据导入")
        print("="*70)
        print(f"开始日期: {self.start_date.strftime('%Y-%m-%d')}")
        print(f"结束日期: {self.end_date.strftime('%Y-%m-%d')}")
        print(f"总天数: {len(dates)} 天")
        print(f"跳过失败: {'是' if self.skip_failed else '否'}")
        print(f"最大重试次数: {self.max_retries}")
        print(f"日志文件: {self.log_file}")
        print("="*70)

        # 确认执行
        confirm = input("\n是否开始执行? (y/n): ").strip().lower()
        if confirm != 'y':
            print("已取消")
            return

        self.log(f"开始批量导入: {len(dates)} 个日期")
        start_time = datetime.now()

        success_count = 0
        failed_count = 0

        for idx, date_str in enumerate(dates, 1):
            print(f"\n{'='*70}")
            print(f"进度: [{idx}/{len(dates)}] {date_str}")
            print(f"{'='*70}")

            success = self.process_date(date_str)

            if success:
                success_count += 1
            else:
                failed_count += 1

                # 如果不跳过失败,则停止执行
                if not self.skip_failed:
                    self.log(f"[X] 日期 {date_str} 处理失败,停止执行", level='ERROR')
                    break
                else:
                    self.log(f"[!] 日期 {date_str} 处理失败,跳过继续", level='WARNING')

            # 日期间延迟(避免请求过快)
            if idx < len(dates):
                delay = 5
                self.log(f"等待 {delay} 秒后处理下一个日期...")
                time.sleep(delay)

        # 执行完成
        end_time = datetime.now()
        duration = end_time - start_time

        print("\n" + "="*70)
        print("批量导入完成")
        print("="*70)
        print(f"总耗时: {duration}")
        print(f"成功: {success_count} 天")
        print(f"失败: {failed_count} 天")
        print(f"日志文件: {self.log_file}")
        print("="*70)

        self.log("="*70)
        self.log(f"批量导入完成: 成功 {success_count} 天, 失败 {failed_count} 天")
        self.log(f"总耗时: {duration}")
        self.log("="*70)

        # 保存执行结果
        self._save_results()

        # 显示失败的日期
        if failed_count > 0:
            print("\n失败的日期:")
            for r in self.results:
                if not r['success']:
                    print(f"  - {r['date']}: {r.get('error', '未知错误')}")

    def _save_results(self):
        """保存执行结果到JSON文件"""
        try:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            result_file = os.path.join(self.log_dir, f'batch_result_{timestamp}.json')

            summary = {
                'start_date': self.start_date.strftime('%Y-%m-%d'),
                'end_date': self.end_date.strftime('%Y-%m-%d'),
                'total_dates': len(self.results),
                'success_count': sum(1 for r in self.results if r['success']),
                'failed_count': sum(1 for r in self.results if not r['success']),
                'results': self.results
            }

            with open(result_file, 'w', encoding='utf-8') as f:
                json.dump(summary, f, ensure_ascii=False, indent=2)

            self.log(f"执行结果已保存: {result_file}")

        except Exception as e:
            self.log(f"保存执行结果失败: {e}", level='ERROR')


def main():
    """主函数"""
    parser = argparse.ArgumentParser(
        description='批量历史数据导入脚本',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
示例用法:
  python batch_import_history.py --start 2025-12-01 --end 2025-12-25
  python batch_import_history.py --start 2025-12-01 --end 2025-12-25 --skip-failed
"""
    )

    parser.add_argument(
        '--start',
        type=str,
        required=True,
        help='开始日期 (格式: YYYY-MM-DD)'
    )

    parser.add_argument(
        '--end',
        type=str,
        required=True,
        help='结束日期 (格式: YYYY-MM-DD)'
    )

    parser.add_argument(
        '--skip-failed',
        action='store_true',
        help='跳过失败的日期继续执行(默认:遇到失败停止)'
    )

    parser.add_argument(
        '--max-retries',
        type=int,
        default=3,
        help='每个步骤的最大重试次数(默认:3)'
    )

    args = parser.parse_args()

    # 验证日期格式
    try:
        start = datetime.strptime(args.start, '%Y-%m-%d')
        end = datetime.strptime(args.end, '%Y-%m-%d')

        if start > end:
            print("[X] 开始日期不能晚于结束日期")
            return 1

    except ValueError as e:
        print(f"[X] 日期格式错误: {e}")
        print("  正确格式: YYYY-MM-DD (例如: 2025-12-01)")
        return 1

    try:
        # 创建导入器
        importer = BatchImporter(
            start_date=args.start,
            end_date=args.end,
            skip_failed=args.skip_failed,
            max_retries=args.max_retries
        )

        # 执行批量导入
        importer.run()

        return 0

    except Exception as e:
        print(f"\n[X] 程序执行出错: {e}")
        import traceback
        traceback.print_exc()
        return 1


if __name__ == '__main__':
    sys.exit(main())
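The retry helper in `run_command_with_retry` above sleeps `retry_count * 5` seconds before each retry, i.e. 5 s, 10 s, 15 s with the default `--max-retries 3`. That linear back-off schedule can be sketched in isolation:

```python
def retry_wait_times(max_retries: int = 3, step: int = 5):
    """Wait (in seconds) applied before retries 1..max_retries,
    mirroring `wait_time = retry_count * 5` in run_command_with_retry."""
    return [retry_count * step for retry_count in range(1, max_retries + 1)]

if __name__ == '__main__':
    print(retry_wait_times())  # [5, 10, 15]
```

With `--max-retries 5` the schedule extends linearly to 25 s, so a date that keeps failing costs at most `5 + 10 + ... + 5n` seconds of waiting on top of the per-step 10-minute subprocess timeout.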
292
bjh_analytics.py
@@ -29,12 +29,12 @@ from database_config import DatabaseManager, DB_CONFIG
|
||||
|
||||
# 代理配置 - 大麦代理IP
|
||||
PROXY_API_URL = (
|
||||
'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=e054861d08471263d970bde4f4905181&osn=TC_NO176655872088456223&tiqu=1'
|
||||
'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=2912cb2b22d3b7ae724f045012790479&osn=TC_NO176707424165606223&tiqu=1'
|
||||
)
|
||||
|
||||
# 大麦代理账号密码认证
|
||||
PROXY_USERNAME = '694b8c3172af7'
|
||||
PROXY_PASSWORD = 'q8yA8x1dwCpdyIK'
|
||||
PROXY_USERNAME = '69538fdef04e1'
|
||||
PROXY_PASSWORD = '63v0kQBr2yJXnjf'
|
||||
|
||||
# 备用固定代理IP池(格式:'IP:端口', '用户名', '密码')
|
||||
BACKUP_PROXY_POOL = [
|
||||
@@ -62,7 +62,8 @@ class BaijiahaoAnalytics:
|
||||
|
||||
# 代理配置
|
||||
self.use_proxy = use_proxy
|
||||
self.current_proxy = None
|
||||
self.current_proxy = None # 当前IP,使用完后/失败后才重新获取
|
||||
self.proxy_fail_count = 0 # 当前代理失败次数
|
||||
|
||||
# 数据库配置
|
||||
self.load_from_db = load_from_db
|
||||
@@ -76,6 +77,8 @@ class BaijiahaoAnalytics:
|
||||
if self.use_proxy:
|
||||
self.logger.info("已启用代理模式")
|
||||
print("[配置] 已启用代理模式")
|
||||
# 初始化时获取第一个代理
|
||||
self.fetch_proxy(force_new=True)
|
||||
|
||||
if self.load_from_db:
|
||||
self.logger.info("已启用数据库加载模式")
|
||||
@@ -99,6 +102,12 @@ class BaijiahaoAnalytics:
|
||||
self.analytics_output = os.path.join(self.script_dir, "bjh_analytics_data.json")
|
||||
self.income_output = os.path.join(self.script_dir, "bjh_income_data_v2.json")
|
||||
|
||||
# 创建备份文件夹
|
||||
self.backup_dir = os.path.join(self.script_dir, "backup")
|
||||
if not os.path.exists(self.backup_dir):
|
||||
os.makedirs(self.backup_dir)
|
||||
print(f"[OK] 创建备份文件夹: {self.backup_dir}")
|
||||
|
||||
def cookie_string_to_dict(self, cookie_string: str) -> Dict:
|
||||
"""将Cookie字符串转换为字典格式
|
||||
|
||||
@@ -230,15 +239,23 @@ class BaijiahaoAnalytics:
|
||||
print(f"[OK] 已设置账号 {account_id} 的Cookie ({len(cookies)} 个字段)")
|
||||
return True
|
||||
|
||||
def fetch_proxy(self) -> Optional[Dict]:
|
||||
def fetch_proxy(self, force_new: bool = False) -> Optional[Dict]:
|
||||
"""从代理服务获取一个可用代理,失败时使用备用固定代理
|
||||
|
||||
Args:
|
||||
force_new: 是否强制获取新代理,默认False(优先使用当前IP)
|
||||
|
||||
Returns:
|
||||
代理配置字典,格式: {'http': 'http://...', 'https': 'http://...'}
|
||||
"""
|
||||
if not self.use_proxy:
|
||||
return None
|
||||
|
||||
# 如果已有可用代理且不强制获取新代理,直接返回
|
||||
if self.current_proxy and not force_new:
|
||||
return self.current_proxy
|
||||
|
||||
# 获取新代理
|
||||
try:
|
||||
# 使用大麦代理API获取IP
|
||||
resp = requests.get(PROXY_API_URL, timeout=10)
|
||||
@@ -247,21 +264,30 @@ class BaijiahaoAnalytics:
|
||||
# 首先尝试解析为纯文本格式(最常见)
|
||||
text = resp.text.strip()
|
||||
|
||||
# 检测是否返回错误信息
|
||||
if text.upper().startswith('ERROR'):
|
||||
raise Exception(f"代理API返回错误: {text}")
|
||||
|
||||
# 尝试直接解析为IP:PORT格式
|
||||
lines = text.split('\n')
|
||||
for line in lines:
|
||||
line = line.strip()
|
||||
if ':' in line and not line.startswith('{') and not line.startswith('['):
|
||||
# 找到第一个IP:PORT格式
|
||||
ip_port = line.split()[0] if ' ' in line else line # 处理可能带有其他信息的情况
|
||||
ip_port = line.split()[0] if ' ' in line else line
|
||||
|
||||
if ip_port.count(':') == 1: # 确保是IP:PORT格式
|
||||
nowtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
|
||||
self.logger.info(f'提取大麦代理IP(文本): {ip_port} at {nowtime}')
|
||||
print(f'[代理] 提取大麦IP: {ip_port}')
|
||||
|
||||
# 大麦代理使用账号密码认证
|
||||
host, port = ip_port.split(':', 1)
|
||||
if PROXY_USERNAME and PROXY_PASSWORD:
|
||||
proxy_url = f'http://{PROXY_USERNAME}:{PROXY_PASSWORD}@{host}:{port}'
|
||||
else:
|
||||
proxy_url = f'http://{host}:{port}'
|
||||
|
||||
self.current_proxy = {
|
||||
'http': proxy_url,
|
||||
'https': proxy_url,
|
||||
@@ -282,8 +308,12 @@ class BaijiahaoAnalytics:
|
||||
self.logger.info(f'提取大麦代理IP(JSON): {ip_port} at {nowtime}')
|
||||
print(f'[代理] 提取大麦IP: {ip_port}')
|
||||
|
||||
# 构建带账密的代理URL: http://username:password@host:port
|
||||
# 大麦代理使用账号密码认证
|
||||
if PROXY_USERNAME and PROXY_PASSWORD:
|
||||
proxy_url = f'http://{PROXY_USERNAME}:{PROXY_PASSWORD}@{ip_info["ip"]}:{ip_info["port"]}'
|
||||
else:
|
||||
proxy_url = f'http://{ip_info["ip"]}:{ip_info["port"]}'
|
||||
|
||||
self.current_proxy = {
|
||||
'http': proxy_url,
|
||||
'https': proxy_url,
|
||||
@@ -316,6 +346,34 @@ class BaijiahaoAnalytics:
|
||||
}
|
||||
return self.current_proxy
|
||||
|
||||
def mark_proxy_failed(self):
|
||||
"""标记当前代理失败,失败超过3次后重新获取代理
|
||||
|
||||
Returns:
|
||||
bool: 是否需要重新获取代理
|
||||
"""
|
||||
if not self.use_proxy or not self.current_proxy:
|
||||
return False
|
||||
|
||||
self.proxy_fail_count += 1
|
||||
self.logger.warning(f"当前代理失败次数: {self.proxy_fail_count}")
|
||||
|
||||
# 失败超过3次,重新获取代理
|
||||
if self.proxy_fail_count >= 3:
|
||||
self.logger.info("当前代理失败次数过多,重新获取新代理")
|
||||
print(f"[代理] 失败{self.proxy_fail_count}次,重新获取新代理")
|
||||
self.current_proxy = None
|
||||
self.proxy_fail_count = 0
|
||||
# 强制获取新代理
|
||||
self.fetch_proxy(force_new=True)
|
||||
return True
|
||||
|
||||
return False
|
||||
|
||||
def reset_proxy_fail_count(self):
|
||||
"""重置代理失败计数(请求成功后调用)"""
|
||||
self.proxy_fail_count = 0
|
||||
|
||||
def get_common_headers(self) -> Dict:
|
||||
"""获取通用请求头"""
|
||||
return {
|
||||
@@ -425,6 +483,8 @@ class BaijiahaoAnalytics:
|
||||
|
||||
successful_data = []
|
||||
retry_count = 0
|
||||
proxy_change_count = 0 # 代理更换次数计数器
|
||||
max_proxy_changes = 3 # 最多更换3次代理(即最多使用4个不同代理)
|
||||
|
||||
while retry_count <= max_retries:
|
||||
try:
|
||||
@@ -438,6 +498,21 @@ class BaijiahaoAnalytics:
|
||||
# 获取代理(如果启用)
|
||||
proxies = self.fetch_proxy() if self.use_proxy else None
|
||||
|
||||
# 调试信息:显示代理使用情况
|
||||
if self.use_proxy:
|
||||
if proxies:
|
||||
proxy_url = proxies.get('http', '')
|
||||
if '@' in proxy_url:
|
||||
# 提取IP部分(隐藏账号密码)
|
||||
proxy_ip = proxy_url.split('@')[1]
|
||||
else:
|
||||
proxy_ip = proxy_url.replace('http://', '').replace('https://', '')
|
||||
self.logger.info(f"发文统计API 使用代理: {proxy_ip}")
|
||||
print(f" [代理] 使用IP: {proxy_ip}")
|
||||
else:
|
||||
self.logger.warning(f"发文统计API 代理未生效!use_proxy={self.use_proxy}")
|
||||
print(f" [!] 警告:代理未生效!use_proxy={self.use_proxy}")
|
||||
|
||||
response = self.session.get(
|
||||
api_url,
|
||||
headers=headers,
|
||||
@@ -462,6 +537,9 @@ class BaijiahaoAnalytics:
|
||||
self.logger.info("发文统计API调用成功")
|
||||
print(f" [✓] API调用成功")
|
||||
|
||||
# 请求成功,重置代理失败计数
|
||||
self.reset_proxy_fail_count()
|
||||
|
||||
# 提取发文统计数据
|
||||
total_info = data.get('data', {}).get('total_info', {})
|
||||
|
||||
@@ -490,6 +568,34 @@ class BaijiahaoAnalytics:
|
||||
else:
|
||||
self.logger.error(f"API返回错误: errno={errno}, errmsg={errmsg}")
|
||||
print(f" [X] API返回错误: errno={errno}, errmsg={errmsg}")
|
||||
|
||||
# 特别处理 errno=10000015 (异常请求),这通常是代理未生效
|
||||
if errno == 10000015 and self.use_proxy:
|
||||
self.logger.warning("检测到 errno=10000015(异常请求),代理未生效,立即强制更换新代理")
|
||||
print(f" [!] 检测到代理未生效,立即更换新代理")
|
||||
|
||||
# 检查是否超过代理更换上限
|
||||
if proxy_change_count >= max_proxy_changes:
|
||||
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
|
||||
break
|
||||
|
||||
# 立即强制获取新代理(不等待3次)
|
||||
self.current_proxy = None
|
||||
self.proxy_fail_count = 0
|
||||
new_proxy = self.fetch_proxy(force_new=True)
|
||||
|
||||
if new_proxy:
|
||||
# 如果还没达到重试上限,尝试重试
|
||||
if retry_count < max_retries:
|
||||
proxy_change_count += 1
|
||||
self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试,当前第{retry_count+1}次")
|
||||
print(f" [!] 已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试...")
|
||||
retry_count += 1
|
||||
continue
|
||||
else:
|
||||
self.logger.error("无法获取新代理,放弃重试")
|
||||
print(f" [X] 无法获取新代理")
|
||||
|
||||
break # API错误,不重试
|
            except json.JSONDecodeError as e:
@@ -521,6 +627,58 @@ class BaijiahaoAnalytics:
                if retry_count < max_retries:
                    self.logger.warning(f"发文统计API代理连接错误: {error_type},将重试")
                    print(f" [!] 代理连接错误: {error_type}")

                    # 标记代理失败
                    self.mark_proxy_failed()

                    # 超时或连接错误立即更换代理(不等待3次失败)
                    if self.use_proxy and ('Timeout' in error_type or 'Connection' in error_type or 'ProxyError' in error_type):
                        # 检查是否超过代理更换上限
                        if proxy_change_count >= max_proxy_changes:
                            self.logger.error(f"已达代理更换上限({max_proxy_changes}次),放弃重试")
                            print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
                            break

                        self.logger.warning(f"检测到{error_type}错误,立即更换新代理")
                        print(f" [!] 检测到{error_type},立即更换新代理")
                        self.current_proxy = None
                        self.proxy_fail_count = 0
                        new_proxy = self.fetch_proxy(force_new=True)

                        if new_proxy:
                            proxy_change_count += 1
                            self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
                            print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
                            # 更换代理后,不增加retry_count,直接continue重试
                            continue
                        else:
                            self.logger.error("无法获取新代理,放弃重试")
                            print(f" [X] 无法获取新代理")
                            break
                    # 其他代理错误,等待3次失败后更换
                    elif self.proxy_fail_count >= 3 and self.use_proxy:
                        # 检查是否超过代理更换上限
                        if proxy_change_count >= max_proxy_changes:
                            self.logger.error(f"已达代理更换上限({max_proxy_changes}次),放弃重试")
                            print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
                            break

                        print(f" [!] 代理已失败{self.proxy_fail_count}次,强制更换新代理")
                        self.current_proxy = None
                        self.proxy_fail_count = 0
                        new_proxy = self.fetch_proxy(force_new=True)
                        if new_proxy:
                            proxy_change_count += 1
                            self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
                            print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
                            # 更换代理后,不增加retry_count,直接continue重试
                            continue
                        else:
                            self.logger.error("无法获取新代理")
                            print(f" [X] 无法获取新代理")
                            break

                    # 其他情况才增加retry_count
                    retry_count += 1
                    continue
                else:
@@ -701,6 +859,8 @@ class BaijiahaoAnalytics:
        print(f" API: {api_url}")

        retry_count = 0
        proxy_change_count = 0  # 代理更换次数计数器
        max_proxy_changes = 3  # 最多更换3次代理(即最多使用4个不同代理)

        while retry_count <= max_retries:
            try:
@@ -714,6 +874,21 @@ class BaijiahaoAnalytics:
                # 获取代理(如果启用)
                proxies = self.fetch_proxy() if self.use_proxy else None

                # 调试信息:显示代理使用情况
                if self.use_proxy:
                    if proxies:
                        proxy_url = proxies.get('http', '')
                        if '@' in proxy_url:
                            # 提取IP部分(隐藏账号密码)
                            proxy_ip = proxy_url.split('@')[1]
                        else:
                            proxy_ip = proxy_url.replace('http://', '').replace('https://', '')
                        self.logger.info(f"收入API 使用代理: {proxy_ip}")
                        print(f" [代理] 使用IP: {proxy_ip}")
                    else:
                        self.logger.warning(f"收入API 代理未生效!use_proxy={self.use_proxy}")
                        print(f" [!] 警告:代理未生效!use_proxy={self.use_proxy}")

                response = self.session.get(
                    api_url,
                    headers=headers,
@@ -735,6 +910,9 @@ class BaijiahaoAnalytics:
                        self.logger.info("收入数据API调用成功")
                        print(f" [✓] API调用成功")

                        # 请求成功,重置代理失败计数
                        self.reset_proxy_fail_count()

                        # 显示收入数据摘要
                        income_data = data.get('data', {}).get('income', {})
                        if 'recent7Days' in income_data:
@@ -752,6 +930,34 @@ class BaijiahaoAnalytics:
                    else:
                        self.logger.error(f"收入API返回错误: errno={errno}, errmsg={errmsg}")
                        print(f" [X] API返回错误: errno={errno}, errmsg={errmsg}")

                        # 特别处理 errno=10000015 (异常请求),这通常是代理未生效
                        if errno == 10000015 and self.use_proxy:
                            self.logger.warning("检测到收入API errno=10000015(异常请求),代理未生效,立即强制更换新代理")
                            print(f" [!] 检测到代理未生效,立即更换新代理")

                            # 检查是否超过代理更换上限
                            if proxy_change_count >= max_proxy_changes:
                                print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
                                return None

                            # 立即强制获取新代理(不等待3次)
                            self.current_proxy = None
                            self.proxy_fail_count = 0
                            new_proxy = self.fetch_proxy(force_new=True)

                            if new_proxy:
                                # 如果还没达到重试上限,尝试重试
                                if retry_count < max_retries:
                                    proxy_change_count += 1
                                    self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试收入API,当前第{retry_count+1}次")
                                    print(f" [!] 已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试...")
                                    retry_count += 1
                                    continue
                            else:
                                self.logger.error("无法获取新代理,放弃重试")
                                print(f" [X] 无法获取新代理")

                        return None
            except json.JSONDecodeError as e:
                self.logger.error(f"收入数据JSON解析失败: {e}")
@@ -781,6 +987,58 @@ class BaijiahaoAnalytics:
                if retry_count < max_retries:
                    self.logger.warning(f"收入数据API代理连接错误: {error_type},将重试")
                    print(f" [!] 代理连接错误: {error_type}")

                    # 标记代理失败
                    self.mark_proxy_failed()

                    # 超时或连接错误立即更换代理(不等待3次失败)
                    if self.use_proxy and ('Timeout' in error_type or 'Connection' in error_type or 'ProxyError' in error_type):
                        # 检查是否超过代理更换上限
                        if proxy_change_count >= max_proxy_changes:
                            self.logger.error(f"已达代理更换上限({max_proxy_changes}次),放弃重试")
                            print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
                            return None

                        self.logger.warning(f"检测到{error_type}错误,立即更换新代理")
                        print(f" [!] 检测到{error_type},立即更换新代理")
                        self.current_proxy = None
                        self.proxy_fail_count = 0
                        new_proxy = self.fetch_proxy(force_new=True)

                        if new_proxy:
                            proxy_change_count += 1
                            self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
                            print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
                            # 更换代理后,不增加retry_count,直接continue重试
                            continue
                        else:
                            self.logger.error("无法获取新代理,放弃重试")
                            print(f" [X] 无法获取新代理")
                            return None
                    # 其他代理错误,等待3次失败后更换
                    elif self.proxy_fail_count >= 3 and self.use_proxy:
                        # 检查是否超过代理更换上限
                        if proxy_change_count >= max_proxy_changes:
                            self.logger.error(f"已达代理更换上限({max_proxy_changes}次),放弃重试")
                            print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
                            return None

                        print(f" [!] 代理已失败{self.proxy_fail_count}次,强制更换新代理")
                        self.current_proxy = None
                        self.proxy_fail_count = 0
                        new_proxy = self.fetch_proxy(force_new=True)
                        if new_proxy:
                            proxy_change_count += 1
                            self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
                            print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
                            # 更换代理后,不增加retry_count,直接continue重试
                            continue
                        else:
                            self.logger.error("无法获取新代理")
                            print(f" [X] 无法获取新代理")
                            return None

                    # 其他情况才增加retry_count
                    retry_count += 1
                    continue
                else:
@@ -866,6 +1124,8 @@ class BaijiahaoAnalytics:
                    errno = data.get('errno', -1)

                    if errno == 0:
                        # 请求成功,重置代理失败计数
                        self.reset_proxy_fail_count()
                        return data
                    else:
                        self.logger.error(f"单日收入API返回错误: errno={errno}")
@@ -895,6 +1155,10 @@ class BaijiahaoAnalytics:
                if is_proxy_error:
                    if retry_count < max_retries:
                        self.logger.warning(f"单日收入代理连接错误 ({target_date.strftime('%Y-%m-%d')}): {error_type},将重试")

                        # 标记代理失败
                        self.mark_proxy_failed()

                        retry_count += 1
                        continue
                    else:
@@ -1068,17 +1332,29 @@ class BaijiahaoAnalytics:
        return results

    def save_results(self, results: List[Dict]):
-        """保存结果到文件
+        """保存结果到文件(同时备份带日期的副本)

        Args:
            results: 数据分析结果列表
        """
        import shutil

        try:
            # 1. 保存到主文件(不带时间戳)
            with open(self.output_file, 'w', encoding='utf-8') as f:
                json.dump(results, f, ensure_ascii=False, indent=2)

            print(f"\n{'='*70}")
            print(f"[OK] 数据已保存到: {self.output_file}")

            # 2. 创建带日期的备份文件(只保留日期)
            timestamp = datetime.now().strftime('%Y%m%d')
            backup_filename = f"bjh_integrated_data_{timestamp}.json"
            backup_file = os.path.join(self.backup_dir, backup_filename)

            # 复制文件到备份目录
            shutil.copy2(self.output_file, backup_file)
            print(f"[OK] 备份已保存到: {backup_file}")
            print(f"{'='*70}")

            # 显示统计
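The retry policy added above tracks two independent budgets: `retry_count` caps re-attempts with the same proxy, while `proxy_change_count` caps how many replacement proxies are fetched, and a proxy swap deliberately does not consume a retry. The pattern can be distilled into a minimal, self-contained sketch; `request_fn` and `fetch_proxy_fn` are hypothetical stand-ins for the class's actual request and `fetch_proxy` methods, and mapping API errors to `RuntimeError` is an assumption for illustration:

```python
def request_with_proxy_rotation(request_fn, fetch_proxy_fn,
                                max_retries=3, max_proxy_changes=3):
    """Dual-budget retry loop: retries cap same-proxy re-attempts,
    proxy changes are budgeted separately and do not consume a retry."""
    retry_count = 0
    proxy_change_count = 0
    proxy = fetch_proxy_fn()
    while retry_count <= max_retries:
        try:
            return request_fn(proxy)
        except (TimeoutError, ConnectionError):
            # connection-level failure: swap the proxy immediately
            if proxy_change_count >= max_proxy_changes:
                return None  # proxy-change budget exhausted
            proxy = fetch_proxy_fn()
            proxy_change_count += 1
            continue  # swapping a proxy does not consume a retry
        except RuntimeError:
            # transient API error: retry with the same proxy
            retry_count += 1
    return None
```

With `max_proxy_changes=3` this uses at most four distinct proxies, matching the "最多更换3次代理(共尝试4个不同代理)" rule described in the commit.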
809	bjh_analytics_date.py	Normal file
@@ -0,0 +1,809 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
百家号指定日期数据抓取工具
根据指定日期范围抓取发文统计和收入数据
"""

import json
import sys
import os
import argparse
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# 导入基础分析器
from bjh_analytics import BaijiahaoAnalytics

# 设置标准输出编码为UTF-8
if sys.platform == 'win32':
    import io
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
    sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')


class BaijiahaoDateAnalytics(BaijiahaoAnalytics):
    """百家号指定日期数据抓取器"""

    def __init__(self, target_date: str, use_proxy: bool = False, load_from_db: bool = False, db_config: Optional[Dict] = None):
        """初始化

        Args:
            target_date: 目标日期 (YYYY-MM-DD)
            use_proxy: 是否使用代理
            load_from_db: 是否从数据库加载Cookie
            db_config: 数据库配置
        """
        super().__init__(use_proxy=use_proxy, load_from_db=load_from_db, db_config=db_config)

        # 解析目标日期
        try:
            self.target_date = datetime.strptime(target_date, '%Y-%m-%d')
            self.target_date_str = target_date
        except ValueError:
            raise ValueError(f"日期格式错误: {target_date},正确格式: YYYY-MM-DD")

        # 修改输出文件名(不带日期,使用固定文件名)
        self.output_file = os.path.join(
            self.script_dir,
            "bjh_integrated_data.json"
        )

        # 创建备份文件夹
        self.backup_dir = os.path.join(self.script_dir, "backup")
        if not os.path.exists(self.backup_dir):
            os.makedirs(self.backup_dir)

        print(f"[配置] 目标日期: {target_date}")
        print(f"[配置] 输出文件: {self.output_file}")
        print(f"[配置] 备份目录: {self.backup_dir}")
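The constructor validates the `YYYY-MM-DD` input with `strptime`, and the fetch methods below derive `start_day`/`end_day` query strings (`YYYYMMDD`) covering `days` days ending on the target date inclusive. A standalone sketch of that derivation (the helper name `date_range_params` is hypothetical, not part of the project):

```python
from datetime import datetime, timedelta

def date_range_params(target_date: str, days: int = 7) -> dict:
    """Build the start_day/end_day strings used by the statistics API:
    a window of `days` days ending on target_date (inclusive).
    Raises ValueError for malformed dates, like the constructor above."""
    end = datetime.strptime(target_date, '%Y-%m-%d')
    start = end - timedelta(days=days - 1)
    return {
        'start_day': start.strftime('%Y%m%d'),
        'end_day': end.strftime('%Y%m%d'),
    }
```

Note the `days - 1` offset: a 7-day window ending on 2025-12-20 starts on 2025-12-14, not 2025-12-13.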
    def fetch_analytics_api_for_date(self, days: int = 7, max_retries: int = 3) -> Optional[Dict]:
        """获取指定日期范围的发文统计数据

        Args:
            days: 查询天数(从target_date往前推)
            max_retries: 最大重试次数

        Returns:
            发文统计数据
        """
        import time

        # 计算日期范围(从target_date往前推days天)
        end_date = self.target_date
        start_date = end_date - timedelta(days=days-1)

        start_day = start_date.strftime('%Y%m%d')
        end_day = end_date.strftime('%Y%m%d')

        # API端点
        api_url = f"{self.base_url}/author/eco/statistics/appStatisticV3"

        # 请求参数(不使用special_filter_days,直接指定日期范围)
        params = {
            'type': 'event',
            'start_day': start_day,
            'end_day': end_day,
            'stat': '0'
        }

        # 从Cookie中提取token
        token_cookie = self.session.cookies.get('bjhStoken') or self.session.cookies.get('devStoken')

        # 请求头
        headers = {
            'Accept': 'application/json, text/plain, */*',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Referer': f'{self.base_url}/builder/rc/analysiscontent',
            'Sec-Fetch-Dest': 'empty',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Site': 'same-origin',
        }

        if token_cookie:
            headers['token'] = token_cookie

        self.logger.info(f"获取发文统计: {start_date.strftime('%Y-%m-%d')} 至 {end_date.strftime('%Y-%m-%d')}")
        print(f"\n[请求] 获取发文统计数据")
        print(f" 日期范围: {start_date.strftime('%Y-%m-%d')} 至 {end_date.strftime('%Y-%m-%d')}")

        successful_data = []
        retry_count = 0
        proxy_change_count = 0  # 代理更换次数计数器
        max_proxy_changes = 3  # 最多更换3次代理(即最多使用4个不同代理)

        while retry_count <= max_retries:
            try:
                if retry_count > 0:
                    wait_time = retry_count * 2
                    print(f" [重试 {retry_count}/{max_retries}] 等待 {wait_time} 秒...")
                    time.sleep(wait_time)

                proxies = self.fetch_proxy() if self.use_proxy else None

                # 调试信息:显示代理使用情况
                if self.use_proxy:
                    if proxies:
                        proxy_url = proxies.get('http', '')
                        if '@' in proxy_url:
                            proxy_ip = proxy_url.split('@')[1]
                        else:
                            proxy_ip = proxy_url.replace('http://', '').replace('https://', '')
                        print(f" [代理] 使用IP: {proxy_ip}")
                    else:
                        print(f" [!] 警告:代理未生效!use_proxy={self.use_proxy}")

                response = self.session.get(
                    api_url,
                    headers=headers,
                    params=params,
                    proxies=proxies,
                    timeout=15,
                    verify=False
                )
print(f" 状态码: {response.status_code}")
|
||||
|
||||
if response.status_code == 200:
|
||||
data = response.json()
|
||||
errno = data.get('errno', -1)
|
||||
|
||||
if errno == 0:
|
||||
print(f" [✓] API调用成功")
|
||||
|
||||
# 请求成功,重置代理失败计数
|
||||
self.reset_proxy_fail_count()
|
||||
|
||||
# 检查data字段类型
|
||||
data_field = data.get('data', {})
|
||||
if isinstance(data_field, list):
|
||||
print(f" [X] API返回数据格式异常: data字段为列表而非字典")
|
||||
print(f" 原始响应前500字符: {str(data)[:500]}")
|
||||
break
|
||||
|
||||
if not isinstance(data_field, dict):
|
||||
print(f" [X] API返回数据格式异常: data字段类型为 {type(data_field).__name__}")
|
||||
break
|
||||
|
||||
total_info = data_field.get('total_info', {})
|
||||
print(f"\n 发文统计数据:")
|
||||
print(f" 发文量: {total_info.get('publish_count', '0')}")
|
||||
print(f" 曝光量: {total_info.get('disp_pv', '0')}")
|
||||
print(f" 阅读量: {total_info.get('view_count', '0')}")
|
||||
|
||||
api_result = {
|
||||
'endpoint': '/author/eco/statistics/appStatisticV3',
|
||||
'name': '发文统计',
|
||||
'date_range': f"{start_day} - {end_day}",
|
||||
'data': data,
|
||||
'fetch_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
|
||||
}
|
||||
successful_data.append(api_result)
|
||||
break
|
||||
else:
|
||||
errmsg = data.get('errmsg', '')
|
||||
print(f" [X] API返回错误: errno={errno}, errmsg={errmsg}")
|
||||
|
||||
# 特别处理 errno=10000015(异常请求),这通常是代理未生效
|
||||
if errno == 10000015 and self.use_proxy:
|
||||
print(f" [!] 检测到代理未生效,立即更换新代理")
|
||||
|
||||
# 检查是否超过代理更换上限
|
||||
if proxy_change_count >= max_proxy_changes:
|
||||
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
|
||||
break
|
||||
|
||||
# 立即强制获取新代理
|
||||
self.current_proxy = None
|
||||
self.proxy_fail_count = 0
|
||||
new_proxy = self.fetch_proxy(force_new=True)
|
||||
|
||||
if new_proxy and retry_count < max_retries:
|
||||
proxy_change_count += 1
|
||||
print(f" [!] 已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试...")
|
||||
retry_count += 1
|
||||
continue
|
||||
else:
|
||||
print(f" [X] 无法获取新代理或已达重试上限")
|
||||
break
|
||||
else:
|
||||
# 其他API错误,不重试
|
||||
break
|
||||
else:
|
||||
print(f" [X] HTTP错误: {response.status_code}")
|
||||
break
|
||||
|
||||
except Exception as e:
|
||||
error_type = type(e).__name__
|
||||
is_retry_error = any([
|
||||
'Connection' in error_type,
|
||||
'Timeout' in error_type,
|
||||
'ProxyError' in error_type,
|
||||
])
|
||||
|
||||
if is_retry_error and retry_count < max_retries:
|
||||
print(f" [!] 连接错误: {error_type}")
|
||||
|
||||
# 标记代理失败
|
||||
self.mark_proxy_failed()
|
||||
|
||||
# 如果代理失败次数达到3次,强制更换新代理(第4次重试用新代理)
|
||||
if self.proxy_fail_count >= 3 and self.use_proxy:
|
||||
# 检查是否超过代理更换上限
|
||||
if proxy_change_count >= max_proxy_changes:
|
||||
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
|
||||
break
|
||||
|
||||
print(f" [!] 代理已失败{self.proxy_fail_count}次,强制更换新代理")
|
||||
self.current_proxy = None
|
||||
self.proxy_fail_count = 0
|
||||
new_proxy = self.fetch_proxy(force_new=True)
|
||||
if new_proxy:
|
||||
proxy_change_count += 1
|
||||
print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
|
||||
else:
|
||||
print(f" [X] 无法获取新代理")
|
||||
break
|
||||
|
||||
retry_count += 1
|
||||
continue
|
||||
else:
|
||||
print(f" [X] 请求异常: {e}")
|
||||
break
|
||||
|
||||
if successful_data:
|
||||
return {
|
||||
'apis': successful_data,
|
||||
'count': len(successful_data)
|
||||
}
|
||||
|
||||
return None
|
||||
|
||||
    def fetch_income_for_date(self, max_retries: int = 3) -> Optional[Dict]:
        """获取指定日期的收入数据

        使用overviewhomelist API获取按天的详细收入数据

        Returns:
            收入数据
        """
        import time
        from datetime import timedelta

        # 计算Unix时间戳(从目标日期往前30天,以便获取更多数据)
        end_date = self.target_date
        start_date = end_date - timedelta(days=29)  # 30天范围

        # 转换为Unix时间戳(秒)
        start_timestamp = int(start_date.timestamp())
        end_timestamp = int(end_date.timestamp())

        # 使用overviewhomelist API获取每日收入明细
        api_url = f"{self.base_url}/author/eco/income4/overviewhomelist"

        token_cookie = self.session.cookies.get('bjhStoken') or self.session.cookies.get('devStoken')

        headers = {
            'Accept': 'application/json, text/plain, */*',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Referer': f'{self.base_url}/builder/rc/incomecenter',
            'Sec-Fetch-Dest': 'empty',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Site': 'same-origin',
        }

        if token_cookie:
            headers['token'] = token_cookie

        # 请求参数
        params = {
            'start_date': start_timestamp,
            'end_date': end_timestamp
        }

        print(f"\n[请求] 获取收入数据")
        print(f" 日期范围: {start_date.strftime('%Y-%m-%d')} 至 {end_date.strftime('%Y-%m-%d')}")

        retry_count = 0
        proxy_change_count = 0  # 代理更换次数计数器
        max_proxy_changes = 3  # 最多更换3次代理(即最多使用4个不同代理)

        while retry_count <= max_retries:
            try:
                if retry_count > 0:
                    wait_time = retry_count * 2
                    print(f" [重试 {retry_count}/{max_retries}] 等待 {wait_time} 秒...")
                    time.sleep(wait_time)

                proxies = self.fetch_proxy() if self.use_proxy else None

                # 调试信息:显示代理使用情况
                if self.use_proxy:
                    if proxies:
                        proxy_url = proxies.get('http', '')
                        if '@' in proxy_url:
                            proxy_ip = proxy_url.split('@')[1]
                        else:
                            proxy_ip = proxy_url.replace('http://', '').replace('https://', '')
                        print(f" [代理] 使用IP: {proxy_ip}")
                    else:
                        print(f" [!] 警告:代理未生效!use_proxy={self.use_proxy}")

                response = self.session.get(
                    api_url,
                    headers=headers,
                    params=params,
                    proxies=proxies,
                    timeout=15,
                    verify=False
                )
print(f" 状态码: {response.status_code}")
|
||||
|
||||
if response.status_code == 200:
|
||||
data = response.json()
|
||||
errno = data.get('errno', -1)
|
||||
|
||||
if errno == 0:
|
||||
print(f" [✓] API调用成功")
|
||||
|
||||
# 请求成功,重置代理失败计数
|
||||
self.reset_proxy_fail_count()
|
||||
|
||||
# 提取收入列表
|
||||
income_list = data.get('data', {}).get('list', [])
|
||||
|
||||
if income_list:
|
||||
# 找到目标日期的数据
|
||||
target_timestamp = int(self.target_date.timestamp())
|
||||
target_income_data = None
|
||||
|
||||
for item in income_list:
|
||||
if item.get('day_time') == target_timestamp:
|
||||
target_income_data = item
|
||||
break
|
||||
|
||||
if target_income_data:
|
||||
day_revenue = target_income_data.get('total_income', 0)
|
||||
print(f"\n 收入数据详情:")
|
||||
print(f" {self.target_date_str} 当日收入: ¥{day_revenue:.2f}")
|
||||
|
||||
# 计算近7天收入
|
||||
recent7_revenue = 0.0
|
||||
recent7_start = self.target_date - timedelta(days=6)
|
||||
recent7_start_ts = int(recent7_start.timestamp())
|
||||
for item in income_list:
|
||||
if recent7_start_ts <= item.get('day_time', 0) <= target_timestamp:
|
||||
recent7_revenue += item.get('total_income', 0)
|
||||
print(f" 近7天: ¥{recent7_revenue:.2f}")
|
||||
|
||||
# 计算近30天收入
|
||||
recent30_revenue = sum(item.get('total_income', 0) for item in income_list)
|
||||
print(f" 近30天: ¥{recent30_revenue:.2f}")
|
||||
|
||||
# 计算当月收入(从月初到目标日期)
|
||||
month_start = self.target_date.replace(day=1)
|
||||
month_start_ts = int(month_start.timestamp())
|
||||
current_month_revenue = 0.0
|
||||
for item in income_list:
|
||||
if month_start_ts <= item.get('day_time', 0) <= target_timestamp:
|
||||
current_month_revenue += item.get('total_income', 0)
|
||||
print(f" 当月收入: ¥{current_month_revenue:.2f}")
|
||||
|
||||
# 构造返回数据(与原有格式保持一致)
|
||||
return {
|
||||
'errno': 0,
|
||||
'errmsg': 'success',
|
||||
'data': {
|
||||
'income': {
|
||||
'yesterday': {
|
||||
'income': day_revenue,
|
||||
'value': day_revenue
|
||||
},
|
||||
'recent7Days': {
|
||||
'income': recent7_revenue,
|
||||
'value': recent7_revenue
|
||||
},
|
||||
'recent30Days': {
|
||||
'income': recent30_revenue,
|
||||
'value': recent30_revenue
|
||||
},
|
||||
'currentMonth': {
|
||||
'income': current_month_revenue,
|
||||
'value': current_month_revenue
|
||||
}
|
||||
}
|
||||
},
|
||||
'raw_list': income_list # 保留原始数据
|
||||
}
|
||||
else:
|
||||
print(f" [警告] 未找到 {self.target_date_str} 的收入数据")
|
||||
return None
|
||||
else:
|
||||
print(f" [警告] 收入数据列表为空")
|
||||
return None
|
||||
                    else:
                        errmsg = data.get('errmsg', '')
                        print(f" [X] API返回错误: errno={errno}, errmsg={errmsg}")

                        # 特别处理 errno=10000015(异常请求),这通常是代理未生效
                        if errno == 10000015 and self.use_proxy:
                            print(f" [!] 检测到代理未生效,立即更换新代理")

                            # 检查是否超过代理更换上限
                            if proxy_change_count >= max_proxy_changes:
                                print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
                                return None

                            # 立即强制获取新代理
                            self.current_proxy = None
                            self.proxy_fail_count = 0
                            new_proxy = self.fetch_proxy(force_new=True)

                            if new_proxy and retry_count < max_retries:
                                proxy_change_count += 1
                                print(f" [!] 已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试...")
                                retry_count += 1
                                continue
                            else:
                                print(f" [X] 无法获取新代理或已达重试上限")
                                return None
                        else:
                            # 其他API错误,不重试
                            return None
                else:
                    print(f" [X] HTTP错误: {response.status_code}")
                    return None

            except Exception as e:
                error_type = type(e).__name__
                is_retry_error = any([
                    'Connection' in error_type,
                    'Timeout' in error_type,
                    'ProxyError' in error_type,
                ])

                if is_retry_error and retry_count < max_retries:
                    print(f" [!] 连接错误: {error_type}")

                    # 标记代理失败
                    self.mark_proxy_failed()

                    # 如果代理失败次数达到3次,强制更换新代理(第4次重试用新代理)
                    if self.proxy_fail_count >= 3 and self.use_proxy:
                        # 检查是否超过代理更换上限
                        if proxy_change_count >= max_proxy_changes:
                            print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
                            return None

                        print(f" [!] 代理已失败{self.proxy_fail_count}次,强制更换新代理")
                        self.current_proxy = None
                        self.proxy_fail_count = 0
                        new_proxy = self.fetch_proxy(force_new=True)
                        if new_proxy:
                            proxy_change_count += 1
                            print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
                        else:
                            print(f" [X] 无法获取新代理")
                            return None

                    retry_count += 1
                    continue
                else:
                    print(f" [X] 请求异常: {e}")
                    return None

        return None
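`fetch_income_for_date` derives four figures (当日 / 近7天 / 近30天 / 当月) from a single list of `{'day_time': unix_ts, 'total_income': ...}` entries by filtering on timestamp windows. The aggregation logic, extracted into a standalone sketch (the helper name `income_windows` is hypothetical; the field names match the API response used above):

```python
from datetime import datetime, timedelta

def income_windows(income_list, target_date):
    """Aggregate daily income entries into the day / 7-day / 30-day /
    month-to-date windows, all ending on target_date inclusive."""
    t = int(target_date.timestamp())
    week_start = int((target_date - timedelta(days=6)).timestamp())
    month_start = int(target_date.replace(day=1).timestamp())

    def window_sum(lo):
        # sum entries whose day_time falls in [lo, target_date]
        return sum(i.get('total_income', 0) for i in income_list
                   if lo <= i.get('day_time', 0) <= t)

    return {
        'day': window_sum(t),  # single day: lower bound == upper bound
        'recent7Days': window_sum(week_start),
        'recent30Days': sum(i.get('total_income', 0) for i in income_list),
        'currentMonth': window_sum(month_start),
    }
```

Note that `recent30Days` sums the whole list, which is only correct because the request above already limits the list to a 30-day range.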
    def extract_integrated_data_for_date(self, account_id: str, days: int = 7) -> Optional[Dict]:
        """提取指定账号在指定日期的整合数据

        Args:
            account_id: 账号ID
            days: 查询天数(从target_date往前推)

        Returns:
            整合数据
        """
        import time
        import random

        print(f"\n{'='*70}")
        print(f"开始提取账号数据: {account_id}")
        print(f"目标日期: {self.target_date_str}")
        print(f"{'='*70}")

        if not self.set_account_cookies(account_id):
            return None

        result = {
            'account_id': account_id,
            'fetch_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'target_date': self.target_date_str,
            'status': 'unknown',
            'analytics': {},
            'income': {},
            'error_info': {}
        }

        # 1. 获取发文统计数据
        print("\n[1/2] 获取发文统计数据...")
        api_data = self.fetch_analytics_api_for_date(days=days)
        if api_data:
            result['analytics'] = api_data
            print("[OK] 发文统计数据获取成功")
        else:
            print("[X] 发文统计数据获取失败")
            result['error_info']['analytics'] = 'API调用失败'

        # API调用间隔
        api_delay = random.uniform(2, 4)
        print(f"\n[间隔] 等待 {api_delay:.1f} 秒...")
        time.sleep(api_delay)

        # 2. 获取收入数据
        print("\n[2/2] 获取收入数据...")
        income_data = self.fetch_income_for_date()
        if income_data:
            result['income'] = income_data
            print("[OK] 收入数据获取成功")
        else:
            print("[X] 收入数据获取失败")
            result['error_info']['income'] = 'API调用失败'

        # 设置状态
        if result['analytics'] and result['income']:
            result['status'] = 'success_all'
        elif result['analytics'] or result['income']:
            result['status'] = 'success_partial'
        else:
            result['status'] = 'failed'

        return result
    def extract_all_for_date(self, days: int = 7, delay_seconds: float = 3.0) -> List[Dict]:
        """提取所有账号在指定日期的数据

        Args:
            days: 查询天数
            delay_seconds: 账号间延迟

        Returns:
            所有账号的数据
        """
        import random

        if not self.account_cookies:
            print("[X] 没有可用的账号Cookie")
            return []

        print("\n" + "="*70)
        print(f"开始提取 {len(self.account_cookies)} 个账号的数据")
        print(f"目标日期: {self.target_date_str}")
        print("="*70)

        results = []

        for idx, account_id in enumerate(self.account_cookies.keys(), 1):
            print(f"\n[{idx}/{len(self.account_cookies)}] 处理账号: {account_id}")

            result = self.extract_integrated_data_for_date(account_id, days=days)
            if result:
                results.append(result)

            # 添加延迟
            if idx < len(self.account_cookies):
                actual_delay = delay_seconds * random.uniform(0.7, 1.3)
                print(f"\n[延迟] 等待 {actual_delay:.1f} 秒后继续...")
                import time
                time.sleep(actual_delay)

        return results
    def save_results(self, results: List[Dict]):
        """保存结果到文件(同时备份带日期的副本)

        Args:
            results: 数据分析结果列表
        """
        import json
        import shutil

        try:
            # 1. 保存到主文件(不带时间戳)
            with open(self.output_file, 'w', encoding='utf-8') as f:
                json.dump(results, f, ensure_ascii=False, indent=2)

            print(f"\n{'='*70}")
            print(f"[OK] 数据已保存到: {self.output_file}")

            # 2. 创建带日期的备份文件(文件名只保留日期,不含时分秒)
            timestamp = datetime.now().strftime('%Y%m%d')
            backup_filename = f"bjh_integrated_data_{timestamp}.json"
            backup_file = os.path.join(self.backup_dir, backup_filename)

            # 复制文件到备份目录
            shutil.copy2(self.output_file, backup_file)
            print(f"[OK] 备份已保存到: {backup_file}")
            print(f"{'='*70}")

            # 显示统计
            success_count = sum(1 for r in results if r.get('status', '').startswith('success'))
            print(f"\n统计信息:")
            print(f" - 总账号数: {len(results)}")
            print(f" - 成功获取: {success_count}")
            print(f" - 失败: {len(results) - success_count}")

        except Exception as e:
            print(f"[X] 保存文件失败: {e}")
def main():
    """主函数"""
    parser = argparse.ArgumentParser(
        description='百家号指定日期数据抓取工具',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
示例用法:
  python bjh_analytics_date.py 2025-12-20
  python bjh_analytics_date.py 2025-12-20 --days 7
  python bjh_analytics_date.py 2025-12-20 --proxy
  python bjh_analytics_date.py 2025-12-20 --database
  python bjh_analytics_date.py 2025-12-20 --account "乳腺专家林华"  # 仅测试单个账号
"""
    )

    parser.add_argument(
        'date',
        type=str,
        help='目标日期 (格式: YYYY-MM-DD)'
    )

    parser.add_argument(
        '--days',
        type=int,
        default=7,
        help='查询天数(从目标日期往前推,默认7天)'
    )

    parser.add_argument(
        '--proxy',
        action='store_true',
        default=True,  # 默认启用代理
        help='启用代理(默认启用)'
    )

    parser.add_argument(
        '--no-proxy',
        dest='proxy',
        action='store_false',
        help='禁用代理'
    )

    parser.add_argument(
        '--database',
        action='store_true',
        default=True,  # 默认从数据库加载Cookie
        help='从数据库加载Cookie(默认启用)'
    )

    parser.add_argument(
        '--local',
        dest='database',
        action='store_false',
        help='从本地JSON文件加载Cookie'
    )

    parser.add_argument(
        '--delay',
        type=float,
        default=3.0,
        help='账号间延迟时间(秒,默认3.0)'
    )

    parser.add_argument(
        '--account',
        type=str,
        default=None,
        help='仅抓取指定账号(用于测试),格式:账号名称'
    )

    parser.add_argument(
        '--no-confirm',
        action='store_true',
        help='跳过确认提示,直接开始抓取(用于批量脚本)'
    )

    args = parser.parse_args()

    # 验证日期格式
    try:
        datetime.strptime(args.date, '%Y-%m-%d')
    except ValueError:
        print(f"[X] 日期格式错误: {args.date}")
        print(" 正确格式: YYYY-MM-DD (例如: 2025-12-20)")
        return 1

    print("\n" + "="*70)
    print("百家号指定日期数据抓取工具")
    print("="*70)
    print(f"目标日期: {args.date}")
    print(f"查询天数: {args.days}")
    print(f"使用代理: {'是' if args.proxy else '否'}")
    print(f"数据源: {'数据库' if args.database else '本地文件'}")
    print("="*70)

    try:
        # 创建分析器
        analytics = BaijiahaoDateAnalytics(
            target_date=args.date,
            use_proxy=args.proxy,
            load_from_db=args.database
        )

        if not analytics.account_cookies:
            print("\n[X] 未找到可用的账号Cookie")
            return 1

        # 如果指定了单个账号,验证是否存在
        if args.account:
            if args.account not in analytics.account_cookies:
                print(f"\n[X] 未找到指定账号: {args.account}")
                print(f"\n可用账号列表:")
                for idx, account_name in enumerate(analytics.account_cookies.keys(), 1):
                    print(f" {idx}. {account_name}")
                return 1

            # 只保留指定账号
            analytics.account_cookies = {args.account: analytics.account_cookies[args.account]}
            print(f"\n[测试模式] 仅抓取账号: {args.account}")

        print(f"\n找到 {len(analytics.account_cookies)} 个账号")

        # 确认执行(除非使用--no-confirm参数)
        if not args.no_confirm:
            confirm = input("\n是否开始抓取? (y/n): ").strip().lower()
            if confirm != 'y':
                print("已取消")
                return 0

        # 提取所有账号数据
        results = analytics.extract_all_for_date(
            days=args.days,
            delay_seconds=args.delay
        )

        if results:
            analytics.save_results(results)

            # 显示统计
            success_all = sum(1 for r in results if r.get('status') == 'success_all')
            success_partial = sum(1 for r in results if r.get('status') == 'success_partial')
            failed = sum(1 for r in results if r.get('status') == 'failed')

            print(f"\n{'='*70}")
            print("数据提取统计")
            print(f"{'='*70}")
            print(f" 总账号数: {len(results)}")
            print(f" 全部成功: {success_all}")
            print(f" 部分成功: {success_partial}")
            print(f" 失败: {failed}")
            print(f"{'='*70}")

            return 0
        else:
            print("\n[X] 未获取到任何数据")
            return 1

    except Exception as e:
        print(f"\n[X] 程序执行出错: {e}")
        import traceback
        traceback.print_exc()
        return 1


if __name__ == '__main__':
    sys.exit(main())
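`main()` uses a common argparse idiom for on-by-default switches: `--proxy`/`--no-proxy` (and `--database`/`--local`) are paired flags sharing one `dest`, with the positive flag carrying `default=True`. A minimal sketch of just that idiom, detached from the rest of the CLI:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Paired on/off flags writing to one dest, enabled by default,
    mirroring the --proxy/--no-proxy pair in the script above."""
    p = argparse.ArgumentParser()
    p.add_argument('--proxy', dest='proxy', action='store_true',
                   default=True, help='enable proxy (default)')
    p.add_argument('--no-proxy', dest='proxy', action='store_false',
                   help='disable proxy')
    return p
```

Because both flags share `dest='proxy'`, whichever appears last on the command line wins, and omitting both yields the default `True`.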
@@ -1,6 +1,7 @@
[Unit]
Description=百家号数据同步守护进程
After=network.target
Description=百家号数据同步守护进程(含数据验证与短信告警)
After=network.target mysql.service
Wants=mysql.service

[Service]
Type=simple
@@ -12,8 +13,18 @@ RestartSec=10
StandardOutput=journal
StandardError=journal

# 环境变量(如果需要)
# Environment="PATH=/usr/local/bin:/usr/bin:/bin"
# 环境变量配置
Environment="LOAD_FROM_DB=true"
Environment="USE_PROXY=true"
Environment="DAYS=7"
Environment="MAX_RETRIES=3"
Environment="RUN_NOW=true"
Environment="ENABLE_VALIDATION=true"
Environment="NON_INTERACTIVE=true"

# 阿里云短信服务凭据(可选,也可使用sms_config.json)
# Environment="ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id"
# Environment="ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret"

[Install]
WantedBy=multi-user.target

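守护进程读取上述 `Environment=` 配置时,需要把 `"true"`/`"false"` 字符串解析为布尔值。以下是一个解析示意(函数名 `env_bool` 为本文虚构,仅演示与守护进程中 `os.getenv(...).lower() == 'true'` 相同的约定):

```python
import os

def env_bool(name: str, default: str = 'false') -> bool:
    """读取环境变量并解析为布尔值:仅 'true'(忽略大小写、首尾空白)视为 True。"""
    return os.getenv(name, default).strip().lower() == 'true'

# 演示:模拟 systemd 注入的环境变量
os.environ['ENABLE_VALIDATION'] = 'True'
os.environ['USE_PROXY'] = 'false'
print(env_bool('ENABLE_VALIDATION'), env_bool('USE_PROXY'))
```

未设置的变量走 `default` 参数,避免 `None.lower()` 抛异常。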
@@ -27,12 +27,12 @@ from log_config import setup_bjh_daemon_logger
class BaijiahaoDataDaemon:
    """百家号数据定时更新守护进程"""

    def __init__(self, update_interval_hours: int = 6, use_proxy: bool = False, load_from_db: bool = False):
    def __init__(self, update_interval_hours: int = 1, use_proxy: bool = False, load_from_db: bool = False):
        """
        初始化守护进程

        Args:
            update_interval_hours: 更新间隔(小时),默认6小时
            update_interval_hours: 更新间隔(小时),默认1小时
            use_proxy: 是否使用代理,默认False
            load_from_db: 是否从数据库加载Cookie,默认False
        """
@@ -371,8 +371,8 @@ def main():
    print("\n请配置守护进程参数:\n")

    # 更新间隔
    interval_input = input("1. 更新间隔(小时,默认6小时): ").strip()
    update_interval = int(interval_input) if interval_input.isdigit() and int(interval_input) > 0 else 6
    interval_input = input("1. 更新间隔(小时,默认1小时): ").strip()
    update_interval = int(interval_input) if interval_input.isdigit() and int(interval_input) > 0 else 1

    # 查询天数
    days_input = input("2. 查询天数(默认7天): ").strip()

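交互模式里反复出现 "`isdigit()` 且大于 0 才采用输入,否则回退默认值" 的写法,可以抽成一个小工具函数便于复用。下面直接对解析逻辑做演示,不实际调用 `input`(函数名 `parse_positive_int` 为本文虚构):

```python
def parse_positive_int(text: str, default: int) -> int:
    """输入为正整数字符串时返回其值,否则(空串、0、负数、非数字)返回默认值。"""
    text = text.strip()
    return int(text) if text.isdigit() and int(text) > 0 else default

print(parse_positive_int('12', 6), parse_positive_int('', 6),
      parse_positive_int('0', 6), parse_positive_int('abc', 6))
```

注意 `'-3'.isdigit()` 为 False,因此负数也会安全地落到默认值分支。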
34564 bjh_integrated_data.json (File diff suppressed because it is too large)
0 calc_ip.py Normal file
@@ -4,18 +4,20 @@
数据同步守护进程

功能:
1. 24小时不间断运行
2. 在每天午夜(00:00)自动执行数据抓取和同步
1. 24小时不间断运行(仅在工作时间8:00-24:00执行任务)
2. 每隔1小时自动执行数据抓取和同步
3. 自动执行流程:
   - 从百家号API抓取最新数据
   - 生成CSV文件(包含从数据库查询的author_id)
   - 将CSV数据导入到数据库
4. 支持手动触发刷新
5. 详细的日志记录
6. 非工作时间(0:00-8:00)自动休眠,减少API请求压力

使用场景:
- 24/7运行,每天凌晨自动更新数据
- 24/7运行,在工作时间(8:00-24:00)每隔1小时自动更新数据
- 无需人工干预,自动化数据同步
- 避免在夜间时段进行数据抓取
"""

import sys
@@ -38,11 +40,19 @@ from export_to_csv import DataExporter
from import_csv_to_database import CSVImporter
from log_config import setup_logger

# 导入数据验证与短信告警模块
try:
    from data_validation_with_sms import DataValidationWithSMS
    VALIDATION_AVAILABLE = True
except ImportError:
    print("[!] 数据验证模块未找到,验证功能将不可用")
    VALIDATION_AVAILABLE = False

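这种 "可选依赖 + 功能开关" 的导入方式可以推广为通用模式:导入失败时记录标志位,后续逻辑据此降级而不是崩溃。最小示意如下(模块名 `some_optional_module` 为虚构,正常环境下不存在,因此走降级分支):

```python
try:
    import some_optional_module  # 虚构模块名,仅用于演示 ImportError 分支
    FEATURE_AVAILABLE = True
except ImportError:
    some_optional_module = None
    FEATURE_AVAILABLE = False

def run_feature(data):
    """功能可用时调用可选模块,否则降级为原样返回。"""
    if not FEATURE_AVAILABLE:
        return data
    return some_optional_module.process(data)

print(FEATURE_AVAILABLE, run_feature([1, 2]))
```

与本提交中 `VALIDATION_AVAILABLE` 的用法一致:开关在导入期一次确定,运行期只做分支判断。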
class DataSyncDaemon:
    """数据同步守护进程"""

    def __init__(self, use_proxy: bool = False, load_from_db: bool = True, days: int = 7, max_retries: int = 3):
    def __init__(self, use_proxy: bool = False, load_from_db: bool = True, days: int = 7, max_retries: int = 3, enable_validation: bool = True):
        """初始化守护进程

        Args:
@@ -50,16 +60,28 @@ class DataSyncDaemon:
            load_from_db: 是否从数据库加载Cookie
            days: 抓取最近多少天的数据
            max_retries: 最大重试次数
            enable_validation: 是否启用数据验证与短信告警
        """
        self.script_dir = os.path.dirname(os.path.abspath(__file__))
        self.use_proxy = use_proxy
        self.load_from_db = load_from_db
        self.days = days
        self.max_retries = max_retries
        self.enable_validation = enable_validation and VALIDATION_AVAILABLE

        # 工作时间配置(8:00-24:00)
        self.work_start_hour = 8
        self.work_end_hour = 24

        # 初始化日志
        self.logger = setup_logger('data_sync_daemon', os.path.join(self.script_dir, 'logs', 'data_sync_daemon.log'))

        # 创建验证报告目录
        self.validation_reports_dir = os.path.join(self.script_dir, 'validation_reports')
        if not os.path.exists(self.validation_reports_dir):
            os.makedirs(self.validation_reports_dir)
            self.logger.info(f"创建验证报告目录: {self.validation_reports_dir}")

        # 统计信息
        self.stats = {
            'total_runs': 0,
@@ -76,13 +98,17 @@ class DataSyncDaemon:
        print(f"  使用代理: {'是' if use_proxy else '否'}")
        print(f"  Cookie来源: {'数据库' if load_from_db else '本地文件'}")
        print(f"  抓取天数: {days}天")
        print(f"  工作时间: {self.work_start_hour}:00 - {self.work_end_hour}:00")
        print(f"  错误重试: 最大{max_retries}次")
        print(f"  定时执行: 每天午夜00:00")
        print(f"  定时执行: 每隔1小时")
        print(f"  数据验证: {'已启用' if self.enable_validation else '已禁用'}")
        if self.enable_validation:
            print(f"  短信告警: 验证失败时发送 (错误代码2222)")
        print("="*70 + "\n")

        self.logger.info("="*70)
        self.logger.info("数据同步守护进程启动")
        self.logger.info(f"使用代理: {use_proxy}, Cookie来源: {'数据库' if load_from_db else '本地文件'}, 抓取天数: {days}, 重试: {max_retries}次")
        self.logger.info(f"使用代理: {use_proxy}, Cookie来源: {'数据库' if load_from_db else '本地文件'}, 抓取天数: {days}, 工作时间: {self.work_start_hour}:00-{self.work_end_hour}:00, 重试: {max_retries}次, 验证: {'已启用' if self.enable_validation else '已禁用'}")
        self.logger.info("="*70)

    def fetch_data(self) -> bool:
@@ -294,6 +320,77 @@ class DataSyncDaemon:
            self.logger.error(f"数据库导入失败: {e}", exc_info=True)
            return False

    def validate_data(self) -> bool:
        """步骤4:数据验证与短信告警"""
        if not self.enable_validation:
            print("\n[跳过] 数据验证功能未启用")
            self.logger.info("跳过数据验证(功能未启用)")
            return True

        print("\n" + "="*70)
        print("【步骤4/4】数据验证与短信告警")
        print("="*70)

        try:
            # 等待3秒,确保数据库写入完成
            print("\n等待3秒,确保数据写入完成...")
            self.logger.info("等待3秒以确保数据库写入完成")
            time.sleep(3)

            print("\n执行数据验证...")
            self.logger.info("开始执行数据验证")

            # 创建验证器(验证昨天的数据)
            yesterday = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
            validator = DataValidationWithSMS(date_str=yesterday)

            # 执行验证(JSON + CSV + Database)
            passed = validator.run_validation(
                sources=['json', 'csv', 'database'],
                table='ai_statistics'
            )

            # 生成验证报告
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            report_file = os.path.join(
                self.validation_reports_dir,
                f'validation_report_{timestamp}.txt'
            )
            validator.validator.generate_report(report_file)

            if passed:
                print("\n[✓] 数据验证通过")
                self.logger.info("数据验证通过")
                return True
            else:
                print("\n[X] 数据验证失败,准备发送短信告警")
                self.logger.error("数据验证失败")

                # 生成错误摘要
                error_summary = validator.generate_error_summary()
                self.logger.error(f"错误摘要: {error_summary}")

                # 发送短信告警(错误代码2222)
                sms_sent = validator.send_sms_alert("2222", error_summary)

                if sms_sent:
                    print("[✓] 告警短信已发送")
                    self.logger.info("告警短信发送成功")
                else:
                    print("[X] 告警短信发送失败")
                    self.logger.error("告警短信发送失败")

                print(f"\n详细报告: {report_file}")

                # 验证失败不阻止后续流程,但返回True表示步骤完成
                return True

        except Exception as e:
            print(f"\n[X] 数据验证异常: {e}")
            self.logger.error(f"数据验证异常: {e}", exc_info=True)
            # 验证异常不影响整体流程
            return True

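`validate_data` 的设计要点是:验证失败或抛异常都不应中断同步主流程。这种 "不阻塞的步骤" 可以抽象成一个包装器示意(`safe_step` 为本文虚构的名称,真实代码中异常分支还应写日志):

```python
def safe_step(func, *args, **kwargs):
    """执行步骤函数;任何异常都被捕获并返回 (False, 异常),成功返回 (True, 结果)。"""
    try:
        return True, func(*args, **kwargs)
    except Exception as exc:
        return False, exc

ok, result = safe_step(lambda: 1 + 1)
bad, err = safe_step(lambda: 1 / 0)
print(ok, result, bad, type(err).__name__)
```

主流程只需检查第一个返回值决定是否记录警告,而不必在每个步骤外层各写一遍 try/except。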
    def sync_data(self):
        """执行完整的数据同步流程"""
        start_time = datetime.now()
@@ -317,10 +414,15 @@ class DataSyncDaemon:
        if not self.generate_csv():
            raise Exception("CSV生成失败")

        # 步骤3:导入数据库
        if not self.import_to_database():
            raise Exception("数据库导入失败")

        # 步骤4:数据验证与短信告警
        if not self.validate_data():
            # 验证失败不阻止整体流程,只记录警告
            self.logger.warning("数据验证步骤未成功完成")

        # 成功
        end_time = datetime.now()
        duration = (end_time - start_time).total_seconds()
@@ -370,12 +472,36 @@ class DataSyncDaemon:

        self.logger.info(f"运行统计: 总{self.stats['total_runs']}次, 成功{self.stats['successful_runs']}次, 失败{self.stats['failed_runs']}次")

    def get_next_midnight(self) -> datetime:
        """获取下一个午夜时刻"""
    def is_work_time(self) -> tuple:
        """
        检查当前是否在工作时间内(8:00-24:00)

        Returns:
            tuple: (是否在工作时间内, 距离下次工作时间的秒数)
        """
        now = datetime.now()
        tomorrow = now + timedelta(days=1)
        next_midnight = tomorrow.replace(hour=0, minute=0, second=0, microsecond=0)
        return next_midnight
        current_hour = now.hour

        # 在工作时间内(8:00-23:59)
        if self.work_start_hour <= current_hour < self.work_end_hour:
            return True, 0

        # 不在工作时间内,计算到下个工作时间的秒数
        if current_hour < self.work_start_hour:
            # 今天还没到工作时间
            next_work_time = now.replace(hour=self.work_start_hour, minute=0, second=0, microsecond=0)
        else:
            # 今天已过工作时间,等待明天
            next_work_time = (now + timedelta(days=1)).replace(hour=self.work_start_hour, minute=0, second=0, microsecond=0)

        seconds_until_work = (next_work_time - now).total_seconds()
        return False, seconds_until_work

    def get_next_run_time(self) -> datetime:
        """获取下一次执行时间(1小时后)"""
        now = datetime.now()
        next_run = now + timedelta(hours=1)
        return next_run

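`is_work_time` 的核心逻辑是:工作时段内返回 `(True, 0)`,否则返回到下一个 8:00 的秒数。把它剥离成以固定 `now` 为参数的纯函数便于单测(函数签名为本文虚构,替代 `datetime.now()` 以获得可复现的结果):

```python
from datetime import datetime, timedelta

def work_window(now: datetime, start_hour: int = 8, end_hour: int = 24):
    """返回 (是否在工作时间, 距下次工作时间的秒数)。"""
    if start_hour <= now.hour < end_hour:
        return True, 0.0
    if now.hour < start_hour:
        # 今天还没到工作时间
        next_work = now.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    else:
        # 今天已过工作时间,等待明天
        next_work = (now + timedelta(days=1)).replace(hour=start_hour, minute=0, second=0, microsecond=0)
    return False, (next_work - now).total_seconds()

print(work_window(datetime(2025, 12, 29, 10, 0)))  # 工作时间内
print(work_window(datetime(2025, 12, 29, 6, 0)))   # 还差2小时到8:00
```

注意 `end_hour=24` 时 `now.hour < 24` 恒成立,所以非工作时段实际只由 `start_hour` 决定(即 0:00-8:00),与守护进程的文档描述一致。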
    def run(self):
        """启动守护进程"""
@@ -383,22 +509,51 @@
        print("守护进程已启动")
        print("="*70)

        # 设置定时任务:每天午夜00:00执行
        schedule.every().day.at("00:00").do(self.sync_data)
        # 设置定时任务:每隔1小时执行
        schedule.every(1).hours.do(self.sync_data)

        # 计算下次执行时间
        next_run = self.get_next_midnight()
        next_run = self.get_next_run_time()
        time_until_next = (next_run - datetime.now()).total_seconds()

        print(f"\n下次执行时间: {next_run.strftime('%Y-%m-%d %H:%M:%S')}")
        print(f"距离下次执行: {time_until_next/3600:.1f} 小时")
        print(f"\n执行间隔: 每隔1小时")
        print(f"工作时间: {self.work_start_hour}:00 - {self.work_end_hour}:00(非工作时间自动休眠)")
        print(f"下次执行时间: {next_run.strftime('%Y-%m-%d %H:%M:%S')}")
        print(f"距离下次执行: {time_until_next/60:.1f} 分钟")
        print("\n按 Ctrl+C 可以停止守护进程")
        print("="*70 + "\n")

        self.logger.info(f"守护进程已启动,下次执行时间: {next_run.strftime('%Y-%m-%d %H:%M:%S')}")
        self.logger.info(f"守护进程已启动,执行间隔: 每隔1小时,工作时间: {self.work_start_hour}:00-{self.work_end_hour}:00,下次执行时间: {next_run.strftime('%Y-%m-%d %H:%M:%S')}")

        try:
            while True:
                # 检查是否在工作时间内
                is_work, seconds_until_work = self.is_work_time()

                if not is_work:
                    # 不在工作时间内,等待至工作时间
                    next_work_time = datetime.now() + timedelta(seconds=seconds_until_work)
                    self.logger.info(f"当前非工作时间,等待至 {next_work_time.strftime('%Y-%m-%d %H:%M:%S')}")
                    print(f"\n[休眠] 当前不在工作时间内({self.work_start_hour}:00-{self.work_end_hour}:00)")
                    print(f"[休眠] 下次工作时间: {next_work_time.strftime('%Y-%m-%d %H:%M:%S')}")
                    print(f"[休眠] 等待 {seconds_until_work/3600:.1f} 小时...")

                    # 每30分钟检查一次
                    check_interval = 1800
                    elapsed = 0

                    while elapsed < seconds_until_work:
                        sleep_time = min(check_interval, seconds_until_work - elapsed)
                        time.sleep(sleep_time)
                        elapsed += sleep_time

                        remaining = seconds_until_work - elapsed
                        if remaining > 0:
                            print(f"  距离工作时间还有: {remaining/3600:.1f} 小时 ({datetime.now().strftime('%H:%M:%S')})")

                    continue

                # 在工作时间内,执行定时任务
                schedule.run_pending()
                time.sleep(60)  # 每分钟检查一次

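上面的休眠循环采用 "分段 sleep + 周期性打印剩余时间" 的写法,好处是进程在长等待中能定期产生心跳输出,便于在日志里确认它仍然存活。其骨架可以提炼如下(为便于演示和测试,用回调收集各段时长,代替真实的 `time.sleep`):

```python
def chunked_wait(total_seconds: float, check_interval: float, sleep_fn):
    """把一次长等待拆成若干段,每段调用 sleep_fn;返回各段时长列表。"""
    slices = []
    elapsed = 0.0
    while elapsed < total_seconds:
        step = min(check_interval, total_seconds - elapsed)
        sleep_fn(step)
        elapsed += step
        slices.append(step)
    return slices

calls = []
print(chunked_wait(4000, 1800, calls.append))
```

真实代码中 `sleep_fn` 即 `time.sleep`,最后一段用 `min(...)` 截断,保证总等待时间精确等于目标时长。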
@@ -427,13 +582,15 @@ def main():
        print("  USE_PROXY=true/false - 是否使用代理")
        print("  DAYS=7 - 抓取天数")
        print("  MAX_RETRIES=3 - 重试次数")
        print("  RUN_NOW=true/false - 是否立即执行\n")
        print("  RUN_NOW=true/false - 是否立即执行")
        print("  ENABLE_VALIDATION=true/false - 是否启用验证\n")

        load_from_db = os.getenv('LOAD_FROM_DB', 'true').lower() == 'true'
        use_proxy = os.getenv('USE_PROXY', 'true').lower() == 'true'
        days = int(os.getenv('DAYS', '7'))
        max_retries = int(os.getenv('MAX_RETRIES', '3'))
        run_now = os.getenv('RUN_NOW', 'true').lower() == 'true'
        enable_validation = os.getenv('ENABLE_VALIDATION', 'true').lower() == 'true'
    else:
        # 交互模式:显示菜单
        # 配置选项
@@ -468,9 +625,15 @@ def main():
        except ValueError:
            max_retries = 3

        # 5. 是否立即执行一次
        print("\n5. 是否立即执行一次同步?")
        print("   (否则等待到午夜00:00执行)")
        # 5. 是否启用数据验证
        print("\n5. 是否启用数据验证与短信告警?")
        print("   (每次同步后自动验证数据,失败时发送短信2222)")
        enable_validation_input = input("   (y/n, 默认y): ").strip().lower() or 'y'
        enable_validation = (enable_validation_input == 'y')

        # 6. 是否立即执行一次
        print("\n6. 是否立即执行一次同步?")
        print("   (否则等待到下一个整点小时执行)")
        run_now_input = input("   (y/n, 默认n): ").strip().lower() or 'n'
        run_now = (run_now_input == 'y')

@@ -480,9 +643,13 @@ def main():
    print(f"  Cookie来源: {'数据库' if load_from_db else '本地文件'}")
    print(f"  使用代理: {'是' if use_proxy else '否'}")
    print(f"  抓取天数: {days}天")
    print(f"  工作时间: 8:00 - 24:00(非工作时间自动休眠)")
    print(f"  错误重试: 最大{max_retries}次")
    print(f"  数据验证: {'已启用' if enable_validation else '已禁用'}")
    if enable_validation:
        print(f"  短信告警: 验证失败时发送 (错误代码2222)")
    print(f"  立即执行: {'是' if run_now else '否'}")
    print(f"  定时执行: 每天午夜00:00")
    print(f"  定时执行: 每隔1小时")
    print("="*70)

    confirm = input("\n确认启动守护进程?(y/n): ").strip().lower()
@@ -491,7 +658,13 @@ def main():
        return

    # 创建守护进程
    daemon = DataSyncDaemon(use_proxy=use_proxy, load_from_db=load_from_db, days=days, max_retries=max_retries)
    daemon = DataSyncDaemon(
        use_proxy=use_proxy,
        load_from_db=load_from_db,
        days=days,
        max_retries=max_retries,
        enable_validation=enable_validation
    )

    # 如果选择立即执行,先执行一次
    if run_now:

769 data_validation.py Normal file
@@ -0,0 +1,769 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
数据比对验证脚本

功能:
1. 顺序验证:验证不同数据源中记录的顺序一致性
2. 交叉验证:对比数据内容,识别缺失、新增或不匹配的记录

支持的数据源:
- JSON文件 (bjh_integrated_data.json)
- CSV文件 (ai_statistics_*.csv)
- MySQL数据库 (ai_statistics_* 表)

使用方法:
    # 验证JSON和CSV的一致性
    python data_validation.py --source json csv --date 2025-12-29

    # 验证CSV和数据库的一致性
    python data_validation.py --source csv database --date 2025-12-29

    # 完整验证(三个数据源)
    python data_validation.py --source json csv database --date 2025-12-29

    # 验证特定表
    python data_validation.py --source csv database --table ai_statistics_day --date 2025-12-29
"""

import sys
import os
import json
import csv
import argparse
from datetime import datetime, timedelta
from typing import Dict, List, Tuple, Optional, Any, Set
from collections import OrderedDict
import hashlib

# 设置UTF-8编码
if sys.platform == 'win32':
    import io
    if not isinstance(sys.stdout, io.TextIOWrapper) or sys.stdout.encoding != 'utf-8':
        sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
    if not isinstance(sys.stderr, io.TextIOWrapper) or sys.stderr.encoding != 'utf-8':
        sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')

# 导入数据库配置
try:
    from database_config import DatabaseManager
except ImportError:
    print("[X] 无法导入 database_config.py,数据库验证功能将不可用")
    DatabaseManager = None


class DataValidator:
    """数据比对验证器"""

    def __init__(self, date_str: Optional[str] = None):
        """初始化

        Args:
            date_str: 目标日期 (YYYY-MM-DD),默认为昨天
        """
        self.script_dir = os.path.dirname(os.path.abspath(__file__))

        # 目标日期(默认为昨天)
        if date_str:
            self.target_date = datetime.strptime(date_str, '%Y-%m-%d')
        else:
            # 默认使用昨天的日期
            self.target_date = datetime.now() - timedelta(days=1)

        self.date_str = self.target_date.strftime('%Y-%m-%d')

        # 数据库管理器
        self.db_manager = None
        if DatabaseManager:
            try:
                self.db_manager = DatabaseManager()
                print(f"[OK] 数据库连接成功")
            except Exception as e:
                print(f"[!] 数据库连接失败: {e}")

        # 验证结果
        self.validation_results = {
            '顺序验证': [],
            '交叉验证': [],
            '差异统计': {}
        }

    def load_json_data(self, file_path: Optional[str] = None) -> Optional[Any]:
        """加载JSON数据

        Args:
            file_path: JSON文件路径,默认为 bjh_integrated_data.json

        Returns:
            JSON数据字典
        """
        if not file_path:
            file_path = os.path.join(self.script_dir, 'bjh_integrated_data.json')

        try:
            if not os.path.exists(file_path):
                print(f"[X] JSON文件不存在: {file_path}")
                return None

            with open(file_path, 'r', encoding='utf-8') as f:
                data = json.load(f)

            print(f"[OK] 加载JSON文件: {file_path}")
            print(f"  账号数量: {len(data) if isinstance(data, list) else 1}")
            return data

        except Exception as e:
            print(f"[X] 加载JSON文件失败: {e}")
            return None

    def load_csv_data(self, csv_file: str) -> Optional[List[Dict]]:
        """加载CSV数据

        Args:
            csv_file: CSV文件名

        Returns:
            CSV数据列表
        """
        csv_path = os.path.join(self.script_dir, csv_file)

        try:
            if not os.path.exists(csv_path):
                print(f"[X] CSV文件不存在: {csv_path}")
                return None

            rows = []
            with open(csv_path, 'r', encoding='utf-8-sig') as f:
                reader = csv.DictReader(f)
                rows = list(reader)

            print(f"[OK] 加载CSV文件: {csv_file}")
            print(f"  记录数量: {len(rows)}")
            return rows

        except Exception as e:
            print(f"[X] 加载CSV文件失败: {e}")
            return None

    def load_database_data(self, table_name: str, date_filter: Optional[str] = None) -> Optional[List[Dict]]:
        """从数据库加载数据

        Args:
            table_name: 表名
            date_filter: 日期过滤字段名(如 'date', 'stat_date')

        Returns:
            数据库记录列表
        """
        if not self.db_manager:
            print(f"[X] 数据库管理器未初始化")
            return None

        try:
            # 构建SQL查询
            if date_filter:
                sql = f"SELECT * FROM {table_name} WHERE {date_filter} = %s ORDER BY author_name, channel"
                params = (self.date_str,)
            else:
                sql = f"SELECT * FROM {table_name} ORDER BY author_name, channel"
                params = None

            rows = self.db_manager.execute_query(sql, params)

            print(f"[OK] 加载数据库表: {table_name}")
            if date_filter:
                print(f"  过滤条件: {date_filter} = {self.date_str}")
            print(f"  记录数量: {len(rows) if rows else 0}")

            return rows if rows else []

        except Exception as e:
            print(f"[X] 加载数据库数据失败: {e}")
            import traceback
            traceback.print_exc()
            return None

    def generate_record_key(self, record: Dict, key_fields: List[str]) -> str:
        """生成记录唯一键

        Args:
            record: 数据记录
            key_fields: 主键字段列表

        Returns:
            唯一键字符串
        """
        key_values = []
        for field in key_fields:
            value = record.get(field, '')
            # 统一转为字符串并去除空白
            key_values.append(str(value).strip())

        return '|'.join(key_values)

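`generate_record_key` 把主键字段按顺序拼成 `值1|值2` 形式的字符串,使不同数据源的记录可以按同一把键对齐。独立演示如下(示例数据为虚构):

```python
def record_key(record: dict, key_fields: list) -> str:
    """按给定字段顺序拼接唯一键,值统一转字符串并去除首尾空白。"""
    return '|'.join(str(record.get(f, '')).strip() for f in key_fields)

row = {'author_name': ' 张三 ', 'channel': 1, 'read_count': 100}
print(record_key(row, ['author_name', 'channel']))
```

去空白这一步很关键:CSV 里的 `" 张三 "` 与数据库里的 `"张三"` 会得到同一把键。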
    def calculate_record_hash(self, record: Dict, exclude_fields: Optional[Set[str]] = None) -> str:
        """计算记录的哈希值(用于内容比对)

        Args:
            record: 数据记录
            exclude_fields: 排除的字段集合(如时间戳字段)

        Returns:
            MD5哈希值
        """
        if exclude_fields is None:
            exclude_fields = {'updated_at', 'created_at', 'fetch_time'}

        # 排序字段并生成稳定的字符串
        sorted_items = []
        for key in sorted(record.keys()):
            if key not in exclude_fields:
                value = record.get(key, '')
                # 浮点数保留4位小数
                if isinstance(value, float):
                    value = f"{value:.4f}"
                sorted_items.append(f"{key}={value}")

        content = '|'.join(sorted_items)
        return hashlib.md5(content.encode('utf-8')).hexdigest()

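`calculate_record_hash` 的关键在于三点:排除时间戳类字段、按字段名排序保证顺序稳定、浮点统一到 4 位小数。这样 "内容相同" 的两条记录无论来自哪个数据源都会得到相同的 MD5。下面用同样的规则做一个独立验证(示例数据为虚构):

```python
import hashlib

def content_hash(record: dict, exclude=('updated_at', 'created_at', 'fetch_time')) -> str:
    """对排除时间戳字段后的记录内容计算稳定的 MD5。"""
    items = []
    for key in sorted(record.keys()):
        if key in exclude:
            continue
        value = record[key]
        if isinstance(value, float):
            value = f"{value:.4f}"  # 浮点统一精度,抹平存储误差
        items.append(f"{key}={value}")
    return hashlib.md5('|'.join(items).encode('utf-8')).hexdigest()

a = {'read_count': 10, 'like_rate': 0.12340001, 'updated_at': '2025-12-29 01:00:00'}
b = {'read_count': 10, 'like_rate': 0.1234, 'updated_at': '2025-12-30 02:00:00'}
print(content_hash(a) == content_hash(b))
```

两条记录的 `updated_at` 不同、浮点只有第 8 位小数的差异,规范化后哈希一致。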
    def validate_order(self, source1_data: List[Dict], source2_data: List[Dict],
                       source1_name: str, source2_name: str,
                       key_fields: List[str]) -> Dict:
        """顺序验证:验证两个数据源中记录的顺序是否一致

        Args:
            source1_data: 数据源1的数据
            source2_data: 数据源2的数据
            source1_name: 数据源1名称
            source2_name: 数据源2名称
            key_fields: 主键字段列表

        Returns:
            验证结果字典
        """
        print(f"\n{'='*70}")
        print(f"顺序验证: {source1_name} vs {source2_name}")
        print(f"{'='*70}")

        result = {
            'source1': source1_name,
            'source2': source2_name,
            'source1_count': len(source1_data),
            'source2_count': len(source2_data),
            'order_match': True,
            'mismatches': []
        }

        # 生成记录键列表
        source1_keys = [self.generate_record_key(r, key_fields) for r in source1_data]
        source2_keys = [self.generate_record_key(r, key_fields) for r in source2_data]

        # 比对顺序
        min_len = min(len(source1_keys), len(source2_keys))

        for i in range(min_len):
            if source1_keys[i] != source2_keys[i]:
                result['order_match'] = False
                result['mismatches'].append({
                    'position': i,
                    'source1_key': source1_keys[i],
                    'source2_key': source2_keys[i]
                })

        # 输出结果
        if result['order_match'] and len(source1_keys) == len(source2_keys):
            print(f"[✓] 顺序一致,记录数相同: {len(source1_keys)}")
        else:
            print(f"[X] 顺序不一致")
            print(f"  {source1_name} 记录数: {len(source1_keys)}")
            print(f"  {source2_name} 记录数: {len(source2_keys)}")

            if result['mismatches']:
                print(f"  不匹配位置数: {len(result['mismatches'])}")
                # 显示前5个不匹配
                for mismatch in result['mismatches'][:5]:
                    print(f"    位置{mismatch['position']}: {mismatch['source1_key']} != {mismatch['source2_key']}")

        return result

    def validate_cross(self, source1_data: List[Dict], source2_data: List[Dict],
                       source1_name: str, source2_name: str,
                       key_fields: List[str],
                       compare_fields: Optional[List[str]] = None) -> Dict:
        """交叉验证:对比数据内容,识别缺失、新增或不匹配的记录

        Args:
            source1_data: 数据源1的数据
            source2_data: 数据源2的数据
            source1_name: 数据源1名称
            source2_name: 数据源2名称
            key_fields: 主键字段列表
            compare_fields: 需要对比的字段列表(None表示全部字段)

        Returns:
            验证结果字典
        """
        print(f"\n{'='*70}")
        print(f"交叉验证: {source1_name} vs {source2_name}")
        print(f"{'='*70}")

        # 构建字典:key -> record
        source1_dict = {}
        for record in source1_data:
            key = self.generate_record_key(record, key_fields)
            source1_dict[key] = record

        source2_dict = {}
        for record in source2_data:
            key = self.generate_record_key(record, key_fields)
            source2_dict[key] = record

        # 查找差异
        only_in_source1 = set(source1_dict.keys()) - set(source2_dict.keys())
        only_in_source2 = set(source2_dict.keys()) - set(source1_dict.keys())
        common_keys = set(source1_dict.keys()) & set(source2_dict.keys())

        # 对比共同记录的字段值
        field_mismatches = []
        for key in common_keys:
            record1 = source1_dict[key]
            record2 = source2_dict[key]

            # 确定要比对的字段
            if compare_fields:
                fields_to_compare = compare_fields
            else:
                fields_to_compare = set(record1.keys()) & set(record2.keys())

            # 比对每个字段
            mismatches_in_record = {}
            for field in fields_to_compare:
                val1 = record1.get(field, '')
                val2 = record2.get(field, '')

                # 类型转换和标准化
                val1_normalized = self._normalize_value(val1)
                val2_normalized = self._normalize_value(val2)

                if val1_normalized != val2_normalized:
                    mismatches_in_record[field] = {
                        source1_name: val1,
                        source2_name: val2
                    }

            if mismatches_in_record:
                field_mismatches.append({
                    'key': key,
                    'fields': mismatches_in_record
                })

        # 输出结果
        result = {
            'source1': source1_name,
            'source2': source2_name,
            'source1_count': len(source1_data),
            'source2_count': len(source2_data),
            'only_in_source1': list(only_in_source1),
            'only_in_source2': list(only_in_source2),
            'common_count': len(common_keys),
            'field_mismatches': field_mismatches
        }

        print(f"记录数统计:")
        print(f"  {source1_name}: {len(source1_data)} 条")
        print(f"  {source2_name}: {len(source2_data)} 条")
        print(f"  共同记录: {len(common_keys)} 条")
        print(f"  仅在{source1_name}: {len(only_in_source1)} 条")
        print(f"  仅在{source2_name}: {len(only_in_source2)} 条")
        print(f"  字段不匹配: {len(field_mismatches)} 条")

        # 显示详细差异
        if only_in_source1:
            print(f"\n仅在{source1_name}中的记录(前5条):")
            for key in list(only_in_source1)[:5]:
                print(f"  - {key}")

        if only_in_source2:
            print(f"\n仅在{source2_name}中的记录(前5条):")
            for key in list(only_in_source2)[:5]:
                print(f"  - {key}")

        if field_mismatches:
            print(f"\n字段值不匹配的记录(前3条):")
            for mismatch in field_mismatches[:3]:
                print(f"  记录: {mismatch['key']}")
                for field, values in list(mismatch['fields'].items())[:5]:  # 每条记录最多显示5个字段
                    print(f"    字段 {field}:")
                    print(f"      {source1_name}: {values[source1_name]}")
                    print(f"      {source2_name}: {values[source2_name]}")

        return result

    def _normalize_value(self, value: Any) -> str:
        """标准化值用于比对

        Args:
            value: 原始值

        Returns:
            标准化后的字符串
        """
        if value is None or value == '':
            return ''

        # 浮点数保留4位小数
        if isinstance(value, float):
            return f"{value:.4f}"

        # 整数转字符串
        if isinstance(value, int):
            return str(value)

        # 字符串去除首尾空白
        return str(value).strip()

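`_normalize_value` 的作用是让来自 CSV(全是字符串)与数据库(保留数值类型)的同一个值可比。下面的独立示例演示这一点:数据库中的浮点 `0.1234` 与 CSV 字符串 `"0.1234"` 规范化后相等:

```python
def normalize(value) -> str:
    """None/空串归一为 '',浮点保留4位小数,整数转字符串,其余去除首尾空白。"""
    if value is None or value == '':
        return ''
    if isinstance(value, float):
        return f"{value:.4f}"
    if isinstance(value, int):
        return str(value)
    return str(value).strip()

print(normalize(0.1234) == normalize('0.1234'), repr(normalize(None)), normalize(' 42 '))
```

注意一个已知局限(与原实现一致):浮点 `0.1234` 规范化为 `"0.1234"`,但字符串 `"0.12340"` 不会被当作浮点处理,因此 CSV 导出时需保持与数据库相同的小数位数。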
def validate_ai_statistics(self, sources: List[str]) -> bool:
|
||||
"""验证 ai_statistics 表数据
|
||||
|
||||
Args:
|
||||
sources: 数据源列表 ['json', 'csv', 'database']
|
||||
|
||||
Returns:
|
||||
验证是否通过
|
||||
"""
|
||||
print(f"\n{'#'*70}")
|
||||
print(f"# 验证 ai_statistics 表数据")
|
||||
print(f"# 日期: {self.date_str}")
|
||||
print(f"{'#'*70}")
|
||||
|
||||
# 主键字段
|
||||
key_fields = ['author_name', 'channel']
|
||||
|
||||
# 重要字段
|
||||
compare_fields = [
|
||||
'submission_count', 'read_count', 'comment_count', 'comment_rate',
|
||||
'like_count', 'like_rate', 'favorite_count', 'favorite_rate',
|
||||
'share_count', 'share_rate', 'slide_ratio', 'baidu_search_volume'
|
||||
]
|
||||
|
||||
# 加载数据
|
||||
data_sources = {}
|
||||
|
||||
if 'json' in sources:
|
||||
json_data = self.load_json_data()
|
||||
if json_data:
|
||||
# 确保json_data是列表类型
|
||||
if not isinstance(json_data, list):
|
||||
json_data = [json_data]
|
||||
# 从JSON提取 ai_statistics 数据
|
||||
json_records = self._extract_ai_statistics_from_json(json_data)
|
||||
data_sources['json'] = json_records
|
||||
|
||||
if 'csv' in sources:
|
||||
csv_data = self.load_csv_data('ai_statistics.csv')
|
||||
if csv_data:
|
||||
data_sources['csv'] = csv_data
|
||||
|
||||
if 'database' in sources:
|
||||
db_data = self.load_database_data('ai_statistics', date_filter='date')
|
||||
if db_data:
|
||||
data_sources['database'] = db_data
|
||||
|
||||
# 执行验证
|
||||
if len(data_sources) < 2:
|
||||
print(f"[X] 数据源不足,至少需要2个数据源进行比对")
|
||||
return False
|
||||
|
||||
# 两两比对
|
||||
source_names = list(data_sources.keys())
|
||||
all_passed = True
|
||||
|
||||
for i in range(len(source_names)):
|
||||
for j in range(i + 1, len(source_names)):
|
||||
source1_name = source_names[i]
|
||||
source2_name = source_names[j]
|
||||
|
||||
# 只对 json vs csv 进行顺序验证
|
||||
if (source1_name == 'json' and source2_name == 'csv') or \
|
||||
(source1_name == 'csv' and source2_name == 'json'):
|
||||
# 顺序验证
|
||||
order_result = self.validate_order(
|
||||
data_sources[source1_name],
|
||||
data_sources[source2_name],
|
||||
source1_name,
|
||||
source2_name,
|
||||
key_fields
|
||||
)
|
||||
self.validation_results['顺序验证'].append(order_result)
|
||||
|
||||
if not order_result['order_match']:
|
||||
all_passed = False
|
||||
|
||||
# 交叉验证(所有组合都执行)
|
||||
cross_result = self.validate_cross(
|
||||
data_sources[source1_name],
|
||||
data_sources[source2_name],
|
||||
source1_name,
|
||||
source2_name,
|
||||
key_fields,
|
||||
compare_fields
|
||||
)
|
||||
self.validation_results['交叉验证'].append(cross_result)
|
||||
|
||||
# 判断是否通过
|
||||
if cross_result['only_in_source1'] or \
|
||||
cross_result['only_in_source2'] or \
|
||||
cross_result['field_mismatches']:
|
||||
all_passed = False
|
||||
|
||||
return all_passed
|
||||
|
||||
def validate_ai_statistics_day(self, sources: List[str]) -> bool:
|
||||
"""验证 ai_statistics_day 表数据
|
||||
|
||||
Args:
|
||||
sources: 数据源列表
|
||||
|
||||
Returns:
|
||||
验证是否通过
|
||||
"""
|
||||
print(f"\n{'#'*70}")
|
||||
print(f"# 验证 ai_statistics_day 表数据")
|
||||
print(f"# 日期: {self.date_str}")
|
||||
print(f"{'#'*70}")
|
||||
|
||||
key_fields = ['author_name', 'channel', 'stat_date']
|
||||
compare_fields = [
|
||||
'total_submission_count', 'total_read_count', 'total_comment_count',
|
||||
'total_like_count', 'total_favorite_count', 'total_share_count',
|
||||
'avg_comment_rate', 'avg_like_rate', 'avg_favorite_rate',
|
||||
'avg_share_rate', 'avg_slide_ratio', 'total_baidu_search_volume'
|
||||
]
|
||||
|
||||
# 加载数据
|
||||
data_sources = {}
|
||||
|
||||
if 'csv' in sources:
|
||||
csv_data = self.load_csv_data('ai_statistics_day.csv')
|
||||
if csv_data:
|
||||
data_sources['csv'] = csv_data
|
||||
|
||||
if 'database' in sources:
|
||||
db_data = self.load_database_data('ai_statistics_day', date_filter='stat_date')
|
||||
if db_data:
|
||||
data_sources['database'] = db_data
|
||||
|
||||
if len(data_sources) < 2:
|
||||
print(f"[X] 数据源不足")
|
||||
return False
|
||||
|
||||
# 执行验证
|
||||
source_names = list(data_sources.keys())
|
||||
all_passed = True
|
||||
|
||||
for i in range(len(source_names)):
|
||||
for j in range(i + 1, len(source_names)):
|
||||
source1_name = source_names[i]
|
||||
source2_name = source_names[j]
|
||||
|
||||
# ai_statistics_day 表不需要顺序验证,只执行交叉验证
|
||||
cross_result = self.validate_cross(
|
||||
data_sources[source1_name],
|
||||
data_sources[source2_name],
|
||||
source1_name,
|
||||
source2_name,
|
||||
key_fields,
|
||||
compare_fields
|
||||
)
|
||||
self.validation_results['交叉验证'].append(cross_result)
|
||||
|
||||
if cross_result['only_in_source1'] or \
|
||||
cross_result['only_in_source2'] or \
|
||||
cross_result['field_mismatches']:
|
||||
all_passed = False
|
||||
|
||||
return all_passed
|
||||
|
||||
    def _extract_ai_statistics_from_json(self, json_data: List[Dict]) -> List[Dict]:
        """Extract ai_statistics-format records from JSON data.

        Args:
            json_data: JSON data

        Returns:
            List of records in ai_statistics format
        """
        records = []

        for account_data in json_data:
            account_id = account_data.get('account_id', '')
            if not account_id:
                continue

            analytics = account_data.get('analytics', {})
            apis = analytics.get('apis', [])

            if apis:
                api_data = apis[0].get('data', {})
                if api_data.get('errno') == 0:
                    total_info = api_data.get('data', {}).get('total_info', {})

                    record = {
                        'author_name': account_id,
                        'channel': 1,
                        'submission_count': int(total_info.get('publish_count', 0) or 0),
                        'read_count': int(total_info.get('view_count', 0) or 0),
                        'comment_count': int(total_info.get('comment_count', 0) or 0),
                        'comment_rate': float(total_info.get('comment_rate', 0) or 0) / 100,
                        'like_count': int(total_info.get('likes_count', 0) or 0),
                        'like_rate': float(total_info.get('likes_rate', 0) or 0) / 100,
                        'favorite_count': int(total_info.get('collect_count', 0) or 0),
                        'favorite_rate': float(total_info.get('collect_rate', 0) or 0) / 100,
                        'share_count': int(total_info.get('share_count', 0) or 0),
                        'share_rate': float(total_info.get('share_rate', 0) or 0) / 100,
                        'slide_ratio': float(total_info.get('pic_slide_rate', 0) or 0) / 100,
                        'baidu_search_volume': int(total_info.get('disp_pv', 0) or 0)
                    }
                    records.append(record)

        return records
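A standalone sketch of the conversion inside the method above: the upstream payload reports rates as percentages (1.5 means 1.5%), hence the division by 100, and `or 0` maps explicit `None` values to zero before the numeric cast. The sample payload is illustrative, not real API output:

```python
# Mirrors the field conversion in _extract_ai_statistics_from_json above.
def extract_record(account_id: str, total_info: dict) -> dict:
    """Convert one account's total_info block into a flat record."""
    return {
        'author_name': account_id,
        'channel': 1,
        'submission_count': int(total_info.get('publish_count', 0) or 0),
        'read_count': int(total_info.get('view_count', 0) or 0),   # None -> 0
        'comment_rate': float(total_info.get('comment_rate', 0) or 0) / 100,
    }

sample = {'publish_count': 12, 'view_count': None, 'comment_rate': '1.5'}
record = extract_record('demo_account', sample)
```

Note that a plain `.get('view_count', 0)` would return `None` here (the key exists), which is why the `or 0` guard matters.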
    def generate_report(self, output_file: Optional[str] = None) -> None:
        """Generate the validation report.

        Args:
            output_file: output file path
        """
        if not output_file:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            output_file = os.path.join(self.script_dir, f'validation_report_{timestamp}.txt')

        try:
            with open(output_file, 'w', encoding='utf-8') as f:
                f.write("Data validation report\n")
                f.write(f"{'='*70}\n")
                f.write(f"Generated at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
                f.write(f"Target date: {self.date_str}\n\n")

                # Sequential validation results
                f.write("\nSequential validation results\n")
                f.write(f"{'-'*70}\n")
                for result in self.validation_results['顺序验证']:
                    f.write(f"{result['source1']} vs {result['source2']}\n")
                    f.write(f"  Order match: {'yes' if result['order_match'] else 'no'}\n")
                    f.write(f"  {result['source1']} record count: {result['source1_count']}\n")
                    f.write(f"  {result['source2']} record count: {result['source2_count']}\n")
                    if result['mismatches']:
                        f.write(f"  Mismatches: {len(result['mismatches'])}\n")
                    f.write("\n")

                # Cross-validation results
                f.write("\nCross-validation results\n")
                f.write(f"{'-'*70}\n")
                for result in self.validation_results['交叉验证']:
                    f.write(f"{result['source1']} vs {result['source2']}\n")
                    f.write(f"  Common records: {result['common_count']}\n")
                    f.write(f"  Only in {result['source1']}: {len(result['only_in_source1'])}\n")
                    f.write(f"  Only in {result['source2']}: {len(result['only_in_source2'])}\n")
                    f.write(f"  Field mismatches: {len(result['field_mismatches'])}\n")
                    f.write("\n")

            print(f"\n[OK] Validation report generated: {output_file}")

        except Exception as e:
            print(f"[X] Failed to generate report: {e}")
def main():
    """Entry point."""
    parser = argparse.ArgumentParser(
        description='Data comparison and validation script',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Validate JSON and CSV
  python data_validation.py --source json csv --date 2025-12-29

  # Validate CSV and the database
  python data_validation.py --source csv database --date 2025-12-29

  # Full validation (all three data sources)
  python data_validation.py --source json csv database --date 2025-12-29

  # Validate a specific table
  python data_validation.py --source csv database --table ai_statistics_day --date 2025-12-29
"""
    )

    parser.add_argument(
        '--source',
        nargs='+',
        choices=['json', 'csv', 'database'],
        default=['json', 'csv', 'database'],
        help='List of data sources (at least 2)'
    )

    parser.add_argument(
        '--date',
        type=str,
        default=(datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d'),
        help='Target date (YYYY-MM-DD); defaults to yesterday'
    )

    parser.add_argument(
        '--table',
        type=str,
        choices=['ai_statistics', 'ai_statistics_day', 'ai_statistics_days'],
        default='ai_statistics',
        help='Table to validate'
    )

    parser.add_argument(
        '--report',
        type=str,
        help='Output report file path'
    )

    args = parser.parse_args()

    # Require at least two data sources
    if len(args.source) < 2:
        print("[X] At least 2 data sources are required for comparison")
        return 1

    # Create the validator
    validator = DataValidator(date_str=args.date)

    # Run validation
    try:
        if args.table == 'ai_statistics':
            passed = validator.validate_ai_statistics(args.source)
        elif args.table == 'ai_statistics_day':
            passed = validator.validate_ai_statistics_day(args.source)
        else:
            print(f"[!] Validation for table {args.table} is not implemented yet")
            passed = False

        # Generate the report
        validator.generate_report(args.report)

        # Print the summary
        print(f"\n{'='*70}")
        if passed:
            print("[✓] Validation passed: all data sources are consistent")
        else:
            print("[X] Validation failed: data differences found")
        print(f"{'='*70}")

        return 0 if passed else 1

    except Exception as e:
        print(f"\n[X] Validation error: {e}")
        import traceback
        traceback.print_exc()
        return 1


if __name__ == '__main__':
    sys.exit(main())
441
data_validation_with_sms.py
Normal file
@@ -0,0 +1,441 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Data validation with SMS alerting.

Features:
1. Run data validation (JSON/CSV/database)
2. Send an Alibaba Cloud SMS alert if validation fails
3. Support scheduled execution (daily at 9:00)

Usage:
  # Run one validation manually
  python data_validation_with_sms.py

  # Validate a specific date
  python data_validation_with_sms.py --date 2025-12-29

  # Configure a scheduled task (Windows Task Scheduler)
  python data_validation_with_sms.py --setup-schedule
"""

import sys
import os
import json
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Add the project root to the import path
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

# Data validation module
from data_validation import DataValidator

# Alibaba Cloud SMS SDK (optional dependency)
try:
    from alibabacloud_dysmsapi20170525.client import Client as Dysmsapi20170525Client
    from alibabacloud_credentials.client import Client as CredentialClient
    from alibabacloud_credentials.models import Config as CredentialConfig
    from alibabacloud_tea_openapi import models as open_api_models
    from alibabacloud_dysmsapi20170525 import models as dysmsapi_20170525_models
    from alibabacloud_tea_util import models as util_models
    SMS_AVAILABLE = True
except ImportError:
    print("[!] Alibaba Cloud SMS SDK is not installed; SMS features are disabled")
    print("    Install with: pip install alibabacloud_dysmsapi20170525")
    SMS_AVAILABLE = False
class SMSAlertConfig:
    """SMS alert configuration."""

    def __init__(self):
        """Load configuration from a config file or environment variables."""
        # Try the config file first
        config_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'sms_config.json')
        config_data = {}

        if os.path.exists(config_file):
            try:
                with open(config_file, 'r', encoding='utf-8') as f:
                    config_data = json.load(f)
            except Exception as e:
                print(f"[!] Failed to read config file: {e}")

        # Alibaba Cloud credentials (environment variables take precedence)
        self.ACCESS_KEY_ID = os.environ.get(
            'ALIBABA_CLOUD_ACCESS_KEY_ID',
            config_data.get('access_key_id', 'LTAI5tSMvnCJdqkZtCVWgh8R')
        )
        self.ACCESS_KEY_SECRET = os.environ.get(
            'ALIBABA_CLOUD_ACCESS_KEY_SECRET',
            config_data.get('access_key_secret', 'nyFzXyIi47peVLK4wR2qqbPezmU79W')
        )

        # SMS sign name and template
        self.SIGN_NAME = config_data.get('sign_name', '北京乐航时代科技')
        self.TEMPLATE_CODE = config_data.get('template_code', 'SMS_486210104')

        # Recipient phone numbers (comma-separated for multiple recipients)
        self.PHONE_NUMBERS = config_data.get('phone_numbers', '13621242430')

        # SMS endpoint
        self.ENDPOINT = config_data.get('endpoint', 'dysmsapi.aliyuncs.com')

    @staticmethod
    def get_instance():
        """Return the configuration instance (singleton)."""
        if not hasattr(SMSAlertConfig, '_instance'):
            SMSAlertConfig._instance = SMSAlertConfig()
        return SMSAlertConfig._instance
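Given the keys read by `SMSAlertConfig` above, a matching `sms_config.json` would look like the fragment below. The values are placeholders; real credentials should come from the `ALIBABA_CLOUD_ACCESS_KEY_ID` / `ALIBABA_CLOUD_ACCESS_KEY_SECRET` environment variables rather than being committed to the repository.

```json
{
  "access_key_id": "YOUR_ACCESS_KEY_ID",
  "access_key_secret": "YOUR_ACCESS_KEY_SECRET",
  "sign_name": "北京乐航时代科技",
  "template_code": "SMS_486210104",
  "phone_numbers": "13800000000,13900000000",
  "endpoint": "dysmsapi.aliyuncs.com"
}
```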
class DataValidationWithSMS:
    """Runs data validation and sends an SMS alert on failure."""

    def __init__(self, date_str: Optional[str] = None):
        """Initialize.

        Args:
            date_str: target date (YYYY-MM-DD); defaults to yesterday
        """
        self.validator = DataValidator(date_str)
        self.sms_client = None
        self.sms_config = SMSAlertConfig.get_instance()

        if SMS_AVAILABLE:
            self.sms_client = self._create_sms_client()

    def _create_sms_client(self) -> Optional['Dysmsapi20170525Client']:
        """Create the Alibaba Cloud SMS client.

        The return annotation is quoted so that it does not raise a NameError
        when the SDK import above has failed.

        Returns:
            SMS client instance, or None on failure
        """
        try:
            credential_config = CredentialConfig(
                type='access_key',
                access_key_id=self.sms_config.ACCESS_KEY_ID,
                access_key_secret=self.sms_config.ACCESS_KEY_SECRET
            )
            credential = CredentialClient(credential_config)
            config = open_api_models.Config(
                credential=credential,
                endpoint=self.sms_config.ENDPOINT
            )
            return Dysmsapi20170525Client(config)
        except Exception as e:
            print(f"[X] Failed to create SMS client: {e}")
            return None
    def send_sms_alert(self, error_code: str, error_details: str) -> bool:
        """Send an SMS alert.

        Args:
            error_code: error code (e.g. "2222")
            error_details: error details

        Returns:
            True if the SMS was sent successfully
        """
        if not self.sms_client:
            print("[X] SMS client is not initialized; cannot send alert")
            return False

        try:
            # Build the SMS request
            send_sms_request = dysmsapi_20170525_models.SendSmsRequest(
                phone_numbers=self.sms_config.PHONE_NUMBERS,
                sign_name=self.sms_config.SIGN_NAME,
                template_code=self.sms_config.TEMPLATE_CODE,
                template_param=json.dumps({"code": error_code})
            )

            runtime = util_models.RuntimeOptions()

            print("\n[SMS] Sending alert SMS...")
            print(f"  Recipients: {self.sms_config.PHONE_NUMBERS}")
            print(f"  Error code: {error_code}")
            print(f"  Error details: {error_details[:100]}...")

            # Send the SMS
            resp = self.sms_client.send_sms_with_options(send_sms_request, runtime)

            # Check the response
            result = resp.to_map()
            if result.get('body', {}).get('Code') == 'OK':
                print("[✓] SMS sent successfully")
                print(f"  Request ID: {result.get('body', {}).get('RequestId')}")
                print(f"  Message ID: {result.get('body', {}).get('BizId')}")
                return True
            else:
                print("[X] SMS sending failed")
                print(f"  Error code: {result.get('body', {}).get('Code')}")
                print(f"  Error message: {result.get('body', {}).get('Message')}")
                return False

        except Exception as e:
            print(f"[X] SMS sending raised an exception: {e}")
            if hasattr(e, 'data') and e.data:
                print(f"  Diagnostic URL: {e.data.get('Recommend')}")
            return False
    def run_validation(self, sources: Optional[List[str]] = None, table: str = 'ai_statistics') -> bool:
        """Run data validation.

        Args:
            sources: list of data sources; defaults to ['json', 'csv', 'database']
            table: table to validate

        Returns:
            True if validation passed
        """
        if sources is None:
            sources = ['json', 'csv', 'database']

        print(f"\n{'='*70}")
        print("Data validation with SMS alerting")
        print(f"{'='*70}")
        print(f"Validation date: {self.validator.date_str}")
        print(f"Table: {table}")
        print(f"Data sources: {', '.join(sources)}")
        print(f"{'='*70}")

        try:
            # Run validation
            if table == 'ai_statistics':
                passed = self.validator.validate_ai_statistics(sources)
            elif table == 'ai_statistics_day':
                passed = self.validator.validate_ai_statistics_day(sources)
            elif table == 'ai_statistics_days':
                # TODO: implement ai_statistics_days validation
                print(f"[!] Validation for table {table} is not implemented yet")
                passed = False
            else:
                print(f"[X] Unknown table name: {table}")
                passed = False

            return passed

        except Exception as e:
            print(f"\n[X] Validation error: {e}")
            import traceback
            traceback.print_exc()
            return False
    def generate_error_summary(self) -> str:
        """Build a short error summary.

        Returns:
            Summary string
        """
        results = self.validator.validation_results

        summary_lines = []
        summary_lines.append(f"Date: {self.validator.date_str}")

        # Order-validation errors
        order_errors = [r for r in results['顺序验证'] if not r['order_match']]
        if order_errors:
            summary_lines.append(f"Order mismatches: {len(order_errors)}")

        # Cross-validation errors
        cross_errors = []
        for r in results['交叉验证']:
            if r['only_in_source1'] or r['only_in_source2'] or r['field_mismatches']:
                cross_errors.append(r)

        if cross_errors:
            summary_lines.append(f"Inconsistent source pairs: {len(cross_errors)}")

        # Detailed counts
        total_missing = sum(len(r['only_in_source1']) for r in cross_errors)
        total_extra = sum(len(r['only_in_source2']) for r in cross_errors)
        total_diff = sum(len(r['field_mismatches']) for r in cross_errors)

        if total_missing:
            summary_lines.append(f"  Missing records: {total_missing}")
        if total_extra:
            summary_lines.append(f"  Extra records: {total_extra}")
        if total_diff:
            summary_lines.append(f"  Field differences: {total_diff}")

        return '; '.join(summary_lines)
    def run_with_alert(self, sources: Optional[List[str]] = None, table: str = 'ai_statistics') -> int:
        """Run validation and send an alert on failure.

        Args:
            sources: list of data sources
            table: table to validate

        Returns:
            Exit code (0 = success, 1 = failure)
        """
        # Run validation
        passed = self.run_validation(sources, table)

        # Create the report directory
        script_dir = os.path.dirname(os.path.abspath(__file__))
        validation_reports_dir = os.path.join(script_dir, 'validation_reports')
        if not os.path.exists(validation_reports_dir):
            os.makedirs(validation_reports_dir)

        # Generate the report
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        report_file = os.path.join(
            validation_reports_dir,
            f'validation_report_{timestamp}.txt'
        )
        self.validator.generate_report(report_file)

        # Decide whether an alert is needed
        if not passed:
            print(f"\n{'='*70}")
            print("[!] Validation failed; preparing SMS alert")
            print(f"{'='*70}")

            # Build the error summary
            error_summary = self.generate_error_summary()

            # Send the SMS (error code is fixed to "2222")
            sms_sent = self.send_sms_alert("2222", error_summary)

            if sms_sent:
                print("\n[✓] Alert SMS sent")
            else:
                print("\n[X] Failed to send alert SMS")

            print(f"\nDetailed report: {report_file}")
            return 1
        else:
            print(f"\n{'='*70}")
            print("[✓] Validation passed; no alert needed")
            print(f"{'='*70}")
            return 0
def setup_windows_task_scheduler():
    """Print instructions for a Windows scheduled task (daily at 9:00)."""
    print(f"\n{'='*70}")
    print("Configure Windows Task Scheduler")
    print(f"{'='*70}")

    script_path = os.path.abspath(__file__)
    python_path = sys.executable

    # Scheduled task name
    task_name = "DataValidationWithSMS"

    print("\nCreate the scheduled task manually, or run the following PowerShell commands:\n")

    # PowerShell commands
    ps_command = f"""
# Create the scheduled task
$action = New-ScheduledTaskAction -Execute '{python_path}' -Argument '{script_path}'
$trigger = New-ScheduledTaskTrigger -Daily -At 9:00AM
$settings = New-ScheduledTaskSettingsSet -AllowStartIfOnBatteries -DontStopIfGoingOnBatteries
$principal = New-ScheduledTaskPrincipal -UserId "$env:USERNAME" -RunLevel Highest

Register-ScheduledTask -TaskName "{task_name}" -Action $action -Trigger $trigger -Settings $settings -Principal $principal -Description "Run data validation daily at 9:00 and send SMS alerts"

Write-Host "Scheduled task created: {task_name}"
"""

    print(ps_command)

    print("\nOr configure it manually:")
    print("1. Open 'Task Scheduler' (taskschd.msc)")
    print("2. Create a basic task")
    print(f"3. Name: {task_name}")
    print("4. Trigger: daily at 9:00 AM")
    print("5. Action: start a program")
    print(f"6. Program: {python_path}")
    print(f"7. Arguments: {script_path}")
    print("8. Finish")

    print(f"\n{'='*70}")
def main():
    """Entry point."""
    import argparse

    parser = argparse.ArgumentParser(
        description='Data validation with SMS alerting',
        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument(
        '--date',
        type=str,
        help='Target date (YYYY-MM-DD); defaults to yesterday'
    )

    parser.add_argument(
        '--source',
        nargs='+',
        choices=['json', 'csv', 'database'],
        default=['json', 'csv', 'database'],
        help='List of data sources'
    )

    parser.add_argument(
        '--table',
        type=str,
        choices=['ai_statistics', 'ai_statistics_day', 'ai_statistics_days'],
        default='ai_statistics',
        help='Table to validate'
    )

    parser.add_argument(
        '--setup-schedule',
        action='store_true',
        help='Configure the scheduled task (daily at 9:00)'
    )

    parser.add_argument(
        '--test-sms',
        action='store_true',
        help='Send a test SMS'
    )

    parser.add_argument(
        '--no-sms',
        action='store_true',
        help='Disable SMS sending (validation only)'
    )

    args = parser.parse_args()

    # Configure the scheduled task
    if args.setup_schedule:
        setup_windows_task_scheduler()
        return 0

    # Test SMS sending
    if args.test_sms:
        print(f"\n{'='*70}")
        print("Testing SMS sending")
        print(f"{'='*70}")

        validator = DataValidationWithSMS()
        success = validator.send_sms_alert(
            "2222",
            "This is a test SMS; the data validation system is running normally"
        )
        return 0 if success else 1

    # Run validation
    try:
        validator = DataValidationWithSMS(date_str=args.date)
        return validator.run_with_alert(args.source, args.table)
    except Exception as e:
        print(f"\n[X] Program failed: {e}")
        import traceback
        traceback.print_exc()
        return 1


if __name__ == '__main__':
    sys.exit(main())
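The script above relies on Windows Task Scheduler for its daily 9:00 run. On platforms without it, an in-process scheduler only needs to compute the delay until the next 9:00 slot. The helper below is a minimal sketch under that assumption and is not part of the scripts above; the function name is hypothetical.

```python
from datetime import datetime, timedelta

def seconds_until(hour: int, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of `hour`:00."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # past today's slot; use tomorrow's
    return (target - now).total_seconds()

# Usage sketch: time.sleep(seconds_until(9, datetime.now())) before each run.
```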
@@ -28,70 +28,19 @@ CREATE TABLE `ai_statistics_days` (
  `channel` tinyint(1) NOT NULL DEFAULT 1 COMMENT '1=baidu|2=toutiao|3=weixin',
  `stat_date` date NOT NULL COMMENT 'Statistics date (calendar day)',
  `daily_published_count` int NULL DEFAULT 0 COMMENT 'Articles published on the day',
  `cumulative_published_count` int NULL DEFAULT 0 COMMENT 'Cumulative published count (total from the start date through stat_date)',
  `monthly_revenue` decimal(18, 2) NULL DEFAULT 0.00 COMMENT 'Monthly revenue (total revenue for the calendar month containing stat_date)',
  `weekly_revenue` decimal(18, 2) NULL DEFAULT 0.00 COMMENT 'Weekly revenue (total revenue for the calendar week containing stat_date, Monday through Sunday)',
  `revenue_mom_growth_rate` decimal(10, 6) NULL DEFAULT 0.000000 COMMENT 'Month-over-month revenue growth rate ((this month - last month) / NULLIF(last month, 0))',
  `revenue_wow_growth_rate` decimal(10, 6) NULL DEFAULT 0.000000 COMMENT 'Week-over-week revenue growth rate ((this week - last week) / NULLIF(last week, 0))',
  `cumulative_published_count` int NULL DEFAULT 0 COMMENT 'Cumulative published count (from the 1st of the month through stat_date)',
  `day_revenue` decimal(18, 2) NULL DEFAULT 0.00 COMMENT 'Revenue for the day',
  `created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
  `updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE INDEX `uk_stat_date`(`stat_date` ASC) USING BTREE,
  INDEX `idx_stat_date`(`stat_date` ASC) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 51 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci COMMENT = 'Daily AI-content core-metric summary table (with cumulative totals, revenue, and growth rates)' ROW_FORMAT = Dynamic;
  UNIQUE INDEX `uk_author_stat_date`(`author_id` ASC, `channel` ASC, `stat_date` ASC) USING BTREE,
  INDEX `idx_stat_date`(`stat_date` ASC) USING BTREE,
  INDEX `idx_author_id`(`author_id` ASC) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci COMMENT = 'Daily AI-content core-metric summary table (day-granularity data)' ROW_FORMAT = Dynamic;

-- ----------------------------
-- Records of ai_statistics_days
-- ----------------------------
INSERT INTO `ai_statistics_days` VALUES (1, 129, '梁金宇医生', 1, '2025-10-28', 27, 27, 198.44, 198.44, 0.000000, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (2, 127, '黄燕飞医生', 1, '2025-10-29', 6, 33, 382.29, 382.29, 0.000000, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (3, 151, '皮肤科赵鹏', 1, '2025-10-30', 30, 63, 1317.62, 1317.62, 0.000000, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (4, 132, '石爱真医生', 1, '2025-10-31', 22, 85, 1435.84, 1435.84, 0.000000, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (5, 211, '中医王倚东', 1, '2025-11-01', 27, 112, 116.15, 1551.99, -0.919107, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (6, 176, '武娜中医', 1, '2025-11-02', 11, 123, 1025.18, 2461.02, -0.286007, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (7, 193, '血管外科钟若雷', 1, '2025-11-03', 6, 129, 1462.23, 437.05, 0.018379, -0.822411, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (8, 104, '男科医生刘德风', 1, '2025-11-04', 5, 134, 2050.55, 1025.37, 0.428119, -0.583356, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (9, 175, '静脉曲张的杀手医生', 1, '2025-11-05', 12, 146, 3004.99, 1979.81, 1.092845, -0.195533, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (10, 202, '整形外科侯丽平', 1, '2025-11-06', 26, 172, 3260.49, 2235.31, 1.270789, -0.091714, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (11, 117, '唐小明医生', 1, '2025-11-07', 13, 185, 4064.21, 3039.03, 1.830545, 0.234866, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (12, 214, '传海2018', 1, '2025-11-08', 12, 197, 4961.73, 3936.55, 2.455629, 0.599560, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (13, 170, '任志宏中医', 1, '2025-11-09', 18, 215, 5160.70, 4135.52, 2.594203, 0.680409, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (14, 179, '风湿免疫专家李小峰', 1, '2025-11-10', 18, 233, 5794.59, 633.89, 3.035679, -0.846721, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (15, 202, '整形外科侯丽平', 1, '2025-11-11', 14, 247, 6673.98, 1513.28, 3.648136, -0.634077, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (16, 203, '中医针灸侯医生', 1, '2025-11-12', 19, 266, 7412.32, 2251.62, 4.162358, -0.455541, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (17, 217, '中医杨志杰', 1, '2025-11-13', 24, 290, 7641.13, 2480.43, 4.321714, -0.400213, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (18, 115, '冯玉燕医生', 1, '2025-11-14', 25, 315, 8384.03, 3223.33, 4.839112, -0.220574, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (19, 184, '耳鼻喉科贾闯医生', 1, '2025-11-15', 22, 337, 9067.99, 3907.29, 5.315460, -0.055188, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (20, 131, '骆小辉副主任医师', 1, '2025-11-16', 14, 351, 9538.20, 4377.50, 5.642941, 0.058513, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (21, 130, '妇产科许春艳', 1, '2025-11-17', 9, 360, 9827.47, 289.27, 5.844405, -0.933919, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (22, 211, '中医王倚东', 1, '2025-11-18', 19, 379, 10482.77, 944.57, 6.300793, -0.784222, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (23, 201, '面部提升梁永鑫', 1, '2025-11-19', 22, 401, 11126.90, 1588.70, 6.749401, -0.637076, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (24, 181, '针灸科高小勇医生', 1, '2025-11-20', 8, 409, 11849.59, 2311.39, 7.252723, -0.471984, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (25, 122, '高丽娜中医', 1, '2025-11-21', 6, 415, 12167.82, 2629.62, 7.474356, -0.399287, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (26, 105, '抗衰孟大夫', 1, '2025-11-22', 29, 444, 12921.45, 3383.25, 7.999227, -0.227127, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (27, 111, '赖婷医生', 1, '2025-11-23', 13, 457, 13852.86, 4314.66, 8.647913, -0.014355, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (28, 214, '传海2018', 1, '2025-11-24', 23, 480, 14590.97, 738.11, 9.161975, -0.828930, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (29, 226, '中医李伟杰', 1, '2025-11-25', 27, 507, 14899.04, 1046.18, 9.376532, -0.757529, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (30, 180, '尹海琴医生', 1, '2025-11-26', 26, 533, 15860.20, 2007.34, 10.045938, -0.534763, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (31, 136, '普外科马春雷', 1, '2025-11-27', 15, 548, 16177.26, 2324.40, 10.266757, -0.461279, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (32, 241, '测试作者_更新', 1, '2025-11-28', 8, 556, 16606.60, 2753.74, 10.565773, -0.361771, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (33, 153, '曹凤娇中医', 1, '2025-11-29', 24, 580, 16946.60, 3093.74, 10.802569, -0.282970, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (34, 175, '静脉曲张的杀手医生', 1, '2025-11-30', 20, 600, 17569.87, 3717.01, 11.236649, -0.138516, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (35, 173, '赵剑锋医生', 1, '2025-12-01', 15, 615, 687.78, 687.78, -0.960855, -0.814964, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (36, 196, '血管外科阿力木', 1, '2025-12-02', 22, 637, 1298.68, 1298.68, -0.926085, -0.650612, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (37, 248, '百胜号', 2, '2025-12-03', 7, 644, 1620.63, 1620.63, -0.907761, -0.563996, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (38, 185, '郭俊恒中医', 1, '2025-12-04', 18, 662, 2172.80, 2172.80, -0.876334, -0.415444, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (39, 104, '男科医生刘德风', 1, '2025-12-05', 18, 680, 2813.87, 2813.87, -0.839847, -0.242975, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (40, 214, '传海2018', 1, '2025-12-06', 15, 695, 3393.18, 3393.18, -0.806875, -0.087121, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (41, 245, '文龙号', 2, '2025-12-07', 20, 715, 4382.30, 4382.30, -0.750579, 0.178985, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (42, 198, '拇外翻医生李昕宇', 1, '2025-12-08', 5, 720, 4487.00, 104.70, -0.744620, -0.976108, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (43, 175, '静脉曲张的杀手医生', 1, '2025-12-09', 16, 736, 4628.11, 245.81, -0.736588, -0.943908, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (44, 110, '白凌文医生', 1, '2025-12-10', 10, 746, 5393.95, 1011.65, -0.693000, -0.769151, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (45, 141, '耳鼻喉科杨书勋医生', 1, '2025-12-11', 7, 753, 5897.48, 1515.18, -0.664341, -0.654250, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (46, 226, '中医李伟杰', 1, '2025-12-12', 11, 764, 6830.48, 2448.18, -0.611239, -0.441348, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (47, 183, '杜晋芳中医', 1, '2025-12-13', 22, 786, 7500.72, 3118.42, -0.573092, -0.288406, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (48, 192, '整形医生路会', 1, '2025-12-14', 26, 812, 7738.47, 3356.17, -0.559560, -0.234153, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (49, 146, 'Dr蓝剑雄', 1, '2025-12-15', 12, 824, 8072.01, 333.54, -0.540577, -0.900619, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (50, 241, '测试作者_更新', 1, '2025-12-16', 14, 838, 8548.49, 810.02, -0.513457, -0.758648, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
-- Data cleared; imported from CSV files by the import script

SET FOREIGN_KEY_CHECKS = 1;
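The relation between `daily_published_count` and `cumulative_published_count` in the rows above is a running total over days ordered by `stat_date`. A minimal sketch, using the first four sample rows (27, 6, 30, 22 daily articles):

```python
# cumulative_published_count is a prefix sum of daily_published_count.
from itertools import accumulate

daily_published = [27, 6, 30, 22]               # daily_published_count per day
cumulative = list(accumulate(daily_published))  # cumulative_published_count
```

This reproduces the 27, 33, 63, 85 sequence stored in records 1 through 4.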
45
db/ai_statistics_monthly.sql
Normal file
@@ -0,0 +1,45 @@
/*
 Navicat Premium Dump SQL

 Source Server         : mixue
 Source Server Type    : MySQL
 Source Server Version : 90001 (9.0.1)
 Source Host           : localhost:3306
 Source Schema         : ai_article

 Target Server Type    : MySQL
 Target Server Version : 90001 (9.0.1)
 File Encoding         : 65001

 Date: 25/12/2025 14:30:00
*/

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for ai_statistics_monthly
-- ----------------------------
DROP TABLE IF EXISTS `ai_statistics_monthly`;
CREATE TABLE `ai_statistics_monthly` (
  `id` bigint NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
  `author_id` int NOT NULL DEFAULT 0 COMMENT 'Author ID',
  `author_name` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'Author name',
  `channel` tinyint(1) NOT NULL DEFAULT 1 COMMENT '1=baidu|2=toutiao|3=weixin',
  `stat_monthly` varchar(7) NOT NULL COMMENT 'Statistics month (format YYYY-MM, e.g. 2025-12 for December 2025)',
  `monthly_revenue` decimal(18, 2) NULL DEFAULT 0.00 COMMENT 'Monthly revenue (total revenue for the calendar month stat_monthly)',
  `revenue_mom_growth_rate` decimal(10, 6) NULL DEFAULT 0.000000 COMMENT 'Month-over-month revenue growth rate ((this month - last month) / NULLIF(last month, 0))',
  `created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
  `updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE INDEX `uk_author_stat_date`(`author_id` ASC, `stat_monthly` ASC) USING BTREE,
  INDEX `idx_stat_date`(`stat_monthly` ASC) USING BTREE,
  INDEX `idx_author_id`(`author_id` ASC) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci COMMENT = 'Monthly AI-content core-metric summary table (month-granularity data)' ROW_FORMAT = Dynamic;

-- ----------------------------
-- Records of ai_statistics_monthly
-- ----------------------------
-- Data is imported automatically from CSV files by the import script

SET FOREIGN_KEY_CHECKS = 1;
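The `revenue_mom_growth_rate` column comment defines growth as `(this month - last month) / NULLIF(last month, 0)`. A Python equivalent must guard the zero denominator the same way; here a zero previous month yields `0.0` as a neutral default (an assumption — the SQL `NULLIF` form would yield NULL instead):

```python
# Month-over-month growth rate, guarding division by zero.
def mom_growth_rate(current: float, previous: float) -> float:
    """(current - previous) / previous, or 0.0 when previous is 0."""
    if previous == 0:
        return 0.0
    return (current - previous) / previous
```

Checked against the sample data above: a drop from 1435.84 to 116.15 gives -0.919107 when rounded to the column's 6 decimal places, matching record 5 of `ai_statistics_days`.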
45
db/ai_statistics_weekly.sql
Normal file
@@ -0,0 +1,45 @@
|
||||
/*
 Navicat Premium Dump SQL

 Source Server         : mixue
 Source Server Type    : MySQL
 Source Server Version : 90001 (9.0.1)
 Source Host           : localhost:3306
 Source Schema         : ai_article

 Target Server Type    : MySQL
 Target Server Version : 90001 (9.0.1)
 File Encoding         : 65001

 Date: 25/12/2025 14:30:00
*/

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for ai_statistics_weekly
-- ----------------------------
DROP TABLE IF EXISTS `ai_statistics_weekly`;
CREATE TABLE `ai_statistics_weekly` (
  `id` bigint NOT NULL AUTO_INCREMENT COMMENT '自增主键',
  `author_id` int NOT NULL DEFAULT 0 COMMENT '作者ID',
  `author_name` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '作者名称',
  `channel` tinyint(1) NOT NULL DEFAULT 1 COMMENT '1=baidu|2=toutiao|3=weixin',
  `stat_weekly` varchar(2) NOT NULL COMMENT '统计周次(格式:WW,如51表示第51周)',
  `weekly_revenue` decimal(18, 2) NULL DEFAULT 0.00 COMMENT '当周收益(stat_weekly所在自然周的总收益,周一至周日)',
  `revenue_wow_growth_rate` decimal(10, 6) NULL DEFAULT 0.000000 COMMENT '收益周环比增长率((本周收益 - 上周收益) / NULLIF(上周收益, 0))',
  `created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE INDEX `uk_author_stat_date`(`author_id` ASC, `stat_weekly` ASC) USING BTREE,
  INDEX `idx_stat_date`(`stat_weekly` ASC) USING BTREE,
  INDEX `idx_author_id`(`author_id` ASC) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci COMMENT = 'AI内容每周核心指标汇总表(周粒度数据)' ROW_FORMAT = Dynamic;

-- ----------------------------
-- Records of ai_statistics_weekly
-- ----------------------------
-- 数据由导入脚本从CSV文件自动导入

SET FOREIGN_KEY_CHECKS = 1;
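`stat_weekly` stores only a two-digit week number (`WW`). A hypothetical helper (name ours) that derives that label from a date via the ISO calendar; note that because the schema does not record the year, identical week numbers from different years map to the same `uk_author_stat_date` key, so callers must keep that in mind:

```python
from datetime import date


def stat_weekly_label(d: date) -> str:
    # isocalendar() returns (iso_year, iso_week, iso_weekday);
    # keep only the week, zero-padded to match the varchar(2) column.
    return f"{d.isocalendar()[1]:02d}"
```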
105  deploy_daemon.sh  Normal file
@@ -0,0 +1,105 @@
#!/bin/bash
# 数据同步守护进程部署脚本(Linux systemd)

echo "============================================================"
echo "百家号数据同步守护进程 - 部署脚本"
echo "含数据验证与短信告警功能"
echo "============================================================"
echo ""

# 检查是否为root用户
if [ "$EUID" -ne 0 ]; then
    echo "[错误] 请使用root用户运行此脚本"
    echo "       sudo bash deploy_daemon.sh"
    exit 1
fi

# 项目目录(根据实际情况修改)
PROJECT_DIR="/root/xhh_baijiahao"
SERVICE_NAME="bjh_daemon"

echo "[1/6] 检查项目目录..."
if [ ! -d "$PROJECT_DIR" ]; then
    echo "[错误] 项目目录不存在: $PROJECT_DIR"
    exit 1
fi
echo "  项目目录: $PROJECT_DIR"
echo ""

echo "[2/6] 检查Python依赖..."
cd "$PROJECT_DIR"
python3 -c "import schedule" 2>/dev/null
if [ $? -ne 0 ]; then
    echo "  安装 schedule 模块..."
    pip3 install schedule
fi

python3 -c "from data_validation_with_sms import DataValidationWithSMS" 2>/dev/null
if [ $? -ne 0 ]; then
    echo "[警告] 数据验证模块检查失败,请确保以下文件存在:"
    echo "  - data_validation.py"
    echo "  - data_validation_with_sms.py"
    echo "  - sms_config.json"
fi

python3 -c "from alibabacloud_dysmsapi20170525.client import Client" 2>/dev/null
if [ $? -ne 0 ]; then
    echo "  安装阿里云短信SDK..."
    pip3 install alibabacloud_dysmsapi20170525 alibabacloud_credentials alibabacloud_tea_openapi alibabacloud_tea_util
fi
echo ""

echo "[3/6] 配置systemd服务..."
# 复制服务文件
cp "$PROJECT_DIR/bjh_daemon.service" /etc/systemd/system/
chmod 644 /etc/systemd/system/bjh_daemon.service

# 重新加载systemd配置
systemctl daemon-reload
echo "  服务文件已安装: /etc/systemd/system/bjh_daemon.service"
echo ""

echo "[4/6] 配置短信告警..."
if [ ! -f "$PROJECT_DIR/sms_config.json" ]; then
    echo "[警告] 未找到 sms_config.json,短信功能可能不可用"
    echo "  请创建配置文件: $PROJECT_DIR/sms_config.json"
else
    echo "  短信配置文件已存在: sms_config.json"
fi
echo ""

echo "[5/6] 启用并启动服务..."
systemctl enable bjh_daemon.service
systemctl start bjh_daemon.service

# 等待2秒
sleep 2
echo ""

echo "[6/6] 检查服务状态..."
systemctl status bjh_daemon.service --no-pager
echo ""

echo "============================================================"
echo "部署完成!"
echo "============================================================"
echo ""
echo "常用命令:"
echo "  查看状态: sudo systemctl status bjh_daemon"
echo "  查看日志: sudo journalctl -u bjh_daemon -f"
echo "  停止服务: sudo systemctl stop bjh_daemon"
echo "  重启服务: sudo systemctl restart bjh_daemon"
echo "  禁用服务: sudo systemctl disable bjh_daemon"
echo ""
echo "配置文件:"
echo "  systemd配置: /etc/systemd/system/bjh_daemon.service"
echo "  短信配置: $PROJECT_DIR/sms_config.json"
echo "  程序日志: $PROJECT_DIR/logs/data_sync_daemon.log"
echo "  验证报告: $PROJECT_DIR/validation_reports/"
echo ""
echo "功能说明:"
echo "  1. 每隔1小时自动执行数据同步(工作时间8:00-24:00)"
echo "  2. 数据同步完成后自动验证数据完整性"
echo "  3. 验证失败时自动发送短信告警(错误代码2222)"
echo "  4. 非工作时间自动休眠"
echo ""
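The script copies `bjh_daemon.service` into `/etc/systemd/system/` but the unit file itself is not part of this commit. A plausible minimal layout is sketched below; the entry-point script name `data_sync_daemon.py` is our guess inferred from the log path `logs/data_sync_daemon.log` above, not confirmed by the source:

```ini
; 假设的 bjh_daemon.service(入口脚本名为推测)
[Unit]
Description=Baijiahao data sync daemon
After=network-online.target

[Service]
Type=simple
WorkingDirectory=/root/xhh_baijiahao
ExecStart=/usr/bin/python3 /root/xhh_baijiahao/data_sync_daemon.py
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
```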
347  export_to_csv.py
@@ -12,6 +12,7 @@ import sys
 import os
 import json
 import csv
+import shutil
 from datetime import datetime
 from typing import Dict, List, Optional
 from decimal import Decimal
@@ -67,6 +68,10 @@ class DataExporter:
         self.output_ai_statistics_day = os.path.join(self.script_dir, "ai_statistics_day.csv")
         self.output_ai_statistics_days = os.path.join(self.script_dir, "ai_statistics_days.csv")

+        # 备份文件夹路径
+        self.backup_dir = os.path.join(self.script_dir, "csv_backups")
+        self._ensure_backup_dir()
+
         # 数据库模式
         self.use_database = use_database
         self.db_manager = None
@@ -90,6 +95,51 @@ class DataExporter:
         # 缓存author_id映射(author_name -> author_id)
         self.author_id_cache = {}

+    def _ensure_backup_dir(self):
+        """确保备份文件夹存在"""
+        try:
+            if not os.path.exists(self.backup_dir):
+                os.makedirs(self.backup_dir)
+                print(f"[OK] 创建备份文件夹: {self.backup_dir}")
+        except Exception as e:
+            print(f"[!] 创建备份文件夹失败: {e}")
+
+    def _backup_csv_file(self, csv_file_path: str) -> bool:
+        """备份CSV文件
+
+        Args:
+            csv_file_path: CSV文件的完整路径
+
+        Returns:
+            bool: 备份是否成功
+        """
+        try:
+            if not os.path.exists(csv_file_path):
+                print(f"[!] 文件不存在,跳过备份: {csv_file_path}")
+                return False
+
+            # 获取文件名
+            file_name = os.path.basename(csv_file_path)
+
+            # 生成时间戳(只保留日期)
+            timestamp = datetime.now().strftime('%Y%m%d')
+
+            # 备份文件名:20251226_ai_statistics.csv
+            backup_file_name = f"{timestamp}_{file_name}"
+            backup_file_path = os.path.join(self.backup_dir, backup_file_name)
+
+            # 复制文件
+            shutil.copy2(csv_file_path, backup_file_path)
+
+            print(f"  [备份] {file_name} -> {backup_file_name}")
+            self.logger.info(f"备份CSV文件: {backup_file_path}")
+
+            return True
+        except Exception as e:
+            print(f"  [!] 备份失败: {e}")
+            self.logger.error(f"备份CSV文件失败: {e}")
+            return False
+
     def get_author_id(self, author_name: str) -> int:
         """获取作者ID

@@ -286,21 +336,25 @@ class DataExporter:
             print(f" [!] 从数据库计算当周发文量失败: {e}")
             return 0

-    def calculate_weekly_revenue_from_db(self, author_id: int, stat_date: str) -> float:
-        """从ai_statistics_days表汇总计算当周收益(周一至周日)
+    def calculate_weekly_revenue_from_db(self, author_id: int, stat_date: str, today_revenue: float = 0.0) -> float:
+        """从ai_statistics_days表汇总计算当周收益(周一至当前日期)

-        基于day_revenue字段进行汇总计算
+        计算逻辑:
+        1. 从数据库查询本周一到stat_date前一天的day_revenue总和
+        2. 加上today_revenue(当日收益,从API获取)
+        3. 得到本周累计收益

         Args:
             author_id: 作者ID
             stat_date: 统计日期 (YYYY-MM-DD)
+            today_revenue: 当日收益(从API获取),默认0.0

         Returns:
             当周收益总额
         """
         if not self.db_manager or author_id == 0:
             print(f" [数据库] 未连接或author_id无效,无法计算当周收益")
-            return 0.0
+            return today_revenue  # 如果数据库不可用,返回当日收益

         try:
             from datetime import datetime, timedelta
@@ -311,14 +365,21 @@ class DataExporter:
             # 计算本周一的日期(weekday: 0=周一, 6=周日)
             weekday = target_date.weekday()
             monday = target_date - timedelta(days=weekday)
-            sunday = monday + timedelta(days=6)

+            # 昨天的日期(stat_date的前一天)
+            yesterday = target_date - timedelta(days=1)
+
             monday_str = monday.strftime('%Y-%m-%d')
-            sunday_str = sunday.strftime('%Y-%m-%d')
+            yesterday_str = yesterday.strftime('%Y-%m-%d')

-            print(f" [调试] 目标日期: {stat_date}, 周一: {monday_str}, 周日: {sunday_str}")
+            print(f" [调试] 目标日期: {stat_date}, 本周一: {monday_str}, 昨天: {yesterday_str}")

-            # 查询数据库中本周的day_revenue总和
+            # 如果stat_date就是周一,则没有历史数据,直接返回今日收益
+            if target_date == monday:
+                print(f" [数据库] 目标日期是周一,当周收益 = 今日收益: ¥{today_revenue:.2f}")
+                return today_revenue
+
+            # 查询数据库中本周一到昨天的day_revenue总和
             sql = """
                 SELECT SUM(day_revenue) as weekly_total, COUNT(*) as day_count
                 FROM ai_statistics_days
@@ -330,25 +391,33 @@ class DataExporter:

             result = self.db_manager.execute_query(
                 sql,
-                (author_id, monday_str, sunday_str),
+                (author_id, monday_str, yesterday_str),
                 fetch_one=True,
                 dict_cursor=True
             )

-            print(f" [调试] 查询结果: {result}")
+            print(f" [调试] 数据库查询结果: {result}")

+            # 计算当周收益 = 本周历史收益 + 今日收益
+            historical_revenue = 0.0
+            day_count = 0
+
             if result and result.get('weekly_total') is not None:
-                weekly_total = float(result['weekly_total'] or 0)
+                historical_revenue = float(result['weekly_total'] or 0)
                 day_count = int(result.get('day_count', 0) or 0)
-                print(f" [数据库] 当周收益 ({monday_str} 至 {sunday_str}): ¥{weekly_total:.2f} (基于{day_count}天的数据)")

+            weekly_total = historical_revenue + today_revenue
+
+            print(f" [数据库] 当周收益计算:")
+            print(f"   本周一至昨天 ({monday_str} ~ {yesterday_str}): ¥{historical_revenue:.2f} (基于{day_count}天)")
+            print(f"   今日收益 ({stat_date}): ¥{today_revenue:.2f}")
+            print(f"   当周总收益: ¥{weekly_total:.2f}")

-                return weekly_total
-            else:
-                print(f" [数据库] 未找到当周数据 ({monday_str} 至 {sunday_str}),返回0")
-                return 0.0
+            return weekly_total

         except Exception as e:
             print(f" [!] 从数据库计算当周收益失败: {e}")
-            return 0.0
+            return today_revenue  # 出错时返回当日收益

     def calculate_last_week_revenue_from_db(self, author_id: int, stat_date: str) -> float:
         """从ai_statistics_days表汇总计算上周收益(上周一至上周日)
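The query window in the hunk above runs from the Monday of `stat_date`'s week to the day before `stat_date` (the current day's revenue comes from the API instead). A stand-alone sketch of that date arithmetic (the helper name `week_window` is ours):

```python
from datetime import datetime, timedelta


def week_window(stat_date: str) -> tuple:
    """Return (monday, yesterday) strings bounding the DB aggregation range."""
    target = datetime.strptime(stat_date, '%Y-%m-%d')
    # weekday(): 0 = Monday, so subtracting it lands on this week's Monday
    monday = target - timedelta(days=target.weekday())
    yesterday = target - timedelta(days=1)
    return monday.strftime('%Y-%m-%d'), yesterday.strftime('%Y-%m-%d')
```

When `stat_date` itself is a Monday the window degenerates (yesterday falls in the previous week), which is why the method above short-circuits and returns `today_revenue` in that case.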
@@ -407,6 +476,77 @@ class DataExporter:
             print(f" [!] 从数据库计算上周收益失败: {e}")
             return 0.0

+    def calculate_monthly_revenue_from_db(self, author_id: int, stat_date: str, today_revenue: float = 0.0) -> float:
+        """从ai_statistics_days表汇总计算当月收益(当月1日至当前日期)
+
+        计算逻辑:
+        1. 从数据库查询当月1日到stat_date前一天的day_revenue总和
+        2. 加上today_revenue(当日收益,从API获取)
+        3. 得到当月累计收益
+
+        Args:
+            author_id: 作者ID
+            stat_date: 统计日期 (YYYY-MM-DD)
+            today_revenue: 当日收益(从API获取),默认0.0
+
+        Returns:
+            当月收益总额
+        """
+        if not self.db_manager or author_id == 0:
+            print(f" [数据库] 未连接或author_id无效,无法计算当月收益")
+            return today_revenue  # 如果数据库不可用,返回当日收益
+
+        try:
+            from datetime import datetime, timedelta
+
+            # 解析日期
+            target_date = datetime.strptime(stat_date, '%Y-%m-%d')
+
+            # 当月第一天
+            month_first = target_date.replace(day=1)
+            # stat_date的前一天(因为当日数据可能还未写入数据库)
+            yesterday = target_date - timedelta(days=1)
+
+            month_first_str = month_first.strftime('%Y-%m-%d')
+            yesterday_str = yesterday.strftime('%Y-%m-%d')
+
+            # 如果stat_date就是当月第一天,直接返回当日收益
+            if target_date.day == 1:
+                print(f" [数据库] 当月第一天,当月收益 = 当日收益: ¥{today_revenue:.2f}")
+                return today_revenue
+
+            # 查询当月1日到stat_date前一天的收益总和
+            sql = """
+                SELECT SUM(day_revenue) as monthly_total
+                FROM ai_statistics_days
+                WHERE author_id = %s
+                  AND stat_date >= %s
+                  AND stat_date <= %s
+                  AND channel = 1
+            """
+
+            result = self.db_manager.execute_query(
+                sql,
+                (author_id, month_first_str, yesterday_str),
+                fetch_one=True,
+                dict_cursor=True
+            )
+
+            if result and result.get('monthly_total') is not None:
+                db_total = float(result['monthly_total'] or 0)
+                # 加上当日收益
+                monthly_total = db_total + today_revenue
+                print(f" [数据库] 当月收益 ({month_first_str} 至 {stat_date}): 数据库¥{db_total:.2f} + 当日¥{today_revenue:.2f} = ¥{monthly_total:.2f}")
+                return monthly_total
+            else:
+                # 没有历史数据,返回当日收益
+                print(f" [数据库] 未找到当月历史数据 ({month_first_str} 至 {yesterday_str}),当月收益 = 当日收益: ¥{today_revenue:.2f}")
+                return today_revenue
+
+        except Exception as e:
+            print(f" [!] 从数据库计算当月收益失败: {e}")
+            return today_revenue
+
     def calculate_last_month_revenue_from_db(self, author_id: int, stat_date: str) -> float:
         """从ai_statistics_days表汇总计算上月收益

@@ -510,14 +650,20 @@ class DataExporter:
             metrics['submission_count'] = int(total_info.get('publish_count', 0) or 0)
             metrics['read_count'] = int(total_info.get('view_count', 0) or 0)
             metrics['comment_count'] = int(total_info.get('comment_count', 0) or 0)
-            metrics['comment_rate'] = float(total_info.get('comment_rate', 0) or 0)
+            # 所有rate字段API返回的都是百分制(如0.30表示0.30%),需要除以100转换为小数
+            comment_rate_raw = float(total_info.get('comment_rate', 0) or 0)
+            metrics['comment_rate'] = comment_rate_raw / 100 if comment_rate_raw > 0 else 0.0
             metrics['like_count'] = int(total_info.get('likes_count', 0) or 0)
-            metrics['like_rate'] = float(total_info.get('likes_rate', 0) or 0)
+            like_rate_raw = float(total_info.get('likes_rate', 0) or 0)
+            metrics['like_rate'] = like_rate_raw / 100 if like_rate_raw > 0 else 0.0
             metrics['favorite_count'] = int(total_info.get('collect_count', 0) or 0)
-            metrics['favorite_rate'] = float(total_info.get('collect_rate', 0) or 0)
+            favorite_rate_raw = float(total_info.get('collect_rate', 0) or 0)
+            metrics['favorite_rate'] = favorite_rate_raw / 100 if favorite_rate_raw > 0 else 0.0
             metrics['share_count'] = int(total_info.get('share_count', 0) or 0)
-            metrics['share_rate'] = float(total_info.get('share_rate', 0) or 0)
-            metrics['slide_ratio'] = float(total_info.get('pic_slide_rate', 0) or 0)
+            share_rate_raw = float(total_info.get('share_rate', 0) or 0)
+            metrics['share_rate'] = share_rate_raw / 100 if share_rate_raw > 0 else 0.0
+            slide_ratio_raw = float(total_info.get('pic_slide_rate', 0) or 0)
+            metrics['slide_ratio'] = slide_ratio_raw / 100 if slide_ratio_raw > 0 else 0.0
             metrics['baidu_search_volume'] = int(total_info.get('disp_pv', 0) or 0)  # 修正:使用disp_pv
         except Exception as e:
             print(f" [!] 提取汇总指标失败: {e}")
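The repeated raw-rate conversions in this hunk all follow one pattern. A compact sketch of the shared rule (the helper name `rate_to_decimal` is ours), assuming, as the diff's comments state, that the API reports rate fields on a percent scale (0.30 means 0.30%):

```python
def rate_to_decimal(raw) -> float:
    """Convert an API percent-scale rate to a plain fraction for storage.

    Mirrors the guard used in the diff: None/zero/negative inputs map to 0.0.
    """
    value = float(raw or 0)
    return value / 100 if value > 0 else 0.0
```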
@@ -529,7 +675,7 @@ class DataExporter:

         注意:
         - weekly_revenue: 不再从API获取,在export_ai_statistics_days中从数据库计算
-        - monthly_revenue: 使用currentMonth(当前自然月收益)
+        - monthly_revenue: 不再从API获取,在export_ai_statistics_days中从数据库计算
         - day_revenue: 从yesterday提取昨日收益(当日收益)
         - revenue_wow_growth_rate: 周环比,从数据库计算(本周 vs 上周)
         - revenue_mom_growth_rate: 月环比,从数据库计算(当月 vs 上月)
@@ -564,10 +710,8 @@ class DataExporter:
             # 这里保持为0,由export_ai_statistics_days方法计算
             print(f" 环比增长率: 将从数据库计算")

-            # 当前自然月收入(currentMonth)
-            current_month = income_data.get('currentMonth', {})
-            if current_month:
-                metrics['monthly_revenue'] = float(current_month.get('income', 0) or 0)
+            # monthly_revenue 不再从API获取,在导出时从数据库的day_revenue汇总计算
+            print(f" 当月收益: 将从数据库计算")

         except Exception as e:
             print(f" [!] 提取收入指标失败: {e}")
@@ -650,6 +794,10 @@ class DataExporter:
             print(f"[OK] ai_statistics 表数据已导出到: {self.output_ai_statistics}")
             print(f"   共 {len(csv_rows)} 条记录")
             print(f"{'='*70}")

+            # 备份CSV文件
+            self._backup_csv_file(self.output_ai_statistics)
+
             return True
         else:
             print("\n[!] 没有数据可导出")
@@ -729,11 +877,12 @@ class DataExporter:
                 'total_like_count': int(latest_day_data.get('likes_count', 0) or 0),
                 'total_favorite_count': int(latest_day_data.get('collect_count', 0) or 0),
                 'total_share_count': int(latest_day_data.get('share_count', 0) or 0),
-                'avg_comment_rate': f"{float(latest_day_data.get('comment_rate', 0) or 0):.4f}",
-                'avg_like_rate': f"{float(latest_day_data.get('likes_rate', 0) or 0):.4f}",
-                'avg_favorite_rate': f"{float(latest_day_data.get('collect_rate', 0) or 0):.4f}",
-                'avg_share_rate': f"{float(latest_day_data.get('share_rate', 0) or 0):.4f}",
-                'avg_slide_ratio': f"{float(latest_day_data.get('pic_slide_rate', 0) or 0):.4f}",
+                # 所有rate字段API返回的都是百分制,需要除以100转换为小数
+                'avg_comment_rate': f"{(float(latest_day_data.get('comment_rate', 0) or 0) / 100):.4f}",
+                'avg_like_rate': f"{(float(latest_day_data.get('likes_rate', 0) or 0) / 100):.4f}",
+                'avg_favorite_rate': f"{(float(latest_day_data.get('collect_rate', 0) or 0) / 100):.4f}",
+                'avg_share_rate': f"{(float(latest_day_data.get('share_rate', 0) or 0) / 100):.4f}",
+                'avg_slide_ratio': f"{(float(latest_day_data.get('pic_slide_rate', 0) or 0) / 100):.4f}",
                 'total_baidu_search_volume': int(latest_day_data.get('disp_pv', 0) or 0),
             }

@@ -763,6 +912,10 @@ class DataExporter:
             print(f"[OK] ai_statistics_day 表数据已导出到: {self.output_ai_statistics_day}")
             print(f"   共 {len(csv_rows)} 条记录")
             print(f"{'='*70}")

+            # 备份CSV文件
+            self._backup_csv_file(self.output_ai_statistics_day)
+
             return True
         else:
             print("\n[!] 没有数据可导出")
@@ -779,7 +932,7 @@ class DataExporter:
         注意:
         - daily_published_count: 优先从ai_articles表查询,否则使用API数据
         - cumulative_published_count: 优先从ai_articles表查询(从起始日到stat_date的累计发文量)
-        - monthly_revenue: stat_date所在自然月的总收益(使用近30天收益作为近似值)
+        - monthly_revenue: 从ai_statistics_days表汇总计算(当月1日至stat_date的day_revenue总和)
         - weekly_revenue: 优先从ai_statistics_days表汇总计算,否则使用API数据

         Args:
@@ -851,38 +1004,49 @@ class DataExporter:
                 daily_published = int(latest_day_data.get('publish_count', 0) or 0)
                 print(f" [使用API] 文章数据: 单日={daily_published}, 累计={cumulative_count}")

-            # 计算当周收益:数据库中本周已有的收益 + 当日新抓取的收益
+            # 计算当周收益:从数据库汇总本周一至周日的day_revenue总和
             if use_db_weekly_revenue and author_id > 0:
-                # 从数据库查询本周已有的收益(不包括今天,因为今天的数据还没导入)
-                weekly_revenue_db = self.calculate_weekly_revenue_from_db(author_id, formatted_date)
-                # 当周收益 = 数据库中的历史收益 + 当日新抓取的收益
-                day_revenue = income_metrics['day_revenue']
-                weekly_revenue_total = weekly_revenue_db + day_revenue
+                # 从数据库查询本周的收益总和(传入当日收益)
+                weekly_revenue_total = self.calculate_weekly_revenue_from_db(
+                    author_id,
+                    formatted_date,
+                    today_revenue=income_metrics['day_revenue']  # 传入当日收益
+                )
                 income_metrics['weekly_revenue'] = weekly_revenue_total
-                print(f" [数据库] 本周已有收益: ¥{weekly_revenue_db:.2f}")
-                print(f" [API] 当日新增收益: ¥{day_revenue:.2f}")
-                print(f" [计算] 当周总收益: ¥{weekly_revenue_total:.2f}")
+                print(f" [数据库] 当周收益: ¥{weekly_revenue_total:.2f}")

+                # 计算当月收益:从数据库汇总当月1日至stat_date的day_revenue总和
+                monthly_revenue_total = self.calculate_monthly_revenue_from_db(
+                    author_id,
+                    formatted_date,
+                    today_revenue=income_metrics['day_revenue']  # 传入当日收益
+                )
+                income_metrics['monthly_revenue'] = monthly_revenue_total
+
                 # 计算周环比:本周 vs 上周
                 # 公式:周环比 = (本周收益 - 上周收益) / 上周收益
                 last_week_revenue = self.calculate_last_week_revenue_from_db(author_id, formatted_date)
-                if last_week_revenue > 0:
-                    income_metrics['revenue_wow_growth_rate'] = (weekly_revenue_total - last_week_revenue) / last_week_revenue
-                    print(f" [计算] 周环比: {income_metrics['revenue_wow_growth_rate']:.2%} (本周¥{weekly_revenue_total:.2f} vs 上周¥{last_week_revenue:.2f})")
-                else:
-                    print(f" [计算] 周环比: 无法计算(上周没有数据)")
+                # 分母为0时设为1,避免除零错误
+                denominator = last_week_revenue if last_week_revenue > 0 else 1
+                wow_rate = (weekly_revenue_total - last_week_revenue) / denominator
+                income_metrics['revenue_wow_growth_rate'] = wow_rate
+                print(f" [计算] 周环比: {wow_rate:.4f} (本周¥{weekly_revenue_total:.2f} vs 上周¥{last_week_revenue:.2f})")

                 # 计算月环比:当月 vs 上月
                 # 公式:月环比 = (当月收益 - 上月收益) / 上月收益
                 last_month_revenue = self.calculate_last_month_revenue_from_db(author_id, formatted_date)
                 monthly_revenue = income_metrics['monthly_revenue']
-                if last_month_revenue > 0:
-                    income_metrics['revenue_mom_growth_rate'] = (monthly_revenue - last_month_revenue) / last_month_revenue
-                    print(f" [计算] 月环比: {income_metrics['revenue_mom_growth_rate']:.2%} (当月¥{monthly_revenue:.2f} vs 上月¥{last_month_revenue:.2f})")
-                else:
-                    print(f" [计算] 月环比: 无法计算(上月没有数据)")
+                # 分母为0时设为1,避免除零错误
+                denominator = last_month_revenue if last_month_revenue > 0 else 1
+                mom_rate = (monthly_revenue - last_month_revenue) / denominator
+                income_metrics['revenue_mom_growth_rate'] = mom_rate
+                print(f" [计算] 月环比: {mom_rate:.4f} (当月¥{monthly_revenue:.2f} vs 上月¥{last_month_revenue:.2f})")
             else:
                 # 如果不使用数据库,weekly_revenue = 当日收益
                 income_metrics['weekly_revenue'] = income_metrics['day_revenue']
+                income_metrics['monthly_revenue'] = income_metrics['day_revenue']
                 print(f" [跳过数据库] 当周收益 = 当日收益: ¥{income_metrics['day_revenue']:.2f}")
+                print(f" [跳过数据库] 当月收益 = 当日收益: ¥{income_metrics['day_revenue']:.2f}")

             row = {
                 'author_id': author_id,
@@ -940,6 +1104,10 @@ class DataExporter:
             print(f"[OK] ai_statistics_days 表数据已导出到: {self.output_ai_statistics_days}")
             print(f"   共 {len(csv_rows)} 条记录")
             print(f"{'='*70}")

+            # 备份CSV文件
+            self._backup_csv_file(self.output_ai_statistics_days)
+
             return True
         else:
             print("\n[!] 没有数据可导出")
@@ -1439,9 +1607,6 @@ class DataExporter:

             # 滑图占比需要限制在decimal(5,4)范围内(0-9.9999)
             slide_ratio_value = float(metrics['slide_ratio'])
-            # 如果值大于10,说明是百分比格式,需要除以100
-            if slide_ratio_value > 10:
-                slide_ratio_value = slide_ratio_value / 100
             # 确保不超过9.9999
             slide_ratio_value = min(slide_ratio_value, 9.9999)

@@ -1547,14 +1712,28 @@ class DataExporter:
             else:
                 print(f" [使用API] 投稿量: {total_submission_count}")

-            # 滑图占比需要限制在decimal(5,4)范围内(0-9.9999)
-            slide_ratio_value = float(latest_day_data.get('pic_slide_rate', 0) or 0)
-            # 如果值大于10,说明是百分比格式,需要除以100
-            if slide_ratio_value > 10:
-                slide_ratio_value = slide_ratio_value / 100
-            # 确保不超过9.9999
+            # 所有rate字段需要限制在decimal(5,4)范围内(0-9.9999)
+            # API返回的都是百分制,需要除以100转换为小数
+            slide_ratio_raw = float(latest_day_data.get('pic_slide_rate', 0) or 0)
+            slide_ratio_value = (slide_ratio_raw / 100 if slide_ratio_raw > 0 else 0.0)
             slide_ratio_value = min(slide_ratio_value, 9.9999)

+            comment_rate_raw = float(latest_day_data.get('comment_rate', 0) or 0)
+            comment_rate_value = (comment_rate_raw / 100 if comment_rate_raw > 0 else 0.0)
+            comment_rate_value = min(comment_rate_value, 9.9999)
+
+            like_rate_raw = float(latest_day_data.get('likes_rate', 0) or 0)
+            like_rate_value = (like_rate_raw / 100 if like_rate_raw > 0 else 0.0)
+            like_rate_value = min(like_rate_value, 9.9999)
+
+            favorite_rate_raw = float(latest_day_data.get('collect_rate', 0) or 0)
+            favorite_rate_value = (favorite_rate_raw / 100 if favorite_rate_raw > 0 else 0.0)
+            favorite_rate_value = min(favorite_rate_value, 9.9999)
+
+            share_rate_raw = float(latest_day_data.get('share_rate', 0) or 0)
+            share_rate_value = (share_rate_raw / 100 if share_rate_raw > 0 else 0.0)
+            share_rate_value = min(share_rate_value, 9.9999)
+
             record = {
                 'author_id': author_id,
                 'author_name': account_id,
@@ -1566,10 +1745,10 @@ class DataExporter:
                 'total_like_count': int(latest_day_data.get('likes_count', 0) or 0),
                 'total_favorite_count': int(latest_day_data.get('collect_count', 0) or 0),
                 'total_share_count': int(latest_day_data.get('share_count', 0) or 0),
-                'avg_comment_rate': float(latest_day_data.get('comment_rate', 0) or 0),
-                'avg_like_rate': float(latest_day_data.get('likes_rate', 0) or 0),
-                'avg_favorite_rate': float(latest_day_data.get('collect_rate', 0) or 0),
-                'avg_share_rate': float(latest_day_data.get('share_rate', 0) or 0),
+                'avg_comment_rate': comment_rate_value,
+                'avg_like_rate': like_rate_value,
+                'avg_favorite_rate': favorite_rate_value,
+                'avg_share_rate': share_rate_value,
                 'avg_slide_ratio': slide_ratio_value,
                 'total_baidu_search_volume': int(latest_day_data.get('disp_pv', 0) or 0),
             }
@@ -1698,18 +1877,38 @@


 def main():
+    import argparse
+
+    # 解析命令行参数
+    parser = argparse.ArgumentParser(
+        description='百家号数据导出工具 - 从 bjh_integrated_data.json 导出',
+        formatter_class=argparse.RawDescriptionHelpFormatter
+    )
+
+    parser.add_argument(
+        '--mode',
+        type=str,
+        choices=['csv', 'database'],
+        default='csv',
+        help='导出模式:csv=导出CSV文件, database=直接插入数据库 (默认: csv)'
+    )
+
+    parser.add_argument(
+        '--no-confirm',
+        action='store_true',
+        help='跳过确认提示,直接执行(用于批量脚本)'
+    )
+
+    args = parser.parse_args()
+
     print("\n" + "="*70)
     print("百家号数据导出工具 - 从 bjh_integrated_data.json 导出")
     print("="*70)

-    # 选择导出模式
-    print("\n请选择导出模式:")
-    print("  1. 导出CSV文件")
-    print("  2. 直接插入数据库")
+    use_database = (args.mode == 'database')

-    mode = input("\n输入选项 (1/2, 默认1): ").strip() or '1'
-
-    if mode == '2':
+    if use_database:
         # 数据库模式
         exporter = DataExporter(use_database=True)

@@ -1728,14 +1927,16 @@ def main():
     print("  3. ai_statistics_days.csv - 核心指标统计表(含发文量、收益、环比)")
     print("="*70)

+    # 确认执行(除非使用--no-confirm参数)
+    if not args.no_confirm:
         confirm = input("\n是否继续? (y/n): ").strip().lower()

-    if confirm == 'y':
-        exporter.export_all_tables()
-    else:
+        if confirm != 'y':
             print("\n已取消")
             return

+    exporter.export_all_tables()
+
     print("\n" + "="*70)
     print("完成")
     print("="*70 + "\n")

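With the interactive prompt replaced by flags, the exporter can run unattended from cron or the daemon. A self-contained sketch of how the two added arguments parse (the parser construction mirrors the diff; the `build_parser` wrapper is ours):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # 与 export_to_csv.py 中新增的参数一致:
    # --mode 选择导出目标,--no-confirm 跳过交互确认
    parser = argparse.ArgumentParser(description='export sketch')
    parser.add_argument('--mode', choices=['csv', 'database'], default='csv')
    parser.add_argument('--no-confirm', action='store_true')
    return parser


# argparse converts the dashed flag name to the attribute no_confirm
args = build_parser().parse_args(['--mode', 'database', '--no-confirm'])
```

A batch invocation of the real script would then look like `python3 export_to_csv.py --mode database --no-confirm`.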
680  fetch_date_statistics.py  Normal file
@@ -0,0 +1,680 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
指定日期统计数据获取脚本
功能:获取指定日期的百家号统计数据并填充到数据库三个统计表
"""

import os
import sys
import json
import argparse
import requests
import time
from datetime import datetime, timedelta
from typing import List, Dict, Optional
from decimal import Decimal

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from database_config import DatabaseManager
from export_to_csv import DataExporter

# 天启代理配置
PROXY_API_URL = 'http://api.tianqiip.com/getip?secret=tmcrmh3q&num=1&type=txt&port=1&mr=1&sign=5451e454a54b9f1f06222606c418e12f'


class DateStatisticsFetcher:
    """指定日期统计数据获取器"""

    def __init__(self, target_date: str, use_proxy: bool = True):
        """初始化

        Args:
            target_date: 目标日期 (YYYY-MM-DD)
            use_proxy: 是否使用代理(默认True)
        """
        self.target_date = datetime.strptime(target_date, '%Y-%m-%d')
        self.target_date_str = target_date
        self.db_manager = DatabaseManager()
        self.script_dir = os.path.dirname(os.path.abspath(__file__))
        self.use_proxy = use_proxy
        self.current_proxy = None

        # 创建临时数据目录
        self.temp_dir = os.path.join(self.script_dir, 'temp_data')
        os.makedirs(self.temp_dir, exist_ok=True)

        # 创建请求会话
        self.session = requests.Session()
        self.session.verify = False

        # 禁用SSL警告
        import urllib3
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

        print(f"[初始化] 目标日期: {target_date}")
        print(f"[初始化] 代理模式: {'启用' if use_proxy else '禁用'}")
        print(f"[初始化] 临时数据目录: {self.temp_dir}")

    def get_all_authors(self) -> List[Dict]:
        """获取所有活跃账号

        Returns:
            账号列表
        """
        try:
            sql = """
                SELECT id as author_id, author_name, toutiao_cookie
                FROM ai_authors
                WHERE channel = 1
                  AND status = 'active'
                  AND toutiao_cookie IS NOT NULL
                  AND toutiao_cookie != ''
                ORDER BY id
            """

            accounts = self.db_manager.execute_query(sql, fetch_one=False, dict_cursor=True)

            if accounts:
                print(f"[数据库] 找到 {len(accounts)} 个活跃账号")
                return accounts
            else:
                print("[!] 未找到任何活跃账号")
                return []

        except Exception as e:
            print(f"[X] 查询账号失败: {e}")
            return []

    def get_daily_article_count(self, author_id: int, date_str: str) -> int:
        """从ai_articles表获取指定日期的发文量

        Args:
            author_id: 作者ID
            date_str: 日期字符串 (YYYY-MM-DD)

        Returns:
            发文量
        """
        try:
            sql = """
                SELECT COUNT(*) as count
                FROM ai_articles
                WHERE author_id = %s
                  AND DATE(publish_time) = %s
                  AND status = 'published'
                  AND channel = 1
            """

            result = self.db_manager.execute_query(
                sql,
                (author_id, date_str),
                fetch_one=True,
                dict_cursor=True
            )

            return result['count'] if result else 0
        except Exception as e:
            print(f" [!] 查询发文量失败: {e}")
            return 0

    def fetch_daily_income(self, cookie_string: str, date_timestamp: int, max_retries: int = 3) -> Optional[Dict]:
        """获取指定日期的收入数据(带重试机制)

        Args:
            cookie_string: Cookie字符串
            date_timestamp: 日期Unix时间戳(秒)
            max_retries: 最大重试次数

        Returns:
            收入数据字典,失败返回None
        """
        api_url = "https://baijiahao.baidu.com/author/eco/income4/overviewhomelist"

        # 设置Cookie
        self.session.cookies.clear()
        for item in cookie_string.split(';'):
            item = item.strip()
            if '=' in item:
                key, value = item.split('=', 1)
                self.session.cookies.set(key.strip(), value.strip())

        # 从Cookie中提取token
        token_cookie = self.session.cookies.get('bjhStoken') or self.session.cookies.get('devStoken')

        # 请求参数
        params = {
            'start_date': date_timestamp,
            'end_date': date_timestamp
        }

        # 请求头
        headers = {
            'Accept': 'application/json, text/plain, */*',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Referer': 'https://baijiahao.baidu.com/builder/rc/incomecenter',
            'Sec-Fetch-Dest': 'empty',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Site': 'same-origin',
        }

        if token_cookie:
            headers['token'] = token_cookie

        retry_count = 0
        while retry_count <= max_retries:
            try:
                # 如果是重试,先等待
                if retry_count > 0:
                    wait_time = retry_count * 3  # 3秒、6秒、9秒
                    print(f" [重试 {retry_count}/{max_retries}] 等待 {wait_time} 秒...")
                    time.sleep(wait_time)

                # 获取代理
                proxies = self.fetch_proxy() if self.use_proxy else None

                response = self.session.get(
                    api_url,
                    headers=headers,
                    params=params,
                    proxies=proxies,
                    timeout=15
                )

                if response.status_code == 200:
                    data = response.json()
                    if data.get('errno') == 0:
                        return data
                    else:
                        error_msg = data.get('errmsg', '')
                        errno = data.get('errno')
                        print(f" [!] API返回错误: errno={errno}, errmsg={error_msg}")

                        # 异常请求错误,尝试重试
                        if errno == 10000015 and retry_count < max_retries:
                            retry_count += 1
                            continue
                        return None
                else:
                    print(f" [!] HTTP错误: {response.status_code}")
                    return None

            except Exception as e:
                error_type = type(e).__name__
                print(f" [!] 请求异常: {error_type} - {e}")

                # 判断是否需要重试
                is_retry_error = any([
                    'Connection' in error_type,
                    'Timeout' in error_type,
                    'ProxyError' in error_type,
                ])

                if is_retry_error and retry_count < max_retries:
                    retry_count += 1
                    continue
                return None

        return None
|
||||
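The linear back-off used above (`wait_time = retry_count * 3`, i.e. 3 s, 6 s, 9 s) can be expressed as a tiny helper; a sketch with a hypothetical function name, not part of the codebase:

```python
def backoff_seconds(retry_count: int, step: int = 3) -> int:
    """Linear back-off: the n-th retry waits n * step seconds."""
    return retry_count * step

# Schedule for max_retries = 3: retries 1..3 wait 3, 6 and 9 seconds.
schedule = [backoff_seconds(n) for n in range(1, 4)]
print(schedule)  # → [3, 6, 9]
```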
    def fetch_analytics_api(self, cookie_string: str, target_date: str, max_retries: int = 3) -> Optional[Dict]:
        """Call the Baijiahao publishing-statistics API for views, comments, etc.

        Args:
            cookie_string: cookie string
            target_date: target date (YYYY-MM-DD)
            max_retries: maximum number of retries

        Returns:
            API response data, or None on failure
        """
        # Set cookies
        self.session.cookies.clear()
        for item in cookie_string.split(';'):
            item = item.strip()
            if '=' in item:
                key, value = item.split('=', 1)
                self.session.cookies.set(key.strip(), value.strip(), domain='.baidu.com')

        # Extract the token from the cookies
        token_cookie = self.session.cookies.get('bjhStoken') or self.session.cookies.get('devStoken')

        # Build the date range (query the target date only)
        date_obj = datetime.strptime(target_date, '%Y-%m-%d')
        start_day = date_obj.strftime('%Y%m%d')
        end_day = start_day  # start and end are the same day

        # API endpoint (appStatisticV3)
        api_url = "https://baijiahao.baidu.com/author/eco/statistics/appStatisticV3"

        # Request parameters
        params = {
            'type': 'event',
            'start_day': start_day,
            'end_day': end_day,
            'stat': '0',
            'special_filter_days': '1'
        }

        # Request headers
        headers = {
            'Accept': 'application/json, text/plain, */*',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Referer': 'https://baijiahao.baidu.com/builder/rc/analysiscontent',
            'Sec-Fetch-Dest': 'empty',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Site': 'same-origin',
        }

        if token_cookie:
            headers['token'] = token_cookie

        retry_count = 0
        while retry_count <= max_retries:
            try:
                # If this is a retry, wait first
                if retry_count > 0:
                    wait_time = retry_count * 3
                    print(f"    [Retry {retry_count}/{max_retries}] waiting {wait_time}s...")
                    time.sleep(wait_time)

                # Fetch a proxy
                proxies = self.fetch_proxy() if self.use_proxy else None

                response = self.session.get(
                    api_url,
                    headers=headers,
                    params=params,
                    proxies=proxies,
                    timeout=15
                )

                if response.status_code == 200:
                    data = response.json()
                    errno = data.get('errno', -1)

                    if errno == 0:
                        # Extract total_info and list data
                        data_content = data.get('data', {})
                        total_info = data_content.get('total_info', {})
                        daily_list = data_content.get('list', [])

                        print(f"    [Publishing stats] views: {total_info.get('view_count', 0)}")
                        print(f"    [Publishing stats] comments: {total_info.get('comment_count', 0)}")

                        return data
                    else:
                        error_msg = data.get('errmsg', '')
                        print(f"    [!] Publishing-stats API error: errno={errno}, errmsg={error_msg}")

                        if errno == 10000015 and retry_count < max_retries:
                            retry_count += 1
                            continue
                        return None
                else:
                    print(f"    [!] HTTP error: {response.status_code}")
                    return None

            except Exception as e:
                error_type = type(e).__name__
                print(f"    [!] Request exception: {error_type} - {e}")

                is_retry_error = any([
                    'Connection' in error_type,
                    'Timeout' in error_type,
                    'ProxyError' in error_type,
                ])

                if is_retry_error and retry_count < max_retries:
                    retry_count += 1
                    continue
                return None

        return None

    def get_cumulative_article_count(self, author_id: int, start_date: str, end_date: str) -> int:
        """Get the cumulative number of published articles from the ai_articles table

        Args:
            author_id: author ID
            start_date: start date (YYYY-MM-DD)
            end_date: end date (YYYY-MM-DD)

        Returns:
            Cumulative number of published articles
        """
        try:
            sql = """
                SELECT COUNT(*) AS count
                FROM ai_articles
                WHERE author_id = %s
                  AND DATE(publish_time) >= %s
                  AND DATE(publish_time) <= %s
                  AND status = 'published'
                  AND channel = 1
            """

            result = self.db_manager.execute_query(
                sql,
                (author_id, start_date, end_date),
                fetch_one=True,
                dict_cursor=True
            )

            return result['count'] if result else 0
        except Exception as e:
            print(f"    [!] Failed to query the cumulative article count: {e}")
            return 0

    def fetch_proxy(self) -> Optional[Dict]:
        """Fetch a Tianqi proxy IP

        Returns:
            Proxy configuration dict, or None on failure
        """
        if not self.use_proxy:
            return None

        try:
            resp = requests.get(PROXY_API_URL, timeout=10)
            resp.raise_for_status()

            text = resp.text.strip()

            # Detect an error response
            if text.upper().startswith('ERROR'):
                print(f"    [!] Proxy API returned an error: {text}")
                return None

            # Parse the IP:PORT format
            lines = text.split('\n')
            for line in lines:
                line = line.strip()
                if ':' in line and line.count(':') == 1:
                    ip_port = line.split()[0] if ' ' in line else line
                    host, port = ip_port.split(':', 1)
                    proxy_url = f'http://{host}:{port}'
                    self.current_proxy = {
                        'http': proxy_url,
                        'https': proxy_url,
                    }
                    print(f"    [Proxy] using Tianqi IP: {ip_port}")
                    return self.current_proxy

            print(f"    [!] Could not parse the proxy API response: {text[:100]}")
            return None

        except Exception as e:
            print(f"    [!] Failed to fetch a proxy: {e}")
            return None

    def build_integrated_data(self, author_id: int, author_name: str, cookie_string: str) -> Dict:
        """Build the integrated data for the target date

        Args:
            author_id: author ID
            author_name: author name
            cookie_string: cookie string

        Returns:
            Integrated data dict
        """
        print(f"\n  [Build] integrated data for account {author_name}...")

        # First day of the current month (for the cumulative article count)
        month_first = self.target_date.replace(day=1).strftime('%Y-%m-%d')

        # Article counts from the database
        daily_count = self.get_daily_article_count(author_id, self.target_date_str)
        cumulative_count = self.get_cumulative_article_count(author_id, month_first, self.target_date_str)

        print(f"    Articles published that day: {daily_count}")
        print(f"    Cumulative articles: {cumulative_count} (from {month_first} to {self.target_date_str})")

        # Publishing statistics (views, comments, etc.)
        print(f"    [API] fetching publishing statistics...")
        analytics_data = self.fetch_analytics_api(cookie_string, self.target_date_str)

        # Extract total_info and list data
        total_info = {}
        daily_list = []
        if analytics_data:
            data_content = analytics_data.get('data', {})
            total_info = data_content.get('total_info', {})
            daily_list = data_content.get('list', [])

        # Income data
        day_revenue = 0.0
        date_timestamp = int(self.target_date.replace(hour=0, minute=0, second=0, microsecond=0).timestamp())

        print(f"    [API] fetching income data...")
        income_data = self.fetch_daily_income(cookie_string, date_timestamp)

        if income_data and income_data.get('data', {}).get('list'):
            income_list = income_data['data']['list']
            if income_list and len(income_list) > 0:
                total_income = income_list[0].get('total_income', 0)
                day_revenue = float(total_income)
                print(f"    Revenue for the day: ¥{day_revenue:.2f}")
            else:
                print(f"    Revenue for the day: ¥0.00 (no income data)")
        else:
            print(f"    Revenue for the day: ¥0.00 (API call failed)")

        # Build the integrated data (mirrors the BaijiahaoAnalytics data structure)
        integrated_data = {
            'account_id': author_name,
            'author_id': author_id,
            'fetch_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'target_date': self.target_date_str,
            'status': 'success',
            'analytics': {
                'apis': [  # Note: must be wrapped in an apis array
                    {
                        'data': {
                            'errno': 0,
                            'data': {
                                'list': daily_list if daily_list else [
                                    {
                                        'event_day': self.target_date_str.replace('-', ''),  # format: 20251225
                                        'date': self.target_date_str,
                                        'publish_count': daily_count,
                                        'daily_published_count': daily_count,
                                        'cumulative_published_count': cumulative_count,
                                    }
                                ],
                                'latest_event_day': self.target_date_str.replace('-', ''),  # format: 20251225
                                'total_info': total_info if total_info else {
                                    'publish_count': daily_count,
                                    'view_count': 0,
                                    'comment_count': 0,
                                    'comment_rate': 0,
                                    'likes_count': 0,
                                    'likes_rate': 0,
                                    'collect_count': 0,
                                    'collect_rate': 0,
                                    'share_count': 0,
                                    'share_rate': 0,
                                    'pic_slide_rate': 0,
                                    'disp_pv': 0,
                                }
                            }
                        }
                    }
                ]
            },
            'income': {
                'errno': 0,  # Marks the API call as successful
                'data': {
                    'income': {
                        'yesterday': {
                            'income': day_revenue  # Note: uses the income field, not value
                        },
                        'currentMonth': {
                            'income': 0  # Monthly revenue cannot be fetched for historical dates; set to 0
                        }
                    }
                }
            }
        }

        return integrated_data

    def process_single_date(self) -> bool:
        """Process all accounts' data for a single date

        Returns:
            Whether the run succeeded
        """
        print(f"\n{'='*70}")
        print(f"Processing data for {self.target_date_str}")
        print(f"{'='*70}")

        # Fetch all accounts
        accounts = self.get_all_authors()
        if not accounts:
            print("[X] No usable accounts, exiting")
            return False

        # Build integrated data for every account
        integrated_data_list = []

        for idx, account in enumerate(accounts, 1):
            author_id = account.get('author_id')
            author_name = account.get('author_name', '')
            cookie_string = account.get('toutiao_cookie', '')

            if not author_id:
                print(f"\n[{idx}/{len(accounts)}] Skipped: {author_name} (missing author_id)")
                continue

            if not cookie_string:
                print(f"\n[{idx}/{len(accounts)}] Skipped: {author_name} (missing cookie)")
                continue

            print(f"\n[{idx}/{len(accounts)}] Processing account: {author_name} (ID: {author_id})")

            try:
                integrated_data = self.build_integrated_data(author_id, author_name, cookie_string)
                integrated_data_list.append(integrated_data)
                print(f"  [OK] Data built successfully")

                # Delay to avoid requesting too fast (increased to 3-5 seconds)
                if idx < len(accounts):
                    import random
                    delay = random.uniform(3, 5)
                    print(f"  [Delay] waiting {delay:.1f}s...")
                    time.sleep(delay)

            except Exception as e:
                print(f"  [X] Failed to build data: {e}")
                import traceback
                traceback.print_exc()
                continue

        if not integrated_data_list:
            print("[!] No data was built successfully")
            return False

        # Save the integrated data to a temporary file
        integrated_file = os.path.join(self.temp_dir, f'integrated_{self.target_date_str}.json')
        try:
            with open(integrated_file, 'w', encoding='utf-8') as f:
                json.dump(integrated_data_list, f, ensure_ascii=False, indent=2)
            print(f"\n[Save] integrated data: {integrated_file}")
        except Exception as e:
            print(f"[X] Failed to save the integrated data: {e}")
            return False

        # Export to the three tables via DataExporter
        print(f"\n[Export] exporting to the database...")
        try:
            exporter = DataExporter(use_database=False)

            # Temporarily point the exporter at the integrated data file
            original_file = exporter.integrated_file
            exporter.integrated_file = integrated_file

            # Export data for the three tables
            result = exporter.export_all_tables()

            # Restore the original path
            exporter.integrated_file = original_file

            if result:
                print(f"\n[OK] Finished processing data for {self.target_date_str}")
                return True
            else:
                print(f"\n[!] Data export failed for {self.target_date_str}")
                return False

        except Exception as e:
            print(f"[X] Failed to export data: {e}")
            import traceback
            traceback.print_exc()
            return False


def main():
    """Entry point"""
    parser = argparse.ArgumentParser(
        description='Fetch Baijiahao statistics for a given date',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python fetch_date_statistics.py 2025-12-01
  python fetch_date_statistics.py 2025-12-15

Notes:
  1. Due to Baijiahao API limits, income data cannot be fetched for historical dates
  2. The script derives article counts from the ai_articles table
  3. Income fields will be set to 0 (real income is only available when the script
     runs on the day the data is produced)
"""
    )

    parser.add_argument(
        'date',
        type=str,
        help='Target date (format: YYYY-MM-DD)'
    )

    parser.add_argument(
        '--no-proxy',
        action='store_true',
        help='Disable the proxy (the Tianqi proxy is enabled by default)'
    )

    args = parser.parse_args()

    # Validate the date format
    try:
        datetime.strptime(args.date, '%Y-%m-%d')
    except ValueError:
        print(f"[X] Invalid date format: {args.date}")
        print("    Expected format: YYYY-MM-DD (e.g. 2025-12-01)")
        return 1

    print("\n" + "="*70)
    print("Baijiahao date-specific statistics fetcher")
    print("="*70)
    print(f"Target date: {args.date}")
    print("="*70)

    try:
        fetcher = DateStatisticsFetcher(args.date, use_proxy=not args.no_proxy)
        success = fetcher.process_single_date()

        return 0 if success else 1

    except Exception as e:
        print(f"\n[X] Program error: {e}")
        import traceback
        traceback.print_exc()
        return 1


if __name__ == '__main__':
    sys.exit(main())

@@ -154,10 +154,8 @@ class CSVImporter:
                continue

            try:
                # Normalize the slide_ratio value
                # Normalize the slide_ratio value (already a decimal fraction in the CSV)
                slide_ratio_value = float(self.convert_value(row.get('slide_ratio', '0'), 'float') or 0.0)
                if slide_ratio_value > 10:
                    slide_ratio_value = slide_ratio_value / 100
                slide_ratio_value = min(slide_ratio_value, 9.9999)

                # Get channel
@@ -271,9 +269,8 @@ class CSVImporter:
                continue

            try:
                # Normalize the avg_slide_ratio value (already a decimal fraction in the CSV)
                avg_slide_ratio_value = float(self.convert_value(row.get('avg_slide_ratio', '0'), 'float') or 0.0)
                if avg_slide_ratio_value > 10:
                    avg_slide_ratio_value = avg_slide_ratio_value / 100
                avg_slide_ratio_value = min(avg_slide_ratio_value, 9.9999)

                # Get channel and look up author_id
@@ -348,13 +345,14 @@ class CSVImporter:
        return success_count > 0

    def import_ai_statistics_days(self, batch_size: int = 50) -> bool:
        """Import data into the ai_statistics_days table (with batched commits)
        """Import data into the ai_statistics_days table (day-level data only: day_revenue)
        Also automatically splits the data into the ai_statistics_weekly and ai_statistics_monthly tables

        Args:
            batch_size: batch commit size, default 50 rows
        """
        print("\n" + "="*70)
        print("Importing data into the ai_statistics_days table")
        print("Importing data into the ai_statistics_days table (split across 3 tables)")
        print("="*70)

        csv_file = self.csv_files['ai_statistics_days']
@@ -365,14 +363,27 @@ class CSVImporter:
            self.logger.warning("No data to import for the ai_statistics_days table")
            return False

        self.logger.info(f"Importing ai_statistics_days data: {len(rows)} rows, batch size: {batch_size}")
        print(f"\n{len(rows)} rows total, importing in batches of {batch_size}\n")
        self.logger.info(f"Importing data: {len(rows)} rows, batch size: {batch_size}")
        print(f"\n{len(rows)} rows total, splitting across 3 tables\n")

        success_count = 0
        # Per-table counters
        days_success = 0
        weekly_success = 0
        monthly_success = 0
        failed_count = 0
        batch_params = []
        first_record_keys = None
        sql_template = None

        # Batch parameter lists
        days_batch = []
        weekly_batch = []
        monthly_batch = []

        # SQL templates
        days_sql = None
        weekly_sql = None
        monthly_sql = None
        days_keys = None
        weekly_keys = None
        monthly_keys = None

        for idx, row in enumerate(rows, 1):
            author_name = row.get('author_name', '').strip()
@@ -388,68 +399,153 @@ class CSVImporter:
                failed_count += 1
                continue

            # Handle the day_revenue field (daily revenue)
            day_revenue_value = self.convert_value(row.get('day_revenue', '0'), 'decimal')
            if day_revenue_value is None:
                day_revenue_value = Decimal('0')
            stat_date = row.get('stat_date', '').strip()

            record = {
            # 1. ai_statistics_days row (day-level data only)
            day_revenue = self.convert_value(row.get('day_revenue', '0'), 'decimal') or Decimal('0')
            daily_published_count = self.convert_value(row.get('daily_published_count', '0'), 'int') or 0
            cumulative_published_count = self.convert_value(row.get('cumulative_published_count', '0'), 'int') or 0

            days_record = {
                'author_id': author_id,
                'author_name': author_name,
                'channel': channel,
                'stat_date': row.get('stat_date', '').strip(),
                'daily_published_count': self.convert_value(row.get('daily_published_count', '0'), 'int') or 0,
                'cumulative_published_count': self.convert_value(row.get('cumulative_published_count', '0'), 'int') or 0,
                'day_revenue': day_revenue_value,  # daily revenue
                'monthly_revenue': self.convert_value(row.get('monthly_revenue', '0'), 'decimal') or Decimal('0'),
                'weekly_revenue': self.convert_value(row.get('weekly_revenue', '0'), 'decimal') or Decimal('0'),
                'revenue_mom_growth_rate': self.convert_value(row.get('revenue_mom_growth_rate', '0'), 'decimal') or Decimal('0'),
                'revenue_wow_growth_rate': self.convert_value(row.get('revenue_wow_growth_rate', '0'), 'decimal') or Decimal('0'),
                'updated_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),  # update timestamp to force updates
                'stat_date': stat_date,
                'daily_published_count': daily_published_count,
                'day_revenue': day_revenue,
                'updated_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            }

            if sql_template is None:
                first_record_keys = list(record.keys())
                columns = ', '.join(first_record_keys)
                placeholders = ', '.join(['%s'] * len(first_record_keys))
                update_parts = [f"{key} = VALUES({key})" for key in first_record_keys if key not in ['author_name', 'channel', 'stat_date']]
                sql_template = f"""
            # 2. ai_statistics_weekly row
            weekly_revenue = self.convert_value(row.get('weekly_revenue', '0'), 'decimal') or Decimal('0')
            revenue_wow_growth_rate = self.convert_value(row.get('revenue_wow_growth_rate', '0'), 'decimal') or Decimal('0')

            # Compute the week number for this date (format: WW, e.g. 51)
            from datetime import datetime as dt, timedelta
            date_obj = dt.strptime(stat_date, '%Y-%m-%d')
            # Use isocalendar() for the ISO week number (weeks start on Monday)
            year, week_num, _ = date_obj.isocalendar()
            stat_weekly = week_num  # use the number directly

            weekly_record = {
                'author_id': author_id,
                'author_name': author_name,
                'channel': channel,
                'stat_weekly': stat_weekly,
                'weekly_revenue': weekly_revenue,
                'revenue_wow_growth_rate': revenue_wow_growth_rate,
                'updated_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            }

            # 3. ai_statistics_monthly row
            monthly_revenue = self.convert_value(row.get('monthly_revenue', '0'), 'decimal') or Decimal('0')
            revenue_mom_growth_rate = self.convert_value(row.get('revenue_mom_growth_rate', '0'), 'decimal') or Decimal('0')

            # Compute the month for this date (format: YYYY-MM, e.g. 2025-12)
            stat_monthly = date_obj.strftime('%Y-%m')

            monthly_record = {
                'author_id': author_id,
                'author_name': author_name,
                'channel': channel,
                'stat_monthly': stat_monthly,
                'monthly_revenue': monthly_revenue,
                'revenue_mom_growth_rate': revenue_mom_growth_rate,
                'updated_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            }

            # Build the SQL templates (first row only)
            if days_sql is None:
                days_keys = list(days_record.keys())
                columns = ', '.join(days_keys)
                placeholders = ', '.join(['%s'] * len(days_keys))
                update_parts = [f"{key} = VALUES({key})" for key in days_keys if key not in ['author_name', 'channel', 'stat_date']]
                days_sql = f"""
                    INSERT INTO ai_statistics_days ({columns})
                    VALUES ({placeholders})
                    ON DUPLICATE KEY UPDATE {', '.join(update_parts)}
                """

            if first_record_keys is not None:
                batch_params.append(tuple(record[key] for key in first_record_keys))
            if weekly_sql is None:
                weekly_keys = list(weekly_record.keys())
                columns = ', '.join(weekly_keys)
                placeholders = ', '.join(['%s'] * len(weekly_keys))
                update_parts = [f"{key} = VALUES({key})" for key in weekly_keys if key not in ['author_name', 'channel', 'stat_weekly']]
                weekly_sql = f"""
                    INSERT INTO ai_statistics_weekly ({columns})
                    VALUES ({placeholders})
                    ON DUPLICATE KEY UPDATE {', '.join(update_parts)}
                """

            if len(batch_params) >= batch_size or idx == len(rows):
            if monthly_sql is None:
                monthly_keys = list(monthly_record.keys())
                columns = ', '.join(monthly_keys)
                placeholders = ', '.join(['%s'] * len(monthly_keys))
                update_parts = [f"{key} = VALUES({key})" for key in monthly_keys if key not in ['author_name', 'channel', 'stat_monthly']]
                monthly_sql = f"""
                    INSERT INTO ai_statistics_monthly ({columns})
                    VALUES ({placeholders})
                    ON DUPLICATE KEY UPDATE {', '.join(update_parts)}
                """

            # Add to the batch parameter lists
            days_batch.append(tuple(days_record[key] for key in days_keys))
            weekly_batch.append(tuple(weekly_record[key] for key in weekly_keys))
            monthly_batch.append(tuple(monthly_record[key] for key in monthly_keys))

            # Batch commit
            if len(days_batch) >= batch_size or idx == len(rows):
                try:
                    result_count = self.db_manager.execute_many(sql_template, batch_params, autocommit=True)
                    success_count += result_count
                    print(f"[Batch commit] {success_count} rows imported (this batch: {result_count}/{len(batch_params)})")
                    self.logger.info(f"ai_statistics_days batch commit: {result_count}/{len(batch_params)} rows")
                    batch_params = []
                except Exception as batch_error:
                    failed_count += len(batch_params)
                    print(f"  [X] Batch commit failed: {batch_error}")
                    self.logger.error(f"ai_statistics_days batch commit failed: {batch_error}")
                    batch_params = []
                    # Commit ai_statistics_days
                    result = self.db_manager.execute_many(days_sql, days_batch, autocommit=True)
                    days_success += result
                    print(f"[days] {days_success} rows imported")
                    days_batch = []
                except Exception as e:
                    print(f"  [X] days table commit failed: {e}")
                    self.logger.error(f"ai_statistics_days batch commit failed: {e}")
                    failed_count += len(days_batch)
                    days_batch = []

                try:
                    # Commit ai_statistics_weekly
                    result = self.db_manager.execute_many(weekly_sql, weekly_batch, autocommit=True)
                    weekly_success += result
                    print(f"[weekly] {weekly_success} rows imported")
                    weekly_batch = []
                except Exception as e:
                    print(f"  [X] weekly table commit failed: {e}")
                    self.logger.error(f"ai_statistics_weekly batch commit failed: {e}")
                    weekly_batch = []

                try:
                    # Commit ai_statistics_monthly
                    result = self.db_manager.execute_many(monthly_sql, monthly_batch, autocommit=True)
                    monthly_success += result
                    print(f"[monthly] {monthly_success} rows imported")
                    monthly_batch = []
                except Exception as e:
                    print(f"  [X] monthly table commit failed: {e}")
                    self.logger.error(f"ai_statistics_monthly batch commit failed: {e}")
                    monthly_batch = []

            except Exception as e:
                failed_count += 1
                print(f"  [X] Processing failed ({author_name}): {e}")
                self.logger.error(f"ai_statistics_days processing failed: {author_name}, error: {e}")
                self.logger.error(f"Data processing failed: {author_name}, error: {e}")
                continue

        print("\n" + "="*70)
        print(f"[OK] ai_statistics_days import finished")
        print(f"  Succeeded: {success_count} rows")
        print(f"[OK] Data import finished (split across 3 tables)")
        print(f"  ai_statistics_days: {days_success} rows")
        print(f"  ai_statistics_weekly: {weekly_success} rows")
        print(f"  ai_statistics_monthly: {monthly_success} rows")
        if failed_count > 0:
            print(f"  Failed: {failed_count} rows")
            print(f"  Failed: {failed_count}")
        print("="*70)

        self.logger.info(f"ai_statistics_days import finished: {success_count} succeeded, {failed_count} failed")
        return success_count > 0
        self.logger.info(f"Data import finished: days={days_success}, weekly={weekly_success}, monthly={monthly_success}, failed={failed_count}")
        return days_success > 0

    def import_all(self) -> bool:
        """Import all CSV files"""

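The weekly/monthly bucketing in the diff above derives `stat_weekly` from `isocalendar()` (ISO weeks start on Monday) and `stat_monthly` from `strftime('%Y-%m')`; a standalone sketch of the same derivation, with a hypothetical helper name:

```python
from datetime import datetime

def week_and_month_keys(stat_date: str) -> tuple:
    """Return (iso_week_number, 'YYYY-MM') for a 'YYYY-MM-DD' date string."""
    date_obj = datetime.strptime(stat_date, '%Y-%m-%d')
    # isocalendar() returns (ISO year, ISO week number, ISO weekday)
    _, week_num, _ = date_obj.isocalendar()
    return week_num, date_obj.strftime('%Y-%m')

print(week_and_month_keys('2025-12-25'))  # → (52, '2025-12')
```

One caveat worth noting: a bare ISO week number without its ISO year is ambiguous across year boundaries; for example, 2025-12-29 already belongs to ISO week 1 of 2026, so two dates a year apart can share the same `stat_weekly` value.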
0
query_statistics.py
Normal file
15
sms_config.json
Normal file
@@ -0,0 +1,15 @@
{
    "阿里云配置说明": "Fill in your Aliyun SMS service configuration",
    "access_key_id": "LTAI5tSMvnCJdqkZtCVWgh8R",
    "access_key_secret": "nyFzXyIi47peVLK4wR2qqbPezmU79W",
    "sign_name": "北京乐航时代科技",
    "template_code": "SMS_486210104",
    "phone_numbers": "13621242430",
    "endpoint": "dysmsapi.aliyuncs.com",
    "注意事项": [
        "access_key_id and access_key_secret can be obtained at https://ram.console.aliyun.com/manage/ak",
        "sign_name must be applied for and approved in the Aliyun SMS console",
        "template_code is the SMS template code, also applied for in the Aliyun SMS console",
        "phone_numbers may list multiple numbers separated by commas, e.g. 13621242430,13800138000"
    ]
}
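Before any send is attempted, the configuration above can be loaded and sanity-checked; a minimal sketch (the field names come from sms_config.json, but the helper name and validation logic are assumptions, not part of the codebase):

```python
import json

# Required fields from sms_config.json
REQUIRED_FIELDS = ('access_key_id', 'access_key_secret', 'sign_name',
                   'template_code', 'phone_numbers', 'endpoint')

def load_sms_config(path: str = 'sms_config.json') -> dict:
    """Load sms_config.json, verify required fields, and split phone numbers."""
    with open(path, encoding='utf-8') as f:
        config = json.load(f)
    missing = [k for k in REQUIRED_FIELDS if not config.get(k)]
    if missing:
        raise ValueError(f'sms_config.json is missing fields: {missing}')
    # phone_numbers may hold several comma-separated numbers
    config['phone_list'] = [p.strip() for p in config['phone_numbers'].split(',') if p.strip()]
    return config
```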
@@ -277,17 +277,44 @@ class CookieSyncToDB:
            # Convert the cookies to a string
            cookie_string = self.cookie_dict_to_string(cookies)

            # Extract other fields
            # Extract other fields (use username and nick to match author_name)
            username = account_info.get('username', '')  # preferred for matching author_name in the database
            nick = account_info.get('nick', '')  # fallback match field
            app_id = account_info.get('app_id', '')
            nick = account_info.get('nick', '')
            domain = account_info.get('domain', '')
            level = account_info.get('level', '')

            # Look up the author (author_name + channel as the unique key)
            # Require at least one of username / nick
            if not username and not nick:
                print("    [!] Account has neither username nor nick, skipping")
                stats['skipped'] += 1
                continue

            print(f"    Username: {username}")
            print(f"    Nick: {nick}")

            # Look up the author (two-step match: username first, then nick)
            channel = 1  # Baijiahao is always channel=1
            author = self.find_author_by_name(account_name, channel)
            author = None
            matched_field = None

            # 1. Try matching by username first
            if username:
                author = self.find_author_by_name(username, channel)
                if author:
                    print(f"    [√] Found author: {author['author_name']} (ID: {author['id']}, Channel: {author['channel']})")
                    matched_field = 'username'
                    print(f"    [√] Matched author via username: {author['author_name']} (ID: {author['id']}, Channel: {author['channel']})")

            # 2. If username did not match, try nick
            if not author and nick:
                author = self.find_author_by_name(nick, channel)
                if author:
                    matched_field = 'nick'
                    print(f"    [√] Matched author via nick: {author['author_name']} (ID: {author['id']}, Channel: {author['channel']})")

            # 3. Neither matched
            if not author:
                print(f"    [!] No matching author found (tried both username and nick)")

            # Update or create
            if author:
@@ -300,15 +327,17 @@ class CookieSyncToDB:
                )

                if success:
                    print(f"    [OK] Cookie updated")
                    print(f"    [OK] Cookie updated (matched field: {matched_field})")
                    stats['updated'] += 1
                    # Record the success
                    success_records.append({
                        'account_name': account_name,
                        'app_id': app_id,
                        'username': username,
                        'nick': nick,
                        'app_id': app_id,
                        'domain': domain,
                        'action': 'updated',
                        'matched_field': matched_field,
                        'db_author_id': author['id'],
                        'db_author_name': author['author_name']
                    })
@@ -318,17 +347,21 @@ class CookieSyncToDB:
                    # Record the failure
                    failed_records.append({
                        'account_name': account_name,
                        'app_id': app_id,
                        'username': username,
                        'nick': nick,
                        'app_id': app_id,
                        'reason': 'database update failed',
                        'matched_field': matched_field,
                        'db_author_id': author['id']
                    })
            else:
                # The author does not exist
                # The author does not exist; consider creating it
                if auto_create:
                    print(f"    [*] Author not found, creating a new record...")
                    # Prefer username; fall back to nick
                    author_name_to_create = username if username else nick
                    print(f"    [*] Author not found, creating a new record (author_name: {author_name_to_create})...")
                    success = self.insert_new_author(
                        account_name,
                        author_name_to_create,  # prefer username, otherwise use nick
                        cookie_string,
                        app_id,
                        nick,
@@ -337,15 +370,17 @@ class CookieSyncToDB:
                    )

                    if success:
                        print(f"    [OK] New author created")
                        print(f"    [OK] New author created (author_name: {author_name_to_create})")
                        stats['created'] += 1
                        # Record the success
                        success_records.append({
                            'account_name': account_name,
                            'app_id': app_id,
                            'username': username,
                            'nick': nick,
                            'app_id': app_id,
                            'domain': domain,
                            'action': 'created'
                            'action': 'created',
                            'created_with': 'username' if username else 'nick'
                        })
                    else:
                        print(f"    [X] Failed to create the author")
@@ -353,8 +388,9 @@ class CookieSyncToDB:
                        # Record the failure
                        failed_records.append({
                            'account_name': account_name,
                            'app_id': app_id,
                            'username': username,
                            'nick': nick,
                            'app_id': app_id,
                            'reason': 'database insert failed'
                        })
                else:
@@ -363,9 +399,10 @@ class CookieSyncToDB:
                    # Record the failure (not present in the database)
                    failed_records.append({
                        'account_name': account_name,
                        'app_id': app_id,
                        'username': username,
                        'nick': nick,
                        'reason': 'account not in the database and auto-create is disabled'
                        'app_id': app_id,
                        'reason': 'account not in the database (tried both username and nick) and auto-create is disabled'
                    })

        # Save the record files

40
test_validation_sms.bat
Normal file
@@ -0,0 +1,40 @@
@echo off
chcp 65001 >nul
echo ============================================================
echo Data validation and SMS alerting - quick test
echo ============================================================
echo.

echo [Step 1] Checking the Python environment...
python --version
if %errorlevel% neq 0 (
    echo [Error] Python is not installed or not on PATH
    pause
    exit /b 1
)
echo.

echo [Step 2] Testing SMS sending...
echo.
python data_validation_with_sms.py --test-sms
if %errorlevel% neq 0 (
    echo.
    echo [Error] SMS sending test failed
    echo Please check:
    echo   1. whether the Aliyun SDK is installed
    echo   2. whether sms_config.json is configured correctly
    echo   3. whether the AccessKey and Secret are valid
    pause
    exit /b 1
)
echo.

echo [Step 3] Running data validation...
echo.
python data_validation_with_sms.py
echo.

echo ============================================================
echo Test finished!
echo ============================================================
pause
test_validation_sms.sh (Normal file, 47 lines)
@@ -0,0 +1,47 @@
#!/bin/bash
# Data Validation and SMS Alert System - quick test (Linux version)

echo "============================================================"
echo "Data Validation and SMS Alert System - Quick Test"
echo "============================================================"
echo ""

echo "[Step 1] Checking Python environment..."
if command -v python3 &> /dev/null; then
    PYTHON_CMD=python3
elif command -v python &> /dev/null; then
    PYTHON_CMD=python
else
    echo "[ERROR] Python is not installed or not on PATH"
    exit 1
fi

$PYTHON_CMD --version
if [ $? -ne 0 ]; then
    echo "[ERROR] Python version check failed"
    exit 1
fi
echo ""

echo "[Step 2] Testing SMS sending..."
echo ""
$PYTHON_CMD data_validation_with_sms.py --test-sms
if [ $? -ne 0 ]; then
    echo ""
    echo "[ERROR] SMS sending test failed"
    echo "Please check:"
    echo "  1. that the Alibaba Cloud SDK is installed"
    echo "  2. that sms_config.json is configured correctly"
    echo "  3. that the AccessKey and Secret are valid"
    exit 1
fi
echo ""

echo "[Step 3] Running data validation..."
echo ""
$PYTHON_CMD data_validation_with_sms.py
echo ""

echo "============================================================"
echo "Test complete!"
echo "============================================================"
update_day_revenue.py (Normal file, 151 lines)
@@ -0,0 +1,151 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
day_revenue update script

Reads data from a CSV file and updates only the day_revenue
column of the ai_statistics_days table.
"""

import os
import sys
import csv
from typing import List, Dict
from decimal import Decimal

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from database_config import DatabaseManager


class DayRevenueUpdater:
    """Updater for the day_revenue column"""

    def __init__(self):
        """Initialize"""
        self.db_manager = DatabaseManager()
        self.script_dir = os.path.dirname(os.path.abspath(__file__))
        self.csv_file = os.path.join(self.script_dir, 'ai_statistics_days.csv')

        print(f"[Init] CSV file: {self.csv_file}")

    def read_csv_data(self) -> List[Dict]:
        """Read the CSV file

        Returns:
            List of row dicts
        """
        if not os.path.exists(self.csv_file):
            print(f"[X] CSV file not found: {self.csv_file}")
            return []

        try:
            with open(self.csv_file, 'r', encoding='utf-8-sig') as f:
                reader = csv.DictReader(f)
                data = list(reader)

            print(f"[OK] Read {len(data)} records")
            return data
        except Exception as e:
            print(f"[X] Failed to read CSV: {e}")
            import traceback
            traceback.print_exc()
            return []

    def update_day_revenue(self, batch_size: int = 50) -> bool:
        """Update the day_revenue column

        Args:
            batch_size: batch size for updates

        Returns:
            True on success
        """
        # Read the CSV data
        csv_data = self.read_csv_data()
        if not csv_data:
            print("[!] No data to update")
            return False

        print(f"\n[Start] Updating day_revenue...")

        # Prepare the update statement
        update_sql = """
            UPDATE ai_statistics_days
            SET day_revenue = %s,
                updated_at = NOW()
            WHERE author_id = %s
              AND stat_date = %s
              AND channel = %s
        """

        success_count = 0
        failed_count = 0
        not_found_count = 0

        # Update row by row
        for idx, row in enumerate(csv_data, 1):
            try:
                author_id = int(row.get('author_id', 0))
                stat_date = row.get('stat_date', '')
                channel = int(row.get('channel', 1))
                day_revenue = Decimal(row.get('day_revenue', '0.00'))

                # Execute the update
                affected_rows = self.db_manager.execute_update(
                    update_sql,
                    (day_revenue, author_id, stat_date, channel)
                )

                if affected_rows > 0:
                    success_count += 1
                    print(f"  [{idx}/{len(csv_data)}] ✓ Updated: author_id={author_id}, stat_date={stat_date}, day_revenue={day_revenue}")
                else:
                    not_found_count += 1
                    print(f"  [{idx}/{len(csv_data)}] - No matching record: author_id={author_id}, stat_date={stat_date}")

            except Exception as e:
                failed_count += 1
                print(f"  [{idx}/{len(csv_data)}] ✗ Update failed: {e}")
                print(f"      Row: {row}")

        # Print the summary
        print(f"\n{'='*70}")
        print(f"Update complete")
        print(f"{'='*70}")
        print(f"Updated:    {success_count}/{len(csv_data)}")
        print(f"Not found:  {not_found_count}/{len(csv_data)}")
        print(f"Failed:     {failed_count}/{len(csv_data)}")
        print(f"{'='*70}")

        return failed_count == 0


def main():
    """Entry point"""
    print("\n" + "="*70)
    print("day_revenue batch update tool")
    print("="*70)
    print("Reads ai_statistics_days.csv and updates only the day_revenue column in the database")
    print("="*70)

    try:
        updater = DayRevenueUpdater()

        # Confirm before running
        confirm = input("\nStart the update? (y/n): ").strip().lower()
        if confirm != 'y':
            print("Cancelled")
            return 0

        success = updater.update_day_revenue()

        return 0 if success else 1

    except Exception as e:
        print(f"\n[X] Program error: {e}")
        import traceback
        traceback.print_exc()
        return 1


if __name__ == '__main__':
    sys.exit(main())
validation_report_20251230_102131.txt (Normal file, 8 lines)
@@ -0,0 +1,8 @@
Data validation report
======================================================================
Generated: 2025-12-30 10:21:31
Target date: 2025-12-30


Order validation results
----------------------------------------------------------------------
validation_report_20251230_113338.txt (Normal file, 34 lines)
@@ -0,0 +1,34 @@
Data validation report
======================================================================
Generated: 2025-12-30 11:33:38
Target date: 2025-12-29


Order validation results
----------------------------------------------------------------------
json vs csv
  Order matches: yes
  json records: 84
  csv records: 84


Cross-validation results
----------------------------------------------------------------------
json vs csv
  Common records: 84
  Only in json: 0
  Only in csv: 0
  Field mismatches: 0

json vs database
  Common records: 84
  Only in json: 0
  Only in database: 0
  Field mismatches: 0

csv vs database
  Common records: 84
  Only in csv: 0
  Only in database: 0
  Field mismatches: 0
@@ -28,12 +28,12 @@ logger = setup_baidu_crawl_logger()

# Simple proxy-fetch configuration - Damai proxy IPs
PROXY_API_URL = (
-    'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=e054861d08471263d970bde4f4905181&osn=TC_NO176655872088456223&tiqu=1'
+    'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=2912cb2b22d3b7ae724f045012790479&osn=TC_NO176707424165606223&tiqu=1'
)

# Damai proxy username/password authentication
-PROXY_USERNAME = '694b8c3172af7'
-PROXY_PASSWORD = 'q8yA8x1dwCpdyIK'
+PROXY_USERNAME = '69538fdef04e1'
+PROXY_PASSWORD = '63v0kQBr2yJXnjf'

# Backup fixed proxy IP pool (format: 'IP:port', 'username', 'password')
BACKUP_PROXY_POOL = [
@@ -27,12 +27,12 @@ if 'https_proxy' in os.environ:

# Simple proxy-fetch configuration - Damai proxy IPs
PROXY_API_URL = (
-    'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=e054861d08471263d970bde4f4905181&osn=TC_NO176655872088456223&tiqu=1'
+    'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=2912cb2b22d3b7ae724f045012790479&osn=TC_NO176707424165606223&tiqu=1'
)

# Damai proxy username/password authentication
-PROXY_USERNAME = '694b8c3172af7'
-PROXY_PASSWORD = 'q8yA8x1dwCpdyIK'
+PROXY_USERNAME = '69538fdef04e1'
+PROXY_PASSWORD = '63v0kQBr2yJXnjf'

# Backup fixed proxy IP pool (format: 'IP:port', 'username', 'password')
BACKUP_PROXY_POOL = [
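The commit message describes the retry policy these proxies feed into: up to 3 attempts on the same proxy, then a switch, with up to 3 switches (4 distinct proxies in total). A minimal sketch of that policy, with the fetch and proxy-pool calls abstracted as callables (the names here are illustrative, not the crawler's actual functions):

```python
def fetch_with_proxy_rotation(fetch, next_proxy,
                              attempts_per_proxy=3, max_switches=3):
    """Retry fetch(proxy) up to attempts_per_proxy times per proxy,
    switching proxies on repeated failure, up to max_switches switches
    (so at most max_switches + 1 distinct proxies are tried)."""
    last_error = None
    for _ in range(max_switches + 1):
        proxy = next_proxy()
        for _ in range(attempts_per_proxy):
            try:
                return fetch(proxy)
            except Exception as exc:  # timeout / connection error in practice
                last_error = exc
    raise last_error
```

In the real crawler, timeouts and errno=10000015 responses switch proxies immediately rather than exhausting the per-proxy attempts; this sketch omits that distinction.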
备份/captured_account_cookies-6677.json (Normal file, 5619 lines)
File diff suppressed because it is too large.
安装守护进程服务.bat (Normal file, 0 lines)
手动启动守护进程任务.bat (Normal file, 0 lines)
数据验证短信告警README.md (Normal file, 341 lines)
@@ -0,0 +1,341 @@
# 📱 Data Validation and SMS Alert System

## 🎯 Overview

An automated data-validation system that checks data consistency on a daily schedule and sends an alert through the Alibaba Cloud SMS service when a problem is found.

**Core features:**
- ✅ Automatically validates consistency across three data sources: JSON, CSV, and the database
- ✅ Sends an SMS alert automatically on validation failure (error code: 2222)
- ✅ Supports scheduled runs (daily at 9:00 AM)
- ✅ Generates detailed validation reports
- ✅ Supports multiple alert phone numbers

---

## 📁 File Layout

```
xhh_baijiahao/
├── data_validation.py              # Core data-validation module
├── data_validation_with_sms.py     # Validation + SMS alert integration script ⭐
├── sms_config.json                 # SMS service configuration file ⭐
├── test_validation_sms.bat         # Windows quick-test script
├── 数据验证短信告警使用说明.md        # Detailed usage documentation
└── ai_sms/                         # Alibaba Cloud SMS SDK samples
    └── alibabacloud_sample/
        └── sample.py
```

---

## 🚀 Quick Start (5 minutes)

### 1️⃣ Install dependencies

```bash
pip install alibabacloud_dysmsapi20170525 alibabacloud_credentials alibabacloud_tea_openapi alibabacloud_tea_util
```

### 2️⃣ Configure the SMS service

Edit `sms_config.json`:

```json
{
  "access_key_id": "your AccessKey ID",
  "access_key_secret": "your AccessKey Secret",
  "sign_name": "北京乐航时代科技",
  "template_code": "SMS_486210104",
  "phone_numbers": "13621242430"
}
```

**Get an AccessKey:** https://ram.console.aliyun.com/manage/ak

### 3️⃣ Test run

**Windows users (double-click):**
```
test_validation_sms.bat
```

**From the command line:**
```bash
# Test SMS sending
python data_validation_with_sms.py --test-sms

# Run data validation
python data_validation_with_sms.py
```

### 4️⃣ Set up the scheduled task

```bash
# Show the scheduling setup command
python data_validation_with_sms.py --setup-schedule
```

Follow the prompts to configure Windows Task Scheduler to run the script daily at 9:00 AM.

---

## 📖 Usage Scenarios

### Scenario 1: Daily automatic validation

**Scheduled task configuration (daily at 9:00 AM):**
- Program: `C:\Python\python.exe`
- Arguments: `D:\workspace\xhh_baijiahao\data_validation_with_sms.py`
- Trigger: every day at 9:00 AM

**Execution flow:**
1. Validate yesterday's data automatically (JSON/CSV/database)
2. If a problem is found → send an SMS alert (error code 2222)
3. Generate a detailed validation report

### Scenario 2: Manually validate a specific date

```bash
# Validate data for 2025-12-29
python data_validation_with_sms.py --date 2025-12-29

# Validate a specific table
python data_validation_with_sms.py --table ai_statistics_day --date 2025-12-29

# Validate only specific data sources
python data_validation_with_sms.py --source csv database
```

### Scenario 3: Validate without sending SMS

```bash
# Useful for debugging or testing
python data_validation_with_sms.py --no-sms
```

---

## 🔔 SMS Alert Details

### Trigger conditions

An SMS alert (error code **2222**) is sent in the following cases:

| Problem type | Description | Example |
|---------|-----|-----|
| **Order mismatch** | JSON and CSV records are in different orders | Account A is 1st in JSON, 3rd in CSV |
| **Missing records** | A data source is missing records | JSON has 10 rows, CSV only 9 |
| **Extra records** | A data source has extra records | CSV has 11 rows, the database only 10 |
| **Field differences** | The same record has different field values | read count: JSON=1000, CSV=999 |
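The cross-checks above amount to a set comparison keyed by record identity. A minimal sketch of the idea, assuming records are keyed by (account_name, channel); this is illustrative, not the project's actual `data_validation.py` implementation, and the function name and field list are assumptions:

```python
def cross_validate(src_a, src_b, fields=("read_count", "day_revenue")):
    """Compare two data sources keyed by (account_name, channel).

    Returns records missing on either side and per-field mismatches
    among the common records.
    """
    keys_a, keys_b = set(src_a), set(src_b)
    result = {
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
        "mismatched": [],
    }
    for key in sorted(keys_a & keys_b):
        for field in fields:
            if src_a[key].get(field) != src_b[key].get(field):
                result["mismatched"].append(
                    (key, field, src_a[key].get(field), src_b[key].get(field))
                )
    return result
```

Validation passes only when all three lists are empty for every source pair, which is exactly what an all-zero report section shows.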
### SMS content

```
【北京乐航时代科技】Your verification code is 2222
```

> 💡 **Note**: because a verification-code template is used, the error code is fixed at `2222`. See the validation report file for the actual error details.

### Multiple recipients

Configure several recipient numbers in `sms_config.json`:

```json
{
  "phone_numbers": "13621242430,13800138000,13900139000"
}
```

---

## 📊 Validation Reports

Each run automatically generates a detailed report:

```
validation_report_20250104_090523.txt
```

**Report contents:**
- ✅ Order validation results
- ✅ Cross-validation results
- ✅ Data difference statistics
- ✅ Detailed error list (record level)

**Sample report excerpt:**
```
Cross-validation results
----------------------------------------------------------------------
json vs csv
  Common records: 48
  Only in json: 0
  Only in csv: 2
  Field mismatches: 3

Records only in csv (first 5):
  - 测试账号1|1
  - 测试账号2|1

Records with field mismatches (first 3):
  Record: 主力账号|1
    Field read_count:
      json: 150000
      csv: 149999
```

---

## ⚙️ Advanced Configuration

### Environment variables (more secure)

**Windows PowerShell:**
```powershell
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your AccessKey ID"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your AccessKey Secret"
```

**Linux/macOS:**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID="your AccessKey ID"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="your AccessKey Secret"
```

### Configuration precedence

1. **Environment variables** (highest priority)
2. **sms_config.json** configuration file
3. **Defaults in the code**
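That precedence can be sketched as a small resolver. This is a hypothetical helper, not the script's actual code; the function name and the `ALIBABA_CLOUD_` environment-variable naming scheme are assumptions:

```python
import json
import os


def resolve_setting(name, config_path="sms_config.json", default=None):
    """Resolve a setting: environment variable, then sms_config.json, then default."""
    # 1. Environment variable (highest priority)
    env_value = os.environ.get("ALIBABA_CLOUD_" + name.upper())
    if env_value:
        return env_value
    # 2. Configuration file
    try:
        with open(config_path, encoding="utf-8") as f:
            value = json.load(f).get(name)
        if value:
            return value
    except (FileNotFoundError, json.JSONDecodeError):
        pass
    # 3. Default baked into the code
    return default
```

A missing or malformed config file simply falls through to the default, so the script can still run (and report a clear SMS-configuration error) rather than crash.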
---

## 🔧 Troubleshooting

### ❌ SMS sending fails

**Common causes:**
1. Incorrect AccessKey ID/Secret
2. SMS signature not yet approved
3. SMS template not yet approved
4. Insufficient account balance
5. Malformed phone number

**How to debug:**
```bash
# Test SMS sending
python data_validation_with_sms.py --test-sms
```

Check the console output for the detailed error message and the diagnostics URL.

### ❌ Import error

```
ImportError: No module named 'alibabacloud_dysmsapi20170525'
```

**Fix:**
```bash
pip install alibabacloud_dysmsapi20170525 alibabacloud_credentials alibabacloud_tea_openapi alibabacloud_tea_util
```

### ❌ Database connection fails

Check that the settings in `database_config.py` are correct.

### ❌ Scheduled task does not run

**Checklist:**
1. Task status in Task Scheduler
2. Python path and script path are correct
3. Review the task history

---

## 📝 Command-Line Arguments

```bash
python data_validation_with_sms.py [options]
```

| Option | Description | Example |
|-----|-----|-----|
| `--date` | Date to validate | `--date 2025-12-29` |
| `--source` | Data sources to use | `--source json csv` |
| `--table` | Table to validate | `--table ai_statistics_day` |
| `--setup-schedule` | Show scheduling setup | `--setup-schedule` |
| `--test-sms` | Test SMS sending | `--test-sms` |
| `--no-sms` | Disable SMS sending | `--no-sms` |

---

## 🔐 Security Recommendations

1. ✅ **Store secrets in environment variables**
   - Never commit the AccessKey to Git
   - Add `sms_config.json` to `.gitignore`

2. ✅ **Rotate the AccessKey regularly**
   - Every 3-6 months is recommended

3. ✅ **Use a RAM sub-account**
   - Create a dedicated sub-account for the SMS service
   - Grant it only SMS-sending permissions

4. ✅ **Set an IP allowlist**
   - Restrict access IPs in the Alibaba Cloud RAM console

---

## 📞 Support

### Alibaba Cloud SMS service

- Console: https://dysms.console.aliyun.com/
- Documentation: https://help.aliyun.com/product/44282.html
- API reference: https://api.aliyun.com/product/Dysmsapi

### FAQ

**Q: The SMS never arrives?**
A: Check that the phone number is correct and that the SMS signature and template have been approved.

**Q: How do I view SMS delivery records?**
A: Log in to the Alibaba Cloud SMS console → Business Statistics → Delivery Records.

**Q: How much does an SMS cost?**
A: A verification-code SMS costs about ¥0.045 each; see the Alibaba Cloud website for current pricing.

**Q: Can the SMS content be customized?**
A: Apply for a new SMS template in the Alibaba Cloud console, then update the `template_code` setting once it is approved.

---

## 🎉 Quick-Test Checklist

- [ ] Install the Python dependencies
- [ ] Configure `sms_config.json`
- [ ] Run `test_validation_sms.bat`
- [ ] Receive the test SMS
- [ ] Run the data validation
- [ ] Generate a validation report
- [ ] Set up the scheduled task

---

## 📅 Version History

### v1.0.0 (2025-01-04)
- ✨ Initial release
- ✅ Data validation
- ✅ Alibaba Cloud SMS alerts
- ✅ Scheduled task support
- ✅ Multiple data sources (JSON/CSV/database)
- ✅ Detailed validation reports
- ✅ Configuration file support

---

**Team:** 北京乐航时代科技
**Last updated:** 2025-01-04
数据验证短信告警使用说明.md (Normal file, 294 lines)
@@ -0,0 +1,294 @@
# Data Validation and SMS Alert System - Usage Guide

## 📋 Overview

Runs data validation (JSON/CSV/database) automatically and sends an alert through the Alibaba Cloud SMS service when validation fails.

**Core features:**
- ✅ Scheduled daily validation (9:00 AM by default)
- ✅ Automatic SMS alert on validation failure (error code: 2222)
- ✅ Multiple alert phone numbers
- ✅ Detailed validation reports

---

## 🚀 Quick Start

### 1. Install dependencies

```bash
# Install the Alibaba Cloud SMS SDK
pip install alibabacloud_dysmsapi20170525
pip install alibabacloud_credentials
pip install alibabacloud_tea_openapi
pip install alibabacloud_tea_util
```

### 2. Configure the SMS service

Edit `sms_config.json` and fill in your Alibaba Cloud settings:

```json
{
  "access_key_id": "your AccessKey ID",
  "access_key_secret": "your AccessKey Secret",
  "sign_name": "your SMS signature",
  "template_code": "SMS_486210104",
  "phone_numbers": "13621242430,13800138000"
}
```

**Where to get them:**
- AccessKey: https://ram.console.aliyun.com/manage/ak
- SMS signature and template: https://dysms.console.aliyun.com/

### 3. Test the SMS function

```bash
# Test SMS sending
python data_validation_with_sms.py --test-sms
```

### 4. Run validation manually

```bash
# Validate yesterday's data
python data_validation_with_sms.py

# Validate a specific date
python data_validation_with_sms.py --date 2025-12-29

# Validate a specific table
python data_validation_with_sms.py --table ai_statistics_day
```

### 5. Set up the scheduled task

**Windows:**

```bash
# Show the Task Scheduler setup command
python data_validation_with_sms.py --setup-schedule
```

Then run the displayed PowerShell command with administrator privileges, or configure the task manually:

1. Open Task Scheduler (press Win+R and enter `taskschd.msc`)
2. Click "Create Basic Task"
3. Fill in the task details:
   - **Name**: DataValidationWithSMS
   - **Description**: Run data validation daily at 9:00 AM and send SMS alerts
4. Trigger: **Daily**, time: **9:00 AM**
5. Action: **Start a program**
   - Program: `C:\Python\python.exe` (your Python path)
   - Arguments: `D:\workspace\xhh_baijiahao\data_validation_with_sms.py`
6. Finish

**Linux/macOS:**

Edit the crontab:
```bash
crontab -e
```

Add a scheduled job (runs daily at 9:00 AM):
```
0 9 * * * /usr/bin/python3 /path/to/data_validation_with_sms.py
```

---

## 📖 Examples

### Example 1: Validate JSON against CSV

```bash
python data_validation_with_sms.py --source json csv
```

### Example 2: Validate CSV against the database

```bash
python data_validation_with_sms.py --source csv database
```

### Example 3: Full validation (all three sources)

```bash
python data_validation_with_sms.py --source json csv database
```

### Example 4: Validate the ai_statistics_day table

```bash
python data_validation_with_sms.py --table ai_statistics_day --source csv database
```

---

## 📧 SMS Alert Details

### Trigger conditions

An SMS alert (error code: 2222) is sent when:

1. **Order mismatch**: records in JSON and CSV are in different orders
2. **Missing records**: a data source is missing records
3. **Extra records**: a data source has extra records
4. **Field differences**: the same record has inconsistent field values

### SMS content

SMS format (using a verification-code template):
```
【北京乐航时代科技】Your verification code is 2222
```

**Note**: because a verification-code template is used, the error code is fixed at `2222`; see the generated validation report file for the actual error details.

### Multiple recipients

Configure several phone numbers in `sms_config.json`:

```json
{
  "phone_numbers": "13621242430,13800138000,13900139000"
}
```

---

## 📊 Validation Reports

After each run a detailed report is saved in the project root:

```
validation_report_20250104_090000.txt
```

The report includes:
- Order validation results
- Cross-validation results
- Data difference statistics
- A detailed error list

---

## ⚙️ Configuration

### The SMSAlertConfig class

These settings can be changed in `data_validation_with_sms.py`:

```python
class SMSAlertConfig:
    # Alibaba Cloud credentials
    ACCESS_KEY_ID = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID', 'default value')
    ACCESS_KEY_SECRET = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET', 'default value')

    # SMS signature and template
    SIGN_NAME = '北京乐航时代科技'
    TEMPLATE_CODE = 'SMS_486210104'

    # Recipient numbers
    PHONE_NUMBERS = '13621242430'

    # Endpoint
    ENDPOINT = 'dysmsapi.aliyuncs.com'
```

### Environment variables (recommended)

For security, store sensitive values in environment variables:

**Windows PowerShell:**
```powershell
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your AccessKey ID"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your AccessKey Secret"
```

**Linux/macOS:**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID="your AccessKey ID"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="your AccessKey Secret"
```

---

## 🔍 Troubleshooting

### 1. SMS sending fails

**Check:**
- The AccessKey ID and Secret are correct
- The SMS signature has been approved
- The SMS template has been approved
- The account balance is sufficient
- The phone number format is correct (11 digits for mainland China)

**View the error log:**
```bash
python data_validation_with_sms.py --test-sms
```

### 2. Database connection fails

Check the settings in `database_config.py`.

### 3. Validation fails

Read the generated validation report file for the detailed errors.

### 4. Scheduled task does not run

**Windows:**
- Check the task status in Task Scheduler
- Review the task history
- Confirm the Python path and script path are correct

**Linux/macOS:**
- Check the crontab: `crontab -l`
- Check the system log: `grep CRON /var/log/syslog`

---

## 📝 Log Locations

- Validation reports: `validation_report_*.txt`
- SMS sending log: console output
- System log: per the logging configuration (if any)

---

## 🔐 Security Recommendations

1. **Never commit secrets to Git**
   - Add `sms_config.json` to `.gitignore`
   - Store the AccessKey in environment variables

2. **Rotate the AccessKey regularly**
   - Every 3-6 months is recommended

3. **Use a RAM sub-account**
   - Create a dedicated RAM sub-account for the SMS service
   - Grant only the SMS-sending permissions it needs

4. **Set an IP allowlist**
   - Restrict access IPs in the Alibaba Cloud RAM console

---

## 📞 Support

Contact technical support with any questions.

---

## 📅 Changelog

### v1.0.0 (2025-01-04)
- ✨ Initial release
- ✅ Integrated data validation
- ✅ Alibaba Cloud SMS alerts
- ✅ Scheduled task support
- ✅ Multiple data sources