feat: improve the proxy retry mechanism, add data-validation alerts, and add a README

This commit is contained in:
shengyudong@yunqueai.net
2026-01-16 18:36:52 +08:00
parent 322ac74336
commit b518e6aacf
55 changed files with 13202 additions and 34781 deletions

README.md Normal file

@@ -0,0 +1,396 @@
# Baijiahao Data Collection & Analysis System
## Overview
This project is an automated data collection, analysis, and monitoring system for the Baijiahao platform. It supports multi-account management, scheduled data sync, data validation, SMS alerts, and more.
## Core Features
### 1. Data Collection
- **Cookie management**: captures account cookies automatically via mitmproxy, with batch sync to the database
- **Article crawling**: crawls Baijiahao article data, including title, content, and publish time
- **Statistics retrieval**: fetches publishing statistics (impressions, reads, click-through rate) and revenue data
### 2. Data Analysis
- **Multi-dimensional statistics**: generates reports on daily/weekly/monthly dimensions
- **Period-over-period calculation**: automatically computes week-over-week and month-over-month growth rates
- **Data export**: exports to CSV for further analysis
### 3. Data Sync
- **Daemon**: a systemd service syncs data automatically on a schedule
- **Batch import**: supports batch import of historical data
- **Incremental updates**: supports incremental updates for a specified date
### 4. Data Validation & Monitoring
- **Consistency checks**: verifies consistency across the three data sources (JSON/CSV/database)
- **SMS alerts**: integrates Alibaba Cloud SMS and sends an alert automatically on data anomalies (error code 2222)
- **Validation reports**: generates detailed reports, saved to a dedicated directory
### 5. Proxy Management
- **Tianqi proxy integration**: supports HTTP proxies to avoid IP rate limits
- **Smart retry mechanism**:
  - at most 3 attempts on the same proxy
  - an immediate proxy switch on timeout/connection errors
  - at most 3 proxy switches (4 distinct proxies in total)
- **Error handling**: automatically detects `errno=10000015` (abnormal request) and switches proxies immediately
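The retry policy above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: `do_request` and `fetch_proxy` are hypothetical callables standing in for the real request and proxy-API logic.

```python
MAX_ATTEMPTS_PER_PROXY = 3  # at most 3 attempts on the same proxy
MAX_PROXY_SWITCHES = 3      # at most 3 switches, i.e. 4 distinct proxies overall

def request_with_retry(do_request, fetch_proxy):
    """Try up to MAX_PROXY_SWITCHES + 1 proxies; retry on the same proxy for
    ordinary errors, but switch immediately on timeouts, connection errors,
    and errno=10000015 (abnormal request)."""
    for _ in range(MAX_PROXY_SWITCHES + 1):
        proxy = fetch_proxy()  # e.g. "1.2.3.4:8080" from the proxy API
        for _ in range(MAX_ATTEMPTS_PER_PROXY):
            try:
                data = do_request(proxy)
            except (TimeoutError, ConnectionError):
                break  # network error: switch to a fresh proxy immediately
            if data.get("errno") == 10000015:
                break  # abnormal request: switch to a fresh proxy immediately
            if data.get("errno"):
                continue  # other non-zero errno: retry on the same proxy
            return data
    raise RuntimeError("all proxies exhausted (4 distinct proxies tried)")
```

The two nested loops encode the double limit directly: the inner loop caps attempts per proxy, the outer loop caps proxy switches.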
## Tech Stack
- **Python 3.8+**
- **Database**: MySQL 8.0+ (pymysql)
- **HTTP**: requests, urllib3
- **Packet capture**: mitmproxy 10.0+
- **Scheduling**: schedule
- **SMS**: Alibaba Cloud SMS SDK (alibabacloud_dysmsapi20170525)
## Project Structure
```
xhh_baijiahao/
├── db/                           # database SQL scripts
│   ├── ai_articles.sql           # articles table
│   ├── ai_authors.sql            # authors table
│   ├── ai_statistics_days.sql    # daily statistics table
│   ├── ai_statistics_weekly.sql  # weekly statistics table
│   └── ai_statistics_monthly.sql # monthly statistics table
├── ai_sms/                       # Alibaba Cloud SMS service
│   └── ai_sms/                   # SMS SDK sample code
├── Core modules
├── bjh_analytics.py              # Baijiahao analytics API (primary)
├── bjh_analytics_date.py         # data fetch for a specified date
├── bjh_articles_crawler.py       # article crawler
├── export_to_csv.py              # CSV export
├── import_csv_to_database.py     # CSV import into the database
├── Cookie management
├── mitmproxy_capture.py          # mitmproxy cookie capture
├── 一键捕获Cookie.py              # quick cookie-capture tool
├── sync_cookies_to_db.py         # batch cookie sync
├── add_single_cookie_to_db.py    # single-account cookie import
├── add_account_from_cookie.py    # add an account from a cookie
├── Daemon & scheduled tasks
├── data_sync_daemon.py           # data sync daemon (primary)
├── bjh_data_daemon.py            # backup daemon
├── bjh_daemon.service            # systemd service unit
├── deploy_daemon.sh              # daemon deployment script
├── install_service.sh            # service install script
├── diagnose_service.sh           # service diagnostics script
├── Validation & alerting
├── data_validation.py            # validation core
├── data_validation_with_sms.py   # validation + SMS alerts
├── test_validation_sms.sh        # Linux test script
├── test_validation_sms.bat       # Windows test script
├── Batch tasks
├── batch_import_history.py       # batch import of historical data
├── fetch_date_statistics.py      # statistics for a specified date
├── update_day_revenue.py         # daily revenue update
├── Configuration
├── database_config.py            # database configuration
├── log_config.py                 # logging configuration
├── sms_config.json               # SMS service configuration
├── requirements.txt              # Python dependencies
└── Shortcut scripts
    ├── 一键捕获Cookie.bat         # Windows one-click cookie capture
    ├── 启动数据同步守护进程.bat    # Windows: start the sync daemon
    └── 抓取百家号文章.bat          # Windows: crawl Baijiahao articles
```
## Quick Start
### 1. Install dependencies
```bash
pip install -r requirements.txt
```
Core dependencies:
- `requests>=2.31.0`
- `pymysql>=1.1.0`
- `mitmproxy>=10.0.0`
- `schedule>=1.2.0`
- `python-dateutil>=2.8.0`
### 2. Configure the database
Edit `database_config.py` and fill in the MySQL connection info:
```python
DB_CONFIG = {
    'host': 'your_host',
    'port': 3306,
    'user': 'your_user',
    'password': 'your_password',
    'database': 'ai_article',
    'charset': 'utf8mb4'
}
```
### 3. Initialize the database
Run the SQL scripts in `db/` to create the tables:
```bash
mysql -u root -p ai_article < db/ai_authors.sql
mysql -u root -p ai_article < db/ai_articles.sql
mysql -u root -p ai_article < db/ai_statistics_days.sql
mysql -u root -p ai_article < db/ai_statistics_weekly.sql
mysql -u root -p ai_article < db/ai_statistics_monthly.sql
```
### 4. Capture cookies
#### Windows:
```bash
一键捕获Cookie.bat
```
#### Linux:
```bash
python3 mitmproxy_capture.py
```
### 5. Sync cookies to the database
```bash
python3 sync_cookies_to_db.py
```
### 6. Start the data sync daemon
#### Linux (systemd recommended):
```bash
# deploy the service
sudo bash deploy_daemon.sh
# start the service
sudo systemctl start bjh_daemon
# check status
sudo systemctl status bjh_daemon
# follow logs
journalctl -u bjh_daemon -f
```
#### Windows:
```bash
启动数据同步守护进程.bat
```
#### Run manually:
```bash
python3 data_sync_daemon.py
```
## Common Tasks
### Batch-import historical data
```bash
python3 batch_import_history.py
```
Interactive options:
- account selection (single/multiple/all)
- date range
- whether to use a proxy
- database or file as the data source
### Fetch statistics for a specified date
```bash
python3 fetch_date_statistics.py 2025-12-26
```
### Export data to CSV
```bash
python3 export_to_csv.py
```
### Data validation with SMS alerts
```bash
# run the validation
python3 data_validation_with_sms.py
# test the SMS feature
python3 data_validation_with_sms.py --test-sms
```
Validation reports are saved in the `validation_reports/` directory.
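A consistency check of this kind can be sketched as comparing the same metric across the three sources. This is a simplified illustration, not the project's `data_validation.py`; the row shape and `day_revenue` key are assumptions.

```python
def validate_consistency(json_rows, csv_rows, db_rows, key="day_revenue"):
    """Compare one metric across JSON/CSV/DB rows, keyed by (author_id, stat_date).
    Returns a list of (key, per-source values) mismatches; empty means consistent."""
    def index(rows):
        return {(r["author_id"], r["stat_date"]): r.get(key) for r in rows}
    j, c, d = index(json_rows), index(csv_rows), index(db_rows)
    mismatches = []
    for k in sorted(set(j) | set(c) | set(d)):
        values = {"json": j.get(k), "csv": c.get(k), "db": d.get(k)}
        if len(set(values.values())) > 1:  # any disagreement, including a missing row
            mismatches.append((k, values))
    return mismatches
```

A non-empty return value is what would trigger the SMS alert with error code 2222.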
### Add a single account's cookie
```bash
python3 add_single_cookie_to_db.py
```
Interactive input:
- username / nickname
- App ID / content domain
- cookie (multiple formats accepted)
## Database Schema
### ai_authors - authors table
- `id`: primary key
- `author_name`: author name (username or nick)
- `app_id`: Baijiahao app_id
- `toutiao_cookie`: cookie string
- `channel`: channel (1 = Baijiahao)
- `status`: status (active/inactive)
### ai_statistics_days - daily statistics table
- `author_id`: author ID
- `stat_date`: statistics date
- `day_revenue`: revenue for the day
- `daily_published_count`: articles published that day
- `cumulative_published_count`: cumulative published count
- unique key: `uk_author_stat_date(author_id, channel, stat_date)`
### ai_statistics_weekly - weekly statistics table
- `author_id`: author ID
- `stat_weekly`: date of the Monday (calendar week)
- `weekly_revenue`: revenue for the week (summed from daily data)
- `revenue_wow_growth_rate`: week-over-week growth rate
### ai_statistics_monthly - monthly statistics table
- `author_id`: author ID
- `stat_monthly`: date of the 1st of the month
- `monthly_revenue`: revenue for the month (summed from daily data)
- `revenue_mom_growth_rate`: month-over-month growth rate
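The week-over-week and month-over-month rates above can be computed from the summed period revenue as follows. This is a minimal sketch; the project's actual rounding and null handling may differ.

```python
def growth_rate(current, previous):
    """Period-over-period growth rate in percent.
    Returns None when the base period is zero or missing (rate is undefined)."""
    if not previous:
        return None
    return round((current - previous) / previous * 100, 2)
```

For example, weekly revenue of 120.0 following a week of 100.0 gives a `revenue_wow_growth_rate` of 20.0.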
## Proxy Configuration
The project supports the Tianqi proxy API, configured in code:
```python
PROXY_API = "http://api.tianqiip.com/getip?secret=xxx&num=1&type=txt&port=1&mr=1&sign=xxx"
```
Proxy characteristics:
- IP allowlist authentication; no username/password required
- response format: plain text `IP:port`
- smart retry: an immediate proxy switch on timeout/connection errors
- double limit: at most 3 attempts per proxy, and at most 3 proxy switches
## SMS Alert Configuration
Edit `sms_config.json`:
```json
{
    "access_key_id": "your_access_key_id",
    "access_key_secret": "your_access_key_secret",
    "sign_name": "your_sign_name",
    "template_code": "SMS_486210104",
    "phone_numbers": "13621242430",
    "endpoint": "dysmsapi.aliyuncs.com"
}
```
## Daemon Configuration
### systemd unit (bjh_daemon.service)
```ini
[Unit]
Description=Baijiahao data sync daemon (with validation & SMS alerts)
After=network.target mysql.service

[Service]
Type=simple
User=root
WorkingDirectory=/root/xhh_baijiahao
ExecStart=/usr/bin/python3 data_sync_daemon.py
Restart=always
Environment="LOAD_FROM_DB=true"
Environment="USE_PROXY=true"
Environment="ENABLE_VALIDATION=true"
Environment="NON_INTERACTIVE=true"

[Install]
WantedBy=multi-user.target
```
### Environment variables
- `LOAD_FROM_DB`: load cookies from the database (true/false)
- `USE_PROXY`: use a proxy (true/false)
- `DAYS`: number of days to fetch (default 7)
- `MAX_RETRIES`: maximum retries (default 3)
- `RUN_NOW`: run immediately on start (true/false)
- `ENABLE_VALIDATION`: enable validation (true/false)
- `NON_INTERACTIVE`: non-interactive mode (true/false)
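Reading these variables with sensible defaults can be sketched with small helpers like the following. These are hypothetical helpers; the daemon's actual parsing may differ.

```python
import os

def env_bool(name, default=False):
    """Interpret an environment variable as a boolean ("true"/"false", case-insensitive)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() == "true"

def env_int(name, default):
    """Interpret an environment variable as an integer, falling back to the default."""
    try:
        return int(os.environ.get(name, ""))
    except ValueError:
        return default

# defaults match the table above
USE_PROXY = env_bool("USE_PROXY", default=False)
DAYS = env_int("DAYS", default=7)
MAX_RETRIES = env_int("MAX_RETRIES", default=3)
```

Centralizing the parsing keeps the systemd `Environment=` lines and the daemon's defaults from drifting apart.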
## Log Management
Log file locations:
- daemon: `logs/data_sync_daemon.log`
- database operations: `logs/database.log`
- cookie sync: `logs/cookie_sync.log`
- validation reports: `validation_reports/validation_report_YYYYMMDD_HHMMSS.txt`
Follow logs in real time:
```bash
tail -f logs/data_sync_daemon.log
```
## FAQ
### 1. Cookie expired
- Symptom: the API returns `errno=10000015` (abnormal request)
- Fix: recapture the cookie and sync it to the database
### 2. Proxy timeout
- Symptom: requests time out (15 seconds)
- Fix: the system switches to a new proxy automatically (up to 4 distinct proxies)
### 3. Validation failure
- Symptom: an SMS with error code 2222
- Fix: check the detailed report in `validation_reports/`
### 4. Daemon stopped
- Diagnose: `sudo bash diagnose_service.sh`
- Restart: `sudo systemctl restart bjh_daemon`
## Development Notes
### Adding a new account
1. Capture the cookie with `一键捕获Cookie.py`
2. Run `sync_cookies_to_db.py` to sync it to the database
3. Or add it manually with `add_single_cookie_to_db.py`
### Changing statistics dimensions
- daily: modify the `ai_statistics_days` table
- weekly: modify the `ai_statistics_weekly` table
- monthly: modify the `ai_statistics_monthly` table
### Custom proxy
Modify the proxy-fetching logic in `bjh_analytics.py` or `bjh_analytics_date.py`:
```python
def fetch_proxy(self, force_new: bool = False):
    # custom proxy-fetching logic
    pass
```
## Contributing
Issues and pull requests are welcome.
## License
This project is for learning and research purposes only.
## Contact
Please report problems via Issues.

add_single_cookie_to_db.py Normal file

@@ -0,0 +1,440 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Import a single account's cookie into the MySQL database.
Supports entering the cookie manually or pasting it from the clipboard.
"""
import json
import sys
import os
from datetime import datetime
from typing import Dict, Optional

# shared database manager and logging configuration
from database_config import DatabaseManager, DB_CONFIG
from log_config import setup_cookie_sync_logger

# initialize the logger
logger = setup_cookie_sync_logger()

# force UTF-8 output on Windows
if sys.platform == 'win32':
    import io
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
    sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')
class SingleCookieToDB:
    """Sync a single cookie to the database."""

    def __init__(self, db_config: Optional[Dict] = None):
        """
        Initialize the database connection.
        Args:
            db_config: database config dict; defaults to database_config.DB_CONFIG
        """
        self.script_dir = os.path.dirname(os.path.abspath(__file__))
        # use the shared database manager
        self.db_manager = DatabaseManager(db_config)
        self.db_config = self.db_manager.config

    def connect_db(self) -> bool:
        """Connect to the database."""
        return self.db_manager.test_connection()

    def close_db(self):
        """Close the database connection."""
        print("[OK] Database operations finished")

    def cookie_dict_to_string(self, cookies: Dict) -> str:
        """
        Convert a cookie dict to string form.
        Args:
            cookies: cookie dict
        Returns:
            cookie string: "key1=value1; key2=value2"
        """
        return '; '.join([f"{k}={v}" for k, v in cookies.items()])

    def cookie_string_to_dict(self, cookie_string: str) -> Dict:
        """
        Convert a cookie string to dict form.
        Args:
            cookie_string: cookie string: "key1=value1; key2=value2"
        Returns:
            cookie dict
        """
        cookies = {}
        for item in cookie_string.split(';'):
            item = item.strip()
            if '=' in item:
                key, value = item.split('=', 1)
                cookies[key.strip()] = value.strip()
        return cookies

    def find_author_by_name(self, author_name: str, channel: int = 1) -> Optional[Dict]:
        """
        Look up an author record by name and channel.
        Args:
            author_name: author name
            channel: channel (1 = Baijiahao), default 1
        Returns:
            the author record dict, or None if not found
        """
        try:
            sql = "SELECT * FROM ai_authors WHERE author_name = %s AND channel = %s LIMIT 1"
            result = self.db_manager.execute_query(sql, (author_name, channel), fetch_one=True)
            return result
        except Exception as e:
            print(f"[X] Author lookup failed: {e}")
            return None
    def update_author_cookie(self, author_id: int, cookie_string: str,
                             app_id: Optional[str] = None) -> bool:
        """
        Update an author's cookie.
        Args:
            author_id: author ID
            cookie_string: cookie string
            app_id: Baijiahao app_id (optional)
        Returns:
            True on success
        """
        try:
            # build the UPDATE statement
            update_fields = ["toutiao_cookie = %s", "updated_at = NOW()"]
            params = [cookie_string]
            # also update app_id when provided
            if app_id:
                update_fields.append("app_id = %s")
                params.append(app_id)
            params.append(author_id)
            sql = f"UPDATE ai_authors SET {', '.join(update_fields)} WHERE id = %s"
            self.db_manager.execute_update(sql, tuple(params))
            logger.info(f"Updated cookie for author ID={author_id}")
            return True
        except Exception as e:
            logger.error(f"Cookie update failed: {e}", exc_info=True)
            print(f"[X] Cookie update failed: {e}")
            return False

    def insert_new_author(self, author_name: str, cookie_string: str,
                          app_id: Optional[str] = None, nick: Optional[str] = None,
                          domain: Optional[str] = None) -> bool:
        """
        Insert a new author record.
        Args:
            author_name: author name (stored in the author_name column)
            cookie_string: cookie string
            app_id: Baijiahao app_id
            nick: nickname
            domain: content domain
        Returns:
            True on success
        """
        try:
            # build the INSERT statement
            sql = """
                INSERT INTO ai_authors
                (author_name, app_id, app_token, department_id, department_name,
                 department, toutiao_cookie, channel, status, created_at, updated_at)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, NOW(), NOW())
            """
            params = (
                author_name,
                app_id or '',
                '',             # app_token: empty for now
                0,              # department_id: default 0
                domain or '其它',  # department_name: the content domain ('其它' = "other")
                '',             # department: empty for now
                cookie_string,
                1,              # channel: 1 = Baijiahao
                'active'        # status
            )
            self.db_manager.execute_update(sql, params)
            logger.info(f"Created new author: {author_name}")
            return True
        except Exception as e:
            logger.error(f"Author insert failed: {e}", exc_info=True)
            print(f"[X] Author insert failed: {e}")
            return False
    def add_cookie(self, account_info: Dict, auto_create: bool = True) -> bool:
        """
        Add a single account's cookie to the database.
        Args:
            account_info: account info dict with cookies, username, nick, etc.
            auto_create: create the author automatically when missing (default True)
        Returns:
            True on success
        """
        # extract the cookie
        cookies = account_info.get('cookies', {})
        if not cookies:
            print("[X] Cookie is empty")
            return False
        # convert a dict cookie to string form
        if isinstance(cookies, dict):
            cookie_string = self.cookie_dict_to_string(cookies)
        else:
            cookie_string = str(cookies)
        # extract other fields; username and nick are matched against author_name
        username = account_info.get('username', '').strip()
        nick = account_info.get('nick', '').strip()
        app_id = account_info.get('app_id', '').strip()
        domain = account_info.get('domain', '').strip()
        # at least one of username/nick is required
        if not username and not nick:
            print("[X] At least one of username and nick is required")
            return False
        print(f"\nAccount info:")
        print(f"  Username: {username}")
        print(f"  Nickname: {nick}")
        print(f"  App ID: {app_id}")
        print(f"  Domain: {domain}")
        # look up the author (two-step match: username first, then nick)
        channel = 1  # Baijiahao is always channel=1
        author = None
        matched_field = None
        # 1. try matching by username
        if username:
            author = self.find_author_by_name(username, channel)
            if author:
                matched_field = 'username'
                print(f"\n[√] Matched author by username: {author['author_name']} (ID: {author['id']}, Channel: {author['channel']})")
        # 2. fall back to matching by nick
        if not author and nick:
            author = self.find_author_by_name(nick, channel)
            if author:
                matched_field = 'nick'
                print(f"\n[√] Matched author by nick: {author['author_name']} (ID: {author['id']}, Channel: {author['channel']})")
        # 3. no match at all
        if not author:
            print(f"\n[!] No matching author found (tried both username and nick)")
        # update or create
        if author:
            # update the existing record
            print(f"\nUpdating the author's cookie...")
            success = self.update_author_cookie(
                author['id'],
                cookie_string,
                app_id if app_id else None
            )
            if success:
                print(f"[OK] Cookie updated (matched on: {matched_field})")
                return True
            else:
                print(f"[X] Cookie update failed")
                return False
        else:
            # author not found; optionally create one
            if auto_create:
                # prefer username; fall back to nick
                author_name_to_create = username if username else nick
                print(f"\nCreating new author (author_name: {author_name_to_create})...")
                success = self.insert_new_author(
                    author_name_to_create,
                    cookie_string,
                    app_id,
                    nick,
                    domain
                )
                if success:
                    print(f"[OK] New author created (author_name: {author_name_to_create})")
                    return True
                else:
                    print(f"[X] Author creation failed")
                    return False
            else:
                print(f"[X] Author not found and auto-create is disabled")
                return False
    def run_interactive(self):
        """Interactive mode."""
        print("\n" + "="*70)
        print("Add a single account's cookie to the database")
        print("="*70)
        # connect to the database
        if not self.connect_db():
            logger.error("Database connection failed; exiting")
            return
        try:
            # ask whether to auto-create missing authors
            print("\nAutomatically create the author if it does not exist?")
            auto_create_input = input("(y/n, default y): ").strip().lower()
            auto_create = auto_create_input != 'n'
            # account info
            print("\n" + "="*70)
            print("Enter the account info:")
            print("="*70)
            username = input("\n1. Username (matched against author_name in the database): ").strip()
            nick = input("2. Nickname (fallback match field): ").strip()
            app_id = input("3. App ID (optional): ").strip()
            domain = input("4. Domain (optional): ").strip()
            # cookie input
            print("\n" + "="*70)
            print("Enter the cookie:")
            print("Hint: any of the following formats is accepted:")
            print("  1. cookie string: key1=value1; key2=value2")
            print("  2. JSON: {\"key1\": \"value1\", \"key2\": \"value2\"}")
            print("  3. multiple lines; type END to finish")
            print("="*70)
            cookie_lines = []
            while True:
                line = input().strip()
                if line.upper() == 'END':
                    break
                if line:
                    cookie_lines.append(line)
            cookie_input = ' '.join(cookie_lines)
            # parse the cookie
            cookies = {}
            if cookie_input.startswith('{'):
                # JSON format
                try:
                    cookies = json.loads(cookie_input)
                except json.JSONDecodeError:
                    print("[X] Failed to parse the cookie as JSON")
                    return
            else:
                # string format
                cookies = self.cookie_string_to_dict(cookie_input)
            if not cookies:
                print("[X] Cookie is empty; aborting")
                return
            # build the account info
            account_info = {
                'username': username,
                'nick': nick,
                'app_id': app_id,
                'domain': domain,
                'cookies': cookies
            }
            # confirmation
            print("\n" + "="*70)
            print("Confirm the account info:")
            print("="*70)
            print(f"  Username: {username}")
            print(f"  Nickname: {nick}")
            print(f"  App ID: {app_id}")
            print(f"  Domain: {domain}")
            print(f"  Cookie entries: {len(cookies)}")
            print(f"  Auto-create: {'yes' if auto_create else 'no'}")
            print("="*70)
            confirm = input("\nAdd to the database? (y/n): ").strip().lower()
            if confirm != 'y':
                print("\nCancelled")
                return
            # add the cookie
            success = self.add_cookie(account_info, auto_create)
            if success:
                print("\n" + "="*70)
                print("Added successfully!")
                print("="*70)
            else:
                print("\n" + "="*70)
                print("Add failed; see the error messages above")
                print("="*70)
        finally:
            # close the database connection
            self.close_db()
def main():
    """Entry point."""
    print("\n" + "="*70)
    print("Single-account cookie sync tool")
    print("="*70)
    # default or custom database config
    print("\nChoose the database configuration:")
    print("  1. default config (8.149.233.36/ai_statistics_read)")
    print("  2. custom config")
    choice = input("\nChoice (1/2, default 1): ").strip() or '1'
    if choice == '2':
        # custom database config
        print("\nEnter the database connection info:\n")
        host = input("Host: ").strip()
        port = input("Port (default: 3306): ").strip() or '3306'
        user = input("User: ").strip()
        password = input("Password: ").strip()
        database = input("Database: ").strip()
        db_config = {
            'host': host,
            'port': int(port),
            'user': user,
            'password': password,
            'database': database,
            'charset': 'utf8mb4'
        }
    else:
        # use the default config
        db_config = None
        print("\nUsing the default database configuration...")
    # create the syncer and run
    syncer = SingleCookieToDB(db_config)
    print("\nConfiguration:")
    print(f"  Database: {syncer.db_config['host']}:{syncer.db_config.get('port', 3306)}/{syncer.db_config['database']}")
    print(f"  User: {syncer.db_config['user']}")
    print("="*70)
    # run the interactive mode
    syncer.run_interactive()


if __name__ == '__main__':
    main()


@@ -0,0 +1,6 @@
# Alibaba Cloud Access Credentials
# Obtain your AccessKey ID and AccessKey Secret from the Alibaba Cloud console:
# https://ram.console.aliyun.com/manage/ak
ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id_here
ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret_here

ai_sms/ai_sms/.gitignore vendored Normal file

@@ -0,0 +1,5 @@
runtime/
.idea/
.vscode/
__pycache__/
.pytest_cache/

ai_sms/ai_sms/README.md Normal file

@@ -0,0 +1,57 @@
# Complete Sample Project: Sending an SMS Verification Code
This project is a complete working sample for SendSmsVerifyCode.
**For production code, prefer the more secure AK-free credential configuration; see [Manage access credentials](https://help.aliyun.com/zh/sdk/developer-reference/v2-manage-python-access-credentials).**
## Requirements
- Download and unpack the sample code for your language
- *Python >= 3.7 required*
## Steps
After configuring credentials, run the following **from the directory where the code was unpacked**:
- **Create and activate a virtual environment:**
```sh
python -m venv venv && source venv/bin/activate
```
- **Install dependencies:**
```sh
pip install -r requirements.txt
```
- **Run the sample:**
```sh
python ./alibabacloud_sample/sample.py
```
## API Used
- SendSmsVerifyCode: sends an SMS verification code. See the [documentation](https://next.api.aliyun.com/document/Dypnsapi/2017-05-25/SendSmsVerifyCode) for details.
## Sample API Response
*The values below are for reference only; the actual response structure may differ slightly.*
- JSON format
```json
{
    "AccessDeniedDetail": "无",
    "Message": "成功 ",
    "RequestId": "CC3BB6D2-2FDF-4321-9DCE-B38165CE4C47",
    "Model": {
        "VerifyCode": "4232",
        "RequestId": "a3671ccf-0102-4c8e-8797-a3678e091d09",
        "OutId": "1231231313",
        "BizId": "112231421412414124123^4"
    },
    "Code": "OK",
    "Success": true
}
```


@@ -0,0 +1,94 @@
# Alibaba Cloud SMS Verification Code API: Run Notes
## Setup Status
✅ Dependency installed: `alibabacloud_dypnsapi20170525==2.0.0`
✅ Code configured: phone number `13621242430`, 4-digit verification code
✅ Conda environment: `douyin`
## Steps
### 1. Configure Alibaba Cloud credentials
First obtain your AccessKey ID and AccessKey Secret:
- Visit: https://ram.console.aliyun.com/manage/ak
- Create or look up your AccessKey
### 2. Set environment variables (PowerShell)
Set the environment variables before running the program:
```powershell
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your_access_key_id"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_access_key_secret"
```
### 3. Run the program
```powershell
python ./alibabacloud_sample/sample.py
```
## Full Example (PowerShell)
```powershell
# activate the conda environment
conda activate douyin
# set credentials
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your_access_key_id"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_access_key_secret"
# run the program
python ./alibabacloud_sample/sample.py
```
## Code Notes
### Current parameters
`sample.py` is configured with:
```python
send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest(
    phone_number='13621242430',  # recipient phone number
    code_length=4,               # code length (e.g. "1314" is 4 digits)
    code_type=1                  # code type: 1 = numeric
)
```
### Sample API response
On success:
```json
{
    "Code": "OK",
    "Success": true,
    "Message": "成功",
    "Model": {
        "VerifyCode": "1234",
        "BizId": "...",
        "RequestId": "..."
    }
}
```
## Notes
1. **Security**: never hardcode the AccessKey in code or commit it to version control
2. **Permissions**: make sure your AccessKey is allowed to send SMS
3. **Cost**: sending SMS incurs charges; watch your Alibaba Cloud account balance
4. **Signature and template**: some regions require configuring an SMS signature and template
## Troubleshooting
If you hit an error:
- check that the AccessKey is correct
- check that the account balance is sufficient
- check the phone number format
- open the diagnostics URL in the error message (Recommend)
## API Docs
- SendSmsVerifyCode API: https://next.api.aliyun.com/document/Dypnsapi/2017-05-25/SendSmsVerifyCode
- Credential management: https://help.aliyun.com/zh/sdk/developer-reference/v2-manage-python-access-credentials


@@ -0,0 +1,79 @@
# Getting Started in 3 Steps
## ⚠️ Why You Got That Error
**The environment variables are not set!** The error message says:
```
Environment variable accessKeyId cannot be empty
```
## ✅ Fixes (pick one)
### Option 1: simplest, use the test script
```powershell
# step 1: test the credential setup
python test_credentials.py
# step 2: if it reports missing credentials, run in PowerShell:
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your AccessKey ID"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your AccessKey Secret"
# step 3: test again
python test_credentials.py
# step 4: once the test passes, run
python ./alibabacloud_sample/sample.py
```
### Option 2: one-click run (interactive)
```powershell
python run_with_credentials.py
# the script prompts for your AccessKey and then runs automatically
```
### Option 3: set manually, then run
```powershell
# set environment variables
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="LTAI5t..."
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_secret_here"
# run directly
python ./alibabacloud_sample/sample.py
```
## 🔑 Getting an AccessKey
https://ram.console.aliyun.com/manage/ak
## 🔧 Issues Already Fixed
1. ✅ proxy timeout (ECS metadata proxy disabled)
2. ✅ error handling (AttributeError fixed)
3. ✅ phone number configured: 13621242430
4. ✅ verification code length: 4 digits
## 📝 Current Configuration
- phone number: `13621242430`
- verification code: 4 digits (auto-generated by the API; cannot be fixed to 1314)
- environment: conda douyin
- dependencies: installed
## ⚡ Quick Test Flow
```powershell
# 1. test credentials
python test_credentials.py
# 2. if that fails, set the environment variables
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your_key"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_secret"
# 3. run the program
python ./alibabacloud_sample/sample.py
```
**Try `python test_credentials.py` now.**


@@ -0,0 +1 @@
__version__ = "1.0.0"


@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys
import json
from typing import List

from alibabacloud_dypnsapi20170525.client import Client as Dypnsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dypnsapi20170525 import models as dypnsapi_20170525_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client() -> Dypnsapi20170525Client:
        """
        Initialize the account Client with credentials.
        @return: Client
        @throws Exception
        """
        # For production code, prefer the more secure AK-free credential configuration; see https://help.aliyun.com/document_detail/378659.html
        credential = CredentialClient()
        config = open_api_models.Config(
            credential=credential
        )
        # For endpoints, see https://api.aliyun.com/product/Dypnsapi
        config.endpoint = f'dypnsapi.aliyuncs.com'
        return Dypnsapi20170525Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest()
        runtime = util_models.RuntimeOptions()
        try:
            resp = client.send_sms_verify_code_with_options(send_sms_verify_code_request, runtime)
            print(json.dumps(resp, default=str, indent=2))
        except Exception as error:
            # For demonstration only; handle exceptions carefully and never silently ignore them in production.
            # error message
            print(error.message)
            # diagnostics URL
            print(error.data.get("Recommend"))

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest()
        runtime = util_models.RuntimeOptions()
        try:
            resp = await client.send_sms_verify_code_with_options_async(send_sms_verify_code_request, runtime)
            print(json.dumps(resp, default=str, indent=2))
        except Exception as error:
            # For demonstration only; handle exceptions carefully and never silently ignore them in production.
            # error message
            print(error.message)
            # diagnostics URL
            print(error.data.get("Recommend"))


if __name__ == '__main__':
    Sample.main(sys.argv[1:])


@@ -0,0 +1,87 @@
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys
import json
from typing import List

from alibabacloud_dypnsapi20170525.client import Client as Dypnsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_credentials.models import Config as CredentialConfig
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dypnsapi20170525 import models as dypnsapi_20170525_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client() -> Dypnsapi20170525Client:
        """
        Initialize the account Client with credentials.
        @return: Client
        @throws Exception
        """
        # NOTE: credentials redacted; never hardcode real keys in source control.
        credential_config = CredentialConfig(
            type='access_key',
            access_key_id='your_access_key_id',
            access_key_secret='your_access_key_secret'
        )
        credential = CredentialClient(credential_config)
        config = open_api_models.Config(
            credential=credential
        )
        # For endpoints, see https://api.aliyun.com/product/Dypnsapi
        config.endpoint = f'dypnsapi.aliyuncs.com'
        return Dypnsapi20170525Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest(
            phone_number='13621242430',
            sign_name='阿里云短信',
            template_code='SMS_474580174',
            template_param='{}',
            code_length=4,
            code_type=1
        )
        runtime = util_models.RuntimeOptions()
        try:
            resp = client.send_sms_verify_code_with_options(send_sms_verify_code_request, runtime)
            print(json.dumps(resp, default=str, indent=2))
        except Exception as error:
            # For demonstration only; handle exceptions carefully and never silently ignore them in production.
            # error message
            print(f"Error: {error}")
            # diagnostics URL
            if hasattr(error, 'data') and error.data:
                print(f"Diagnostics URL: {error.data.get('Recommend')}")
            raise

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest()
        runtime = util_models.RuntimeOptions()
        try:
            resp = await client.send_sms_verify_code_with_options_async(send_sms_verify_code_request, runtime)
            print(json.dumps(resp, default=str, indent=2))
        except Exception as error:
            # For demonstration only; handle exceptions carefully and never silently ignore them in production.
            print(f"Error: {error}")
            if hasattr(error, 'data') and error.data:
                print(f"Diagnostics URL: {error.data.get('Recommend')}")


if __name__ == '__main__':
    Sample.main(sys.argv[1:])


@@ -0,0 +1,87 @@
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys
import json
from typing import List
from alibabacloud_dypnsapi20170525.client import Client as Dypnsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_credentials.models import Config as CredentialConfig
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dypnsapi20170525 import models as dypnsapi_20170525_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient
class Sample:
def __init__(self):
pass
@staticmethod
def create_client() -> Dypnsapi20170525Client:
"""
使用凭据初始化账号Client
@return: Client
@throws Exception
"""
credential_config = CredentialConfig(
type='access_key',
access_key_id='LTAI5tPnLdDkvSxrVJfRZMCn',
access_key_secret='AII2A8hgfxXWM1xYqeuNwnS61AErDz'
)
credential = CredentialClient(credential_config)
config = open_api_models.Config(
credential=credential
)
# Endpoint 请参考 https://api.aliyun.com/product/Dypnsapi
config.endpoint = f'dypnsapi.aliyuncs.com'
return Dypnsapi20170525Client(config)
@staticmethod
def main(
args: List[str],
) -> None:
client = Sample.create_client()
send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest(
phone_number='13621242430',
sign_name='阿里云短信',
template_code='SMS_474580174',
template_param='{}',
code_length=4,
code_type=1
)
runtime = util_models.RuntimeOptions()
try:
resp = client.send_sms_verify_code_with_options(send_sms_verify_code_request, runtime)
print(json.dumps(resp, default=str, indent=2))
except Exception as error:
# 此处仅做打印展示,请谨慎对待异常处理,在工程项目中切勿直接忽略异常。
# 错误 message
print(f"错误: {error}")
# 诊断地址
if hasattr(error, 'data') and error.data:
print(f"诊断地址: {error.data.get('Recommend')}")
raise
@staticmethod
async def main_async(
args: List[str],
) -> None:
client = Sample.create_client()
send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest()
runtime = util_models.RuntimeOptions()
try:
resp = await client.send_sms_verify_code_with_options_async(send_sms_verify_code_request, runtime)
print(json.dumps(resp, default=str, indent=2))
except Exception as error:
# 此处仅做打印展示,请谨慎对待异常处理,在工程项目中切勿直接忽略异常。
# 错误 message
print(error.message)
# 诊断地址
print(error.data.get("Recommend"))
if __name__ == '__main__':
Sample.main(sys.argv[1:])


@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys
import json
from typing import List
from alibabacloud_dysmsapi20170525.client import Client as Dysmsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_credentials.models import Config as CredentialConfig
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dysmsapi20170525 import models as dysmsapi_20170525_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient
class Sample:
def __init__(self):
pass
@staticmethod
def create_client() -> Dysmsapi20170525Client:
"""
使用凭据初始化账号Client
@return: Client
@throws Exception
"""
credential_config = CredentialConfig(
type='access_key',
access_key_id='LTAI5tSMvnCJdqkZtCVWgh8R',
access_key_secret='nyFzXyIi47peVLK4wR2qqbPezmU79W'
)
credential = CredentialClient(credential_config)
config = open_api_models.Config(
credential=credential
)
# Endpoint 请参考 https://api.aliyun.com/product/Dysmsapi
config.endpoint = f'dysmsapi.aliyuncs.com'
return Dysmsapi20170525Client(config)
@staticmethod
def main(
args: List[str],
) -> None:
client = Sample.create_client()
send_sms_request = dysmsapi_20170525_models.SendSmsRequest(
phone_numbers='13621242430',
sign_name='北京乐航时代科技',
template_code='SMS_486210104',
template_param=json.dumps({"code": "1314"})
)
runtime = util_models.RuntimeOptions()
try:
resp = client.send_sms_with_options(send_sms_request, runtime)
print(json.dumps(resp.to_map(), default=str, indent=2))
except Exception as error:
# 此处仅做打印展示,请谨慎对待异常处理,在工程项目中切勿直接忽略异常。
# 错误 message
print(f"错误: {error}")
# 诊断地址
if hasattr(error, 'data') and error.data:
print(f"诊断地址: {error.data.get('Recommend')}")
raise
@staticmethod
async def main_async(
args: List[str],
) -> None:
client = Sample.create_client()
send_sms_request = dysmsapi_20170525_models.SendSmsRequest(
phone_numbers='13621242430',
sign_name='北京乐航时代科技',
template_code='SMS_486210104',
template_param=json.dumps({"code": "1314"})
)
runtime = util_models.RuntimeOptions()
try:
resp = await client.send_sms_with_options_async(send_sms_request, runtime)
print(json.dumps(resp.to_map(), default=str, indent=2))
except Exception as error:
# 此处仅做打印展示,请谨慎对待异常处理,在工程项目中切勿直接忽略异常。
# 错误 message
print(error.message)
# 诊断地址
print(error.data.get("Recommend"))
if __name__ == '__main__':
Sample.main(sys.argv[1:])


@@ -0,0 +1 @@
alibabacloud_dypnsapi20170525==2.0.0


@@ -0,0 +1,63 @@
# -*- coding: utf-8 -*-
"""
快速运行脚本 - 设置凭据并运行 SMS 示例
使用方法: python run_with_credentials.py
"""
import os
import sys
import subprocess
def main():
print("=" * 60)
print("阿里云短信验证码 API - 快速运行")
print("=" * 60)
print()
# 检查环境变量
access_key_id = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID')
access_key_secret = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET')
if not access_key_id or not access_key_secret:
print("⚠️ 未检测到环境变量,请输入您的阿里云凭据:")
print(" (获取地址: https://ram.console.aliyun.com/manage/ak)")
print()
access_key_id = input("AccessKey ID: ").strip()
access_key_secret = input("AccessKey Secret: ").strip()
if not access_key_id or not access_key_secret:
print()
print("❌ 错误: AccessKey 不能为空!")
sys.exit(1)
# 设置环境变量
os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'] = access_key_id
os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'] = access_key_secret
print()
print("✅ 凭据已设置")
else:
print(f"✅ 检测到环境变量:")
print(f" ALIBABA_CLOUD_ACCESS_KEY_ID: {access_key_id[:8]}...")
print(f" ALIBABA_CLOUD_ACCESS_KEY_SECRET: {'*' * 20}")
print()
print("=" * 60)
print("正在运行 SMS 示例...")
print("=" * 60)
print()
# 运行示例
try:
# 使用当前环境运行
result = subprocess.run(
[sys.executable, './alibabacloud_sample/sample.py'],
env=os.environ.copy(),
capture_output=False
)
sys.exit(result.returncode)
except Exception as e:
print(f"❌ 运行失败: {e}")
sys.exit(1)
if __name__ == '__main__':
main()


@@ -0,0 +1,45 @@
# PowerShell script: set the Alibaba Cloud credential environment variables
# Usage: .\set_credentials.ps1
Write-Host "=" -NoNewline -ForegroundColor Cyan
Write-Host ("=" * 59) -ForegroundColor Cyan
Write-Host "Alibaba Cloud SMS API credential setup" -ForegroundColor Yellow
Write-Host "=" -NoNewline -ForegroundColor Cyan
Write-Host ("=" * 59) -ForegroundColor Cyan
Write-Host ""
# prompt for credentials
Write-Host "Enter your Alibaba Cloud access credentials:" -ForegroundColor Green
Write-Host "(available at https://ram.console.aliyun.com/manage/ak)" -ForegroundColor Gray
Write-Host ""
$accessKeyId = Read-Host "AccessKey ID"
$accessKeySecret = Read-Host "AccessKey Secret" -AsSecureString
$accessKeySecretPlain = [Runtime.InteropServices.Marshal]::PtrToStringAuto(
    [Runtime.InteropServices.Marshal]::SecureStringToBSTR($accessKeySecret)
)
if ([string]::IsNullOrWhiteSpace($accessKeyId) -or [string]::IsNullOrWhiteSpace($accessKeySecretPlain)) {
    Write-Host ""
    Write-Host "Error: AccessKey ID and AccessKey Secret must not be empty!" -ForegroundColor Red
    exit 1
}
# set the environment variables
$env:ALIBABA_CLOUD_ACCESS_KEY_ID = $accessKeyId
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET = $accessKeySecretPlain
Write-Host ""
Write-Host "=" -NoNewline -ForegroundColor Cyan
Write-Host ("=" * 59) -ForegroundColor Cyan
Write-Host "Credentials configured!" -ForegroundColor Green
Write-Host "=" -NoNewline -ForegroundColor Cyan
Write-Host ("=" * 59) -ForegroundColor Cyan
Write-Host ""
Write-Host "Environment variables set (valid for this PowerShell session):" -ForegroundColor Yellow
Write-Host "ALIBABA_CLOUD_ACCESS_KEY_ID: $($accessKeyId.Substring(0, [Math]::Min(8, $accessKeyId.Length)))..." -ForegroundColor Gray
Write-Host "ALIBABA_CLOUD_ACCESS_KEY_SECRET: ********************" -ForegroundColor Gray
Write-Host ""
Write-Host "You can now run the program:" -ForegroundColor Green
Write-Host "python .\alibabacloud_sample\sample.py" -ForegroundColor Cyan
Write-Host ""

ai_sms/ai_sms/setup.py

@@ -0,0 +1,76 @@
# -*- coding: utf-8 -*-
"""
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
"""
import os

from setuptools import setup, find_packages

"""
setup module for alibabacloud_sample.

Created on 31/12/2025

@author:
"""

PACKAGE = "alibabacloud_sample"
NAME = "alibabacloud_sample"
DESCRIPTION = "Alibaba Cloud SDK Code Sample Library for Python"
AUTHOR = ""
AUTHOR_EMAIL = ""
URL = "https://github.com/aliyun/alibabacloud-sdk"
VERSION = __import__(PACKAGE).__version__
REQUIRES = [
    "alibabacloud_dypnsapi20170525>=2.0.0, <3.0.0",
]

LONG_DESCRIPTION = ''
if os.path.exists('./README.md'):
    with open("README.md", encoding='utf-8') as fp:
        LONG_DESCRIPTION = fp.read()

setup(
    name=NAME,
    version=VERSION,
    description=DESCRIPTION,
    long_description=LONG_DESCRIPTION,
    long_description_content_type='text/markdown',
    author=AUTHOR,
    author_email=AUTHOR_EMAIL,
    license="Apache License 2.0",
    url=URL,
    keywords=["alibabacloud", "sample"],
    packages=find_packages(exclude=["tests*"]),
    include_package_data=True,
    platforms="any",
    install_requires=REQUIRES,
    python_requires=">=3.6",
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: Apache Software License",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Topic :: Software Development",
    ],
)


@@ -0,0 +1,41 @@
# -*- coding: utf-8 -*-
import os
import sys


def setup_credentials():
    """
    Set the Alibaba Cloud access-credential environment variables.
    """
    print("=" * 60)
    print("Alibaba Cloud SMS API credential setup")
    print("=" * 60)
    print("\nEnter your Alibaba Cloud access credentials:")
    print("(Available at https://ram.console.aliyun.com/manage/ak)\n")

    access_key_id = input("AccessKey ID: ").strip()
    access_key_secret = input("AccessKey Secret: ").strip()

    if not access_key_id or not access_key_secret:
        print("\nError: AccessKey ID and AccessKey Secret must not be empty!")
        sys.exit(1)

    # Set the environment variables (current process only)
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'] = access_key_id
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'] = access_key_secret

    print("\n" + "=" * 60)
    print("Credentials configured!")
    print("=" * 60)
    print("\nEnvironment variables set:")
    print(f"ALIBABA_CLOUD_ACCESS_KEY_ID: {access_key_id[:8]}...")
    print(f"ALIBABA_CLOUD_ACCESS_KEY_SECRET: {'*' * 20}")
    print("\nTo set them in PowerShell, run:")
    print(f'$env:ALIBABA_CLOUD_ACCESS_KEY_ID="{access_key_id}"')
    print(f'$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="{access_key_secret}"')
    print("\nOr run the sample in the current Python session:")
    print("python ./alibabacloud_sample/sample.py")


if __name__ == '__main__':
    setup_credentials()


@@ -0,0 +1,83 @@
# -*- coding: utf-8 -*-
"""
Test whether the Alibaba Cloud credentials are configured correctly.
Usage: python test_credentials.py
"""
import os
import sys

# Bypass the local proxy for the ECS metadata endpoint
os.environ['NO_PROXY'] = '100.100.100.200'
os.environ['no_proxy'] = '100.100.100.200'

from alibabacloud_credentials.client import Client as CredentialClient


def test_credentials():
    print("=" * 60)
    print("Testing the Alibaba Cloud credential configuration")
    print("=" * 60)
    print()

    # Check environment variables
    access_key_id = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID')
    access_key_secret = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET')

    print("1. Checking environment variables:")
    if access_key_id:
        print(f"   ✅ ALIBABA_CLOUD_ACCESS_KEY_ID: {access_key_id[:8]}...")
    else:
        print("   ❌ ALIBABA_CLOUD_ACCESS_KEY_ID: not set")
    if access_key_secret:
        print(f"   ✅ ALIBABA_CLOUD_ACCESS_KEY_SECRET: {'*' * 20}")
    else:
        print("   ❌ ALIBABA_CLOUD_ACCESS_KEY_SECRET: not set")
    print()

    print("2. Testing credential loading:")
    try:
        credential_client = CredentialClient()
        credential = credential_client.get_credential()
        loaded_ak_id = credential.get_access_key_id()
        loaded_ak_secret = credential.get_access_key_secret()
        if loaded_ak_id and loaded_ak_secret:
            print("   ✅ Credentials loaded!")
            print(f"   AccessKey ID: {loaded_ak_id[:8]}...")
            print(f"   AccessKey Secret: {'*' * 20}")
            print()
            print("=" * 60)
            print("✅ Credentials are configured correctly; you can run the program!")
            print("=" * 60)
            print()
            print("Run:")
            print("  python ./alibabacloud_sample/sample.py")
            print()
            print("Or use the quick-run script:")
            print("  python run_with_credentials.py")
            return True
        else:
            print("   ❌ Credential loading failed: the AccessKey is empty")
            return False
    except Exception as e:
        print(f"   ❌ Credential loading failed: {e}")
        print()
        print("=" * 60)
        print("Fix:")
        print("=" * 60)
        print()
        print("Set the environment variables in PowerShell:")
        print()
        print('  $env:ALIBABA_CLOUD_ACCESS_KEY_ID="your_AccessKey_ID"')
        print('  $env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_AccessKey_Secret"')
        print()
        print("Then rerun this test script.")
        print()
        print("Get an AccessKey: https://ram.console.aliyun.com/manage/ak")
        return False


if __name__ == '__main__':
    success = test_credentials()
    sys.exit(0 if success else 1)


@@ -0,0 +1,100 @@
# Quick-Run Guide

## Root Cause

The error you are seeing means the program could not load Alibaba Cloud access credentials. The default credential chain tried the following sources, and all of them failed:

1. ❌ Environment variables `ALIBABA_CLOUD_ACCESS_KEY_ID` / `ALIBABA_CLOUD_ACCESS_KEY_SECRET`
2. ❌ CLI config file `C:\Users\34362\.aliyun\config.json`
3. ❌ Credentials file `C:\Users\34362\.alibabacloud\credentials.ini`
4. ❌ ECS instance RAM role (connecting through the proxy 127.0.0.1:10809 failed)

## Solutions

### Option 1: Environment variables (recommended, simplest)

Run in PowerShell:
```powershell
# Set the environment variables
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your_AccessKey_ID"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_AccessKey_Secret"

# Run the program
python .\alibabacloud_sample\sample.py
```

### Option 2: PowerShell script (interactive)

```powershell
# Run the credential setup script
.\set_credentials.ps1

# The script prompts for the credentials and sets the environment variables;
# then run the program directly
python .\alibabacloud_sample\sample.py
```

### Option 3: Credentials file (permanent configuration)

Create the file `C:\Users\34362\.alibabacloud\credentials.ini`:
```ini
[default]
type = access_key
access_key_id = your_AccessKey_ID
access_key_secret = your_AccessKey_Secret
```

Then run directly:
```powershell
python .\alibabacloud_sample\sample.py
```
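A quick stdlib sanity check of that file's layout can catch typos before the SDK is involved. This is an illustrative sketch: the `check_credentials_file` helper and the sample values are made up here, not part of the SDK; point it at your real `credentials.ini` path in practice.

```python
import configparser
import os
import tempfile

# Sample content mirroring the credentials.ini layout shown above
# (the key values are placeholders).
SAMPLE = """\
[default]
type = access_key
access_key_id = LTAI5tExampleId
access_key_secret = exampleSecret
"""

def check_credentials_file(path):
    """Return (ok, message) for a credentials.ini-style file."""
    parser = configparser.ConfigParser()
    parser.read(path, encoding='utf-8')
    if 'default' not in parser:
        return False, "missing [default] section"
    section = parser['default']
    for key in ('type', 'access_key_id', 'access_key_secret'):
        if not section.get(key):
            return False, f"missing key: {key}"
    return True, "ok"

# Demo against a temporary copy of the sample content.
with tempfile.NamedTemporaryFile('w', suffix='.ini', delete=False) as fp:
    fp.write(SAMPLE)
    tmp_path = fp.name
ok, msg = check_credentials_file(tmp_path)
print(ok, msg)  # → True ok
os.unlink(tmp_path)
```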
## Getting an AccessKey

1. Open the Alibaba Cloud RAM console: https://ram.console.aliyun.com/manage/ak
2. Create or look up your AccessKey ID and AccessKey Secret
3. **Important**: keep the AccessKey Secret safe and never share it

## Current Configuration

- ✅ Phone number: `13621242430`
- ✅ Verification code length: 4 digits
- ✅ Dependencies installed
- ✅ Error handling fixed

## Notes

1. **Proxy issues**: if you use a proxy (e.g. 127.0.0.1:10809), you may need to disable it temporarily or set the `NO_PROXY` environment variable
2. **Verification code**: the API generates the code itself; it cannot be fixed to "1314"
3. **Cost**: sending SMS messages incurs charges
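The `NO_PROXY` workaround from note 1 can also be applied in-process, before the SDK is imported. A minimal sketch: the metadata address 100.100.100.200 is the one already used by `test_credentials.py` in this repo; whether you need to list additional hosts depends on your proxy setup.

```python
import os

# Exempt the ECS metadata endpoint from the local proxy before any
# SDK import reads the proxy settings from the environment.
os.environ['NO_PROXY'] = '100.100.100.200'
os.environ['no_proxy'] = '100.100.100.200'

# Stdlib check that the bypass takes effect for that host:
import urllib.request
print(urllib.request.proxy_bypass_environment('100.100.100.200'))  # truthy when bypassed
```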
## Full Example

```powershell
# 1. Activate the conda environment (if needed)
conda activate douyin

# 2. Set the credentials
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="LTAI5t..."
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your_secret"

# 3. Run the program
python .\alibabacloud_sample\sample.py
```

## Expected Output

On success you will see something like:
```json
{
    "Code": "OK",
    "Success": true,
    "Message": "成功",
    "Model": {
        "VerifyCode": "1234",
        "BizId": "...",
        "RequestId": "..."
    }
}
```


@@ -0,0 +1,91 @@
# ✅ API Connected Successfully!

## 🎉 Current Status

1. **API connection works**: the Alibaba Cloud SMS API was called successfully
2. **Code switched**: moved from `dypnsapi` to the correct `dysmsapi`
3. **Credentials correct**: the AccessKey configuration is valid
4. **Phone number configured**: 13621242430
5. **Verification code configured**: 1314

## ⚠️ Last Step: Configure a Valid SMS Template

Current error: `isv.SMS_TEMPLATE_ILLEGAL` (no matching template under this account)

**Cause**: the template code `SMS_474580174` does not exist or has not passed review.

### Fix

1. **Log in to the Alibaba Cloud SMS console**
   - Template management: https://dysms.console.aliyun.com/domestic/text/template
2. **Find a template of yours that has passed review**
   - Note its template code (format: `SMS_xxxxxxxxx`)
   - Make sure its status is "approved"
3. **Update the template code in the code**
   Edit lines 50 and 74 of `sample.py`:
   ```python
   template_code='SMS_your_actual_template_code',
   ```
4. **Confirm the template parameters**
   If your template body is `您的验证码是${code},有效期5分钟` ("Your verification code is ${code}, valid for 5 minutes"), the current configuration is already correct:
   ```python
   template_param=json.dumps({"code": "1314"})
   ```
   If the template variable is not named `code`, change the key accordingly.
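For instance, if the template variable were named `verify_code` (a hypothetical name), only the JSON key changes. A small stdlib sketch of building `template_param` for any variable name; the `build_template_param` helper is illustrative, not part of the SDK:

```python
import json

# template_param must be a JSON object string whose keys match the
# ${...} variables declared in your approved template. The SMS API
# expects string values, so coerce everything to str.
def build_template_param(**variables):
    return json.dumps({k: str(v) for k, v in variables.items()}, ensure_ascii=False)

print(build_template_param(code=1314))         # → {"code": "1314"}
print(build_template_param(verify_code=1314))  # → {"verify_code": "1314"}
```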
## 📝 Current Code Configuration

```python
send_sms_request = dysmsapi_20170525_models.SendSmsRequest(
    phone_numbers='13621242430',                 # ✅ phone number
    sign_name='阿里云短信',                       # ✅ signature (change if incorrect)
    template_code='SMS_474580174',               # ⚠️ replace with your actual template code
    template_param=json.dumps({"code": "1314"})  # ✅ verification-code parameter
)
```

## 🚀 Run

After the template code is configured:
```powershell
python .\alibabacloud_sample\sample.py
```

## ✅ Example of a Successful Response

With a correct configuration you will see:
```json
{
    "statusCode": 200,
    "body": {
        "Code": "OK",
        "Message": "OK",
        "BizId": "...",
        "RequestId": "..."
    }
}
```

## 📖 References

- **Template management**: https://dysms.console.aliyun.com/domestic/text/template
- **Signature management**: https://dysms.console.aliyun.com/domestic/text/sign
- **API docs**: https://help.aliyun.com/zh/sms/developer-reference/api-dysmsapi-2017-05-25-sendsms

## 💡 Tips

1. If the signature name is also wrong, update `sign_name` on lines 49 and 73 as well
2. A template can only be used once its status is "approved"
3. The signature and template must belong to your account

**Once the correct template code is in place, the program can send SMS messages successfully!**


@@ -0,0 +1,69 @@
# ✅ Code Runs - SMS Signature and Template Still Needed

## 🎉 Progress

1. **Credentials hardcoded**: the AccessKey is configured in the code
2. **API connection works**: the Alibaba Cloud API was called successfully
3. **Phone number configured**: 13621242430
4. **Verification code configured**: 4 digits

## ⚠️ Remaining Configuration

### 1. SMS signature (SignName)

Current value in the code: `'阿里云'`

**You need to:**
- Create an SMS signature in the Alibaba Cloud SMS console and get it approved
- Console: https://dysms.console.aliyun.com/domestic/text/sign
- Replace `'阿里云'` at `@D:\ai_sms\alibabacloud_sample\sample.py:49` with your actual signature

### 2. SMS template (TemplateCode)

Current value in the code: `'SMS_123456789'` (placeholder)

**You need to:**
- Create an SMS template in the Alibaba Cloud SMS console and get it approved
- Console: https://dysms.console.aliyun.com/domestic/text/template
- Template type: verification code
- Example template body: `您的验证码是${code},有效期5分钟` ("Your verification code is ${code}, valid for 5 minutes")
- Replace `'SMS_123456789'` at `@D:\ai_sms\alibabacloud_sample\sample.py:50` with your actual template code

## 📝 Current Code Configuration

```python
send_sms_verify_code_request = dypnsapi_20170525_models.SendSmsVerifyCodeRequest(
    phone_number='13621242430',     # ✅ configured
    sign_name='阿里云',              # ⚠️ replace with your signature
    template_code='SMS_123456789',  # ⚠️ replace with your template code
    code_length=4,                  # ✅ configured
    code_type=1                     # ✅ configured (numeric code)
)
```

## 🚀 Run After Configuration

```powershell
python .\alibabacloud_sample\sample.py
```

## 📖 References

- **Signature management**: https://dysms.console.aliyun.com/domestic/text/sign
- **Template management**: https://dysms.console.aliyun.com/domestic/text/template
- **API docs**: https://next.api.aliyun.com/document/Dypnsapi/2017-05-25/SendSmsVerifyCode

## 💡 Notes

1. Signature and template review usually takes minutes to hours
2. Both must pass review before they can be used
3. The verification code is generated by the API and cannot be fixed to "1314"
4. Sending SMS messages costs money; make sure the account has a balance

## 🔍 Latest Error

From the last run:
- `MissingTemplateCode`: a template code must be configured
- A diagnostic link is included in the error message

Once the signature and template are configured, the program can send SMS verification codes!
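Before the signature and template are approved, a local format check can at least catch leftover placeholders. The `SMS_` + digits shape below is inferred from the example codes in this repo (`SMS_123456789`, `SMS_474580174`) and is an assumption, not a documented contract:

```python
import re

# Catch placeholder or mistyped template codes locally before calling
# the API. Adjust the pattern if your account's codes differ.
TEMPLATE_CODE_RE = re.compile(r'^SMS_\d{6,}$')

def looks_like_template_code(code):
    """True when the string matches the SMS_<digits> shape."""
    return bool(TEMPLATE_CODE_RE.match(code))

print(looks_like_template_code('SMS_474580174'))  # → True
print(looks_like_template_code('SMS_your_code'))  # → False
```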


@@ -1,85 +1,112 @@
author_id,author_name,channel,stat_date,daily_published_count,cumulative_published_count,day_revenue,monthly_revenue,weekly_revenue,revenue_mom_growth_rate,revenue_wow_growth_rate
101,乳腺专家林华,1,2025-12-23,0,35,17.31,401.48,14.65,0.051353,-0.101366
102,韩主任聊妇科,1,2025-12-23,15,167,295.54,6273.97,250.73,-0.274106,-0.142889
103,男科医生杨宇卓,1,2025-12-23,49,413,142.81,5318.03,277.38,-0.202357,0.086164
104,男科医生刘德风,1,2025-12-23,15,171,171.91,4018.47,137.84,-0.396516,0.031539
105,抗衰孟大夫,1,2025-12-23,0,0,64.02,1717.74,57.83,-0.222260,-0.052155
113,内科主任何少忠医生,1,2025-12-23,494,2820,118.62,3266.14,122.86,0.145142,-0.088487
120,中医贾希瑞主任,1,2025-12-23,51,328,45.82,1406.60,54.46,-0.061046,0.001559
121,协和皮肤科付兰芹主任,1,2025-12-23,0,0,9.70,169.07,7.14,-0.176473,-0.035851
122,高丽娜中医,1,2025-12-23,1,98,6.83,162.98,4.55,0.098914,-0.382008
123,微创腋臭专家邹普功,1,2025-12-23,0,0,1.13,53.29,4.07,-0.084207,4.381958
138,阜外医院神内李大夫,1,2025-12-23,15,112,0.00,0.00,4.18,0.000000,0.000000
139,药师赵志军,1,2025-12-23,1,74,17.31,254.35,9.75,-0.223406,0.096265
140,射雕女英雄,1,2025-12-23,46,318,25.54,913.34,18.20,0.028536,0.004816
141,耳鼻喉科杨书勋医生,1,2025-12-23,100,499,39.87,1488.71,38.66,-0.339516,-0.110187
142,影像科毛医生,1,2025-12-23,1,15,40.48,987.04,16.70,-0.183184,-0.618984
145,注射刘新亚,1,2025-12-23,0,0,2.99,76.23,1.44,-0.310759,0.180328
146,Dr蓝剑雄,1,2025-12-23,0,0,1.08,5.32,0.34,-0.148800,3.238095
147,眼科医生陈慧,1,2025-12-23,0,0,2.94,214.63,4.93,0.132552,-0.505205
148,肿瘤科郭秋均医生,1,2025-12-23,100,762,52.48,697.58,22.99,-0.134291,0.390392
149,生殖科医生师楠,1,2025-12-23,1,273,1.16,235.54,18.71,-0.052953,0.012516
150,药师李宁,1,2025-12-23,1,62,8.04,158.16,5.56,0.344098,0.787563
151,皮肤科赵鹏,1,2025-12-23,97,1064,9.95,268.69,18.81,0.347628,-0.108460
152,皮肤科医生郑占才,1,2025-12-23,50,234,3.12,189.40,4.25,0.149690,-0.114675
153,曹凤娇中医,1,2025-12-23,1,24,2.01,37.29,2.55,-0.825102,3.586572
154,郝国君中医,1,2025-12-23,1,71,1.66,64.57,1.43,-0.860362,2.556184
155,成金枝中医,1,2025-12-23,1,168,16.30,153.45,1.68,0.128641,0.081483
156,许娜中医,1,2025-12-23,50,316,4.08,24.56,1.36,0.627568,0.949066
157,刘冬琴中医,1,2025-12-23,1,61,0.29,8.74,0.33,-0.494505,0.065217
158,刘叔勤中医,1,2025-12-23,1,128,0.44,5.88,0.41,-0.984114,0.000000
159,专治静脉曲张的刘洪医生,1,2025-12-23,0,0,1.06,9.85,0.52,0.591276,-0.389006
172,亮亮中医,1,2025-12-23,1,27,2.23,23.72,0.00,2.201080,-0.222930
173,赵剑锋医生,1,2025-12-23,47,166,4.87,144.57,8.25,2.890474,0.925251
174,李雪民医生,1,2025-12-23,1,59,0.17,32.13,0.30,0.952005,-0.909366
175,静脉曲张的杀手医生,1,2025-12-23,0,0,0.05,26.68,0.76,-0.327960,-0.630263
176,武娜中医,1,2025-12-23,0,22,4.85,211.45,19.40,-0.583120,0.399662
177,好孕闺蜜王珂,1,2025-12-23,0,0,31.60,1039.32,9.39,0.036563,-0.344309
179,风湿免疫专家李小峰,1,2025-12-23,97,506,24.82,515.09,31.33,0.699462,-0.220420
180,尹海琴医生,1,2025-12-23,1,294,3.72,55.02,1.65,-0.507959,0.140513
181,针灸科高小勇医生,1,2025-12-23,0,392,10.25,395.35,13.34,0.067331,0.124069
182,师强华中医,1,2025-12-23,0,0,0.00,0.00,0.00,0.000000,0.000000
183,杜晋芳中医,1,2025-12-23,101,511,28.55,207.79,3.54,0.162787,0.366563
185,郭俊恒中医,1,2025-12-23,48,324,5.44,73.16,0.79,-0.008403,0.898561
186,董强中医,1,2025-12-23,1,24,0.00,3.02,0.00,2.355556,-0.454545
187,李亚娟中医,1,2025-12-23,1,24,0.02,3.87,0.16,-0.692369,-0.685446
188,苗辉医生,1,2025-12-23,0,0,20.53,1122.08,23.76,0.640372,0.020800
189,耳鼻喉医生夏昆峰,1,2025-12-23,0,0,1.34,15.79,0.54,0.146696,0.225490
190,中医苏晨,1,2025-12-23,1,103,15.92,47.70,9.08,0.269292,6.691176
191,智璇医生,1,2025-12-23,1,72,10.18,111.51,2.48,0.684187,-0.321377
246,石鹤医生,1,2025-12-23,1,114,2.03,228.80,0.33,4.412822,-0.854409
247,梁丽君中医,1,2025-12-23,1,115,7.74,29.35,1.23,1.795238,1.088028
248,崔丽荣中医,1,2025-12-23,1,177,1.44,17.44,0.76,0.006347,0.105991
249,张承红中医,1,2025-12-23,45,232,0.51,12.96,0.26,1.234483,1.695312
253,中医郑伟,1,2025-12-23,1,24,0.00,0.12,0.00,-0.294118,0.000000
254,感染科郭金存医生,1,2025-12-23,0,0,4.39,100.37,2.82,-0.411940,0.325933
255,贾素芬中医,1,2025-12-23,101,129,0.02,1.62,0.16,-0.369650,0.000000
256,张立净,1,2025-12-23,0,15,0.00,2.08,0.00,0.552239,-0.477064
257,皮肤科李英医生,1,2025-12-23,1,201,3.19,38.23,0.80,2.530009,-0.472832
364,针灸科冀占岭大夫,1,2025-12-23,0,18,0.21,10.88,0.14,0.000000,-0.806122
365,超声科专家曹怀宇,1,2025-12-23,49,161,0.01,1.34,0.00,0.000000,20.250000
366,脊柱微创易端医生,1,2025-12-23,46,201,0.00,4.44,3.99,17.500000,7.866667
368,苗晋玲医生,1,2025-12-23,100,311,0.09,1.09,0.00,0.000000,-0.142857
369,跟着车主任学中医,1,2025-12-23,15,52,0.87,26.67,0.91,3.246815,-0.202326
370,郭主任讲中医,1,2025-12-23,15,52,9.51,17.78,1.22,0.000000,2.422886
371,洪一针讲中医,1,2025-12-23,0,8,0.02,0.66,0.07,0.000000,1.000000
372,李医生聊健康,1,2025-12-23,1,28,0.27,3.21,0.37,0.000000,0.767241
373,刘刚医生说,1,2025-12-23,0,4,0.85,18.66,0.89,0.000000,-0.257059
374,小丽讲中医,1,2025-12-23,1,9,0.13,83.77,0.17,18.850711,-0.981677
375,西北中医张宝庆,1,2025-12-23,1,6,0.16,8.47,0.15,0.000000,0.758958
376,胡锋医生,1,2025-12-23,1,6,2.04,40.82,1.40,0.440367,-0.134244
377,神经内科巴医生,1,2025-12-23,0,58,0.07,7.40,0.10,0.000000,-0.668122
378,曾国禄讲中医,1,2025-12-23,1,6,1.88,38.62,1.85,3.532864,0.288610
379,泌尿男科陈医生,1,2025-12-23,1,49,1.04,31.10,1.66,0.170493,-0.125193
380,肇庆中医何大夫,1,2025-12-23,16,24,5.01,99.07,3.89,15.187908,-0.287655
381,伟枫医生,1,2025-12-23,15,57,0.31,5.52,0.35,0.000000,0.988166
382,刘医生讲中医,1,2025-12-23,15,54,2.48,104.49,2.76,21.470968,-0.204341
383,卢医生讲健康,1,2025-12-23,1,11,0.30,16.57,0.51,0.690816,-0.479936
384,阮志华讲健康,1,2025-12-23,1,13,9.92,11.96,0.15,0.000000,42.296296
385,沈理医生,1,2025-12-23,46,56,1.21,57.24,1.06,0.435306,-0.051677
386,中医治肾病周厘,1,2025-12-23,1,13,0.05,0.75,0.04,0.000000,0.419355
387,中医妇产科安向荣,1,2025-12-23,15,57,0.38,19.16,0.29,0.527911,-0.379377
388,院博医生,1,2025-12-23,1,13,0.06,0.32,0.04,0.000000,0.000000
389,中医盛刚,1,2025-12-23,16,23,10.56,31.62,2.22,0.000000,2.561047
390,中医王雷,1,2025-12-23,1,6,0.12,0.92,0.11,0.000000,0.000000
391,泌尿科邱医生,1,2025-12-23,1,10,2.50,73.22,2.66,0.000000,-0.150463
101,乳腺专家林华,1,2025-12-30,0,38,7.36,492.95,18.76,492.950000,-0.820770
102,韩主任聊妇科,1,2025-12-30,13,269,85.66,7761.35,161.38,7761.350000,-0.913805
103,男科医生杨宇卓,1,2025-12-30,50,748,36.01,6226.87,76.33,6226.870000,-0.939068
104,男科医生刘德风,1,2025-12-30,15,276,70.48,4862.09,155.43,4862.090000,-0.844249
105,抗衰孟大夫,1,2025-12-30,0,0,34.22,2139.18,86.21,2139.180000,-0.811390
113,内科主任何少忠医生,1,2025-12-30,500,6298,130.84,4128.73,218.06,4128.730000,-0.753885
120,中医贾希瑞主任,1,2025-12-30,50,683,29.40,1758.73,74.48,1758.730000,-0.802926
121,协和皮肤科付兰芹主任,1,2025-12-30,0,0,8.60,215.84,15.85,215.840000,-0.668132
122,高丽娜中医,1,2025-12-30,1,100,0.72,640.30,6.82,640.300000,-0.985847
123,微创腋臭专家邹普功,1,2025-12-30,0,0,1.09,97.20,1.24,97.200000,-0.974097
138,阜外医院神内李大夫,1,2025-12-30,15,187,4.17,123.97,9.45,123.970000,-0.647125
139,药师赵志军,1,2025-12-30,1,80,15.57,341.11,23.76,341.110000,-0.736176
140,射雕女英雄,1,2025-12-30,50,655,27.92,1129.05,60.21,1129.050000,-0.697802
141,耳鼻喉科杨书勋医生,1,2025-12-30,99,1197,47.80,1883.24,111.25,1883.240000,-0.692518
145,注射刘新亚,1,2025-12-30,0,0,2.82,98.66,5.40,98.660000,-0.748369
146,Dr蓝剑雄,1,2025-12-30,0,0,0.90,11.61,0.92,11.610000,-0.864507
147,眼科医生陈慧,1,2025-12-30,0,0,1.98,250.14,6.06,250.140000,-0.837621
148,肿瘤科郭秋均医生,1,2025-12-30,100,1464,42.73,929.38,78.33,929.380000,-0.657858
150,药师李宁,1,2025-12-30,1,68,13.42,203.69,16.00,203.690000,-0.629029
151,皮肤科赵鹏,1,2025-12-30,100,1752,4.77,352.77,13.92,352.770000,-0.859280
152,皮肤科医生郑占才,1,2025-12-30,50,488,4.99,217.91,9.22,217.910000,-0.654164
153,曹凤娇中医,1,2025-12-30,1,27,0.00,41.15,0.08,41.150000,-0.990408
154,郝国君中医,1,2025-12-30,1,74,1.00,74.57,1.43,74.570000,-0.877358
155,成金枝中医,1,2025-12-30,15,260,6.16,209.53,11.33,209.530000,-0.819385
156,许娜中医,1,2025-12-30,50,578,0.70,30.58,1.05,30.580000,-0.899135
157,刘冬琴中医,1,2025-12-30,1,67,0.24,14.73,0.31,14.730000,-0.950794
158,刘叔勤中医,1,2025-12-30,15,197,1.66,9.03,2.45,9.030000,0.580645
159,专治静脉曲张的刘洪医生,1,2025-12-30,0,0,0.00,10.97,0.00,10.970000,-1.000000
172,李亮亮中医,1,2025-12-30,1,33,0.68,31.19,1.60,31.190000,-0.802469
173,赵剑锋医生,1,2025-12-30,50,418,3.42,182.77,9.15,182.770000,-0.783021
174,雪民医生,1,2025-12-30,1,65,4.72,44.73,5.04,44.730000,-0.372354
175,静脉曲张的杀手医生,1,2025-12-30,0,0,0.18,28.26,0.25,28.260000,-0.883178
176,武娜中医,1,2025-12-30,1,28,6.88,304.49,11.92,304.490000,-0.886875
177,好孕闺蜜王珂,1,2025-12-30,0,0,63.18,1259.79,74.98,1259.790000,-0.597919
179,风湿免疫专家李小峰,1,2025-12-30,100,1205,2.48,671.22,24.51,671.220000,-0.869468
180,尹海琴医生,1,2025-12-30,1,299,59.84,139.57,68.60,139.570000,2.217636
181,针灸科高小勇医生,1,2025-12-30,15,460,12.87,497.85,22.19,497.850000,-0.786429
182,师强华中医,1,2025-12-30,0,0,0.00,0.00,0.00,0.000000,0.000000
183,杜晋芳中医,1,2025-12-30,99,1024,3.99,283.58,10.84,283.580000,-0.888293
185,郭俊恒中医,1,2025-12-30,100,637,1.39,93.10,4.37,93.100000,-0.799541
186,董强中医,1,2025-12-30,1,29,0.00,3.24,0.08,3.240000,-0.428571
187,李亚娟中医,1,2025-12-30,1,29,0.09,5.46,0.16,5.460000,-0.900621
188,苗辉医生,1,2025-12-30,0,0,24.61,1436.55,87.87,1436.550000,-0.675625
189,耳鼻喉医生夏昆峰,1,2025-12-30,0,0,0.14,19.03,2.62,19.030000,0.048000
190,中医苏晨,1,2025-12-30,1,108,0.14,172.15,6.13,172.150000,-0.957229
191,智璇医生,1,2025-12-30,1,77,0.44,145.15,7.85,145.150000,-0.795839
246,石鹤医生,1,2025-12-30,1,117,0.02,415.67,1.09,415.670000,-0.994206
247,梁丽君中医,1,2025-12-30,1,118,0.08,36.78,0.84,36.780000,-0.946015
248,崔丽荣中医,1,2025-12-30,1,182,2.06,24.80,2.58,24.800000,-0.609091
249,张承红中医,1,2025-12-30,50,482,0.64,15.76,0.73,15.760000,-0.730627
253,中医郑伟,1,2025-12-30,1,29,0.00,0.12,0.00,0.120000,0.000000
254,感染科郭金存医生,1,2025-12-30,0,0,3.33,134.21,5.86,134.210000,-0.833475
255,贾素芬中医,1,2025-12-30,100,640,0.00,2.42,0.00,2.420000,-1.000000
256,张立净,1,2025-12-30,1,17,0.10,7.29,0.10,7.290000,-0.980431
257,皮肤科李英医生,1,2025-12-30,1,206,0.00,64.14,12.94,64.140000,-0.237028
364,针灸科冀占岭大夫,1,2025-12-30,15,86,0.00,11.75,0.00,11.750000,-1.000000
366,脊柱微创易端医生,1,2025-12-30,50,456,8.70,66.16,8.70,66.160000,-0.004577
368,苗晋玲医生,1,2025-12-30,100,820,0.00,0.00,0.00,0.000000,-1.000000
369,跟着车主任学中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
369,跟着车主任学中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
369,跟着车主任学中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
369,跟着车主任学中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
370,郭主任讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
370,郭主任讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
370,郭主任讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
370,郭主任讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
371,洪一针讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
371,洪一针讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
371,洪一针讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
371,洪一针讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
372,李医生聊健康,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
372,李医生聊健康,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
372,李医生聊健康,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
372,李医生聊健康,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
373,医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
373,刘刚医生说,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
373,刘刚医生说,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
373,刘刚医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
374,小丽讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
374,小丽讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
374,小丽讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
374,小丽讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
375,西北中医张宝庆,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
375,西北中医张宝庆,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
375,西北中医张宝庆,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
375,西北中医张宝庆,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
376,胡锋医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
376,胡锋医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
376,胡锋医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
376,胡锋医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
377,神经内科巴医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
377,神经内科巴医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
377,神经内科巴医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
377,神经内科巴医生,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
378,曾国禄讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
378,曾国禄讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
378,曾国禄讲中医,1,2025-12-31,0,0,0.00,0.00,0.00,0.000000,0.000000
378,曾国禄讲中医,1,2025-12-31,0,0,0.00,48.56,0.00,0.000000,0.000000
379,泌尿男科陈医生,1,2025-12-30,1,54,0.00,35.54,0.79,35.540000,-0.875591
380,肇庆中医何大夫,1,2025-12-30,15,110,0.84,122.97,5.42,122.970000,-0.802045
381,李伟枫医生,1,2025-12-30,15,142,0.02,9.38,2.42,9.380000,0.152381
382,刘医生讲中医,1,2025-12-30,15,141,0.03,115.38,1.90,115.380000,-0.866479
383,卢医生讲健康,1,2025-12-30,1,16,0.00,17.87,0.23,17.870000,-0.877660
384,阮志华讲健康,1,2025-12-30,1,16,0.00,14.14,0.54,14.140000,-0.953886
385,沈理医生,1,2025-12-30,50,310,0.13,65.73,1.87,65.730000,-0.789651
386,中医治肾病周厘,1,2025-12-30,1,18,0.00,1.50,0.06,1.500000,-0.923077
387,中医妇产科安向荣,1,2025-12-30,15,142,0.06,25.08,3.39,25.080000,0.059375
388,院博医生,1,2025-12-30,1,16,0.00,0.52,0.03,0.520000,-0.888889
389,中医盛刚,1,2025-12-30,14,113,0.04,37.14,1.01,37.140000,-0.941585
390,中医王雷,1,2025-12-30,1,10,0.00,1.70,0.11,1.700000,-0.877778
391,泌尿科邱医生,1,2025-12-30,1,15,0.00,87.82,2.93,87.820000,-0.825906
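For reference, the `revenue_wow_growth_rate` and `revenue_mom_growth_rate` columns above follow the usual (current − previous) / previous definition. A sketch, assuming the pipeline reports 0.0 when the base period's revenue is zero (consistent with the 0.000000 entries, though that behavior is inferred from the data, not documented):

```python
def growth_rate(current, previous):
    """Week-over-week / month-over-month growth: (current - previous) / previous.

    Falls back to 0.0 when the base period is zero, matching the
    0.000000 values seen in the exported CSV (an assumption).
    """
    if previous == 0:
        return 0.0
    return round((current - previous) / previous, 6)

print(growth_rate(18.76, 14.65))  # → 0.280546
print(growth_rate(120.0, 150.0))  # → -0.2
print(growth_rate(5.0, 0.0))      # → 0.0
```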
82 388 374 院博医生 小丽讲中医 1 2025-12-23 2025-12-31 1 0 13 0 0.06 0.00 0.32 0.00 0.04 0.00 0.000000 0.000000
83 389 374 中医盛刚 小丽讲中医 1 2025-12-23 2025-12-31 16 0 23 0 10.56 0.00 31.62 0.00 2.22 0.00 0.000000 2.561047 0.000000
84 390 375 中医王雷 西北中医张宝庆 1 2025-12-23 2025-12-31 1 0 6 0 0.12 0.00 0.92 0.00 0.11 0.00 0.000000 0.000000
85 391 375 泌尿科邱医生 西北中医张宝庆 1 2025-12-23 2025-12-31 1 0 10 0 2.50 0.00 73.22 0.00 2.66 0.00 0.000000 -0.150463 0.000000
86 375 西北中医张宝庆 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
87 375 西北中医张宝庆 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
88 376 胡锋医生 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
89 376 胡锋医生 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
90 376 胡锋医生 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
91 376 胡锋医生 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
92 377 神经内科巴医生 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
93 377 神经内科巴医生 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
94 377 神经内科巴医生 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
95 377 神经内科巴医生 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
96 378 曾国禄讲中医 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
97 378 曾国禄讲中医 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
98 378 曾国禄讲中医 1 2025-12-31 0 0 0.00 0.00 0.00 0.000000 0.000000
99 378 曾国禄讲中医 1 2025-12-31 0 0 0.00 48.56 0.00 0.000000 0.000000
100 379 泌尿男科陈医生 1 2025-12-30 1 54 0.00 35.54 0.79 35.540000 -0.875591
101 380 肇庆中医何大夫 1 2025-12-30 15 110 0.84 122.97 5.42 122.970000 -0.802045
102 381 李伟枫医生 1 2025-12-30 15 142 0.02 9.38 2.42 9.380000 0.152381
103 382 刘医生讲中医 1 2025-12-30 15 141 0.03 115.38 1.90 115.380000 -0.866479
104 383 卢医生讲健康 1 2025-12-30 1 16 0.00 17.87 0.23 17.870000 -0.877660
105 384 阮志华讲健康 1 2025-12-30 1 16 0.00 14.14 0.54 14.140000 -0.953886
106 385 沈理医生 1 2025-12-30 50 310 0.13 65.73 1.87 65.730000 -0.789651
107 386 中医治肾病周厘 1 2025-12-30 1 18 0.00 1.50 0.06 1.500000 -0.923077
108 387 中医妇产科安向荣 1 2025-12-30 15 142 0.06 25.08 3.39 25.080000 0.059375
109 388 院博医生 1 2025-12-30 1 16 0.00 0.52 0.03 0.520000 -0.888889
110 389 中医盛刚 1 2025-12-30 14 113 0.04 37.14 1.01 37.140000 -0.941585
111 390 中医王雷 1 2025-12-30 1 10 0.00 1.70 0.11 1.700000 -0.877778
112 391 泌尿科邱医生 1 2025-12-30 1 15 0.00 87.82 2.93 87.820000 -0.825906

548
batch_import_history.py Normal file

@@ -0,0 +1,548 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
批量历史数据导入脚本
功能:
1. 按日期范围循环抓取百家号数据
2. 每次抓取后自动导出CSV
3. 自动导入数据库
4. 记录执行日志和错误信息
5. 自动重试机制(针对网络、代理等临时性错误)
使用方法:
# 基本用法
python batch_import_history.py --start 2025-12-01 --end 2025-12-25
# 跳过失败的日期继续执行
python batch_import_history.py --start 2025-12-01 --end 2025-12-25 --skip-failed
# 自定义重试次数(默认3次)
python batch_import_history.py --start 2025-12-01 --end 2025-12-25 --max-retries 5
# 组合使用
python batch_import_history.py --start 2025-12-01 --end 2025-12-25 --skip-failed --max-retries 5
"""
import sys
import os
import subprocess
import argparse
from datetime import datetime, timedelta
from typing import List, Tuple, Optional
import json
import time
# 设置UTF-8编码
if sys.platform == 'win32':
import io
if not isinstance(sys.stdout, io.TextIOWrapper) or sys.stdout.encoding != 'utf-8':
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
if not isinstance(sys.stderr, io.TextIOWrapper) or sys.stderr.encoding != 'utf-8':
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')
class BatchImporter:
"""批量历史数据导入器"""
def __init__(self, start_date: str, end_date: str, skip_failed: bool = False, max_retries: int = 3):
"""初始化
Args:
start_date: 开始日期 (YYYY-MM-DD)
end_date: 结束日期 (YYYY-MM-DD)
skip_failed: 是否跳过失败的日期继续执行
max_retries: 每个步骤的最大重试次数(默认3)
"""
self.script_dir = os.path.dirname(os.path.abspath(__file__))
self.start_date = datetime.strptime(start_date, '%Y-%m-%d')
self.end_date = datetime.strptime(end_date, '%Y-%m-%d')
self.skip_failed = skip_failed
self.max_retries = max_retries
# 脚本路径
self.analytics_script = os.path.join(self.script_dir, 'bjh_analytics_date.py')
self.export_script = os.path.join(self.script_dir, 'export_to_csv.py')
self.import_script = os.path.join(self.script_dir, 'import_csv_to_database.py')
# 日志文件
self.log_dir = os.path.join(self.script_dir, 'logs')
if not os.path.exists(self.log_dir):
os.makedirs(self.log_dir)
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
self.log_file = os.path.join(self.log_dir, f'batch_import_{timestamp}.log')
# 执行结果记录
self.results = []
# 验证脚本文件存在
self._validate_scripts()
def _validate_scripts(self):
"""验证所需脚本文件是否存在"""
scripts = {
'bjh_analytics_date.py': self.analytics_script,
'export_to_csv.py': self.export_script,
'import_csv_to_database.py': self.import_script
}
missing_scripts = []
for name, path in scripts.items():
if not os.path.exists(path):
missing_scripts.append(name)
if missing_scripts:
print(f"[X] 缺少必要的脚本文件:")
for script in missing_scripts:
print(f" - {script}")
raise FileNotFoundError("脚本文件缺失")
def log(self, message: str, level: str = 'INFO'):
"""记录日志
Args:
message: 日志消息
level: 日志级别 (INFO, WARNING, ERROR)
"""
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
log_line = f"[{timestamp}] [{level}] {message}"
# 输出到控制台
print(log_line)
# 写入日志文件
try:
with open(self.log_file, 'a', encoding='utf-8') as f:
f.write(log_line + '\n')
except Exception as e:
print(f"[!] 写入日志文件失败: {e}")
def get_date_list(self) -> List[str]:
"""生成日期列表
Returns:
日期字符串列表 (YYYY-MM-DD)
"""
dates = []
current = self.start_date
while current <= self.end_date:
dates.append(current.strftime('%Y-%m-%d'))
current += timedelta(days=1)
return dates
def run_command_with_retry(self, cmd: List[str], step_name: str, max_retries: Optional[int] = None) -> Tuple[bool, str]:
"""执行命令(带重试机制)
Args:
cmd: 命令列表
step_name: 步骤名称
max_retries: 最大重试次数,默认使用实例配置
Returns:
(是否成功, 错误信息)
"""
if max_retries is None:
max_retries = self.max_retries
retry_count = 0
last_error = ""
while retry_count <= max_retries:
if retry_count > 0:
# 重试前等待递增延迟5秒、10秒、15秒
wait_time = retry_count * 5
self.log(f"{step_name}{retry_count}次重试,等待 {wait_time} 秒...", level='WARNING')
time.sleep(wait_time)
# 执行命令
success, error = self.run_command(cmd, step_name)
if success:
if retry_count > 0:
self.log(f"{step_name} 重试成功!(第{retry_count}次重试)", level='INFO')
return True, ""
# 失败,记录错误
last_error = error
retry_count += 1
# 判断是否需要重试
if retry_count <= max_retries:
# 可重试的错误类型
retryable_errors = [
'超时',
'timeout',
'连接',
'connection',
'代理',
'proxy',
'网络',
'network',
'RemoteDisconnected',
'ConnectionError',
'ProxyError'
]
# 检查错误信息是否包含可重试的关键词
is_retryable = any(keyword.lower() in str(error).lower() for keyword in retryable_errors)
if is_retryable:
self.log(f"{step_name} 出现可重试错误: {error}", level='WARNING')
else:
# 不可重试的错误,直接失败
self.log(f"{step_name} 出现不可重试错误,停止重试: {error}", level='ERROR')
return False, error
# 所有重试失败
self.log(f"{step_name} 失败,已达最大重试次数 ({max_retries})", level='ERROR')
return False, last_error
def run_command(self, cmd: List[str], step_name: str) -> Tuple[bool, str]:
"""执行命令
Args:
cmd: 命令列表
step_name: 步骤名称
Returns:
(是否成功, 错误信息)
"""
process = None
try:
self.log(f"执行命令: {' '.join(cmd)}")
# 使用subprocess运行命令实时输出
process = subprocess.Popen(
cmd,
cwd=self.script_dir,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT, # 合并stderr到stdout
text=True,
encoding='utf-8',
bufsize=1 # 行缓冲text=True 已隐含 universal_newlines
)
# 实时读取输出
output_lines = []
if process.stdout:
try:
for line in process.stdout:
line = line.rstrip()
if line: # 只输出非空行
print(f" {line}") # 实时输出到控制台
output_lines.append(line)
# 每10行记录一次日志减少日志文件大小
if len(output_lines) % 10 == 0:
self.log(f"{step_name} 运行中... (已输出{len(output_lines)}行)")
except Exception as e:
self.log(f"读取输出异常: {e}", level='WARNING')
# 等待进程结束
return_code = process.wait(timeout=600) # 10分钟超时
# 记录完整输出
full_output = '\n'.join(output_lines)
if full_output:
self.log(f"{step_name} 输出:\n{full_output}")
# 检查返回码
if return_code == 0:
self.log(f"[✓] {step_name} 执行成功", level='INFO')
return True, ""
else:
error_msg = f"返回码: {return_code}"
self.log(f"[X] {step_name} 执行失败: {error_msg}", level='ERROR')
return False, error_msg
except subprocess.TimeoutExpired:
if process:
process.kill()
error_msg = "命令执行超时(>10分钟)"
self.log(f"[X] {step_name} 失败: {error_msg}", level='ERROR')
return False, error_msg
except Exception as e:
error_msg = str(e)
self.log(f"[X] {step_name} 异常: {error_msg}", level='ERROR')
import traceback
self.log(f"异常堆栈:\n{traceback.format_exc()}", level='ERROR')
return False, error_msg
def process_date(self, date_str: str) -> bool:
"""处理单个日期的数据
Args:
date_str: 日期字符串 (YYYY-MM-DD)
Returns:
是否成功
"""
self.log("="*70)
self.log(f"开始处理日期: {date_str}")
self.log("="*70)
result = {
'date': date_str,
'start_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'steps': {},
'success': False,
'error': None
}
# 步骤1: 数据抓取(带重试)
self.log(f"\n[步骤 1/3] 抓取 {date_str} 的数据...")
cmd_analytics = [
sys.executable,
self.analytics_script,
date_str,
'--proxy',
'--database',
'--no-confirm' # 跳过确认提示
]
success, error = self.run_command_with_retry(cmd_analytics, f"数据抓取 ({date_str})")
result['steps']['analytics'] = {'success': success, 'error': error}
if not success:
result['error'] = f"数据抓取失败: {error}"
result['end_time'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
self.results.append(result)
return False
# 等待2秒确保文件写入完成
time.sleep(2)
# 步骤2: 导出CSV带重试
self.log(f"\n[步骤 2/3] 导出CSV文件...")
cmd_export = [
sys.executable,
self.export_script,
'--mode', 'csv',
'--no-confirm' # 跳过确认提示
]
success, error = self.run_command_with_retry(cmd_export, f"CSV导出 ({date_str})")
result['steps']['export'] = {'success': success, 'error': error}
if not success:
result['error'] = f"CSV导出失败: {error}"
result['end_time'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
self.results.append(result)
return False
# 等待2秒
time.sleep(2)
# 步骤3: 导入数据库(带重试)
self.log(f"\n[步骤 3/3] 导入数据库...")
cmd_import = [
sys.executable,
self.import_script
]
success, error = self.run_command_with_retry(cmd_import, f"数据库导入 ({date_str})")
result['steps']['import'] = {'success': success, 'error': error}
if not success:
result['error'] = f"数据库导入失败: {error}"
result['end_time'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
self.results.append(result)
return False
# 全部成功
result['success'] = True
result['end_time'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
self.results.append(result)
self.log(f"\n[✓] {date_str} 处理完成!")
self.log("="*70 + "\n")
return True
def run(self):
"""执行批量导入"""
dates = self.get_date_list()
print("\n" + "="*70)
print("批量历史数据导入")
print("="*70)
print(f"开始日期: {self.start_date.strftime('%Y-%m-%d')}")
print(f"结束日期: {self.end_date.strftime('%Y-%m-%d')}")
print(f"总天数: {len(dates)}")
print(f"跳过失败: {'' if self.skip_failed else ''}")
print(f"最大重试次数: {self.max_retries}")
print(f"日志文件: {self.log_file}")
print("="*70)
# 确认执行
confirm = input("\n是否开始执行? (y/n): ").strip().lower()
if confirm != 'y':
print("已取消")
return
self.log(f"开始批量导入: {len(dates)} 个日期")
start_time = datetime.now()
success_count = 0
failed_count = 0
for idx, date_str in enumerate(dates, 1):
print(f"\n{'='*70}")
print(f"进度: [{idx}/{len(dates)}] {date_str}")
print(f"{'='*70}")
success = self.process_date(date_str)
if success:
success_count += 1
else:
failed_count += 1
# 如果不跳过失败,则停止执行
if not self.skip_failed:
self.log(f"[X] 日期 {date_str} 处理失败,停止执行", level='ERROR')
break
else:
self.log(f"[!] 日期 {date_str} 处理失败,跳过继续", level='WARNING')
# 日期间延迟(避免请求过快)
if idx < len(dates):
delay = 5
self.log(f"等待 {delay} 秒后处理下一个日期...")
time.sleep(delay)
# 执行完成
end_time = datetime.now()
duration = end_time - start_time
print("\n" + "="*70)
print("批量导入完成")
print("="*70)
print(f"总耗时: {duration}")
print(f"成功: {success_count}")
print(f"失败: {failed_count}")
print(f"日志文件: {self.log_file}")
print("="*70)
self.log("="*70)
self.log(f"批量导入完成: 成功 {success_count} 天, 失败 {failed_count}")
self.log(f"总耗时: {duration}")
self.log("="*70)
# 保存执行结果
self._save_results()
# 显示失败的日期
if failed_count > 0:
print("\n失败的日期:")
for r in self.results:
if not r['success']:
print(f" - {r['date']}: {r.get('error', '未知错误')}")
def _save_results(self):
"""保存执行结果到JSON文件"""
try:
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
result_file = os.path.join(self.log_dir, f'batch_result_{timestamp}.json')
summary = {
'start_date': self.start_date.strftime('%Y-%m-%d'),
'end_date': self.end_date.strftime('%Y-%m-%d'),
'total_dates': len(self.results),
'success_count': sum(1 for r in self.results if r['success']),
'failed_count': sum(1 for r in self.results if not r['success']),
'results': self.results
}
with open(result_file, 'w', encoding='utf-8') as f:
json.dump(summary, f, ensure_ascii=False, indent=2)
self.log(f"执行结果已保存: {result_file}")
except Exception as e:
self.log(f"保存执行结果失败: {e}", level='ERROR')
def main():
"""主函数"""
parser = argparse.ArgumentParser(
description='批量历史数据导入脚本',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
示例用法:
python batch_import_history.py --start 2025-12-01 --end 2025-12-25
python batch_import_history.py --start 2025-12-01 --end 2025-12-25 --skip-failed
"""
)
parser.add_argument(
'--start',
type=str,
required=True,
help='开始日期 (格式: YYYY-MM-DD)'
)
parser.add_argument(
'--end',
type=str,
required=True,
help='结束日期 (格式: YYYY-MM-DD)'
)
parser.add_argument(
'--skip-failed',
action='store_true',
help='跳过失败的日期继续执行(默认:遇到失败停止)'
)
parser.add_argument(
'--max-retries',
type=int,
default=3,
help='每个步骤的最大重试次数(默认3)'
)
args = parser.parse_args()
# 验证日期格式
try:
start = datetime.strptime(args.start, '%Y-%m-%d')
end = datetime.strptime(args.end, '%Y-%m-%d')
if start > end:
print("[X] 开始日期不能晚于结束日期")
return 1
except ValueError as e:
print(f"[X] 日期格式错误: {e}")
print(" 正确格式: YYYY-MM-DD (例如: 2025-12-01)")
return 1
try:
# 创建导入器
importer = BatchImporter(
start_date=args.start,
end_date=args.end,
skip_failed=args.skip_failed,
max_retries=args.max_retries
)
# 执行批量导入
importer.run()
return 0
except Exception as e:
print(f"\n[X] 程序执行出错: {e}")
import traceback
traceback.print_exc()
return 1
if __name__ == '__main__':
sys.exit(main())
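The keyword match in `run_command_with_retry` classifies an error as retryable by substring search against a mixed-case keyword list while the error text is lowercased, so both sides need to be lowercased for names like `ProxyError` to match. A minimal standalone sketch of that classification (names here are illustrative, not taken from the patch):

```python
# Standalone sketch of the retryable-error classification used by
# run_command_with_retry. Keywords and error text are both lowercased
# so mixed-case names such as 'ProxyError' still match.
RETRYABLE_KEYWORDS = [
    'timeout', 'connection', 'proxy', 'network',
    'remotedisconnected', 'connectionerror', 'proxyerror',
]

def is_retryable(error) -> bool:
    """Return True if the error message contains a retryable keyword."""
    text = str(error).lower()
    return any(keyword in text for keyword in RETRYABLE_KEYWORDS)

print(is_retryable('requests.exceptions.ProxyError: tunnel failed'))  # True
print(is_retryable('KeyError: missing field'))                        # False
```

Errors that match nothing (argument errors, data errors) fail fast instead of consuming the retry budget.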


@@ -29,12 +29,12 @@ from database_config import DatabaseManager, DB_CONFIG
# 代理配置 - 大麦代理IP
PROXY_API_URL = (
'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=e054861d08471263d970bde4f4905181&osn=TC_NO176655872088456223&tiqu=1'
'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=2912cb2b22d3b7ae724f045012790479&osn=TC_NO176707424165606223&tiqu=1'
)
# 大麦代理账号密码认证
PROXY_USERNAME = '694b8c3172af7'
PROXY_PASSWORD = 'q8yA8x1dwCpdyIK'
PROXY_USERNAME = '69538fdef04e1'
PROXY_PASSWORD = '63v0kQBr2yJXnjf'
# 备用固定代理IP池格式'IP:端口', '用户名', '密码'
BACKUP_PROXY_POOL = [
@@ -62,7 +62,8 @@ class BaijiahaoAnalytics:
# 代理配置
self.use_proxy = use_proxy
self.current_proxy = None
self.current_proxy = None # 当前IP使用完后/失败后才重新获取
self.proxy_fail_count = 0 # 当前代理失败次数
# 数据库配置
self.load_from_db = load_from_db
@@ -76,6 +77,8 @@ class BaijiahaoAnalytics:
if self.use_proxy:
self.logger.info("已启用代理模式")
print("[配置] 已启用代理模式")
# 初始化时获取第一个代理
self.fetch_proxy(force_new=True)
if self.load_from_db:
self.logger.info("已启用数据库加载模式")
@@ -99,6 +102,12 @@ class BaijiahaoAnalytics:
self.analytics_output = os.path.join(self.script_dir, "bjh_analytics_data.json")
self.income_output = os.path.join(self.script_dir, "bjh_income_data_v2.json")
# 创建备份文件夹
self.backup_dir = os.path.join(self.script_dir, "backup")
if not os.path.exists(self.backup_dir):
os.makedirs(self.backup_dir)
print(f"[OK] 创建备份文件夹: {self.backup_dir}")
def cookie_string_to_dict(self, cookie_string: str) -> Dict:
"""将Cookie字符串转换为字典格式
@@ -230,15 +239,23 @@ class BaijiahaoAnalytics:
print(f"[OK] 已设置账号 {account_id} 的Cookie ({len(cookies)} 个字段)")
return True
def fetch_proxy(self) -> Optional[Dict]:
def fetch_proxy(self, force_new: bool = False) -> Optional[Dict]:
"""从代理服务获取一个可用代理,失败时使用备用固定代理
Args:
force_new: 是否强制获取新代理默认False优先使用当前IP
Returns:
代理配置字典,格式: {'http': 'http://...', 'https': 'http://...'}
"""
if not self.use_proxy:
return None
# 如果已有可用代理且不强制获取新代理,直接返回
if self.current_proxy and not force_new:
return self.current_proxy
# 获取新代理
try:
# 使用大麦代理API获取IP
resp = requests.get(PROXY_API_URL, timeout=10)
@@ -247,21 +264,30 @@ class BaijiahaoAnalytics:
# 首先尝试解析为纯文本格式(最常见)
text = resp.text.strip()
# 检测是否返回错误信息
if text.upper().startswith('ERROR'):
raise Exception(f"代理API返回错误: {text}")
# 尝试直接解析为IP:PORT格式
lines = text.split('\n')
for line in lines:
line = line.strip()
if ':' in line and not line.startswith('{') and not line.startswith('['):
# 找到第一个IP:PORT格式
ip_port = line.split()[0] if ' ' in line else line # 处理可能带有其他信息的情况
ip_port = line.split()[0] if ' ' in line else line
if ip_port.count(':') == 1: # 确保是IP:PORT格式
nowtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
self.logger.info(f'提取大麦代理IP(文本): {ip_port} at {nowtime}')
print(f'[代理] 提取大麦IP: {ip_port}')
# 大麦代理使用账号密码认证
host, port = ip_port.split(':', 1)
if PROXY_USERNAME and PROXY_PASSWORD:
proxy_url = f'http://{PROXY_USERNAME}:{PROXY_PASSWORD}@{host}:{port}'
else:
proxy_url = f'http://{host}:{port}'
self.current_proxy = {
'http': proxy_url,
'https': proxy_url,
@@ -282,8 +308,12 @@ class BaijiahaoAnalytics:
self.logger.info(f'提取大麦代理IP(JSON): {ip_port} at {nowtime}')
print(f'[代理] 提取大麦IP: {ip_port}')
# 构建带账密的代理URL: http://username:password@host:port
# 大麦代理使用账号密码认证
if PROXY_USERNAME and PROXY_PASSWORD:
proxy_url = f'http://{PROXY_USERNAME}:{PROXY_PASSWORD}@{ip_info["ip"]}:{ip_info["port"]}'
else:
proxy_url = f'http://{ip_info["ip"]}:{ip_info["port"]}'
self.current_proxy = {
'http': proxy_url,
'https': proxy_url,
@@ -316,6 +346,34 @@ class BaijiahaoAnalytics:
}
return self.current_proxy
def mark_proxy_failed(self):
"""标记当前代理失败,失败超过3次后重新获取代理
Returns:
bool: 是否需要重新获取代理
"""
if not self.use_proxy or not self.current_proxy:
return False
self.proxy_fail_count += 1
self.logger.warning(f"当前代理失败次数: {self.proxy_fail_count}")
# 失败超过3次重新获取代理
if self.proxy_fail_count >= 3:
self.logger.info("当前代理失败次数过多,重新获取新代理")
print(f"[代理] 失败{self.proxy_fail_count}次,重新获取新代理")
self.current_proxy = None
self.proxy_fail_count = 0
# 强制获取新代理
self.fetch_proxy(force_new=True)
return True
return False
def reset_proxy_fail_count(self):
"""重置代理失败计数(请求成功后调用)"""
self.proxy_fail_count = 0
def get_common_headers(self) -> Dict:
"""获取通用请求头"""
return {
@@ -425,6 +483,8 @@ class BaijiahaoAnalytics:
successful_data = []
retry_count = 0
proxy_change_count = 0 # 代理更换次数计数器
max_proxy_changes = 3 # 最多更换3次代理(即最多使用4个不同代理)
while retry_count <= max_retries:
try:
@@ -438,6 +498,21 @@ class BaijiahaoAnalytics:
# 获取代理(如果启用)
proxies = self.fetch_proxy() if self.use_proxy else None
# 调试信息:显示代理使用情况
if self.use_proxy:
if proxies:
proxy_url = proxies.get('http', '')
if '@' in proxy_url:
# 提取IP部分隐藏账号密码
proxy_ip = proxy_url.split('@')[1]
else:
proxy_ip = proxy_url.replace('http://', '').replace('https://', '')
self.logger.info(f"发文统计API 使用代理: {proxy_ip}")
print(f" [代理] 使用IP: {proxy_ip}")
else:
self.logger.warning(f"发文统计API 代理未生效,use_proxy={self.use_proxy}")
print(f" [!] 警告:代理未生效,use_proxy={self.use_proxy}")
response = self.session.get(
api_url,
headers=headers,
@@ -462,6 +537,9 @@ class BaijiahaoAnalytics:
self.logger.info("发文统计API调用成功")
print(f" [✓] API调用成功")
# 请求成功,重置代理失败计数
self.reset_proxy_fail_count()
# 提取发文统计数据
total_info = data.get('data', {}).get('total_info', {})
@@ -490,6 +568,34 @@ class BaijiahaoAnalytics:
else:
self.logger.error(f"API返回错误: errno={errno}, errmsg={errmsg}")
print(f" [X] API返回错误: errno={errno}, errmsg={errmsg}")
# 特别处理 errno=10000015 (异常请求),这通常是代理未生效
if errno == 10000015 and self.use_proxy:
self.logger.warning("检测到 errno=10000015(异常请求),代理未生效,立即强制更换新代理")
print(f" [!] 检测到代理未生效,立即更换新代理")
# 检查是否超过代理更换上限
if proxy_change_count >= max_proxy_changes:
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
break
# 立即强制获取新代理不等待3次
self.current_proxy = None
self.proxy_fail_count = 0
new_proxy = self.fetch_proxy(force_new=True)
if new_proxy:
# 如果还没达到重试上限,尝试重试
if retry_count < max_retries:
proxy_change_count += 1
self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试,当前第{retry_count+1}次")
print(f" [!] 已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试...")
retry_count += 1
continue
else:
self.logger.error("无法获取新代理,放弃重试")
print(f" [X] 无法获取新代理")
break # API错误不重试
except json.JSONDecodeError as e:
@@ -521,6 +627,58 @@ class BaijiahaoAnalytics:
if retry_count < max_retries:
self.logger.warning(f"发文统计API代理连接错误: {error_type},将重试")
print(f" [!] 代理连接错误: {error_type}")
# 标记代理失败
self.mark_proxy_failed()
# 超时或连接错误立即更换代理不等待3次失败
if self.use_proxy and ('Timeout' in error_type or 'Connection' in error_type or 'ProxyError' in error_type):
# 检查是否超过代理更换上限
if proxy_change_count >= max_proxy_changes:
self.logger.error(f"已达代理更换上限({max_proxy_changes}次),放弃重试")
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
break
self.logger.warning(f"检测到{error_type}错误,立即更换新代理")
print(f" [!] 检测到{error_type},立即更换新代理")
self.current_proxy = None
self.proxy_fail_count = 0
new_proxy = self.fetch_proxy(force_new=True)
if new_proxy:
proxy_change_count += 1
self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
# 更换代理后不增加retry_count直接continue重试
continue
else:
self.logger.error("无法获取新代理,放弃重试")
print(f" [X] 无法获取新代理")
break
# 其他代理错误等待3次失败后更换
elif self.proxy_fail_count >= 3 and self.use_proxy:
# 检查是否超过代理更换上限
if proxy_change_count >= max_proxy_changes:
self.logger.error(f"已达代理更换上限({max_proxy_changes}次),放弃重试")
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
break
print(f" [!] 代理已失败{self.proxy_fail_count}次,强制更换新代理")
self.current_proxy = None
self.proxy_fail_count = 0
new_proxy = self.fetch_proxy(force_new=True)
if new_proxy:
proxy_change_count += 1
self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
# 更换代理后不增加retry_count直接continue重试
continue
else:
self.logger.error("无法获取新代理")
print(f" [X] 无法获取新代理")
break
# 其他情况才增加retry_count
retry_count += 1
continue
else:
@@ -701,6 +859,8 @@ class BaijiahaoAnalytics:
print(f" API: {api_url}")
retry_count = 0
proxy_change_count = 0 # 代理更换次数计数器
max_proxy_changes = 3 # 最多更换3次代理(即最多使用4个不同代理)
while retry_count <= max_retries:
try:
@@ -714,6 +874,21 @@ class BaijiahaoAnalytics:
# 获取代理(如果启用)
proxies = self.fetch_proxy() if self.use_proxy else None
# 调试信息:显示代理使用情况
if self.use_proxy:
if proxies:
proxy_url = proxies.get('http', '')
if '@' in proxy_url:
# 提取IP部分隐藏账号密码
proxy_ip = proxy_url.split('@')[1]
else:
proxy_ip = proxy_url.replace('http://', '').replace('https://', '')
self.logger.info(f"收入API 使用代理: {proxy_ip}")
print(f" [代理] 使用IP: {proxy_ip}")
else:
self.logger.warning(f"收入API 代理未生效,use_proxy={self.use_proxy}")
print(f" [!] 警告:代理未生效,use_proxy={self.use_proxy}")
response = self.session.get(
api_url,
headers=headers,
@@ -735,6 +910,9 @@ class BaijiahaoAnalytics:
self.logger.info("收入数据API调用成功")
print(f" [✓] API调用成功")
# 请求成功,重置代理失败计数
self.reset_proxy_fail_count()
# 显示收入数据摘要
income_data = data.get('data', {}).get('income', {})
if 'recent7Days' in income_data:
@@ -752,6 +930,34 @@ class BaijiahaoAnalytics:
else:
self.logger.error(f"收入API返回错误: errno={errno}, errmsg={errmsg}")
print(f" [X] API返回错误: errno={errno}, errmsg={errmsg}")
# 特别处理 errno=10000015 (异常请求),这通常是代理未生效
if errno == 10000015 and self.use_proxy:
self.logger.warning("检测到收入API errno=10000015(异常请求),代理未生效,立即强制更换新代理")
print(f" [!] 检测到代理未生效,立即更换新代理")
# 检查是否超过代理更换上限
if proxy_change_count >= max_proxy_changes:
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
return None
# 立即强制获取新代理不等待3次
self.current_proxy = None
self.proxy_fail_count = 0
new_proxy = self.fetch_proxy(force_new=True)
if new_proxy:
# 如果还没达到重试上限,尝试重试
if retry_count < max_retries:
proxy_change_count += 1
self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试收入API,当前第{retry_count+1}次")
print(f" [!] 已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试...")
retry_count += 1
continue
else:
self.logger.error("无法获取新代理,放弃重试")
print(f" [X] 无法获取新代理")
return None
except json.JSONDecodeError as e:
self.logger.error(f"收入数据JSON解析失败: {e}")
@@ -781,6 +987,58 @@ class BaijiahaoAnalytics:
if retry_count < max_retries:
self.logger.warning(f"收入数据API代理连接错误: {error_type},将重试")
print(f" [!] 代理连接错误: {error_type}")
# 标记代理失败
self.mark_proxy_failed()
# 超时或连接错误立即更换代理不等待3次失败
if self.use_proxy and ('Timeout' in error_type or 'Connection' in error_type or 'ProxyError' in error_type):
# 检查是否超过代理更换上限
if proxy_change_count >= max_proxy_changes:
self.logger.error(f"已达代理更换上限({max_proxy_changes}次),放弃重试")
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
return None
self.logger.warning(f"检测到{error_type}错误,立即更换新代理")
print(f" [!] 检测到{error_type},立即更换新代理")
self.current_proxy = None
self.proxy_fail_count = 0
new_proxy = self.fetch_proxy(force_new=True)
if new_proxy:
proxy_change_count += 1
self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
# 更换代理后不增加retry_count直接continue重试
continue
else:
self.logger.error("无法获取新代理,放弃重试")
print(f" [X] 无法获取新代理")
return None
# 其他代理错误等待3次失败后更换
elif self.proxy_fail_count >= 3 and self.use_proxy:
# 检查是否超过代理更换上限
if proxy_change_count >= max_proxy_changes:
self.logger.error(f"已达代理更换上限({max_proxy_changes}次),放弃重试")
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
return None
print(f" [!] 代理已失败{self.proxy_fail_count}次,强制更换新代理")
self.current_proxy = None
self.proxy_fail_count = 0
new_proxy = self.fetch_proxy(force_new=True)
if new_proxy:
proxy_change_count += 1
self.logger.info(f"已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
# 更换代理后不增加retry_count直接continue重试
continue
else:
self.logger.error("无法获取新代理")
print(f" [X] 无法获取新代理")
return None
# 其他情况才增加retry_count
retry_count += 1
continue
else:
@@ -866,6 +1124,8 @@ class BaijiahaoAnalytics:
errno = data.get('errno', -1)
if errno == 0:
# 请求成功,重置代理失败计数
self.reset_proxy_fail_count()
return data
else:
self.logger.error(f"单日收入API返回错误: errno={errno}")
@@ -895,6 +1155,10 @@ class BaijiahaoAnalytics:
if is_proxy_error:
if retry_count < max_retries:
self.logger.warning(f"单日收入代理连接错误 ({target_date.strftime('%Y-%m-%d')}): {error_type},将重试")
# 标记代理失败
self.mark_proxy_failed()
retry_count += 1
continue
else:
@@ -1068,17 +1332,29 @@ class BaijiahaoAnalytics:
return results
def save_results(self, results: List[Dict]):
"""保存结果到文件
"""保存结果到文件(同时备份带日期的副本)
Args:
results: 数据分析结果列表
"""
import shutil
try:
# 1. 保存到主文件(不带时间戳)
with open(self.output_file, 'w', encoding='utf-8') as f:
json.dump(results, f, ensure_ascii=False, indent=2)
print(f"\n{'='*70}")
print(f"[OK] 数据已保存到: {self.output_file}")
# 2. 创建带日期的备份文件(只保留日期)
timestamp = datetime.now().strftime('%Y%m%d')
backup_filename = f"bjh_integrated_data_{timestamp}.json"
backup_file = os.path.join(self.backup_dir, backup_filename)
# 复制文件到备份目录
shutil.copy2(self.output_file, backup_file)
print(f"[OK] 备份已保存到: {backup_file}")
print(f"{'='*70}")
# 显示统计
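The retry policy added across the hunks above (the same proxy is tolerated for up to 3 failures; timeout/connection/proxy errors rotate immediately; at most 3 rotations, i.e. 4 distinct proxies in total) can be condensed into a small state holder. This is an illustrative sketch under those stated rules, not code from the patch; the class and method names are invented:

```python
# Simplified sketch of the proxy retry policy described in this commit:
# - the same proxy is retried until it has failed 3 times,
# - timeout/connection/proxy errors trigger an immediate rotation,
# - at most 3 rotations are allowed (4 distinct proxies in total).
class ProxyRotation:
    MAX_FAILS_PER_PROXY = 3
    MAX_PROXY_CHANGES = 3

    def __init__(self):
        self.fail_count = 0    # failures of the current proxy
        self.change_count = 0  # how many times we have rotated

    def on_success(self):
        """Request succeeded: forgive the current proxy's past failures."""
        self.fail_count = 0

    def on_error(self, error_type: str) -> str:
        """Return 'rotate', 'retry', or 'give_up' for one failed request."""
        immediate = any(k in error_type for k in ('Timeout', 'Connection', 'ProxyError'))
        self.fail_count += 1
        if immediate or self.fail_count >= self.MAX_FAILS_PER_PROXY:
            if self.change_count >= self.MAX_PROXY_CHANGES:
                return 'give_up'
            self.change_count += 1
            self.fail_count = 0
            return 'rotate'
        return 'retry'
```

In the real code the 'rotate' branch corresponds to clearing `current_proxy` and calling `fetch_proxy(force_new=True)`, and only afterwards deciding whether a retry slot is consumed.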

809
bjh_analytics_date.py Normal file

@@ -0,0 +1,809 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
百家号指定日期数据抓取工具
根据指定日期范围抓取发文统计和收入数据
"""
import json
import sys
import os
import argparse
from datetime import datetime, timedelta
from typing import Dict, List, Optional
# 导入基础分析器
from bjh_analytics import BaijiahaoAnalytics
# 设置标准输出编码为UTF-8
if sys.platform == 'win32':
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')
class BaijiahaoDateAnalytics(BaijiahaoAnalytics):
"""百家号指定日期数据抓取器"""
def __init__(self, target_date: str, use_proxy: bool = False, load_from_db: bool = False, db_config: Optional[Dict] = None):
"""初始化
Args:
target_date: 目标日期 (YYYY-MM-DD)
use_proxy: 是否使用代理
load_from_db: 是否从数据库加载Cookie
db_config: 数据库配置
"""
super().__init__(use_proxy=use_proxy, load_from_db=load_from_db, db_config=db_config)
# 解析目标日期
try:
self.target_date = datetime.strptime(target_date, '%Y-%m-%d')
self.target_date_str = target_date
except ValueError:
raise ValueError(f"日期格式错误: {target_date},正确格式: YYYY-MM-DD")
# 修改输出文件名(不带日期,使用固定文件名)
self.output_file = os.path.join(
self.script_dir,
"bjh_integrated_data.json"
)
# 创建备份文件夹
self.backup_dir = os.path.join(self.script_dir, "backup")
if not os.path.exists(self.backup_dir):
os.makedirs(self.backup_dir)
print(f"[配置] 目标日期: {target_date}")
print(f"[配置] 输出文件: {self.output_file}")
print(f"[配置] 备份目录: {self.backup_dir}")
def fetch_analytics_api_for_date(self, days: int = 7, max_retries: int = 3) -> Optional[Dict]:
"""获取指定日期范围的发文统计数据
Args:
days: 查询天数(从target_date往前推)
max_retries: 最大重试次数
Returns:
发文统计数据
"""
import time
# 计算日期范围从target_date往前推days天
end_date = self.target_date
start_date = end_date - timedelta(days=days-1)
start_day = start_date.strftime('%Y%m%d')
end_day = end_date.strftime('%Y%m%d')
# API端点
api_url = f"{self.base_url}/author/eco/statistics/appStatisticV3"
# 请求参数:不使用special_filter_days,直接指定日期范围
params = {
'type': 'event',
'start_day': start_day,
'end_day': end_day,
'stat': '0'
}
# 从Cookie中提取token
token_cookie = self.session.cookies.get('bjhStoken') or self.session.cookies.get('devStoken')
# 请求头
headers = {
'Accept': 'application/json, text/plain, */*',
'Accept-Language': 'zh-CN,zh;q=0.9',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Referer': f'{self.base_url}/builder/rc/analysiscontent',
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
}
if token_cookie:
headers['token'] = token_cookie
self.logger.info(f"获取发文统计: {start_date.strftime('%Y-%m-%d')}{end_date.strftime('%Y-%m-%d')}")
print(f"\n[请求] 获取发文统计数据")
print(f" 日期范围: {start_date.strftime('%Y-%m-%d')}{end_date.strftime('%Y-%m-%d')}")
successful_data = []
retry_count = 0
proxy_change_count = 0 # 代理更换次数计数器
max_proxy_changes = 3 # 最多更换3次代理(即最多使用4个不同代理)
while retry_count <= max_retries:
try:
if retry_count > 0:
wait_time = retry_count * 2
print(f" [重试 {retry_count}/{max_retries}] 等待 {wait_time} 秒...")
time.sleep(wait_time)
proxies = self.fetch_proxy() if self.use_proxy else None
# 调试信息:显示代理使用情况
if self.use_proxy:
if proxies:
proxy_url = proxies.get('http', '')
if '@' in proxy_url:
proxy_ip = proxy_url.split('@')[1]
else:
proxy_ip = proxy_url.replace('http://', '').replace('https://', '')
print(f" [代理] 使用IP: {proxy_ip}")
else:
print(f" [!] 警告:代理未生效,use_proxy={self.use_proxy}")
response = self.session.get(
api_url,
headers=headers,
params=params,
proxies=proxies,
timeout=15,
verify=False
)
print(f" 状态码: {response.status_code}")
if response.status_code == 200:
data = response.json()
errno = data.get('errno', -1)
if errno == 0:
print(f" [✓] API调用成功")
# 请求成功,重置代理失败计数
self.reset_proxy_fail_count()
# 检查data字段类型
data_field = data.get('data', {})
if isinstance(data_field, list):
print(f" [X] API返回数据格式异常: data字段为列表而非字典")
print(f" 原始响应前500字符: {str(data)[:500]}")
break
if not isinstance(data_field, dict):
print(f" [X] API返回数据格式异常: data字段类型为 {type(data_field).__name__}")
break
total_info = data_field.get('total_info', {})
print(f"\n 发文统计数据:")
print(f" 发文量: {total_info.get('publish_count', '0')}")
print(f" 曝光量: {total_info.get('disp_pv', '0')}")
print(f" 阅读量: {total_info.get('view_count', '0')}")
api_result = {
'endpoint': '/author/eco/statistics/appStatisticV3',
'name': '发文统计',
'date_range': f"{start_day} - {end_day}",
'data': data,
'fetch_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
}
successful_data.append(api_result)
break
else:
errmsg = data.get('errmsg', '')
print(f" [X] API返回错误: errno={errno}, errmsg={errmsg}")
# 特别处理 errno=10000015异常请求这通常是代理未生效
if errno == 10000015 and self.use_proxy:
print(f" [!] 检测到代理未生效,立即更换新代理")
# 检查是否超过代理更换上限
if proxy_change_count >= max_proxy_changes:
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
break
# 立即强制获取新代理
self.current_proxy = None
self.proxy_fail_count = 0
new_proxy = self.fetch_proxy(force_new=True)
if new_proxy and retry_count < max_retries:
proxy_change_count += 1
print(f" [!] 已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试...")
retry_count += 1
continue
else:
print(f" [X] 无法获取新代理或已达重试上限")
break
else:
# 其他API错误不重试
break
else:
print(f" [X] HTTP错误: {response.status_code}")
break
except Exception as e:
error_type = type(e).__name__
is_retry_error = any([
'Connection' in error_type,
'Timeout' in error_type,
'ProxyError' in error_type,
])
if is_retry_error and retry_count < max_retries:
print(f" [!] 连接错误: {error_type}")
# 标记代理失败
self.mark_proxy_failed()
# 如果代理失败次数达到3次强制更换新代理第4次重试用新代理
if self.proxy_fail_count >= 3 and self.use_proxy:
# 检查是否超过代理更换上限
if proxy_change_count >= max_proxy_changes:
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
break
print(f" [!] 代理已失败{self.proxy_fail_count}次,强制更换新代理")
self.current_proxy = None
self.proxy_fail_count = 0
new_proxy = self.fetch_proxy(force_new=True)
if new_proxy:
proxy_change_count += 1
print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
else:
print(f" [X] 无法获取新代理")
break
retry_count += 1
continue
else:
print(f" [X] 请求异常: {e}")
break
if successful_data:
return {
'apis': successful_data,
'count': len(successful_data)
}
return None
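上面的循环把“同一代理多次重试、限次更换代理”的策略与具体请求逻辑耦合在一起。下面给出一个独立的示意实现(函数名 `fetch_with_proxy_rotation` 及两个回调均为假设,并非项目中的真实接口),便于单独验证这套重试策略:

```python
def fetch_with_proxy_rotation(do_request, get_proxy,
                              retries_per_proxy=3, max_proxy_changes=3):
    """示意实现:同一代理最多尝试 retries_per_proxy 次,
    全部失败后更换代理,最多更换 max_proxy_changes 次
    即总共最多使用 4 个不同代理)。

    do_request(proxy): 成功返回结果,失败抛出连接/超时异常(假设的回调)
    get_proxy(): 返回一个新代理(假设的回调)
    """
    proxy = get_proxy()
    proxy_changes = 0
    while True:
        for _ in range(retries_per_proxy):
            try:
                return do_request(proxy)
            except (ConnectionError, TimeoutError):
                pass  # 同一代理继续重试
        if proxy_changes >= max_proxy_changes:
            return None  # 已达代理更换上限,放弃
        proxy = get_proxy()  # 更换新代理,并重置该代理的重试计数
        proxy_changes += 1
```

按 README 中描述的参数3 + 3失败路径最多会消耗 4 个代理、共 12 次请求后返回 None。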
def fetch_income_for_date(self, max_retries: int = 3) -> Optional[Dict]:
"""获取指定日期的收入数据
使用overviewhomelist API获取按天的详细收入数据
Returns:
收入数据
"""
import time
from datetime import timedelta
# 计算Unix时间戳从目标日期往前30天以便获取更多数据
end_date = self.target_date
start_date = end_date - timedelta(days=29) # 30天范围
# 转换为Unix时间戳
start_timestamp = int(start_date.timestamp())
end_timestamp = int(end_date.timestamp())
# 使用overviewhomelist API获取每日收入明细
api_url = f"{self.base_url}/author/eco/income4/overviewhomelist"
token_cookie = self.session.cookies.get('bjhStoken') or self.session.cookies.get('devStoken')
headers = {
'Accept': 'application/json, text/plain, */*',
'Accept-Language': 'zh-CN,zh;q=0.9',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Referer': f'{self.base_url}/builder/rc/incomecenter',
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
}
if token_cookie:
headers['token'] = token_cookie
# 请求参数
params = {
'start_date': start_timestamp,
'end_date': end_timestamp
}
print(f"\n[请求] 获取收入数据")
print(f" 日期范围: {start_date.strftime('%Y-%m-%d')}{end_date.strftime('%Y-%m-%d')}")
retry_count = 0
proxy_change_count = 0 # 代理更换次数计数器
max_proxy_changes = 3 # 最多更换3次代理即最多使用4个不同代理
while retry_count <= max_retries:
try:
if retry_count > 0:
wait_time = retry_count * 2
print(f" [重试 {retry_count}/{max_retries}] 等待 {wait_time} 秒...")
time.sleep(wait_time)
proxies = self.fetch_proxy() if self.use_proxy else None
# 调试信息:显示代理使用情况
if self.use_proxy:
if proxies:
proxy_url = proxies.get('http', '')
if '@' in proxy_url:
proxy_ip = proxy_url.split('@')[1]
else:
proxy_ip = proxy_url.replace('http://', '').replace('https://', '')
print(f" [代理] 使用IP: {proxy_ip}")
else:
print(f" [!] 警告代理未生效use_proxy={self.use_proxy}")
response = self.session.get(
api_url,
headers=headers,
params=params,
proxies=proxies,
timeout=15,
verify=False
)
print(f" 状态码: {response.status_code}")
if response.status_code == 200:
data = response.json()
errno = data.get('errno', -1)
if errno == 0:
print(f" [✓] API调用成功")
# 请求成功,重置代理失败计数
self.reset_proxy_fail_count()
# 提取收入列表
income_list = data.get('data', {}).get('list', [])
if income_list:
# 找到目标日期的数据
target_timestamp = int(self.target_date.timestamp())
target_income_data = None
for item in income_list:
if item.get('day_time') == target_timestamp:
target_income_data = item
break
if target_income_data:
day_revenue = target_income_data.get('total_income', 0)
print(f"\n 收入数据详情:")
print(f" {self.target_date_str} 当日收入: ¥{day_revenue:.2f}")
# 计算近7天收入
recent7_revenue = 0.0
recent7_start = self.target_date - timedelta(days=6)
recent7_start_ts = int(recent7_start.timestamp())
for item in income_list:
if recent7_start_ts <= item.get('day_time', 0) <= target_timestamp:
recent7_revenue += item.get('total_income', 0)
print(f" 近7天: ¥{recent7_revenue:.2f}")
# 计算近30天收入
recent30_revenue = sum(item.get('total_income', 0) for item in income_list)
print(f" 近30天: ¥{recent30_revenue:.2f}")
# 计算当月收入(从月初到目标日期)
month_start = self.target_date.replace(day=1)
month_start_ts = int(month_start.timestamp())
current_month_revenue = 0.0
for item in income_list:
if month_start_ts <= item.get('day_time', 0) <= target_timestamp:
current_month_revenue += item.get('total_income', 0)
print(f" 当月收入: ¥{current_month_revenue:.2f}")
# 构造返回数据(与原有格式保持一致)
return {
'errno': 0,
'errmsg': 'success',
'data': {
'income': {
'yesterday': {
'income': day_revenue,
'value': day_revenue
},
'recent7Days': {
'income': recent7_revenue,
'value': recent7_revenue
},
'recent30Days': {
'income': recent30_revenue,
'value': recent30_revenue
},
'currentMonth': {
'income': current_month_revenue,
'value': current_month_revenue
}
}
},
'raw_list': income_list # 保留原始数据
}
else:
print(f" [警告] 未找到 {self.target_date_str} 的收入数据")
return None
else:
print(f" [警告] 收入数据列表为空")
return None
else:
errmsg = data.get('errmsg', '')
print(f" [X] API返回错误: errno={errno}, errmsg={errmsg}")
# 特别处理 errno=10000015异常请求这通常是代理未生效
if errno == 10000015 and self.use_proxy:
print(f" [!] 检测到代理未生效,立即更换新代理")
# 检查是否超过代理更换上限
if proxy_change_count >= max_proxy_changes:
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
return None
# 立即强制获取新代理
self.current_proxy = None
self.proxy_fail_count = 0
new_proxy = self.fetch_proxy(force_new=True)
if new_proxy and retry_count < max_retries:
proxy_change_count += 1
print(f" [!] 已更换新代理({proxy_change_count}/{max_proxy_changes}),将重试...")
retry_count += 1
continue
else:
print(f" [X] 无法获取新代理或已达重试上限")
return None
else:
# 其他API错误不重试
return None
else:
print(f" [X] HTTP错误: {response.status_code}")
return None
except Exception as e:
error_type = type(e).__name__
is_retry_error = any([
'Connection' in error_type,
'Timeout' in error_type,
'ProxyError' in error_type,
])
if is_retry_error and retry_count < max_retries:
print(f" [!] 连接错误: {error_type}")
# 标记代理失败
self.mark_proxy_failed()
# 如果代理失败次数达到3次强制更换新代理第4次重试用新代理
if self.proxy_fail_count >= 3 and self.use_proxy:
# 检查是否超过代理更换上限
if proxy_change_count >= max_proxy_changes:
print(f" [X] 已达代理更换上限({max_proxy_changes}次),放弃重试")
return None
print(f" [!] 代理已失败{self.proxy_fail_count}次,强制更换新代理")
self.current_proxy = None
self.proxy_fail_count = 0
new_proxy = self.fetch_proxy(force_new=True)
if new_proxy:
proxy_change_count += 1
print(f" [✓] 已更换新代理({proxy_change_count}/{max_proxy_changes}),继续重试")
else:
print(f" [X] 无法获取新代理")
return None
retry_count += 1
continue
else:
print(f" [X] 请求异常: {e}")
return None
return None
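上面对 `day_time`/`total_income` 列表的四段汇总当日、近7天、近30天、当月混在请求处理流程中。抽成纯函数后可以单独验证区间边界以下为按相同规则的示意实现函数名 `summarize_income` 为假设):

```python
from datetime import datetime, timedelta

def summarize_income(income_list, target_date):
    """按 fetch_income_for_date 中的规则汇总收入(示意实现)。

    income_list: [{'day_time': Unix时间戳, 'total_income': 金额}, ...]
    target_date: datetime目标日期当天 0 点)
    """
    target_ts = int(target_date.timestamp())
    recent7_start_ts = int((target_date - timedelta(days=6)).timestamp())
    month_start_ts = int(target_date.replace(day=1).timestamp())

    day = sum(i['total_income'] for i in income_list
              if i['day_time'] == target_ts)
    recent7 = sum(i['total_income'] for i in income_list
                  if recent7_start_ts <= i['day_time'] <= target_ts)
    recent30 = sum(i['total_income'] for i in income_list)  # 接口本身只返回30天
    month = sum(i['total_income'] for i in income_list
                if month_start_ts <= i['day_time'] <= target_ts)
    return {'day': day, 'recent7': recent7,
            'recent30': recent30, 'month': month}
```

注意“近30天”直接对整个列表求和依赖于接口恰好返回 30 天数据这一前提,与上文实现一致。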
def extract_integrated_data_for_date(self, account_id: str, days: int = 7) -> Optional[Dict]:
"""提取指定账号在指定日期的整合数据
Args:
account_id: 账号ID
days: 查询天数从target_date往前推
Returns:
整合数据
"""
import time
import random
print(f"\n{'='*70}")
print(f"开始提取账号数据: {account_id}")
print(f"目标日期: {self.target_date_str}")
print(f"{'='*70}")
if not self.set_account_cookies(account_id):
return None
result = {
'account_id': account_id,
'fetch_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'target_date': self.target_date_str,
'status': 'unknown',
'analytics': {},
'income': {},
'error_info': {}
}
# 1. 获取发文统计数据
print("\n[1/2] 获取发文统计数据...")
api_data = self.fetch_analytics_api_for_date(days=days)
if api_data:
result['analytics'] = api_data
print("[OK] 发文统计数据获取成功")
else:
print("[X] 发文统计数据获取失败")
result['error_info']['analytics'] = 'API调用失败'
# API调用间隔
api_delay = random.uniform(2, 4)
print(f"\n[间隔] 等待 {api_delay:.1f} 秒...")
time.sleep(api_delay)
# 2. 获取收入数据
print("\n[2/2] 获取收入数据...")
income_data = self.fetch_income_for_date()
if income_data:
result['income'] = income_data
print("[OK] 收入数据获取成功")
else:
print("[X] 收入数据获取失败")
result['error_info']['income'] = 'API调用失败'
# 设置状态
if result['analytics'] and result['income']:
result['status'] = 'success_all'
elif result['analytics'] or result['income']:
result['status'] = 'success_partial'
else:
result['status'] = 'failed'
return result
def extract_all_for_date(self, days: int = 7, delay_seconds: float = 3.0) -> List[Dict]:
"""提取所有账号在指定日期的数据
Args:
days: 查询天数
delay_seconds: 账号间延迟
Returns:
所有账号的数据
"""
import random
if not self.account_cookies:
print("[X] 没有可用的账号Cookie")
return []
print("\n" + "="*70)
print(f"开始提取 {len(self.account_cookies)} 个账号的数据")
print(f"目标日期: {self.target_date_str}")
print("="*70)
results = []
for idx, account_id in enumerate(self.account_cookies.keys(), 1):
print(f"\n[{idx}/{len(self.account_cookies)}] 处理账号: {account_id}")
result = self.extract_integrated_data_for_date(account_id, days=days)
if result:
results.append(result)
# 添加延迟
if idx < len(self.account_cookies):
actual_delay = delay_seconds * random.uniform(0.7, 1.3)
print(f"\n[延迟] 等待 {actual_delay:.1f} 秒后继续...")
import time
time.sleep(actual_delay)
return results
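账号间延迟在基准值上乘以 0.7–1.3 的随机系数,避免以固定频率访问接口。该抖动逻辑可抽成独立函数(示意实现,函数名为假设):

```python
import random

def jittered_delay(base_seconds, low=0.7, high=1.3, rng=random):
    """返回 [base*low, base*high] 区间内的随机延迟秒数(示意实现)。"""
    return base_seconds * rng.uniform(low, high)
```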
def save_results(self, results: List[Dict]):
"""保存结果到文件(同时备份带时间戳的副本)
Args:
results: 数据分析结果列表
"""
import json
import shutil
try:
# 1. 保存到主文件(不带时间戳)
with open(self.output_file, 'w', encoding='utf-8') as f:
json.dump(results, f, ensure_ascii=False, indent=2)
print(f"\n{'='*70}")
print(f"[OK] 数据已保存到: {self.output_file}")
# 2. 创建带时间戳的备份文件(只保留日期)
timestamp = datetime.now().strftime('%Y%m%d')
backup_filename = f"bjh_integrated_data_{timestamp}.json"
backup_file = os.path.join(self.backup_dir, backup_filename)
# 复制文件到备份目录
shutil.copy2(self.output_file, backup_file)
print(f"[OK] 备份已保存到: {backup_file}")
print(f"{'='*70}")
# 显示统计
success_count = sum(1 for r in results if r.get('status', '').startswith('success'))
print(f"\n统计信息:")
print(f" - 总账号数: {len(results)}")
print(f" - 成功获取: {success_count}")
print(f" - 失败: {len(results) - success_count}")
except Exception as e:
print(f"[X] 保存文件失败: {e}")
def main():
"""主函数"""
parser = argparse.ArgumentParser(
description='百家号指定日期数据抓取工具',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
示例用法:
python bjh_analytics_date.py 2025-12-20
python bjh_analytics_date.py 2025-12-20 --days 7
python bjh_analytics_date.py 2025-12-20 --proxy
python bjh_analytics_date.py 2025-12-20 --database
python bjh_analytics_date.py 2025-12-20 --account "乳腺专家林华" # 仅测试单个账号
"""
)
parser.add_argument(
'date',
type=str,
help='目标日期 (格式: YYYY-MM-DD)'
)
parser.add_argument(
'--days',
type=int,
default=7,
help='查询天数从目标日期往前推默认7天'
)
parser.add_argument(
'--proxy',
action='store_true',
default=True, # 默认启用代理
help='启用代理(默认启用)'
)
parser.add_argument(
'--no-proxy',
dest='proxy',
action='store_false',
help='禁用代理'
)
parser.add_argument(
'--database',
action='store_true',
default=True, # 默认从数据库加载Cookie
help='从数据库加载Cookie默认启用'
)
parser.add_argument(
'--local',
dest='database',
action='store_false',
help='从本地JSON文件加载Cookie'
)
parser.add_argument(
'--delay',
type=float,
default=3.0,
help='账号间延迟时间默认3.0'
)
parser.add_argument(
'--account',
type=str,
default=None,
help='仅抓取指定账号(用于测试),格式:账号名称'
)
parser.add_argument(
'--no-confirm',
action='store_true',
help='跳过确认提示,直接开始抓取(用于批量脚本)'
)
args = parser.parse_args()
# 验证日期格式
try:
datetime.strptime(args.date, '%Y-%m-%d')
except ValueError:
print(f"[X] 日期格式错误: {args.date}")
print(" 正确格式: YYYY-MM-DD (例如: 2025-12-20)")
return 1
print("\n" + "="*70)
print("百家号指定日期数据抓取工具")
print("="*70)
print(f"目标日期: {args.date}")
print(f"查询天数: {args.days}")
print(f"使用代理: {'是' if args.proxy else '否'}")
print(f"数据源: {'数据库' if args.database else '本地文件'}")
print("="*70)
try:
# 创建分析器
analytics = BaijiahaoDateAnalytics(
target_date=args.date,
use_proxy=args.proxy,
load_from_db=args.database
)
if not analytics.account_cookies:
print("\n[X] 未找到可用的账号Cookie")
return 1
# 如果指定了单个账号,验证是否存在
if args.account:
if args.account not in analytics.account_cookies:
print(f"\n[X] 未找到指定账号: {args.account}")
print(f"\n可用账号列表:")
for idx, account_name in enumerate(analytics.account_cookies.keys(), 1):
print(f" {idx}. {account_name}")
return 1
# 只保留指定账号
analytics.account_cookies = {args.account: analytics.account_cookies[args.account]}
print(f"\n[测试模式] 仅抓取账号: {args.account}")
print(f"\n找到 {len(analytics.account_cookies)} 个账号")
# 确认执行(除非使用--no-confirm参数
if not args.no_confirm:
confirm = input("\n是否开始抓取? (y/n): ").strip().lower()
if confirm != 'y':
print("已取消")
return 0
# 提取所有账号数据
results = analytics.extract_all_for_date(
days=args.days,
delay_seconds=args.delay
)
if results:
analytics.save_results(results)
# 显示统计
success_all = sum(1 for r in results if r.get('status') == 'success_all')
success_partial = sum(1 for r in results if r.get('status') == 'success_partial')
failed = sum(1 for r in results if r.get('status') == 'failed')
print(f"\n{'='*70}")
print("数据提取统计")
print(f"{'='*70}")
print(f" 总账号数: {len(results)}")
print(f" 全部成功: {success_all}")
print(f" 部分成功: {success_partial}")
print(f" 失败: {failed}")
print(f"{'='*70}")
return 0
else:
print("\n[X] 未获取到任何数据")
return 1
except Exception as e:
print(f"\n[X] 程序执行出错: {e}")
import traceback
traceback.print_exc()
return 1
if __name__ == '__main__':
sys.exit(main())

bjh_daemon.service

@@ -1,6 +1,7 @@
[Unit]
Description=百家号数据同步守护进程
After=network.target
Description=百家号数据同步守护进程(含数据验证与短信告警)
After=network.target mysql.service
Wants=mysql.service
[Service]
Type=simple
@@ -12,8 +13,18 @@ RestartSec=10
StandardOutput=journal
StandardError=journal
# 环境变量(如果需要)
# Environment="PATH=/usr/local/bin:/usr/bin:/bin"
# 环境变量配置
Environment="LOAD_FROM_DB=true"
Environment="USE_PROXY=true"
Environment="DAYS=7"
Environment="MAX_RETRIES=3"
Environment="RUN_NOW=true"
Environment="ENABLE_VALIDATION=true"
Environment="NON_INTERACTIVE=true"
# 阿里云短信服务凭据可选也可使用sms_config.json
# Environment="ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id"
# Environment="ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret"
[Install]
WantedBy=multi-user.target

bjh_data_daemon.py

@@ -27,12 +27,12 @@ from log_config import setup_bjh_daemon_logger
class BaijiahaoDataDaemon:
"""百家号数据定时更新守护进程"""
def __init__(self, update_interval_hours: int = 6, use_proxy: bool = False, load_from_db: bool = False):
def __init__(self, update_interval_hours: int = 1, use_proxy: bool = False, load_from_db: bool = False):
"""
初始化守护进程
Args:
update_interval_hours: 更新间隔(小时),默认6小时
update_interval_hours: 更新间隔(小时),默认1小时
use_proxy: 是否使用代理默认False
load_from_db: 是否从数据库加载Cookie默认False
"""
@@ -371,8 +371,8 @@ def main():
print("\n请配置守护进程参数:\n")
# 更新间隔
interval_input = input("1. 更新间隔(小时,默认6小时): ").strip()
update_interval = int(interval_input) if interval_input.isdigit() and int(interval_input) > 0 else 6
interval_input = input("1. 更新间隔(小时,默认1小时): ").strip()
update_interval = int(interval_input) if interval_input.isdigit() and int(interval_input) > 0 else 1
# 查询天数
days_input = input("2. 查询天数默认7天: ").strip()

(文件差异过大,已省略)

calc_ip.py新文件

data_sync_daemon.py

@@ -4,18 +4,20 @@
数据同步守护进程
功能:
1. 24小时不间断运行
2. 在每天午夜00:00自动执行数据抓取和同步
1. 24小时不间断运行仅在工作时间8:00-24:00执行任务
2. 每隔1小时自动执行数据抓取和同步
3. 自动执行流程:
- 从百家号API抓取最新数据
- 生成CSV文件包含从数据库查询的author_id
- 将CSV数据导入到数据库
4. 支持手动触发刷新
5. 详细的日志记录
6. 非工作时间0:00-8:00自动休眠减少API请求压力
使用场景:
- 24/7运行每天凌晨自动更新数据
- 24/7运行在工作时间8:00-24:00每隔1小时自动更新数据
- 无需人工干预,自动化数据同步
- 避免在夜间时段进行数据抓取
"""
import sys
@@ -38,11 +40,19 @@ from export_to_csv import DataExporter
from import_csv_to_database import CSVImporter
from log_config import setup_logger
# 导入数据验证与短信告警模块
try:
from data_validation_with_sms import DataValidationWithSMS
VALIDATION_AVAILABLE = True
except ImportError:
print("[!] 数据验证模块未找到,验证功能将不可用")
VALIDATION_AVAILABLE = False
class DataSyncDaemon:
"""数据同步守护进程"""
def __init__(self, use_proxy: bool = False, load_from_db: bool = True, days: int = 7, max_retries: int = 3):
def __init__(self, use_proxy: bool = False, load_from_db: bool = True, days: int = 7, max_retries: int = 3, enable_validation: bool = True):
"""初始化守护进程
Args:
@@ -50,16 +60,28 @@ class DataSyncDaemon:
load_from_db: 是否从数据库加载Cookie
days: 抓取最近多少天的数据
max_retries: 最大重试次数
enable_validation: 是否启用数据验证与短信告警
"""
self.script_dir = os.path.dirname(os.path.abspath(__file__))
self.use_proxy = use_proxy
self.load_from_db = load_from_db
self.days = days
self.max_retries = max_retries
self.enable_validation = enable_validation and VALIDATION_AVAILABLE
# 工作时间配置8:00-24:00
self.work_start_hour = 8
self.work_end_hour = 24
# 初始化日志
self.logger = setup_logger('data_sync_daemon', os.path.join(self.script_dir, 'logs', 'data_sync_daemon.log'))
# 创建验证报告目录
self.validation_reports_dir = os.path.join(self.script_dir, 'validation_reports')
if not os.path.exists(self.validation_reports_dir):
os.makedirs(self.validation_reports_dir)
self.logger.info(f"创建验证报告目录: {self.validation_reports_dir}")
# 统计信息
self.stats = {
'total_runs': 0,
@@ -76,13 +98,17 @@ class DataSyncDaemon:
print(f" 使用代理: {'是' if use_proxy else '否'}")
print(f" Cookie来源: {'数据库' if load_from_db else '本地文件'}")
print(f" 抓取天数: {days}")
print(f" 工作时间: {self.work_start_hour}:00 - {self.work_end_hour}:00")
print(f" 错误重试: 最大{max_retries}")
print(f" 定时执行: 每天午夜00:00")
print(f" 定时执行: 每隔1小时")
print(f" 数据验证: {'已启用' if self.enable_validation else '已禁用'}")
if self.enable_validation:
print(f" 短信告警: 验证失败时发送 (错误代码2222)")
print("="*70 + "\n")
self.logger.info("="*70)
self.logger.info("数据同步守护进程启动")
self.logger.info(f"使用代理: {use_proxy}, Cookie来源: {'数据库' if load_from_db else '本地文件'}, 抓取天数: {days}, 重试: {max_retries}")
self.logger.info(f"使用代理: {use_proxy}, Cookie来源: {'数据库' if load_from_db else '本地文件'}, 抓取天数: {days}, 工作时间: {self.work_start_hour}:00-{self.work_end_hour}:00, 重试: {max_retries}次, 验证: {'已启用' if self.enable_validation else '已禁用'}")
self.logger.info("="*70)
def fetch_data(self) -> bool:
@@ -294,6 +320,77 @@ class DataSyncDaemon:
self.logger.error(f"数据库导入失败: {e}", exc_info=True)
return False
def validate_data(self) -> bool:
"""步骤4数据验证与短信告警"""
if not self.enable_validation:
print("\n[跳过] 数据验证功能未启用")
self.logger.info("跳过数据验证(功能未启用)")
return True
print("\n" + "="*70)
print("【步骤4/4】数据验证与短信告警")
print("="*70)
try:
# 等待3秒确保数据库写入完成
print("\n等待3秒确保数据写入完成...")
self.logger.info("等待3秒以确保数据库写入完成")
time.sleep(3)
print("\n执行数据验证...")
self.logger.info("开始执行数据验证")
# 创建验证器(验证昨天的数据)
yesterday = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
validator = DataValidationWithSMS(date_str=yesterday)
# 执行验证JSON + CSV + Database
passed = validator.run_validation(
sources=['json', 'csv', 'database'],
table='ai_statistics'
)
# 生成验证报告
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
report_file = os.path.join(
self.validation_reports_dir,
f'validation_report_{timestamp}.txt'
)
validator.validator.generate_report(report_file)
if passed:
print("\n[✓] 数据验证通过")
self.logger.info("数据验证通过")
return True
else:
print("\n[X] 数据验证失败,准备发送短信告警")
self.logger.error("数据验证失败")
# 生成错误摘要
error_summary = validator.generate_error_summary()
self.logger.error(f"错误摘要: {error_summary}")
# 发送短信告警错误代码2222
sms_sent = validator.send_sms_alert("2222", error_summary)
if sms_sent:
print("[✓] 告警短信已发送")
self.logger.info("告警短信发送成功")
else:
print("[X] 告警短信发送失败")
self.logger.error("告警短信发送失败")
print(f"\n详细报告: {report_file}")
# 验证失败不阻止后续流程但返回True表示步骤完成
return True
except Exception as e:
print(f"\n[X] 数据验证异常: {e}")
self.logger.error(f"数据验证异常: {e}", exc_info=True)
# 验证异常不影响整体流程
return True
def sync_data(self):
"""执行完整的数据同步流程"""
start_time = datetime.now()
@@ -317,10 +414,15 @@ class DataSyncDaemon:
if not self.generate_csv():
raise Exception("CSV生成失败")
# 步骤3导入数据库
if not self.import_to_database():
raise Exception("数据库导入失败")
# 步骤4数据验证与短信告警
if not self.validate_data():
# 验证失败不阻止整体流程,只记录警告
self.logger.warning("数据验证步骤未成功完成")
# 成功
end_time = datetime.now()
duration = (end_time - start_time).total_seconds()
@@ -370,12 +472,36 @@ class DataSyncDaemon:
self.logger.info(f"运行统计: 总{self.stats['total_runs']}次, 成功{self.stats['successful_runs']}次, 失败{self.stats['failed_runs']}")
def get_next_midnight(self) -> datetime:
"""获取下一个午夜时刻"""
def is_work_time(self) -> tuple:
"""
检查当前是否在工作时间内8:00-24:00
Returns:
tuple: (是否在工作时间内, 距离下次工作时间的秒数)
"""
now = datetime.now()
tomorrow = now + timedelta(days=1)
next_midnight = tomorrow.replace(hour=0, minute=0, second=0, microsecond=0)
return next_midnight
current_hour = now.hour
# 在工作时间内8:00-23:59
if self.work_start_hour <= current_hour < self.work_end_hour:
return True, 0
# 不在工作时间内,计算到下个工作时间的秒数
if current_hour < self.work_start_hour:
# 今天还没到工作时间
next_work_time = now.replace(hour=self.work_start_hour, minute=0, second=0, microsecond=0)
else:
# 今天已过工作时间,等待明天
next_work_time = (now + timedelta(days=1)).replace(hour=self.work_start_hour, minute=0, second=0, microsecond=0)
seconds_until_work = (next_work_time - now).total_seconds()
return False, seconds_until_work
def get_next_run_time(self) -> datetime:
"""获取下一次执行时间1小时后"""
now = datetime.now()
next_run = now + timedelta(hours=1)
return next_run
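is_work_time 直接读系统时钟,不便做单元测试。把“当前时间”参数化后即可直接验证其边界行为(示意实现,函数名为假设,判断规则与上文一致):

```python
from datetime import datetime, timedelta

def work_time_status(now, start_hour=8, end_hour=24):
    """返回 (是否在工作时间内, 距下次工作时间的秒数),示意实现。"""
    if start_hour <= now.hour < end_hour:
        return True, 0
    if now.hour < start_hour:
        # 今天还没到工作时间
        next_work = now.replace(hour=start_hour, minute=0,
                                second=0, microsecond=0)
    else:
        # 今天已过工作时间,等待明天
        next_work = (now + timedelta(days=1)).replace(
            hour=start_hour, minute=0, second=0, microsecond=0)
    return False, (next_work - now).total_seconds()
```

end_hour=24 时任何小时都小于 24因此“已过工作时间”分支实际不可达非工作时段只剩 0:00–8:00,这与守护进程的默认配置相符。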
def run(self):
"""启动守护进程"""
@@ -383,22 +509,51 @@ class DataSyncDaemon:
print("守护进程已启动")
print("="*70)
# 设置定时任务:每天午夜00:00执行
schedule.every().day.at("00:00").do(self.sync_data)
# 设置定时任务:每隔1小时执行
schedule.every(1).hours.do(self.sync_data)
# 计算下次执行时间
next_run = self.get_next_midnight()
next_run = self.get_next_run_time()
time_until_next = (next_run - datetime.now()).total_seconds()
print(f"\n下次执行时间: {next_run.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"距离下次执行: {time_until_next/3600:.1f} 小时")
print(f"\n执行间隔: 每隔1小时")
print(f"工作时间: {self.work_start_hour}:00 - {self.work_end_hour}:00非工作时间自动休眠")
print(f"下次执行时间: {next_run.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"距离下次执行: {time_until_next/60:.1f} 分钟")
print("\n按 Ctrl+C 可以停止守护进程")
print("="*70 + "\n")
self.logger.info(f"守护进程已启动,下次执行时间: {next_run.strftime('%Y-%m-%d %H:%M:%S')}")
self.logger.info(f"守护进程已启动,执行间隔: 每隔1小时工作时间: {self.work_start_hour}:00-{self.work_end_hour}:00下次执行时间: {next_run.strftime('%Y-%m-%d %H:%M:%S')}")
try:
while True:
# 检查是否在工作时间内
is_work, seconds_until_work = self.is_work_time()
if not is_work:
# 不在工作时间内,等待至工作时间
next_work_time = datetime.now() + timedelta(seconds=seconds_until_work)
self.logger.info(f"当前非工作时间,等待至 {next_work_time.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"\n[休眠] 当前不在工作时间内({self.work_start_hour}:00-{self.work_end_hour}:00)")
print(f"[休眠] 下次工作时间: {next_work_time.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"[休眠] 等待 {seconds_until_work/3600:.1f} 小时...")
# 每30分钟检查一次
check_interval = 1800
elapsed = 0
while elapsed < seconds_until_work:
sleep_time = min(check_interval, seconds_until_work - elapsed)
time.sleep(sleep_time)
elapsed += sleep_time
remaining = seconds_until_work - elapsed
if remaining > 0:
print(f" 距离工作时间还有: {remaining/3600:.1f} 小时 ({datetime.now().strftime('%H:%M:%S')})")
continue
# 在工作时间内,执行定时任务
schedule.run_pending()
time.sleep(60) # 每分钟检查一次
@@ -427,13 +582,15 @@ def main():
print(" USE_PROXY=true/false - 是否使用代理")
print(" DAYS=7 - 抓取天数")
print(" MAX_RETRIES=3 - 重试次数")
print(" RUN_NOW=true/false - 是否立即执行\n")
print(" RUN_NOW=true/false - 是否立即执行")
print(" ENABLE_VALIDATION=true/false - 是否启用验证\n")
load_from_db = os.getenv('LOAD_FROM_DB', 'true').lower() == 'true'
use_proxy = os.getenv('USE_PROXY', 'true').lower() == 'true'
days = int(os.getenv('DAYS', '7'))
max_retries = int(os.getenv('MAX_RETRIES', '3'))
run_now = os.getenv('RUN_NOW', 'true').lower() == 'true'
enable_validation = os.getenv('ENABLE_VALIDATION', 'true').lower() == 'true'
else:
# 交互模式:显示菜单
# 配置选项
@@ -468,9 +625,15 @@ def main():
except ValueError:
max_retries = 3
# 5. 是否立即执行一次
print("\n5. 是否立即执行一次同步")
print(" (否则等待到午夜00:00执行)")
# 5. 是否启用数据验证
print("\n5. 是否启用数据验证与短信告警")
print(" (每次同步后自动验证数据,验证失败时发送短信告警,错误代码2222)")
enable_validation_input = input(" (y/n, 默认y): ").strip().lower() or 'y'
enable_validation = (enable_validation_input == 'y')
# 6. 是否立即执行一次
print("\n6. 是否立即执行一次同步?")
print(" (否则等待到下一个整点小时执行)")
run_now_input = input(" (y/n, 默认n): ").strip().lower() or 'n'
run_now = (run_now_input == 'y')
@@ -480,9 +643,13 @@ def main():
print(f" Cookie来源: {'数据库' if load_from_db else '本地文件'}")
print(f" 使用代理: {'是' if use_proxy else '否'}")
print(f" 抓取天数: {days}")
print(f" 工作时间: 8:00 - 24:00非工作时间自动休眠")
print(f" 错误重试: 最大{max_retries}")
print(f" 数据验证: {'已启用' if enable_validation else '已禁用'}")
if enable_validation:
print(f" 短信告警: 验证失败时发送 (错误代码2222)")
print(f" 立即执行: {'是' if run_now else '否'}")
print(f" 定时执行: 每天午夜00:00")
print(f" 定时执行: 每隔1小时")
print("="*70)
confirm = input("\n确认启动守护进程?(y/n): ").strip().lower()
@@ -491,7 +658,13 @@ def main():
return
# 创建守护进程
daemon = DataSyncDaemon(use_proxy=use_proxy, load_from_db=load_from_db, days=days, max_retries=max_retries)
daemon = DataSyncDaemon(
use_proxy=use_proxy,
load_from_db=load_from_db,
days=days,
max_retries=max_retries,
enable_validation=enable_validation
)
# 如果选择立即执行,先执行一次
if run_now:

data_validation.py新文件769 行)

@@ -0,0 +1,769 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
数据比对验证脚本
功能:
1. 顺序验证:验证不同数据源中记录的顺序一致性
2. 交叉验证:对比数据内容,识别缺失、新增或不匹配的记录
支持的数据源:
- JSON文件 (bjh_integrated_data.json)
- CSV文件 (ai_statistics_*.csv)
- MySQL数据库 (ai_statistics_* 表)
使用方法:
# 验证JSON和CSV的一致性
python data_validation.py --source json csv --date 2025-12-29
# 验证CSV和数据库的一致性
python data_validation.py --source csv database --date 2025-12-29
# 完整验证(三个数据源)
python data_validation.py --source json csv database --date 2025-12-29
# 验证特定表
python data_validation.py --source csv database --table ai_statistics_day --date 2025-12-29
"""
import sys
import os
import json
import csv
import argparse
from datetime import datetime, timedelta
from typing import Dict, List, Tuple, Optional, Any, Set
from collections import OrderedDict
import hashlib
# 设置UTF-8编码
if sys.platform == 'win32':
import io
if not isinstance(sys.stdout, io.TextIOWrapper) or sys.stdout.encoding != 'utf-8':
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
if not isinstance(sys.stderr, io.TextIOWrapper) or sys.stderr.encoding != 'utf-8':
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')
# 导入数据库配置
try:
from database_config import DatabaseManager
except ImportError:
print("[X] 无法导入 database_config.py数据库验证功能将不可用")
DatabaseManager = None
class DataValidator:
"""数据比对验证器"""
def __init__(self, date_str: Optional[str] = None):
"""初始化
Args:
date_str: 目标日期 (YYYY-MM-DD),默认为昨天
"""
self.script_dir = os.path.dirname(os.path.abspath(__file__))
# 目标日期(默认为昨天)
if date_str:
self.target_date = datetime.strptime(date_str, '%Y-%m-%d')
else:
# 默认使用昨天的日期
self.target_date = datetime.now() - timedelta(days=1)
self.date_str = self.target_date.strftime('%Y-%m-%d')
# 数据库管理器
self.db_manager = None
if DatabaseManager:
try:
self.db_manager = DatabaseManager()
print(f"[OK] 数据库连接成功")
except Exception as e:
print(f"[!] 数据库连接失败: {e}")
# 验证结果
self.validation_results = {
'顺序验证': [],
'交叉验证': [],
'差异统计': {}
}
def load_json_data(self, file_path: Optional[str] = None) -> Optional[Any]:
"""加载JSON数据
Args:
file_path: JSON文件路径默认为 bjh_integrated_data.json
Returns:
JSON数据字典
"""
if not file_path:
file_path = os.path.join(self.script_dir, 'bjh_integrated_data.json')
try:
if not os.path.exists(file_path):
print(f"[X] JSON文件不存在: {file_path}")
return None
with open(file_path, 'r', encoding='utf-8') as f:
data = json.load(f)
print(f"[OK] 加载JSON文件: {file_path}")
print(f" 账号数量: {len(data) if isinstance(data, list) else 1}")
return data
except Exception as e:
print(f"[X] 加载JSON文件失败: {e}")
return None
def load_csv_data(self, csv_file: str) -> Optional[List[Dict]]:
"""加载CSV数据
Args:
csv_file: CSV文件名
Returns:
CSV数据列表
"""
csv_path = os.path.join(self.script_dir, csv_file)
try:
if not os.path.exists(csv_path):
print(f"[X] CSV文件不存在: {csv_path}")
return None
rows = []
with open(csv_path, 'r', encoding='utf-8-sig') as f:
reader = csv.DictReader(f)
rows = list(reader)
print(f"[OK] 加载CSV文件: {csv_file}")
print(f" 记录数量: {len(rows)}")
return rows
except Exception as e:
print(f"[X] 加载CSV文件失败: {e}")
return None
def load_database_data(self, table_name: str, date_filter: Optional[str] = None) -> Optional[List[Dict]]:
"""从数据库加载数据
Args:
table_name: 表名
date_filter: 日期过滤字段名(如 'date', 'stat_date'
Returns:
数据库记录列表
"""
if not self.db_manager:
print(f"[X] 数据库管理器未初始化")
return None
try:
# 构建SQL查询
if date_filter:
sql = f"SELECT * FROM {table_name} WHERE {date_filter} = %s ORDER BY author_name, channel"
params = (self.date_str,)
else:
sql = f"SELECT * FROM {table_name} ORDER BY author_name, channel"
params = None
rows = self.db_manager.execute_query(sql, params)
print(f"[OK] 加载数据库表: {table_name}")
if date_filter:
print(f" 过滤条件: {date_filter} = {self.date_str}")
print(f" 记录数量: {len(rows) if rows else 0}")
return rows if rows else []
except Exception as e:
print(f"[X] 加载数据库数据失败: {e}")
import traceback
traceback.print_exc()
return None
def generate_record_key(self, record: Dict, key_fields: List[str]) -> str:
"""生成记录唯一键
Args:
record: 数据记录
key_fields: 主键字段列表
Returns:
唯一键字符串
"""
key_values = []
for field in key_fields:
value = record.get(field, '')
# 统一转为字符串并去除空白
key_values.append(str(value).strip())
return '|'.join(key_values)
def calculate_record_hash(self, record: Dict, exclude_fields: Optional[Set[str]] = None) -> str:
"""计算记录的哈希值(用于内容比对)
Args:
record: 数据记录
exclude_fields: 排除的字段集合(如时间戳字段)
Returns:
MD5哈希值
"""
if exclude_fields is None:
exclude_fields = {'updated_at', 'created_at', 'fetch_time'}
# 排序字段并生成稳定的字符串
sorted_items = []
for key in sorted(record.keys()):
if key not in exclude_fields:
value = record.get(key, '')
# 浮点数保留4位小数
if isinstance(value, float):
value = f"{value:.4f}"
sorted_items.append(f"{key}={value}")
content = '|'.join(sorted_items)
return hashlib.md5(content.encode('utf-8')).hexdigest()
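generate_record_key 负责定位记录(主键字段拼接并去空白),calculate_record_hash 负责比对内容(排除时间戳字段后对其余字段取 MD5。下面用两个独立函数复现这段逻辑并演示其行为函数名为示意

```python
import hashlib

def record_key(record, key_fields):
    """主键字段拼接为唯一键,与 generate_record_key 逻辑一致(示意)。"""
    return '|'.join(str(record.get(f, '')).strip() for f in key_fields)

def record_hash(record,
                exclude_fields=frozenset({'updated_at', 'created_at',
                                          'fetch_time'})):
    """排除时间戳字段后取内容 MD5与 calculate_record_hash 逻辑一致(示意)。"""
    items = []
    for key in sorted(record.keys()):
        if key not in exclude_fields:
            value = record[key]
            if isinstance(value, float):
                value = f"{value:.4f}"  # 浮点统一保留4位小数保证哈希稳定
            items.append(f"{key}={value}")
    return hashlib.md5('|'.join(items).encode('utf-8')).hexdigest()
```

两者配合即可区分“记录缺失”(键不存在)与“内容不一致”(键相同但哈希不同)。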
def validate_order(self, source1_data: List[Dict], source2_data: List[Dict],
source1_name: str, source2_name: str,
key_fields: List[str]) -> Dict:
"""顺序验证:验证两个数据源中记录的顺序是否一致
Args:
source1_data: 数据源1的数据
source2_data: 数据源2的数据
source1_name: 数据源1名称
source2_name: 数据源2名称
key_fields: 主键字段列表
Returns:
验证结果字典
"""
print(f"\n{'='*70}")
print(f"顺序验证: {source1_name} vs {source2_name}")
print(f"{'='*70}")
result = {
'source1': source1_name,
'source2': source2_name,
'source1_count': len(source1_data),
'source2_count': len(source2_data),
'order_match': True,
'mismatches': []
}
# 生成记录键列表
source1_keys = [self.generate_record_key(r, key_fields) for r in source1_data]
source2_keys = [self.generate_record_key(r, key_fields) for r in source2_data]
# 比对顺序
min_len = min(len(source1_keys), len(source2_keys))
for i in range(min_len):
if source1_keys[i] != source2_keys[i]:
result['order_match'] = False
result['mismatches'].append({
'position': i,
'source1_key': source1_keys[i],
'source2_key': source2_keys[i]
})
# 输出结果
if result['order_match'] and len(source1_keys) == len(source2_keys):
print(f"[✓] 顺序一致,记录数相同: {len(source1_keys)}")
else:
print(f"[X] 顺序不一致")
print(f" {source1_name} 记录数: {len(source1_keys)}")
print(f" {source2_name} 记录数: {len(source2_keys)}")
if result['mismatches']:
print(f" 不匹配位置数: {len(result['mismatches'])}")
# 显示前5个不匹配
for mismatch in result['mismatches'][:5]:
print(f" 位置{mismatch['position']}: {mismatch['source1_key']} != {mismatch['source2_key']}")
return result
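顺序验证的核心只是逐位置比较两个键序列,可用如下独立片段示意(假设键已由 generate_record_key 生成):

```python
def first_order_mismatch(keys1, keys2):
    # 返回两个键序列第一个不一致的位置;按较短长度逐位一致时返回 None
    for i, (k1, k2) in enumerate(zip(keys1, keys2)):
        if k1 != k2:
            return i
    return None

assert first_order_mismatch(['a|1', 'b|1'], ['a|1', 'b|1']) is None
assert first_order_mismatch(['a|1', 'b|1', 'c|1'], ['a|1', 'c|1', 'b|1']) == 1
```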
def validate_cross(self, source1_data: List[Dict], source2_data: List[Dict],
source1_name: str, source2_name: str,
key_fields: List[str],
compare_fields: Optional[List[str]] = None) -> Dict:
"""交叉验证:对比数据内容,识别缺失、新增或不匹配的记录
Args:
source1_data: 数据源1的数据
source2_data: 数据源2的数据
source1_name: 数据源1名称
source2_name: 数据源2名称
key_fields: 主键字段列表
compare_fields: 需要对比的字段列表None表示全部字段
Returns:
验证结果字典
"""
print(f"\n{'='*70}")
print(f"交叉验证: {source1_name} vs {source2_name}")
print(f"{'='*70}")
# 构建字典key -> record
source1_dict = {}
for record in source1_data:
key = self.generate_record_key(record, key_fields)
source1_dict[key] = record
source2_dict = {}
for record in source2_data:
key = self.generate_record_key(record, key_fields)
source2_dict[key] = record
# 查找差异
only_in_source1 = set(source1_dict.keys()) - set(source2_dict.keys())
only_in_source2 = set(source2_dict.keys()) - set(source1_dict.keys())
common_keys = set(source1_dict.keys()) & set(source2_dict.keys())
# 对比共同记录的字段值
field_mismatches = []
for key in common_keys:
record1 = source1_dict[key]
record2 = source2_dict[key]
# 确定要比对的字段
if compare_fields:
fields_to_compare = compare_fields
else:
fields_to_compare = set(record1.keys()) & set(record2.keys())
# 比对每个字段
mismatches_in_record = {}
for field in fields_to_compare:
val1 = record1.get(field, '')
val2 = record2.get(field, '')
# 类型转换和标准化
val1_normalized = self._normalize_value(val1)
val2_normalized = self._normalize_value(val2)
if val1_normalized != val2_normalized:
mismatches_in_record[field] = {
source1_name: val1,
source2_name: val2
}
if mismatches_in_record:
field_mismatches.append({
'key': key,
'fields': mismatches_in_record
})
# 输出结果
result = {
'source1': source1_name,
'source2': source2_name,
'source1_count': len(source1_data),
'source2_count': len(source2_data),
'only_in_source1': list(only_in_source1),
'only_in_source2': list(only_in_source2),
'common_count': len(common_keys),
'field_mismatches': field_mismatches
}
print(f"记录数统计:")
print(f" {source1_name}: {len(source1_data)}")
print(f" {source2_name}: {len(source2_data)}")
print(f" 共同记录: {len(common_keys)}")
print(f" 仅在{source1_name}: {len(only_in_source1)}")
print(f" 仅在{source2_name}: {len(only_in_source2)}")
print(f" 字段不匹配: {len(field_mismatches)}")
# 显示详细差异
if only_in_source1:
print(f"\n仅在{source1_name}中的记录前5条:")
for key in list(only_in_source1)[:5]:
print(f" - {key}")
if only_in_source2:
print(f"\n仅在{source2_name}中的记录前5条:")
for key in list(only_in_source2)[:5]:
print(f" - {key}")
if field_mismatches:
print(f"\n字段值不匹配的记录前3条:")
for mismatch in field_mismatches[:3]:
print(f" 记录: {mismatch['key']}")
for field, values in list(mismatch['fields'].items())[:5]: # 每条记录最多显示5个字段
print(f" 字段 {field}:")
print(f" {source1_name}: {values[source1_name]}")
print(f" {source2_name}: {values[source2_name]}")
return result
def _normalize_value(self, value: Any) -> str:
"""标准化值用于比对
Args:
value: 原始值
Returns:
标准化后的字符串
"""
if value is None or value == '':
return ''
# 浮点数保留4位小数
if isinstance(value, float):
return f"{value:.4f}"
# 整数转字符串
if isinstance(value, int):
return str(value)
# 字符串去除首尾空白
return str(value).strip()
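标准化的意义在于让来自 JSON/CSV/数据库的同一值(类型可能不同)落到同一字符串表示上,示意如下:

```python
def normalize(value):
    # 与 _normalize_value 相同的约定:空值统一、浮点数定长、其余转字符串去空白
    if value is None or value == '':
        return ''
    if isinstance(value, float):
        return f"{value:.4f}"
    if isinstance(value, int):
        return str(value)
    return str(value).strip()

assert normalize(None) == normalize('')     # 空值统一为空串
assert normalize(7) == normalize(' 7 ')     # 整数与带空白的字符串等价
assert normalize(0.5) == '0.5000'           # 浮点数固定4位小数
```

注意:整数 7 与浮点 7.0 会标准化为不同字符串('7' 与 '7.0000'),这是当前实现的既有行为,比对字段时需保证两侧数据源的数值类型一致。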
def validate_ai_statistics(self, sources: List[str]) -> bool:
"""验证 ai_statistics 表数据
Args:
sources: 数据源列表 ['json', 'csv', 'database']
Returns:
验证是否通过
"""
print(f"\n{'#'*70}")
print(f"# 验证 ai_statistics 表数据")
print(f"# 日期: {self.date_str}")
print(f"{'#'*70}")
# 主键字段
key_fields = ['author_name', 'channel']
# 重要字段
compare_fields = [
'submission_count', 'read_count', 'comment_count', 'comment_rate',
'like_count', 'like_rate', 'favorite_count', 'favorite_rate',
'share_count', 'share_rate', 'slide_ratio', 'baidu_search_volume'
]
# 加载数据
data_sources = {}
if 'json' in sources:
json_data = self.load_json_data()
if json_data:
# 确保json_data是列表类型
if not isinstance(json_data, list):
json_data = [json_data]
# 从JSON提取 ai_statistics 数据
json_records = self._extract_ai_statistics_from_json(json_data)
data_sources['json'] = json_records
if 'csv' in sources:
csv_data = self.load_csv_data('ai_statistics.csv')
if csv_data:
data_sources['csv'] = csv_data
if 'database' in sources:
db_data = self.load_database_data('ai_statistics', date_filter='date')
if db_data:
data_sources['database'] = db_data
# 执行验证
if len(data_sources) < 2:
print(f"[X] 数据源不足至少需要2个数据源进行比对")
return False
# 两两比对
source_names = list(data_sources.keys())
all_passed = True
for i in range(len(source_names)):
for j in range(i + 1, len(source_names)):
source1_name = source_names[i]
source2_name = source_names[j]
# 只对 json vs csv 进行顺序验证
if (source1_name == 'json' and source2_name == 'csv') or \
(source1_name == 'csv' and source2_name == 'json'):
# 顺序验证
order_result = self.validate_order(
data_sources[source1_name],
data_sources[source2_name],
source1_name,
source2_name,
key_fields
)
self.validation_results['顺序验证'].append(order_result)
if not order_result['order_match']:
all_passed = False
# 交叉验证(所有组合都执行)
cross_result = self.validate_cross(
data_sources[source1_name],
data_sources[source2_name],
source1_name,
source2_name,
key_fields,
compare_fields
)
self.validation_results['交叉验证'].append(cross_result)
# 判断是否通过
if cross_result['only_in_source1'] or \
cross_result['only_in_source2'] or \
cross_result['field_mismatches']:
all_passed = False
return all_passed
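上面的两层 for 循环等价于对数据源名称取两两组合,可用标准库直接表达:

```python
from itertools import combinations

# 与 validate_ai_statistics 中的嵌套循环等价:i < j 的全部数据源配对
names = ['json', 'csv', 'database']
pairs = list(combinations(names, 2))
assert pairs == [('json', 'csv'), ('json', 'database'), ('csv', 'database')]
```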
def validate_ai_statistics_day(self, sources: List[str]) -> bool:
"""验证 ai_statistics_day 表数据
Args:
sources: 数据源列表
Returns:
验证是否通过
"""
print(f"\n{'#'*70}")
print(f"# 验证 ai_statistics_day 表数据")
print(f"# 日期: {self.date_str}")
print(f"{'#'*70}")
key_fields = ['author_name', 'channel', 'stat_date']
compare_fields = [
'total_submission_count', 'total_read_count', 'total_comment_count',
'total_like_count', 'total_favorite_count', 'total_share_count',
'avg_comment_rate', 'avg_like_rate', 'avg_favorite_rate',
'avg_share_rate', 'avg_slide_ratio', 'total_baidu_search_volume'
]
# 加载数据
data_sources = {}
if 'csv' in sources:
csv_data = self.load_csv_data('ai_statistics_day.csv')
if csv_data:
data_sources['csv'] = csv_data
if 'database' in sources:
db_data = self.load_database_data('ai_statistics_day', date_filter='stat_date')
if db_data:
data_sources['database'] = db_data
if len(data_sources) < 2:
print(f"[X] 数据源不足")
return False
# 执行验证
source_names = list(data_sources.keys())
all_passed = True
for i in range(len(source_names)):
for j in range(i + 1, len(source_names)):
source1_name = source_names[i]
source2_name = source_names[j]
# ai_statistics_day 表不需要顺序验证,只执行交叉验证
cross_result = self.validate_cross(
data_sources[source1_name],
data_sources[source2_name],
source1_name,
source2_name,
key_fields,
compare_fields
)
self.validation_results['交叉验证'].append(cross_result)
if cross_result['only_in_source1'] or \
cross_result['only_in_source2'] or \
cross_result['field_mismatches']:
all_passed = False
return all_passed
def _extract_ai_statistics_from_json(self, json_data: List[Dict]) -> List[Dict]:
"""从JSON数据中提取ai_statistics格式的数据
Args:
json_data: JSON数据
Returns:
ai_statistics格式的数据列表
"""
records = []
for account_data in json_data:
account_id = account_data.get('account_id', '')
if not account_id:
continue
analytics = account_data.get('analytics', {})
apis = analytics.get('apis', [])
if apis:
api_data = apis[0].get('data', {})
if api_data.get('errno') == 0:
total_info = api_data.get('data', {}).get('total_info', {})
record = {
'author_name': account_id,
'channel': 1,
'submission_count': int(total_info.get('publish_count', 0) or 0),
'read_count': int(total_info.get('view_count', 0) or 0),
'comment_count': int(total_info.get('comment_count', 0) or 0),
'comment_rate': float(total_info.get('comment_rate', 0) or 0) / 100,
'like_count': int(total_info.get('likes_count', 0) or 0),
'like_rate': float(total_info.get('likes_rate', 0) or 0) / 100,
'favorite_count': int(total_info.get('collect_count', 0) or 0),
'favorite_rate': float(total_info.get('collect_rate', 0) or 0) / 100,
'share_count': int(total_info.get('share_count', 0) or 0),
'share_rate': float(total_info.get('share_rate', 0) or 0) / 100,
'slide_ratio': float(total_info.get('pic_slide_rate', 0) or 0) / 100,
'baidu_search_volume': int(total_info.get('disp_pv', 0) or 0)
}
records.append(record)
return records
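提取逻辑可以用一个最小样例数据验证(字段名取自上文,样例数据为虚构;下面是单账号的简化示意版):

```python
def extract_one(account):
    # _extract_ai_statistics_from_json 的单账号简化示意
    apis = account.get('analytics', {}).get('apis', [])
    if not apis or apis[0].get('data', {}).get('errno') != 0:
        return None
    info = apis[0]['data'].get('data', {}).get('total_info', {})
    return {
        'author_name': account.get('account_id', ''),
        'read_count': int(info.get('view_count', 0) or 0),
        'like_rate': float(info.get('likes_rate', 0) or 0) / 100,  # 百分比 -> 小数
    }

sample = {
    'account_id': 'demo_author',
    'analytics': {'apis': [{'data': {'errno': 0,
        'data': {'total_info': {'view_count': '120', 'likes_rate': '3.5'}}}}]},
}
assert extract_one(sample) == {'author_name': 'demo_author',
                               'read_count': 120, 'like_rate': 0.035}
```

可以看到接口返回的计数/比率可能是字符串,提取时统一做了 int/float 转换,比率从百分比换算为小数。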
def generate_report(self, output_file: Optional[str] = None) -> None:
"""生成验证报告
Args:
output_file: 输出文件路径
"""
if not output_file:
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
output_file = os.path.join(self.script_dir, f'validation_report_{timestamp}.txt')
try:
with open(output_file, 'w', encoding='utf-8') as f:
f.write(f"数据验证报告\n")
f.write(f"{'='*70}\n")
f.write(f"生成时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
f.write(f"目标日期: {self.date_str}\n\n")
# 顺序验证结果
f.write(f"\n顺序验证结果\n")
f.write(f"{'-'*70}\n")
for result in self.validation_results['顺序验证']:
f.write(f"{result['source1']} vs {result['source2']}\n")
f.write(f" 顺序匹配: {'' if result['order_match'] else ''}\n")
f.write(f" {result['source1']} 记录数: {result['source1_count']}\n")
f.write(f" {result['source2']} 记录数: {result['source2_count']}\n")
if result['mismatches']:
f.write(f" 不匹配数: {len(result['mismatches'])}\n")
f.write(f"\n")
# 交叉验证结果
f.write(f"\n交叉验证结果\n")
f.write(f"{'-'*70}\n")
for result in self.validation_results['交叉验证']:
f.write(f"{result['source1']} vs {result['source2']}\n")
f.write(f" 共同记录: {result['common_count']}\n")
f.write(f" 仅在{result['source1']}: {len(result['only_in_source1'])}\n")
f.write(f" 仅在{result['source2']}: {len(result['only_in_source2'])}\n")
f.write(f" 字段不匹配: {len(result['field_mismatches'])}\n")
f.write(f"\n")
print(f"\n[OK] 验证报告已生成: {output_file}")
except Exception as e:
print(f"[X] 生成报告失败: {e}")
def main():
"""主函数"""
parser = argparse.ArgumentParser(
description='数据比对验证脚本',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
示例用法:
# 验证JSON和CSV
python data_validation.py --source json csv --date 2025-12-29
# 验证CSV和数据库
python data_validation.py --source csv database --date 2025-12-29
# 完整验证(三个数据源)
python data_validation.py --source json csv database --date 2025-12-29
# 验证特定表
python data_validation.py --source csv database --table ai_statistics_day --date 2025-12-29
"""
)
parser.add_argument(
'--source',
nargs='+',
choices=['json', 'csv', 'database'],
default=['json', 'csv', 'database'],
help='数据源列表至少2个'
)
parser.add_argument(
'--date',
type=str,
default=(datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d'),
help='目标日期 (YYYY-MM-DD),默认为昨天'
)
parser.add_argument(
'--table',
type=str,
choices=['ai_statistics', 'ai_statistics_day', 'ai_statistics_days'],
default='ai_statistics',
help='要验证的表名'
)
parser.add_argument(
'--report',
type=str,
help='输出报告文件路径'
)
args = parser.parse_args()
# 检查数据源数量
if len(args.source) < 2:
print("[X] 至少需要指定2个数据源进行比对")
return 1
# 创建验证器
validator = DataValidator(date_str=args.date)
# 执行验证
try:
if args.table == 'ai_statistics':
passed = validator.validate_ai_statistics(args.source)
elif args.table == 'ai_statistics_day':
passed = validator.validate_ai_statistics_day(args.source)
else:
print(f"[!] 表 {args.table} 的验证功能暂未实现")
passed = False
# 生成报告
validator.generate_report(args.report)
# 输出总结
print(f"\n{'='*70}")
if passed:
print(f"[✓] 验证通过:所有数据源数据一致")
else:
print(f"[X] 验证失败:发现数据差异")
print(f"{'='*70}")
return 0 if passed else 1
except Exception as e:
print(f"\n[X] 验证过程出错: {e}")
import traceback
traceback.print_exc()
return 1
if __name__ == '__main__':
sys.exit(main())

441
data_validation_with_sms.py Normal file

@@ -0,0 +1,441 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
数据验证与短信告警集成脚本
功能:
1. 执行数据验证JSON/CSV/数据库)
2. 如果验证失败,发送阿里云短信告警
3. 支持定时任务调度每天9点执行
使用方法:
# 手动执行一次验证
python data_validation_with_sms.py
# 指定日期验证
python data_validation_with_sms.py --date 2025-12-29
# 配置定时任务Windows任务计划程序
python data_validation_with_sms.py --setup-schedule
"""
import sys
import os
import json
from datetime import datetime, timedelta
from typing import Dict, List, Optional
# 添加项目根目录到路径
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
# 导入数据验证模块
from data_validation import DataValidator
# 阿里云短信SDK导入
try:
from alibabacloud_dysmsapi20170525.client import Client as Dysmsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_credentials.models import Config as CredentialConfig
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dysmsapi20170525 import models as dysmsapi_20170525_models
from alibabacloud_tea_util import models as util_models
SMS_AVAILABLE = True
except ImportError:
print("[!] 阿里云短信SDK未安装短信功能将不可用")
print(" 安装命令: pip install alibabacloud_dysmsapi20170525")
SMS_AVAILABLE = False
class SMSAlertConfig:
"""短信告警配置"""
def __init__(self):
"""从配置文件或环境变量加载配置"""
# 尝试从配置文件加载
config_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'sms_config.json')
config_data = {}
if os.path.exists(config_file):
try:
with open(config_file, 'r', encoding='utf-8') as f:
config_data = json.load(f)
except Exception as e:
print(f"[!] 读取配置文件失败: {e}")
# 阿里云访问凭据(优先使用环境变量)
self.ACCESS_KEY_ID = os.environ.get(
'ALIBABA_CLOUD_ACCESS_KEY_ID',
config_data.get('access_key_id', 'LTAI5tSMvnCJdqkZtCVWgh8R')
)
self.ACCESS_KEY_SECRET = os.environ.get(
'ALIBABA_CLOUD_ACCESS_KEY_SECRET',
config_data.get('access_key_secret', 'nyFzXyIi47peVLK4wR2qqbPezmU79W')
)
# 短信签名和模板
self.SIGN_NAME = config_data.get('sign_name', '北京乐航时代科技')
self.TEMPLATE_CODE = config_data.get('template_code', 'SMS_486210104')
# 接收短信的手机号(多个号码用逗号分隔)
self.PHONE_NUMBERS = config_data.get('phone_numbers', '13621242430')
# 短信endpoint
self.ENDPOINT = config_data.get('endpoint', 'dysmsapi.aliyuncs.com')
@staticmethod
def get_instance():
"""获取配置实例(单例模式)"""
if not hasattr(SMSAlertConfig, '_instance'):
SMSAlertConfig._instance = SMSAlertConfig()
return SMSAlertConfig._instance
class DataValidationWithSMS:
"""数据验证与短信告警集成器"""
def __init__(self, date_str: Optional[str] = None):
"""初始化
Args:
date_str: 目标日期 (YYYY-MM-DD),默认为昨天
"""
self.validator = DataValidator(date_str)
self.sms_client = None
self.sms_config = SMSAlertConfig.get_instance()
if SMS_AVAILABLE:
self.sms_client = self._create_sms_client()
def _create_sms_client(self) -> Optional[Dysmsapi20170525Client]:
"""创建阿里云短信客户端
Returns:
短信客户端实例
"""
try:
credential_config = CredentialConfig(
type='access_key',
access_key_id=self.sms_config.ACCESS_KEY_ID,
access_key_secret=self.sms_config.ACCESS_KEY_SECRET
)
credential = CredentialClient(credential_config)
config = open_api_models.Config(
credential=credential,
endpoint=self.sms_config.ENDPOINT
)
return Dysmsapi20170525Client(config)
except Exception as e:
print(f"[X] 创建短信客户端失败: {e}")
return None
def send_sms_alert(self, error_code: str, error_details: str) -> bool:
"""发送短信告警
Args:
error_code: 错误代码(如 "2222"
error_details: 错误详情
Returns:
是否发送成功
"""
if not self.sms_client:
print(f"[X] 短信客户端未初始化,无法发送告警")
return False
try:
# 构建短信请求
send_sms_request = dysmsapi_20170525_models.SendSmsRequest(
phone_numbers=self.sms_config.PHONE_NUMBERS,
sign_name=self.sms_config.SIGN_NAME,
template_code=self.sms_config.TEMPLATE_CODE,
template_param=json.dumps({"code": error_code})
)
runtime = util_models.RuntimeOptions()
print(f"\n[短信] 正在发送告警短信...")
print(f" 接收号码: {self.sms_config.PHONE_NUMBERS}")
print(f" 错误代码: {error_code}")
print(f" 错误详情: {error_details[:100]}...")
# 发送短信
resp = self.sms_client.send_sms_with_options(send_sms_request, runtime)
# 检查响应
result = resp.to_map()
if result.get('body', {}).get('Code') == 'OK':
print(f"[✓] 短信发送成功")
print(f" 请求ID: {result.get('body', {}).get('RequestId')}")
print(f" 消息ID: {result.get('body', {}).get('BizId')}")
return True
else:
print(f"[X] 短信发送失败")
print(f" 错误码: {result.get('body', {}).get('Code')}")
print(f" 错误信息: {result.get('body', {}).get('Message')}")
return False
except Exception as e:
print(f"[X] 发送短信异常: {e}")
if hasattr(e, 'data') and e.data:
print(f" 诊断地址: {e.data.get('Recommend')}")
return False
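注意:按上面的实现,短信模板参数只携带错误代码,error_details 仅用于本地日志、不会随短信发出。模板参数的构造方式示意如下(模板本身只接收 code 一项,为本项目的既有假设):

```python
import json

# send_sms_alert 中 template_param 的构造:只有 code 一个占位符
template_param = json.dumps({"code": "2222"})
assert template_param == '{"code": "2222"}'
```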
def run_validation(self, sources: Optional[List[str]] = None, table: str = 'ai_statistics') -> bool:
"""执行数据验证
Args:
sources: 数据源列表,默认 ['json', 'csv', 'database']
table: 要验证的表名
Returns:
验证是否通过
"""
if sources is None:
sources = ['json', 'csv', 'database']
print(f"\n{'='*70}")
print(f"数据验证与短信告警")
print(f"{'='*70}")
print(f"验证日期: {self.validator.date_str}")
print(f"验证表: {table}")
print(f"数据源: {', '.join(sources)}")
print(f"{'='*70}")
try:
# 执行验证
if table == 'ai_statistics':
passed = self.validator.validate_ai_statistics(sources)
elif table == 'ai_statistics_day':
passed = self.validator.validate_ai_statistics_day(sources)
elif table == 'ai_statistics_days':
# TODO: 实现 ai_statistics_days 验证
print(f"[!] 表 {table} 的验证功能暂未实现")
passed = False
else:
print(f"[X] 未知的表名: {table}")
passed = False
return passed
except Exception as e:
print(f"\n[X] 验证过程出错: {e}")
import traceback
traceback.print_exc()
return False
def generate_error_summary(self) -> str:
"""生成错误摘要信息
Returns:
错误摘要字符串
"""
results = self.validator.validation_results
summary_lines = []
summary_lines.append(f"日期: {self.validator.date_str}")
# 顺序验证错误
order_errors = [r for r in results['顺序验证'] if not r['order_match']]
if order_errors:
summary_lines.append(f"顺序不一致: {len(order_errors)}")
# 交叉验证错误
cross_errors = []
for r in results['交叉验证']:
if r['only_in_source1'] or r['only_in_source2'] or r['field_mismatches']:
cross_errors.append(r)
if cross_errors:
summary_lines.append(f"数据不一致: {len(cross_errors)}")
# 统计详情
total_missing = sum(len(r['only_in_source1']) for r in cross_errors)
total_extra = sum(len(r['only_in_source2']) for r in cross_errors)
total_diff = sum(len(r['field_mismatches']) for r in cross_errors)
if total_missing:
summary_lines.append(f" 缺失记录: {total_missing}")
if total_extra:
summary_lines.append(f" 多余记录: {total_extra}")
if total_diff:
summary_lines.append(f" 字段差异: {total_diff}")
return '; '.join(summary_lines)
def run_with_alert(self, sources: Optional[List[str]] = None, table: str = 'ai_statistics') -> int:
"""执行验证并在失败时发送告警
Args:
sources: 数据源列表
table: 要验证的表名
Returns:
退出码0=成功1=失败)
"""
# 执行验证
passed = self.run_validation(sources, table)
# 创建验证报告目录
script_dir = os.path.dirname(os.path.abspath(__file__))
validation_reports_dir = os.path.join(script_dir, 'validation_reports')
if not os.path.exists(validation_reports_dir):
os.makedirs(validation_reports_dir)
# 生成报告
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
report_file = os.path.join(
validation_reports_dir,
f'validation_report_{timestamp}.txt'
)
self.validator.generate_report(report_file)
# 判断是否需要发送告警
if not passed:
print(f"\n{'='*70}")
print(f"[!] 验证失败,准备发送短信告警")
print(f"{'='*70}")
# 生成错误摘要
error_summary = self.generate_error_summary()
# 发送短信(错误代码固定为 "2222"
sms_sent = self.send_sms_alert("2222", error_summary)
if sms_sent:
print(f"\n[✓] 告警短信已发送")
else:
print(f"\n[X] 告警短信发送失败")
print(f"\n详细报告: {report_file}")
return 1
else:
print(f"\n{'='*70}")
print(f"[✓] 验证通过,无需发送告警")
print(f"{'='*70}")
return 0
def setup_windows_task_scheduler():
"""配置Windows任务计划程序每天9点执行"""
print(f"\n{'='*70}")
print(f"配置Windows任务计划程序")
print(f"{'='*70}")
script_path = os.path.abspath(__file__)
python_path = sys.executable
# 任务计划名称
task_name = "DataValidationWithSMS"
print(f"\n请手动创建Windows任务计划或使用以下PowerShell命令\n")
# PowerShell命令
ps_command = f"""
# 创建任务计划
$action = New-ScheduledTaskAction -Execute '{python_path}' -Argument '{script_path}'
$trigger = New-ScheduledTaskTrigger -Daily -At 9:00AM
$settings = New-ScheduledTaskSettingsSet -AllowStartIfOnBatteries -DontStopIfGoingOnBatteries
$principal = New-ScheduledTaskPrincipal -UserId "$env:USERNAME" -RunLevel Highest
Register-ScheduledTask -TaskName "{task_name}" -Action $action -Trigger $trigger -Settings $settings -Principal $principal -Description "每天9点执行数据验证并发送短信告警"
Write-Host "任务计划已创建: {task_name}"
"""
print(ps_command)
print(f"\n或者手动配置:")
print(f"1. 打开 '任务计划程序' (taskschd.msc)")
print(f"2. 创建基本任务")
print(f"3. 名称: {task_name}")
print(f"4. 触发器: 每天 上午9:00")
print(f"5. 操作: 启动程序")
print(f"6. 程序: {python_path}")
print(f"7. 参数: {script_path}")
print(f"8. 完成")
print(f"\n{'='*70}")
def main():
"""主函数"""
import argparse
parser = argparse.ArgumentParser(
description='数据验证与短信告警集成脚本',
formatter_class=argparse.RawDescriptionHelpFormatter
)
parser.add_argument(
'--date',
type=str,
help='目标日期 (YYYY-MM-DD),默认为昨天'
)
parser.add_argument(
'--source',
nargs='+',
choices=['json', 'csv', 'database'],
default=['json', 'csv', 'database'],
help='数据源列表'
)
parser.add_argument(
'--table',
type=str,
choices=['ai_statistics', 'ai_statistics_day', 'ai_statistics_days'],
default='ai_statistics',
help='要验证的表名'
)
parser.add_argument(
'--setup-schedule',
action='store_true',
help='配置定时任务每天9点执行'
)
parser.add_argument(
'--test-sms',
action='store_true',
help='测试短信发送功能'
)
parser.add_argument(
'--no-sms',
action='store_true',
help='禁用短信发送(仅验证数据)'
)
args = parser.parse_args()
# 配置定时任务
if args.setup_schedule:
setup_windows_task_scheduler()
return 0
# 测试短信
if args.test_sms:
print(f"\n{'='*70}")
print(f"测试短信发送功能")
print(f"{'='*70}")
validator = DataValidationWithSMS()
success = validator.send_sms_alert(
"2222",
"这是一条测试短信,数据验证系统运行正常"
)
return 0 if success else 1
# 执行验证
try:
validator = DataValidationWithSMS(date_str=args.date)
return validator.run_with_alert(args.source, args.table)
except Exception as e:
print(f"\n[X] 程序执行失败: {e}")
import traceback
traceback.print_exc()
return 1
if __name__ == '__main__':
sys.exit(main())


@@ -28,70 +28,19 @@ CREATE TABLE `ai_statistics_days` (
`channel` tinyint(1) NOT NULL DEFAULT 1 COMMENT '1=baidu|2=toutiao|3=weixin',
`stat_date` date NOT NULL COMMENT '统计日期(自然日)',
`daily_published_count` int NULL DEFAULT 0 COMMENT '单日发文量',
`cumulative_published_count` int NULL DEFAULT 0 COMMENT '累计发文量(从起始日到stat_date的总和',
`monthly_revenue` decimal(18, 2) NULL DEFAULT 0.00 COMMENT '月收益stat_date所在自然月的总收益',
`weekly_revenue` decimal(18, 2) NULL DEFAULT 0.00 COMMENT '当周收益stat_date所在自然周的总收益周一至周日',
`revenue_mom_growth_rate` decimal(10, 6) NULL DEFAULT 0.000000 COMMENT '收益月环比增长率((本月收益 - 上月收益) / NULLIF(上月收益, 0)',
`revenue_wow_growth_rate` decimal(10, 6) NULL DEFAULT 0.000000 COMMENT '收益周环比增长率((本周收益 - 上周收益) / NULLIF(上周收益, 0)',
`cumulative_published_count` int NULL DEFAULT 0 COMMENT '累计发文量(当月1号至stat_date的累计发文量',
`day_revenue` decimal(18, 2) NULL DEFAULT 0.00 COMMENT '收益',
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`) USING BTREE,
UNIQUE INDEX `uk_stat_date`(`stat_date` ASC) USING BTREE,
INDEX `idx_stat_date`(`stat_date` ASC) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 51 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci COMMENT = 'AI内容每日核心指标汇总表含累计、收益及环比' ROW_FORMAT = Dynamic;
UNIQUE INDEX `uk_author_stat_date`(`author_id` ASC, `channel` ASC, `stat_date` ASC) USING BTREE,
INDEX `idx_stat_date`(`stat_date` ASC) USING BTREE,
INDEX `idx_author_id`(`author_id` ASC) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci COMMENT = 'AI内容每日核心指标汇总表日粒度数据' ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of ai_statistics_days
-- ----------------------------
INSERT INTO `ai_statistics_days` VALUES (1, 129, '梁金宇医生', 1, '2025-10-28', 27, 27, 198.44, 198.44, 0.000000, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (2, 127, '黄燕飞医生', 1, '2025-10-29', 6, 33, 382.29, 382.29, 0.000000, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (3, 151, '皮肤科赵鹏', 1, '2025-10-30', 30, 63, 1317.62, 1317.62, 0.000000, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (4, 132, '石爱真医生', 1, '2025-10-31', 22, 85, 1435.84, 1435.84, 0.000000, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (5, 211, '中医王倚东', 1, '2025-11-01', 27, 112, 116.15, 1551.99, -0.919107, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (6, 176, '武娜中医', 1, '2025-11-02', 11, 123, 1025.18, 2461.02, -0.286007, 0.000000, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (7, 193, '血管外科钟若雷', 1, '2025-11-03', 6, 129, 1462.23, 437.05, 0.018379, -0.822411, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (8, 104, '男科医生刘德风', 1, '2025-11-04', 5, 134, 2050.55, 1025.37, 0.428119, -0.583356, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (9, 175, '静脉曲张的杀手医生', 1, '2025-11-05', 12, 146, 3004.99, 1979.81, 1.092845, -0.195533, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (10, 202, '整形外科侯丽平', 1, '2025-11-06', 26, 172, 3260.49, 2235.31, 1.270789, -0.091714, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (11, 117, '唐小明医生', 1, '2025-11-07', 13, 185, 4064.21, 3039.03, 1.830545, 0.234866, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (12, 214, '传海2018', 1, '2025-11-08', 12, 197, 4961.73, 3936.55, 2.455629, 0.599560, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (13, 170, '任志宏中医', 1, '2025-11-09', 18, 215, 5160.70, 4135.52, 2.594203, 0.680409, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (14, 179, '风湿免疫专家李小峰', 1, '2025-11-10', 18, 233, 5794.59, 633.89, 3.035679, -0.846721, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (15, 202, '整形外科侯丽平', 1, '2025-11-11', 14, 247, 6673.98, 1513.28, 3.648136, -0.634077, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (16, 203, '中医针灸侯医生', 1, '2025-11-12', 19, 266, 7412.32, 2251.62, 4.162358, -0.455541, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (17, 217, '中医杨志杰', 1, '2025-11-13', 24, 290, 7641.13, 2480.43, 4.321714, -0.400213, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (18, 115, '冯玉燕医生', 1, '2025-11-14', 25, 315, 8384.03, 3223.33, 4.839112, -0.220574, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (19, 184, '耳鼻喉科贾闯医生', 1, '2025-11-15', 22, 337, 9067.99, 3907.29, 5.315460, -0.055188, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (20, 131, '骆小辉副主任医师', 1, '2025-11-16', 14, 351, 9538.20, 4377.50, 5.642941, 0.058513, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (21, 130, '妇产科许春艳', 1, '2025-11-17', 9, 360, 9827.47, 289.27, 5.844405, -0.933919, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (22, 211, '中医王倚东', 1, '2025-11-18', 19, 379, 10482.77, 944.57, 6.300793, -0.784222, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (23, 201, '面部提升梁永鑫', 1, '2025-11-19', 22, 401, 11126.90, 1588.70, 6.749401, -0.637076, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (24, 181, '针灸科高小勇医生', 1, '2025-11-20', 8, 409, 11849.59, 2311.39, 7.252723, -0.471984, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (25, 122, '高丽娜中医', 1, '2025-11-21', 6, 415, 12167.82, 2629.62, 7.474356, -0.399287, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (26, 105, '抗衰孟大夫', 1, '2025-11-22', 29, 444, 12921.45, 3383.25, 7.999227, -0.227127, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (27, 111, '赖婷医生', 1, '2025-11-23', 13, 457, 13852.86, 4314.66, 8.647913, -0.014355, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (28, 214, '传海2018', 1, '2025-11-24', 23, 480, 14590.97, 738.11, 9.161975, -0.828930, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (29, 226, '中医李伟杰', 1, '2025-11-25', 27, 507, 14899.04, 1046.18, 9.376532, -0.757529, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (30, 180, '尹海琴医生', 1, '2025-11-26', 26, 533, 15860.20, 2007.34, 10.045938, -0.534763, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (31, 136, '普外科马春雷', 1, '2025-11-27', 15, 548, 16177.26, 2324.40, 10.266757, -0.461279, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (32, 241, '测试作者_更新', 1, '2025-11-28', 8, 556, 16606.60, 2753.74, 10.565773, -0.361771, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (33, 153, '曹凤娇中医', 1, '2025-11-29', 24, 580, 16946.60, 3093.74, 10.802569, -0.282970, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (34, 175, '静脉曲张的杀手医生', 1, '2025-11-30', 20, 600, 17569.87, 3717.01, 11.236649, -0.138516, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (35, 173, '赵剑锋医生', 1, '2025-12-01', 15, 615, 687.78, 687.78, -0.960855, -0.814964, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (36, 196, '血管外科阿力木', 1, '2025-12-02', 22, 637, 1298.68, 1298.68, -0.926085, -0.650612, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (37, 248, '百胜号', 2, '2025-12-03', 7, 644, 1620.63, 1620.63, -0.907761, -0.563996, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (38, 185, '郭俊恒中医', 1, '2025-12-04', 18, 662, 2172.80, 2172.80, -0.876334, -0.415444, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (39, 104, '男科医生刘德风', 1, '2025-12-05', 18, 680, 2813.87, 2813.87, -0.839847, -0.242975, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (40, 214, '传海2018', 1, '2025-12-06', 15, 695, 3393.18, 3393.18, -0.806875, -0.087121, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (41, 245, '文龙号', 2, '2025-12-07', 20, 715, 4382.30, 4382.30, -0.750579, 0.178985, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (42, 198, '拇外翻医生李昕宇', 1, '2025-12-08', 5, 720, 4487.00, 104.70, -0.744620, -0.976108, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (43, 175, '静脉曲张的杀手医生', 1, '2025-12-09', 16, 736, 4628.11, 245.81, -0.736588, -0.943908, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (44, 110, '白凌文医生', 1, '2025-12-10', 10, 746, 5393.95, 1011.65, -0.693000, -0.769151, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (45, 141, '耳鼻喉科杨书勋医生', 1, '2025-12-11', 7, 753, 5897.48, 1515.18, -0.664341, -0.654250, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (46, 226, '中医李伟杰', 1, '2025-12-12', 11, 764, 6830.48, 2448.18, -0.611239, -0.441348, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (47, 183, '杜晋芳中医', 1, '2025-12-13', 22, 786, 7500.72, 3118.42, -0.573092, -0.288406, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (48, 192, '整形医生路会', 1, '2025-12-14', 26, 812, 7738.47, 3356.17, -0.559560, -0.234153, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (49, 146, 'Dr蓝剑雄', 1, '2025-12-15', 12, 824, 8072.01, 333.54, -0.540577, -0.900619, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
INSERT INTO `ai_statistics_days` VALUES (50, 241, '测试作者_更新', 1, '2025-12-16', 14, 838, 8548.49, 810.02, -0.513457, -0.758648, '2025-12-16 10:53:58', '2025-12-16 11:28:19');
-- 数据已清空由导入脚本从CSV文件导入
SET FOREIGN_KEY_CHECKS = 1;


@@ -0,0 +1,45 @@
/*
Navicat Premium Dump SQL
Source Server : mixue
Source Server Type : MySQL
Source Server Version : 90001 (9.0.1)
Source Host : localhost:3306
Source Schema : ai_article
Target Server Type : MySQL
Target Server Version : 90001 (9.0.1)
File Encoding : 65001
Date: 25/12/2025 14:30:00
*/
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for ai_statistics_monthly
-- ----------------------------
DROP TABLE IF EXISTS `ai_statistics_monthly`;
CREATE TABLE `ai_statistics_monthly` (
`id` bigint NOT NULL AUTO_INCREMENT COMMENT '自增主键',
`author_id` int NOT NULL DEFAULT 0 COMMENT '作者ID',
`author_name` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '作者名称',
`channel` tinyint(1) NOT NULL DEFAULT 1 COMMENT '1=baidu|2=toutiao|3=weixin',
`stat_monthly` varchar(7) NOT NULL COMMENT '统计月份格式YYYY-MM如2025-12表示2025年12月',
`monthly_revenue` decimal(18, 2) NULL DEFAULT 0.00 COMMENT '当月收益stat_monthly所在自然月的总收益',
`revenue_mom_growth_rate` decimal(10, 6) NULL DEFAULT 0.000000 COMMENT '收益月环比增长率:(本月收益 - 上月收益) / NULLIF(上月收益, 0)',
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`) USING BTREE,
UNIQUE INDEX `uk_author_stat_date`(`author_id` ASC, `stat_monthly` ASC) USING BTREE,
INDEX `idx_stat_date`(`stat_monthly` ASC) USING BTREE,
INDEX `idx_author_id`(`author_id` ASC) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci COMMENT = 'AI内容每月核心指标汇总表月粒度数据' ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of ai_statistics_monthly
-- ----------------------------
-- 数据由导入脚本从CSV文件自动导入
SET FOREIGN_KEY_CHECKS = 1;
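The `revenue_mom_growth_rate` column comment above encodes the SQL expression `(本月收益 - 上月收益) / NULLIF(上月收益, 0)`. A minimal Python sketch of the same guard (the function name is illustrative, not part of the codebase):

```python
def mom_growth_rate(current: float, previous: float):
    """Month-over-month growth, mirroring (cur - prev) / NULLIF(prev, 0).

    Returns None when last month's revenue is 0, the same way NULLIF
    turns the division into NULL instead of a divide-by-zero error.
    """
    if previous == 0:
        return None
    # round to 6 decimal places to match the decimal(10, 6) column
    return round((current - previous) / previous, 6)
```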


@@ -0,0 +1,45 @@
/*
Navicat Premium Dump SQL
Source Server : mixue
Source Server Type : MySQL
Source Server Version : 90001 (9.0.1)
Source Host : localhost:3306
Source Schema : ai_article
Target Server Type : MySQL
Target Server Version : 90001 (9.0.1)
File Encoding : 65001
Date: 25/12/2025 14:30:00
*/
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for ai_statistics_weekly
-- ----------------------------
DROP TABLE IF EXISTS `ai_statistics_weekly`;
CREATE TABLE `ai_statistics_weekly` (
`id` bigint NOT NULL AUTO_INCREMENT COMMENT '自增主键',
`author_id` int NOT NULL DEFAULT 0 COMMENT '作者ID',
`author_name` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '作者名称',
`channel` tinyint(1) NOT NULL DEFAULT 1 COMMENT '1=baidu|2=toutiao|3=weixin',
`stat_weekly` varchar(2) NOT NULL COMMENT '统计周次格式WW如51表示第51周',
`weekly_revenue` decimal(18, 2) NULL DEFAULT 0.00 COMMENT '当周收益stat_weekly所在自然周的总收益周一至周日',
`revenue_wow_growth_rate` decimal(10, 6) NULL DEFAULT 0.000000 COMMENT '收益周环比增长率:(本周收益 - 上周收益) / NULLIF(上周收益, 0)',
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`) USING BTREE,
UNIQUE INDEX `uk_author_stat_date`(`author_id` ASC, `stat_weekly` ASC) USING BTREE,
INDEX `idx_stat_date`(`stat_weekly` ASC) USING BTREE,
INDEX `idx_author_id`(`author_id` ASC) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci COMMENT = 'AI内容每周核心指标汇总表周粒度数据' ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of ai_statistics_weekly
-- ----------------------------
-- 数据由导入脚本从CSV文件自动导入
SET FOREIGN_KEY_CHECKS = 1;
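`stat_weekly` stores only a zero-padded week number such as `51`, with no year component (note the `uk_author_stat_date` unique key can therefore collide across years). Assuming ISO week numbering, the value can be derived as:

```python
from datetime import date

def stat_weekly(d: date) -> str:
    """ISO week number, zero-padded to fit the varchar(2) column."""
    return f"{d.isocalendar()[1]:02d}"
```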

deploy_daemon.sh Normal file

@@ -0,0 +1,105 @@
#!/bin/bash
# 数据同步守护进程部署脚本Linux systemd
echo "============================================================"
echo "百家号数据同步守护进程 - 部署脚本"
echo "含数据验证与短信告警功能"
echo "============================================================"
echo ""
# 检查是否为root用户
if [ "$EUID" -ne 0 ]; then
echo "[错误] 请使用root用户运行此脚本"
echo " sudo bash deploy_daemon.sh"
exit 1
fi
# 项目目录(根据实际情况修改)
PROJECT_DIR="/root/xhh_baijiahao"
SERVICE_NAME="bjh_daemon"
echo "[1/6] 检查项目目录..."
if [ ! -d "$PROJECT_DIR" ]; then
echo "[错误] 项目目录不存在: $PROJECT_DIR"
exit 1
fi
echo " 项目目录: $PROJECT_DIR"
echo ""
echo "[2/6] 检查Python依赖..."
cd "$PROJECT_DIR"
python3 -c "import schedule" 2>/dev/null
if [ $? -ne 0 ]; then
echo " 安装 schedule 模块..."
pip3 install schedule
fi
python3 -c "from data_validation_with_sms import DataValidationWithSMS" 2>/dev/null
if [ $? -ne 0 ]; then
echo "[警告] 数据验证模块检查失败,请确保以下文件存在:"
echo " - data_validation.py"
echo " - data_validation_with_sms.py"
echo " - sms_config.json"
fi
python3 -c "from alibabacloud_dysmsapi20170525.client import Client" 2>/dev/null
if [ $? -ne 0 ]; then
echo " 安装阿里云短信SDK..."
pip3 install alibabacloud_dysmsapi20170525 alibabacloud_credentials alibabacloud_tea_openapi alibabacloud_tea_util
fi
echo ""
echo "[3/6] 配置systemd服务..."
# 复制服务文件
cp "$PROJECT_DIR/bjh_daemon.service" /etc/systemd/system/
chmod 644 /etc/systemd/system/bjh_daemon.service
# 重新加载systemd配置
systemctl daemon-reload
echo " 服务文件已安装: /etc/systemd/system/bjh_daemon.service"
echo ""
echo "[4/6] 配置短信告警..."
if [ ! -f "$PROJECT_DIR/sms_config.json" ]; then
echo "[警告] 未找到 sms_config.json短信功能可能不可用"
echo " 请创建配置文件: $PROJECT_DIR/sms_config.json"
else
echo " 短信配置文件已存在: sms_config.json"
fi
echo ""
echo "[5/6] 启用并启动服务..."
systemctl enable bjh_daemon.service
systemctl start bjh_daemon.service
# 等待2秒
sleep 2
echo ""
echo "[6/6] 检查服务状态..."
systemctl status bjh_daemon.service --no-pager
echo ""
echo "============================================================"
echo "部署完成!"
echo "============================================================"
echo ""
echo "常用命令:"
echo " 查看状态: sudo systemctl status bjh_daemon"
echo " 查看日志: sudo journalctl -u bjh_daemon -f"
echo " 停止服务: sudo systemctl stop bjh_daemon"
echo " 重启服务: sudo systemctl restart bjh_daemon"
echo " 禁用服务: sudo systemctl disable bjh_daemon"
echo ""
echo "配置文件:"
echo " systemd配置: /etc/systemd/system/bjh_daemon.service"
echo " 短信配置: $PROJECT_DIR/sms_config.json"
echo " 程序日志: $PROJECT_DIR/logs/data_sync_daemon.log"
echo " 验证报告: $PROJECT_DIR/validation_reports/"
echo ""
echo "功能说明:"
echo " 1. 每隔1小时自动执行数据同步工作时间8:00-24:00"
echo " 2. 数据同步完成后自动验证数据完整性"
echo " 3. 验证失败时自动发送短信告警错误代码2222"
echo " 4. 非工作时间自动休眠"
echo ""
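The deployment notes above say the daemon syncs hourly between 8:00 and 24:00 and sleeps outside that window. A minimal sketch of that gate (names are illustrative; the real scheduling lives in `data_sync_daemon.py`):

```python
from datetime import datetime

WORK_START, WORK_END = 8, 24  # sync window: 8:00 up to midnight

def in_work_hours(now: datetime) -> bool:
    """True when the hourly sync should run (hour 8..23)."""
    return WORK_START <= now.hour < WORK_END

def next_action(now: datetime) -> str:
    return "sync" if in_work_hours(now) else "sleep"
```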


@@ -12,6 +12,7 @@ import sys
import os
import json
import csv
import shutil
from datetime import datetime
from typing import Dict, List, Optional
from decimal import Decimal
@@ -67,6 +68,10 @@ class DataExporter:
self.output_ai_statistics_day = os.path.join(self.script_dir, "ai_statistics_day.csv")
self.output_ai_statistics_days = os.path.join(self.script_dir, "ai_statistics_days.csv")
# 备份文件夹路径
self.backup_dir = os.path.join(self.script_dir, "csv_backups")
self._ensure_backup_dir()
# 数据库模式
self.use_database = use_database
self.db_manager = None
@@ -90,6 +95,51 @@ class DataExporter:
# 缓存author_id映射author_name -> author_id
self.author_id_cache = {}
def _ensure_backup_dir(self):
"""确保备份文件夹存在"""
try:
if not os.path.exists(self.backup_dir):
os.makedirs(self.backup_dir)
print(f"[OK] 创建备份文件夹: {self.backup_dir}")
except Exception as e:
print(f"[!] 创建备份文件夹失败: {e}")
def _backup_csv_file(self, csv_file_path: str) -> bool:
"""备份CSV文件
Args:
csv_file_path: CSV文件的完整路径
Returns:
bool: 备份是否成功
"""
try:
if not os.path.exists(csv_file_path):
print(f"[!] 文件不存在,跳过备份: {csv_file_path}")
return False
# 获取文件名
file_name = os.path.basename(csv_file_path)
# 生成时间戳(只保留日期)
timestamp = datetime.now().strftime('%Y%m%d')
# 备份文件名20251226_ai_statistics.csv
backup_file_name = f"{timestamp}_{file_name}"
backup_file_path = os.path.join(self.backup_dir, backup_file_name)
# 复制文件
shutil.copy2(csv_file_path, backup_file_path)
print(f" [备份] {file_name} -> {backup_file_name}")
self.logger.info(f"备份CSV文件: {backup_file_path}")
return True
except Exception as e:
print(f" [!] 备份失败: {e}")
self.logger.error(f"备份CSV文件失败: {e}")
return False
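`_backup_csv_file` above derives the backup name from the current date plus the original filename (e.g. `20251226_ai_statistics.csv`). The naming rule in isolation:

```python
import os
from datetime import datetime

def backup_name(csv_path: str, when: datetime) -> str:
    """Backup filename: YYYYMMDD_<original name>."""
    return f"{when.strftime('%Y%m%d')}_{os.path.basename(csv_path)}"
```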
def get_author_id(self, author_name: str) -> int:
"""获取作者ID
@@ -286,21 +336,25 @@ class DataExporter:
print(f" [!] 从数据库计算当周发文量失败: {e}")
return 0
def calculate_weekly_revenue_from_db(self, author_id: int, stat_date: str) -> float:
"""从ai_statistics_days表汇总计算当周收益周一至周日
def calculate_weekly_revenue_from_db(self, author_id: int, stat_date: str, today_revenue: float = 0.0) -> float:
"""从ai_statistics_days表汇总计算当周收益周一至当前日期
基于day_revenue字段进行汇总计算
计算逻辑:
1. 从数据库查询本周一到stat_date前一天的day_revenue总和
2. 加上today_revenue当日收益从API获取
3. 得到本周累计收益
Args:
author_id: 作者ID
stat_date: 统计日期 (YYYY-MM-DD)
today_revenue: 当日收益从API获取默认0.0
Returns:
当周收益总额
"""
if not self.db_manager or author_id == 0:
print(f" [数据库] 未连接或author_id无效无法计算当周收益")
return 0.0
return today_revenue # 如果数据库不可用,返回当日收益
try:
from datetime import datetime, timedelta
@@ -311,14 +365,21 @@ class DataExporter:
# 计算本周一的日期weekday: 0=周一, 6=周日)
weekday = target_date.weekday()
monday = target_date - timedelta(days=weekday)
sunday = monday + timedelta(days=6)
# 昨天的日期stat_date的前一天
yesterday = target_date - timedelta(days=1)
monday_str = monday.strftime('%Y-%m-%d')
sunday_str = sunday.strftime('%Y-%m-%d')
yesterday_str = yesterday.strftime('%Y-%m-%d')
print(f" [调试] 目标日期: {stat_date}, 周一: {monday_str}, 周日: {sunday_str}")
print(f" [调试] 目标日期: {stat_date}, 周一: {monday_str}, 昨天: {yesterday_str}")
# 查询数据库中本周的day_revenue总和
# 如果stat_date就是周一则没有历史数据直接返回今日收益
if target_date == monday:
print(f" [数据库] 目标日期是周一,当周收益 = 今日收益: ¥{today_revenue:.2f}")
return today_revenue
# 查询数据库中本周一到昨天的day_revenue总和
sql = """
SELECT SUM(day_revenue) as weekly_total, COUNT(*) as day_count
FROM ai_statistics_days
@@ -330,25 +391,33 @@ class DataExporter:
result = self.db_manager.execute_query(
sql,
(author_id, monday_str, sunday_str),
(author_id, monday_str, yesterday_str),
fetch_one=True,
dict_cursor=True
)
print(f" [调试] 查询结果: {result}")
print(f" [调试] 数据库查询结果: {result}")
# 计算当周收益 = 本周历史收益 + 今日收益
historical_revenue = 0.0
day_count = 0
if result and result.get('weekly_total') is not None:
weekly_total = float(result['weekly_total'] or 0)
historical_revenue = float(result['weekly_total'] or 0)
day_count = int(result.get('day_count', 0) or 0)
print(f" [数据库] 当周收益 ({monday_str}{sunday_str}): ¥{weekly_total:.2f} (基于{day_count}天的数据)")
weekly_total = historical_revenue + today_revenue
print(f" [数据库] 当周收益计算:")
print(f" 本周一至昨天 ({monday_str} ~ {yesterday_str}): ¥{historical_revenue:.2f} (基于{day_count}天)")
print(f" 今日收益 ({stat_date}): ¥{today_revenue:.2f}")
print(f" 当周总收益: ¥{weekly_total:.2f}")
return weekly_total
else:
print(f" [数据库] 未找到当周数据 ({monday_str}{sunday_str})返回0")
return 0.0
except Exception as e:
print(f" [!] 从数据库计算当周收益失败: {e}")
return 0.0
return today_revenue # 出错时返回当日收益
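The rewritten weekly calculation queries `day_revenue` from this week's Monday through yesterday, then adds today's revenue from the API. The date window it builds can be sketched on its own (assuming the same `YYYY-MM-DD` string format):

```python
from datetime import datetime, timedelta

def weekly_window(stat_date: str) -> tuple:
    """(monday, yesterday) strings bounding the DB query for stat_date."""
    target = datetime.strptime(stat_date, '%Y-%m-%d')
    monday = target - timedelta(days=target.weekday())  # weekday(): 0 = Monday
    yesterday = target - timedelta(days=1)
    return monday.strftime('%Y-%m-%d'), yesterday.strftime('%Y-%m-%d')
```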
def calculate_last_week_revenue_from_db(self, author_id: int, stat_date: str) -> float:
"""从ai_statistics_days表汇总计算上周收益上周一至上周日
@@ -407,6 +476,77 @@ class DataExporter:
print(f" [!] 从数据库计算上周收益失败: {e}")
return 0.0
def calculate_monthly_revenue_from_db(self, author_id: int, stat_date: str, today_revenue: float = 0.0) -> float:
"""从ai_statistics_days表汇总计算当月收益当月1日至当前日期
计算逻辑:
1. 从数据库查询当月1日到stat_date前一天的day_revenue总和
2. 加上today_revenue当日收益从API获取
3. 得到当月累计收益
Args:
author_id: 作者ID
stat_date: 统计日期 (YYYY-MM-DD)
today_revenue: 当日收益从API获取默认0.0
Returns:
当月收益总额
"""
if not self.db_manager or author_id == 0:
print(f" [数据库] 未连接或author_id无效无法计算当月收益")
return today_revenue # 如果数据库不可用,返回当日收益
try:
from datetime import datetime, timedelta
# 解析日期
target_date = datetime.strptime(stat_date, '%Y-%m-%d')
# 当月第一天
month_first = target_date.replace(day=1)
# stat_date的前一天因为当日数据可能还未写入数据库
yesterday = target_date - timedelta(days=1)
month_first_str = month_first.strftime('%Y-%m-%d')
yesterday_str = yesterday.strftime('%Y-%m-%d')
# 如果stat_date就是当月第一天直接返回当日收益
if target_date.day == 1:
print(f" [数据库] 当月第一天,当月收益 = 当日收益: ¥{today_revenue:.2f}")
return today_revenue
# 查询当月1日到stat_date前一天的收益总和
sql = """
SELECT SUM(day_revenue) as monthly_total
FROM ai_statistics_days
WHERE author_id = %s
AND stat_date >= %s
AND stat_date <= %s
AND channel = 1
"""
result = self.db_manager.execute_query(
sql,
(author_id, month_first_str, yesterday_str),
fetch_one=True,
dict_cursor=True
)
if result and result.get('monthly_total') is not None:
db_total = float(result['monthly_total'] or 0)
# 加上当日收益
monthly_total = db_total + today_revenue
print(f" [数据库] 当月收益 ({month_first_str}{stat_date}): 数据库¥{db_total:.2f} + 当日¥{today_revenue:.2f} = ¥{monthly_total:.2f}")
return monthly_total
else:
# 没有历史数据,返回当日收益
print(f" [数据库] 未找到当月历史数据 ({month_first_str}{yesterday_str}),当月收益 = 当日收益: ¥{today_revenue:.2f}")
return today_revenue
except Exception as e:
print(f" [!] 从数据库计算当月收益失败: {e}")
return today_revenue
def calculate_last_month_revenue_from_db(self, author_id: int, stat_date: str) -> float:
"""从ai_statistics_days表汇总计算上月收益
@@ -510,14 +650,20 @@ class DataExporter:
metrics['submission_count'] = int(total_info.get('publish_count', 0) or 0)
metrics['read_count'] = int(total_info.get('view_count', 0) or 0)
metrics['comment_count'] = int(total_info.get('comment_count', 0) or 0)
metrics['comment_rate'] = float(total_info.get('comment_rate', 0) or 0)
# 所有rate字段API返回的都是百分制如0.30表示0.30%),需要除以100转换为小数
comment_rate_raw = float(total_info.get('comment_rate', 0) or 0)
metrics['comment_rate'] = comment_rate_raw / 100 if comment_rate_raw > 0 else 0.0
metrics['like_count'] = int(total_info.get('likes_count', 0) or 0)
metrics['like_rate'] = float(total_info.get('likes_rate', 0) or 0)
like_rate_raw = float(total_info.get('likes_rate', 0) or 0)
metrics['like_rate'] = like_rate_raw / 100 if like_rate_raw > 0 else 0.0
metrics['favorite_count'] = int(total_info.get('collect_count', 0) or 0)
metrics['favorite_rate'] = float(total_info.get('collect_rate', 0) or 0)
favorite_rate_raw = float(total_info.get('collect_rate', 0) or 0)
metrics['favorite_rate'] = favorite_rate_raw / 100 if favorite_rate_raw > 0 else 0.0
metrics['share_count'] = int(total_info.get('share_count', 0) or 0)
metrics['share_rate'] = float(total_info.get('share_rate', 0) or 0)
metrics['slide_ratio'] = float(total_info.get('pic_slide_rate', 0) or 0)
share_rate_raw = float(total_info.get('share_rate', 0) or 0)
metrics['share_rate'] = share_rate_raw / 100 if share_rate_raw > 0 else 0.0
slide_ratio_raw = float(total_info.get('pic_slide_rate', 0) or 0)
metrics['slide_ratio'] = slide_ratio_raw / 100 if slide_ratio_raw > 0 else 0.0
metrics['baidu_search_volume'] = int(total_info.get('disp_pv', 0) or 0) # 修正使用disp_pv
except Exception as e:
print(f" [!] 提取汇总指标失败: {e}")
@@ -529,7 +675,7 @@ class DataExporter:
注意:
- weekly_revenue: 不再从API获取在export_ai_statistics_days中从数据库计算
- monthly_revenue: 使用currentMonth当前自然月收益
- monthly_revenue: 不再从API获取在export_ai_statistics_days中从数据库计算
- day_revenue: 从yesterday提取昨日收益当日收益
- revenue_wow_growth_rate: 周环比,从数据库计算(本周 vs 上周)
- revenue_mom_growth_rate: 月环比,从数据库计算(当月 vs 上月)
@@ -564,10 +710,8 @@ class DataExporter:
# 这里保持为0由export_ai_statistics_days方法计算
print(f" 环比增长率: 将从数据库计算")
# 当前自然月收入currentMonth
current_month = income_data.get('currentMonth', {})
if current_month:
metrics['monthly_revenue'] = float(current_month.get('income', 0) or 0)
# monthly_revenue 不再从API获取在导出时从数据库的day_revenue汇总计算
print(f" 当月收益: 将从数据库计算")
except Exception as e:
print(f" [!] 提取收入指标失败: {e}")
@@ -650,6 +794,10 @@ class DataExporter:
print(f"[OK] ai_statistics 表数据已导出到: {self.output_ai_statistics}")
print(f"{len(csv_rows)} 条记录")
print(f"{'='*70}")
# 备份CSV文件
self._backup_csv_file(self.output_ai_statistics)
return True
else:
print("\n[!] 没有数据可导出")
@@ -729,11 +877,12 @@ class DataExporter:
'total_like_count': int(latest_day_data.get('likes_count', 0) or 0),
'total_favorite_count': int(latest_day_data.get('collect_count', 0) or 0),
'total_share_count': int(latest_day_data.get('share_count', 0) or 0),
'avg_comment_rate': f"{float(latest_day_data.get('comment_rate', 0) or 0):.4f}",
'avg_like_rate': f"{float(latest_day_data.get('likes_rate', 0) or 0):.4f}",
'avg_favorite_rate': f"{float(latest_day_data.get('collect_rate', 0) or 0):.4f}",
'avg_share_rate': f"{float(latest_day_data.get('share_rate', 0) or 0):.4f}",
'avg_slide_ratio': f"{float(latest_day_data.get('pic_slide_rate', 0) or 0):.4f}",
# 所有rate字段API返回的都是百分制需要除以100转换为小数
'avg_comment_rate': f"{(float(latest_day_data.get('comment_rate', 0) or 0) / 100):.4f}",
'avg_like_rate': f"{(float(latest_day_data.get('likes_rate', 0) or 0) / 100):.4f}",
'avg_favorite_rate': f"{(float(latest_day_data.get('collect_rate', 0) or 0) / 100):.4f}",
'avg_share_rate': f"{(float(latest_day_data.get('share_rate', 0) or 0) / 100):.4f}",
'avg_slide_ratio': f"{(float(latest_day_data.get('pic_slide_rate', 0) or 0) / 100):.4f}",
'total_baidu_search_volume': int(latest_day_data.get('disp_pv', 0) or 0),
}
@@ -763,6 +912,10 @@ class DataExporter:
print(f"[OK] ai_statistics_day 表数据已导出到: {self.output_ai_statistics_day}")
print(f"{len(csv_rows)} 条记录")
print(f"{'='*70}")
# 备份CSV文件
self._backup_csv_file(self.output_ai_statistics_day)
return True
else:
print("\n[!] 没有数据可导出")
@@ -779,7 +932,7 @@ class DataExporter:
注意:
- daily_published_count: 优先从ai_articles表查询否则使用API数据
- cumulative_published_count: 优先从ai_articles表查询从起始日到stat_date的累计发文量
- monthly_revenue: stat_date所在自然月的总收益使用近30天收益作为近似值
- monthly_revenue: 从ai_statistics_days表汇总计算当月1日至stat_date的day_revenue总和
- weekly_revenue: 优先从ai_statistics_days表汇总计算否则使用API数据
Args:
@@ -851,38 +1004,49 @@ class DataExporter:
daily_published = int(latest_day_data.get('publish_count', 0) or 0)
print(f" [使用API] 文章数据: 单日={daily_published}, 累计={cumulative_count}")
# 计算当周收益:数据库中本周已有的收益 + 当日新抓取的收益
# 计算当周收益:数据库汇总本周一至周日的day_revenue总和
if use_db_weekly_revenue and author_id > 0:
# 从数据库查询本周已有的收益(不包括今天,因为今天的数据还没导入
weekly_revenue_db = self.calculate_weekly_revenue_from_db(author_id, formatted_date)
# 当周收益 = 数据库中的历史收益 + 当日新抓取的收益
day_revenue = income_metrics['day_revenue']
weekly_revenue_total = weekly_revenue_db + day_revenue
# 从数据库查询本周的收益总和(传入当日收益
weekly_revenue_total = self.calculate_weekly_revenue_from_db(
author_id,
formatted_date,
today_revenue=income_metrics['day_revenue'] # 传入当日收益
)
income_metrics['weekly_revenue'] = weekly_revenue_total
print(f" [数据库] 本周已有收益: ¥{weekly_revenue_db:.2f}")
print(f" [API] 当日新增收益: ¥{day_revenue:.2f}")
print(f" [计算] 当周总收益: ¥{weekly_revenue_total:.2f}")
print(f" [数据库] 当周收益: ¥{weekly_revenue_total:.2f}")
# 计算当月收益从数据库汇总当月1日至stat_date的day_revenue总和
monthly_revenue_total = self.calculate_monthly_revenue_from_db(
author_id,
formatted_date,
today_revenue=income_metrics['day_revenue'] # 传入当日收益
)
income_metrics['monthly_revenue'] = monthly_revenue_total
# 计算周环比:本周 vs 上周
# 公式:周环比 = (本周收益 - 上周收益) / 上周收益
last_week_revenue = self.calculate_last_week_revenue_from_db(author_id, formatted_date)
if last_week_revenue > 0:
income_metrics['revenue_wow_growth_rate'] = (weekly_revenue_total - last_week_revenue) / last_week_revenue
print(f" [计算] 周环比: {income_metrics['revenue_wow_growth_rate']:.2%} (本周¥{weekly_revenue_total:.2f} vs 上周¥{last_week_revenue:.2f})")
else:
print(f" [计算] 周环比: 无法计算(上周没有数据)")
# 分母为0时设为1避免除零错误
denominator = last_week_revenue if last_week_revenue > 0 else 1
wow_rate = (weekly_revenue_total - last_week_revenue) / denominator
income_metrics['revenue_wow_growth_rate'] = wow_rate
print(f" [计算] 周环比: {wow_rate:.4f} (本周¥{weekly_revenue_total:.2f} vs 上周¥{last_week_revenue:.2f})")
# 计算月环比:当月 vs 上月
# 公式:月环比 = (当月收益 - 上月收益) / 上月收益
last_month_revenue = self.calculate_last_month_revenue_from_db(author_id, formatted_date)
monthly_revenue = income_metrics['monthly_revenue']
if last_month_revenue > 0:
income_metrics['revenue_mom_growth_rate'] = (monthly_revenue - last_month_revenue) / last_month_revenue
print(f" [计算] 月环比: {income_metrics['revenue_mom_growth_rate']:.2%} (当月¥{monthly_revenue:.2f} vs 上月¥{last_month_revenue:.2f})")
else:
print(f" [计算] 月环比: 无法计算(上月没有数据)")
# 分母为0时设为1避免除零错误
denominator = last_month_revenue if last_month_revenue > 0 else 1
mom_rate = (monthly_revenue - last_month_revenue) / denominator
income_metrics['revenue_mom_growth_rate'] = mom_rate
print(f" [计算] 月环比: {mom_rate:.4f} (当月¥{monthly_revenue:.2f} vs 上月¥{last_month_revenue:.2f})")
else:
# 如果不使用数据库weekly_revenue = 当日收益
income_metrics['weekly_revenue'] = income_metrics['day_revenue']
income_metrics['monthly_revenue'] = income_metrics['day_revenue']
print(f" [跳过数据库] 当周收益 = 当日收益: ¥{income_metrics['day_revenue']:.2f}")
print(f" [跳过数据库] 当月收益 = 当日收益: ¥{income_metrics['day_revenue']:.2f}")
row = {
'author_id': author_id,
@@ -940,6 +1104,10 @@ class DataExporter:
print(f"[OK] ai_statistics_days 表数据已导出到: {self.output_ai_statistics_days}")
print(f"{len(csv_rows)} 条记录")
print(f"{'='*70}")
# 备份CSV文件
self._backup_csv_file(self.output_ai_statistics_days)
return True
else:
print("\n[!] 没有数据可导出")
@@ -1439,9 +1607,6 @@ class DataExporter:
# 滑图占比需要限制在decimal(5,4)范围内0-9.9999
slide_ratio_value = float(metrics['slide_ratio'])
# 如果值大于10说明是百分比格式需要除以100
if slide_ratio_value > 10:
slide_ratio_value = slide_ratio_value / 100
# 确保不超过9.9999
slide_ratio_value = min(slide_ratio_value, 9.9999)
@@ -1547,14 +1712,28 @@ class DataExporter:
else:
print(f" [使用API] 投稿量: {total_submission_count}")
# 滑图占比需要限制在decimal(5,4)范围内0-9.9999
slide_ratio_value = float(latest_day_data.get('pic_slide_rate', 0) or 0)
# 如果值大于10说明是百分比格式需要除以100
if slide_ratio_value > 10:
slide_ratio_value = slide_ratio_value / 100
# 确保不超过9.9999
# 所有rate字段需要限制在decimal(5,4)范围内0-9.9999
# API返回的都是百分制需要除以100转换为小数
slide_ratio_raw = float(latest_day_data.get('pic_slide_rate', 0) or 0)
slide_ratio_value = (slide_ratio_raw / 100 if slide_ratio_raw > 0 else 0.0)
slide_ratio_value = min(slide_ratio_value, 9.9999)
comment_rate_raw = float(latest_day_data.get('comment_rate', 0) or 0)
comment_rate_value = (comment_rate_raw / 100 if comment_rate_raw > 0 else 0.0)
comment_rate_value = min(comment_rate_value, 9.9999)
like_rate_raw = float(latest_day_data.get('likes_rate', 0) or 0)
like_rate_value = (like_rate_raw / 100 if like_rate_raw > 0 else 0.0)
like_rate_value = min(like_rate_value, 9.9999)
favorite_rate_raw = float(latest_day_data.get('collect_rate', 0) or 0)
favorite_rate_value = (favorite_rate_raw / 100 if favorite_rate_raw > 0 else 0.0)
favorite_rate_value = min(favorite_rate_value, 9.9999)
share_rate_raw = float(latest_day_data.get('share_rate', 0) or 0)
share_rate_value = (share_rate_raw / 100 if share_rate_raw > 0 else 0.0)
share_rate_value = min(share_rate_value, 9.9999)
record = {
'author_id': author_id,
'author_name': account_id,
@@ -1566,10 +1745,10 @@ class DataExporter:
'total_like_count': int(latest_day_data.get('likes_count', 0) or 0),
'total_favorite_count': int(latest_day_data.get('collect_count', 0) or 0),
'total_share_count': int(latest_day_data.get('share_count', 0) or 0),
'avg_comment_rate': float(latest_day_data.get('comment_rate', 0) or 0),
'avg_like_rate': float(latest_day_data.get('likes_rate', 0) or 0),
'avg_favorite_rate': float(latest_day_data.get('collect_rate', 0) or 0),
'avg_share_rate': float(latest_day_data.get('share_rate', 0) or 0),
'avg_comment_rate': comment_rate_value,
'avg_like_rate': like_rate_value,
'avg_favorite_rate': favorite_rate_value,
'avg_share_rate': share_rate_value,
'avg_slide_ratio': slide_ratio_value,
'total_baidu_search_volume': int(latest_day_data.get('disp_pv', 0) or 0),
}
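The repeated convert-and-clamp pattern above (percent-style API rates such as 0.30 meaning 0.30%, stored in a `decimal(5,4)` column capped at 9.9999) reduces to one helper; this is a sketch, the real code inlines it per field:

```python
def normalize_rate(raw) -> float:
    """Percent -> fraction, clamped to the decimal(5,4) ceiling of 9.9999."""
    value = float(raw or 0)
    value = value / 100 if value > 0 else 0.0
    return min(value, 9.9999)
```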
@@ -1698,18 +1877,38 @@ class DataExporter:
def main():
import argparse
# 解析命令行参数
parser = argparse.ArgumentParser(
description='百家号数据导出工具 - 从 bjh_integrated_data.json 导出',
formatter_class=argparse.RawDescriptionHelpFormatter
)
parser.add_argument(
'--mode',
type=str,
choices=['csv', 'database'],
default='csv',
help='导出模式csv=导出CSV文件, database=直接插入数据库 (默认: csv)'
)
parser.add_argument(
'--no-confirm',
action='store_true',
help='跳过确认提示,直接执行(用于批量脚本)'
)
args = parser.parse_args()
print("\n" + "="*70)
print("百家号数据导出工具 - 从 bjh_integrated_data.json 导出")
print("="*70)
# 选择导出模式
print("\n请选择导出模式:")
print(" 1. 导出CSV文件")
print(" 2. 直接插入数据库")
use_database = (args.mode == 'database')
mode = input("\n输入选项 (1/2, 默认1): ").strip() or '1'
if mode == '2':
if use_database:
# 数据库模式
exporter = DataExporter(use_database=True)
@@ -1728,14 +1927,16 @@ def main():
print(" 3. ai_statistics_days.csv - 核心指标统计表(含发文量、收益、环比)")
print("="*70)
# 确认执行(除非使用--no-confirm参数
if not args.no_confirm:
confirm = input("\n是否继续? (y/n): ").strip().lower()
if confirm == 'y':
exporter.export_all_tables()
else:
if confirm != 'y':
print("\n已取消")
return
exporter.export_all_tables()
print("\n" + "="*70)
print("完成")
print("="*70 + "\n")

fetch_date_statistics.py Normal file

@@ -0,0 +1,680 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
指定日期统计数据获取脚本
功能:获取指定日期的百家号统计数据并填充到数据库三个统计表
"""
import os
import sys
import json
import argparse
import requests
import time
from datetime import datetime, timedelta
from typing import List, Dict, Optional
from decimal import Decimal
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from database_config import DatabaseManager
from export_to_csv import DataExporter
# 天启代理配置
PROXY_API_URL = 'http://api.tianqiip.com/getip?secret=tmcrmh3q&num=1&type=txt&port=1&mr=1&sign=5451e454a54b9f1f06222606c418e12f'
class DateStatisticsFetcher:
"""指定日期统计数据获取器"""
def __init__(self, target_date: str, use_proxy: bool = True):
"""初始化
Args:
target_date: 目标日期 (YYYY-MM-DD)
use_proxy: 是否使用代理默认True
"""
self.target_date = datetime.strptime(target_date, '%Y-%m-%d')
self.target_date_str = target_date
self.db_manager = DatabaseManager()
self.script_dir = os.path.dirname(os.path.abspath(__file__))
self.use_proxy = use_proxy
self.current_proxy = None
# 创建临时数据目录
self.temp_dir = os.path.join(self.script_dir, 'temp_data')
os.makedirs(self.temp_dir, exist_ok=True)
# 创建请求会话
self.session = requests.Session()
self.session.verify = False
# 禁用SSL警告
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
print(f"[初始化] 目标日期: {target_date}")
print(f"[初始化] 代理模式: {'启用' if use_proxy else '禁用'}")
print(f"[初始化] 临时数据目录: {self.temp_dir}")
def get_all_authors(self) -> List[Dict]:
"""获取所有活跃账号
Returns:
账号列表
"""
try:
sql = """
SELECT id as author_id, author_name, toutiao_cookie
FROM ai_authors
WHERE channel = 1
AND status = 'active'
AND toutiao_cookie IS NOT NULL
AND toutiao_cookie != ''
ORDER BY id
"""
accounts = self.db_manager.execute_query(sql, fetch_one=False, dict_cursor=True)
if accounts:
print(f"[数据库] 找到 {len(accounts)} 个活跃账号")
return accounts
else:
print("[!] 未找到任何活跃账号")
return []
except Exception as e:
print(f"[X] 查询账号失败: {e}")
return []
def get_daily_article_count(self, author_id: int, date_str: str) -> int:
"""从ai_articles表获取指定日期的发文量
Args:
author_id: 作者ID
date_str: 日期字符串 (YYYY-MM-DD)
Returns:
发文量
"""
try:
sql = """
SELECT COUNT(*) as count
FROM ai_articles
WHERE author_id = %s
AND DATE(publish_time) = %s
AND status = 'published'
AND channel = 1
"""
result = self.db_manager.execute_query(
sql,
(author_id, date_str),
fetch_one=True,
dict_cursor=True
)
return result['count'] if result else 0
except Exception as e:
print(f" [!] 查询发文量失败: {e}")
return 0
def fetch_daily_income(self, cookie_string: str, date_timestamp: int, max_retries: int = 3) -> Optional[Dict]:
"""获取指定日期的收入数据(带重试机制)
Args:
cookie_string: Cookie字符串
date_timestamp: 日期Unix时间戳
max_retries: 最大重试次数
Returns:
收入数据字典失败返回None
"""
api_url = "https://baijiahao.baidu.com/author/eco/income4/overviewhomelist"
# 设置Cookie
self.session.cookies.clear()
for item in cookie_string.split(';'):
item = item.strip()
if '=' in item:
key, value = item.split('=', 1)
self.session.cookies.set(key.strip(), value.strip())
# 从Cookie中提取token
token_cookie = self.session.cookies.get('bjhStoken') or self.session.cookies.get('devStoken')
# 请求参数
params = {
'start_date': date_timestamp,
'end_date': date_timestamp
}
# 请求头
headers = {
'Accept': 'application/json, text/plain, */*',
'Accept-Language': 'zh-CN,zh;q=0.9',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Referer': 'https://baijiahao.baidu.com/builder/rc/incomecenter',
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
}
if token_cookie:
headers['token'] = token_cookie
retry_count = 0
while retry_count <= max_retries:
try:
# 如果是重试,先等待
if retry_count > 0:
wait_time = retry_count * 3 # 3秒、6秒、9秒
print(f" [重试 {retry_count}/{max_retries}] 等待 {wait_time} 秒...")
time.sleep(wait_time)
# 获取代理
proxies = self.fetch_proxy() if self.use_proxy else None
response = self.session.get(
api_url,
headers=headers,
params=params,
proxies=proxies,
timeout=15
)
if response.status_code == 200:
data = response.json()
if data.get('errno') == 0:
return data
else:
error_msg = data.get('errmsg', '')
errno = data.get('errno')
print(f" [!] API返回错误: errno={errno}, errmsg={error_msg}")
# 异常请求错误,尝试重试
if errno == 10000015 and retry_count < max_retries:
retry_count += 1
continue
return None
else:
print(f" [!] HTTP错误: {response.status_code}")
return None
except Exception as e:
error_type = type(e).__name__
print(f" [!] 请求异常: {error_type} - {e}")
# 判断是否需要重试
is_retry_error = any([
'Connection' in error_type,
'Timeout' in error_type,
'ProxyError' in error_type,
])
if is_retry_error and retry_count < max_retries:
retry_count += 1
continue
return None
return None
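`fetch_daily_income` above and `fetch_analytics_api` below share the same retry shape: linear backoff of `retry_count * 3` seconds, retry on errno 10000015 (Baijiahao's abnormal-request code) or connection/timeout/proxy errors, up to `max_retries` retries. A stripped-down sketch of just that schedule (names are illustrative):

```python
def backoff_seconds(max_retries: int = 3) -> list:
    """Wait time before each retry: 3s, 6s, 9s for the default of 3."""
    return [i * 3 for i in range(1, max_retries + 1)]

def should_retry(errno: int, attempt: int, max_retries: int = 3) -> bool:
    # errno 10000015 means an abnormal request; swap the proxy and retry
    return errno == 10000015 and attempt < max_retries
```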
def fetch_analytics_api(self, cookie_string: str, target_date: str, max_retries: int = 3) -> Optional[Dict]:
"""调用百家号发文统计API获取阅读量、评论量等数据
Args:
cookie_string: Cookie字符串
target_date: 目标日期 (YYYY-MM-DD)
max_retries: 最大重试次数
Returns:
API返回数据失败返回None
"""
# 设置Cookie
self.session.cookies.clear()
for item in cookie_string.split(';'):
item = item.strip()
if '=' in item:
key, value = item.split('=', 1)
self.session.cookies.set(key.strip(), value.strip(), domain='.baidu.com')
# 从Cookie中提取token
token_cookie = self.session.cookies.get('bjhStoken') or self.session.cookies.get('devStoken')
# 计算日期范围(仅查询目标日期当天)
date_obj = datetime.strptime(target_date, '%Y-%m-%d')
start_day = date_obj.strftime('%Y%m%d')
end_day = start_day # 开始和结束是同一天
# API端点使用appStatisticV3
api_url = "https://baijiahao.baidu.com/author/eco/statistics/appStatisticV3"
# 请求参数
params = {
'type': 'event',
'start_day': start_day,
'end_day': end_day,
'stat': '0',
'special_filter_days': '1'
}
# 请求头
headers = {
'Accept': 'application/json, text/plain, */*',
'Accept-Language': 'zh-CN,zh;q=0.9',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Referer': 'https://baijiahao.baidu.com/builder/rc/analysiscontent',
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
}
if token_cookie:
headers['token'] = token_cookie
retry_count = 0
while retry_count <= max_retries:
try:
# 如果是重试,先等待
if retry_count > 0:
wait_time = retry_count * 3
print(f" [重试 {retry_count}/{max_retries}] 等待 {wait_time} 秒...")
time.sleep(wait_time)
# 获取代理
proxies = self.fetch_proxy() if self.use_proxy else None
response = self.session.get(
api_url,
headers=headers,
params=params,
proxies=proxies,
timeout=15
)
if response.status_code == 200:
data = response.json()
errno = data.get('errno', -1)
if errno == 0:
# 提取total_info和list数据
data_content = data.get('data', {})
total_info = data_content.get('total_info', {})
daily_list = data_content.get('list', [])
print(f" [发文统计] 阅读量: {total_info.get('view_count', 0)}")
print(f" [发文统计] 评论量: {total_info.get('comment_count', 0)}")
return data
else:
error_msg = data.get('errmsg', '')
print(f" [!] 发文统计API错误: errno={errno}, errmsg={error_msg}")
if errno == 10000015 and retry_count < max_retries:
retry_count += 1
continue
return None
else:
print(f" [!] HTTP错误: {response.status_code}")
return None
except Exception as e:
error_type = type(e).__name__
print(f" [!] 请求异常: {error_type} - {e}")
is_retry_error = any([
'Connection' in error_type,
'Timeout' in error_type,
'ProxyError' in error_type,
])
if is_retry_error and retry_count < max_retries:
retry_count += 1
continue
return None
return None
def get_cumulative_article_count(self, author_id: int, start_date: str, end_date: str) -> int:
"""从ai_articles表获取累计发文量
Args:
author_id: 作者ID
start_date: 开始日期 (YYYY-MM-DD)
end_date: 结束日期 (YYYY-MM-DD)
Returns:
累计发文量
"""
try:
sql = """
SELECT COUNT(*) as count
FROM ai_articles
WHERE author_id = %s
AND DATE(publish_time) >= %s
AND DATE(publish_time) <= %s
AND status = 'published'
AND channel = 1
"""
result = self.db_manager.execute_query(
sql,
(author_id, start_date, end_date),
fetch_one=True,
dict_cursor=True
)
return result['count'] if result else 0
except Exception as e:
print(f" [!] 查询累计发文量失败: {e}")
return 0
def fetch_proxy(self) -> Optional[Dict]:
"""获取天启代理IP
Returns:
代理配置字典失败返回None
"""
if not self.use_proxy:
return None
try:
resp = requests.get(PROXY_API_URL, timeout=10)
resp.raise_for_status()
text = resp.text.strip()
# 检测是否返回错误信息
if text.upper().startswith('ERROR'):
print(f" [!] 代理API返回错误: {text}")
return None
# 解析IP:PORT格式
lines = text.split('\n')
for line in lines:
line = line.strip()
if ':' in line and line.count(':') == 1:
ip_port = line.split()[0] if ' ' in line else line
host, port = ip_port.split(':', 1)
proxy_url = f'http://{host}:{port}'
self.current_proxy = {
'http': proxy_url,
'https': proxy_url,
}
print(f" [代理] 使用天启IP: {ip_port}")
return self.current_proxy
print(f" [!] 无法解析代理API返回: {text[:100]}")
return None
except Exception as e:
print(f" [!] 获取代理失败: {e}")
return None
def build_integrated_data(self, author_id: int, author_name: str, cookie_string: str) -> Dict:
"""构建指定日期的整合数据
Args:
author_id: 作者ID
author_name: 作者名称
cookie_string: Cookie字符串
Returns:
整合数据字典
"""
print(f"\n [构建] 账号 {author_name} 的整合数据...")
# 计算当月第一天(用于累计发文量)
month_first = self.target_date.replace(day=1).strftime('%Y-%m-%d')
# 从数据库获取发文量
daily_count = self.get_daily_article_count(author_id, self.target_date_str)
cumulative_count = self.get_cumulative_article_count(author_id, month_first, self.target_date_str)
print(f" 单日发文量: {daily_count}")
print(f" 累计发文量: {cumulative_count} (从{month_first}{self.target_date_str})")
# 获取发文统计数据(阅读量、评论量等)
print(f" [API] 获取发文统计数据...")
analytics_data = self.fetch_analytics_api(cookie_string, self.target_date_str)
# 提取total_info和list数据
total_info = {}
daily_list = []
if analytics_data:
data_content = analytics_data.get('data', {})
total_info = data_content.get('total_info', {})
daily_list = data_content.get('list', [])
# 获取收入数据
day_revenue = 0.0
date_timestamp = int(self.target_date.replace(hour=0, minute=0, second=0, microsecond=0).timestamp())
print(f" [API] 获取收入数据...")
income_data = self.fetch_daily_income(cookie_string, date_timestamp)
if income_data and income_data.get('data', {}).get('list'):
income_list = income_data['data']['list']
if income_list and len(income_list) > 0:
total_income = income_list[0].get('total_income', 0)
day_revenue = float(total_income)
print(f" 当日收益: ¥{day_revenue:.2f}")
else:
print(f" 当日收益: ¥0.00 (无收入数据)")
else:
print(f" 当日收益: ¥0.00 (API调用失败)")
# 构建整合数据模拟BaijiahaoAnalytics的数据结构
integrated_data = {
'account_id': author_name,
'author_id': author_id,
'fetch_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'target_date': self.target_date_str,
'status': 'success',
'analytics': {
'apis': [ # 修改需要包装在apis数组中
{
'data': {
'errno': 0,
'data': {
'list': daily_list if daily_list else [
{
'event_day': self.target_date_str.replace('-', ''), # 格式20251225
'date': self.target_date_str,
'publish_count': daily_count,
'daily_published_count': daily_count,
'cumulative_published_count': cumulative_count,
}
],
'latest_event_day': self.target_date_str.replace('-', ''), # 格式20251225
'total_info': total_info if total_info else {
'publish_count': daily_count,
'view_count': 0,
'comment_count': 0,
'comment_rate': 0,
'likes_count': 0,
'likes_rate': 0,
'collect_count': 0,
'collect_rate': 0,
'share_count': 0,
'share_rate': 0,
'pic_slide_rate': 0,
'disp_pv': 0,
}
}
}
}
]
},
'income': {
'errno': 0, # 添加标记API调用成功
'data': {
'income': {
'yesterday': {
'income': day_revenue # 修改使用income字段而不是value
},
'currentMonth': {
'income': 0 # 历史数据无法获取当月收益设为0
}
}
}
}
}
return integrated_data
def process_single_date(self) -> bool:
"""处理单个日期的所有账号数据
Returns:
是否成功
"""
print(f"\n{'='*70}")
print(f"开始处理 {self.target_date_str} 的数据")
print(f"{'='*70}")
# 获取所有账号
accounts = self.get_all_authors()
if not accounts:
print("[X] 没有可用的账号,退出")
return False
# 构建所有账号的整合数据
integrated_data_list = []
for idx, account in enumerate(accounts, 1):
author_id = account.get('author_id')
author_name = account.get('author_name', '')
cookie_string = account.get('toutiao_cookie', '')
if not author_id:
print(f"\n[{idx}/{len(accounts)}] 跳过: {author_name} (缺少author_id)")
continue
if not cookie_string:
print(f"\n[{idx}/{len(accounts)}] 跳过: {author_name} (缺少Cookie)")
continue
print(f"\n[{idx}/{len(accounts)}] 处理账号: {author_name} (ID: {author_id})")
try:
integrated_data = self.build_integrated_data(author_id, author_name, cookie_string)
integrated_data_list.append(integrated_data)
print(f" [OK] 数据构建成功")
# 延迟避免请求过快增加到3-5秒
if idx < len(accounts):
import random
delay = random.uniform(3, 5)
print(f" [延迟] 等待 {delay:.1f} 秒...")
time.sleep(delay)
except Exception as e:
print(f" [X] 数据构建失败: {e}")
import traceback
traceback.print_exc()
continue
if not integrated_data_list:
print("[!] 没有成功构建任何数据")
return False
# 保存整合数据到临时文件
integrated_file = os.path.join(self.temp_dir, f'integrated_{self.target_date_str}.json')
try:
with open(integrated_file, 'w', encoding='utf-8') as f:
json.dump(integrated_data_list, f, ensure_ascii=False, indent=2)
print(f"\n[保存] 整合数据: {integrated_file}")
except Exception as e:
print(f"[X] 保存整合数据失败: {e}")
return False
# 使用DataExporter导出到三个表
print(f"\n[导出] 开始导出到数据库...")
try:
exporter = DataExporter(use_database=False)
# 临时替换整合数据文件路径
original_file = exporter.integrated_file
exporter.integrated_file = integrated_file
# 导出三个表的数据
result = exporter.export_all_tables()
# 恢复原路径
exporter.integrated_file = original_file
if result:
print(f"\n[OK] {self.target_date_str} 数据处理完成")
return True
else:
print(f"\n[!] {self.target_date_str} 数据导出失败")
return False
except Exception as e:
print(f"[X] 导出数据失败: {e}")
import traceback
traceback.print_exc()
return False
def main():
"""主函数"""
parser = argparse.ArgumentParser(
description='获取指定日期的百家号统计数据',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
示例用法:
python fetch_date_statistics.py 2025-12-01
python fetch_date_statistics.py 2025-12-15
注意事项:
1. 由于百家号API限制无法获取历史日期的收入数据
2. 脚本会从ai_articles表统计发文量数据
3. 收入字段将被设置为0需要在数据产生当天运行才能获取真实收入
"""
)
parser.add_argument(
'date',
type=str,
help='目标日期 (格式: YYYY-MM-DD)'
)
parser.add_argument(
'--no-proxy',
action='store_true',
help='禁用代理(默认启用天启代理)'
)
args = parser.parse_args()
# 验证日期格式
try:
datetime.strptime(args.date, '%Y-%m-%d')
except ValueError:
print(f"[X] 日期格式错误: {args.date}")
print(" 正确格式: YYYY-MM-DD (例如: 2025-12-01)")
return 1
print("\n" + "="*70)
print("百家号指定日期统计数据获取工具")
print("="*70)
print(f"目标日期: {args.date}")
print("="*70)
try:
fetcher = DateStatisticsFetcher(args.date, use_proxy=not args.no_proxy)
success = fetcher.process_single_date()
return 0 if success else 1
except Exception as e:
print(f"\n[X] 程序执行出错: {e}")
import traceback
traceback.print_exc()
return 1
if __name__ == '__main__':
sys.exit(main())


@@ -154,10 +154,8 @@ class CSVImporter:
continue
try:
- # 处理slide_ratio值
+ # 处理slide_ratio值CSV中已是小数格式
slide_ratio_value = float(self.convert_value(row.get('slide_ratio', '0'), 'float') or 0.0)
- if slide_ratio_value > 10:
-     slide_ratio_value = slide_ratio_value / 100
slide_ratio_value = min(slide_ratio_value, 9.9999)
# 获取channel
@@ -271,9 +269,8 @@ class CSVImporter:
continue
try:
+ # 处理avg_slide_ratio值CSV中已是小数格式
avg_slide_ratio_value = float(self.convert_value(row.get('avg_slide_ratio', '0'), 'float') or 0.0)
- if avg_slide_ratio_value > 10:
-     avg_slide_ratio_value = avg_slide_ratio_value / 100
avg_slide_ratio_value = min(avg_slide_ratio_value, 9.9999)
# 获取channel并查询author_id
@@ -348,13 +345,14 @@ class CSVImporter:
return success_count > 0
def import_ai_statistics_days(self, batch_size: int = 50) -> bool:
- """导入 ai_statistics_days 表数据(使用批量提交
+ """导入 ai_statistics_days 表数据仅当日数据day_revenue
+ 同时自动拆分数据到 ai_statistics_weekly 和 ai_statistics_monthly 表
Args:
batch_size: 批量提交大小默认50条
"""
print("\n" + "="*70)
- print("开始导入 ai_statistics_days 表数据")
+ print("开始导入 ai_statistics_days 表数据拆分到3个表")
print("="*70)
csv_file = self.csv_files['ai_statistics_days']
@@ -365,14 +363,27 @@ class CSVImporter:
self.logger.warning("ai_statistics_days表没有数据可导入")
return False
- self.logger.info(f"开始导入ai_statistics_days表数据,共 {len(rows)} 条记录,批量大小: {batch_size}")
- print(f"\n总计 {len(rows)} 条记录,分批导入(每批 {batch_size} 条)\n")
+ self.logger.info(f"开始导入数据,共 {len(rows)} 条记录,批量大小: {batch_size}")
+ print(f"\n总计 {len(rows)} 条记录,将拆分到3个表\n")
- success_count = 0
+ # 三个表的统计
+ days_success = 0
+ weekly_success = 0
+ monthly_success = 0
  failed_count = 0
- batch_params = []
- first_record_keys = None
- sql_template = None
+ # 批量参数
+ days_batch = []
+ weekly_batch = []
+ monthly_batch = []
+ # SQL模板
+ days_sql = None
+ weekly_sql = None
+ monthly_sql = None
+ days_keys = None
+ weekly_keys = None
+ monthly_keys = None
for idx, row in enumerate(rows, 1):
author_name = row.get('author_name', '').strip()
@@ -388,68 +399,153 @@ class CSVImporter:
failed_count += 1
continue
# 处理day_revenue字段每日收益
day_revenue_value = self.convert_value(row.get('day_revenue', '0'), 'decimal')
if day_revenue_value is None:
day_revenue_value = Decimal('0')
stat_date = row.get('stat_date', '').strip()
record = {
# 1. ai_statistics_days 表数据(仅当日数据)
day_revenue = self.convert_value(row.get('day_revenue', '0'), 'decimal') or Decimal('0')
daily_published_count = self.convert_value(row.get('daily_published_count', '0'), 'int') or 0
cumulative_published_count = self.convert_value(row.get('cumulative_published_count', '0'), 'int') or 0
days_record = {
'author_id': author_id,
'author_name': author_name,
'channel': channel,
'stat_date': row.get('stat_date', '').strip(),
'daily_published_count': self.convert_value(row.get('daily_published_count', '0'), 'int') or 0,
'cumulative_published_count': self.convert_value(row.get('cumulative_published_count', '0'), 'int') or 0,
'day_revenue': day_revenue_value, # 每日收益
'monthly_revenue': self.convert_value(row.get('monthly_revenue', '0'), 'decimal') or Decimal('0'),
'weekly_revenue': self.convert_value(row.get('weekly_revenue', '0'), 'decimal') or Decimal('0'),
'revenue_mom_growth_rate': self.convert_value(row.get('revenue_mom_growth_rate', '0'), 'decimal') or Decimal('0'),
'revenue_wow_growth_rate': self.convert_value(row.get('revenue_wow_growth_rate', '0'), 'decimal') or Decimal('0'),
'updated_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'), # 添加更新时间戳,强制更新
'stat_date': stat_date,
'daily_published_count': daily_published_count,
'day_revenue': day_revenue,
'updated_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
}
if sql_template is None:
first_record_keys = list(record.keys())
columns = ', '.join(first_record_keys)
placeholders = ', '.join(['%s'] * len(first_record_keys))
update_parts = [f"{key} = VALUES({key})" for key in first_record_keys if key not in ['author_name', 'channel', 'stat_date']]
sql_template = f"""
# 2. ai_statistics_weekly 表数据
weekly_revenue = self.convert_value(row.get('weekly_revenue', '0'), 'decimal') or Decimal('0')
revenue_wow_growth_rate = self.convert_value(row.get('revenue_wow_growth_rate', '0'), 'decimal') or Decimal('0')
# 计算该日期所在周次格式WW如51
from datetime import datetime as dt, timedelta
date_obj = dt.strptime(stat_date, '%Y-%m-%d')
# 使用isocalendar()获取ISO周数周一为一周开始
year, week_num, _ = date_obj.isocalendar()
stat_weekly = week_num # 直接使用数字
weekly_record = {
'author_id': author_id,
'author_name': author_name,
'channel': channel,
'stat_weekly': stat_weekly,
'weekly_revenue': weekly_revenue,
'revenue_wow_growth_rate': revenue_wow_growth_rate,
'updated_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
}
# 3. ai_statistics_monthly 表数据
monthly_revenue = self.convert_value(row.get('monthly_revenue', '0'), 'decimal') or Decimal('0')
revenue_mom_growth_rate = self.convert_value(row.get('revenue_mom_growth_rate', '0'), 'decimal') or Decimal('0')
# 计算该日期所在月份格式YYYY-MM如2025-12
stat_monthly = date_obj.strftime('%Y-%m')
monthly_record = {
'author_id': author_id,
'author_name': author_name,
'channel': channel,
'stat_monthly': stat_monthly,
'monthly_revenue': monthly_revenue,
'revenue_mom_growth_rate': revenue_mom_growth_rate,
'updated_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
}
# 构建SQL模板首次
if days_sql is None:
days_keys = list(days_record.keys())
columns = ', '.join(days_keys)
placeholders = ', '.join(['%s'] * len(days_keys))
update_parts = [f"{key} = VALUES({key})" for key in days_keys if key not in ['author_name', 'channel', 'stat_date']]
days_sql = f"""
INSERT INTO ai_statistics_days ({columns})
VALUES ({placeholders})
ON DUPLICATE KEY UPDATE {', '.join(update_parts)}
"""
if first_record_keys is not None:
batch_params.append(tuple(record[key] for key in first_record_keys))
if weekly_sql is None:
weekly_keys = list(weekly_record.keys())
columns = ', '.join(weekly_keys)
placeholders = ', '.join(['%s'] * len(weekly_keys))
update_parts = [f"{key} = VALUES({key})" for key in weekly_keys if key not in ['author_name', 'channel', 'stat_weekly']]
weekly_sql = f"""
INSERT INTO ai_statistics_weekly ({columns})
VALUES ({placeholders})
ON DUPLICATE KEY UPDATE {', '.join(update_parts)}
"""
if len(batch_params) >= batch_size or idx == len(rows):
if monthly_sql is None:
monthly_keys = list(monthly_record.keys())
columns = ', '.join(monthly_keys)
placeholders = ', '.join(['%s'] * len(monthly_keys))
update_parts = [f"{key} = VALUES({key})" for key in monthly_keys if key not in ['author_name', 'channel', 'stat_monthly']]
monthly_sql = f"""
INSERT INTO ai_statistics_monthly ({columns})
VALUES ({placeholders})
ON DUPLICATE KEY UPDATE {', '.join(update_parts)}
"""
# 添加到批量参数
days_batch.append(tuple(days_record[key] for key in days_keys))
weekly_batch.append(tuple(weekly_record[key] for key in weekly_keys))
monthly_batch.append(tuple(monthly_record[key] for key in monthly_keys))
# 批量提交
if len(days_batch) >= batch_size or idx == len(rows):
try:
result_count = self.db_manager.execute_many(sql_template, batch_params, autocommit=True)
success_count += result_count
print(f"[批次提交] 已导入 {success_count} 条记录(本批: {result_count}/{len(batch_params)}")
self.logger.info(f"ai_statistics_days表批量提交: {result_count}/{len(batch_params)}记录")
batch_params = []
except Exception as batch_error:
failed_count += len(batch_params)
print(f" [X] 批次提交失败: {batch_error}")
self.logger.error(f"ai_statistics_days表批量提交失败: {batch_error}")
batch_params = []
# 提交 ai_statistics_days
result = self.db_manager.execute_many(days_sql, days_batch, autocommit=True)
days_success += result
print(f"[days] 已导入 {days_success}")
days_batch = []
except Exception as e:
print(f" [X] days表提交失败: {e}")
self.logger.error(f"ai_statistics_days批量提交失败: {e}")
failed_count += len(days_batch)
days_batch = []
try:
# 提交 ai_statistics_weekly
result = self.db_manager.execute_many(weekly_sql, weekly_batch, autocommit=True)
weekly_success += result
print(f"[weekly] 已导入 {weekly_success}")
weekly_batch = []
except Exception as e:
print(f" [X] weekly表提交失败: {e}")
self.logger.error(f"ai_statistics_weekly批量提交失败: {e}")
weekly_batch = []
try:
# 提交 ai_statistics_monthly
result = self.db_manager.execute_many(monthly_sql, monthly_batch, autocommit=True)
monthly_success += result
print(f"[monthly] 已导入 {monthly_success}")
monthly_batch = []
except Exception as e:
print(f" [X] monthly表提交失败: {e}")
self.logger.error(f"ai_statistics_monthly批量提交失败: {e}")
monthly_batch = []
except Exception as e:
failed_count += 1
print(f" [X] 处理失败 ({author_name}): {e}")
self.logger.error(f"ai_statistics_days表处理失败: {author_name}, 错误: {e}")
self.logger.error(f"数据处理失败: {author_name}, 错误: {e}")
continue
print("\n" + "="*70)
print(f"[OK] ai_statistics_days 表数据导入完成")
print(f" 成功: {success_count}记录")
print(f"[OK] 数据导入完成拆分到3个表")
print(f" ai_statistics_days: {days_success}")
print(f" ai_statistics_weekly: {weekly_success}")
print(f" ai_statistics_monthly: {monthly_success}")
if failed_count > 0:
print(f" 失败: {failed_count}记录")
print(f" 失败: {failed_count}")
print("="*70)
self.logger.info(f"ai_statistics_days表数据导入完成: 成功 {success_count} 条,失败 {failed_count}")
return success_count > 0
self.logger.info(f"数据导入完成: days={days_success}, weekly={weekly_success}, monthly={monthly_success}, failed={failed_count}")
return days_success > 0
def import_all(self) -> bool:
"""导入所有CSV文件"""

input.txt Normal file

query_statistics.py Normal file

sms_config.json Normal file

@@ -0,0 +1,15 @@
{
"阿里云配置说明": "请填写您的阿里云短信服务配置",
"access_key_id": "LTAI5tSMvnCJdqkZtCVWgh8R",
"access_key_secret": "nyFzXyIi47peVLK4wR2qqbPezmU79W",
"sign_name": "北京乐航时代科技",
"template_code": "SMS_486210104",
"phone_numbers": "13621242430",
"endpoint": "dysmsapi.aliyuncs.com",
"注意事项": [
"access_key_id 和 access_key_secret 可在 https://ram.console.aliyun.com/manage/ak 获取",
"sign_name 需要在阿里云短信服务控制台申请并通过审核",
"template_code 是短信模板代码,需要在阿里云短信服务控制台申请",
"phone_numbers 可以配置多个手机号,用逗号分隔,如: 13621242430,13800138000"
]
}


@@ -277,17 +277,44 @@ class CookieSyncToDB:
# 转换Cookie为字符串
cookie_string = self.cookie_dict_to_string(cookies)
- # 提取其他信息
+ # 提取其他信息使用username和nick作为author_name进行匹配
username = account_info.get('username', '') # 优先用于与数据库author_name匹配
nick = account_info.get('nick', '') # 备用匹配字段
app_id = account_info.get('app_id', '')
nick = account_info.get('nick', '')
domain = account_info.get('domain', '')
level = account_info.get('level', '')
# 查找作者(使用 author_name + channel 作为唯一键)
# 验证username或nick至少有一个存在
if not username and not nick:
print(" [!] 该账号没有username和nick字段跳过")
stats['skipped'] += 1
continue
print(f" Username: {username}")
print(f" 昵称: {nick}")
# 查找作者使用双重匹配机制先username后nick
channel = 1 # 百家号固定为channel=1
author = self.find_author_by_name(account_name, channel)
author = None
matched_field = None
# 1. 首先尝试使用username匹配
if username:
author = self.find_author_by_name(username, channel)
if author:
- print(f" [√] 找到作者: {author['author_name']} (ID: {author['id']}, Channel: {author['channel']})")
+ matched_field = 'username'
+ print(f" [√] 通过username匹配到作者: {author['author_name']} (ID: {author['id']}, Channel: {author['channel']})")
# 2. 如果username匹配失败尝试使用nick匹配
if not author and nick:
author = self.find_author_by_name(nick, channel)
if author:
matched_field = 'nick'
print(f" [√] 通过nick匹配到作者: {author['author_name']} (ID: {author['id']}, Channel: {author['channel']})")
# 3. 如果都没匹配到
if not author:
print(f" [!] 未找到匹配的作者已尝试username和nick")
# 更新或创建
if author:
@@ -300,15 +327,17 @@ class CookieSyncToDB:
)
if success:
- print(f" [OK] Cookie已更新")
+ print(f" [OK] Cookie已更新(匹配字段: {matched_field}")
stats['updated'] += 1
# 记录成功
success_records.append({
'account_name': account_name,
- 'app_id': app_id,
+ 'username': username,
+ 'nick': nick,
+ 'app_id': app_id,
'domain': domain,
'action': 'updated',
'matched_field': matched_field,
'db_author_id': author['id'],
'db_author_name': author['author_name']
})
@@ -318,17 +347,21 @@ class CookieSyncToDB:
# 记录失败
failed_records.append({
'account_name': account_name,
- 'app_id': app_id,
+ 'username': username,
+ 'nick': nick,
+ 'app_id': app_id,
'reason': '数据库更新失败',
'matched_field': matched_field,
'db_author_id': author['id']
})
else:
- # 作者不存在
+ # 作者不存在,考虑创建
if auto_create:
- print(f" [*] 作者不存在,创建新记录...")
+ # 优先使用username如果没有则使用nick
+ author_name_to_create = username if username else nick
+ print(f" [*] 作者不存在创建新记录author_name: {author_name_to_create}...")
success = self.insert_new_author(
- account_name,
+ author_name_to_create, # 优先使用username否则使用nick
cookie_string,
app_id,
nick,
@@ -337,15 +370,17 @@ class CookieSyncToDB:
)
if success:
- print(f" [OK] 新作者已创建")
+ print(f" [OK] 新作者已创建 (author_name: {author_name_to_create})")
stats['created'] += 1
# 记录成功
success_records.append({
'account_name': account_name,
- 'app_id': app_id,
+ 'username': username,
+ 'nick': nick,
+ 'app_id': app_id,
'domain': domain,
- 'action': 'created'
+ 'action': 'created',
+ 'created_with': 'username' if username else 'nick'
})
else:
print(f" [X] 创建作者失败")
@@ -353,8 +388,9 @@ class CookieSyncToDB:
# 记录失败
failed_records.append({
'account_name': account_name,
- 'app_id': app_id,
+ 'username': username,
+ 'nick': nick,
+ 'app_id': app_id,
'reason': '数据库插入失败'
})
else:
@@ -363,9 +399,10 @@ class CookieSyncToDB:
# 记录失败(数据库中不存在)
failed_records.append({
'account_name': account_name,
- 'app_id': app_id,
+ 'username': username,
+ 'nick': nick,
- 'reason': '数据库中不存在该账号,且未开启自动创建'
+ 'app_id': app_id,
+ 'reason': '数据库中不存在该账号已尝试username和nick且未开启自动创建'
})
# 保存记录文件

test_validation_sms.bat Normal file

@@ -0,0 +1,40 @@
@echo off
chcp 65001 >nul
echo ============================================================
echo 数据验证与短信告警系统 - 快速测试
echo ============================================================
echo.
echo [步骤1] 检查Python环境...
python --version
if %errorlevel% neq 0 (
echo [错误] Python未安装或未添加到PATH
pause
exit /b 1
)
echo.
echo [步骤2] 测试短信发送功能...
echo.
python data_validation_with_sms.py --test-sms
if %errorlevel% neq 0 (
echo.
echo [错误] 短信发送测试失败
echo 请检查:
echo 1. 阿里云SDK是否已安装
echo 2. sms_config.json配置是否正确
echo 3. AccessKey和Secret是否有效
pause
exit /b 1
)
echo.
echo [步骤3] 执行数据验证...
echo.
python data_validation_with_sms.py
echo.
echo ============================================================
echo 测试完成!
echo ============================================================
pause

test_validation_sms.sh Normal file

@@ -0,0 +1,47 @@
#!/bin/bash
# 数据验证与短信告警系统 - 快速测试Linux版本
echo "============================================================"
echo "数据验证与短信告警系统 - 快速测试"
echo "============================================================"
echo ""
echo "[步骤1] 检查Python环境..."
if command -v python3 &> /dev/null; then
PYTHON_CMD=python3
elif command -v python &> /dev/null; then
PYTHON_CMD=python
else
echo "[错误] Python未安装或未添加到PATH"
exit 1
fi
$PYTHON_CMD --version
if [ $? -ne 0 ]; then
echo "[错误] Python版本检查失败"
exit 1
fi
echo ""
echo "[步骤2] 测试短信发送功能..."
echo ""
$PYTHON_CMD data_validation_with_sms.py --test-sms
if [ $? -ne 0 ]; then
echo ""
echo "[错误] 短信发送测试失败"
echo "请检查:"
echo " 1. 阿里云SDK是否已安装"
echo " 2. sms_config.json配置是否正确"
echo " 3. AccessKey和Secret是否有效"
exit 1
fi
echo ""
echo "[步骤3] 执行数据验证..."
echo ""
$PYTHON_CMD data_validation_with_sms.py
echo ""
echo "============================================================"
echo "测试完成!"
echo "============================================================"

update_day_revenue.py Normal file

@@ -0,0 +1,151 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
更新day_revenue脚本
功能从CSV文件读取数据只更新ai_statistics_days表中的day_revenue字段
"""
import os
import sys
import csv
from typing import List, Dict
from decimal import Decimal
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from database_config import DatabaseManager
class DayRevenueUpdater:
"""day_revenue字段更新器"""
def __init__(self):
"""初始化"""
self.db_manager = DatabaseManager()
self.script_dir = os.path.dirname(os.path.abspath(__file__))
self.csv_file = os.path.join(self.script_dir, 'ai_statistics_days.csv')
print(f"[初始化] CSV文件: {self.csv_file}")
def read_csv_data(self) -> List[Dict]:
"""读取CSV文件数据
Returns:
数据列表
"""
if not os.path.exists(self.csv_file):
print(f"[X] CSV文件不存在: {self.csv_file}")
return []
try:
with open(self.csv_file, 'r', encoding='utf-8-sig') as f:
reader = csv.DictReader(f)
data = list(reader)
print(f"[OK] 读取到 {len(data)} 条记录")
return data
except Exception as e:
print(f"[X] 读取CSV失败: {e}")
import traceback
traceback.print_exc()
return []
def update_day_revenue(self, batch_size: int = 50) -> bool:
"""更新day_revenue字段
Args:
batch_size: 批量更新大小
Returns:
是否成功
"""
# 读取CSV数据
csv_data = self.read_csv_data()
if not csv_data:
print("[!] 没有数据需要更新")
return False
print(f"\n[开始] 更新day_revenue字段...")
# 准备批量更新
update_sql = """
UPDATE ai_statistics_days
SET day_revenue = %s,
updated_at = NOW()
WHERE author_id = %s
AND stat_date = %s
AND channel = %s
"""
success_count = 0
failed_count = 0
not_found_count = 0
# 逐条更新
for idx, row in enumerate(csv_data, 1):
try:
author_id = int(row.get('author_id', 0))
stat_date = row.get('stat_date', '')
channel = int(row.get('channel', 1))
day_revenue = Decimal(row.get('day_revenue', '0.00'))
# 执行更新
affected_rows = self.db_manager.execute_update(
update_sql,
(day_revenue, author_id, stat_date, channel)
)
if affected_rows > 0:
success_count += 1
print(f" [{idx}/{len(csv_data)}] ✓ 更新成功: author_id={author_id}, stat_date={stat_date}, day_revenue={day_revenue}")
else:
not_found_count += 1
print(f" [{idx}/{len(csv_data)}] - 未找到记录: author_id={author_id}, stat_date={stat_date}")
except Exception as e:
failed_count += 1
print(f" [{idx}/{len(csv_data)}] ✗ 更新失败: {e}")
print(f" 数据: {row}")
# 输出统计
print(f"\n{'='*70}")
print(f"更新完成")
print(f"{'='*70}")
print(f"成功更新: {success_count}/{len(csv_data)}")
print(f"未找到记录: {not_found_count}/{len(csv_data)}")
print(f"更新失败: {failed_count}/{len(csv_data)}")
print(f"{'='*70}")
return failed_count == 0
def main():
"""主函数"""
print("\n" + "="*70)
print("day_revenue字段批量更新工具")
print("="*70)
print("功能:从 ai_statistics_days.csv 读取数据,只更新数据库中的 day_revenue 字段")
print("="*70)
try:
updater = DayRevenueUpdater()
# 确认执行
confirm = input("\n是否开始更新? (y/n): ").strip().lower()
if confirm != 'y':
print("已取消")
return 0
success = updater.update_day_revenue()
return 0 if success else 1
except Exception as e:
print(f"\n[X] 程序执行出错: {e}")
import traceback
traceback.print_exc()
return 1
if __name__ == '__main__':
sys.exit(main())


@@ -0,0 +1,8 @@
数据验证报告
======================================================================
生成时间: 2025-12-30 10:21:31
目标日期: 2025-12-30
顺序验证结果
----------------------------------------------------------------------


@@ -0,0 +1,34 @@
数据验证报告
======================================================================
生成时间: 2025-12-30 11:33:38
目标日期: 2025-12-29
顺序验证结果
----------------------------------------------------------------------
json vs csv
顺序匹配: 是
json 记录数: 84
csv 记录数: 84
交叉验证结果
----------------------------------------------------------------------
json vs csv
共同记录: 84
仅在json: 0
仅在csv: 0
字段不匹配: 0
json vs database
共同记录: 84
仅在json: 0
仅在database: 0
字段不匹配: 0
csv vs database
共同记录: 84
仅在csv: 0
仅在database: 0
字段不匹配: 0


@@ -28,12 +28,12 @@ logger = setup_baidu_crawl_logger()
# 简单的代理获取配置 - 大麦代理IP
PROXY_API_URL = (
- 'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=e054861d08471263d970bde4f4905181&osn=TC_NO176655872088456223&tiqu=1'
+ 'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=2912cb2b22d3b7ae724f045012790479&osn=TC_NO176707424165606223&tiqu=1'
)
# 大麦代理账号密码认证
- PROXY_USERNAME = '694b8c3172af7'
- PROXY_PASSWORD = 'q8yA8x1dwCpdyIK'
+ PROXY_USERNAME = '69538fdef04e1'
+ PROXY_PASSWORD = '63v0kQBr2yJXnjf'
# 备用固定代理IP池格式'IP:端口', '用户名', '密码'
BACKUP_PROXY_POOL = [


@@ -27,12 +27,12 @@ if 'https_proxy' in os.environ:
# 简单的代理获取配置 - 大麦代理IP
PROXY_API_URL = (
- 'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=e054861d08471263d970bde4f4905181&osn=TC_NO176655872088456223&tiqu=1'
+ 'https://api2.damaiip.com/index.php?s=/front/user/getIPlist&xsn=2912cb2b22d3b7ae724f045012790479&osn=TC_NO176707424165606223&tiqu=1'
)
# 大麦代理账号密码认证
- PROXY_USERNAME = '694b8c3172af7'
- PROXY_PASSWORD = 'q8yA8x1dwCpdyIK'
+ PROXY_USERNAME = '69538fdef04e1'
+ PROXY_PASSWORD = '63v0kQBr2yJXnjf'
# 备用固定代理IP池格式'IP:端口', '用户名', '密码'
BACKUP_PROXY_POOL = [

File diff suppressed because it is too large

@@ -0,0 +1,341 @@
# 📱 数据验证与短信告警系统
## 🎯 功能概述
自动化数据验证系统,每天定时检查数据一致性,发现问题时通过阿里云短信服务发送告警。
**核心功能:**
- ✅ 自动验证 JSON/CSV/数据库 三个数据源的一致性
- ✅ 验证失败自动发送短信告警(错误代码:2222)
- ✅ 支持定时任务(每天上午9点执行)
- ✅ 生成详细的验证报告
- ✅ 支持多手机号接收告警
---
## 📁 文件结构
```
xhh_baijiahao/
├── data_validation.py # 数据验证核心模块
├── data_validation_with_sms.py # 数据验证+短信告警集成脚本 ⭐
├── sms_config.json # 短信服务配置文件 ⭐
├── test_validation_sms.bat # Windows快速测试脚本
├── 数据验证短信告警使用说明.md # 详细使用文档
└── ai_sms/ # 阿里云短信SDK示例
└── alibabacloud_sample/
└── sample.py
```
---
## 🚀 快速开始(5分钟)
### 1⃣ 安装依赖
```bash
pip install alibabacloud_dysmsapi20170525 alibabacloud_credentials alibabacloud_tea_openapi alibabacloud_tea_util
```
### 2⃣ 配置短信服务
编辑 `sms_config.json`
```json
{
"access_key_id": "您的AccessKey ID",
"access_key_secret": "您的AccessKey Secret",
"sign_name": "北京乐航时代科技",
"template_code": "SMS_486210104",
"phone_numbers": "13621242430"
}
```
**获取AccessKey**:https://ram.console.aliyun.com/manage/ak
### 3⃣ 测试运行
**Windows用户:双击运行**
```
test_validation_sms.bat
```
**命令行运行:**
```bash
# 测试短信发送
python data_validation_with_sms.py --test-sms
# 执行数据验证
python data_validation_with_sms.py
```
### 4⃣ 配置定时任务
```bash
# 查看配置命令
python data_validation_with_sms.py --setup-schedule
```
按照提示配置Windows任务计划程序,设置每天9点自动执行。
---
## 📖 使用场景
### 场景1每日自动验证
**定时任务配置每天9点**
- 程序:`C:\Python\python.exe`
- 参数:`D:\workspace\xhh_baijiahao\data_validation_with_sms.py`
- 触发器每天上午9:00
**执行流程:**
1. 自动验证昨天的数据(JSON/CSV/数据库)
2. 如果发现问题 → 发送短信告警(错误代码:2222)
3. 生成详细验证报告
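定时任务默认验证“昨天”的数据。下面是该默认日期计算的一个极简示意(函数名为示例,并非脚本内部的真实实现),输出格式与 `--date` 参数一致:

```python
from datetime import date, timedelta

def default_target_date(today=None):
    """计算默认验证日期(昨天),返回 YYYY-MM-DD 字符串"""
    today = today or date.today()
    return (today - timedelta(days=1)).strftime('%Y-%m-%d')

# 例如在 2025-12-30 运行时,默认验证 2025-12-29 的数据
```

这样上午9点的定时任务校验的总是前一天已经完整落库的数据。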
### 场景2手动验证指定日期
```bash
# 验证2025-12-29的数据
python data_validation_with_sms.py --date 2025-12-29
# 验证指定表
python data_validation_with_sms.py --table ai_statistics_day --date 2025-12-29
# 只验证特定数据源
python data_validation_with_sms.py --source csv database
```
### 场景3仅验证不发短信
```bash
# 适用于调试或测试
python data_validation_with_sms.py --no-sms
```
---
## 🔔 短信告警说明
### 触发条件
发送短信告警(错误代码:**2222**)的情况:
| 问题类型 | 说明 | 示例 |
|---------|-----|-----|
| **顺序不一致** | JSON和CSV记录顺序不同 | 账号A在JSON第1位、CSV第3位 |
| **缺失记录** | 某个数据源少了记录 | JSON有10条CSV只有9条 |
| **多余记录** | 某个数据源多了记录 | CSV有11条数据库只有10条 |
| **字段差异** | 相同记录的字段值不同 | 阅读量JSON=1000, CSV=999 |
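上表中“缺失记录/多余记录/字段差异”的判定可以用集合运算表达。下面是一个极简示意(记录键与示例数据均为虚构,并非 `data_validation.py` 的实际实现):

```python
def cross_validate(src_a, src_b, fields):
    """按记录键对比两个数据源,返回 (仅在A, 仅在B, 字段不匹配)"""
    keys_a, keys_b = set(src_a), set(src_b)
    only_a = keys_a - keys_b   # B 缺失的记录
    only_b = keys_b - keys_a   # B 多余的记录
    mismatched = {}
    for k in keys_a & keys_b:  # 共同记录逐字段比对
        bad = [f for f in fields if src_a[k].get(f) != src_b[k].get(f)]
        if bad:
            mismatched[k] = bad
    return only_a, only_b, mismatched

# 示例:以 (author_name, channel) 作为记录键
json_rows = {('主力账号', 1): {'read_count': 150000}}
csv_rows = {('主力账号', 1): {'read_count': 149999},
            ('测试账号1', 1): {'read_count': 10}}
only_json, only_csv, diff = cross_validate(json_rows, csv_rows, ['read_count'])
# only_csv 含 ('测试账号1', 1)diff 指出 read_count 不一致
```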
### 短信内容
```
【北京乐航时代科技】您的验证码是2222
```
> 💡 **说明**:由于使用验证码模板,错误代码固定为 `2222`。具体错误详情请查看验证报告文件。
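验证码模板只有一个 `code` 变量,告警时把固定错误代码 2222 填入该变量即可。`TemplateParam` 需要是 JSON 字符串,下面是一个极简示意(变量名为示例):

```python
import json

# 阿里云 SendSms 的 TemplateParam 参数要求 JSON 字符串
template_param = json.dumps({"code": "2222"})
# 对应短信内容:【北京乐航时代科技】您的验证码是2222
```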
### 多号码配置
`sms_config.json` 中配置多个接收号码:
```json
{
"phone_numbers": "13621242430,13800138000,13900139000"
}
```
---
## 📊 验证报告
每次验证自动生成详细报告:
```
validation_report_20250104_090523.txt
```
**报告内容:**
- ✅ 顺序验证结果
- ✅ 交叉验证结果
- ✅ 数据差异统计
- ✅ 详细错误列表(记录级别)
**示例报告片段:**
```
交叉验证结果
----------------------------------------------------------------------
json vs csv
共同记录: 48 条
仅在json: 0 条
仅在csv: 2 条
字段不匹配: 3 条
仅在csv中的记录前5条:
- 测试账号1|1
- 测试账号2|1
字段值不匹配的记录前3条:
记录: 主力账号|1
字段 read_count:
json: 150000
csv: 149999
```
---
## ⚙️ 高级配置
### 环境变量方式(更安全)
**Windows PowerShell**
```powershell
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="您的AccessKey ID"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="您的AccessKey Secret"
```
**Linux/macOS**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID="您的AccessKey ID"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="您的AccessKey Secret"
```
### 配置优先级
1. **环境变量** (最高优先级)
2. **sms_config.json** 配置文件
3. **代码中的默认值**
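上述优先级可以写成一个简单的读取函数。以下为示意(函数名与默认值为示例,并非脚本的实际实现),环境变量名与正文一致:

```python
import json
import os

def load_access_key_id(config_path='sms_config.json', default=''):
    """按优先级读取 AccessKey ID:环境变量 > 配置文件 > 默认值"""
    env_value = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID')
    if env_value:
        return env_value                      # 1. 环境变量(最高优先级)
    if os.path.exists(config_path):
        with open(config_path, 'r', encoding='utf-8') as f:
            file_value = json.load(f).get('access_key_id')
        if file_value:
            return file_value                 # 2. 配置文件
    return default                            # 3. 默认值
```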
---
## 🔧 故障排查
### ❌ 短信发送失败
**常见原因:**
1. AccessKey ID/Secret 不正确
2. 短信签名未审核通过
3. 短信模板未审核通过
4. 账户余额不足
5. 手机号格式错误
**解决方法:**
```bash
# 测试短信发送
python data_validation_with_sms.py --test-sms
```
查看控制台输出的详细错误信息和诊断地址。
### ❌ 导入错误
```
ImportError: No module named 'alibabacloud_dysmsapi20170525'
```
**解决方法:**
```bash
pip install alibabacloud_dysmsapi20170525 alibabacloud_credentials alibabacloud_tea_openapi alibabacloud_tea_util
```
### ❌ 数据库连接失败
检查 `database_config.py` 配置是否正确。
### ❌ 定时任务不执行
**检查项:**
1. 任务计划程序中任务状态
2. Python路径和脚本路径是否正确
3. 查看任务历史记录
---
## 📝 命令行参数
```bash
python data_validation_with_sms.py [参数]
```
| 参数 | 说明 | 示例 |
|-----|-----|-----|
| `--date` | 指定验证日期 | `--date 2025-12-29` |
| `--source` | 指定数据源 | `--source json csv` |
| `--table` | 指定验证表 | `--table ai_statistics_day` |
| `--setup-schedule` | 配置定时任务 | `--setup-schedule` |
| `--test-sms` | 测试短信功能 | `--test-sms` |
| `--no-sms` | 禁用短信发送 | `--no-sms` |
---
## 🔐 安全建议
1. **使用环境变量存储敏感信息**
   - 不要将 AccessKey 提交到 Git
   - 将 `sms_config.json` 添加到 `.gitignore`
2. **定期轮换 AccessKey**
   - 建议每3-6个月更换一次
3. **使用 RAM 子账号**
   - 为短信服务创建专用子账号
   - 仅授予短信发送权限
4. **设置 IP 白名单**
   - 在阿里云 RAM 控制台限制访问 IP
---
## 📞 技术支持
### 阿里云短信服务
- 控制台:https://dysms.console.aliyun.com/
- 文档:https://help.aliyun.com/product/44282.html
- API参考:https://api.aliyun.com/product/Dysmsapi
### 常见问题
**Q:短信收不到?**
A:检查手机号是否正确,短信签名和模板是否已审核通过。
**Q:如何查看短信发送记录?**
A:登录阿里云短信服务控制台 → 业务统计 → 发送记录查询。
**Q:短信费用多少?**
A:验证码短信约0.045元/条,具体价格以阿里云官网为准。
**Q:可以自定义短信内容吗?**
A:需要在阿里云控制台申请新的短信模板,审核通过后修改 `template_code` 配置。
---
## 🎉 快速测试检查清单
- [ ] 安装Python依赖
- [ ] 配置 `sms_config.json`
- [ ] 运行 `test_validation_sms.bat`
- [ ] 收到测试短信
- [ ] 执行数据验证
- [ ] 生成验证报告
- [ ] 配置定时任务
---
## 📅 版本历史
### v1.0.0 (2025-01-04)
- ✨ 初始版本发布
- ✅ 数据验证功能
- ✅ 阿里云短信告警
- ✅ 定时任务支持
- ✅ 多数据源支持JSON/CSV/数据库)
- ✅ 详细验证报告
- ✅ 配置文件支持
---
**开发团队:** 北京乐航时代科技
**最后更新:** 2025-01-04


@@ -0,0 +1,294 @@
# 数据验证与短信告警系统 - 使用说明
## 📋 功能概述
自动执行数据验证(JSON/CSV/数据库),当验证失败时通过阿里云短信服务发送告警通知。
**核心功能:**
- ✅ 每天定时执行数据验证(默认上午9点)
- ✅ 验证失败自动发送短信告警(错误代码:2222)
- ✅ 支持多个手机号接收告警
- ✅ 生成详细的验证报告
---
## 🚀 快速开始
### 1. 安装依赖
```bash
# 安装阿里云短信SDK
pip install alibabacloud_dysmsapi20170525
pip install alibabacloud_credentials
pip install alibabacloud_tea_openapi
pip install alibabacloud_tea_util
```
### 2. Configure the SMS service
Edit `sms_config.json` and fill in your Alibaba Cloud settings:
```json
{
    "access_key_id": "your AccessKey ID",
    "access_key_secret": "your AccessKey Secret",
    "sign_name": "your SMS signature",
    "template_code": "SMS_486210104",
    "phone_numbers": "13621242430,13800138000"
}
```
**Where to get them:**
- AccessKey: https://ram.console.aliyun.com/manage/ak
- SMS signature and template: https://dysms.console.aliyun.com/
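Since all five keys are required, a loader can fail fast on an incomplete file. A minimal sketch (the `load_sms_config` helper is illustrative, not the script's actual API):

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"access_key_id", "access_key_secret", "sign_name",
                 "template_code", "phone_numbers"}

def load_sms_config(path="sms_config.json"):
    """Load sms_config.json and verify that every required key is present."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"sms_config.json is missing keys: {sorted(missing)}")
    return config
```

Failing at startup with the exact missing keys is easier to debug than a rejected request from the SMS API later.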
### 3. Test SMS sending
```bash
# Send a test SMS
python data_validation_with_sms.py --test-sms
```
### 4. Run a validation manually
```bash
# Validate yesterday's data
python data_validation_with_sms.py
# Validate a specific date
python data_validation_with_sms.py --date 2025-12-29
# Validate a specific table
python data_validation_with_sms.py --table ai_statistics_day
```
### 5. Set up the scheduled task
**Windows:**
```bash
# Print the Task Scheduler setup command
python data_validation_with_sms.py --setup-schedule
```
Then run the displayed PowerShell command as Administrator, or configure the task manually:
1. Open `Task Scheduler` (Win+R, then `taskschd.msc`)
2. Click "Create Basic Task"
3. Fill in the task details:
   - **Name**: DataValidationWithSMS
   - **Description**: Run the data validation at 09:00 daily and send SMS alerts
4. Trigger: **Daily**, at **9:00 AM**
5. Action: **Start a program**
   - Program: `C:\Python\python.exe` (your Python path)
   - Arguments: `D:\workspace\xhh_baijiahao\data_validation_with_sms.py`
6. Finish
**Linux/macOS:**
Edit the crontab:
```bash
crontab -e
```
Add an entry that runs daily at 09:00:
```
0 9 * * * /usr/bin/python3 /path/to/data_validation_with_sms.py
```
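If you prefer an in-process fallback to cron, the wait until the next daily run can be computed with the standard library alone (a sketch, independent of the `schedule`-based daemons this repository ships):

```python
from datetime import datetime, timedelta

def seconds_until(hour, minute=0, now=None):
    """Seconds from `now` until the next occurrence of hour:minute,
    e.g. the daily 09:00 validation run."""
    now = now or datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:  # already past today's slot, wait for tomorrow's
        target += timedelta(days=1)
    return (target - now).total_seconds()
```

A loop that sleeps for `seconds_until(9)` and then runs the validation gives the same daily cadence as the cron entry above.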
---
## 📖 Usage Examples
### Example 1: validate JSON against CSV
```bash
python data_validation_with_sms.py --source json csv
```
### Example 2: validate CSV against the database
```bash
python data_validation_with_sms.py --source csv database
```
### Example 3: full validation across all three sources
```bash
python data_validation_with_sms.py --source json csv database
```
### Example 4: validate the ai_statistics_day table
```bash
python data_validation_with_sms.py --table ai_statistics_day --source csv database
```
---
## 📧 SMS Alerts
### Trigger conditions
An alert SMS (error code 2222) is sent in any of the following cases:
1. **Order mismatch**: the record order in JSON and CSV differs
2. **Missing records**: a data source is missing records
3. **Extra records**: a data source contains extra records
4. **Field mismatch**: field values differ for the same record
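The four trigger conditions above can be sketched as a comparison of two sources keyed by record id (a hypothetical helper; the real validator's keys and category names may differ):

```python
def classify_differences(source_a, source_b):
    """Compare two data sources (dicts keyed by record id) and return
    the alert categories that would trigger an SMS."""
    issues = []
    ids_a, ids_b = list(source_a), list(source_b)
    common = [i for i in ids_a if i in source_b]
    # 1. order mismatch among records present in both sources
    if common != [i for i in ids_b if i in source_a]:
        issues.append("order_mismatch")
    # 2. records in B that A lacks, 3. records A has that B lacks
    if set(ids_b) - set(ids_a):
        issues.append("missing_records")
    if set(ids_a) - set(ids_b):
        issues.append("extra_records")
    # 4. field-level differences on shared records
    if any(source_a[i] != source_b[i] for i in common):
        issues.append("field_mismatch")
    return issues
```

An empty result means the two sources agree; any non-empty result maps to one SMS alert.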
### Message content
Message format (verification-code template):
```
【北京乐航时代科技】您的验证码是2222
```
**Note**: because a verification-code template is used, the error code is fixed at `2222`; see the generated validation report for the actual error details.
### Multiple recipients
Configure several phone numbers in `sms_config.json`:
```json
{
    "phone_numbers": "13621242430,13800138000,13900139000"
}
```
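A sketch of how the comma-separated `phone_numbers` string could be split and sanity-checked (a hypothetical helper; mainland-China mobile numbers are 11 digits starting with 1):

```python
import re

def parse_phone_numbers(value):
    """Split the comma-separated phone_numbers setting and reject
    anything that is not an 11-digit mobile number."""
    numbers = [n.strip() for n in value.split(",") if n.strip()]
    bad = [n for n in numbers if not re.fullmatch(r"1\d{10}", n)]
    if bad:
        raise ValueError(f"invalid phone numbers: {bad}")
    return numbers
```

Validating the list once at startup avoids silently dropping an alert recipient because of a typo in the config.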
---
## 📊 Validation Reports
A detailed report is written to the project root after every run:
```
validation_report_20250104_090000.txt
```
The report includes:
- Order-validation results
- Cross-validation results
- Difference statistics
- A detailed error list
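The filename shown above encodes the run timestamp; generating it is a one-liner (a sketch, assuming the `validation_report_%Y%m%d_%H%M%S.txt` pattern the example filename implies):

```python
from datetime import datetime

def report_filename(now=None):
    """Build a report name like validation_report_20250104_090000.txt."""
    now = now or datetime.now()
    return now.strftime("validation_report_%Y%m%d_%H%M%S.txt")
```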
---
## ⚙️ Configuration
### The SMSAlertConfig class
The settings can be changed in `data_validation_with_sms.py`:
```python
class SMSAlertConfig:
    # Alibaba Cloud credentials
    ACCESS_KEY_ID = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID', 'default value')
    ACCESS_KEY_SECRET = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET', 'default value')
    # SMS signature and template
    SIGN_NAME = '北京乐航时代科技'
    TEMPLATE_CODE = 'SMS_486210104'
    # Recipient numbers
    PHONE_NUMBERS = '13621242430'
    # endpoint
    ENDPOINT = 'dysmsapi.aliyuncs.com'
```
### Environment variables (recommended)
For safety, store the secrets in environment variables instead:
**Windows PowerShell:**
```powershell
$env:ALIBABA_CLOUD_ACCESS_KEY_ID="your AccessKey ID"
$env:ALIBABA_CLOUD_ACCESS_KEY_SECRET="your AccessKey Secret"
```
**Linux/macOS:**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID="your AccessKey ID"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="your AccessKey Secret"
```
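A sketch of the env-var-first lookup this implies (the precedence — environment variable, then `sms_config.json`, then a default — is an assumption here; check `data_validation_with_sms.py` for the actual order):

```python
import json
import os
from pathlib import Path

def resolve_credential(env_name, config_key, config_path="sms_config.json",
                       default=None):
    """Resolve a setting: environment variable first, then the config
    file, then an explicit default."""
    if os.environ.get(env_name):
        return os.environ[env_name]
    path = Path(config_path)
    if path.exists():
        value = json.loads(path.read_text(encoding="utf-8")).get(config_key)
        if value:
            return value
    return default
```

With this order, a value exported in the shell always wins, so nothing secret ever has to live in the repository.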
---
## 🔍 Troubleshooting
### 1. SMS sending fails
**Check:**
- The AccessKey ID and Secret are correct
- The SMS signature has passed review
- The SMS template has passed review
- The account balance is sufficient
- The phone number format is valid (11 digits for mainland China)
**Inspect the error output:**
```bash
python data_validation_with_sms.py --test-sms
```
### 2. Database connection failure
Check that the settings in `database_config.py` are correct.
### 3. Validation failure
Read the generated validation report for the detailed errors.
### 4. Scheduled task does not run
**Windows:**
- Check the task status in Task Scheduler
- Review the task history
- Confirm the Python path and the script path
**Linux/macOS:**
- Check the crontab: `crontab -l`
- Check the system log: `grep CRON /var/log/syslog`
---
## 📝 Log Locations
- Validation reports: `validation_report_*.txt`
- SMS send log: console output
- System log: per your logging configuration (if any)
---
## 🔐 Security Recommendations
1. **Never commit secrets to Git**
   - Add `sms_config.json` to `.gitignore`
   - Store the AccessKey in environment variables
2. **Rotate the AccessKey regularly**
   - Every 3-6 months is recommended
3. **Use a RAM sub-account**
   - Create a dedicated RAM sub-account for the SMS service
   - Grant it only the SMS-send permission it needs
4. **Set up an IP allowlist**
   - Restrict access IPs in the Alibaba Cloud RAM console
---
## 📞 Support
Contact the support team if you run into problems.
---
## 📅 Changelog
### v1.0.0 (2025-01-04)
- ✨ Initial release
- ✅ Integrated data validation
- ✅ Alibaba Cloud SMS alerts
- ✅ Scheduled-task support
- ✅ Multiple data sources